Try Different Label Smoothing Values and More Epochs

I wrote an app that classifies objects using transfer learning, and I trained the model on the CIFAR-10 dataset myself.

Label Smoothing Revisited: Strong Gains with Proper Tuning

What Changed

Initial experiments with label smoothing showed only marginal improvement. However, after increasing the smoothing factor from ε = 0.10 to ε = 0.15 and extending training from 20 to 60 epochs, validation accuracy improved substantially, as the before/after logs below show.
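
Concretely, the change amounts to two lines in the training setup. A minimal sketch assuming PyTorch's built-in smoothing support (the variable names here are illustrative, not the project's actual code):

import torch.nn as nn

NUM_EPOCHS = 60                                         # was 20
criterion = nn.CrossEntropyLoss(label_smoothing=0.15)   # was 0.10; requires PyTorch >= 1.10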

Before: 82.1%

==============================
Training model: resnet18
==============================
Epoch [01/20] Train Loss: 1.4383 | Val Loss: 2.1212 | Train Acc: 50.10% | Val Acc: 56.00%
Epoch [02/20] Train Loss: 0.4398 | Val Loss: 2.3636 | Train Acc: 85.80% | Val Acc: 59.00%
Epoch [03/20] Train Loss: 0.2316 | Val Loss: 1.4054 | Train Acc: 92.50% | Val Acc: 70.50%
Epoch [04/20] Train Loss: 0.1048 | Val Loss: 1.1404 | Train Acc: 97.50% | Val Acc: 73.50%
Epoch [05/20] Train Loss: 0.0683 | Val Loss: 0.9930 | Train Acc: 98.30% | Val Acc: 75.00%
Epoch [06/20] Train Loss: 0.0504 | Val Loss: 0.9246 | Train Acc: 99.00% | Val Acc: 76.00%
Epoch [07/20] Train Loss: 0.0265 | Val Loss: 0.8353 | Train Acc: 99.50% | Val Acc: 78.00%
Epoch [08/20] Train Loss: 0.0165 | Val Loss: 0.8222 | Train Acc: 99.80% | Val Acc: 78.10%
Epoch [09/20] Train Loss: 0.0178 | Val Loss: 0.8403 | Train Acc: 99.70% | Val Acc: 79.40%
Epoch [10/20] Train Loss: 0.0163 | Val Loss: 0.7646 | Train Acc: 99.90% | Val Acc: 80.70%
Epoch [11/20] Train Loss: 0.0093 | Val Loss: 0.7346 | Train Acc: 99.90% | Val Acc: 80.40%
Epoch [12/20] Train Loss: 0.0077 | Val Loss: 0.7318 | Train Acc: 100.00% | Val Acc: 80.50%
Epoch [13/20] Train Loss: 0.0044 | Val Loss: 0.7445 | Train Acc: 100.00% | Val Acc: 80.70%
Epoch [14/20] Train Loss: 0.0036 | Val Loss: 0.7777 | Train Acc: 100.00% | Val Acc: 79.90%
Epoch [15/20] Train Loss: 0.0104 | Val Loss: 0.7921 | Train Acc: 99.80% | Val Acc: 79.60%
Epoch [16/20] Train Loss: 0.0032 | Val Loss: 0.7568 | Train Acc: 100.00% | Val Acc: 80.00%
Epoch [17/20] Train Loss: 0.0031 | Val Loss: 0.7208 | Train Acc: 100.00% | Val Acc: 80.00%
Epoch [18/20] Train Loss: 0.0049 | Val Loss: 0.6775 | Train Acc: 99.90% | Val Acc: 80.80%
Epoch [19/20] Train Loss: 0.0033 | Val Loss: 0.6717 | Train Acc: 100.00% | Val Acc: 81.70%
Epoch [20/20] Train Loss: 0.0022 | Val Loss: 0.6731 | Train Acc: 100.00% | Val Acc: 82.10%

Best results for resnet18:
Train Loss: 0.0022 | Val Loss: 0.6731 | Train Acc: 100.00% | Val Acc: 82.10%
Training Time: 82.25 seconds

After: 84.7%

==============================
Training model: resnet18
==============================
Epoch [01/60] Train Loss: 1.7027 | Val Loss: 2.4931 | Train Acc: 51.40% | Val Acc: 61.30%
Epoch [02/60] Train Loss: 1.0935 | Val Loss: 2.0536 | Train Acc: 87.00% | Val Acc: 70.40%
Epoch [03/60] Train Loss: 0.9519 | Val Loss: 1.7454 | Train Acc: 93.60% | Val Acc: 66.40%
Epoch [04/60] Train Loss: 0.8665 | Val Loss: 1.4269 | Train Acc: 97.40% | Val Acc: 72.50%
Epoch [05/60] Train Loss: 0.8106 | Val Loss: 1.3331 | Train Acc: 99.00% | Val Acc: 75.80%
Epoch [06/60] Train Loss: 0.7870 | Val Loss: 1.2334 | Train Acc: 99.40% | Val Acc: 78.70%
Epoch [07/60] Train Loss: 0.7732 | Val Loss: 1.2510 | Train Acc: 100.00% | Val Acc: 76.80%
Epoch [08/60] Train Loss: 0.7549 | Val Loss: 1.1939 | Train Acc: 100.00% | Val Acc: 79.60%
Epoch [09/60] Train Loss: 0.7469 | Val Loss: 1.1935 | Train Acc: 100.00% | Val Acc: 79.80%
Epoch [10/60] Train Loss: 0.7382 | Val Loss: 1.1568 | Train Acc: 99.90% | Val Acc: 80.90%
Epoch [11/60] Train Loss: 0.7335 | Val Loss: 1.1533 | Train Acc: 100.00% | Val Acc: 81.00%
Epoch [12/60] Train Loss: 0.7321 | Val Loss: 1.1363 | Train Acc: 100.00% | Val Acc: 81.40%
Epoch [13/60] Train Loss: 0.7251 | Val Loss: 1.1408 | Train Acc: 99.90% | Val Acc: 81.20%
Epoch [14/60] Train Loss: 0.7214 | Val Loss: 1.1371 | Train Acc: 100.00% | Val Acc: 81.90%
Epoch [15/60] Train Loss: 0.7208 | Val Loss: 1.1296 | Train Acc: 100.00% | Val Acc: 82.90%
Epoch [16/60] Train Loss: 0.7210 | Val Loss: 1.1330 | Train Acc: 100.00% | Val Acc: 82.20%
Epoch [17/60] Train Loss: 0.7173 | Val Loss: 1.1425 | Train Acc: 100.00% | Val Acc: 81.20%
Epoch [18/60] Train Loss: 0.7185 | Val Loss: 1.1356 | Train Acc: 100.00% | Val Acc: 81.70%
Epoch [19/60] Train Loss: 0.7134 | Val Loss: 1.1291 | Train Acc: 100.00% | Val Acc: 82.10%
Epoch [20/60] Train Loss: 0.7133 | Val Loss: 1.1348 | Train Acc: 100.00% | Val Acc: 81.60%
Epoch [21/60] Train Loss: 0.7152 | Val Loss: 1.1386 | Train Acc: 100.00% | Val Acc: 81.60%
Epoch [22/60] Train Loss: 0.7149 | Val Loss: 1.1336 | Train Acc: 100.00% | Val Acc: 81.50%
Epoch [23/60] Train Loss: 0.7132 | Val Loss: 1.1316 | Train Acc: 100.00% | Val Acc: 81.80%
Epoch [24/60] Train Loss: 0.7122 | Val Loss: 1.1234 | Train Acc: 100.00% | Val Acc: 83.00%
Epoch [25/60] Train Loss: 0.7116 | Val Loss: 1.1272 | Train Acc: 100.00% | Val Acc: 81.90%
Epoch [26/60] Train Loss: 0.7095 | Val Loss: 1.1240 | Train Acc: 100.00% | Val Acc: 82.40%
Epoch [27/60] Train Loss: 0.7105 | Val Loss: 1.1233 | Train Acc: 100.00% | Val Acc: 82.50%
Epoch [28/60] Train Loss: 0.7110 | Val Loss: 1.1229 | Train Acc: 100.00% | Val Acc: 82.30%
Epoch [29/60] Train Loss: 0.7073 | Val Loss: 1.1219 | Train Acc: 100.00% | Val Acc: 83.20%
Epoch [30/60] Train Loss: 0.7080 | Val Loss: 1.1227 | Train Acc: 100.00% | Val Acc: 82.30%
Epoch [31/60] Train Loss: 0.7066 | Val Loss: 1.1214 | Train Acc: 100.00% | Val Acc: 82.50%
Epoch [32/60] Train Loss: 0.7067 | Val Loss: 1.1262 | Train Acc: 100.00% | Val Acc: 82.40%
Epoch [33/60] Train Loss: 0.7062 | Val Loss: 1.1200 | Train Acc: 100.00% | Val Acc: 82.90%
Epoch [34/60] Train Loss: 0.7069 | Val Loss: 1.1136 | Train Acc: 100.00% | Val Acc: 82.70%
Epoch [35/60] Train Loss: 0.7061 | Val Loss: 1.1134 | Train Acc: 100.00% | Val Acc: 83.10%
Epoch [36/60] Train Loss: 0.7067 | Val Loss: 1.1050 | Train Acc: 100.00% | Val Acc: 83.10%
Epoch [37/60] Train Loss: 0.7069 | Val Loss: 1.1143 | Train Acc: 100.00% | Val Acc: 83.50%
Epoch [38/60] Train Loss: 0.7088 | Val Loss: 1.1168 | Train Acc: 100.00% | Val Acc: 83.40%
Epoch [39/60] Train Loss: 0.7064 | Val Loss: 1.1266 | Train Acc: 100.00% | Val Acc: 82.90%
Epoch [40/60] Train Loss: 0.7067 | Val Loss: 1.1157 | Train Acc: 100.00% | Val Acc: 82.60%
Epoch [41/60] Train Loss: 0.7080 | Val Loss: 1.1194 | Train Acc: 100.00% | Val Acc: 82.90%
Epoch [42/60] Train Loss: 0.7052 | Val Loss: 1.1122 | Train Acc: 100.00% | Val Acc: 82.60%
Epoch [43/60] Train Loss: 0.7062 | Val Loss: 1.1072 | Train Acc: 100.00% | Val Acc: 84.30%
Epoch [44/60] Train Loss: 0.7045 | Val Loss: 1.1163 | Train Acc: 100.00% | Val Acc: 83.90%
Epoch [45/60] Train Loss: 0.7037 | Val Loss: 1.1094 | Train Acc: 100.00% | Val Acc: 84.70%
Epoch [46/60] Train Loss: 0.7048 | Val Loss: 1.1132 | Train Acc: 100.00% | Val Acc: 83.40%
Epoch [47/60] Train Loss: 0.7049 | Val Loss: 1.1107 | Train Acc: 100.00% | Val Acc: 81.60%
Epoch [48/60] Train Loss: 0.7034 | Val Loss: 1.1145 | Train Acc: 100.00% | Val Acc: 83.40%
Epoch [49/60] Train Loss: 0.7042 | Val Loss: 1.1192 | Train Acc: 100.00% | Val Acc: 82.90%
Epoch [50/60] Train Loss: 0.7040 | Val Loss: 1.1133 | Train Acc: 100.00% | Val Acc: 83.60%
Epoch [51/60] Train Loss: 0.7018 | Val Loss: 1.1110 | Train Acc: 100.00% | Val Acc: 83.70%
Epoch [52/60] Train Loss: 0.7017 | Val Loss: 1.1055 | Train Acc: 100.00% | Val Acc: 82.50%
Epoch [53/60] Train Loss: 0.7018 | Val Loss: 1.1126 | Train Acc: 100.00% | Val Acc: 83.00%
Epoch [54/60] Train Loss: 0.7011 | Val Loss: 1.1032 | Train Acc: 100.00% | Val Acc: 83.30%
Epoch [55/60] Train Loss: 0.7022 | Val Loss: 1.1077 | Train Acc: 100.00% | Val Acc: 82.90%
Epoch [56/60] Train Loss: 0.7022 | Val Loss: 1.1050 | Train Acc: 100.00% | Val Acc: 83.80%
Epoch [57/60] Train Loss: 0.7039 | Val Loss: 1.1075 | Train Acc: 100.00% | Val Acc: 84.10%
Epoch [58/60] Train Loss: 0.7024 | Val Loss: 1.1102 | Train Acc: 100.00% | Val Acc: 83.70%
Epoch [59/60] Train Loss: 0.7022 | Val Loss: 1.1027 | Train Acc: 100.00% | Val Acc: 82.90%
Epoch [60/60] Train Loss: 0.7022 | Val Loss: 1.1059 | Train Acc: 100.00% | Val Acc: 83.60%

Best results for resnet18:
Train Loss: 0.7037 | Val Loss: 1.1094 | Train Acc: 100.00% | Val Acc: 84.70%
Training Time: 241.71 seconds

This confirms that label smoothing can be highly effective when paired with sufficient training time and a well-chosen smoothing strength.
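
One implementation detail worth noting: the "Best results" lines report the strongest validation epoch (epoch 45 in the second run), not the final epoch, which implies checkpointing on validation accuracy. A minimal sketch of that pattern, where train_one_epoch and evaluate are hypothetical helpers rather than this project's actual functions:

import torch

best_val_acc = 0.0
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader, criterion, optimizer)  # hypothetical helper
    val_acc = evaluate(model, val_loader)                       # hypothetical helper
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save(model.state_dict(), "resnet18_best.pt")      # keep only the best epoch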

Interpreting the Training Dynamics

From the training log, several important patterns emerge:

First, training accuracy reaches ~100% as early as epoch 7, yet validation accuracy continues to improve steadily over many subsequent epochs. This is a classic indicator that the model is not merely memorizing labels, but is continuing to refine its decision boundaries under regularization pressure.

Second, despite perfect training accuracy, the training loss remains relatively high (~0.70) instead of collapsing toward zero. This behavior is expected—and desirable—when label smoothing is applied. Because the target distribution is no longer one-hot, the minimum achievable loss is strictly greater than zero. In other words, the model is prevented from becoming overconfident by design.
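
In fact, the plateau value can be predicted exactly. With K = 10 classes and the standard (1 − ε)·one-hot + ε/K·uniform formulation that PyTorch uses, the smoothed target places 1 − ε + ε/K on the true class and ε/K on every other class; cross-entropy is minimized when the predicted distribution equals the target, at which point the loss equals the target's entropy. A quick check in plain Python (no project code assumed):

import math

K, eps = 10, 0.15
p_true = 1 - eps + eps / K            # 0.865 on the correct class
p_other = eps / K                     # 0.015 on each of the 9 other classes

# Minimum achievable cross-entropy = entropy of the smoothed target distribution
floor = -(p_true * math.log(p_true) + (K - 1) * p_other * math.log(p_other))
print(round(floor, 4))                # 0.6924, matching the observed ~0.70 plateau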

Third, validation loss decreases gradually and remains stable across epochs, indicating improved calibration and reduced variance. The absence of sharp validation loss spikes suggests that label smoothing is acting as an effective regularizer rather than destabilizing optimization.

Why Stronger Label Smoothing Helped

Increasing the smoothing factor to ε = 0.15 amplified several beneficial effects:

– It further penalized excessive logit separation, preventing the classifier head from dominating optimization
– It encouraged wider margins and smoother decision boundaries in feature space
– It reduced sensitivity to mislabeled or ambiguous samples, which are common in small or downsampled datasets

With longer training, these effects compound over time. Early epochs establish coarse alignment with the task, while later epochs—under label smoothing—refine class separation without collapsing into brittle, overconfident predictions.
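
The first of these effects is easiest to see in a hand-written version of the smoothed loss (functionally equivalent to nn.CrossEntropyLoss(label_smoothing=eps); this is a generic sketch, not the project's code). Because every wrong class keeps probability mass ε/K at the optimum, the gap between the true-class logit and any other logit is bounded by log(0.865/0.015) ≈ 4.06 instead of growing without limit:

import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, targets, eps=0.15):
    """Cross-entropy against (1 - eps) * one_hot + eps/K * uniform targets."""
    K = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / K)                   # eps/K everywhere
    smooth.scatter_(-1, targets.unsqueeze(-1), 1 - eps + eps / K)  # true class gets the rest
    return -(smooth * log_probs).sum(dim=-1).mean()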

Why This Improvement Is Not Just Noise

Unlike the earlier +0.3% gain, the jump from 82.1% to 84.7% is both larger in magnitude and supported by consistent behavior in the training curves. Validation accuracy repeatedly reaches the 84%+ range in later epochs, peaking at epoch 45, rather than appearing as a single outlier spike.

This strongly suggests that the performance gain is systematic and attributable to the combined effect of:
– Stronger label smoothing
– Extended training duration
– A stable transfer learning configuration (ResNet-18, partial unfreezing, data augmentation), sketched below
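
For context, a common partial-unfreezing setup for ResNet-18 looks like the following sketch. The post does not specify which layers were unfrozen, so freezing everything except layer4 and the new head is an assumption here:

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 10)  # new CIFAR-10 classification head

for param in model.parameters():
    param.requires_grad = False                 # freeze the pretrained backbone
for param in model.layer4.parameters():
    param.requires_grad = True                  # unfreeze the last residual stage (assumption)
for param in model.fc.parameters():
    param.requires_grad = True                  # always train the new head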

Key Takeaway

Label smoothing is not a plug-and-play trick—it interacts deeply with training duration and model capacity. When properly tuned and given sufficient epochs, it can significantly improve generalization performance, even after other regularization techniques are already in place.

In this project, label smoothing emerges as a high-impact regularizer, pushing ResNet-18 validation accuracy to a new best of 84.7%.
