Week 12 - Dataset Refinement and Error Correction in Autonomous Steering Control

December 16, 2025

Systematic analysis and correction of steering timing errors through strategic dataset balancing and augmentation

This week focused on diagnosing and resolving critical errors identified in the autonomous steering control system. Through a methodical approach of dataset analysis, targeted augmentation, and strategic rebalancing, significant improvements were achieved in the model's ability to accurately replicate expert driving behavior, particularly during turning maneuvers.

Figure 1: Expert driving (blue) vs autonomous driving (red). The non-zero initial autonomous steering value (the expert's is zero) and the delayed start and early termination of autonomous steering commands are evident.

1. Error Analysis and Problem Identification:
This week's work addressed the errors visible in Figure 1, in particular two critical issues: (1) the initial autonomous steering value is non-zero, although it should be zero (matching the expert), since driving begins in the right lane moving straight ahead; and (2) steering commands start late and stop early, so autonomous commands last less time than their expert equivalents. In addition, the peak value of the autonomous commands was consistently lower than the expert's. In practice, the vehicle correctly detects when a turn is needed, but the steering command starts late, never reaches the required magnitude, and ends before the turn is complete, ultimately causing the vehicle to leave the road.
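The three symptoms described above (a non-zero initial value, a late start with an early end, and a low peak) can be measured per turn rather than judged by eye from plots like Figure 1. The following is a minimal sketch. It assumes both traces are time-aligned 1-D arrays, sampled at the same rate, and each containing a single turn. `steering_event_metrics` and the activity threshold of 0.05 are hypothetical choices and are not part of the project code.

```python
import numpy as np


def steering_event_metrics(expert, autonomous, threshold=0.05):
    """Compare one turn between an expert and an autonomous steering trace.

    A command counts as "active" while |steering| >= threshold
    (the threshold is an illustrative assumption, not a project value).
    """
    expert = np.asarray(expert, dtype=float)
    autonomous = np.asarray(autonomous, dtype=float)

    def active_span(signal):
        # First and last sample index where the command is active.
        idx = np.flatnonzero(np.abs(signal) >= threshold)
        if idx.size == 0:
            return None
        return idx[0], idx[-1]

    exp_span = active_span(expert)
    aut_span = active_span(autonomous)
    if exp_span is None or aut_span is None:
        raise ValueError("no active steering command found in one of the traces")

    return {
        # Issue (1): should be 0 when both start straight in the right lane.
        "initial_error": float(autonomous[0] - expert[0]),
        # Issue (2): > 0 means the autonomous command starts late ...
        "onset_delay": int(aut_span[0] - exp_span[0]),
        # ... and > 0 here means it ends early (both in samples).
        "offset_advance": int(exp_span[1] - aut_span[1]),
        # < 1 means the autonomous peak undershoots the expert peak.
        "peak_ratio": float(np.max(np.abs(autonomous)) / np.max(np.abs(expert))),
    }


expert = np.zeros(20)
expert[5:15] = 0.4
autonomous = np.zeros(20)
autonomous[7:13] = 0.3
print(steering_event_metrics(expert, autonomous))
```

In the synthetic example, the autonomous command starts 2 samples late, ends 2 samples early, and peaks at 75% of the expert value. Sample counts convert to seconds by dividing by the recording rate.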

2. Dataset Composition Analysis - Week 11 Baseline:
To address these issues, the dataset was analyzed with respect to two key variables: vehicle deviation and steering angle. The composition of the original Week 11 dataset (Figures 2, 3 and 4) revealed significant imbalances:

DATASET GENERAL INFORMATION (Week 11):

Total samples: 112,488 | Number of bins per variable: 7

DEVIATION STATISTICS:

Minimum: -89.989 | Maximum: 89.998 | Mean: 6.875 | Std Dev: 29.544

STEERING STATISTICS:

Minimum: -0.490 | Maximum: 0.779 | Mean: 0.018 | Std Dev: 0.117
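The statistics and 7-bin compositions above can be reproduced with NumPy. This sketch rests on two assumptions, since the report does not state either. First, bins are equal-width intervals between each variable's minimum and maximum. Second, the standard deviation is the population value (`ddof=0`, NumPy's default). `describe_variable` is a hypothetical helper.

```python
import numpy as np


def describe_variable(values, n_bins=7):
    """Summary statistics and equal-width bin counts for one dataset variable."""
    values = np.asarray(values, dtype=float)
    # With an integer `bins`, edges span [min, max]; the last bin includes max.
    counts, edges = np.histogram(values, bins=n_bins)
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        "mean": float(values.mean()),
        "std": float(values.std()),  # population standard deviation (ddof=0)
        "counts": counts,
        "edges": edges,
    }


stats = describe_variable([-0.5, -0.1, 0.0, 0.0, 0.0, 0.1, 0.7])
print(stats["counts"].tolist())  # [1, 0, 4, 1, 0, 0, 1]
```

Applied to the full deviation and steering columns, the `counts` array corresponds to the bar heights of a 7-category composition plot.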

Figure 2: Week 11 dataset composition for vehicle deviation across 7 categories.

Figure 3: Week 11 dataset composition for steering across 7 categories.

The statistical analysis revealed a clear dataset imbalance. During Week 11, automatic balancing was applied to the steering data (Figure 4), but not to the deviation data. This partial approach limited the model's ability to learn proper correction timing.
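The imbalance can be summarized as one number per variable: the ratio between the most and least populated non-empty bins, where 1.0 means perfectly balanced. The sketch below uses a hypothetical helper and synthetic data. It shows how a dataset can be balanced in steering while remaining heavily imbalanced in deviation, which is the situation left by steering-only balancing.

```python
import numpy as np


def imbalance_ratio(values, n_bins=7):
    """Most-populated over least-populated non-empty bin (1.0 = balanced)."""
    counts, _ = np.histogram(np.asarray(values, dtype=float), bins=n_bins)
    nonzero = counts[counts > 0]
    return float(nonzero.max() / nonzero.min())


# Synthetic data: exactly 10 samples at each of 7 steering values ...
steering = np.repeat(np.linspace(-0.6, 0.6, 7), 10)
# ... while 64 of the 70 deviation values sit at zero (straight driving).
deviation = np.concatenate([np.zeros(64), np.linspace(-90, 90, 6)])

print(imbalance_ratio(steering))   # 1.0
print(imbalance_ratio(deviation))  # 64.0
```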

Figure 4: Week 11 steering dataset balancing results.

3. Systematic Workflow for Dataset Improvement:
To comprehensively address the imbalance issues and improve model performance, the following workflow was implemented:

Step 1 - General Sample Increase: Extended normal driving time in CARLA's Town01 and Town02 environments to capture more diverse driving conditions.

Step 2 - Focus on Right and Left Turns: Added specific short driving sequences for right and left turns, starting from positions or headings that require corrective steering.
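Step 2 does not specify how the starting positions were chosen. One option is to perturb a nominal start pose near a turn with a lateral offset and a heading offset, so the expert must steer correctively from the first frame. The sketch below assumes a simple 2-D pose (x and y in metres, yaw in degrees). `StartPose`, `perturbed_starts` and the offset ranges are hypothetical, and converting each pose into a CARLA spawn point is omitted. Which side counts as "left" depends on the simulator's coordinate frame.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class StartPose:
    """Hypothetical 2-D start pose: position in metres, heading in degrees."""
    x: float
    y: float
    yaw: float


def perturbed_starts(base, n, max_lateral=1.5, max_yaw=30.0, seed=0):
    """Return n start poses scattered around `base`.

    Each pose is shifted sideways (perpendicular to the base heading) by up to
    `max_lateral` metres and rotated by up to `max_yaw` degrees. The ranges are
    illustrative assumptions, not values used in the project.
    """
    rng = random.Random(seed)  # seeded so the episode list is reproducible
    heading = math.radians(base.yaw)
    # Unit vector perpendicular to the heading direction (cos h, sin h).
    perp_x, perp_y = -math.sin(heading), math.cos(heading)
    poses = []
    for _ in range(n):
        lateral = rng.uniform(-max_lateral, max_lateral)
        yaw_offset = rng.uniform(-max_yaw, max_yaw)
        poses.append(StartPose(
            x=base.x + lateral * perp_x,
            y=base.y + lateral * perp_y,
            yaw=base.yaw + yaw_offset,
        ))
    return poses


poses = perturbed_starts(StartPose(0.0, 0.0, 0.0), 5)
```

Calling `perturbed_starts` with a pose a few metres before each turn yields one short corrective episode per returned pose.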

The results of Steps 1 and 2 are summarized in Figures 5 and 6.

DATASET GENERAL INFORMATION (Enhanced):

Total samples: 197,372 | Number of bins per variable: 7

DEVIATION STATISTICS:

Minimum: -89.999 | Maximum: 89.998 | Mean: 5.537 | Std Dev: 32.981

STEERING STATISTICS:

Minimum: -0.689 | Maximum: 0.779 | Mean: 0.018 | Std Dev: 0.138
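The change between the Week 11 and enhanced statistics can be quantified with simple relative-change arithmetic. The values below are copied from the two statistics blocks, and the helper name is illustrative.

```python
week11 = {"samples": 112_488, "deviation_std": 29.544, "steering_std": 0.117}
enhanced = {"samples": 197_372, "deviation_std": 32.981, "steering_std": 0.138}


def relative_change(old, new):
    """Percentage change from `old` to `new`."""
    return 100.0 * (new - old) / old


for key in week11:
    print(f"{key}: {relative_change(week11[key], enhanced[key]):+.1f}%")
# samples: +75.5%
# deviation_std: +11.6%
# steering_std: +17.9%
```

The 75.5% increase in samples comes with a wider spread in both variables. This is consistent with the added turn sequences contributing more samples away from straight-ahead driving.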

Figure 5: Enhanced vehicle deviation dataset composition resulting from Steps 1 and 2.

Figure 6: Enhanced steering dataset composition resulting from Steps 1 and 2.

Step 3 - Category-Based Balancing: Overrepresented categories were undersampled and underrepresented ones oversampled; the resulting composition is shown in Figure 7. Note: preliminary analyses suggest that SMOTE balancing may yield better results, but time constraints prevented its implementation this week.
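The report does not detail how categories were formed or what per-category target was used. This is a minimal sketch with three assumptions. First, categories are the cells of a 7 by 7 grid over equal-width deviation and steering bins. Second, each non-empty cell is resampled to a common target, by default the median cell count. Third, overrepresented cells are undersampled without replacement and underrepresented ones oversampled with replacement. `balance_by_category` is hypothetical.

```python
import numpy as np


def balance_by_category(deviation, steering, n_bins=7, target=None, seed=0):
    """Return sample indices that equalize counts across (deviation, steering) cells."""
    rng = np.random.default_rng(seed)
    deviation = np.asarray(deviation, dtype=float)
    steering = np.asarray(steering, dtype=float)

    # Interior equal-width edges only, so np.digitize yields bins 0..n_bins-1.
    dev_edges = np.linspace(deviation.min(), deviation.max(), n_bins + 1)[1:-1]
    ste_edges = np.linspace(steering.min(), steering.max(), n_bins + 1)[1:-1]
    cell = np.digitize(deviation, dev_edges) * n_bins + np.digitize(steering, ste_edges)

    cells, counts = np.unique(cell, return_counts=True)
    if target is None:
        target = int(np.median(counts))  # illustrative choice of per-cell target
    selected = []
    for c, count in zip(cells, counts):
        members = np.flatnonzero(cell == c)
        # Undersample large cells without replacement; duplicate samples in small ones.
        selected.append(rng.choice(members, size=target, replace=bool(count < target)))
    return np.concatenate(selected)


# Synthetic data: one crowded cell (10 samples) and one sparse cell (2 samples).
dev = np.array([-90.0] * 10 + [90.0] * 2)
ste = np.array([-0.5] * 10 + [0.5] * 2)
idx = balance_by_category(dev, ste, target=5)
```

The returned indices would be applied to the images and both label arrays together. Oversampling here only duplicates existing samples. SMOTE-style methods instead synthesize new samples by interpolating between neighbours.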

Figure 7: Final balanced steering dataset composition after Step 3.

Step 4 - Training with Balanced Dataset: The model was trained on the balanced dataset with the following characteristics; the training and validation loss curves are shown in Figure 8:

TRAINING INFORMATION:

• Duration: 3h 42m 41.9s (13,361.9 seconds)

• Architecture: MobileNetV2 + Custom Regression

• Total parameters: 2,880,257 | Trainable parameters: 656,385

• Batch size: 32 | Learning rate: 0.001

• Epochs completed: 63 of 200 planned | Early stopping: Yes (patience: 20 epochs)
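A small simulation shows how the 200-epoch budget interacts with the 20-epoch patience. This sketch is similar in spirit to the patience logic of Keras's `EarlyStopping` callback. The report does not state the framework, the monitored metric, or any `min_delta`, so `early_stopping_epoch` and its defaults are assumptions.

```python
def early_stopping_epoch(val_losses, patience=20, min_delta=0.0):
    """Simulate patience-based early stopping on per-epoch validation losses.

    Returns (epochs_run, best_epoch), both 1-based. Training stops once
    `patience` consecutive epochs fail to improve the best loss by more
    than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # budget exhausted without stopping


# Hypothetical loss curve: improves for 3 epochs, then plateaus.
losses = [1.0, 0.9, 0.8] + [0.85] * 30
print(early_stopping_epoch(losses))  # (23, 3)
```

With a non-zero `min_delta`, small improvements do not reset the counter. The epoch with the lowest recorded loss can therefore differ from the epoch that drives the stopping decision.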

TRAINING METRICS:

• Final training loss: 0.013404 (75.4% improvement from initial 0.054513)

• Minimum training loss: 0.013287 (epoch 54)

• Final validation loss: 0.007106 | Minimum validation loss: 0.007073 (epoch 60)

PREDICTION METRICS:

• MAE: 0.061401 | MSE: 0.006675 | RMSE: 0.081699 | R²: 0.898359
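The prediction metrics follow the standard definitions, which are the same ones used by scikit-learn's `mean_absolute_error`, `mean_squared_error` and `r2_score`. The reported values are also internally consistent. The square root of 0.006675 is about 0.0817, which matches the reported RMSE. Likewise, (0.054513 - 0.013404) / 0.054513 ≈ 0.754, which matches the stated 75.4% improvement in training loss. Below is a NumPy sketch; the helper name is illustrative.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R² of steering predictions against expert labels."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))                       # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "R2": 1.0 - ss_res / ss_tot,
    }


print(regression_metrics([0, 1, 2, 3], [0, 1, 2, 4]))
# {'MAE': 0.25, 'MSE': 0.25, 'RMSE': 0.5, 'R2': 0.8}
```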

Figure 8: Training and validation loss curves during model training.

Step 5 - Results and Comparative Analysis: The impact of the enhanced dataset and balancing strategy is clearly visible when comparing autonomous driving performance before and after the changes (Figures 9 and 10).

Figure 9: Expert driving vs autonomous driving with Week 11 steering-only balanced dataset.

Figure 10: Expert driving vs autonomous driving with enhanced deviation and steering balanced dataset (current week) + critical sample augmentation.
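Figures 9 and 10 are compared visually. The lag that maximizes the cross-correlation between the expert and autonomous steering traces would give a single number for the timing error. This is a sketch, assuming both traces are recorded on the same route and sampled at the same rate. `estimate_lag` is a hypothetical helper, and its result in samples converts to seconds by dividing by the sampling rate.

```python
import numpy as np


def estimate_lag(expert, autonomous):
    """Lag in samples that best aligns the autonomous trace with the expert trace.

    Positive values mean the autonomous steering lags behind the expert.
    """
    e = np.asarray(expert, dtype=float)
    a = np.asarray(autonomous, dtype=float)
    e = e - e.mean()  # remove offsets so the correlation reflects shape, not level
    a = a - a.mean()
    # Full cross-correlation: output index i corresponds to lag i - (len(e) - 1).
    corr = np.correlate(a, e, mode="full")
    lags = np.arange(-(len(e) - 1), len(a))
    return int(lags[np.argmax(corr)])


expert = np.zeros(50)
expert[10:20] = 0.4
autonomous = np.zeros(50)
autonomous[13:23] = 0.4  # same command, 3 samples late
print(estimate_lag(expert, autonomous))  # 3
```

Computing this lag for both the Week 11 run and the current run would turn the visual comparison into a measurable before/after value.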

The dataset refinement enabled the autonomous vehicle to correctly execute turns and return to its lane (Figure 11). This indicates that, with more representative and balanced samples, the trained model infers steering commands considerably more accurately.

Figure 11: Autonomous driving demonstration showing improved turning and lane-keeping performance.

Conclusion: This week's work demonstrates the critical importance of dataset quality and balance in imitation learning for autonomous driving. By systematically addressing the timing and magnitude errors through targeted data augmentation and strategic balancing, significant improvements were achieved in the model's ability to replicate expert steering behavior. The results indicate that further improvements can be expected with continued dataset refinement and potentially more advanced balancing techniques like SMOTE in future iterations.