Week 18 – Dataset Rebalancing and Training with Refined Data Strategy
January 27, 2026
Redesigning the CARLA dataset using a three‑method balancing approach and incorporating expert guidelines for smoother and more structured driving data.
Week 18 focused on redesigning the CARLA dataset by applying three data balancing methods (this work is still in progress):
- Downsampling of over‑represented classes.
- Augmentation of under‑represented classes.
- Weighting for under‑represented classes.
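The three methods listed above can be sketched as follows. This is a minimal illustration, not the code used in the project: the function names, the per‑class cap, the choice of brightness jitter as the augmentation, and the inverse‑frequency weighting formula are all assumptions.

```python
import numpy as np


def downsample(indices_by_class, cap, rng):
    """Randomly keep at most `cap` sample indices per class (cap is hypothetical)."""
    return {
        name: rng.choice(idx, size=min(cap, len(idx)), replace=False)
        for name, idx in indices_by_class.items()
    }


def brightness_augment(image, rng, low=0.7, high=1.3):
    """Create an extra sample by scaling brightness; the steering label is unchanged.

    Brightness jitter is an assumed augmentation, chosen because it does not
    alter the lane geometry seen by the model.
    """
    factor = rng.uniform(low, high)
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def inverse_frequency_weights(labels):
    """Per-sample loss weights proportional to 1 / class count, with mean 1."""
    classes, counts = np.unique(labels, return_counts=True)
    per_class = len(labels) / (len(classes) * counts)
    lookup = dict(zip(classes, per_class))
    return np.array([lookup[label] for label in labels])
```

Downsampling and augmentation change which images the model sees, while weighting leaves the dataset untouched and only rescales each sample's contribution to the loss, so the three can be combined or compared independently.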
The dataset reconstruction also followed the recommendations from Carlos Velasquez:
- Smooth Driving: Steering (W) was limited to the range between -0.4 (150° left) and 0.4 (150° right).
- Target Distribution:
  - 60% straight driving samples
  - 30% smooth turns
  - 10% sharp turns
- Avoid flattening the dataset.
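One way to apply these recommendations is to clip the steering value, assign each sample to a class, and then draw samples to match the 60/30/10 target. The sketch below assumes this approach; only the ±0.4 limit and the target shares come from the text, while the class thresholds `STRAIGHT_MAX` and `SMOOTH_MAX` and the function names are hypothetical.

```python
import numpy as np

STEER_LIMIT = 0.4     # steering range from the recommendations
STRAIGHT_MAX = 0.05   # assumed threshold: |W| below this counts as straight
SMOOTH_MAX = 0.2      # assumed threshold: |W| below this counts as a smooth turn
CLASS_NAMES = np.array(["straight", "smooth", "sharp"])
TARGET = {"straight": 0.6, "smooth": 0.3, "sharp": 0.1}


def classify(steering):
    """Clip steering to [-0.4, 0.4] and label each sample by |W|."""
    magnitude = np.abs(np.clip(steering, -STEER_LIMIT, STEER_LIMIT))
    return CLASS_NAMES[np.digitize(magnitude, [STRAIGHT_MAX, SMOOTH_MAX])]


def sample_to_target(labels, total, rng):
    """Pick up to `total` sample indices following the TARGET shares."""
    chosen = []
    for name, share in TARGET.items():
        idx = np.flatnonzero(labels == name)
        n = min(int(round(share * total)), len(idx))
        chosen.append(rng.choice(idx, size=n, replace=False))
    return np.concatenate(chosen)
```

Keeping straight driving as the majority class, rather than equalizing all three, is what preserves the non‑flat distribution the recommendations ask for.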
The initial balanced dataset composition based on these recommendations is shown in Figure 1.
Figure 1: Initial balanced dataset composition based on the recommended structure.
Training Setup:
The dataset from Figure 1 was split as follows for training:
- Training: 56,530 images (80%)
- Validation: 14,132 images (20%)
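The split sizes can be reproduced from the total of 70,662 images with a simple shuffled 80/20 split. The sketch below assumes a random per‑image split with a fixed seed; the source does not state how the split was actually drawn, and `split_indices` is a hypothetical helper.

```python
import numpy as np


def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(n_samples * val_fraction)
    return order[n_val:], order[:n_val]


train_idx, val_idx = split_indices(56_530 + 14_132)
print(len(train_idx), len(val_idx))  # 56530 14132
```

Because consecutive driving frames are highly correlated, a split by recording session rather than by image may give a more honest validation score; that variant is not shown here.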
Training with the dataset balanced using the three suggested methods (Figure 2) is still ongoing and is progressing more slowly than the run on the Figure 1 dataset.
Figure 2: Dataset composition after applying the three balancing methods.
Training Results with the Dataset from Figure 1
Overall, offline and online results are similar to those previously obtained, with the following characteristics:
MobileNet Results
- The ego vehicle drives stably in lanes within dense environments (large buildings, houses).
- It becomes "confused" by the sidewalks along the road and tends to drive onto them, which leads to frequent road exits.
- Right turns are initiated at the correct time, but the vehicle fails to stay in the right lane after completing the turn.
PilotNet Results
- The vehicle does not stay in the right lane but oscillates between the right and left lanes.
- It is less confused than MobileNet, so it does not leave the road and can drive for longer periods.
- It takes turns too quickly and does not complete them properly, which causes lane departures.
Behavior Metrics
A report was submitted regarding broken links on the BehaviorMetrics website. The tool was launched for the first time, and further exploration of its pipeline and results is planned for the following week.
Key Progress This Week:
- Dataset redesigned following a three‑method balancing strategy and expert guidelines.
- Initial balanced dataset created and used for training MobileNet and PilotNet.
- Training with the three‑method balanced dataset is still in progress.
- BehaviorMetrics tool launched; initial pipeline exploration planned for next week.
- Both models show specific failure modes: MobileNet drives onto sidewalks and leaves the road, while PilotNet oscillates between lanes and fails to complete turns.
Next Steps:
Complete the training with the three‑method balanced dataset and compare results with the initial balanced version. Implement BehaviorMetrics pipelines for quantitative evaluation. Analyze model weaknesses to design targeted data augmentation or architectural adjustments.
Conclusion:
Week 18 involved a significant restructuring of the dataset to improve its balance and to follow the smooth‑driving guidelines. Initial training results reveal persistent problems with lane‑keeping and turn completion. The upcoming week will focus on completing the training on the three‑method balanced dataset, using BehaviorMetrics for structured evaluation, and iterating on the data strategy to address the observed failures.