Week 18 – Dataset Rebalancing and Training with Refined Data Strategy

January 27, 2026

Redesigning the Carla dataset using a three‑method balancing approach and incorporating expert guidelines for smoother and more structured driving data.

Week 18 focused on redesigning the Carla dataset so that three data balancing methods can be applied to it (this work is still in progress):

  1. Downsampling of over‑represented classes.
  2. Augmentation of under‑represented classes.
  3. Weighting for under‑represented classes.
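The three methods above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code used in the project: the function names, the `(image, steering)` sample layout, the 0.05 threshold in `label_of`, and the choice of horizontal mirroring as the augmentation are all assumptions. Mirroring in particular may not suit every route, since it also swaps the side of the road the car drives on.

```python
import random
from collections import Counter


def downsample(samples, label_of, cap, seed=0):
    """Method 1: keep at most `cap` randomly chosen samples per class."""
    rng = random.Random(seed)
    by_class = {}
    for sample in samples:
        by_class.setdefault(label_of(sample), []).append(sample)
    kept = []
    for items in by_class.values():
        if len(items) > cap:
            items = rng.sample(items, cap)
        kept.extend(items)
    return kept


def mirror(sample):
    """Method 2 (assumed augmentation): flip the image left-right and negate steering."""
    image, steering = sample
    flipped = [row[::-1] for row in image]
    return flipped, -steering


def class_weights(labels):
    """Method 3: inverse-frequency weights; the mean weight over all samples is 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def label_of(sample):
    # Hypothetical two-class labelling, used only for this example.
    return "straight" if abs(sample[1]) < 0.05 else "turn"


# Toy data: 8 straight samples and 2 turning samples, each with a 1x2 "image".
data = [([[0, 1]], 0.0)] * 8 + [([[0, 1]], 0.3)] * 2
balanced = downsample(data, label_of, cap=4)
weights = class_weights([label_of(s) for s in data])
```

With the toy data, `downsample` trims the straight class to 4 samples and leaves the 2 turning samples alone, while `class_weights` gives the rarer turning class a larger loss weight than the straight class.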

The dataset reconstruction also followed the recommendations from Carlos Velasquez:

  1. Smooth Driving: Steering (W) was limited to the range between -0.4 (150° left) and 0.4 (150° right).
  2. Target Distribution:
    • 60% straight driving samples
    • 30% smooth turns
    • 10% sharp turns
  3. Avoid flattening the dataset.
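As a rough sketch of how these recommendations could be applied, the snippet below clips steering to the recommended range and computes per-class sample counts that match the 60/30/10 target. The steering limit and the target shares come from the list above; the bin edges `STRAIGHT_MAX` and `SMOOTH_MAX`, the function names, and the example counts are hypothetical and would need to be tuned for the real data.

```python
STEER_LIMIT = 0.4  # recommended steering range is [-0.4, 0.4]
TARGET_PERCENT = {"straight": 60, "smooth": 30, "sharp": 10}  # recommended distribution

# Hypothetical bin edges on |steering|; not taken from the recommendations.
STRAIGHT_MAX = 0.05
SMOOTH_MAX = 0.20


def clip_steering(value, limit=STEER_LIMIT):
    """Clamp a steering command to [-limit, limit]."""
    return max(-limit, min(limit, value))


def steering_class(value):
    """Assign a sample to straight / smooth / sharp by steering magnitude."""
    magnitude = abs(clip_steering(value))
    if magnitude <= STRAIGHT_MAX:
        return "straight"
    if magnitude <= SMOOTH_MAX:
        return "smooth"
    return "sharp"


def target_counts(available, target=TARGET_PERCENT):
    """Largest per-class counts that follow `target` without exceeding `available`."""
    # The scarcest class (relative to its share) bounds the total dataset size.
    total = min(available[c] * 100 // pct for c, pct in target.items())
    return {c: total * pct // 100 for c, pct in target.items()}


# Made-up class counts, for illustration only.
available = {"straight": 9000, "smooth": 1500, "sharp": 800}
counts = target_counts(available)
```

Here the smooth-turn class is the limiting one, so `counts` keeps all 1500 smooth turns and downsamples the other two classes to 3000 and 500. Because the target keeps straight driving dominant instead of equalising the classes, this is consistent with the advice not to flatten the dataset.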

The initial balanced dataset composition based on these recommendations is shown in Figure 1.

Figure 1: Initial balanced dataset composition based on the recommended structure.

Training Setup:
The dataset from Figure 1 was split as follows for training:

Training with the dataset balanced using the three suggested methods (Figure 2) is still ongoing and has been slower than training with the Figure 1 dataset.

Figure 2: Dataset composition after applying the three balancing methods.

Training Results with the Dataset from Figure 1

Overall, offline and online results are similar to those previously obtained, with the following characteristics:

MobileNet Results

PilotNet Results

BehaviorMetrics

A report was submitted regarding broken links on the BehaviorMetrics website. The tool was launched for the first time, and further exploration of its pipeline and results is planned for the following week.

KEY PROGRESS THIS WEEK:

• Dataset redesigned following a three‑method balancing strategy and expert guidelines.

• Initial balanced dataset created and used for training MobileNet and PilotNet.

• Training with the three‑method balanced dataset is still in progress.

• BehaviorMetrics tool launched; initial pipeline exploration planned for next week.

• Each model shows a specific failure mode: MobileNet leaves the road and drives onto sidewalks, while PilotNet oscillates and fails to complete turns.

Next Steps:
Complete the training with the three‑method balanced dataset and compare results with the initial balanced version. Implement BehaviorMetrics pipelines for quantitative evaluation. Analyze model weaknesses to design targeted data augmentation or architectural adjustments.

Conclusion:
Week 18 involved a significant restructuring of the dataset to improve balance and adhere to smoother driving guidelines. Initial training results reveal persistent challenges in lane‑keeping and turn completion. The upcoming week will focus on completing the balanced training, deploying BehaviorMetrics for structured evaluation, and iterating on the data strategy to address observed failures.