Autonomous Driving Model Training Report

Report Date: March 27, 2026 | Week 26: Dataset Optimization & Generalized Navigation

1. Enhanced Dataset Construction Methodology

Building upon the analysis performed in Week 25, this week's work focused on implementing controlled dataset generation techniques to increase the representation of extreme steering values and improve model generalization. The modifications applied to the original data collection script (based on Jorge Rodríguez's framework) introduced two critical enhancements, controlled Dagger triggering and offline tangent transformation of steering values, both detailed in Section 4.

Following this methodology, four distinct datasets were constructed to evaluate the impact of town diversity and class balancing:

  • Town01 Only (Baseline): 13,500 samples
  • Town01 Balanced (Weighted): 15,000 samples
  • Towns 01, 02, 04, 12 (Raw): 37,000 samples
  • Towns 01, 02, 04, 12 (Balanced): 41,000 samples

Figure 1: Distribution of steering values by category for Town01 only (13,500 images). The baseline dataset is concentrated around straight-driving values.
Figure 2: Balanced (weighted) distribution for Town01 only (15,000 images). Weighting provides greater representation across all steering categories.
Figure 3: Raw distribution for the combined dataset (Towns 01, 02, 04, 12) with 37,000 samples. Greater town diversity increases overall variability.
Figure 4: Balanced (weighted) distribution for the combined dataset (Towns 01, 02, 04, 12) with 41,000 images. Uniform representation across steering categories is achieved.

2. Experimental Results & Performance Evaluation

The models trained on the four datasets were evaluated on previously unseen environments to assess generalization capabilities. The most significant outcomes are summarized below:

Key Achievements

Demo: Full Circuit Completion in Town07

Video 1: Successful generalization. The vehicle completes the entire rural circuit in Town07 (unseen during training), demonstrating robust navigation across varying road geometries.

Demo: Urban Navigation with 90° Turns (Town01)

Video 2: Urban navigation in Town01 showing successful execution of 90° left and right turns, with stable lane keeping throughout the route.
Performance Observations: In both successful scenarios (Town07 rural circuit and Town01 urban navigation), the ego-vehicle maintained overall stability. However, a consistent behavior was noted: the vehicle tends to deviate from the right lane and occasionally invades the left lane, though without leaving the drivable road surface. This indicates that while lateral control is functional, precise lane discipline remains an area for improvement.

3. Comparative Analysis: Dataset Strategies

The systematic comparison across the four datasets yielded critical insights into the effectiveness of balancing and multi-town training:

Dataset Configuration           | Sample Count | Town07 Generalization       | 90° Turn Success        | Stability Rating
--------------------------------+--------------+-----------------------------+-------------------------+----------------------------
Town01 Only (Raw)               | 13,500       | Poor (early failures)       | Partial (oscillations)  | Low
Town01 Balanced (Weighted)      | 15,000       | Moderate (partial circuits) | Moderate (inconsistent) | Medium
Towns 01, 02, 04, 12 (Raw)      | 37,000       | Good (most segments)        | Good (reliable turns)   | Medium-High
Towns 01, 02, 04, 12 (Balanced) | 41,000       | Full circuit completion     | Consistent success      | High (minimal oscillations)

Key findings: The combination of multi-town data diversity and category-based balancing (weighting extreme steering values) proved essential for achieving generalization to unseen environments. The balanced multi-town dataset delivered superior results despite using only about one-quarter of the sample count required in previous weeks, indicating that data quality and distribution matter more than raw quantity.

4. Technical Analysis: Tangent Preprocessing & Balancing Effects

Two methodological innovations were central to this week's improvements, complemented by the category-based weighting strategy used for balancing:

Controlled Dagger Triggering

The ability to manually trigger Dagger events via a steering wheel button enabled targeted collection of extreme steering samples. This approach proved more effective than random Dagger activation, as it allowed the operator to focus data collection on challenging scenarios (sharp curves, recovery maneuvers) that are underrepresented in normal driving logs.
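The modified collection script itself is not reproduced in this report. The idea can be sketched as follows, assuming an expert-intervention scheme: while the steering-wheel button is held, the operator's steering is applied and logged as the label; otherwise the learned policy drives and nothing is recorded (similar in spirit to human-gated Dagger variants). `ControlledDaggerRecorder` and its `step` signature are hypothetical, and no CARLA API is used.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class ControlledDaggerRecorder:
    """Hypothetical sketch of operator-triggered Dagger data collection.

    Assumption: while the steering-wheel button is held, the operator's
    steering is applied to the vehicle and logged as the training label;
    otherwise the learned policy drives and nothing is recorded.
    """

    samples: List[Tuple[Any, float]] = field(default_factory=list)

    def step(self, observation: Any, model_steer: float,
             expert_steer: float, trigger_pressed: bool) -> float:
        """Return the steering command to apply on this simulator tick."""
        if trigger_pressed:
            # Operator intervention: aggregate (observation, expert label).
            self.samples.append((observation, expert_steer))
            return expert_steer
        # Normal tick: the learned policy keeps control; no sample is logged.
        return model_steer


# Usage: three ticks, the operator intervenes on the last two (sharp curve).
recorder = ControlledDaggerRecorder()
ticks = [("frame0", 0.02, 0.00, False),
         ("frame1", 0.10, 0.85, True),
         ("frame2", 0.05, -0.90, True)]
applied = [recorder.step(*t) for t in ticks]
```

Because only triggered ticks are stored, the aggregated data is skewed toward the difficult situations the operator chose to correct, which is the targeting effect described above.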

Offline Tangent Transformation

The application of the tangent function to steering values serves to:

However, the complete effect of tangent preprocessing remains under investigation. Additional controlled experiments are required to isolate its contribution from the benefits of dataset balancing and multi-town diversity.
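One mathematical property that may be relevant: on (-π/2, π/2) the tangent is odd and strictly increasing, and its slope sec²(x) ≥ 1 grows with |x|. It therefore stretches the spacing between extreme steering values much more than between near-zero ones. Below is a minimal forward/inverse pair, assuming steering is normalized to [-1, 1] and used directly as the tangent argument (the project's actual scaling is not stated); the function names are hypothetical.

```python
import math

# Assumption: raw steering is normalized to [-1, 1] (CARLA's steer range) and
# is used directly as the tangent argument, with no extra scaling factor.
STEER_LIMIT = 1.0


def tangent_transform(steer: float) -> float:
    """Map a raw steering value to a tangent-space training target."""
    if not -STEER_LIMIT <= steer <= STEER_LIMIT:
        raise ValueError(f"steer={steer} is outside [-{STEER_LIMIT}, {STEER_LIMIT}]")
    return math.tan(steer)


def inverse_tangent_transform(target: float) -> float:
    """Convert a tangent-space model output back into a steering command."""
    # atan inverts tan on (-pi/2, pi/2); clamping keeps an out-of-range
    # prediction inside the valid steering interval.
    return max(-STEER_LIMIT, min(STEER_LIMIT, math.atan(target)))


# Usage: equal raw steps of 0.1 map to a small gap near zero and a larger
# gap near full lock.
small_gap = tangent_transform(0.1) - tangent_transform(0.0)
large_gap = tangent_transform(0.9) - tangent_transform(0.8)
```

An ablation could train one model on raw targets and one on `tangent_transform` targets (inverting predictions with `inverse_tangent_transform`) while keeping the balanced multi-town data fixed.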

Category-Based Weighting Strategy

Balancing was achieved by assigning higher sampling weights to under-represented steering categories. This forced the model to learn appropriate responses across the entire steering range, preventing overfitting to straight-line driving scenarios that dominate natural driving logs.
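The exact categories and weights are not given here. One common way to realise "higher weights for under-represented categories" is inverse-frequency weighting, sketched below with hypothetical bin edges. Each category then carries the same total sampling weight, and drawing with replacement gives a resampled set with a roughly uniform category mix. This is an illustrative choice, not necessarily the weighting used for the balanced datasets.

```python
import bisect
import random
from collections import Counter
from typing import List, Sequence

# Hypothetical category boundaries over steering in [-1, 1]; the report does
# not specify the categories actually used. Five categories result.
BIN_EDGES = [-0.5, -0.15, 0.15, 0.5]


def steering_category(steer: float) -> int:
    """Index (0..len(BIN_EDGES)) of the steering category for a value."""
    return bisect.bisect_right(BIN_EDGES, steer)


def inverse_frequency_weights(steers: Sequence[float]) -> List[float]:
    """Per-sample weights giving every present category equal total weight."""
    cats = [steering_category(s) for s in steers]
    counts = Counter(cats)
    return [1.0 / counts[c] for c in cats]


def balanced_resample(steers: Sequence[float], k: int, seed: int = 0) -> List[float]:
    """Draw k samples with replacement using inverse-frequency weights."""
    rng = random.Random(seed)
    return rng.choices(list(steers), weights=inverse_frequency_weights(steers), k=k)


# Usage: a log dominated by straight driving (90%) with few turning samples.
steers = [0.0] * 90 + [0.3] * 5 + [-0.7] * 5
resampled = balanced_resample(steers, k=3000, seed=42)
```

With a 90/5/5 split between straight, moderate-turn, and sharp-turn samples, each category ends up with roughly one third of the resampled set.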

5. Conclusions & Future Work

Summary of Achievements

  • Successful generalization to Town07: The model trained on balanced multi-town data completed the full rural circuit, representing a significant advancement in robustness.
  • Reliable 90° turn execution: Urban intersections in Town01 are now navigated consistently without leaving the drivable path.
  • Improved stability: Oscillations have been substantially reduced, yielding smoother trajectory tracking.
  • Efficient dataset utilization: Superior results achieved with only 41,000 samples, approximately one-quarter of the dataset sizes used previously.

Remaining Challenges

  • Lane discipline: The ego-vehicle consistently tends to drift out of the right lane and occasionally invades the left lane, though without exiting the road. Eliminating lane departures while maintaining stability is the primary objective for the next development cycle.
  • Tangent preprocessing validation: Additional ablation studies are required to conclusively determine the contribution of tangent transformation independent of balancing effects.

Proposed Work for Week 27

  1. Lane discipline optimization: Implement targeted data collection focusing on lane-keeping behavior, potentially incorporating explicit lane boundary information as an auxiliary training signal.
  2. Ablation study on tangent preprocessing: Conduct controlled experiments with and without tangent transformation to quantify its specific impact on performance.
  3. Extended validation: Evaluate the model on additional unseen towns (Town05, Town10) to further assess generalization boundaries.
  4. BehaviorMetrics integration: Implement quantitative evaluation metrics (lateral deviation, steering smoothness, turn success rate) to enable objective performance tracking.
  5. Fine-tuning on extreme maneuvers: Augment the dataset with additional recovery and sharp-turn scenarios to address remaining corner cases.
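BehaviorMetrics provides its own metric implementations, which are not reproduced here. As a tool-agnostic sketch, the proposed quantities could be computed from per-tick logs as follows. Assumptions: lateral offsets are signed distances in metres from the right-lane centreline (positive toward the left lane), the lane is a hypothetical 3.5 m wide, and "invasion" means the vehicle centre crosses the lane marking. All function names are hypothetical.

```python
from typing import Sequence

# Assumption: 3.5 m lane width; the vehicle centre is counted as invading the
# left lane once it is more than half a lane width left of the centreline.
LANE_WIDTH_M = 3.5


def _require_nonempty(values: Sequence, name: str) -> None:
    if len(values) == 0:
        raise ValueError(f"{name} must not be empty")


def mean_abs_lateral_deviation(offsets_m: Sequence[float]) -> float:
    """Mean absolute offset (m) from the right-lane centreline."""
    _require_nonempty(offsets_m, "offsets_m")
    return sum(abs(d) for d in offsets_m) / len(offsets_m)


def left_lane_invasion_ratio(offsets_m: Sequence[float],
                             lane_width_m: float = LANE_WIDTH_M) -> float:
    """Fraction of ticks with the vehicle centre across the left lane marking.

    Sign convention (assumed): positive offsets point toward the left lane.
    """
    _require_nonempty(offsets_m, "offsets_m")
    return sum(d > lane_width_m / 2 for d in offsets_m) / len(offsets_m)


def steering_smoothness(steers: Sequence[float]) -> float:
    """Mean absolute steering change between consecutive ticks (lower = smoother)."""
    if len(steers) < 2:
        raise ValueError("need at least two steering samples")
    return sum(abs(b - a) for a, b in zip(steers, steers[1:])) / (len(steers) - 1)


def turn_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of attempted turns completed without leaving the drivable area."""
    _require_nonempty(outcomes, "outcomes")
    return sum(outcomes) / len(outcomes)
```

A left-lane invasion ratio of this kind would turn the lane-discipline issue noted in Section 2 into a number that can be tracked from week to week.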
Long-term objectives:
  • Complete elimination of lane departures while maintaining the current level of stability and generalization.
  • Integration of velocity conditioning to improve speed adaptation across diverse road types.
  • Expansion of validation to include adverse weather conditions and nighttime scenarios.

Appendix: Dataset Specifications & Methodology Reference

The methodology implemented this week draws from established practices in end-to-end autonomous driving, with particular reference to the work of Jorge Rodríguez González. Key principles adopted include:

All datasets were generated using CARLA simulator version 0.9.14, with images captured at 1920x1080 resolution and downsampled to 640x480 for training. The model architecture remains consistent with previous weeks, employing a convolutional neural network with spatial softmax for steering prediction.
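The spatial softmax layer is not described further here. As commonly defined in end-to-end visuomotor learning, it applies a softmax over all spatial positions of each convolutional feature channel and returns that channel's expected (x, y) image coordinate, a compact keypoint-like representation that a small regressor can map to steering. Below is a dependency-free sketch of the forward computation only (no learning). The function name and the optional `temperature` argument are illustrative, not taken from the project.

```python
import math
from typing import List, Sequence, Tuple


def _grid(n: int) -> List[float]:
    """n evenly spaced coordinates spanning [-1, 1] (a single cell maps to 0)."""
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]


def spatial_softmax(feature_maps: Sequence[Sequence[Sequence[float]]],
                    temperature: float = 1.0) -> List[Tuple[float, float]]:
    """Expected (x, y) per channel; x follows columns, y follows rows (top = -1)."""
    keypoints = []
    for fmap in feature_maps:  # one channel: H rows of W activations
        h, w = len(fmap), len(fmap[0])
        # Numerically stable softmax over all H*W locations of this channel.
        peak = max(max(row) for row in fmap)
        exps = [[math.exp((v - peak) / temperature) for v in row] for row in fmap]
        total = sum(sum(row) for row in exps)
        xs, ys = _grid(w), _grid(h)
        # Probability-weighted mean of the pixel coordinates.
        ex = sum(exps[i][j] * xs[j] for i in range(h) for j in range(w)) / total
        ey = sum(exps[i][j] * ys[i] for i in range(h) for j in range(w)) / total
        keypoints.append((ex, ey))
    return keypoints


# Usage: a uniform channel yields the image centre; a channel with one strong
# activation in the top-right cell yields roughly (1, -1).
uniform = [[0.0] * 3 for _ in range(3)]
peaked = [[0.0, 0.0, 50.0], [0.0] * 3, [0.0] * 3]
points = spatial_softmax([uniform, peaked])
```

In a trainable network the same computation would be written as a differentiable layer in the training framework; plain Python is used here only to make the arithmetic explicit.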

Dataset availability: The balanced multi-town dataset (41,000 samples) is available for further experimentation. Additional details on preprocessing parameters and training configurations can be found in the project repository.

Overall Assessment

Week 26 represents a breakthrough in generalization capability. The combination of controlled Dagger collection, tangent preprocessing, category-based balancing, and multi-town data diversity has yielded a model capable of completing previously impossible routes (Town07) while maintaining stability and executing complex maneuvers (90° turns). The reduction in required sample count (to 41,000 images) demonstrates that strategic dataset curation is more impactful than raw data volume.

The primary remaining challenge, lane discipline, is clearly defined and will be the focus of Week 27. With the current foundation of robust steering control and generalization, targeted improvements to lane-keeping behavior are expected to yield a fully competent autonomous driving agent.

Status: Dataset optimization phase complete. Model achieves generalization to Town07 and stable urban navigation. Next phase: lane discipline refinement and quantitative evaluation framework implementation.