Autonomous Driving Model Training Report

Report Date: March 20, 2026 | Week 25: Testing New Ideas

1. Dataset Generation: Towns 01, 02, 04, 12 with Dagger & Noise Injection

This week’s primary focus was generating a robust dataset using Dagger (Dataset Aggregation) combined with Noise Injection, with Town 07 reserved exclusively for later validation testing. The dataset currently comprises images from Towns 01, 02, 04, and 12, covering straight driving, right/left turns, and recovery maneuvers. The Noise Injection integration is not yet complete.

As a result, we obtained the dataset composition shown in Figure 1, which includes samples from soft Dagger, strong Dagger, and preliminary Noise Injection runs across the four CARLA training towns.


Figure 1: Composition of dataset with soft Dagger, strong Dagger, and Noise Injection from Town01, Town02, Town04, and Town12.

Key insight: The dataset is structured to expose the model to diverse driving behaviors before final validation on the unseen Town07 scenario.
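A minimal sketch of how a single Dagger step with Noise Injection can be logged, assuming a simple mixing rule: with probability beta the expert drives, otherwise the learned policy does. Gaussian noise (chosen here for simplicity) is added only to the executed command, and the stored label is always the expert’s clean steering for the current state. The names collect_step, Sample, beta, and noise_std are hypothetical, and treating "soft" vs. "strong" Dagger as different beta values is an assumption, not the definition used in this project.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    executed_steer: float  # command actually applied to the vehicle
    label_steer: float     # target stored in the dataset


def collect_step(expert_steer, policy_steer, beta, noise_std, rng):
    """One logged control step of a Dagger-style rollout with Noise Injection.

    Assumption (hypothetical): beta is the probability that the expert drives
    this step; a "strong" Dagger run would use a lower beta than a "soft" one,
    leaving the learned policy in control more often.
    """
    driver = expert_steer if rng.random() < beta else policy_steer
    # Noise perturbs only the executed command, pushing the car off its ideal
    # path so the dataset contains states that require recovery.
    executed = driver + rng.gauss(0.0, noise_std)
    executed = max(-1.0, min(1.0, executed))  # CARLA steering range
    # The label is always the expert's clean command for the current state.
    return Sample(executed_steer=executed, label_steer=expert_steer)
```

The key design choice is decoupling what the car does from what the model learns: the vehicle drifts because of the policy or the noise, but every saved frame is labeled with the expert’s correction.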

2. Benchmark Analysis: Jorge Rodríguez González’s Approach

Inspired by the remarkable results from Jorge Rodríguez González’s work (reference), we performed a deep analysis of his dataset construction methodology. The main differentiating factors are summarized below:

Technique	Description	Expected Impact
Tangent Preprocessing	Steering angle transformed as 1.1*tan(steering) before training	Filters low-amplitude noise & amplifies turning effect
Asynchronous Data Logging @ 60Hz	Script captures frames asynchronously at a requested 60Hz rate	Higher temporal resolution, better reactive behavior
Category-based Sampling (25 bins)	Steering range -1 to 1 split into 25 steps (0.08 increments). Each bin capped at 10,000 samples.	Balanced dataset prevents overfitting to common steering angles
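The tangent transform and its inverse can be sketched as follows, assuming CARLA’s normalized steering value in [-1, 1] is passed directly to tan, as the formula suggests. encode_steer and decode_steer are hypothetical helper names. The inverse matters at inference time, because the network learns to predict transformed labels rather than raw steering.

```python
import math

SCALE = 1.1  # tangent scaling factor used this week


def encode_steer(steer, scale=SCALE):
    """Training label from a CARLA steering value in [-1, 1]: scale * tan(steer)."""
    return scale * math.tan(steer)


def decode_steer(label, scale=SCALE):
    """Map a predicted label back to a steering command in [-1, 1]."""
    steer = math.atan(label / scale)    # exact inverse of encode_steer on [-1, 1]
    return max(-1.0, min(1.0, steer))   # predictions may fall outside the training range
```

For example, encode_steer(0.05) is about 0.055, which is almost a plain 1.1x gain because tan(x) ≈ x near zero. By contrast, encode_steer(1.0) is about 1.71, so large steering values are stretched much more than small ones.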

Implementation note: Due to technical constraints in executing the original script, we could not replicate the exact pipeline this week. Instead, we applied post-processing modifications to the previous week’s dataset:
Replaced steering labels with the scaled tangent value (1.1*tan(steering)).
Applied subsampling to approximate the category-based sample limits.

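The subsampling applied to last week’s dataset can be approximated with a per-bin cap. This sketch makes three assumptions: bins are computed on the raw steering value (25 bins of width 0.08 over [-1, 1]), excess samples are dropped at random, and samples use a (frame_id, steer) layout. The layout and the function names are hypothetical.

```python
import random
from collections import defaultdict

N_BINS = 25
BIN_WIDTH = 2.0 / N_BINS  # 0.08 steering units per bin over [-1, 1]


def steer_bin(steer):
    """Map a raw steering value in [-1, 1] to a bin index 0..24."""
    idx = int((steer + 1.0) / BIN_WIDTH)
    return min(max(idx, 0), N_BINS - 1)  # steer == 1.0 falls into the last bin


def cap_per_bin(samples, max_per_bin=10_000, seed=0):
    """Randomly keep at most max_per_bin samples per steering bin.

    samples: list of (frame_id, steer) pairs (hypothetical layout).
    """
    by_bin = defaultdict(list)
    for sample in samples:
        by_bin[steer_bin(sample[1])].append(sample)
    rng = random.Random(seed)
    kept = []
    for b in sorted(by_bin):
        group = by_bin[b]
        # Bins under the cap are kept whole; larger bins are randomly thinned.
        kept.extend(group if len(group) <= max_per_bin else rng.sample(group, max_per_bin))
    return kept
```

With 25 bins, the middle bin (index 12) covers roughly [-0.04, 0.04), so straight driving is concentrated in a single bin that is capped like every other bin.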
Results after Tangent Preprocessing & Subsampling

Improved vehicle stability: The agent exhibited smoother steering transitions and reduced oscillation during straight-line driving.
Mixed performance on curves: Gentle turns maintained similar behavior compared to the original dataset, but sharper curves remained problematic, likely due to the reduced effective sample size after subsampling.

3. Technical Insights: Asynchronous Simulation & Future Implications

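The impact of running CARLA asynchronously at 60Hz is not yet fully characterized. When the dataset is generated without speed information, the purely reactive nature of the model (current frame in, steering out) may mask potential issues. However, once vehicle velocity is incorporated into the control loop, asynchronous operation could introduce inconsistencies. In asynchronous mode, the simulator does not wait for the client, so the simulated time between consecutive logged frames may vary. As a result, velocity and control values may no longer line up with the frames they are paired with. Further experiments are required to determine the optimal synchronization strategy.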

Key considerations for upcoming iterations:

Balance between synchronous deterministic behavior and asynchronous high-frequency sampling.
Integration of velocity conditioning to fully exploit the benefits of 60Hz data capture.
Evaluation of model robustness under both synchronous and asynchronous environments.
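To illustrate the velocity concern, the sketch below computes finite-difference speed from logged positions in two ways. The first uses each frame’s own timestamp; the second assumes every interval equals a nominal period, as a fixed 60Hz assumption would. The data layout (1-D positions plus a simulation timestamp per frame) and the function names are hypothetical. In synchronous mode with a fixed time step, both methods would agree.

```python
def frame_intervals(timestamps):
    """Simulation time elapsed between consecutive logged frames (seconds)."""
    return [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]


def speeds(positions, timestamps, nominal_dt=None):
    """Finite-difference speed along a 1-D path (hypothetical data layout).

    If nominal_dt is given, every interval is assumed to last nominal_dt
    seconds, which is what a fixed-rate assumption does; otherwise the
    logged timestamps are used.
    """
    result = []
    for (p0, p1), dt in zip(zip(positions, positions[1:]), frame_intervals(timestamps)):
        step = nominal_dt if nominal_dt is not None else dt
        result.append((p1 - p0) / step)
    return result


def interval_jitter(timestamps):
    """Spread between the longest and shortest logged interval (0.0 when uniform)."""
    dts = frame_intervals(timestamps)
    return max(dts) - min(dts)
```

Consider a car moving at a constant 10 m/s whose frames are logged at irregular intervals (0.25 s, 0.5 s, 0.25 s). The timestamp-based estimate returns 10 m/s for every step, while the nominal-interval estimate reports 20 m/s for the longer gap.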

4. Comparative Progress: Week 20 to Week 25

Building on previous milestones (Week 20), where centered driving, turn augmentation, and mirror balancing significantly improved lane keeping and turn execution, Week 25 introduces more advanced preprocessing and dataset curation strategies. The table below summarizes the evolution:

Metric / Feature	Week 20 (Baseline)	Week 25 (Current)
Dataset Composition	Centered driving + Turns + Lane Recovery (Mirror Augmentation)	Dagger (soft & strong) + Noise Injection across 4 towns; Town07 reserved for validation
Steering Preprocessing	Raw steering angle (binned categories)	Scaled tangent transform: 1.1*tan(steering) + category-based subsampling
Sampling Frequency	20Hz synchronous recording	Exploring asynchronous 60Hz data capture (benchmark analysis)
Key Strengths	Balanced turn samples, reduced oscillations	Enhanced stability, noise robustness (in progress), multi-town coverage (generalization to Town07 not yet validated)
Remaining Challenges	Early steering commands, curb invasion on tight curves	Sharp turns still failing; effective sample reduction due to subsampling

Demo: Model trained with tangent preprocessing (Week 25 prototype)

Video 1: Agent behavior after applying tangent preprocessing and subsampling. Notice improved stability on straight segments; sharp curves remain a challenge.

5. Conclusions & Immediate Next Steps

Key Takeaways

Tangent preprocessing noticeably improved driving stability in qualitative tests, with the intended effect of filtering noise and emphasizing steering commands. However, its full potential depends on maintaining sufficient sample diversity across all steering bins.
The reduction in effective samples (due to subsampling to respect category limits) likely hindered performance on sharp curves. This suggests that a larger, more balanced raw dataset is required before applying aggressive bin capping.
Asynchronous 60Hz data collection remains an open research direction. The benchmark results suggest it can benefit reactive agents, but velocity-based architectures will require careful tuning.
The Dagger + Noise Injection pipeline is structurally ready; the next step is to complete noise integration and validate on Town07.

Proposed Work for Week 26

Complete Noise Injection integration into the Dagger dataset for all training towns (01, 02, 04, 12).
Re-generate dataset with full tangent preprocessing while maintaining high sample counts per steering bin (avoid aggressive subsampling).
Implement asynchronous data capture script at 60Hz for a controlled comparison against synchronous 20Hz datasets.
Validate on Town07 using BehaviorMetrics to quantify improvements in stability, curve negotiation, and recovery maneuvers.
Analyze effect of velocity conditioning when using high-frequency asynchronous data.

Pending Long-term Tasks:
Full integration of BehaviorMetrics for quantitative evaluation across multiple towns.
Hyperparameter tuning for the tangent scaling factor (currently 1.1) to optimize curve performance.
Comparative study: synchronous vs. asynchronous data collection impact on end-to-end driving models.

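For the controlled comparison against the synchronous 20Hz datasets, one option is to resample the 60Hz asynchronous capture to 20Hz. This should use simulation timestamps rather than frame indices, because asynchronous intervals are not guaranteed to be uniform. The sketch below works under that assumption; the (timestamp, payload) layout and resample_by_time are hypothetical.

```python
EPS = 1e-9  # tolerance for floating-point timestamp comparisons


def resample_by_time(frames, target_hz=20.0):
    """Keep roughly one frame per 1/target_hz seconds of simulation time.

    frames: list of (timestamp_s, payload) tuples sorted by timestamp
    (hypothetical layout). Targets lie on a fixed grid anchored at the
    first timestamp, so jitter in the source does not accumulate.
    """
    period = 1.0 / target_hz
    kept = []
    next_t = None
    for t, payload in frames:
        if next_t is None:
            next_t = t  # anchor the target grid at the first frame
        if t >= next_t - EPS:
            kept.append((t, payload))
            # Advance to the first grid point strictly after this frame.
            while next_t <= t + EPS:
                next_t += period
    return kept
```

On perfectly regular 60Hz input, this keeps every third frame. On jittery input, it keeps the first frame at or after each 50 ms grid point.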
Appendix: Reference Architecture & Data Curation Insights

The approach this week draws heavily on the methodology described in Jorge Rodríguez González’s work (his Week 10 post). Key insights that will guide future dataset generation include:

Bin-based filling condition: Enforce a maximum of 10,000 samples per steering bin (0.08 steps) to avoid over-representation of straight driving.
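Tangent scaling rationale: The function 1.1 * tan(steering) acts as a non-linear amplifier. Near zero it is almost a linear 1.1x gain, while larger steering values are amplified progressively more (full lock, 1.0, maps to about 1.71), making the model more sensitive to turning commands. It does not saturate extreme values, so predictions must be mapped back with arctan(prediction / 1.1) and clipped to [-1, 1] before being sent to the vehicle.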
Asynchronous capture: At 60Hz, the system collects more temporal samples, which could help the model anticipate curves earlier if combined with recurrent or temporal architectures.
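The bin-based filling condition can also be applied online during generation, so frames from full bins are never saved. This is a minimal sketch assuming the same 25-bin layout over raw steering. BinFiller and its methods are hypothetical names, not part of the reference implementation.

```python
class BinFiller:
    """Accept a frame only while its steering bin has room (hypothetical helper)."""

    def __init__(self, n_bins=25, max_per_bin=10_000):
        self.n_bins = n_bins
        self.width = 2.0 / n_bins          # 0.08 for 25 bins over [-1, 1]
        self.max_per_bin = max_per_bin
        self.counts = [0] * n_bins

    def bin_of(self, steer):
        idx = int((steer + 1.0) / self.width)
        return min(max(idx, 0), self.n_bins - 1)

    def try_accept(self, steer):
        """Return True and count the frame if its bin is not yet full."""
        b = self.bin_of(steer)
        if self.counts[b] >= self.max_per_bin:
            return False  # bin full: skip saving this frame
        self.counts[b] += 1
        return True

    def full(self):
        return all(c >= self.max_per_bin for c in self.counts)
```

In practice, the extreme bins may never reach 10,000 samples. A generation loop would therefore also need a time or frame budget rather than relying on full() alone.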

These principles will be fully adopted in the upcoming dataset generation cycle to ensure consistency with state-of-the-art end-to-end driving pipelines.

Overall Summary

Week 25 marked a significant shift toward advanced dataset curation and preprocessing, inspired by proven techniques from the robotics community. The introduction of tangent-based steering transformation improved stability, while the analysis of asynchronous 60Hz sampling opened new avenues for temporal reasoning. The main challenge remains sharp curve negotiation, which will be tackled by increasing sample diversity and completing the noise injection pipeline. With Town07 reserved for validation, the upcoming weeks will focus on rigorous quantitative evaluation and final integration of Dagger with noise augmentation.

Status: Dataset generation pipeline (Dagger + Noise) is at 70% completion; validation on Town07 scheduled for Week 26 after full noise integration and tangent preprocessing reimplementation without aggressive subsampling.