This week’s primary focus was generating a robust dataset using DAgger (Dataset Aggregation) combined with noise injection. Town07 is reserved exclusively for validation. The dataset currently comprises images from Towns 01, 02, 04, and 12, covering straight driving, left and right turns, and recovery maneuvers. The noise injection integration is not yet complete.
The resulting dataset composition is shown in Figure 1. It includes samples from soft DAgger, strong DAgger, and preliminary noise-injection samples across multiple CARLA towns.
Key insight: The dataset is structured to expose the model to diverse driving behaviors before final validation on the unseen Town07 scenario.
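As a minimal sketch of how the hold-out could be enforced in code, the helper below splits samples by town and rejects anything outside the expected set. The sample layout (a dict with a `town` key) and the exact map name strings are assumptions for illustration, not the project's actual data format.

```python
# Hypothetical map names; the post only lists Towns 01, 02, 04, 12 and Town07.
TRAIN_TOWNS = {"Town01", "Town02", "Town04", "Town12"}
VALIDATION_TOWN = "Town07"


def split_by_town(samples):
    """Split samples into (train, validation) lists by their 'town' field.

    Raises ValueError for a town outside the expected set, so a stray
    recording cannot silently leak into either split.
    """
    train, validation = [], []
    for sample in samples:
        town = sample["town"]
        if town == VALIDATION_TOWN:
            validation.append(sample)
        elif town in TRAIN_TOWNS:
            train.append(sample)
        else:
            raise ValueError(f"unexpected town: {town}")
    return train, validation
```

Failing loudly on an unknown town is a deliberate choice here: it keeps the validation set strictly unseen even if new recordings are added later.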
Inspired by the results of Jorge Rodríguez González’s work (reference), we analyzed his dataset construction methodology in detail. The main differentiating factors are summarized below:
| Technique | Description | Expected Impact |
|---|---|---|
| Tangent Preprocessing | Steering angle transformed using 1.1 * tan(steering) before training | Filters low-amplitude noise & amplifies turning effect |
| Asynchronous Data Logging @ 60Hz | Script captures frames asynchronously at a requested 60Hz rate | Higher temporal resolution, better reactive behavior |
| Category-based Sampling (25 bins) | Steering range -1 to 1 split into 25 steps (0.08 increments). Each bin capped at 10,000 samples. | Balanced dataset prevents overfitting to common steering angles |
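
The tangent transform and the capped 25-bin sampling from the table can be sketched as follows. This is an illustration under assumptions, not the reference implementation: the sample layout (`(frame_id, steering)` tuples), the function names, the random selection within an over-full bin, and the choice to bin on the raw steering value rather than the transformed one are all hypothetical.

```python
import math
import random
from collections import defaultdict

STEER_GAIN = 1.1        # scale factor from the table
NUM_BINS = 25           # 25 bins over [-1, 1], each 0.08 wide
MAX_PER_BIN = 10_000    # per-bin cap from the table


def tangent_transform(steering: float) -> float:
    """Map a steering value in [-1, 1] to 1.1 * tan(steering)."""
    return STEER_GAIN * math.tan(steering)


def inverse_tangent_transform(target: float) -> float:
    """Recover the steering command from a transformed target."""
    return math.atan(target / STEER_GAIN)


def steering_bin(steering: float) -> int:
    """Index (0..24) of the 0.08-wide bin containing the raw steering value."""
    clipped = max(-1.0, min(1.0, steering))
    index = int((clipped + 1.0) / 2.0 * NUM_BINS)
    return min(index, NUM_BINS - 1)  # steering == 1.0 falls in the last bin


def cap_bins(samples, max_per_bin=MAX_PER_BIN, seed=0):
    """Keep at most `max_per_bin` samples per steering bin.

    `samples` is a list of (frame_id, steering) tuples (assumed layout).
    Over-full bins are reduced by seeded random selection.
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for sample in samples:
        by_bin[steering_bin(sample[1])].append(sample)
    kept = []
    for index in sorted(by_bin):
        group = by_bin[index]
        if len(group) > max_per_bin:
            group = rng.sample(group, max_per_bin)
        kept.extend(group)
    return kept
```

Note that the transformed target leaves the [-1, 1] range: at full lock, 1.1 * tan(1) is roughly 1.71. A model trained on these targets would therefore need its output passed through the inverse transform before being sent to the vehicle.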
The impact of running CARLA asynchronously at 60Hz is not yet fully characterized. For pure dataset generation without speed-based sampling, the reactive nature of the model may mask potential issues. However, when vehicle velocity is incorporated into the control loop, asynchronous operation could introduce inconsistencies. Further experiments are required to determine the optimal synchronization strategy.
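One inexpensive first experiment is to check what rate the asynchronous logger actually achieved, since a requested 60Hz is not guaranteed. The sketch below assumes the logger stores one timestamp in seconds per saved frame; the function name and input format are hypothetical.

```python
import statistics


def effective_rate(timestamps):
    """Return (mean_hz, jitter_ms) for frame timestamps given in seconds.

    mean_hz is the inverse of the mean inter-frame interval; jitter_ms is
    the population standard deviation of the intervals, in milliseconds.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_dt = statistics.fmean(deltas)
    jitter_ms = statistics.pstdev(deltas) * 1000.0
    return 1.0 / mean_dt, jitter_ms
```

A mean close to 60Hz with high jitter would suggest that frames arrive unevenly, which matters most once velocity enters the control loop, because speed estimates depend on the time between frames.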
Key considerations for upcoming iterations:
Building on previous milestones (Week 20), where centered driving, turn augmentation, and mirror balancing significantly improved lane keeping and turn execution, Week 25 introduces more advanced preprocessing and dataset curation strategies. The table below summarizes the evolution:
| Metric / Feature | Week 20 (Baseline) | Week 25 (Current) |
|---|---|---|
| Dataset Composition | Centered driving + Turns + Lane Recovery (Mirror Augmentation) | DAgger (soft & strong) + Noise Injection across 4 towns; Town07 reserved for validation |
| Steering Preprocessing | Raw steering angle (binned categories) | Scaled tangent transform: 1.1*tan(steering) + category-based subsampling |
| Sampling Frequency | 20 fps synchronous recording | Exploring asynchronous 60Hz data capture (benchmark analysis) |
| Key Strengths | Balanced turn samples, reduced oscillations | Enhanced stability, noise robustness (in progress), generalized across towns |
| Remaining Challenges | Early steering commands, curb invasion on tight curves | Sharp turns still failing; effective sample reduction due to subsampling |
The approach this week heavily references the methodology described in Jorge Rodríguez González’s work (Semana 10). Key insights that will guide future dataset generation include:
These principles will be fully adopted in the upcoming dataset generation cycle to ensure consistency with state-of-the-art end-to-end driving pipelines.
Week 25 marked a shift toward more advanced dataset curation and preprocessing, following the techniques analyzed in Jorge Rodríguez González’s work. The tangent-based steering transformation improved stability, while the analysis of asynchronous 60Hz sampling raised open synchronization questions that need further experiments. The main remaining challenge is sharp curve negotiation, which will be tackled by increasing sample diversity and completing the noise injection pipeline. With Town07 reserved for validation, the upcoming weeks will focus on quantitative evaluation and on the final integration of DAgger with noise augmentation.
Status: The dataset generation pipeline (DAgger + noise injection) is at 70% completion. Validation on Town07 is scheduled for Week 26, after noise integration is finished and tangent preprocessing is reimplemented without aggressive subsampling.