Systematic Dataset Balancing and Active Data Generation for Right‑Lane Keeping
April 14, 2026
Investigating the limits of dataset size and the effect of targeted steering distributions to eliminate lane‑divider driving and oscillations.
1. Recap of Week 27 – Incomplete Recovery
In Week 27 we added 9,800 lane‑recovery examples (from left lane back to right lane) to the training set. The goal was to teach the ego‑vehicle to consistently stay in the right lane. However, the agent still drove over the lane dividing line rather than fully recovering to the right lane. The effect of tangent‑based steering preprocessing remained inconclusive due to the dominant behavioural flaw.
Key lesson: Adding recovery examples from the left lane is insufficient when the vehicle’s most common failure mode is driving on the lane divider, not fully in the left lane.
2. Scaling the Dataset – From 41k to 103k Samples
Following the Week 27 observations, we hypothesised that a larger and more balanced dataset might solve the lane‑divider behaviour. Therefore, we expanded the training set from 41,000 to 103,000 samples by:
- Recording additional driving segments in rural areas of Town 12 (unstructured roads, varying lighting).
- Applying oversampling to underrepresented steering angles (especially moderate positive steering for right‑lane recovery).
- Including both straight driving and gentle curves to maintain a natural steering distribution.
The expanded dataset was used to retrain the PilotNet architecture exactly as in previous weeks.
Unexpected result
Contrary to our expectations, the larger dataset did not improve right‑lane keeping in either urban (Towns 01–03) or rural (Town 12) environments. Worse, the vehicle developed a tendency toward mild, sporadic oscillations (gentle left‑right movements) even on straight segments.
Diagnosis: Despite the added recovery samples, the expanded dataset was dominated by straight‑driving examples (low steering angles), creating a severe imbalance: straight‑driving frames reached ≈85% of the dataset. Consequently, the model learned to “play safe” by outputting near‑zero steering most of the time, but when a small perturbation occurred, it lacked strong corrective examples and began oscillating.
This is a classic symptom of a dataset dominated by the mean steering value (close to 0) with insufficient representation of moderate and extreme steering actions needed to correct from off‑centre positions.
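A quick histogram of the steering labels makes this kind of imbalance visible before training. The sketch below uses synthetic data shaped like the diagnosis above (≈85% near‑zero frames); the bin edges are illustrative assumptions, not the values used in our pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic steering labels in [-1, 1]: straight-driving frames dominate,
# mirroring the ~85% imbalance diagnosed in the 103k dataset.
steering = np.concatenate([
    rng.normal(0.0, 0.02, 85_000),   # near-zero "straight" samples
    rng.uniform(-0.6, 0.6, 15_000),  # corrective manoeuvres
])

def bin_fractions(s):
    """Fraction of samples per steering bin (bin edges are illustrative)."""
    a = np.abs(s)
    return {
        "forward (|s| <= 0.1)": float(np.mean(a <= 0.1)),
        "mild (0.1-0.3)":       float(np.mean((a > 0.1) & (a <= 0.3))),
        "moderate (0.3-0.6)":   float(np.mean((a > 0.3) & (a <= 0.6))),
        "extreme (> 0.6)":      float(np.mean(a > 0.6)),
    }

fractions = bin_fractions(steering)
```

Running this check on a candidate dataset before training would have flagged the forward bin at well over 80%, far from any distribution that teaches recovery behaviour.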
3. New Approach – Active Dataset Generation with Targeted Steering Distribution
Given that simply scaling the dataset and oversampling did not work, we have pivoted to a fundamentally different strategy. Instead of complementing an existing dataset, we will actively generate data using a hybrid DAgger (Dataset Aggregation) scheme that enforces a precise steering distribution.
Core ideas
- Soft DAgger: Collect samples when the ego‑vehicle’s steering command is within the range [-0.3, 0.3] (normal driving, small corrections).
- Strong DAgger: Actively perturb the vehicle (e.g., by adding noise or using an exploratory policy) to generate samples with steering outside the soft range, especially to “push” it out of the right lane and then record corrective manoeuvres back into the right lane.
- No more post‑hoc balancing: The dataset is built from scratch with a pre‑defined distribution that guarantees sufficient representation of all manoeuvre types.
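The two DAgger regimes can be sketched as follows. This is a simplified one‑dimensional illustration, not our collection code: `expert_policy` is a placeholder for the human or controller expert, and the lateral‑offset perturbation stands in for the noise injection described above.

```python
SOFT_RANGE = (-0.3, 0.3)  # "soft DAgger" band from the text

def dagger_regime(steer_cmd):
    """'soft' when the command lies in the normal-driving band, else 'strong'."""
    return "soft" if SOFT_RANGE[0] <= steer_cmd <= SOFT_RANGE[1] else "strong"

def perturb_and_label(expert_policy, lateral_offset, bias=0.5):
    """Strong-DAgger step: push the vehicle off-centre with a steering bias,
    then record the expert's corrective command as the training label."""
    perturbed = lateral_offset + bias
    return perturbed, expert_policy(perturbed)

# Toy proportional expert: steer back toward the lane centre, clipped to [-1, 1].
expert = lambda off: max(-1.0, min(1.0, -0.8 * off))
state, label = perturb_and_label(expert, 0.0, bias=0.5)
```

The key point is that the strong regime stores the *corrective* command, not the perturbing one, so the model only ever imitates recovery behaviour.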
Target steering distribution (active generation)
For every 100 frames collected, we will enforce the following approximate percentages:
- Extreme left / extreme right: 10%
- Moderate left / moderate right: 15%
- Forward (near zero): 40%
- Mild corrections: 35%
These bins sum to 100% (10 + 15 + 40 + 35). The exact bin sizes will be fine‑tuned, but the critical change is the forced inclusion of 25% moderate‑to‑extreme steering (15% moderate + 10% extreme), compared to previous datasets where such samples rarely exceeded 5–10%.
Implementation: A data collection script will monitor the vehicle’s steering command. If the desired distribution is not met over a rolling window, the script will temporarily switch to an exploratory policy (e.g., adding a small bias to the steering or commanding a lane‑change manoeuvre) to generate underrepresented steering values. All collected frames are stored with their corresponding expert steering (human or corrected by a simple controller). This ensures that the final dataset has the exact proportions we need.
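A minimal sketch of this monitor, assuming the rolling window of 500 frames and the 5% trigger threshold described in Section 5 (the steering bin edges are illustrative assumptions):

```python
from collections import deque

# Target bin fractions from the text; bin edges below are illustrative.
TARGETS = {"extreme": 0.10, "moderate": 0.15, "forward": 0.40, "mild": 0.35}
WINDOW = 500      # rolling window size (Section 5)
TOLERANCE = 0.05  # trigger exploration when a bin trails its target by >5%

def bin_of(steer):
    """Assign a steering command to one of the four target bins."""
    a = abs(steer)
    if a > 0.6:
        return "extreme"
    if a > 0.3:
        return "moderate"
    if a > 0.1:
        return "mild"
    return "forward"

class DistributionMonitor:
    def __init__(self):
        self.window = deque(maxlen=WINDOW)

    def record(self, steer):
        self.window.append(bin_of(steer))

    def deficient_bins(self):
        """Bins whose share in the rolling window trails the target by >5%,
        i.e. the bins for which the exploratory policy should be triggered."""
        n = len(self.window) or 1
        share = {b: self.window.count(b) / n for b in TARGETS}
        return [b for b, t in TARGETS.items() if t - share[b] > TOLERANCE]
```

In the actual collection loop, a non‑empty `deficient_bins()` result would switch the vehicle to the exploratory policy until the window catches up with the targets.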
4. Systematic Training – Finding the Optimal Dataset Size
A key open question is: Is there a maximum useful dataset size beyond which performance plateaus or degrades? To answer this, we will perform successive training runs on datasets of increasing size:
- 5,000 samples (minimal balanced)
- 10,000 samples
- 20,000 samples
- 50,000 samples
- 100,000+ samples
For each dataset size, we will train the same PilotNet architecture and evaluate on three metrics:
- Right‑lane keeping success rate (percentage of time the vehicle’s centre is to the right of the lane divider, measured over 5 minutes of driving in Towns 01, 02, 03, and 12).
- Oscillation magnitude (standard deviation of steering command when the vehicle is on a straight road).
- Recovery time (seconds needed to return to the right lane after an intentional perturbation to the left lane).
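The three metrics reduce to simple computations over the evaluation logs. A sketch, assuming a 1‑D lateral coordinate where positive x points into the right lane and the divider sits at x = 0 (both conventions are assumptions for illustration):

```python
import statistics

def right_lane_rate(lateral_positions, divider_x=0.0):
    """Share of samples with the vehicle centre right of the divider
    (positive x is assumed to point into the right lane)."""
    return sum(x > divider_x for x in lateral_positions) / len(lateral_positions)

def oscillation_magnitude(steering_log):
    """Population std-dev of the steering command over a straight segment."""
    return statistics.pstdev(steering_log)

def recovery_time(lateral_positions, dt, divider_x=0.0):
    """Seconds until the vehicle is back right of the divider after an
    intentional perturbation; None if it never recovers within the log."""
    for i, x in enumerate(lateral_positions):
        if x > divider_x:
            return i * dt
    return None
```

Computing all three from the same logged trajectory keeps the benchmark runs cheap: one 5‑minute drive per town yields every metric.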
Hypothesis: There exists an optimal dataset size (likely between 20,000 and 50,000 samples) at which the balance of manoeuvres is preserved. Beyond that, adding more “easy” forward examples may degrade performance by biasing the model toward zero steering. The experiments should reveal the exact inflection point.
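The sweep protocol amounts to a small harness around the existing pipeline. In this sketch, `train_fn` and `eval_fn` are placeholders for the actual PilotNet training and benchmarking routines, and the sizes come from the plan above:

```python
SWEEP_SIZES = [5_000, 10_000, 20_000, 50_000, 100_000]  # planned subset sizes

def run_size_sweep(train_fn, eval_fn, dataset):
    """Train the same architecture on growing subsets of one balanced dataset
    and collect the evaluation metrics per size."""
    results = {}
    for n in SWEEP_SIZES:
        if n > len(dataset):
            break  # stop once the dataset is exhausted
        model = train_fn(dataset[:n])
        results[n] = eval_fn(model)
    return results
```

Taking prefixes of a single distribution‑controlled dataset (rather than collecting each size separately) keeps the steering distribution identical across runs, so any performance change is attributable to size alone.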
5. Current Status and Planned Experiments (Week 29)
As of April 14, the following actions are in progress:
- Data collection script with distribution enforcement – being implemented in the CARLA simulator. The script monitors a rolling buffer of 500 frames and triggers exploratory actions when any steering bin falls below its target by more than 5%.
- Initial 5,000‑sample dataset – collected with the exact distribution (10% extreme, 15% moderate, 40% forward, 35% mild). Training will begin on April 15.
- Benchmarking framework – automated evaluation routes defined for Towns 01–03 and Town 12.
Expected outcomes for Week 29
- If the distribution‑controlled dataset works, the vehicle should no longer drive over the lane divider and oscillations should disappear.
- Successive training will show whether increasing the dataset beyond ~50k samples hurts performance.
- Once the right‑lane keeping is robust, we will revisit the tangent preprocessing ablation (postponed from Week 27) using the best‑performing dataset.
6. Conclusions (Week 28)
- ❌ Simply scaling the dataset from 41k to 103k samples and applying oversampling did not improve right‑lane keeping and introduced mild oscillations.
- ❌ The imbalance caused by an over‑representation of straight‑driving examples (low steering) is the root cause of the oscillatory behaviour.
- ✅ A new active data generation approach with a targeted steering distribution (25% moderate‑to‑extreme steering) is under development.
- 🔬 Successive training from 5k to 100k samples will determine the optimal dataset size and prevent performance degradation.
- 📦 Huggingface uploads will be updated once the new datasets are validated.
SUMMARY OF FINDINGS – APRIL 14, 2026 (WEEK 28):
⚠️ Scaling dataset to 103k with oversampling failed – caused oscillations.
🔧 New solution: active DAgger with enforced steering distribution (10% extreme, 15% moderate, 40% forward, 35% mild).
📊 Successive training on increasing dataset sizes (5k → 100k) to find optimum.
🎯 Goal for Week 29: eliminate lane‑divider driving and oscillations.
Immediate next steps (April 15–21, 2026):
1. Complete the distribution‑controlled data collection script.
2. Gather the first 5,000‑sample balanced dataset and train PilotNet.
3. Evaluate on Towns 01–03 and Town 12.
4. If successful, repeat for 10k, 20k, 50k, and 100k samples while measuring performance metrics.
5. Publish the best‑performing dataset to Huggingface.
6. Conduct a clean ablation study of the tangent preprocessing function.
— Armando Mateus, Robotics Lab URJC