Fine‑Tuning or a New Model? Finding the Optimal Dataset Distribution
June 12, 2026
Last week's work raised two guiding questions for this week:
- Are we fine‑tuning or training a new model?
- Why does performance degrade when going from 2,000 to 5,000 samples and then improve at 8,000 samples?
To answer these questions, the work proceeded in two steps:
- Analysis of the training script, particularly the section defining the model architecture.
- Construction of 16 dataset compositions by varying the percentage of each of the 10 maneuvers.
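The composition step above can be pictured as filling one quota per maneuver from a pool of raw samples. The sketch below is a minimal illustration and not the script actually used: the function name `build_composition`, the dict layout, and the toy maneuver names are assumptions, and it assumes sampling without replacement.

```python
import random
from collections import Counter


def build_composition(pools, shares, total, seed=0):
    """Draw about `total` samples so each maneuver fills its share.

    pools  : dict mapping maneuver name -> list of raw samples
    shares : dict mapping maneuver name -> fraction of the final dataset
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    rng = random.Random(seed)  # fixed seed keeps the composition reproducible
    dataset = []
    for maneuver, share in shares.items():
        quota = round(share * total)  # rounding can shift the total by a few samples
        pool = pools[maneuver]
        if quota > len(pool):
            # Refuse to fill a quota silently with duplicates.
            raise ValueError(f"{maneuver}: need {quota}, only {len(pool)} available")
        dataset.extend(rng.sample(pool, quota))  # sampling without replacement
    rng.shuffle(dataset)
    return dataset


# Toy pools with three hypothetical maneuvers; each sample is (label, index).
toy_pools = {name: [(name, i) for i in range(100)] for name in ("left", "right", "forward")}
toy_shares = {"left": 0.2, "right": 0.2, "forward": 0.6}
toy_dataset = build_composition(toy_pools, toy_shares, total=100)
print(Counter(label for label, _ in toy_dataset))
```

Raising an error when a quota exceeds its pool makes a scarce maneuver visible at construction time instead of letting it silently skew the dataset.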
📌 WEEK 34 SUMMARY – JUNE 12, 2026
🔍 Fine‑tuning or new model? The script trains a new PilotNet from scratch (He initialisation). No fine‑tuning is performed.
📊 16 dataset compositions were built by varying the percentage of each of the 10 maneuvers. Composition #10 (10% left, 10% right, 30% forward, 30% recoveries) performed best.
⚠️ 5k dip and 8k recovery explained: the dip is caused by a left‑right imbalance that stems from the scarcity of left‑turn samples (only 2,100 raw). Balanced compositions avoid the dip.
✅ Best model: Composition #10 with 15,000 samples.
🔜 Week 35: Expand with Town 04 samples, add weather diversity, test on Town 02 and Town 06.
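The left‑turn scarcity noted in the summary can be checked with simple arithmetic. A 10% left share in a 15,000‑sample dataset needs 1,500 left‑turn samples, which the 2,100 raw samples cover. The same pool could fill at most 14% of a dataset that size. The sketch below runs that check for each dataset size mentioned in this post. It assumes samples are drawn without duplication or augmentation, which the summary does not state, and the helper names are hypothetical.

```python
RAW_LEFT_TURNS = 2_100  # raw left-turn samples reported in the summary


def max_share(available, total):
    """Largest fraction of `total` that `available` unique samples can fill."""
    return min(1.0, available / total)


def quota_fits(available, share, total):
    """True if `share` of `total` can be drawn without reusing any sample."""
    return round(share * total) <= available


# Dataset sizes mentioned in this post; 0.10 is the left share of Composition #10.
for size in (2_000, 5_000, 8_000, 15_000):
    print(size, f"{max_share(RAW_LEFT_TURNS, size):.1%}", quota_fits(RAW_LEFT_TURNS, 0.10, size))
```

A check like this only bounds what the raw pool allows. It does not by itself reproduce the observed dip at 5,000 samples.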