Weeks 52-54. Training of the reinforcement learning algorithm

1 minute read


In this period of work, all the trainings have been carried out with the different parameter configurations in terms of perception levels, actions and circuits. Each training has had a fixed duration of 2 hours where the simplest configurations manage to solve the circuit in a few minutes and the most complex configuration (3 levels of perception and more extensive number of actions) does not manage to complete the training or the subsequent execution.

Strange behavior
Laser sensor broken

In the example seen in the images we see in the left graph a distribution of the states that have been given the most, being the value 0 (the line) the one that is repeated the most. The positive values represent right-hand curves and the negative values represent left-hand curves. As it has been trained in the Simple Circuit, which has more right-hand curves, it is logical that the output is like this.

The central and right graphs represent the accumulated reward for each episode and the steps taken for each episode. The green and orange graphs give very good results where, in a few times, they manage to complete the circuit and be constantly completing laps.

A more complex example, training in Nürburging with 2 levels of simplified perception and an average set of actions can be seen in the following graph

Laser sensor broken
Laser sensor broken

In this case it is not so easy to know the trend of the given circuit, being two levels of perception there are many more possible states. The trainings manage to complete the lap as well, but you don’t see such long episodes as you did in the case of a perception point. Even so the training sessions have gone well and have been completed but the values of the number of times, the size of the Q-table or the time to complete a lap are greater than in the previous case.

Next steps

Complete all the training, create a video summary and write the report of the final master’s work.