The strange behaviour seen in the previous session, where the vehicle completed the circuit but did not stay over the line, occurred because the analysis line (the image row from which the centre of the line is extracted) was too high. An analysis line that is too high yields rewards in the best range, yet produces the behaviour seen in the previous session. An analysis line that is too low makes the behaviour too abrupt, which causes the car to leave the track too often.
Therefore, we look for an intermediate line that lets the car complete the circuit while keeping a balanced distance to the line. The selected value for the horizontal line is row 300 (the image is 480 pixels high, with row 0 at the top and row 479 at the bottom).
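As a minimal sketch of how the centre could be extracted at row 300, the snippet below takes the mean column of the line pixels in that row of a binary mask. The image width of 640, the name `line_centre` and the mask representation (a list of rows of booleans) are assumptions made for illustration, not the project's actual code; a real pipeline would probably work on image arrays instead.

```python
IMAGE_HEIGHT = 480   # from the text
IMAGE_WIDTH = 640    # assumed width, not stated in the text
ANALYSIS_ROW = 300   # row selected in the text


def line_centre(mask, row=ANALYSIS_ROW):
    """Return the mean column of the line pixels in `row`, or None if there are none."""
    columns = [x for x, is_line in enumerate(mask[row]) if is_line]
    if not columns:
        return None
    return sum(columns) / len(columns)


# Synthetic mask: a vertical line covering columns 310 to 330 in every row.
mask = [[310 <= x <= 330 for x in range(IMAGE_WIDTH)] for _ in range(IMAGE_HEIGHT)]

centre = line_centre(mask)        # 320.0
error = centre - IMAGE_WIDTH / 2  # 0.0: the line is centred in the image
```

The signed error computed on the last line is the quantity the reward would then be based on.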
In addition, several reward regions have been defined according to the error with respect to the centre of the image. The larger the error, the lower the reward; when the centre of the line falls within the central range of the image, the reward is highest. The following figure shows the different ranges.
With the analysis line fixed at row 300 and the reward regions defined, we have a base configuration on which to run the different experiments.
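The reward regions can be pictured as a lookup from the absolute error to a reward value. The sketch below only illustrates that idea: the region boundaries, the reward values and the penalty for leaving every region are hypothetical placeholders, since the real ranges are the ones shown in the figure.

```python
# Hypothetical regions: (maximum absolute error in pixels, reward).
# They must be ordered from the narrowest to the widest region.
REWARD_REGIONS = [(20, 10.0), (60, 2.0), (120, 1.0)]
OUT_OF_RANGE_REWARD = -100.0  # hypothetical penalty outside every region


def reward_from_error(error):
    """Return the reward of the first region that contains the absolute error."""
    abs_error = abs(error)
    for max_error, reward in REWARD_REGIONS:
        if abs_error <= max_error:
            return reward
    return OUT_OF_RANGE_REWARD
```

With these placeholder values, an error of 0 pixels falls in the central region and receives the highest reward, while an error of 200 pixels falls outside every region and receives the penalty.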
Besides the single horizontal line used to obtain the centre, configurations with 2 and 3 levels (lines) of perception will also be tested, to see whether more perception lines provide a better representation of the environment and improve training.
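One way to picture the 1, 2 and 3 level configurations is as a state made of one centre per perception row. In the sketch below only row 300 comes from the text; the additional rows (340 and 380), the image width of 640 and the function names are hypothetical choices made for illustration.

```python
# Row 300 comes from the text; the extra rows are hypothetical placeholders.
PERCEPTION_ROWS = {
    1: [300],
    2: [300, 340],
    3: [300, 340, 380],
}


def row_centre(mask, row):
    """Return the mean column of the line pixels in `row`, or None if there are none."""
    columns = [x for x, is_line in enumerate(mask[row]) if is_line]
    if not columns:
        return None
    return sum(columns) / len(columns)


def perceive(mask, levels):
    """Build the state as one line centre per perception row."""
    return [row_centre(mask, row) for row in PERCEPTION_ROWS[levels]]


# Synthetic mask (assumed 640x480): a line that drifts to the right towards the bottom.
mask = [[300 + y // 20 <= x <= 320 + y // 20 for x in range(640)] for y in range(480)]

state = perceive(mask, 3)  # [325.0, 327.0, 329.0]
```

With more rows the state shows how the line drifts across the image rather than only where it is at one height, which is the extra information the additional perception levels are meant to provide.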
Alongside this, different action configurations will also be tested: a “simple” configuration with a set of 3 actions, a “medium” one with a set of 5 actions and a “difficult” one with a set of 7 actions. Increasing the number of actions gives the car more possible ways of turning.
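The three action configurations can be sketched as discrete sets of velocity commands. Only the set sizes (3, 5 and 7) come from the text; the pairing of a linear and an angular velocity, and every numeric value below, are assumptions used purely for illustration.

```python
# Each action is a hypothetical (linear velocity, angular velocity) pair.
ACTION_SETS = {
    "simple": [(3.0, 0.0), (2.0, 1.0), (2.0, -1.0)],
    "medium": [(3.0, 0.0), (2.0, 1.0), (2.0, -1.0), (1.0, 1.5), (1.0, -1.5)],
    "difficult": [
        (3.0, 0.0),
        (2.0, 1.0), (2.0, -1.0),
        (1.5, 1.5), (1.5, -1.5),
        (1.0, 2.0), (1.0, -2.0),
    ],
}


def select_action(config, index):
    """Map the discrete action index chosen by the agent to a velocity command."""
    return ACTION_SETS[config][index]
```

A larger set gives the agent finer control over turning, at the cost of a larger action space to explore during training.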
In the following chapters we will see the results of the training runs that combine these different sets of parameters.