Week 6-7. Q learning - introducing stop and go and iteration speed
On this occasion the aim has been to improve on the results obtained in the previous entry. Two techniques have been used to squeeze a little more out of the algorithm:
- Stop and go: this technique pauses the simulator so that a decision can be made, resumes it for a short time so that the decision can act on the environment, and then pauses it again. It helps to regulate learning, since it indirectly controls the speed at which the algorithm makes its decisions.
- Iteration speed: a more direct way to regulate learning is to cap the iteration rate at a chosen maximum. With this technique the aim is to find the iteration rate that is just sufficient for the algorithm to learn to drive at the maximum speed it is given.

The goal of applying these two techniques is to obtain a Q table that works correctly at a low iteration rate, since to get an algorithm that drives twice as fast it would then be enough to multiply the values of the Q table by two, together with the maximum linear and angular velocities. I am currently testing the second option (the first one is also implemented and can be enabled via parameters) with two iteration rates; a small sketch of both techniques is included at the end of this entry:
- 8 iterations/second: at this rate the algorithm is not able to converge in a reasonable training time, and even if it did, training would take far too long to be feasible for this work.
- 10 iterations/second: the algorithm is currently training at this rate and shows good results, but more training time is needed before drawing conclusions.

Once this last option has been tested, the iteration rate will be increased gradually until the exact convergence point is found. After that, the stop and go option will be explored.
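As a reference, here is a minimal sketch of how the iteration-speed cap can be applied to the training loop. It assumes a hypothetical environment object with `reset()`/`step()` and two helper functions, `choose_action` and `update_q`, wrapping the usual epsilon-greedy selection and Bellman update; none of these names come from the actual code of the project.

```python
import time

def train_rate_limited(env, q_table, choose_action, update_q,
                       iterations_per_second=10, episodes=1000):
    """Q-learning loop capped at a fixed iteration rate (sketch).

    env, choose_action and update_q are hypothetical helpers used
    only for illustration.
    """
    period = 1.0 / iterations_per_second  # e.g. 0.1 s for 10 it/s

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            start = time.time()

            action = choose_action(q_table, state)
            next_state, reward, done = env.step(action)
            update_q(q_table, state, action, reward, next_state)
            state = next_state

            # Sleep away whatever is left of the period so the agent
            # never issues decisions faster than the configured rate.
            elapsed = time.time() - start
            if elapsed < period:
                time.sleep(period - elapsed)
    return q_table
```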
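And a corresponding sketch of the stop and go idea: the simulator stays paused while the agent decides, is resumed just long enough for the command to take effect, and is paused again. The `sim` object with `pause()`/`resume()`/`read_state()`/`send_command()` is purely illustrative; in a Gazebo-based setup, for example, pausing and resuming would typically map onto the physics pause/unpause services.

```python
import time

def stop_and_go_step(sim, q_table, choose_action, act_time=0.5):
    """One stop-and-go decision cycle (sketch with a hypothetical sim API)."""
    sim.pause()                  # freeze the world while the agent decides
    state = sim.read_state()
    action = choose_action(q_table, state)
    sim.send_command(action)     # queue the chosen velocity command
    sim.resume()                 # let the command act on the environment
    time.sleep(act_time)         # give the action time to take effect
    sim.pause()                  # freeze again for the next decision
    return state, action
```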