
Training Ackermann DQN


Training

For the complete parking manoeuvre, several actions have been added that allow accelerating in either direction while turning at the same time. With this new implementation the first complete parking manoeuvres have appeared, but unfortunately they have been isolated cases and it has not been possible to reproduce them.

New possible starting poses:

# Random perturbations applied to the starting pose: positional offset and rotation
self.random_offset = np.random.choice([-2, -1.75, -1.5, -1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1, 1.5, 1.75, 2])
self.random_rotation = np.random.choice([-0.05, -0.01, 0, 0.01, 0.05])
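
As a rough illustration, this is how those perturbations could be applied to a nominal start pose. The base pose, the axis the offset is applied along and the helper name are assumptions for the sketch, not the exercise's actual code:

import numpy as np

def sample_start_pose(base_x=0.0, base_y=0.0, base_yaw=0.0):
    # Assumption: the offset shifts the car along the parking lane (x axis)
    # and the rotation perturbs its heading; the real environment may differ.
    offset = np.random.choice([-2, -1.75, -1.5, -1, -0.75, -0.5, -0.25, 0,
                               0.25, 0.5, 0.75, 1, 1.5, 1.75, 2])
    rotation = np.random.choice([-0.05, -0.01, 0, 0.01, 0.05])
    return base_x + offset, base_y, base_yaw + rotation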

New action space:

# Discrete actions: the last four combine acceleration and steering so the car can turn while moving
self.actions = [(0.0, 0.7, 0.0), (0.0, 0.0, -1.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0),
    (0.5, 0.0, 1.0), (0.5, 0.0, -1.0), (-0.5, 0.0, 1.0), (-0.5, 0.0, -1.0)]
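
For context, here is a minimal sketch of how a DQN agent typically picks one of these tuples at each step with an epsilon-greedy policy. The function and its q_values argument are placeholders, not the actual agent:

import numpy as np

ACTIONS = [(0.0, 0.7, 0.0), (0.0, 0.0, -1.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
           (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (0.5, 0.0, -1.0),
           (-0.5, 0.0, 1.0), (-0.5, 0.0, -1.0)]

def select_action(q_values, epsilon):
    # Explore with probability epsilon, otherwise exploit the highest Q-value
    if np.random.random() < epsilon:
        idx = np.random.randint(len(ACTIONS))
    else:
        idx = int(np.argmax(q_values))
    return ACTIONS[idx]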

Time issue

The computational load of this exercise with this chassis is much higher: a training session that used to last 6 hours now takes 10-12 hours. If I lengthen the training or make the neural network more complex, it can reach about 16 hours. I have not completed any training with the more complex network, since I have seen no clear signs of improvement and it takes too long…

Rewards problem

The biggest challenge now is to design rewards that avoid stagnation in local minima. One way to avoid this would probably be to lengthen the training with a longer exploration phase, but given the training times above it is not feasible…
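
One standard way to fight this kind of stagnation is potential-based reward shaping, which rewards progress toward the goal without changing the optimal policy. A minimal sketch, assuming a goal-distance potential; the names, weights and gamma here are illustrative, not the exercise's actual reward:

import numpy as np

def shaped_reward(prev_pos, pos, goal, base_reward=-0.01, gamma=0.99):
    # Potential: negative Euclidean distance to the parking goal
    phi_prev = -np.linalg.norm(np.asarray(prev_pos) - np.asarray(goal))
    phi = -np.linalg.norm(np.asarray(pos) - np.asarray(goal))
    # r' = r + gamma * phi(s') - phi(s): moving toward the goal raises the reward
    return base_reward + gamma * phi - phi_prev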
