Gazebo F1 Follow Line DDPG (months 33 and 34)


Finished CartPole!!

Next goal

Now that we have achieved a good agent with discrete inputs and discrete actions, let's try continuous actions. For that purpose we will use DDPG.

This is another actor-critic algorithm, closely related to Q-learning. Deep Deterministic Policy Gradient (DDPG) concurrently learns a Q-function and a policy: it uses off-policy data and the Bellman equation to learn the Q-function, and it uses the Q-function to learn the policy.
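As a purely illustrative reference for how those two pieces interact, here is a minimal PyTorch sketch of the two DDPG updates. The network sizes, hyperparameters and the `ddpg_update` helper are assumptions for the sketch, not the project's actual code:

```python
import copy

import torch
import torch.nn as nn

obs_dim, act_dim, act_limit = 8, 2, 1.0   # illustrative dimensions and action bound
gamma, tau = 0.99, 0.005                  # discount factor and Polyak coefficient

# Deterministic policy (actor) and Q-function (critic), plus slowly-updated target copies.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(obs, act, rew, next_obs, done):
    """One gradient step on a batch sampled from the replay buffer (off-policy data)."""
    # 1) Critic: regress Q(s, a) onto the Bellman target r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        next_act = act_limit * actor_targ(next_obs)
        target = rew + gamma * (1 - done) * critic_targ(
            torch.cat([next_obs, next_act], dim=-1)).squeeze(-1)
    q = critic(torch.cat([obs, act], dim=-1)).squeeze(-1)
    critic_loss = ((q - target) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # 2) Actor: the learned Q-function teaches the policy -- maximise Q(s, mu(s)).
    actor_loss = -critic(torch.cat([obs, act_limit * actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # 3) Polyak-average the target networks towards the learned ones.
    with torch.no_grad():
        for net, targ in ((actor, actor_targ), (critic, critic_targ)):
            for p, p_targ in zip(net.parameters(), targ.parameters()):
                p_targ.mul_(1 - tau).add_(tau * p)
```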

Progress

  • DDPG learning to follow the line
  • DDPG learning to follow the line as fast as possible after adding the velocity v to the reward function (a sketch of this kind of shaping is shown below)

For more details, see the f1 follow line ddpg blog post.
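"Adding v to the reward function" means reward shaping along these lines; the exact formula and weights used in the project may differ, so the following is only a hypothetical sketch:

```python
def reward(center_error, v, w_center=1.0, w_speed=0.1):
    """Hypothetical shaped reward: stay close to the line centre and go fast.

    center_error: normalised offset of the detected line from the image centre, in [-1, 1].
    v:            commanded linear velocity of the car.
    The weights are illustrative; tuning them trades tracking accuracy against speed.
    """
    return w_center * (1.0 - abs(center_error)) + w_speed * v
```

Without the speed term the agent tends to settle for a slow but safe policy; the extra term rewards going as fast as possible while keeping the line centred.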

Next steps:

  • study why the sigmoid activation function was not working, so that tanh was needed instead (see the actor sketch after this list)
  • compare DDPG, DQN and Q-learning, both in training and in inference
  • output training rewards and epsilon over time to enable plotting the training of the 3 algorithms together (see the logging sketch after this list)
  • further investigate reward functions in the literature (e.g. misdesign of the reward function)
  • explore L1 and L2 regularization if overfitting appears
  • use batch normalization: applying batch normalization to the output of the layer before the sigmoid activation function can help stabilize and control the values, which can prevent the inputs to the sigmoid function from becoming too large
  • explore training and retraining with slightly different policies, to fine-tune pretrained agents with different rewards
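To make the tanh, regularization and batch-normalization points concrete, here is a hypothetical actor head; the class name, layer sizes and weight-decay value are assumptions for the sketch, not the project's code:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Hypothetical actor for the follow-line task; dimensions are illustrative."""

    def __init__(self, obs_dim=8, act_dim=2, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, act_dim)
        # Batch normalization on the pre-activation output of the last layer keeps
        # those values in a controlled range, so they are less likely to saturate
        # the output activation.
        self.bn = nn.BatchNorm1d(act_dim)

    def forward(self, obs):
        # tanh is zero-centred and maps to [-1, 1]; sigmoid maps to [0, 1] and
        # saturates easily for large inputs, one candidate explanation for the
        # problems observed with it.
        return torch.tanh(self.bn(self.out(self.body(obs))))

actor = Actor()
# L2 regularization can be added simply through the optimizer's weight decay.
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4, weight_decay=1e-4)
```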
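And a small sketch of the logging needed to plot the three trainings together; the CSV layout, file names and function names are hypothetical:

```python
import csv
import matplotlib.pyplot as plt

def log_training(path, history):
    """Write per-episode reward and epsilon to a CSV file.

    history is a list of (episode, reward, epsilon) tuples collected during
    training of Q-learning, DQN or DDPG.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["episode", "reward", "epsilon"])
        writer.writerows(history)

def plot_runs(paths_and_labels):
    """Overlay the reward curves of several runs on one figure."""
    for path, label in paths_and_labels:
        with open(path) as f:
            rows = list(csv.DictReader(f))
        plt.plot([int(r["episode"]) for r in rows],
                 [float(r["reward"]) for r in rows], label=label)
    plt.xlabel("episode")
    plt.ylabel("reward")
    plt.legend()
    plt.show()

# Example: plot_runs([("qlearning.csv", "Q-learning"), ("dqn.csv", "DQN"), ("ddpg.csv", "DDPG")])
```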