- Continuing with the replication of Vanessa’s master’s thesis using Python 3.
- Executing and understanding the Pong example from
1. Continuing with the replication of Vanessa’s master’s thesis using Python 3.
During the process of replicating Vanessa Fernandez’s master’s dissertation, the following points are highlighted:
- Migrating all the code to Python 3 means moving from ROS1 to ROS2. That change is too large to make at this stage of the work, so it is left as a future task.
- In this replication attempt, access to the graphics card was blocked and Gazebo reported an error. I had to reinstall the drivers. Apparently everything is back to normal.
2. Executing and understanding the Pong example from
Reading the DQN chapter of Maxim Lapan’s Deep Reinforcement Learning Hands-On.
wrappers.py contains environment wrappers that handle different situations, such as games that require the player to press START after each reset.
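To make the wrapper idea concrete, here is a minimal sketch of one that presses a start button after every reset. It is not the code from wrappers.py: `ToyEnv`, `PressStartOnReset`, and the action number `1` are hypothetical names chosen only for illustration, and the toy environment stands in for a real game emulator.

```python
class ToyEnv:
    """Hypothetical stand-in for a game that stays idle until START is pressed."""
    START = 1  # assumed action id for the start button

    def __init__(self):
        self.started = False

    def reset(self):
        self.started = False
        return 0  # observation 0 means "idle screen"

    def step(self, action):
        if action == self.START:
            self.started = True
        obs = 1 if self.started else 0  # observation 1 means "game running"
        return obs, 0.0, False, {}


class PressStartOnReset:
    """Wraps an environment and presses the start action right after reset()."""

    def __init__(self, env, start_action=1):
        self.env = env
        self.start_action = start_action

    def reset(self):
        self.env.reset()
        obs, _, done, _ = self.env.step(self.start_action)
        if done:
            # The episode ended while starting; reset again and return that state.
            obs = self.env.reset()
        return obs

    def step(self, action):
        # All other actions are forwarded unchanged.
        return self.env.step(action)
```

With the wrapper in place, the agent never sees the idle screen: `PressStartOnReset(ToyEnv()).reset()` already returns the "game running" observation.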
dqn_model.py contains the network definition. The model has 3 convolutional layers followed by 2 fully-connected layers, with ReLU activations applied between layers.
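As a rough check of how the convolutional part feeds the fully-connected part, the sketch below computes the size of the flattened feature vector. The layer hyperparameters (84x84 input, kernels 8/4/3, strides 4/2/1, 64 output channels) are assumptions taken from the classic DQN architecture, not values stated in these notes, so compare them against dqn_model.py before relying on the result.

```python
def conv_out(size, kernel, stride):
    """Output width/height of a convolution with no padding."""
    return (size - kernel) // stride + 1


def flattened_size(input_size=84, out_channels=64):
    # Assumed layer settings: (kernel, stride) for each of the 3 conv layers.
    layers = [(8, 4), (4, 2), (3, 1)]
    size = input_size
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
    # The last conv layer's feature maps are flattened before the first
    # fully-connected layer.
    return out_channels * size * size


print(flattened_size())  # 3136 under the assumed settings
```

Under these assumptions the spatial size shrinks 84 -> 20 -> 9 -> 7, so the first fully-connected layer would receive 64 * 7 * 7 = 3136 inputs.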
Important note from the book:
If transition in the batch is from the last step in the episode, then our value of the action doesn’t have a discounted reward of the next state, as there is no next state to gather reward from. This may look minor, but this is very important in practice: without this, training will not converge.
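The note above can be sketched in plain Python as follows. This is only an illustration of the rule, not the book's code: `td_targets` and its argument names are hypothetical, and `next_q_max` is assumed to already hold the maximum Q-value of each next state.

```python
def td_targets(rewards, next_q_max, dones, gamma=0.99):
    """Compute one target value per transition in the batch.

    rewards    -- reward received in each transition
    next_q_max -- max Q-value of the next state (ignored when done is True)
    dones      -- True when the transition is the last step of its episode
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_max, dones):
        if done:
            # Terminal transition: there is no next state, so the target
            # is just the immediate reward.
            targets.append(reward)
        else:
            targets.append(reward + gamma * q_next)
    return targets
```

For example, `td_targets([1.0, -1.0], [2.0, 5.0], [False, True], gamma=0.5)` gives `[2.0, -1.0]`: the second transition ends the episode, so its next-state value of 5.0 is ignored.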
I am currently reading about the DQN algorithm in the book mentioned above. The goal is to fix the bug in the training code for the game of Pong.
At the same time I’m in contact with Vanessa to create a deployment guide for her algorithm.
While developing the DQN algorithm, I realized that I needed to stop trying to fix the training problem directly and instead follow the same path we take when studying classical algorithms: read and study more. After that short detour, the code was much easier to understand and made more sense :-)