Week 25. Deep in into code
To Do
Solve the following questions to understand the brain:
- How does the robot work?
- How does sensory data get to you?
- Stop/reset condition.
- Distance traveled.
- How is the reward collected?
- How do the parameters change for each step?
- How do you restart Gazebo?
- What changes at each restart?
Progress
The examples consist of two parts: the brain and the main program.
The brain only manages each step and restarts and it is the main program that makes the calls to OpenAI Gym that controls the rest of the components.
- How does the robot work?
The robot, for each step, selects an action at random from 3 values: 1, 2 and 3. The values 2 and 3 combine forward (0.05 m/s) and turn (0.2 rad/s). This random selection is part of the DeepQN learning method. Initially these values are randomly selected the first 10.000 times to avoid correlation between each iteration and to avoid divergence of the problem. When the times marked as “exploration” are exceeded, the higher values of the Q table will be used and thus the learning will begin.
- How does sensory data get to you?
The example of the turtlebot obtains the sensory data from two sources: the camera and the laser. Currently, only the second sensor is used and the code is under revision to change it so that only the camera is used. The actuators only respond to data collected by the laser. The camera is only used to record the sequence.
- Stop/reset condition.
The stop condition depends on the distance the laser gets from the wall. If the distance is less than 0.21 it is considered collision and the ‘reset()` method is called which returns the robot to the start position.
- Distance traveled.
The distance travelled is related to the number of steps the robot is able to accumulate (with a maximum of 1000 before moving on to the next period).
- How is the reward collected?
The reward values are between 0 and 1 when a step is achieved. The reward is obtained by using, again, the laser. All the laser beams are divided into two sectors (left and right). The more similar the values of each half are to each other, the higher the reward (maximum 1).
- How do the parameters change for each step?
As said in the first point, the action is selected at random and for each observation the Q values are stored.
- How do you restart Gazebo?
Calling the ‘reset()` method restores the position of the robot when the “collision” condition is met (distance to the wall < 0.21 meters.
- What changes at each restart?
Nothing at first. It will change when the first 10,000 times pass and the stored information begins to explode.
Working
Adapting the code so that the camera controls the robot by adapting the files to configure the formula 1 and the circuit.
Learned
Studying the code and understanding how it behaves in different situations has been a very good experience since it has allowed me to better understand how the DQN algorithm works.