Week 14. Organization and first algorithm using Deep Learning: DQN.


To Do

  • Continuing with the replication of Vanessa’s master’s thesis.
  • Organize the information of the classical methods.
  • Try to run some example of OpenAI gym.

Progress

1. Continuing with the replication of Vanessa’s master’s thesis.

Below is a summary of the steps taken, from the first iteration to the present day, to replicate Vanessa Fernandez’s master’s thesis.

For the replication of Vanessa’s master’s thesis, the following packages have been installed:

sudo apt install jderobot
sudo apt install jderobot-gazebo-assets

The repository has been cloned:

https://github.com/RoboticsLabURJC/2017-tfm-vanessa-fernandez.git

We navigate to the directory where the main program is located:

cd "2017-tfm-vanessa-fernandez/Follow Line/dl-driver"

Using two terminals, in the first one we launch the ROS command that starts the Gazebo world:

roslaunch /opt/jderobot/share/jderobot/launch/f1.launch 

and in another terminal the algorithm is executed:

python driver.py driver.yml 

At this point I have encountered the following problems:

  • When installing requirements.txt, the PyQt5 library is not found. The installation is being done from inside a Python virtual environment, and it seems this library is only accessible when installed directly on the host machine. This is a point that should be improved, so that the environment of the user’s machine is not affected. Using Python 3 seems to solve the problem.

  • The code is in Python 2, which will no longer be maintained after 2020. It would be a good idea to translate the code to Python 3.5+, so that it runs on a currently supported version of the language.

  • Pin the versions of the libraries used with pip. Currently the file requirements.txt does not fix any library version. At the time the master’s thesis was written, the stable TensorFlow version was 1.14; the current release is 2.0, so some changes would have to be made to the original code to make it work again. It might be a good idea to pin it to the last version the code is known to work with: 1.14 (or, alternatively, add a compatibility guard like the sketch after this list).
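
Until the versions are pinned, a small compatibility guard at the top of the main script could let code written for TensorFlow 1.14 also run under 2.0. This is just a sketch of the idea, not code that exists in the repository:

import tensorflow as tf

# Sketch: if TensorFlow 2.x is installed, fall back to the 1.x-compatible API
# so that code written against TF 1.14 keeps working.
if tf.__version__.startswith("2."):
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()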

I am currently in contact with Vanessa to solve these small bugs and get some functional models to run the network.

2. Organize the information of the classical methods.

So far the following classical methods have been studied:

  • Q-Learning.
  • SARSA.
  • Dynamic Programming (Policy and Value iteration).
  • Monte Carlo.

The division according to how they approach the problem can be seen in the following table.

Method class        | Approach             | Technique                | Algorithms
--------------------|----------------------|--------------------------|----------------------------------
Model-free methods  | Value-based methods  | Temporal Difference (TD) | Q-Learning, SARSA
Model-free methods  | Policy-based methods | -                        | Policy Iteration, Value Iteration
Model-based methods | -                    | -                        | -
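
The value-based methods in the table share the same kind of tabular update; a minimal sketch of the Q-Learning rule (my own illustrative code, not taken from any of the exercises) would be:

import numpy as np

# Illustrative tabular Q-Learning update:
# Q[s, a] <- Q[s, a] + alpha * (r + gamma * max_a' Q[s', a'] - Q[s, a])
def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q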

Soon I will include more information on each of the algorithms.

3. Run an example of the DQN algorithm.

The exercise I have chosen to train and run a DQN algorithm is the Pong example from Alberto Martin’s repository. The goal is, once this one works, to replicate the same code with the Space Invaders game (I really like that game :-) ).
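
Before touching the DQN code itself, a quick way to check that the OpenAI Gym Atari environments are installed correctly is to run a few random steps in Pong. This is just a smoke test I use, not part of Alberto’s repository:

import gym

# Smoke test: a random agent playing a few frames of Atari Pong.
env = gym.make("PongNoFrameskip-v4")
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()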

Although the original code was written for TensorFlow 2.0, at that time this version was still in alpha. Installing the requirements.txt libraries directly does not run the algorithm correctly. After creating a virtual environment and installing the libraries, the training run returns the following error:

  writer = tf.contrib.summary.create_file_writer(logdir='runs', flush_millis=10000, filename_suffix="-dqn-pong")
AttributeError: module 'tensorflow' has no attribute 'contrib'

As I said before, since the version this exercise was configured with was still in alpha, the call that raises the error is no longer part of the stable API. According to this StackOverflow thread, the recommendation is to run the upgrade script provided in the official documentation.

This process has been done, creating a parallel folder with the same files but with the translated calls.
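
For reference, the failing line would look roughly like this after translation to the stable TensorFlow 2.0 API (the upgrade script referred to is, as far as I know, tf_upgrade_v2; the snippet below is my own sketch, not the exact code generated in the parallel folder):

import tensorflow as tf

# In TF 2.0 the contrib summary writer moved into the core tf.summary module.
writer = tf.summary.create_file_writer(
    logdir='runs', flush_millis=10000, filename_suffix="-dqn-pong")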

Running the file dqn_pong.py again, we get the following error while the algorithm is training (after 11 games):

. . . 
435: done 8 games, mean reward -20.625, eps 0.93, speed 762.74 f/s
8253: done 9 games, mean reward -20.667, eps 0.92, speed 732.73 f/s
9106: done 10 games, mean reward -20.700, eps 0.91, speed 708.14 f/s
9967: done 11 games, mean reward -20.727, eps 0.90, speed 732.39 f/s
Traceback (most recent call last):
  File "dqn_pong.py", line 172, in <module>
    tgt_net.set_weights(net.get_weights())
  File "/path/to/virtualenvironment/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1327, in set_weights
    str(weights)[:50] + '...')
ValueError: You called `set_weights(weights)` on layer "dqn_1" with a  weight list of length 10, but the layer was expecting 0 weights. Provided weights: [array([[[[ 4.71510142e-02,  3.64676267e-02, -4.32...
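
The message suggests that the target network has not been built yet when set_weights is called: Keras models create their weights lazily, on the first forward pass, so a model that has never seen an input reports 0 weights. A possible fix, sketched below with a stand-in model and an assumed Atari input shape (still to be confirmed against the actual code), would be to run a dummy batch through both networks before synchronising them:

import numpy as np
import tensorflow as tf

# Sketch only (not the repository's DQN class): Keras models build their
# weights lazily, so call them once on a dummy batch before copying weights.
def make_net():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(6),  # 6 actions in Pong
    ])

net, tgt_net = make_net(), make_net()
dummy_state = np.zeros((1, 84, 84, 4), dtype=np.float32)  # assumed Atari input shape
net(dummy_state)        # builds the online network's weights
tgt_net(dummy_state)    # builds the target network's weights
tgt_net.set_weights(net.get_weights())  # now both models have weights to copy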

Working

I am currently working in parallel on the three previous points:

  • Organizing the documentation of what has been learned about the classical algorithms, as well as going deeper into how they work.
  • Fixing the remaining errors and improving the code structure of Vanessa’s master’s thesis code.
  • Solving the bugs in the execution of the Pong example provided in the repository (folder puppis).

Learned