The tasks proposed for this week are
- Do different exercises and study Deep Reinforcement Learning algorithms.
- Replicating Vanessa Fernandez’s master’s dissertation. Trying to pack everything in a single Docker container.
I have tried to replicate Vanessa Martinez’s end-of-master work inside a Docker container to try to isolate all the components and have them located in a few files.
For this process it is necessary to uninstall
docker.io (it is the one that is installed by default in a normal docker installation) and install
docker-engine where plugins can be installed to enrich the program. Important: Is required to have
docker-engine >= 19.02. It’s recommended to uninstall previous versions of Docker by following these instructions.
Among the necessary plugins is
nvidia-docker, which cloning the official image has the necessary libraries to obtain the resources of the graphics card.
nvidia-docker image can be downloaded in this repository.
The other point of the week was to study some of the classic algorithms to strengthen the knowledge acquired in previous weeks and see the applications in specific algorithms.
The two methods studied this week are two classic time-difference algorithms:
The main difference between the two methods is that they differ in policy monitoring. The first one does not follow a policy and the main objective is to seek the value that maximizes the Q function regardless of how it has achieved it. The second of them follows a policy and maximising its value is linked to the way it learns.
I will add more information in the coming days.
- Reinforcement learning: Temporal-Difference, SARSA, Q-Learning & Expected SARSA in python.
- SARSA versus Q-learning – on-policy or off?.
- TD in Reinforcement Learning, the Easy Way.
- On and Off-Policy Relational Reinforcement Learning.
Given the size that the Docker image can have after installing all the dependencies as well as the fact that the exercise presented by Vanessa Fernandez requires a graphic interface, we will try a mixed configuration where everything that depends on the graphic card will be grouped inside the Docker container.
The Ubuntu I work with is version 18.04 with the KDE desktop interface. To prevent collisions between libraries in the operating system and OpenCV I will try to compile and install the library with the Qt flag instead of the default GTK flag.
In the process of learning and familiarisation with the reinforcement learning algorithms, the following will continue to be studied within the classical algorithms:
- Cross entropy
- Dynamic programming.
- Monte Carlo.
This week I have focused mainly on getting to know more deeply the differences between various methods of LR. In particular, everything related to Q-Learning and SARSA. Their differences between politics, advantages and disadvantages, usefulness, etc. Knowing in a deeper way classical RL algorithms like the ones mentioned above, helps to understand the way in which DRL algorithms are taken further. I am currently studying different classical algorithms such as Dynamic Programming or Monte Carlo.