Personal webpage for a PhD student.
2017 - A Brief Survey of Deep Reinforcement Learning
Classical methods:
Deep Learning methods:
Current research & challenges:
2018 - Dexterous Manipulation with Reinforcement Learning: Efficient, General, and Low-Cost
2018 - Learning to Walk via Deep Reinforcement Learning
2018 - An Introduction to Deep Reinforcement Learning
2018 - Deep Reinforcement Learning for Robotic Manipulation: The State of the Art
2018 - QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
2017 - Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates
A one-armed bandit is a simple slot machine: you insert a coin, pull the lever, and receive an immediate reward. A multi-armed bandit is a slot machine with several levers instead of one, each lever yielding a different payoff. The reward distribution of each lever is different and unknown to the gambler.
The task is to identify which lever to pull so as to maximize the total reward over a given number of trials. This problem is equivalent to a single-state Markov decision process: each arm chosen is an action that yields an immediate reward (see the epsilon-greedy sketch after the link below).
Reinforcement Learning Guide: Solving the Multi-Armed Bandit Problem from Scratch in Python
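As a minimal illustration of this idea (a sketch in the spirit of that tutorial, not its exact code), an epsilon-greedy agent keeps a running value estimate per lever and mostly pulls the best one; the payout rates below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.75]   # hypothetical payout rate per lever (unknown to the agent)
n_arms = len(true_means)

estimates = np.zeros(n_arms)    # running estimate of each lever's value
counts = np.zeros(n_arms)       # how many times each lever was pulled
epsilon = 0.1                   # exploration rate

for t in range(1000):
    if rng.random() < epsilon:
        arm = rng.integers(n_arms)       # explore: pull a random lever
    else:
        arm = int(np.argmax(estimates))  # exploit: pull the best lever so far
    reward = float(rng.random() < true_means[arm])           # Bernoulli payout
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

print("estimated values:", estimates.round(2))
```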
The main characters of RL are the agent and the environment. The environment is the world that the agent lives in and interacts with. At every step of interaction, the agent sees a (possibly partial) observation of the state of the world, and then decides on an action to take. The environment changes when the agent acts on it, but may also change on its own.
The agent also perceives a reward signal from the environment, a number that tells it how good or bad the current world state is. The goal of the agent is to maximize its cumulative reward, called return. Reinforcement learning methods are ways that the agent can learn behaviors to achieve its goal.
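This interaction loop is only a few lines of code. A minimal sketch using OpenAI Gym's CartPole and the classic (pre-0.26) gym API, with a random policy standing in for the agent:

```python
import gym

env = gym.make("CartPole-v0")
obs = env.reset()                    # initial (possibly partial) observation
episode_return = 0.0

done = False
while not done:
    action = env.action_space.sample()           # placeholder for a learned policy
    obs, reward, done, info = env.step(action)   # environment reacts to the action
    episode_return += reward                     # accumulate the return

print("return:", episode_return)
env.close()
```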
A state s is a complete description of the state of the world. There is no information about the world which is hidden from the state. An observation o is a partial description of a state, which may omit information.
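A toy illustration of the difference (the field names are invented for the example): if the environment hides the velocities of a cart-pole system, the observation is a strict subset of the state:

```python
# Full physical state of a hypothetical cart-pole system
state = {
    "cart_position": 0.1,
    "cart_velocity": -0.5,
    "pole_angle": 0.02,
    "pole_angular_velocity": 0.3,
}

# A partially observable variant might expose only positions and angles,
# omitting the velocity information from the state.
observation = {k: state[k] for k in ("cart_position", "pole_angle")}
```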
Different environments allow different kinds of actions. The set of all valid actions in a given environment is often called the action space. Some environments, like Atari and Go, have discrete action spaces, where only a finite number of moves are available to the agent. Other environments, like where the agent controls a robot in a physical world, have continuous action spaces. In continuous spaces, actions are real-valued vectors.
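Gym represents both kinds of action space directly; a small sketch:

```python
from gym import spaces

# Discrete: Atari/Go-style, a finite set of moves
discrete = spaces.Discrete(4)                             # actions are integers 0..3

# Continuous: robot-control-style, real-valued vectors
continuous = spaces.Box(low=-1.0, high=1.0, shape=(6,))   # e.g. 6 joint torques

print(discrete.sample(), continuous.sample())
```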
A policy is the rule an agent uses to decide which actions to take. It can be deterministic, in which case it is usually denoted by mu, or stochastic, in which case it is usually denoted by pi.
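A minimal sketch of both flavours, with linear parameterizations invented purely for illustration: mu always maps an observation to the same action, while pi samples from a softmax distribution over discrete actions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mu(obs, weights):
    """Deterministic policy: the same observation always gives the same action."""
    return weights @ obs

def pi(obs, weights):
    """Stochastic policy: samples an action from a distribution over actions."""
    logits = weights @ obs
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over discrete actions
    return rng.choice(len(probs), p=probs)

obs = np.array([0.1, -0.2, 0.0, 0.3])
w_continuous = rng.normal(size=(2, 4))           # 2-dim continuous action
w_discrete = rng.normal(size=(3, 4))             # 3 discrete actions
print(mu(obs, w_continuous), pi(obs, w_discrete))
```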
A trajectory T is a sequence of states and actions in the world, T = (s0, a0, s1, a1, ...).
The reward function R is critically important in reinforcement learning. It depends on the current state of the world, the action just taken, and the next state of the world, rt = R(st, at, st+1), although frequently this is simplified to a dependence on just the current state, rt = R(st), or state-action pair, rt = R(st, at).
The goal of the agent is to maximize some notion of cumulative reward over a trajectory.
Whatever the choice of return measure, and whatever the choice of policy, the goal in RL is to select a policy which maximizes expected return when the agent acts according to it.
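Both standard return measures, the finite-horizon undiscounted return and the infinite-horizon discounted return (rewards rt summed with weights gamma^t), come down to one backward accumulation; a sketch:

```python
def discounted_return(rewards, gamma=0.99):
    """Return of a trajectory: sum over t of gamma^t * r_t.
    gamma = 1.0 gives the finite-horizon undiscounted return."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))  # 1 + 0.9 + 0.81 = 2.71
```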
It’s often useful to know the value of a state, or state-action pair. By value, we mean the expected return if you start in that state or state-action pair, and then act according to a particular policy forever after. Value functions are used, one way or another, in almost every RL algorithm.
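As a sketch of what "value" means operationally: V^pi(s) can be estimated by averaging the returns of many rollouts that start in s and then follow pi. The helper below assumes a classic gym-style step signature and an invented state_setter hook for resetting the environment into the chosen state:

```python
import numpy as np

def mc_value_estimate(env, policy, state_setter, n_rollouts=100, gamma=0.99):
    """Hypothetical helper: estimate V^pi(s) by Monte Carlo.
    `state_setter` is an assumed hook that resets env into the chosen
    state s and returns the corresponding observation."""
    returns = []
    for _ in range(n_rollouts):
        obs = state_setter(env)
        g, discount, done = 0.0, 1.0, False
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            g += discount * reward               # accumulate discounted reward
            discount *= gamma
        returns.append(g)
    return float(np.mean(returns))               # average return = value estimate
```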
Policy Search: Methods and Applications
Reinforcement Learning Explained
Deep Reinforcement Learning Nanodegree
OpenAI Spinning Up - RL Introduction
Course: CPSC522/Markov Decision Process (UBC)
Course: CS20 Tensorflow for Deep Learning Research (Stanford)
Intro to TensorFlow for Deep Learning
TensorFlow: From Basics to Mastery
Deep Reinforcement Learning: Pong from Pixels
Demystifying Deep Reinforcement Learning
Simple Reinforcement Learning with Tensorflow
Reinforcement Learning: Q-Learning and Exploration
A Beginner’s Guide to Deep Reinforcement Learning
Model-based reinforcement learning
Deep Reinforcement Learning: Playing CartPole through Asynchronous Advantage Actor Critic (A3C)
Actor-Critic Methods: A3C and A2C
Soft Actor Critic: Deep Reinforcement Learning with Real-World Robots
Deep Reinforcement Learning for Robotics
Google X’s Deep Reinforcement Learning in Robotics using Vision
Controlling a 2D Robotic Arm with Deep Reinforcement Learning
Reinforcement Q-Learning from Scratch in Python with OpenAI Gym
Reinforcement Learning: Introduction to Monte Carlo Learning using the OpenAI Gym Toolkit
Reinforcement Learning Demystified: Markov Decision Processes