Added random initial positions and random perturbations to the cartpole problem (month 23)
This month we focused on the following goals:
- Make cartpole learning stable enough by using DQN, adaptive learning and replay buffers
- Investigate alternative reward policies and ways to mitigate catastrophic forgetting
- Add monitoring to cartpole to measure how robust our solution is (TODO: include this monitoring as a common library in RL-Studio)
- You can see more details about the project status in the cartpole project post
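
To illustrate the replay buffer idea used with DQN, here is a minimal Python sketch. It is not the RL-Studio implementation: the class name `ReplayBuffer`, its methods and the `(state, action, reward, next_state, done)` transition layout are assumptions chosen for this example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size store of past transitions (illustration only)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition as a tuple.
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, which breaks the correlation
        # between consecutive steps of the same episode.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling old and recent transitions together is one common way to make DQN training more stable, and it can also reduce forgetting of earlier experience, since older transitions keep appearing in training batches until they are evicted.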