We present implementations of REINFORCE with Baseline and Monte Carlo Tree Search on three MDPs: Cartpole, CS687-Gridworld, and Mountain Car. For extra credit, we implemented an as-yet-unexplored MDP, Mountain Car, and we present a performance analysis of several algorithms on multi-armed bandits: Epsilon-Greedy, Epsilon-Decreasing Greedy, Upper Confidence Bound (UCB), and Thompson sampling.
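As a rough illustration of two of the bandit strategies named above, the following is a minimal sketch of Epsilon-Greedy and UCB on a Bernoulli multi-armed bandit. It is not taken from this repository: the function names (`run_bandit`, `epsilon_greedy`, `ucb1`), the arm means, and the parameters `epsilon` and `c` are all assumptions chosen for the example.

```python
import math
import random


def run_bandit(select_arm, true_means, steps=2000, seed=0):
    """Play a Bernoulli bandit; return (pull counts per arm, total reward).

    Hypothetical helper for illustration, not part of the repository.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for t in range(1, steps + 1):
        arm = select_arm(rng, t, counts, values)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return counts, total


def epsilon_greedy(epsilon=0.1):
    """With probability epsilon explore uniformly, otherwise exploit."""
    def select(rng, t, counts, values):
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)
    return select


def ucb1(c=2.0):
    """Pick the arm with the highest mean plus exploration bonus."""
    def select(rng, t, counts, values):
        # Pull every arm once before the bonus term is defined.
        for arm, n in enumerate(counts):
            if n == 0:
                return arm
        return max(
            range(len(values)),
            key=lambda a: values[a] + math.sqrt(c * math.log(t) / counts[a]),
        )
    return select


if __name__ == "__main__":
    means = [0.2, 0.8]  # assumed arm success probabilities
    for name, policy in [("epsilon-greedy", epsilon_greedy()), ("UCB", ucb1())]:
        counts, total = run_bandit(policy, means)
        print(name, counts, total)
```

Both strategies share the same loop and differ only in how the next arm is selected, which makes it straightforward to compare them on pull counts or cumulative reward under a fixed seed.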
jayxsinha/RL-Project