|
76 | 76 |
|
77 | 77 | [**"Grandmaster Level in StarCraft II using Multi-agent Reinforcement Learning"**](#grandmaster-level-in-starcraft-ii-using-multi-agent-reinforcement-learning-vinyals-et-al) by Vinyals et al. `paper` `summary` *(AlphaStar)* |
78 | 78 |
|
79 | | - [overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=318) by Oriol Vinyals `video` |
| 79 | + [overview](https://slideslive.com/38922724/grandmaster-level-in-starcraft-ii-using-multiagent-reinforcement-learning) by Oriol Vinyals `video` |
80 | 80 | [overview](https://youtu.be/3UdH3lPF7nE) by Oriol Vinyals `video` |
81 | 81 | [overview](https://slideslive.com/38916905/alphastar-mastering-the-game-of-starcraft-ii) by David Silver `video` |
82 | 82 | [overview](https://youtu.be/mzjGNo9Tz4g?t=10m53s) by David Silver `video` |
|
100 | 100 |
|
101 | 101 | ["Dota 2 with Large Scale Deep Reinforcement Learning"](https://cdn.openai.com/dota-2.pdf) by Berner et al. `paper` *(OpenAI Five)* |
102 | 102 |
|
103 | | - [OpenAI File overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=2175) by Jie Tang and Filip Wolski `video` |
| 103 | + [OpenAI Five overview](https://slideslive.com/38922722/contributed-talk-playing-dota-2-with-large-scale-deep-reinforcement-learning) by Jie Tang and Filip Wolski `video` |
104 | 104 | [OpenAI Five overview](https://youtu.be/w3ues-NayAs?t=2m26s) by Ilya Sutskever `video` |
105 | 105 | [OpenAI Five overview](https://youtu.be/N8_gVrIPLQM?t=1h3m41s) by David Silver `video` |
106 | 106 |
|
|
227 | 227 | [AlphaZero vs Stockfish](https://youtube.com/playlist?list=PLDnx7w_xuguHIxbL7akaYgEvV4spwYkmn) games `video` |
228 | 228 | [AlphaZero vs Stockfish](https://youtube.com/playlist?list=PL-qLOQ-OEls607FPLAsPZ6De4f1W3ZF-I) games `video` |
229 | 229 |
|
230 | | - ["Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI"](https://amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184) by Matthew Sadler and Natasha Regan `book` ([talk](https://youtube.com/watch?v=HgZYIDslnAI) `video`, [analysis](https://youtube.com/playlist?list=UUkK8M0dMhAX8JinU-6aD7xA) `video`) |
| 230 | + ["Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI"](https://amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184) by Matthew Sadler and Natasha Regan `book` ([talk](https://youtube.com/watch?v=HgZYIDslnAI) `video`, [games](https://youtube.com/playlist?list=UUkK8M0dMhAX8JinU-6aD7xA) `video`) |
| 231 | + |
| 232 | + [Leela Chess Zero](https://youtube.com/playlist?list=PLDnx7w_xuguH7UO4bNGo56w94NXAef6YH) games `video` ([overview](https://en.wikipedia.org/wiki/Leela_Chess_Zero)) |
231 | 233 |
|
232 | 234 | ---- |
233 | 235 | - *Quake III Arena* |
|
672 | 674 | [overview](http://videolectures.net/DLRLsummerschool2018_bowling_multi_agent_RL) by Michael Bowling `video` |
673 | 675 | [overview](http://youtube.com/watch?v=9qPhrEYIRF4) by Jakob Foerster `video` |
674 | 676 | [overview](http://youtube.com/watch?v=hGEz4Aumd1U) by Arsenii Ashukha `video` |
| 677 | + [overview](http://youtube.com/watch?v=0OSvoYbWs9o) by Sergei Sviridov `video` `in russian` |
675 | 678 |
|
676 | 679 | [overview](http://mlanctot.info/files/papers/Lanctot_MARL_RLSS2019_Lille.pdf) by Marc Lanctot `slides` |
677 | 680 |
|
|
783 | 786 | #### exploration and intrinsic motivation - bayesian exploration |
784 | 787 |
|
785 | 788 | [overview](https://youtu.be/sGuiWX07sKw?t=57m28s) by David Silver `video` |
786 | | - [overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=3970) by Shimon Whiteson `video` *(lack of good methods for real exploration as opposed to simulated exploration)* |
| 789 | + [overview](https://slideslive.com/38922727/bayesadaptive-deep-reinforcement-learning-via-metalearning) by Shimon Whiteson `video` *(lack of good methods for real exploration as opposed to simulated exploration)* |
787 | 790 |
|
788 | 791 | ---- |
789 | 792 |
|
|
952 | 955 | > "Maximizing incompetence does not model very well the psychological models of optimal challenge and “flow” proposed by (Csikszentmihalyi, 1991). Flow refers to the state of pleasure related to activities for which difficulty is optimal: neither too easy nor too difficult. As difficulty of a goal can be modeled by the (mean) performance in achieving this goal, a possible manner to model flow would be to introduce two thresholds defining the zone of optimal difficulty. Yet, the use of thresholds can be rather fragile, require hand tuning and possibly complex adaptive mechanism to update these thresholds during the robot’s lifetime. Another approach can be taken, which avoids the use of thresholds. It consists in defining the interestingness of a challenge as the competence progress that is experienced as the robot repeatedly tries to achieve it. So, a challenge for which a robot is bad initially but for which it is rapidly becoming good will be highly rewarding. Thus, a first manner to implement flow motivation would be: $r(SM(\rightarrow t), g_k, t_g) = C \cdot (l_a(g_k, t_{g-\theta}) - l_a(g_k, t_g))$ corresponding to the difference between the current performance for task $g_k$ and the performance corresponding to the last time $g_k$ was tried, at a time denoted $t_{g-\theta}$." |
953 | 956 |
|
954 | 957 | [**"Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes"**](https://github.com/brylevkirill/notes/blob/master/Artificial%20Intelligence.md#driven-by-compression-progress-a-simple-principle-explains-essential-aspects-of-subjective-beauty-novelty-surprise-interestingness-attention-curiosity-creativity-art-science-music-jokes-schmidhuber) by Schmidhuber `paper` `summary` ([**Artificial Curiosity and Creativity**](https://github.com/brylevkirill/notes/blob/master/Artificial%20Intelligence.md#artificial-curiosity-and-creativity) theory by Schmidhuber) ([overview](https://youtu.be/DSYzHPW26Ig?t=2h7m22s) by Alex Graves `video`) *(maximizing compression progress)* |
| 958 | + [**"PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem"**](https://github.com/brylevkirill/notes/blob/master/Artificial%20Intelligence.md#powerplay-training-an-increasingly-general-problem-solver-by-continually-searching-for-the-simplest-still-unsolvable-problem-schmidhuber) by Schmidhuber `paper` `summary` |
955 | 959 | [**"Automated Curriculum Learning for Neural Networks"**](#automated-curriculum-learning-for-neural-networks-graves-bellemare-menick-munos-kavukcuoglu) by Graves et al. `paper` `summary` *(maximizing prediction gain / complexity gain)* |
956 | 960 | [**"Automatic Goal Generation for Reinforcement Learning Agents"**](#automatic-goal-generation-for-reinforcement-learning-agents-held-geng-florensa-abbeel) by Held et al. `paper` `summary` *(optimally difficult goals)* |
| 961 | + ["Automatic Curriculum Learning For Deep RL: A Short Survey"](https://arxiv.org/abs/2003.04664) by Portelas et al. `paper` |
957 | 962 |
|
958 | 963 | [**interesting papers**](#interesting-papers---exploration-and-intrinsic-motivation---competence-based-models---maximizing-competence-motivation) |
959 | 964 | [**interesting recent papers**](https://github.com/brylevkirill/notes/blob/master/interesting%20recent%20papers.md#reinforcement-learning---exploration-and-intrinsic-motivation) |
|
1176 | 1181 | ["The Next Big Step in AI: Planning with a Learned Model"](https://youtube.com/watch?v=6-Uiq8-wKrg) by Richard Sutton `video` |
1177 | 1182 | ["The Grand Challenge of Knowledge"](http://www.fields.utoronto.ca/video-archive/2016/10/2267-16158) (41:35) by Richard Sutton `video` |
1178 | 1183 | ["Open Questions in Model-based RL"](https://youtube.com/watch?v=OeIVfQz3FUc) by Richard Sutton `video` |
1179 | | - ["Toward a General AI-Agent Architecture"](https://slideslive.com/38921889/biological-and-artificial-reinforcement-learning-4?t=980) by Richard Sutton `video` *(SuperDyna)* |
| 1184 | + ["Toward a General AI-Agent Architecture"](https://slideslive.com/38924024/toward-a-general-aiagent-architecture) by Richard Sutton `video` *(SuperDyna)* |
1180 | 1185 |
|
1181 | 1186 | ["Planning and Models"](https://youtube.com/watch?v=Xrxrd8nl4YI) by Hado van Hasselt `video` |
1182 | 1187 | ["Integrating Learning and Planning"](https://youtube.com/watch?v=ItMutbeOHtc) by David Silver `video` |
|
1342 | 1347 |
|
1343 | 1348 | [overview](https://youtu.be/5rev-zVx1Ps?t=58m45s) by Marc Toussaint `video` |
1344 | 1349 | [overview](https://youtu.be/sGuiWX07sKw?t=1h9m2s) by David Silver `video` |
1345 | | - [overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=3970) by Shimon Whiteson `video` |
| 1350 | + [overview](https://slideslive.com/38922727/bayesadaptive-deep-reinforcement-learning-via-metalearning) by Shimon Whiteson `video` |
1346 | 1351 |
|
1347 | 1352 | ["Reinforcement Learning: Beyond Markov Decision Processes"](https://youtube.com/watch?v=_dkaynuKUFE) by Alexey Seleznev `video` `in russian` |
1348 | 1353 | ["Partially Observable Markov Decision Process in Reinforcement Learning"](https://yadi.sk/i/pMdw-_uI3Gke7Z) by Pavel Shvechikov `video` `in russian` |
|
1359 | 1364 | [**"Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search"**](#efficient-bayes-adaptive-reinforcement-learning-using-sample-based-search-guez-silver-dayan) by Guez et al. `paper` `summary` |
1360 | 1365 | ["Learning in POMDPs with Monte Carlo Tree Search"](http://proceedings.mlr.press/v70/katt17a.html) by Katt et al. `paper` |
1361 | 1366 | ["Variational Inference for Data-Efficient Model Learning in POMDPs"](https://arxiv.org/abs/1805.09281) by Tschiatschek et al. `paper` |
1362 | | - ["VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning"](https://arxiv.org/abs/1910.08348) by Zintgraf et al. `paper` ([overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=3970) by Shimon Whiteson `video`) |
| 1367 | + ["VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning"](https://arxiv.org/abs/1910.08348) by Zintgraf et al. `paper` ([overview](https://slideslive.com/38922727/bayesadaptive-deep-reinforcement-learning-via-metalearning) by Shimon Whiteson `video`) |
1363 | 1368 |
|
1364 | 1369 | ---- |
1365 | 1370 |
|
|
1377 | 1382 | ["Bayesian Policy Search"](https://youtu.be/AggqBRdz6CQ?t=9m53s) by Shakir Mohamed `video` |
1378 | 1383 | ["Connections Between Inference and Control"](https://youtu.be/iOYiPhu5GEk?t=2m34s) by Sergey Levine `video` ([write-up](https://arxiv.org/abs/1805.00909)) |
1379 | 1384 |
|
| 1385 | + ["Reinforcement Learning as Probabilistic Inference"](https://youtube.com/watch?v=j8T8Xt8TM8Q) by Pavel Temirchev `video` `in russian` |
| 1386 | + ["Reinforcement Learning as Probabilistic Inference"](https://youtube.com/watch?v=pYsXIPEkSxs) by Pavel Temirchev `video` `in russian` |
1380 | 1387 | ["Reinforcement Learning through the Lenses of Variational Inference"](https://youtube.com/watch?v=6v3RxQycT0E) by Sergey Bartunov `video` |
1381 | | - ["Bayesian Inference for Reinforcement Learning"](https://youtube.com/watch?v=KZd-jkmeIcU) by Sergey Bartunov `video` `in russian` |
1382 | | - ([slides](https://drive.google.com/drive/folders/0B2zoFVYw1rN3N0RUNXE1WnNObTQ) `in english`) |
| 1388 | + ["Bayesian Inference for Reinforcement Learning"](https://youtube.com/watch?v=KZd-jkmeIcU) by Sergey Bartunov `video` `in russian` |
1383 | 1389 |
|
1384 | 1390 | ---- |
1385 | 1391 |
|
|
1610 | 1616 | ["Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents"](https://arxiv.org/abs/1712.06560) by Conti et al. `paper` |
1611 | 1617 | ["Playing Atari with Six Neurons"](https://arxiv.org/abs/1806.01363) by Cuccu et al. `paper` |
1612 | 1618 |
|
1613 | | - ["Evolutionary Computation for Reinforcement Learning"](http://cs.ox.ac.uk/publications/publication10159-abstract.html) by Shimon Whiteson `paper` |
| 1619 | + ["Evolutionary Computation for Reinforcement Learning"](http://cs.ox.ac.uk/publications/publication10159-abstract.html) by Shimon Whiteson `paper` |
| 1620 | + ["Evolutionary Algorithms for Reinforcement Learning"](https://arxiv.org/abs/1106.0221) by Moriarty, Schultz, Grefenstette `paper` |
1614 | 1621 |
|
1615 | 1622 |
|
1616 | 1623 |
|
@@ -1745,7 +1752,7 @@ interesting recent papers: |
1745 | 1752 |
|
1746 | 1753 | - `post` <https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning> |
1747 | 1754 | - `video` <https://youtube.com/watch?v=6eiErYh_FeY> |
1748 | | - - `video` <https://slideslive.com/38922025/deep-reinforcement-learning-1?t=318) (Vinyals) |
| 1755 | + - `video` <https://slideslive.com/38922724/grandmaster-level-in-starcraft-ii-using-multiagent-reinforcement-learning> (Vinyals) |
1749 | 1756 | - `video` <https://youtu.be/3UdH3lPF7nE> (Vinyals) |
1750 | 1757 | - `video` <https://youtu.be/Kedt2or9xlo> (Vinyals) |
1751 | 1758 | - `video` <https://slideslive.com/38916905/alphastar-mastering-the-game-of-starcraft-ii> (Silver) |
@@ -1807,7 +1814,8 @@ interesting recent papers: |
1807 | 1814 | - `video` <https://youtube.com/playlist?list=PLnn6VZp3hqNsrsp_Bg-bEfzzhJ3SuEZE9> |
1808 | 1815 | - `video` <https://slideslive.com/38922026/deep-reinforcement-learning-2?t=3855> (Schrittwieser) |
1809 | 1816 | - `video` <https://youtube.com/watch?v=We20YSAJZSE> (Kilcher) |
1810 | | - - `video` <https://slideslive.com/38921974/perception-as-generative-reasoning-structure-causality-probability-3?t=3954> (Rezende) |
| 1817 | + - `video` <https://slideslive.com/38923124/nonsupervised-learning-and-decision-making?t=1832> (Rezende) |
| 1818 | + - `video` <https://youtu.be/BGyRM5vCkfw?t=26m54s> (Engalych) `in russian` |
1811 | 1819 | - `notes` <https://www.shortscience.org/paper?bibtexKey=journals/corr/1911.08265> |
1812 | 1820 |
|
1813 | 1821 |
|
@@ -1843,6 +1851,7 @@ interesting recent papers: |
1843 | 1851 | - `video` <https://youtube.com/watch?v=XuzIqE2IshY> (Kington) |
1844 | 1852 | - `video` <https://youtube.com/watch?v=_x9bXso3wo4> (Hinzman) |
1845 | 1853 | - `video` <https://youtu.be/V0HNXVSrvhg?t=1h20m45s> + <https://youtu.be/Lz5_xFGt2hA?t=3m11s> (Grinchuk) `in russian` |
| 1854 | + - `video` <https://youtu.be/BGyRM5vCkfw?t=16m30s> (Engalych) `in russian` |
1846 | 1855 | - `video` <https://youtu.be/WM4HC720Cms?t=1h34m49s> (Nikolenko) `in russian` |
1847 | 1856 | - `video` <https://youtu.be/zHjE07NBA_o?t=1h10m24s> (Kozlov) `in russian` |
1848 | 1857 | - `post` <http://depthfirstlearning.com/2018/AlphaGoZero> |
@@ -1873,6 +1882,7 @@ interesting recent papers: |
1873 | 1882 | - `video` <http://youtube.com/watch?v=LX8Knl0g0LE> (Huang) |
1874 | 1883 | - `video` <http://youtu.be/CvL-KV3IBcM?t=31m55s> (Graepel) |
1875 | 1884 | - `video` <http://youtube.com/watch?v=UMm0XaCFTJQ> (Sutton, Szepesvari, Bowling, Hayward, Muller) |
| 1885 | + - `video` <https://youtube.com/watch?v=BGyRM5vCkfw> (Engalych) `in russian` |
1876 | 1886 | - `video` <https://youtu.be/WM4HC720Cms?t=1h18m21s> (Nikolenko) `in russian` |
1877 | 1887 | - `video` <https://youtube.com/watch?v=zHjE07NBA_o> (Kozlov) `in russian` |
1878 | 1888 | - `notes` <https://github.com/Rochester-NRT/RocAlphaGo/wiki> |
@@ -1996,6 +2006,7 @@ interesting recent papers: |
1996 | 2006 | - `video` <https://youtube.com/watch?v=mzjGNo9Tz4g> (Silver) |
1997 | 2007 | - `video` <https://youtu.be/3N9phq_yZP0?t=12m43s> (Hassabis) |
1998 | 2008 | - `video` <https://youtu.be/DXNqYSNvnjA?t=21m24s> (Hassabis) |
| 2009 | + - `video` <https://youtu.be/BGyRM5vCkfw?t=22m2s> (Engalych) `in russian` |
1999 | 2010 | - `video` <https://youtu.be/WM4HC720Cms?t=1h34m49s> (Nikolenko) `in russian` |
2000 | 2011 | - `notes` <https://blog.acolyer.org/2018/01/10/mastering-chess-and-shogi-by-self-play-with-a-general-reinforcement-learning-algorithm/> |
2001 | 2012 | - `code` <https://lczero.org> |
@@ -2466,6 +2477,7 @@ interesting recent papers: |
2466 | 2477 | - `post` <https://jangirrishabh.github.io/2018/03/25/Overcoming-exploration-demos.html> |
2467 | 2478 | - `code` <https://github.com/openai/baselines/tree/master/baselines/her> |
2468 | 2479 | - `paper` ["Universal Value Function Approximators"](https://github.com/brylevkirill/notes/blob/master/Reinforcement%20Learning.md#schaul-horgan-gregor-silver---universal-value-function-approximators) by Schaul et al. `summary` |
| 2480 | + - `paper` ["Hindsight Policy Gradients"](https://arxiv.org/abs/1711.06006) by Rauber et al. |
2469 | 2481 |
|
2470 | 2482 |
|
2471 | 2483 | #### ["Reinforcement Learning with Unsupervised Auxiliary Tasks"](http://arxiv.org/abs/1611.05397) Jaderberg, Mnih, Czarnecki, Schaul, Leibo, Silver, Kavukcuoglu |
@@ -2544,7 +2556,7 @@ interesting recent papers: |
2544 | 2556 | - `video` <https://youtube.com/watch?v=0yI2wJ6F8r0> + <https://youtube.com/watch?v=qeeTok1qDZk> + <https://youtube.com/watch?v=EzQwCmGtEHs> (demo) |
2545 | 2557 | - `video` <https://youtu.be/qSfd27AgcEk?t=29m5s> (Bellemare) |
2546 | 2558 | - `video` <https://youtu.be/WuFMrk3ZbkE?t=1h27m37s> (Bellemare) |
2547 | | - - `video` <https://slideslive.com/38922025/deep-reinforcement-learning-1?t=3970> (Whiteson) |
| 2559 | + - `video` <https://slideslive.com/38922727/bayesadaptive-deep-reinforcement-learning-via-metalearning> (Whiteson) |
2548 | 2560 | - `video` <https://youtu.be/qduxl-vKz1E?t=1h16m30s> (Seleznev) `in russian` |
2549 | 2561 | - `video` <https://youtube.com/watch?v=qKyOLNVpknQ> (Pavlov) `in russian` |
2550 | 2562 | - `notes` <http://pemami4911.github.io/paper-summaries/deep-rl/2016/10/08/unifying-count-based-exploration-and-intrinsic-motivation.html> |
@@ -3220,6 +3232,8 @@ interesting recent papers: |
3220 | 3232 |
|
3221 | 3233 | > "application of deep successor reinforcement learning" |
3222 | 3234 |
|
| 3235 | +> "uses supervised learning to predict future values of measurements (possibly rewards) given actions, which sidesteps traditional reinforcement learning algorithms" |
| 3236 | + |
3223 | 3237 | - `video` <https://youtube.com/watch?v=947bSUtuSQ0> + <https://youtube.com/watch?v=947bSUtuSQ0> (demo) |
3224 | 3238 | - `video` <https://facebook.com/iclr.cc/videos/1712224178806641?t=3252> (Dosovitskiy) |
3225 | 3239 | - `video` <https://youtube.com/watch?v=buUF5F8UCH8> (Lamb, Ozair) |
|