Commit bfed0e4

1 parent 7e8d380 commit bfed0e4

4 files changed: +46 -20 lines changed

Deep Learning.md

Lines changed: 1 addition & 0 deletions
@@ -2441,6 +2441,7 @@ Yoshua Bengio:
 - `video` <https://youtu.be/_XRBlhzb31U?t=48m35s> (Figurnov) `in russian`
 - `video` <https://youtu.be/LhH6wMvntSM?t=54m56s> (Suleymanov) `in russian`
 - `audio` <https://soundcloud.com/nlp-highlights/36-attention-is-all-you-need-with-ashish-vaswani-and-jakob-uszkoreit> (Vaswani, Uszkoreit)
+- `post` <https://lilianweng.github.io/lil-log/2020/04/07/the-transformer-family.html>
 - `post` <https://jalammar.github.io/illustrated-transformer/>
 - `post` <https://danieltakeshi.github.io/2019/03/30/transformers/>
 - `post` <http://nlp.seas.harvard.edu/2018/04/03/attention.html>

Machine Learning.md

Lines changed: 1 addition & 1 deletion
@@ -356,7 +356,7 @@
 ---
 ### meta-learning
 
-[course](https://youtube.com/playlist?list=PLoROMvodv4rMC6zfYmnD7UG3LVvwaITY5) by Chelsea Finn `video`
+[course](http://cs330.stanford.edu) by Chelsea Finn ([videos](https://youtube.com/playlist?list=PLoROMvodv4rMC6zfYmnD7UG3LVvwaITY5))
 
 [overview](https://facebook.com/icml.imls/videos/400619163874853?t=500) by Chelsea Finn and Sergey Levine `video`
 [overview](https://facebook.com/nipsfoundation/videos/1554594181298482?t=277) by Pieter Abbeel `video`

Reinforcement Learning.md

Lines changed: 27 additions & 13 deletions
@@ -76,7 +76,7 @@
 
 [**"Grandmaster Level in StarCraft II using Multi-agent Reinforcement Learning"**](#grandmaster-level-in-starcraft-ii-using-multi-agent-reinforcement-learning-vinyals-et-al) by Vinyals et al. `paper` `summary` *(AlphaStar)*
 
-[overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=318) by Oriol Vinyals `video`
+[overview](https://slideslive.com/38922724/grandmaster-level-in-starcraft-ii-using-multiagent-reinforcement-learning) by Oriol Vinyals `video`
 [overview](https://youtu.be/3UdH3lPF7nE) by Oriol Vinyals `video`
 [overview](https://slideslive.com/38916905/alphastar-mastering-the-game-of-starcraft-ii) by David Silver `video`
 [overview](https://youtu.be/mzjGNo9Tz4g?t=10m53s) by David Silver `video`
@@ -100,7 +100,7 @@
 
 ["Dota 2 with Large Scale Deep Reinforcement Learning"](https://cdn.openai.com/dota-2.pdf) by Berner et al. `paper` *(OpenAI Five)*
 
-[OpenAI File overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=2175) by Jie Tang and Filip Wolski `video`
+[OpenAI Five overview](https://slideslive.com/38922722/contributed-talk-playing-dota-2-with-large-scale-deep-reinforcement-learning) by Jie Tang and Filip Wolski `video`
 [OpenAI Five overview](https://youtu.be/w3ues-NayAs?t=2m26s) by Ilya Sutskever `video`
 [OpenAI Five overview](https://youtu.be/N8_gVrIPLQM?t=1h3m41s) by David Silver `video`
 
@@ -227,7 +227,9 @@
 [AlphaZero vs Stockfish](https://youtube.com/playlist?list=PLDnx7w_xuguHIxbL7akaYgEvV4spwYkmn) games `video`
 [AlphaZero vs Stockfish](https://youtube.com/playlist?list=PL-qLOQ-OEls607FPLAsPZ6De4f1W3ZF-I) games `video`
 
-["Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI"](https://amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184) by Matthew Sadler and Natasha Regan `book` ([talk](https://youtube.com/watch?v=HgZYIDslnAI) `video`, [analysis](https://youtube.com/playlist?list=UUkK8M0dMhAX8JinU-6aD7xA) `video`)
+["Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI"](https://amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184) by Matthew Sadler and Natasha Regan `book` ([talk](https://youtube.com/watch?v=HgZYIDslnAI) `video`, [games](https://youtube.com/playlist?list=UUkK8M0dMhAX8JinU-6aD7xA) `video`)
+
+[Leela Chess Zero](https://youtube.com/playlist?list=PLDnx7w_xuguH7UO4bNGo56w94NXAef6YH) games `video` ([overview](https://en.wikipedia.org/wiki/Leela_Chess_Zero))
 
 ----
 - *Quake III Arena*
@@ -672,6 +674,7 @@
 [overview](http://videolectures.net/DLRLsummerschool2018_bowling_multi_agent_RL) by Michael Bowling `video`
 [overview](http://youtube.com/watch?v=9qPhrEYIRF4) by Jakob Foerster `video`
 [overview](http://youtube.com/watch?v=hGEz4Aumd1U) by Arsenii Ashukha `video`
+[overview](http://youtube.com/watch?v=0OSvoYbWs9o) by Sergei Sviridov `video` `in russian`
 
 [overview](http://mlanctot.info/files/papers/Lanctot_MARL_RLSS2019_Lille.pdf) by Marc Lanctot `slides`
 
@@ -783,7 +786,7 @@
 #### exploration and intrinsic motivation - bayesian exploration
 
 [overview](https://youtu.be/sGuiWX07sKw?t=57m28s) by David Silver `video`
-[overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=3970) by Shimon Whiteson `video` *(lack of good methods for real exploration as opposed to simulated exploration)*
+[overview](https://slideslive.com/38922727/bayesadaptive-deep-reinforcement-learning-via-metalearning) by Shimon Whiteson `video` *(lack of good methods for real exploration as opposed to simulated exploration)*
 
 ----
 
@@ -952,8 +955,10 @@
 > "Maximizing incompetence does not model very well the psychological models of optimal challenge and “flow” proposed by (Csikszentmihalyi, 1991). Flow refers to the state of pleasure related to activities for which difficulty is optimal: neither too easy nor too difficult. As difficulty of a goal can be modeled by the (mean) performance in achieving this goal, a possible manner to model flow would be to introduce two thresholds defining the zone of optimal difficulty. Yet, the use of thresholds can be rather fragile, require hand tuning and possibly complex adaptive mechanism to update these thresholds during the robot’s lifetime. Another approach can be taken, which avoids the use of thresholds. It consists in defining the interestingness of a challenge as the competence progress that is experienced as the robot repeatedly tries to achieve it. So, a challenge for which a robot is bad initially but for which it is rapidly becoming good will be highly rewarding. Thus, a first manner to implement flow motivation would be: r(SM(→ t), gk, tg) = C·(la(gk, tg−θ) − la(gk, tg)) corresponding to the difference between the current performance for task gk and the performance corresponding to the last time gk was tried, at a time denoted tg−θ."
 
 [**"Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes"**](https://github.com/brylevkirill/notes/blob/master/Artificial%20Intelligence.md#driven-by-compression-progress-a-simple-principle-explains-essential-aspects-of-subjective-beauty-novelty-surprise-interestingness-attention-curiosity-creativity-art-science-music-jokes-schmidhuber) by Schmidhuber `paper` `summary` ([**Artificial Curiosity and Creativity**](https://github.com/brylevkirill/notes/blob/master/Artificial%20Intelligence.md#artificial-curiosity-and-creativity) theory by Schmidhuber) ([overview](https://youtu.be/DSYzHPW26Ig?t=2h7m22s) by Alex Graves `video`) *(maximizing compression progress)*
+[**"PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem"**](https://github.com/brylevkirill/notes/blob/master/Artificial%20Intelligence.md#powerplay-training-an-increasingly-general-problem-solver-by-continually-searching-for-the-simplest-still-unsolvable-problem-schmidhuber) by Schmidhuber `paper` `summary`
 [**"Automated Curriculum Learning for Neural Networks"**](#automated-curriculum-learning-for-neural-networks-graves-bellemare-menick-munos-kavukcuoglu) by Graves et al. `paper` `summary` *(maximizing prediction gain / complexity gain)*
 [**"Automatic Goal Generation for Reinforcement Learning Agents"**](#automatic-goal-generation-for-reinforcement-learning-agents-held-geng-florensa-abbeel) by Held et al. `paper` `summary` *(optimally difficult goals)*
+["Automatic Curriculum Learning For Deep RL: A Short Survey"](https://arxiv.org/abs/2003.04664) by Portelas et al. `paper`
 
 [**interesting papers**](#interesting-papers---exploration-and-intrinsic-motivation---competence-based-models---maximizing-competence-motivation)
 [**interesting recent papers**](https://github.com/brylevkirill/notes/blob/master/interesting%20recent%20papers.md#reinforcement-learning---exploration-and-intrinsic-motivation)
@@ -1176,7 +1181,7 @@
 ["The Next Big Step in AI: Planning with a Learned Model"](https://youtube.com/watch?v=6-Uiq8-wKrg) by Richard Sutton `video`
 ["The Grand Challenge of Knowledge"](http://www.fields.utoronto.ca/video-archive/2016/10/2267-16158) (41:35) by Richard Sutton `video`
 ["Open Questions in Model-based RL"](https://youtube.com/watch?v=OeIVfQz3FUc) by Richard Sutton `video`
-["Toward a General AI-Agent Architecture"](https://slideslive.com/38921889/biological-and-artificial-reinforcement-learning-4?t=980) by Richard Sutton `video` *(SuperDyna)*
+["Toward a General AI-Agent Architecture"](https://slideslive.com/38924024/toward-a-general-aiagent-architecture) by Richard Sutton `video` *(SuperDyna)*
 
 ["Planning and Models"](https://youtube.com/watch?v=Xrxrd8nl4YI) by Hado van Hasselt `video`
 ["Integrating Learning and Planning"](https://youtube.com/watch?v=ItMutbeOHtc) by David Silver `video`
@@ -1342,7 +1347,7 @@
 
 [overview](https://youtu.be/5rev-zVx1Ps?t=58m45s) by Marc Toussaint `video`
 [overview](https://youtu.be/sGuiWX07sKw?t=1h9m2s) by David Silver `video`
-[overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=3970) by Shimon Whiteson `video`
+[overview](https://slideslive.com/38922727/bayesadaptive-deep-reinforcement-learning-via-metalearning) by Shimon Whiteson `video`
 
 ["Reinforcement Learning: Beyond Markov Decision Processes"](https://youtube.com/watch?v=_dkaynuKUFE) by Alexey Seleznev `video` `in russian`
 ["Partially Observable Markov Decision Process in Reinforcement Learning"](https://yadi.sk/i/pMdw-_uI3Gke7Z) by Pavel Shvechikov `video` `in russian`
@@ -1359,7 +1364,7 @@
 [**"Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search"**](#efficient-bayes-adaptive-reinforcement-learning-using-sample-based-search-guez-silver-dayan) by Guez et al. `paper` `summary`
 ["Learning in POMDPs with Monte Carlo Tree Search"](http://proceedings.mlr.press/v70/katt17a.html) by Katt et al. `paper`
 ["Variational Inference for Data-Efficient Model Learning in POMDPs"](https://arxiv.org/abs/1805.09281) by Tschiatschek et al. `paper`
-["VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning"](https://arxiv.org/abs/1910.08348) by Zintgraf et al. `paper` ([overview](https://slideslive.com/38922025/deep-reinforcement-learning-1?t=3970) by Shimon Whiteson `video`)
+["VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning"](https://arxiv.org/abs/1910.08348) by Zintgraf et al. `paper` ([overview](https://slideslive.com/38922727/bayesadaptive-deep-reinforcement-learning-via-metalearning) by Shimon Whiteson `video`)
 
 ----
 
@@ -1377,9 +1382,10 @@
 ["Bayesian Policy Search"](https://youtu.be/AggqBRdz6CQ?t=9m53s) by Shakir Mohamed `video`
 ["Connections Between Inference and Control"](https://youtu.be/iOYiPhu5GEk?t=2m34s) by Sergey Levine `video` ([write-up](https://arxiv.org/abs/1805.00909))
 
+["Reinforcement Learning as Probabilistic Inference"](https://youtube.com/watch?v=j8T8Xt8TM8Q) by Pavel Temirchev `video` `in russian`
+["Reinforcement Learning as Probabilistic Inference"](https://youtube.com/watch?v=pYsXIPEkSxs) by Pavel Temirchev `video` `in russian`
 ["Reinforcement Learning through the Lenses of Variational Inference"](https://youtube.com/watch?v=6v3RxQycT0E) by Sergey Bartunov `video`
-["Bayesian Inference for Reinforcement Learning"](https://youtube.com/watch?v=KZd-jkmeIcU) by Sergey Bartunov `video` `in russian`
-([slides](https://drive.google.com/drive/folders/0B2zoFVYw1rN3N0RUNXE1WnNObTQ) `in english`)
+["Bayesian Inference for Reinforcement Learning"](https://youtube.com/watch?v=KZd-jkmeIcU) by Sergey Bartunov `video` `in russian`
 
 ----
 
@@ -1610,7 +1616,8 @@
 ["Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents"](https://arxiv.org/abs/1712.06560) by Conti et al. `paper`
 ["Playing Atari with Six Neurons"](https://arxiv.org/abs/1806.01363) by Cuccu et al. `paper`
 
-["Evolutionary Computation for Reinforcement Learning"](http://cs.ox.ac.uk/publications/publication10159-abstract.html) by Shimon Whiteson `paper`
+["Evolutionary Computation for Reinforcement Learning"](http://cs.ox.ac.uk/publications/publication10159-abstract.html) by Shimon Whiteson `paper`
+["Evolutionary Algorithms for Reinforcement Learning"](https://arxiv.org/abs/1106.0221) by Moriarty, Schultz, Grefenstette `paper`
 
 
 
@@ -1745,7 +1752,7 @@ interesting recent papers:
 
 - `post` <https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning>
 - `video` <https://youtube.com/watch?v=6eiErYh_FeY>
-- `video` <https://slideslive.com/38922025/deep-reinforcement-learning-1?t=318) (Vinyals)
+- `video` <https://slideslive.com/38922724/grandmaster-level-in-starcraft-ii-using-multiagent-reinforcement-learning) (Vinyals)
 - `video` <https://youtu.be/3UdH3lPF7nE> (Vinyals)
 - `video` <https://youtu.be/Kedt2or9xlo> (Vinyals)
 - `video` <https://slideslive.com/38916905/alphastar-mastering-the-game-of-starcraft-ii> (Silver)
@@ -1807,7 +1814,8 @@ interesting recent papers:
 - `video` <https://youtube.com/playlist?list=PLnn6VZp3hqNsrsp_Bg-bEfzzhJ3SuEZE9>
 - `video` <https://slideslive.com/38922026/deep-reinforcement-learning-2?t=3855> (Schrittwieser)
 - `video` <https://youtube.com/watch?v=We20YSAJZSE> (Kilcher)
-- `video` <https://slideslive.com/38921974/perception-as-generative-reasoning-structure-causality-probability-3?t=3954> (Rezende)
+- `video` <https://slideslive.com/38923124/nonsupervised-learning-and-decision-making?t=1832> (Rezende)
+- `video` <https://youtu.be/BGyRM5vCkfw?t=26m54s> (Engalych) `in russian`
 - `notes` <https://www.shortscience.org/paper?bibtexKey=journals/corr/1911.08265>
 
 
@@ -1843,6 +1851,7 @@ interesting recent papers:
 - `video` <https://youtube.com/watch?v=XuzIqE2IshY> (Kington)
 - `video` <https://youtube.com/watch?v=_x9bXso3wo4> (Hinzman)
 - `video` <https://youtu.be/V0HNXVSrvhg?t=1h20m45s> + <https://youtu.be/Lz5_xFGt2hA?t=3m11s> (Grinchuk) `in russian`
+- `video` <https://youtu.be/BGyRM5vCkfw?t=16m30s> (Engalych) `in russian`
 - `video` <https://youtu.be/WM4HC720Cms?t=1h34m49s> (Nikolenko) `in russian`
 - `video` <https://youtu.be/zHjE07NBA_o?t=1h10m24s> (Kozlov) `in russian`
 - `post` <http://depthfirstlearning.com/2018/AlphaGoZero>
@@ -1873,6 +1882,7 @@ interesting recent papers:
 - `video` <http://youtube.com/watch?v=LX8Knl0g0LE> (Huang)
 - `video` <http://youtu.be/CvL-KV3IBcM?t=31m55s> (Graepel)
 - `video` <http://youtube.com/watch?v=UMm0XaCFTJQ> (Sutton, Szepesvari, Bowling, Hayward, Muller)
+- `video` <https://youtube.com/watch?v=BGyRM5vCkfw> (Engalych) `in russian`
 - `video` <https://youtu.be/WM4HC720Cms?t=1h18m21s> (Nikolenko) `in russian`
 - `video` <https://youtube.com/watch?v=zHjE07NBA_o> (Kozlov) `in russian`
 - `notes` <https://github.com/Rochester-NRT/RocAlphaGo/wiki>
@@ -1996,6 +2006,7 @@ interesting recent papers:
 - `video` <https://youtube.com/watch?v=mzjGNo9Tz4g> (Silver)
 - `video` <https://youtu.be/3N9phq_yZP0?t=12m43s> (Hassabis)
 - `video` <https://youtu.be/DXNqYSNvnjA?t=21m24s> (Hassabis)
+- `video` <https://youtu.be/BGyRM5vCkfw?t=22m2s> (Engalych) `in russian`
 - `video` <https://youtu.be/WM4HC720Cms?t=1h34m49s> (Nikolenko) `in russian`
 - `notes` <https://blog.acolyer.org/2018/01/10/mastering-chess-and-shogi-by-self-play-with-a-general-reinforcement-learning-algorithm/>
 - `code` <https://lczero.org>
@@ -2466,6 +2477,7 @@ interesting recent papers:
 - `post` <https://jangirrishabh.github.io/2018/03/25/Overcoming-exploration-demos.html>
 - `code` <https://github.com/openai/baselines/tree/master/baselines/her>
 - `paper` ["Universal Value Function Approximators"](https://github.com/brylevkirill/notes/blob/master/Reinforcement%20Learning.md#schaul-horgan-gregor-silver---universal-value-function-approximators) by Schaul et al. `summary`
+- `paper` ["Hindsight Policy Gradients"](https://arxiv.org/abs/1711.06006) by Rauber et al.
 
 
 #### ["Reinforcement Learning with Unsupervised Auxiliary Tasks"](http://arxiv.org/abs/1611.05397) Jaderberg, Mnih, Czarnecki, Schaul, Leibo, Silver, Kavukcuoglu
@@ -2544,7 +2556,7 @@ interesting recent papers:
 - `video` <https://youtube.com/watch?v=0yI2wJ6F8r0> + <https://youtube.com/watch?v=qeeTok1qDZk> + <https://youtube.com/watch?v=EzQwCmGtEHs> (demo)
 - `video` <https://youtu.be/qSfd27AgcEk?t=29m5s> (Bellemare)
 - `video` <https://youtu.be/WuFMrk3ZbkE?t=1h27m37s> (Bellemare)
-- `video` <https://slideslive.com/38922025/deep-reinforcement-learning-1?t=3970> (Whiteson)
+- `video` <https://slideslive.com/38922727/bayesadaptive-deep-reinforcement-learning-via-metalearning> (Whiteson)
 - `video` <https://youtu.be/qduxl-vKz1E?t=1h16m30s> (Seleznev) `in russian`
 - `video` <https://youtube.com/watch?v=qKyOLNVpknQ> (Pavlov) `in russian`
 - `notes` <http://pemami4911.github.io/paper-summaries/deep-rl/2016/10/08/unifying-count-based-exploration-and-intrinsic-motivation.html>
@@ -3220,6 +3232,8 @@ interesting recent papers:
 
 > "application of deep successor reinforcement learning"
 
+> "uses supervised learning to predict future values of measurements (possibly rewards) given actions, which sidesteps traditional reinforcement learning algorithms"
+
 - `video` <https://youtube.com/watch?v=947bSUtuSQ0> + <https://youtube.com/watch?v=947bSUtuSQ0> (demo)
 - `video` <https://facebook.com/iclr.cc/videos/1712224178806641?t=3252> (Dosovitskiy)
 - `video` <https://youtube.com/watch?v=buUF5F8UCH8> (Lamb, Ozair)
