HandsOnMachineLearningWithScikitLearnAndTensorFlow/.ipynb_checkpoints/homl_ch18_Reinforcement-learning-checkpoint.ipynb
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

    Stopping early at episode 477
## The TF-Agents library
The TF-Agents library is a reinforcement learning library built on top of TensorFlow.
It provides many environments, including wrappers around OpenAI Gym and various physics engines, as well as prebuilt models.
We will use it to train a DQN to play the Atari game *Breakout*.
### TF-Agents environments
We can create a Breakout environment, which is a wrapper around the corresponding OpenAI Gym environment.
```python
from tf_agents.environments import suite_gym

breakout_env = suite_gym.load('Breakout-v4')
breakout_env
```
<tf_agents.environments.wrappers.TimeLimit at 0x146ff3790>
There are some differences between the APIs of OpenAI Gym and TF-Agents.
For instance, calling an environment's `reset()` method does not return just the observation, but a `TimeStep` object that bundles the observation with the step type, reward, and discount.
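To make the structure concrete, here is a toy, framework-free mirror of the `TimeStep` layout; the field names match TF-Agents, but the types and values here are simplified placeholders, not the real TF-Agents objects:

```python
from collections import namedtuple

# Toy mirror of TF-Agents' TimeStep layout: four named fields.
TimeStep = namedtuple('TimeStep', ['step_type', 'reward', 'discount', 'observation'])

# What reset() conceptually hands back: a FIRST step (step_type 0),
# zero reward, discount 1.0, and the initial observation (placeholder here).
first_step = TimeStep(step_type=0, reward=0.0, discount=1.0,
                      observation=[[0, 0], [0, 0]])

print(first_step.step_type, first_step.reward)  # -> 0 0.0
```

In the real library, `step()` returns the same kind of object, so the agent handles every transition through a single uniform structure.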
TF-Agents also includes *environment wrappers*: wrappers that are automatically invoked at every step of the environment and add some extra functionality.
Here are some that seem quite useful:
* `ActionClipWrapper`: clips the actions to the action specification.
* `ActionDiscretizeWrapper`: if an environment has actions on a continuous scale, this can turn them into a specified number of discrete actions.
* `ActionRepeat`: repeats each action for multiple steps, accumulating the rewards; this can speed up training in some environments.
* `RunStats`: records environment statistics.
* `TimeLimit`: interrupts the environment if it runs for longer than a maximum number of steps.
* `VideoWrapper`: records a video of the environment.
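The wrapper pattern itself is simple. Here is a toy, framework-free sketch of an `ActionRepeat`-style wrapper; the class names and the minimal `step()` interface are assumptions for illustration, not TF-Agents' actual implementation:

```python
class ActionRepeat:
    """Toy sketch: repeat each action `times` steps, summing the rewards."""
    def __init__(self, env, times):
        self.env = env
        self.times = times

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.times):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop early if the episode ends mid-repeat
                break
        return obs, total_reward, done

class CountingEnv:
    """Tiny fake env: reward 1 per step, episode ends after 5 steps."""
    def __init__(self):
        self.t = 0
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5

env = ActionRepeat(CountingEnv(), times=4)
obs, reward, done = env.step(0)
print(obs, reward, done)  # -> 4 4.0 False
```

Because the wrapper exposes the same `step()` interface as the environment it wraps, wrappers can be freely chained.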
The wrappers for Atari environments are fairly standardized: greyscale conversion and downsampling of the observations, pixel-wise max pooling of the last two frames of the game, frame skipping (by default, each action is repeated for 4 frames and only the result is observed), and end-of-life loss (whether or not to end an episode when the player loses a life).
We will not use frame skipping in this case, but we will apply a wrapper that stacks the last 4 frames into a single observation (this helps the agent learn the direction and speed the ball is moving in).
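Frame stacking can be sketched without any framework: keep a rolling buffer of the last 4 frames and stack them along a new channel axis. This is only a toy illustration of the idea behind `FrameStack4`, not its actual implementation:

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Toy sketch: keep the last `n` greyscale frames, newest last."""
    def __init__(self, n=4, shape=(84, 84)):
        self.frames = deque(maxlen=n)
        # Pad with blank frames so the very first observation is full-size.
        for _ in range(n):
            self.frames.append(np.zeros(shape, dtype=np.uint8))

    def push(self, frame):
        self.frames.append(frame)  # oldest frame falls off the deque
        # Stacked observation of shape (84, 84, 4), newest frame in channel -1.
        return np.stack(self.frames, axis=-1)

stacker = FrameStacker()
obs = stacker.push(np.ones((84, 84), dtype=np.uint8))
print(obs.shape)  # -> (84, 84, 4)
```

Consecutive channels then differ only by one time step, which is what lets a convolutional network infer motion from a single stacked observation.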
```python
from tf_agents.environments import suite_atari
from tf_agents.environments.atari_preprocessing import AtariPreprocessing
from tf_agents.environments.atari_wrappers import FrameStack4
```