Conversation
    (1, 3, 1, 0.5, 0.5),  # yapf: disable
    (3, 1, 1, 0.5, 0.5),  # yapf: disable
    (3, 3, 1, 0.5, 0.5),  # yapf: disable
    (3, 3, 3, 0.5, 0.5))  # yapf: disable
I think you should be able to just put one `# yapf: disable` right after the @params statement and it should ignore the whole block.
If it doesn't, you can use this form:
# yapf: disable
stuff()
more_stuff()
# yapf: enable

    obs_inputs = np.full((self.batch_size, time_step, input_dim), 1.)
    obs_input = np.full((self.batch_size, input_dim), 1.)
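A minimal sketch of the two yapf suppression styles suggested above. The `params` decorator here is a stand-in defined locally for illustration, not the real test decorator:

```python
# Stand-in for the real @params test decorator, for illustration only.
def params(*cases):
    def wrap(fn):
        fn.cases = cases  # attach the parameter tuples so they are inspectable
        return fn
    return wrap

# Style 1: a single marker after the decorator's closing paren
# covers the whole argument block.
@params(
    (1, 3, 1, 0.5, 0.5),
    (3, 1, 1, 0.5, 0.5),
    (3, 3, 1, 0.5, 0.5),
    (3, 3, 3, 0.5, 0.5))  # yapf: disable
def test_output_shapes(*case):
    return case

# Style 2: an explicit disable/enable fence around the region.
# yapf: disable
cases = ((1, 3, 1, 0.5, 0.5),
         (3, 1, 1, 0.5, 0.5))
# yapf: enable
```

Both markers are inert at runtime; yapf simply skips reformatting the marked region.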
    _input_var = tf.placeholder(
Not sure. I will remove the _
Codecov Report
@@ Coverage Diff @@
## master #668 +/- ##
==========================================
+ Coverage 62.73% 63.26% +0.52%
==========================================
Files 164 167 +3
Lines 9572 9715 +143
Branches 1247 1256 +9
==========================================
+ Hits 6005 6146 +141
+ Misses 3263 3254 -9
- Partials 304 315 +11
Continue to review full report at Codecov.
    input_var = tf.placeholder(
        tf.float32, shape=(None, None, input_dim), name='input')
    step_input_var = tf.placeholder(
        tf.float32, shape=(None, input_dim), name='input')
Why do the two placeholders have the same name?
    name (str): Name of the variable scope.
    gru_cell (tf.keras.layers.Layer): GRU cell used to generate
        outputs.
    all_input_var (tf.Tensor): Place holder for entire time-seried inputs.
    hidden state is trainable.

    Return:
        outputs (tf.Tensor): Entire time-seried outputs.
Do you mean time-series? Please ignore if time-seried is a word.
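The docstring above describes a network that exposes both an entire time-series output (fed through `all_input_var`) and a single-step output. A minimal numpy sketch of that dual interface, using a toy tanh cell as a stand-in for the GRU cell (shapes and names here are illustrative, not the garage API):

```python
import numpy as np

# Toy recurrent step: a stand-in for the GRU cell, used only to
# show how one step function serves both interfaces.
def step(h, x, w_x, w_h):
    return np.tanh(x @ w_x + h @ w_h)

batch, time_steps, input_dim, hidden_dim = 2, 5, 3, 4
rng = np.random.RandomState(0)
w_x = rng.randn(input_dim, hidden_dim) * 0.1
w_h = rng.randn(hidden_dim, hidden_dim) * 0.1

# Full time-series pass (analogous to all_input_var -> outputs):
# scan the step function over the time axis.
xs = np.ones((batch, time_steps, input_dim))
h = np.zeros((batch, hidden_dim))
outputs = []
for t in range(time_steps):
    h = step(h, xs[:, t], w_x, w_h)
    outputs.append(h)
outputs = np.stack(outputs, axis=1)  # (batch, time_steps, hidden_dim)

# Single-step pass (analogous to the step input placeholder):
# same step function, one time slice, caller carries the hidden state.
step_out = step(np.zeros((batch, hidden_dim)),
                np.ones((batch, input_dim)), w_x, w_h)
```

Because both paths share the same step function and weights, the first slice of the full pass matches the single-step pass from the same initial state.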
    assert full_output.shape == (self.batch_size, time_step, output_dim)

    # yapf: disable
    @params(
Could you add a test of variable-length inputs to GRU and LSTM? It seems we don't have any tests using off-policy algorithms with recurrent policies. And since DQN doesn't set a max_path_length, I assume the episodes would have variable lengths. For recurrent policies, we should pad them. Do we pad them somewhere? Please point me to it if we do.
If we pad the inputs, then we don't have to worry about variable length inputs?
Sorry just saw this.
I think I might have used an incorrect example. It is uncommon to use LSTMNetwork in DQN. Besides, during training, the shape of samples (from replay buffer) would be (sample_size, 1, obs_dim). During sampling, the shape would be (1, time_steps, obs_dim).
If we pad them, we just need to indicate that <pad> is a condition to stop the traversal. tf.while_loop or tf.scan (what we are using) should have an argument to pass the condition (please check, I am not familiar with tf.scan).
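One common alternative to stopping the scan early is to pad all episodes to the batch maximum and mask the padded steps afterwards. A small numpy sketch of that idea (episode contents, names, and the masking scheme here are illustrative assumptions, not the garage API):

```python
import numpy as np

# Three episodes of different lengths, each of shape (T_i, obs_dim).
episodes = [np.ones((2, 3)), np.ones((5, 3)), np.ones((4, 3))]
max_len = max(len(ep) for ep in episodes)
obs_dim = episodes[0].shape[1]

# Pad every episode to max_len and record which steps are real.
padded = np.zeros((len(episodes), max_len, obs_dim))
mask = np.zeros((len(episodes), max_len))
for i, ep in enumerate(episodes):
    padded[i, :len(ep)] = ep
    mask[i, :len(ep)] = 1.0

# Downstream, padded steps are excluded by weighting per-step terms
# with the mask, so the recurrence can run for max_len steps everywhere
# instead of needing an early-exit condition in the scan.
per_step_loss = np.ones((len(episodes), max_len))
masked_loss = (per_step_loss * mask).sum() / mask.sum()
```

This sidesteps the question of whether tf.scan supports an early-stop condition: the scan always runs max_len steps, and the mask removes the padded contributions.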
I think this problem is beyond the PR's scope. If you think we should address this later, or we never actually have variable-length inputs to deal with, feel free to ignore this issue.
Added GRU, GRUModel and CategoricalGRUPolicyWithModel.
It aims to replace the existing GRUNetwork and GRULayer,
which are based on garage.tf.core.layers.
Added test for TRPO with CategoricalGRUPolicyWithModel.
Apart from testing functionality of CategoricalGRUPolicyWithModel
in test_categorical_gru_policy_with_model.py, transitions from the
old model (CategoricalGRUPolicy) to the new model
(CategoricalGRUPolicyWithModel) are also tested in
test_categorical_gru_policy_with_model_transit.py, to make sure
they have the same API.
The existing GRU implementation in GRULayer is not exactly the same as
the TensorFlow implementation (which follows the original paper), and is
modified in this PR to match.
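For reference, a minimal numpy sketch of one GRU step in the standard Cho et al. formulation, with the reset gate applied to the hidden state before the candidate matmul. This is a common formulation and should be checked against the actual TensorFlow cell; the weight layout and names here are placeholders, not the garage or TF implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One GRU step, standard formulation (weights are random placeholders):
#   z = sigmoid(x Wz + h Uz)        update gate
#   r = sigmoid(x Wr + h Ur)        reset gate
#   c = tanh(x Wc + (r * h) Uc)     candidate state (reset applied to h)
#   h' = z * h + (1 - z) * c        new hidden state
def gru_step(x, h, w, u):
    z = sigmoid(x @ w['z'] + h @ u['z'])
    r = sigmoid(x @ w['r'] + h @ u['r'])
    c = np.tanh(x @ w['c'] + (r * h) @ u['c'])
    return z * h + (1.0 - z) * c

rng = np.random.RandomState(0)
dim_in, dim_h = 3, 4
w = {k: rng.randn(dim_in, dim_h) * 0.1 for k in 'zrc'}
u = {k: rng.randn(dim_h, dim_h) * 0.1 for k in 'zrc'}
h_new = gru_step(np.ones((2, dim_in)), np.zeros((2, dim_h)), w, u)
```

Where the reset gate is applied (to the hidden state before the matmul, versus after it) is exactly the kind of detail in which two GRU implementations can silently disagree, which is why aligning GRULayer with the TF formulation matters.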