LSTM without layers#642

Merged
ryanjulian merged 6 commits into master from categorical_lstm_policy_with_model on May 10, 2019
Conversation

@ahtsan (Contributor) commented Apr 29, 2019

  • Add lstm, lstm_model, categorical_lstm_policy_with_model.
  • Unit tests for the above.
  • Test TRPO with categorical_lstm_policy_with_model

@ahtsan ahtsan requested a review from a team as a code owner April 29, 2019 06:23
@ryanjulian (Member) left a comment
overall this is very very good.

please make the test of lstm() a little more comprehensive and less reliant on comparisons to another library. that way we will also detect a breakage if keras breaks. it should also be a useful test if someone decides to replace the keras layer with something else.

Comment thread garage/tf/core/layers.py
Comment thread garage/tf/algos/npo.py Outdated
i.obs_var,
i.policy_state_info_vars,
name='policy_dist_info')
name='policy_dist_info_entropy')
Member

i don't think this is 'policy_dist_info_entropy' -- dist_info contains the parameters of the distribution (e.g. mean and std for a gaussian, probabilities for a categorical, etc.)

Contributor Author

I renamed it because we can't call dist_info_sym with the same name twice (because the underlying model will build twice with the same name, which is not allowed). I think Chang is going to fix it by changing the name_scope to variable_scope. @CatherineSue

Member

Changing it from name_scope to variable_scope won't fix the error. The cause is in Model instead of here.

Comment thread garage/tf/core/lstm.py
Comment thread garage/tf/core/lstm.py
Comment thread garage/tf/policies/base2.py
state_include_action=True,
forget_bias=True,
layer_normalization=False):
assert isinstance(env_spec.action_space, Discrete), (
Member

this is probably better as a ValueError

from tests.fixtures import TfGraphTestCase


class TestLSTM(TfGraphTestCase):
Member

can you test at least a couple of sets of known inputs/outputs here?

I know that this is implemented using tf.keras.layers.LSTM, but if we are going to rely on that we should at least also have one smoke test to make sure that the values are correct.

most of the work can be done by mocking out the keras layer and ensuring we are calling the correct set of keras APIs

it is also possible to assert some basic truths which should hold for all LSTMs, e.g. does the output have the correct length for the input? etc.

Member

i think overall this testing strategy is okay, but please add some simple checks too, e.g.

  • output length vs input length
  • gradient path (or lack thereof) between pairs of inputs/outputs
  • non-zero hidden states
  • etc.

perhaps you can take a look at the tests from lasagne for some ideas: https://github.com/Lasagne/Lasagne/blob/master/lasagne/tests/layers/test_recurrent.py
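The kind of property-style checks suggested above can be sketched in plain NumPy against a hand-rolled LSTM step (the names, shapes, and gate ordering here are illustrative assumptions, not garage's actual lstm() API):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, w_x, w_h, b):
    """One LSTM step; weight columns are packed [input, forget, cell, output]."""
    gates = x @ w_x + h @ w_h + b
    i, f, g, o = np.split(gates, 4, axis=1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.RandomState(0)
batch, input_dim, hidden_dim = 2, 3, 4
x = rng.randn(batch, input_dim)
h = np.zeros((batch, hidden_dim))
c = np.zeros((batch, hidden_dim))
w_x = rng.randn(input_dim, 4 * hidden_dim)
w_h = rng.randn(hidden_dim, 4 * hidden_dim)
b = np.zeros(4 * hidden_dim)

h, c = lstm_step(x, h, c, w_x, w_h, b)
# "basic truths which should hold for all LSTMs":
assert h.shape == (batch, hidden_dim)  # output has the expected length
assert np.any(h != 0)                  # hidden state becomes non-zero
assert np.all(np.abs(h) < 1.0)         # tanh/sigmoid bound the output
```

A reference implementation like this also gives the test suite known inputs/outputs to compare against, independent of keras.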

@ryanjulian (Member)

It seems you may have forgotten to cover the state_include_action case for LSTM:

https://codecov.io/gh/rlworkgroup/garage/commit/07d0d3155ce9961affb152589e9dc3545f487831#D12-207

with tf.variable_scope(self._variable_scope):
outputs, _, _, _, _, _ = self.model.build(
all_input_var,
*self.model.networks['default'].inputs[1:4],
Member

I think this line is hard to read.

Member

yeah this is pretty crazy. what even is inputs[1:4]?

Contributor Author

It is crazy. I will fix it.

@ahtsan ahtsan force-pushed the categorical_lstm_policy_with_model branch from 07d0d31 to d7d6d50 Compare May 2, 2019 23:48
codecov Bot commented May 3, 2019

Codecov Report

Merging #642 into master will increase coverage by 0.83%.
The diff coverage is 83.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #642      +/-   ##
==========================================
+ Coverage   61.49%   62.32%   +0.83%     
==========================================
  Files         160      163       +3     
  Lines        9247     9444     +197     
  Branches     1249     1262      +13     
==========================================
+ Hits         5686     5886     +200     
+ Misses       3254     3240      -14     
- Partials      307      318      +11
Impacted Files Coverage Δ
...e/tf/policies/categorical_mlp_policy_with_model.py 96% <100%> (ø) ⬆️
garage/tf/models/__init__.py 100% <100%> (ø) ⬆️
garage/experiment/local_tf_runner.py 80.18% <100%> (ø) ⬆️
garage/tf/models/base.py 93.45% <100%> (+0.74%) ⬆️
...tf/policies/deterministic_mlp_policy_with_model.py 93.02% <100%> (ø) ⬆️
garage/tf/models/lstm_model.py 100% <100%> (ø)
...rage/tf/policies/gaussian_mlp_policy_with_model.py 100% <100%> (ø) ⬆️
garage/tf/core/lstm.py 100% <100%> (ø)
.../tf/policies/categorical_conv_policy_with_model.py 96% <100%> (ø) ⬆️
garage/tf/algos/npo.py 93.42% <100%> (+0.03%) ⬆️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2001a3b...a8beb38. Read the comment docs.

@ahtsan ahtsan force-pushed the categorical_lstm_policy_with_model branch from f034d66 to 1aa4980 Compare May 3, 2019 05:39
Comment thread examples/tf/trpo_cartpole_recurrent_with_model.py
Comment thread garage/tf/algos/npo.py Outdated
return loss, pol_mean_kl

def _build_entropy_term(self, i):
def _build_entropy_term(self, i, policy_dist_info):
Member

please don't make these functions dependent on anything but i

Member

it is very easy to make a loss function which is a tangle of calls to other functions. we instead make sure all components of the loss function are only dependent on i (the input spec, basically) which makes it easy to follow.

this will be much easier once the loss function is a model.

Contributor Author

I see. I modified it because _build_entropy_term actually reuses the same policy_dist_info symbolic tensor as _build_policy_loss. I will come up with another way.

Member

you can just not reuse it...

Contributor Author

We have to reuse it unless we call dist_info_sym with a different name, because each network name must be unique. That's why I appended "_entropy" to it.

Contributor Author

We can just append a "_2" to the name in the second call of dist_info_sym.

Member

sure
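The suffixing workaround agreed on above could look something like this (a hypothetical helper for illustration only; the actual Model bookkeeping in garage may differ):

```python
def unique_network_name(base, existing_names):
    """Append '_2', '_3', ... to `base` until it no longer collides.

    Hypothetical sketch of the naming workaround discussed above, so a
    second dist_info_sym call can build under a fresh network name.
    """
    if base not in existing_names:
        return base
    suffix = 2
    while '{}_{}'.format(base, suffix) in existing_names:
        suffix += 1
    return '{}_{}'.format(base, suffix)
```

For example, `unique_network_name('policy_dist_info', {'policy_dist_info'})` returns `'policy_dist_info_2'`, and a third collision would yield `'policy_dist_info_3'`.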

Comment thread garage/tf/models/base.py Outdated
# We store the variable_scope to reenter later when we reuse it
with tf.variable_scope(
self._name, reuse=False) as self._variable_scope:
with tf.variable_scope(self._name) as self._variable_scope:
Member

please explicitly assign the new scope to a member variable -- do not capture it as a coincidence of a context manager

Contributor Author

Forgot to modify it here. Will fix.

Member

bump

Comment thread garage/tf/models/base.py
@ryanjulian (Member) left a comment
see new comments. i think there are a few maintainability concerns left in this PR

Comment thread garage/tf/models/base.py Outdated
if spec:
c = namedtuple(network_name,
[*spec, 'input', 'output', 'inputs', 'outputs'])
in_spec = self.network_input_spec()
Member

this logic about spec is getting kind of complicated and i'm not sure i understand what's going on.

can you explain what the purpose of this code is? perhaps it would be good to break it into a well-documented helper function.

can you also help me understand the purpose of the input and output specs?

@ahtsan (Contributor Author) May 3, 2019

Basically the specs tell us what the inputs and outputs are. There are 4 cases:

Case 1: Single input/output model

Then we don't need any specs, since we don't need extra attributes for the networks. We only need input, inputs, output, outputs, which are the default ones in a network.

Case 2: Single input/Multiple output model

We set network_output_spec() so that we can assign names to the outputs, e.g.

def network_output_spec(self):
    return ['state', 'action']

Then we can do model.networks['default'].state and model.networks['default'].action
In this case, we don't need network_input_spec() since there is only one input. What we need is only input, which is already included by default.

Case 3: Multiple input/Single output model

Same as above, but instead of having network_output_spec, we have network_input_spec this time.

Case 4: Multiple input/output model

We have both network_output_spec and network_input_spec. Then we can have something like
model.networks['default'].state and model.networks['default'].action as the outputs, and model.networks['default'].state_input and model.networks['default'].action_input as the inputs, to make it more readable.

Without the spec, we can only do model.networks['default'].inputs[1:4].
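The four cases above can be sketched with a simplified stand-in (this is not the actual garage Model implementation; `make_network` and its arguments are hypothetical, but the namedtuple wiring mirrors the snippet quoted from base.py):

```python
from collections import namedtuple

def make_network(inputs, outputs, in_spec=(), out_spec=()):
    """Attach named attributes to a network's inputs/outputs.

    Simplified, hypothetical stand-in for the spec logic described above:
    spec names become extra fields alongside the default
    input/output/inputs/outputs attributes.
    """
    fields = [*in_spec, *out_spec, 'input', 'output', 'inputs', 'outputs']
    Network = namedtuple('Network', fields)
    named_ins = inputs[:len(in_spec)]
    named_outs = outputs[:len(out_spec)]
    return Network(*named_ins, *named_outs,
                   inputs[0], outputs[0], inputs, outputs)

# Case 4: multiple inputs and outputs, all addressable by name.
net = make_network(['s_in', 'a_in'], ['s_out', 'a_out'],
                   in_spec=['state_input', 'action_input'],
                   out_spec=['state', 'action'])
assert net.state_input == 's_in'
assert net.action == 'a_out'
assert net.input == 's_in'  # default attributes still present
```

With empty specs (Case 1), only the default input/output/inputs/outputs fields exist, which matches the behavior described above.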

Member

this seems pretty complex -- is there a way we can always have an input/output spec, even for multi-input and multi-output models?

i think always having a spec would simplify the logic significantly.

@ahtsan (Contributor Author) May 3, 2019

Do you mean we should make the specs not optional?
Right now the specs return an empty list by default, which means networks will only have input, inputs, output, outputs.

Member

i mean that you internally maintain the specs data structure regardless of whether the user has chosen to do multi input/output or not. then the single-input and single-output cases don't need to be handled with so much special logic.

the outer interface can stay the same.

Comment thread tests/helpers.py Outdated
x_ifco = np.matmul(input_val, w_x_ifco)
h_ifco = np.matmul(step_hidden, w_h_ifco)

x_i, x_f, x_c, x_o = np.split(x_ifco, 4, axis=1) # noqa: E501, pylint: disable=unbalanced-tuple-unpacking
Member

what is the justification for disabling E501 and the pylint warning?

Contributor Author

I couldn't figure out why pylint is complaining about that, so I disabled the pylint check, and since the line is too long I disabled E501 as well. I will dig into it and see what pylint is talking about.

Member

perhaps one of your return values is itself a tuple or list?

Contributor Author

no... They are all numpy.ndarray.

Contributor Author

Could it be because the interpreter cannot ensure the right-hand side can be split into 4? Is there any other way to pass pylint for this?

Member

generally pylint rules try to encourage being very explicit and writing code so that it can be statically analyzed for errors (e.g. using pylint).

in this case i think pylint has a good point, because the np.split operation is totally static (all inputs are known at compile time), but it has been written as a dynamic operation unnecessarily.

the rationale for this is that code is read much more often than it is written, so it's usually better to make it longer and easier to read rather than shorter and easier to write.

maybe it would be clearer if you just wrote it out. this way a reader doesn't have to look up documentation for np.split to find out how your array is unpacked.

x_i, x_f, x_c, x_o = x_ifco[:, :4], x_ifco[:, 4:8], x_ifco[:, 8:12], x_ifco[:, 12:16]
# or
x_i = x_ifco[:, :4]
x_f = x_ifco[:, 4:8]
x_c = x_ifco[:, 8:12]
x_o = x_ifco[:, 12:16]
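For reference, the two spellings agree; a quick NumPy check (using a width-16 array so each gate gets 4 columns, matching the hard-coded slices above):

```python
import numpy as np

x_ifco = np.arange(32.0).reshape(2, 16)

# Explicit slices, as suggested above (assumes 4 columns per gate).
x_i = x_ifco[:, :4]
x_f = x_ifco[:, 4:8]
x_c = x_ifco[:, 8:12]
x_o = x_ifco[:, 12:16]

# np.split into 4 equal parts along axis 1 yields the same blocks.
for explicit, split in zip((x_i, x_f, x_c, x_o),
                           np.split(x_ifco, 4, axis=1)):
    assert np.array_equal(explicit, split)
```

The explicit slices hard-code the per-gate width, so they only stay correct if the literals are updated whenever the hidden size changes; that is the readability/flexibility trade-off being discussed.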

Contributor Author

Got it.

@ahtsan ahtsan force-pushed the categorical_lstm_policy_with_model branch 2 times, most recently from ea988ff to 3d37267 Compare May 9, 2019 22:59
@ahtsan ahtsan force-pushed the categorical_lstm_policy_with_model branch from 3d37267 to a8beb38 Compare May 10, 2019 03:41
@ryanjulian ryanjulian merged commit cca8e9b into master May 10, 2019
@ryanjulian ryanjulian deleted the categorical_lstm_policy_with_model branch May 10, 2019 21:30