LSTM without layers#642

Merged
ryanjulian merged 6 commits into master from categorical_lstm_policy_with_model on May 10, 2019
Conversation

@ahtsan (Contributor) commented Apr 29, 2019

  • Add lstm, lstm_model, categorical_lstm_policy_with_model.
  • Unit tests for the above.
  • Test TRPO with categorical_lstm_policy_with_model

@ahtsan ahtsan requested a review from a team as a code owner April 29, 2019 06:23
@ryanjulian (Member) left a comment
overall this is very very good.

please make the test of lstm() a little more comprehensive and less reliant on comparisons to another library. that way we will also detect a breakage if keras breaks. it should also be a useful test if someone decides to replace the keras layer with something else.

Comment thread garage/tf/core/layers.py
Comment thread garage/tf/algos/npo.py Outdated
i.obs_var,
i.policy_state_info_vars,
name='policy_dist_info')
name='policy_dist_info_entropy')
Member

i don't think this is 'policy_dist_info_entropy' -- dist_info contains the parameters of the distribution (e.g. mean and std for a gaussian, probabilities for a categorical, etc.)

Contributor Author

I renamed it because we can't call dist_info_sym with the same name twice (because the underlying model will build twice with the same name, which is not allowed). I think Chang is going to fix it by changing the name_scope to variable_scope. @CatherineSue

Member

Changing it from name_scope to variable_scope won't fix the error. The cause is in Model instead of here.

Comment thread garage/tf/core/lstm.py
Comment thread garage/tf/core/lstm.py
Comment thread garage/tf/policies/base2.py
state_include_action=True,
forget_bias=True,
layer_normalization=False):
assert isinstance(env_spec.action_space, Discrete), (
Member

this is probably better as a ValueError

from tests.fixtures import TfGraphTestCase


class TestLSTM(TfGraphTestCase):
Member

can you test at least a couple of sets of known inputs/outputs here?

I know that this is implemented using tf.keras.layers.LSTM, but if we are going to rely on that we should at least also have one smoke test to make sure that the values are correct.

most of the work can be done by mocking out the keras layer and ensuring we are calling the correct set of keras APIs

it is also possible to assert some basic truths which should hold for all LSTMs, e.g. does the output have the correct length for the input? etc.

Member

i think overall this testing strategy is okay, but please add some simple checks too, e.g.

  • output length vs input length
  • gradient path (or lack thereof) between pairs of inputs/outputs
  • non-zero hidden states
  • etc.

perhaps you can take a look at the tests from lasagne for some ideas: https://github.com/Lasagne/Lasagne/blob/master/lasagne/tests/layers/test_recurrent.py
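The kind of property-style checks suggested above can be sketched in plain NumPy against a hand-rolled LSTM step (the names, shapes, and gate ordering here are illustrative assumptions, not garage's actual lstm() API):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, w_x, w_h, b):
    """One LSTM step; weight columns are packed [input, forget, cell, output]."""
    gates = x @ w_x + h @ w_h + b
    i, f, g, o = np.split(gates, 4, axis=1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

rng = np.random.RandomState(0)
batch, input_dim, hidden_dim = 2, 3, 4
x = rng.randn(batch, input_dim)
h = np.zeros((batch, hidden_dim))
c = np.zeros((batch, hidden_dim))
w_x = rng.randn(input_dim, 4 * hidden_dim)
w_h = rng.randn(hidden_dim, 4 * hidden_dim)
b = np.zeros(4 * hidden_dim)

h, c = lstm_step(x, h, c, w_x, w_h, b)
# "basic truths which should hold for all LSTMs":
assert h.shape == (batch, hidden_dim)  # output has the expected length
assert np.any(h != 0)                  # hidden state becomes non-zero
assert np.all(np.abs(h) < 1.0)         # tanh/sigmoid bound the output
```

A reference implementation like this also gives the test suite known inputs/outputs to compare against, independent of keras.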

@ryanjulian (Member)

It seems you may have forgotten to cover the state_include_action case for LSTM:

https://codecov.io/gh/rlworkgroup/garage/commit/07d0d3155ce9961affb152589e9dc3545f487831#D12-207

with tf.variable_scope(self._variable_scope):
outputs, _, _, _, _, _ = self.model.build(
all_input_var,
*self.model.networks['default'].inputs[1:4],
Member

I think this line is hard to read.

Member

yeah this is pretty crazy. what even is inputs[1:4]?

Contributor Author

It is crazy. I will fix it.

@ahtsan ahtsan force-pushed the categorical_lstm_policy_with_model branch from 07d0d31 to d7d6d50 Compare May 2, 2019 23:48
codecov Bot commented May 3, 2019

Codecov Report

Merging #642 into master will increase coverage by 0.83%.
The diff coverage is 83.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #642      +/-   ##
==========================================
+ Coverage   61.49%   62.32%   +0.83%     
==========================================
  Files         160      163       +3     
  Lines        9247     9444     +197     
  Branches     1249     1262      +13     
==========================================
+ Hits         5686     5886     +200     
+ Misses       3254     3240      -14     
- Partials      307      318      +11
Impacted Files Coverage Δ
...e/tf/policies/categorical_mlp_policy_with_model.py 96% <100%> (ø) ⬆️
garage/tf/models/__init__.py 100% <100%> (ø) ⬆️
garage/experiment/local_tf_runner.py 80.18% <100%> (ø) ⬆️
garage/tf/models/base.py 93.45% <100%> (+0.74%) ⬆️
...tf/policies/deterministic_mlp_policy_with_model.py 93.02% <100%> (ø) ⬆️
garage/tf/models/lstm_model.py 100% <100%> (ø)
...rage/tf/policies/gaussian_mlp_policy_with_model.py 100% <100%> (ø) ⬆️
garage/tf/core/lstm.py 100% <100%> (ø)
.../tf/policies/categorical_conv_policy_with_model.py 96% <100%> (ø) ⬆️
garage/tf/algos/npo.py 93.42% <100%> (+0.03%) ⬆️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2001a3b...a8beb38. Read the comment docs.

@ahtsan ahtsan force-pushed the categorical_lstm_policy_with_model branch from f034d66 to 1aa4980 Compare May 3, 2019 05:39
Comment thread examples/tf/trpo_cartpole_recurrent_with_model.py
Comment thread garage/tf/algos/npo.py Outdated
return loss, pol_mean_kl

def _build_entropy_term(self, i):
def _build_entropy_term(self, i, policy_dist_info):
Member

please don't make these functions dependent on anything but i

Member

it is very easy to make a loss function which is a tangle of calls to other functions. we instead make sure all components of the loss function are only dependent on i (the input spec, basically) which makes it easy to follow.

this will be much easier once the loss function is a model.

Contributor Author

I see. I modified it because _build_entropy_term actually reuses the same policy_dist_info symbolic tensor as _build_policy_loss. I will come up with another way.

Member

you can just not reuse it...

Contributor Author

We have to reuse it unless we call dist_info_sym with a different name, because each network name must be unique. That's why I appended "_entropy" to it.

Contributor Author

We can just append a "_2" to the name in the second call of dist_info_sym.

Member

sure
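The suffixing workaround agreed on above could look something like this (a hypothetical helper for illustration only; the actual Model bookkeeping in garage may differ):

```python
def unique_network_name(base, existing_names):
    """Append '_2', '_3', ... to `base` until it no longer collides.

    Hypothetical sketch of the naming workaround discussed above, so a
    second dist_info_sym call can build under a fresh network name.
    """
    if base not in existing_names:
        return base
    suffix = 2
    while '{}_{}'.format(base, suffix) in existing_names:
        suffix += 1
    return '{}_{}'.format(base, suffix)
```

For example, `unique_network_name('policy_dist_info', {'policy_dist_info'})` returns `'policy_dist_info_2'`, and a third collision would yield `'policy_dist_info_3'`.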

Comment thread garage/tf/models/base.py Outdated
# We store the variable_scope to reenter later when we reuse it
with tf.variable_scope(
self._name, reuse=False) as self._variable_scope:
with tf.variable_scope(self._name) as self._variable_scope:
Member

please explicitly assign the new scope to a member variable -- do not capture it as a coincidence of a context manager

Contributor Author

Forgot to modify it here. Will fix.

Member

bump

Comment thread garage/tf/models/base.py
@ryanjulian (Member) left a comment
see new comments. i think there are a few maintainability concerns left in this PR

Comment thread garage/tf/models/base.py Outdated
if spec:
c = namedtuple(network_name,
[*spec, 'input', 'output', 'inputs', 'outputs'])
in_spec = self.network_input_spec()
Member

this logic about spec is getting kind of complicated and i'm not sure i understand what's going on.

can you explain what the purpose of this code is? perhaps it would be good to break it into a well-documented helper function.

can you also help me understand the purpose of the input and output specs?

@ahtsan (Contributor Author) May 3, 2019

Basically the specs tell us what the inputs and outputs are. There are 4 cases:

Case 1: Single input/output model

Then we don't need any specs, since we don't need extra attributes for the networks. We only need input, inputs, output, outputs, which are the default ones in a network.

Case 2: Single input/Multiple output model

We set network_output_spec() so that we can assign names to the outputs, e.g.

def network_output_spec(self):
    return ['state', 'action']

Then we can do model.networks['default'].state and model.networks['default'].action
In this case, we don't need network_input_spec() since there is only one input. What we need is only input, which is already included by default.

Case 3: Multiple input/Single output model

Same as above, but instead of having network_output_spec, we have network_input_spec this time.

Case 4: Multiple input/output model

We have both network_output_spec and network_input_spec. Then we can have something like
model.networks['default'].state and model.networks['default'].action as the outputs, and model.networks['default'].state_input and model.networks['default'].action_input as the inputs, to make it more readable.

Without the spec, we can only do model.networks['default'].inputs[1:4].
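The four cases above can be sketched with a simplified stand-in (this is not the actual garage Model implementation; `make_network` and its arguments are hypothetical, but the namedtuple wiring mirrors the snippet quoted from base.py):

```python
from collections import namedtuple

def make_network(inputs, outputs, in_spec=(), out_spec=()):
    """Attach named attributes to a network's inputs/outputs.

    Simplified, hypothetical stand-in for the spec logic described above:
    spec names become extra fields alongside the default
    input/output/inputs/outputs attributes.
    """
    fields = [*in_spec, *out_spec, 'input', 'output', 'inputs', 'outputs']
    Network = namedtuple('Network', fields)
    named_ins = inputs[:len(in_spec)]
    named_outs = outputs[:len(out_spec)]
    return Network(*named_ins, *named_outs,
                   inputs[0], outputs[0], inputs, outputs)

# Case 4: multiple inputs and outputs, all addressable by name.
net = make_network(['s_in', 'a_in'], ['s_out', 'a_out'],
                   in_spec=['state_input', 'action_input'],
                   out_spec=['state', 'action'])
assert net.state_input == 's_in'
assert net.action == 'a_out'
assert net.input == 's_in'  # default attributes still present
```

With empty specs (Case 1), only the default input/output/inputs/outputs fields exist, which matches the behavior described above.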

Member

this seems pretty complex -- is there a way we can always have an input/output spec, even for multi-input and multi-output models?

i think always having a spec would simplify the logic significantly.

@ahtsan (Contributor Author) May 3, 2019

Do you mean we should make the specs not optional?
Right now the specs return an empty list by default, which means networks will only have input, inputs, output, outputs.

Member

i mean that you internally maintain the specs data structure regardless of whether the user has chosen to do multi input/output or not. then the single-input and single-output cases don't need to be handled with so much special logic.

the outer interface can stay the same.

Comment thread tests/helpers.py Outdated
x_ifco = np.matmul(input_val, w_x_ifco)
h_ifco = np.matmul(step_hidden, w_h_ifco)

x_i, x_f, x_c, x_o = np.split(x_ifco, 4, axis=1) # noqa: E501, pylint: disable=unbalanced-tuple-unpacking
Member

what is the justification for disabling E501 and the pylint warning?

Contributor Author

I couldn't figure out why pylint is complaining about that, so I disabled the pylint check, and since the line is too long I disabled E501 as well. I will dig into it and see what pylint is talking about.

Member

perhaps one of your return values is itself a tuple or list?

Contributor Author

no... They are all numpy.ndarray.

Contributor Author

Could it be because the interpreter cannot ensure the right-hand side can be split into 4? Is there any other way to pass pylint for this?

Member

generally pylint rules try to encourage being very explicit and writing code so that it can be statically analyzed for errors (e.g. using pylint).

in this case i think pylint has a good point, because the np.split operation is totally static (all inputs are known at compile time), but it has been written as a dynamic operation unnecessarily.

the rationale for this is that code is read much more often than it is written, so it's usually better to make it longer and easier to read rather than shorter and easier to write.

maybe it would be clearer if you just wrote it out. this way a reader doesn't have to look up documentation for np.split to find out how your array is unpacked.

x_i, x_f, x_c, x_o = x_ifco[:, :4], x_ifco[:, 4:8], x_ifco[:, 8:12], x_ifco[:, 12:16]
# or
x_i = x_ifco[:, :4]
x_f = x_ifco[:, 4:8]
x_c = x_ifco[:, 8:12]
x_o = x_ifco[:, 12:16]
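For reference, the two spellings agree; a quick NumPy check (using a width-16 array so each gate gets 4 columns, matching the hard-coded slices above):

```python
import numpy as np

x_ifco = np.arange(32.0).reshape(2, 16)

# Explicit slices, as suggested above (assumes 4 columns per gate).
x_i = x_ifco[:, :4]
x_f = x_ifco[:, 4:8]
x_c = x_ifco[:, 8:12]
x_o = x_ifco[:, 12:16]

# np.split into 4 equal parts along axis 1 yields the same blocks.
for explicit, split in zip((x_i, x_f, x_c, x_o),
                           np.split(x_ifco, 4, axis=1)):
    assert np.array_equal(explicit, split)
```

The explicit slices hard-code the per-gate width, so they only stay correct if the literals are updated whenever the hidden size changes; that is the readability/flexibility trade-off being discussed.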

Contributor Author

Got it.

@ahtsan ahtsan force-pushed the categorical_lstm_policy_with_model branch 2 times, most recently from ea988ff to 3d37267 Compare May 9, 2019 22:59
@ahtsan ahtsan force-pushed the categorical_lstm_policy_with_model branch from 3d37267 to a8beb38 Compare May 10, 2019 03:41
@ryanjulian ryanjulian merged commit cca8e9b into master May 10, 2019
@ryanjulian ryanjulian deleted the categorical_lstm_policy_with_model branch May 10, 2019 21:30