
CategoricalConvPolicy with model#618

Merged
ahtsan merged 9 commits into master from categorical_conv_policy_with_model
Apr 25, 2019

Conversation

@ahtsan
Contributor

@ahtsan ahtsan commented Apr 11, 2019

Time to work on CNN. This PR does the following:

  • Fix some small issues in cnn and remove the dense layer from it. Instead of having a "CNNnMLP", it's more reasonable to just do CNN -> MLP.
  • Introduce the notion of self.add_model() and self.build_models() in policy. This is needed for stacking multiple models. In the future, when we eventually make the policy itself a Model, we will also put this notion into the rest of the models.
  • And of course, categorical_conv_policy with model, and tests.
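The stacking idea above can be sketched in plain Python. This is a hedged illustration only (no TensorFlow; Model, Policy, and the string-composing build() are stand-ins, not the actual garage API):

```python
# Toy version of the add_model() / build_models() notion: a policy keeps
# an ordered list of models and chains them so each model's output feeds
# the next. A real garage Model would build a TF subgraph in build();
# here build() just records the composition as a string.

class Model:
    def __init__(self, name):
        self.name = name

    def build(self, input_var, name=None):
        # Record "this model applied to that input".
        return '{}({})'.format(self.name, input_var)

class Policy:
    def __init__(self):
        self._models = []

    def add_model(self, model):
        self._models.append(model)

    def build_models(self, input_var, name=None):
        out = input_var
        for model in self._models:
            out = model.build(out, name=name)
        return out

policy = Policy()
policy.add_model(Model('CNN'))
policy.add_model(Model('MLP'))
print(policy.build_models('obs'))  # → MLP(CNN(obs))
```

This is exactly the CNN -> MLP composition the first bullet describes: the CNN's output tensor becomes the MLP's input.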

@ahtsan ahtsan requested a review from a team as a code owner April 11, 2019 22:02
@codecov

codecov Bot commented Apr 11, 2019

Codecov Report

Merging #618 into master will increase coverage by 0.55%.
The diff coverage is 90.47%.


@@            Coverage Diff             @@
##           master     #618      +/-   ##
==========================================
+ Coverage   60.63%   61.18%   +0.55%     
==========================================
  Files         156      159       +3     
  Lines        9069     9159      +90     
  Branches     1241     1242       +1     
==========================================
+ Hits         5499     5604     +105     
+ Misses       3260     3238      -22     
- Partials      310      317       +7
Impacted Files Coverage Δ
...e/tf/policies/categorical_mlp_policy_with_model.py 95.91% <ø> (ø) ⬆️
garage/tf/core/parameter.py 100% <ø> (ø) ⬆️
...tf/regressors/gaussian_mlp_regressor_with_model.py 100% <ø> (ø) ⬆️
...tf/policies/deterministic_mlp_policy_with_model.py 92.85% <ø> (ø) ⬆️
...rage/tf/policies/gaussian_mlp_policy_with_model.py 100% <ø> (ø) ⬆️
garage/tf/policies/discrete_qf_derived_policy.py 96% <ø> (ø) ⬆️
garage/tf/core/mlp.py 100% <ø> (ø) ⬆️
garage/tf/q_functions/discrete_mlp_q_function.py 42.1% <0%> (ø) ⬆️
garage/tf/policies/categorical_mlp_policy.py 96% <100%> (ø) ⬆️
garage/tf/models/__init__.py 100% <100%> (ø) ⬆️
... and 23 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5b8cbc6...a444a91.

Comment thread garage/tf/policies/categorical_conv_policy_with_model.py
Comment thread garage/tf/policies/base2.py Outdated
out = input_var
for model in self._models[:-1]:
    out = model.build(out, name=name)
self.model = self._models[-1]
Member

could you remind me why the last model is self.model?

Contributor Author

This is actually a temp-fix from my previous implementation. We don't need self.model anymore, since it has been replaced by the list of models. Nice catch.

Comment thread garage/tf/core/cnn.py Outdated
"""
strides = [1, stride, stride, 1]

if padding not in ['SAME', 'VALID']:
Member

i think TensorFlow would also throw a ValueError. Any reason you want to raise the error here?

Contributor Author

I was not aware of that. Then I think it's fine to let TensorFlow handle it.

Comment thread garage/tf/models/cnn_model.py Outdated
CNN Model.

Args:
filter_dims: Dimension of the filters.
Member

Please add types to these parameters

e.g. filter_dims (tuple[int]): ...
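For reference, the requested convention looks like this. A hedged sketch only: the signature below is hypothetical, not the actual garage/tf/core/cnn.py function.

```python
# Google-style docstring with a type in parentheses after each
# parameter name, as the reviewer asks for. The stub body does nothing;
# only the docstring format is the point here.

def cnn(input_var, filter_dims, num_filters, strides, padding='SAME'):
    """Build a CNN (illustrative stub, not garage code).

    Args:
        input_var (tf.Tensor): Input tensor, in NHWC format.
        filter_dims (tuple[int]): Dimension of the filters. For example,
            (3, 3) means two convolutional layers, each with 3x3 filters.
        num_filters (tuple[int]): Number of filters per conv layer.
        strides (tuple[int]): Stride of the sliding window per layer.
        padding (str): Padding algorithm, either 'SAME' or 'VALID'.
    """
    pass
```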

Member

@ryanjulian ryanjulian left a comment

This LGTM mostly. Other than basic comments, please resolve:

  • Whether the inputs/outputs API should be on Policy or Model
  • The question of the add_model API for policy -- can't this just be a simple derived model class instead?

Comment thread garage/tf/policies/base2.py Outdated

def build_models(self, input_var, name=None):
    out = input_var
    for model in self._models:
Member

what if my models are not sequential?

Perhaps instead you can define a model class Sequential

class Sequential(Model):
    
    def __init__(self, *models):
        self._models = models

    def _build(self, input_var, name=None):
        out = input_var
        for model in self._models:
            out = model.build(out, name=name)
        
        return out

Contributor Author

That's a good idea.
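For illustration, here is the suggested Sequential pattern in action as a plain-Python sketch. The toy Model below stands in for garage.tf.models.Model, whose build() really constructs a TensorFlow subgraph; everything here is illustrative, not the merged implementation:

```python
# A minimal Sequential: composing models by feeding each one's output
# into the next, with no assumption baked into the policy itself.

class Model:
    def __init__(self, name):
        self.name = name

    def build(self, input_var, name=None):
        # Stand-in for graph construction: compose names as a string.
        return '{}({})'.format(self.name, input_var)

class Sequential(Model):
    """Chains models so each model's output feeds the next."""

    def __init__(self, *models):
        super().__init__('Sequential')
        self._models = models

    def build(self, input_var, name=None):
        out = input_var
        for model in self._models:
            out = model.build(out, name=name)
        return out

pipeline = Sequential(Model('CNN'), Model('MLP'))
print(pipeline.build('obs'))  # → MLP(CNN(obs))
```

The advantage over build_models() on the policy is that non-sequential topologies simply use a different composite Model, leaving the policy unchanged.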

Comment thread garage/tf/policies/base2.py Outdated
return self._models[0].networks['default'].input

@property
def outputs(self):
Member

maybe this should just be an API on Model instead?

Contributor Author

@ahtsan ahtsan Apr 12, 2019

Yes. We should add this into Model. Something like

class Model(...):
    ...
    @property
    def input(self):
        return self.networks['default'].input

    @property
    def output(self):
        return self.networks['default'].output

and for the Sequential model, we can override as

class Sequential(Model):
    ...
    @property
    def input(self):
        return self._models[0].networks['default'].input

    @property
    def output(self):
        return self._models[-1].networks['default'].output

It only works with akro.tf.Discrete action space.

Args:
env_spec: Environment specification.
Member

please include types in docstrings

@overrides
def get_action(self, observation):
    """Return a single action."""
    flat_obs = self.observation_space.flatten(observation)
Member

does this flatten the 2D image, or only the batch?

Contributor Author

@ahtsan ahtsan Apr 12, 2019

currently it flattens the 2D image, since we are doing self.obs_dim = env_spec.observation_space.flat_dim in the policy.

I actually think this is a mistake; it should not flatten the observation here. For example, in a pixel environment we want to pass the original image input with shape (w, h, c) to the policy. Therefore, we should do self.obs_dim = env_spec.observation_space.shape instead.

This was missed because the CNNModel was mocked out.

Member

can you fix it?
i don't think the image should be flattened. that seems wrong.
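The shape-vs-flat_dim distinction under discussion can be illustrated in plain Python. ObservationSpace below is a hypothetical stand-in, not the real akro/garage class:

```python
# Flattening a (w, h, c) image destroys the spatial structure a CNN
# needs, so a conv policy should keep observation_space.shape rather
# than observation_space.flat_dim.

class ObservationSpace:
    """Stand-in for an image observation space (illustrative only)."""

    def __init__(self, shape):
        self.shape = shape  # e.g. (width, height, channels)

    @property
    def flat_dim(self):
        # Total number of scalar entries once the image is flattened.
        dim = 1
        for d in self.shape:
            dim *= d
        return dim

obs_space = ObservationSpace((32, 32, 3))

# What the MLP policies do -- fine for dense layers:
mlp_obs_dim = obs_space.flat_dim   # 3072, spatial structure lost

# What a conv policy should use -- keeps (w, h, c) intact:
conv_obs_dim = obs_space.shape     # (32, 32, 3)

print(mlp_obs_dim, conv_obs_dim)
```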

Comment thread garage/tf/core/cnn.py
of intermediate dense layer(s).
hidden_b_init: Initializer function for the bias
of intermediate dense layer(s).
output_nonlinearity: Activation function for
Member

please take a moment to add types to this docstring

Comment thread garage/tf/core/cnn.py Outdated
pool_stride: The stride of the pooling layer(s).
pool_shapes: Dimension of the pooling layer(s).
pool_strides: The strides of the pooling layer(s).
padding: The type of padding algorithm to use, from "SAME", "VALID".
Member

single quotes

Comment thread garage/tf/models/cnn_model.py Outdated
num_filters=self._num_filters,
strides=self._strides,
padding=self._padding,
name="cnn")
Member

single quotes

Comment thread garage/tf/policies/__init__.py Outdated
__all__ = [
"Policy",
"StochasticPolicy",
"CategoricalConvPolicy",
Member

please take a moment to replace all these with single quotes (when you visit a file)

Comment thread garage/tf/core/cnn.py Outdated
hidden_w_init=tf.glorot_uniform_initializer(),
hidden_b_init=tf.zeros_initializer()):
"""
CNN model. Based on 'NHWC' data format: [batch, height, width, channel].
Member

i don't think it's a model

Member

@ryanjulian ryanjulian left a comment

LGTM. please submit once docstrings are updated. see my suggestion about how to name inner models in a Sequential

Comment thread garage/tf/models/sequential.py Outdated
Sequential Model.

Args:
name: Variable scope of the Sequential model.
Member

types, please

Comment thread garage/tf/models/sequential.py
Comment thread garage/tf/core/cnn.py Outdated
CNN. Based on 'NHWC' data format: [batch, height, width, channel].

Args:
input_var: Input tf.Tensor to the CNN.
Member

types

Comment thread garage/tf/core/cnn.py Outdated
CNN model. Based on 'NHWC' data format: [batch, height, width, channel].

Args:
input_var: Input tf.Tensor to the CNN.
Member

still missing types

Comment thread garage/tf/core/cnn.py Outdated
pool_strides(tuple[int]): The strides of the pooling layer(s). For
example, (2, 2) means that all the pooling layers have
strides (2, 2).
padding: The type of padding algorithm to use,
Member

please provide a type

Comment thread garage/tf/models/base.py Outdated
inputs: Tensor input(s), recommended to be position arguments, e.g.
def build(self, state_input=None, action_input=None, name=None).
It would be usually same as the inputs in build().
name: Variable scope of the inner model, if exist.
Member

please update the docstrings with types

Comment thread garage/tf/models/base.py
def inputs(self):
    return self.networks['default'].inputs

@property
Member

you should document these new properties with docstrings

Comment thread garage/tf/core/mlp.py
Comment thread garage/tf/models/cnn_model.py Outdated
strides(tuple[int]): The stride of the sliding window. For example,
(1, 2) means there are two convolutional layers. The stride of the
filter for first layer is 1 and that of the second layer is 2.
name: Variable scope of the cnn model.
Member

please provide a type for every parameter

return ['sample', 'mean', 'log_std', 'std_param', 'dist']

def _build(self, state_input):
def _build(self, state_input, name=None):
Member

please take a moment to update the docstrings here with types.

self._layer_normalization = layer_normalization

def _build(self, state_input):
def _build(self, state_input, name=None):
Member

please take a moment to update the docstring here with types

self._name = name
self._env_spec = env_spec
self._variable_scope = tf.VariableScope(reuse=False, name=name)
self._models = []
Member

please take a moment to make these docstrings complete


@overrides
def dist_info_sym(self, obs_var, state_info_vars=None, name=None):
    """Symbolic graph of the distribution."""
Member

please provide full docstrings for all methods (unless the parent class provides a docstring which is equivalent)

hidden_nonlinearity=hidden_nonlinearity,
output_nonlinearity=output_nonlinearity,
layer_normalization=layer_normalization)
layer_normalization=layer_normalization,
Member

please take a moment to add types to these docstrings

self.model = MLPModel(
output_dim=action_dim,
name=name,
name='MLPModel',
Member

please take a moment to add types to these docstrings

std_output_nonlinearity=std_output_nonlinearity,
std_parameterization=std_parameterization,
layer_normalization=layer_normalization)
layer_normalization=layer_normalization,
Member

please take a moment to add types to these docstrings

]

def _build(self, state_input):
def _build(self, state_input, name=None):
Member

please take a moment to add types to these docstrings

Comment thread garage/tf/core/mlp.py Outdated
For example, (32, 32) means this MLP consists of two
hidden layers, each with 32 hidden units.
name (str): Network name, also the variable scope.
hidden_nonlinearity: Activation function for
Member

what about these types?

Contributor Author

They are functions. Make it hidden_nonlinearity(function)?

Comment thread garage/tf/models/cnn_model.py Outdated
name (str): Model name, also the variable scope.
padding (str): The type of padding algorithm to use,
either 'SAME' or 'VALID'.
hidden_nonlinearity: Activation function for
Member

what about these types?

Comment thread garage/tf/models/sequential.py Outdated
Args:
name: Variable scope of the Sequential model.
name (str): Model name, also the variable scope.
models (list[garage.Model]): The models to be connected
Member

garage.tf.models.Model?

Contributor Author

oh yes. It only takes garage.tf.models.Model.

hidden_sizes (list[int]): Output dimension of dense layer(s).
For example, (32, 32) means the MLP of this policy consists
of two hidden layers, each with 32 hidden units.
hidden_nonlinearity: Activation function for
Member

they are tf.Operation, right?

Contributor Author

I don't think they are tf.Operation. In python doing type(tf.nn.relu) returns <class 'function'>.


Member

@ryanjulian ryanjulian left a comment

Looking great. Thanks for updating the docstrings. I think activation functions are just tf.Tensor. Feel free to submit once they are all cleared up.

For example, (32, 32) means the MLP of this policy consists
of two hidden layers, each with 32 hidden units.
hidden_nonlinearity: Activation function for
intermediate dense layer(s).
Member

indent

For example, (32, 32) means the MLP of this policy consists of two
hidden layers, each with 32 hidden units.
hidden_nonlinearity: Activation function for
intermediate dense layer(s).
Member

indent

@ahtsan ahtsan force-pushed the categorical_conv_policy_with_model branch from a963ba1 to 4d51f86 on April 25, 2019 00:45
@ahtsan ahtsan force-pushed the categorical_conv_policy_with_model branch from caafe45 to a444a91 on April 25, 2019 04:28
@ahtsan ahtsan merged commit f11c494 into master Apr 25, 2019
@ahtsan ahtsan deleted the categorical_conv_policy_with_model branch April 25, 2019 06:13


3 participants