
Refactor *MLPPolicy with model #574

Merged
ahtsan merged 10 commits into master from mlp_policy_with_model
Mar 26, 2019

Conversation

@ahtsan
Contributor

@ahtsan ahtsan commented Mar 7, 2019

Custom pickle logic is added: basically, remove all unpicklable objects in __getstate__ and reconstruct them in __setstate__.

Also, the default model (model.networks['default']) will be built in __setstate__, so the policy can be used right away after unpickling.
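The general pattern being described — drop unpicklable members in `__getstate__`, rebuild them in `__setstate__` — can be sketched with a toy class (names here are illustrative, not garage's actual code; a `threading.Lock` stands in for the unpicklable TF graph/session state):

```python
import pickle
import threading

class PolicyLike:
    """Toy object with an unpicklable member, mirroring the pattern above."""

    def __init__(self):
        self._lock = threading.Lock()  # unpicklable, like TF graph/session state
        self.weights = [1.0, 2.0]      # plain data, pickles fine

    def __getstate__(self):
        # Drop unpicklable members before pickling.
        state = self.__dict__.copy()
        del state['_lock']
        return state

    def __setstate__(self, state):
        # Restore the picklable state, then rebuild what was dropped,
        # so the object is usable immediately after unpickling.
        self.__dict__.update(state)
        self._lock = threading.Lock()

obj = pickle.loads(pickle.dumps(PolicyLike()))
assert obj.weights == [1.0, 2.0]
```

In the real policy, the analogue of rebuilding `_lock` is rebuilding `model.networks['default']` inside `__setstate__`.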

@ahtsan ahtsan requested a review from a team as a code owner March 7, 2019 21:46
Comment thread garage/tf/policies/base2.py Outdated
Comment thread garage/tf/policies/base2.py Outdated
@codecov

codecov Bot commented Mar 7, 2019

Codecov Report

Merging #574 into master will increase coverage by 0.38%.
The diff coverage is 88.88%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #574      +/-   ##
=========================================
+ Coverage   60.51%   60.9%   +0.38%     
=========================================
  Files         141     142       +1     
  Lines        8833    8857      +24     
  Branches     1251    1239      -12     
=========================================
+ Hits         5345    5394      +49     
+ Misses       3155    3130      -25     
  Partials      333     333
Impacted Files Coverage Δ
garage/tf/core/mlp.py 100% <ø> (ø) ⬆️
garage/tf/models/__init__.py 100% <100%> (ø)
garage/tf/policies/deterministic_mlp_policy.py 80% <100%> (ø) ⬆️
garage/tf/models/mlp_model.py 100% <100%> (ø)
garage/tf/policies/__init__.py 100% <100%> (ø) ⬆️
garage/tf/q_functions/discrete_mlp_q_function.py 42.1% <27.27%> (-7.9%) ⬇️
garage/tf/policies/base2.py 62.79% <70%> (-10.74%) ⬇️
garage/tf/models/gaussian_mlp_model.py 96.2% <89.28%> (+20.2%) ⬆️
...tf/policies/deterministic_mlp_policy_with_model.py 92.85% <92.85%> (ø)
...e/tf/policies/categorical_mlp_policy_with_model.py 95.91% <95.91%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2d4f4f...8d1b53e. Read the comment docs.

Comment thread garage/tf/policies/continuous_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/deterministic_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/deterministic_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/deterministic_mlp_policy_with_model.py Outdated
Comment thread tests/garage/tf/policies/test_categorical_mlp_policy_with_model.py Outdated
Comment thread tests/garage/tf/policies/test_deterministic_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/continuous_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/deterministic_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/base2.py Outdated
Comment thread garage/tf/policies/base2.py Outdated
@CatherineSue
Member

We should probably test the refactored policies with the benchmark. Doing so now would save us from further debugging of the benchmark tests later.

@ryanjulian
Member

@CatherineSue I agree.

But I think if we can figure out good unit tests for policies, we won't worry so much about messing them up when we refactor.

@ryanjulian
Member

For instance, maybe we can define some unit tests which run basic supervised learning using each model and make sure it produces known-good results.

If GaussianMLPModel can properly fit a (state, action) dataset, then it seems unlikely that the optimization of GaussianMLPPolicy would get messed up.
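The proposed test could look roughly like this — fit a model on a known (state, action) dataset and assert it reaches a known-good fit. A plain-numpy linear regressor stands in for GaussianMLPModel here purely for illustration; the real test would train the TF model:

```python
import numpy as np

# Known dataset: actions are a fixed linear function of states.
rng = np.random.RandomState(0)
states = rng.uniform(-1, 1, size=(256, 3))
true_w = np.array([0.5, -1.0, 2.0])
actions = states @ true_w

# "Supervised learning" step: gradient descent on mean-squared error.
w = np.zeros(3)
for _ in range(500):
    grad = states.T @ (states @ w - actions) / len(states)
    w -= 0.5 * grad

# Known-good result: the model recovers the generating function.
mse = np.mean((states @ w - actions) ** 2)
assert mse < 1e-6
```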

Member

@CatherineSue CatherineSue left a comment


I agree we could come up with a good unit test to make sure the policies work well.

Comment thread garage/tf/models/continuous_mlp_model.py Outdated
Comment thread garage/tf/policies/categorical_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/continuous_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/continuous_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/deterministic_mlp_policy_with_model.py
@ahtsan ahtsan force-pushed the mlp_policy_with_model branch 2 times, most recently from 80417c0 to 9c6b694 Compare March 13, 2019 06:38
Comment thread garage/experiment/local_tf_runner.py Outdated
Comment thread garage/tf/policies/base2.py Outdated
Comment thread tests/garage/tf/policies/test_gaussian_mlp_policy_with_model.py Outdated
Comment thread tests/garage/tf/policies/test_gaussian_mlp_policy_with_model.py Outdated
Comment thread tests/garage/tf/policies/test_gaussian_mlp_policy_with_model.py Outdated
Comment thread tests/garage/tf/policies/test_gaussian_mlp_policy_with_model.py Outdated
Comment thread garage/tf/policies/base2.py Outdated
Member

@CatherineSue CatherineSue left a comment


Are we not going to add the unit tests that assure the policies with models work?

@ryanjulian
Member

@CatherineSue it's your review, so you get to decide what the criteria are.

I think at this stage a couple simple unit tests would be reasonable:

  1. Initialize the network weights to a known state
  2. Known input/output pair
    Make sure that a known input produces the proper output
  3. Empirical mean/variance
    Make sure that a known input produces the expected empirical mean/variance (by sampling n times and calculating mean/variance)

I think a test where we attempt to train each model on a known dataset is a gold standard, but perhaps is too much for this PR.
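The three simple tests above can be sketched with a stand-in linear-Gaussian "policy" (the class and names here are illustrative, not garage's API — the real tests would exercise the TF policies):

```python
import numpy as np

class TinyGaussianPolicy:
    """Stand-in policy: Gaussian with a linear mean and fixed std."""

    def __init__(self, w, std):
        self.w = w        # (1) weights initialized to a known state
        self.std = std

    def mean(self, obs):
        return obs @ self.w

    def sample(self, obs, rng):
        return rng.normal(self.mean(obs), self.std)

policy = TinyGaussianPolicy(w=np.array([2.0, -1.0]), std=0.1)
obs = np.array([1.0, 3.0])

# (2) known input/output pair: 1*2.0 + 3*(-1.0) = -1.0
assert policy.mean(obs) == -1.0

# (3) empirical mean/variance: sample n times and compare to the
# known distribution parameters within a tolerance.
rng = np.random.RandomState(0)
samples = np.array([policy.sample(obs, rng) for _ in range(10000)])
assert abs(samples.mean() - (-1.0)) < 0.01
assert abs(samples.std() - 0.1) < 0.01
```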

@ryanjulian
Member

It does seem like the new version of this PR misses coverage on several significant branches:
https://codecov.io/gh/rlworkgroup/garage/compare/c4b49939df36487c49282ddb6bd9af647ea05b87...79f4770dcfd92294f026a12d16125e9e9b6dc601/diff

Comment thread garage/tf/models/gaussian_mlp_model.py
@ahtsan
Contributor Author

ahtsan commented Mar 15, 2019

My thought about adding unit tests is: if we have already tested MLPModel, then for any policy that uses MLPModel, we only need to test the policy-level functions, like get_action_sym or dist_info_sym, and make sure their output matches the model's output. The reason is that we already know MLPModel works properly.
To make sure policy.get_action works properly, we can directly compare

out = self.sess.run(policy.model.output, feed_dict={policy.model.input: [obs]})
policy.get_action(obs) == out

But then, it's not that necessary, since get_action is essentially the same as self.sess.run(policy.model.output, feed_dict={policy.model.input: [obs]}) in the code.


Then, since we have already tested mlp, and MLPModel basically wraps a single mlp, we already know it is working properly.

@ryanjulian
Member

I agree. You can implement this by mocking out the model.
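Mocking the model out lets a test cover only the policy-level glue. A minimal sketch, assuming a hypothetical `Policy` whose `get_action` batches the observation and unbatches the result (not garage's actual classes):

```python
from unittest import mock

import numpy as np

class Policy:
    """Illustrative policy whose only job is batching glue around a model."""

    def __init__(self, model):
        self.model = model

    def get_action(self, obs):
        # Policy-level logic under test: wrap obs in a batch of one,
        # then strip the batch dimension from the model output.
        return self.model.predict(np.asarray([obs]))[0]

# Replace the (already-tested) model with a mock returning a fixed batch.
fake_model = mock.Mock()
fake_model.predict.return_value = np.array([[0.5, -0.5]])

policy = Policy(fake_model)
action = policy.get_action(np.zeros(3))

fake_model.predict.assert_called_once()
assert np.array_equal(action, np.array([0.5, -0.5]))
```

The test then verifies only that the policy forwards inputs and unpacks outputs correctly, without re-testing the model itself.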

Comment thread tests/garage/tf/models/test_gaussian_mlp_model.py Outdated
Comment thread tests/garage/tf/models/test_gaussian_mlp_model.py Outdated
from tests.fixtures.envs.dummy import DummyDiscreteEnv


class SimpleMLPModel(Model):
Member


please put these fixtures into a test library

def test_output_values(self):
model = MLPModel(
output_dim=self.output_dim,
hidden_sizes=(2, ),
Member


what about hidden_size = 1?

in most software, bugs center around the values 0, 1, 2, and 3+. it's good to test all of those cases (or whichever ones are valid for your test)

Comment thread tests/garage/tf/policies/test_categorical_mlp_policy_with_model.py Outdated

def test_get_action(self):
action, _ = self.policy.get_action(self.obs)
assert self.env.action_space.contains(np.asarray([action]))
Member


why is np.asarray required here? why is get_action returning something that needs to be converted before being used with a space?

Contributor Author


I think it's actually an issue.
e.g. in akro.box, the contains function is as follows:

    def contains(self, x):
        """Return boolean specifying if x is a valid member of this space."""
        return x.shape == self.shape and (x >= self.low).all() and (
            x <= self.high).all()

It assumes the input is a numpy array.
Because the action returned by policy.get_action can be a scalar value, it won't work.
Should we force the output of policy.get_action to be a numpy array?
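The failure mode can be reproduced without akro — a toy sketch where `contains` mimics the box check quoted above (the bounds here are made up for illustration):

```python
import numpy as np

# Toy box space [-1, 1] with shape (1,), mimicking akro.box.contains.
low, high, shape = np.array([-1.0]), np.array([1.0]), (1,)

def contains(x):
    return x.shape == shape and (x >= low).all() and (x <= high).all()

scalar_action = 0.3  # what get_action may return for a 1-D action space
try:
    contains(scalar_action)
    raised = False
except AttributeError:
    raised = True  # a plain float has no .shape attribute
assert raised

# Normalizing the action to an ndarray of shape (1,) makes the check pass.
assert contains(np.asarray([scalar_action]))
```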

Member


it looks like you found an n = 1 vs n > 1 bug. good job!

i think that the output of get_action should always pass the test

a = foo.get_action(bar)
assert a in foo.action_space

bs = foo.get_actions(bars)
for b in bs:
    assert b in foo.action_space

the question is -- is this a garage bug or an akro bug?

i think it is a garage bug. the output type of a policy should be fixed -- it shouldn't be "a scalar sometimes and an np.ndarray other times." if the value is single-dimensional, i think it should output an ndarray with shape (1,).

of course, i could be persuaded otherwise if this made code elsewhere really nasty.

Comment thread tests/garage/tf/policies/test_gaussian_mlp_policy_with_model.py Outdated
class TestGaussianMLPPolicyWithModel(TfGraphTestCase):
def setUp(self):
super().setUp()
self.env = TfEnv(DummyBoxEnv())
Member


please test several output_dim

@ryanjulian
Member

These tests are looking great

Please see my comments about

  • Moving your test fixtures into libraries in tests/
  • Testing a variety of relevant values, especially for output_dim and hidden_sizes

Other things I noticed:

Comment thread garage/sampler/parallel_sampler.py Outdated

def _worker_set_seed(_, seed):
logger.log("Setting seed to %d" % seed)
# import here to avoid circular dependency
Member


uhh can we restructure this to avoid a circular dependency instead?

Member


i think @zhanpenghe fixed this?

Comment thread garage/tf/models/gaussian_mlp_model.py Outdated
p = tf.constant_initializer(self._init_std_param)
std_network = parameter(

def p():
Member


just use a lambda

@ahtsan ahtsan force-pushed the mlp_policy_with_model branch 2 times, most recently from 3fbe49d to 4f0930b Compare March 22, 2019 20:56
Member

@ryanjulian ryanjulian left a comment


Looks good.

What about ContinuousMLPQFunction?

Comment thread setup.cfg
not-context-manager,
c-extension-no-member
c-extension-no-member,
wrong-import-order
Member


i would prefer that we configure pylint properly rather than disable this. which package did it complain about?

-Observations are
after reset : np.zeros(self._shape).
action 1 (FIRE): np.ones(self._shape).
after reset : np.ones(self._shape).
Member


did this sneak in from a different PR?

Contributor Author


I changed it because we don't want the output to be all zeros.


Args:
env_spec: Environment specification.
name: variable scope of the mlp.
Member


Variable scope

@@ -88,6 +95,14 @@ def terminate(self):
"""Clean up operation."""
pass
Member


What's this function for? Why pass?

Contributor Author


I copied it from the existing policy base class in garage.tf.policies.base. If it is not used anywhere, we can remove it.

@ryanjulian
Member

Feel free to merge this and we will deal with pylint later

ahtsan added 10 commits March 26, 2019 11:19
Removed the test for GaussianMLPPolicyWithModel2, which uses
tfp.distributions; those should be introduced in a later part
of the refactoring. The interface get_regularizable_vars is
also removed for the same reason.

Pylint conflicts with flake8 regarding import-order checking,
so the pylint check is disabled and flake8 is relied on instead.