Conversation
Codecov Report
```
@@            Coverage Diff            @@
##           master     #574    +/-   ##
=========================================
+ Coverage   60.51%    60.9%   +0.38%
=========================================
  Files         141      142       +1
  Lines        8833     8857      +24
  Branches     1251     1239      -12
=========================================
+ Hits         5345     5394      +49
+ Misses       3155     3130      -25
  Partials      333      333
```
Continue to review full report at Codecov.
We should probably test the refactored policies with a benchmark. Not having that is blocking further debugging of the benchmark test.
@CatherineSue I agree, but I think if we can figure out good unit tests for the policies, we won't worry so much about breaking them when we refactor.
For instance, maybe we can define some unit tests which run basic supervised learning using each model and make sure it produces known-good results, i.e. if GaussianMLPModel can properly fit a
CatherineSue
left a comment
I agree, we could come up with a good unit test to make sure the policies work well.
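The "fit a known function" test suggested above could look roughly like the following sketch. It is pure numpy: the actual `GaussianMLPModel` training step is stood in by a linear least-squares fit, so everything except the test structure is illustrative.

```python
import numpy as np

# Sketch of a "known-good fit" unit test. The real test would train the
# model on this dataset; here a linear least-squares fit is a stand-in.
def test_fits_known_function():
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=(256, 1))
    y = 3.0 * x + 0.5  # known target: y = 3x + 0.5

    # Fit a linear model w*x + b (placeholder for model training).
    A = np.hstack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w, b = coef[0, 0], coef[1, 0]

    # The fitted parameters should recover the known coefficients.
    assert np.isclose(w, 3.0, atol=1e-6)
    assert np.isclose(b, 0.5, atol=1e-6)
```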
80417c0 to
9c6b694
Compare
CatherineSue
left a comment
Are we not going to add the unit test that assures that the policies with models work?
@CatherineSue it's your review, so you get to decide what the criteria are. I think at this stage a couple of simple unit tests would be reasonable.
I think a test where we attempt to train each model on a known dataset is the gold standard, but is perhaps too much for this PR.
It does seem like the new version of this PR misses coverage on several significant branches:
My thought about adding a unit test is: if we have already tested

```python
out = self.sess.run(policy.model.output,
                    feed_dict={policy.model.input: [obs]})
```

then asserting `policy.get_action(obs) == out` is not that necessary, since we have already tested the model itself.
I agree. You can implement this by mocking out the model.
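A mocked-model test might be sketched as follows. The `Policy` class here is a hypothetical stand-in for the garage policy under test (its real constructor and session handling are not shown in this thread); the point being mirrored is that mocking the model lets the test check only the plumbing.

```python
from unittest import mock

import numpy as np

# Hypothetical minimal policy: forwards observations to its model.
class Policy:
    def __init__(self, model):
        self.model = model

    def get_action(self, obs):
        return self.model.predict(np.asarray([obs]))[0]

def test_get_action_uses_model_output():
    # Mock out the model so the test does not depend on the model's math.
    fake_model = mock.Mock()
    fake_model.predict.return_value = np.array([[0.1, 0.9]])

    policy = Policy(fake_model)
    action = policy.get_action(np.zeros(4))

    # The policy must return exactly what the (mocked) model produced.
    assert np.allclose(action, [0.1, 0.9])
    fake_model.predict.assert_called_once()
```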
```python
from tests.fixtures.envs.dummy import DummyDiscreteEnv
```

```python
class SimpleMLPModel(Model):
```
please put these fixtures into a test library
```python
def test_output_values(self):
    model = MLPModel(
        output_dim=self.output_dim,
        hidden_sizes=(2, ),
```
what about hidden_size = 1?
in most software, bugs center around the values 0, 1, 2, and 3+. it's good to test all of those cases (or whichever ones are valid for your test)
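A boundary-value sweep over hidden sizes could be sketched like this; `mlp_forward` is a toy numpy stand-in for `MLPModel`, used only to show the test shape, and the widths swept match the 0/1/2/3+ advice above.

```python
import numpy as np

# Toy MLP forward pass: random weights, tanh hidden layers.
def mlp_forward(x, hidden_sizes, output_dim, rng):
    h = x
    for size in hidden_sizes:
        w = rng.normal(size=(h.shape[-1], size))
        h = np.tanh(h @ w)
    w_out = rng.normal(size=(h.shape[-1], output_dim))
    return h @ w_out

rng = np.random.default_rng(0)
x = np.ones((5, 4))
# Exercise 0 hidden layers and widths 1, 2, 3.
for hidden_sizes in [(), (1,), (2,), (3,)]:
    out = mlp_forward(x, hidden_sizes, output_dim=2, rng=rng)
    # Output shape must be stable regardless of hidden-layer width.
    assert out.shape == (5, 2)
```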
```python
def test_get_action(self):
    action, _ = self.policy.get_action(self.obs)
    assert self.env.action_space.contains(np.asarray([action]))
```
why is np.asarray required here? why is get_action returning something that needs to be converted before being used with a space?
I think it's actually an issue.
e.g. in akro.box, the `contains` function is as follows:

```python
def contains(self, x):
    """Return boolean specifying if x is a valid member of this space."""
    return x.shape == self.shape and (x >= self.low).all() and (
        x <= self.high).all()
```

It assumes the input is a numpy array.
Because the action returned from `policy.get_action` can be a scalar value, it won't work.
Should we force the output of `policy.get_action` to be a numpy array?
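The failure mode can be reproduced without akro; the `Box` class below just inlines the quoted `contains` body, so the only assumption is the constructor signature used here.

```python
import numpy as np

# Minimal reproduction of the contains() assumption quoted above:
# the check accesses x.shape, so a plain Python scalar blows up.
class Box:
    def __init__(self, low, high, shape):
        self.low, self.high, self.shape = low, high, shape

    def contains(self, x):
        return x.shape == self.shape and (x >= self.low).all() and (
            x <= self.high).all()

space = Box(low=-1.0, high=1.0, shape=(1,))
assert space.contains(np.asarray([0.5]))  # ndarray of shape (1,): fine

try:
    space.contains(0.5)                   # bare scalar: no .shape attribute
except AttributeError:
    scalar_fails = True
assert scalar_fails
```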
it looks like you found an n = 1 vs n > 1 bug. good job!
i think that the output of get_action should always pass the test
```python
a = foo.get_action(bar)
assert a in foo.action_space

bs = foo.get_actions(bars)
for b in bs:
    assert b in foo.action_space
```

the question is -- is this a garage bug or an akro bug?
i think it is a garage bug. the output type of a policy should be fixed -- it shouldn't be "a scalar sometimes and an np.ndarray other times." if the value is single-dimensional, i think it should output an ndarray with shape (1,).
of course, i could be persuaded otherwise if this made code elsewhere really nasty.
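The proposed normalization (always return an ndarray, shape `(1,)` for scalars) could live in a small helper; `normalize_action` is a hypothetical name, not a garage function.

```python
import numpy as np

# Sketch of the proposed fix: get_action always returns an ndarray,
# so `action in space` works for n = 1 and n > 1 alike.
def normalize_action(raw):
    arr = np.asarray(raw)
    if arr.ndim == 0:        # scalar -> shape (1,)
        arr = arr.reshape(1)
    return arr

assert normalize_action(0.5).shape == (1,)
assert normalize_action([0.1, 0.2]).shape == (2,)
```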
```python
class TestGaussianMLPPolicyWithModel(TfGraphTestCase):

    def setUp(self):
        super().setUp()
        self.env = TfEnv(DummyBoxEnv())
```
please test several output_dim
These tests are looking great. Please see my comments about
Other things I noticed:
```python
def _worker_set_seed(_, seed):
    logger.log("Setting seed to %d" % seed)
    # import here to avoid circular dependency
```
uhh can we restructure this to avoid a circular dependency instead?
```python
p = tf.constant_initializer(self._init_std_param)
std_network = parameter(
```

```python
def p():
```
3fbe49d to
4f0930b
Compare
ryanjulian
left a comment
Looks good.
What about ContinuousMLPQFunction?
```diff
  not-context-manager,
- c-extension-no-member
+ c-extension-no-member,
+ wrong-import-order
```
i would prefer configuring pylint properly rather than disabling this. which package did it complain about?
```diff
  Observations are
-     after reset : np.zeros(self._shape).
+     after reset : np.ones(self._shape).
      action 1 (FIRE): np.ones(self._shape).
```
did this sneak in from a different PR?
I changed it because we don't want the output to be all zeros.
```python
Args:
    env_spec: Environment specification.
    name: variable scope of the mlp.
```
```diff
@@ -88,6 +95,14 @@ def terminate(self):
     """Clean up operation."""
     pass
```
What's this function for? Why pass?
I copied it from the existing policy base class in garage.tf.policies.base. If it is not used anywhere, we can remove it.
Feel free to merge this and we will deal with pylint later.
Removed the test for GaussianMLPPolicyWithModel2, which uses tfp.distributions and should be introduced in a later part of the refactoring. The get_regularizable_vars interface is also removed for the same reason.
Pylint conflicts with flake8 on import-order checking, so the pylint check is disabled and we rely on flake8 only.
4f0930b to
8d1b53e
Compare
Custom pickle logic is added: all unpicklable operations are removed in `__getstate__` and reconstructed in `__setstate__`. The default model (`model.networks['default']`) is also built in `__setstate__`, so the policy can be used right away after unpickling.
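The pickling scheme described here can be sketched in plain Python. `PolicyWithModel` and `_build_default_network` are hypothetical stand-ins for the garage policy and for rebuilding `model.networks['default']`; only the `__getstate__`/`__setstate__` shape mirrors the PR.

```python
import pickle

class PolicyWithModel:
    def __init__(self, hidden_sizes=(32, 32)):
        self.hidden_sizes = hidden_sizes
        self._build_default_network()

    def _build_default_network(self):
        # In garage this would rebuild the TF graph; here a placeholder
        # for an unpicklable object (session, graph, etc.).
        self.default_network = object()

    def __getstate__(self):
        # Drop unpicklable members before pickling.
        state = self.__dict__.copy()
        del state['default_network']
        return state

    def __setstate__(self, state):
        # Restore plain state, then rebuild the network so the policy
        # is usable immediately after unpickling.
        self.__dict__.update(state)
        self._build_default_network()

policy = PolicyWithModel()
restored = pickle.loads(pickle.dumps(policy))
assert restored.hidden_sizes == (32, 32)
assert restored.default_network is not None
```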