GaussianLSTMPolicy with model by ahtsan · Pull Request #677 · rlworkgroup/garage

ahtsan · 2019-05-23T04:33:02Z

Added GaussianLSTMModel and GaussianLSTMPolicyWithModel.

Added test for PPO with GaussianLSTMPolicyWithModel.

In addition to the functional tests for GaussianLSTMPolicyWithModel
in test_gaussian_lstm_policy_with_model.py, the transition from the
old policy (GaussianLSTMPolicy) to the new policy
(GaussianLSTMPolicyWithModel) is tested in
test_gaussian_lstm_policy_with_model_transit.py, to make sure
both expose the same API.
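
An API-parity check of this kind could look roughly like the sketch below. It is not the test added in this PR: `OldPolicy`, `NewPolicy`, `public_api` and `assert_same_api` are hypothetical names, and the two classes are stand-ins rather than the garage policies. The sketch only shows the idea of comparing public method names and signatures between two implementations.

```python
import inspect


class OldPolicy:
    """Hypothetical stand-in for the pre-refactor policy."""

    def reset(self, dones=None):
        pass

    def get_action(self, observation):
        return 0.0, {}

    def get_actions(self, observations):
        return [0.0 for _ in observations], {}


class NewPolicy:
    """Hypothetical stand-in for the model-based policy."""

    def reset(self, dones=None):
        pass

    def get_action(self, observation):
        return 0.0, {}

    def get_actions(self, observations):
        return [0.0 for _ in observations], {}


def public_api(cls):
    """Map each public method name of cls to its signature."""
    return {
        name: inspect.signature(member)
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith('_')
    }


def assert_same_api(old_cls, new_cls):
    """Fail if new_cls lacks, or changes the signature of, a public method."""
    old_api, new_api = public_api(old_cls), public_api(new_cls)
    missing = set(old_api) - set(new_api)
    assert not missing, 'missing methods: {}'.format(sorted(missing))
    for name, signature in old_api.items():
        assert signature == new_api[name], 'signature differs: ' + name


assert_same_api(OldPolicy, NewPolicy)
```

A signature check like this only covers the call surface. Checking that both policies produce matching outputs for the same inputs would need the real classes and shared weights.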

codecov · 2019-05-23T07:51:40Z

Codecov Report

Merging #677 into master will increase coverage by 0.53%.
The diff coverage is 98.7%.

@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   64.08%   64.62%   +0.53%     
==========================================
  Files         159      161       +2     
  Lines        9770     9924     +154     
  Branches     1293     1303      +10     
==========================================
+ Hits         6261     6413     +152     
- Misses       3182     3183       +1     
- Partials      327      328       +1

Impacted Files	Coverage Δ
src/garage/tf/policies/gaussian_lstm_policy.py	`78.83% <ø> (ø)`	⬆️
src/garage/tf/models/gaussian_lstm_model.py	`100% <100%> (ø)`
...age/tf/policies/gaussian_lstm_policy_with_model.py	`97.7% <97.7%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 95f0295...57206f1.

ryanjulian · 2019-05-23T20:12:04Z

@@ -0,0 +1,295 @@
+"""GaussianLSTMPolicy with GaussianLSTMModel."""
+from akro.tf import Box


import akro.tf

zhanpenghe

I only have minor comments. Will the *LSTMModel always be a single-layer recurrent unit now?

ryanjulian · 2019-05-29T17:34:38Z

+
+        Returns:
+            action (numpy.ndarray): Predicted action.
+            agent_info (dict[numpy.ndarray]): Mean and log std of the


maybe be explicit about the contents of the dict, e.g.

https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/batch_polopt.py#L85

ryanjulian · 2019-05-29T17:34:53Z

+
+        Returns:
+            actions (numpy.ndarray): Predicted actions.
+            agent_infos (dict[numpy.ndarray]): Mean and log std of the


be explicit about keys in the dict https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/batch_polopt.py#L85

ryanjulian · 2019-05-29T17:35:16Z

+
+        Returns:
+            action (numpy.ndarray): Predicted action.
+            agent_info (dict[numpy.ndarray]): Mean and log std of the


be explicit
https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/batch_polopt.py#L85
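
A docstring that spells out the dict keys, as requested in the three comments above, might look like the sketch below. This is an illustration only: `get_action_stub` is a hypothetical function, not the method under review, and the key names `mean` and `log_std` are assumed from the "Mean and log std" wording in the diff rather than taken from the merged code. Plain lists stand in for `numpy.ndarray` so the sketch has no dependencies.

```python
def get_action_stub(observation):
    """Return a placeholder action together with its distribution info.

    Args:
        observation (list[float]): Observation from the environment.

    Returns:
        action (list[float]): Predicted action.
        agent_info (dict): Distribution parameters, with keys:
            - mean (list[float]): Mean of the action distribution.
            - log_std (list[float]): Log standard deviation of the
                action distribution.

    """
    # Placeholder values; a real policy would run its network here.
    # The action is given the same length as the observation purely
    # for illustration.
    mean = [0.0 for _ in observation]
    log_std = [0.0 for _ in observation]
    action = list(mean)
    return action, dict(mean=mean, log_std=log_std)


action, agent_info = get_action_stub([1.0, 2.0])
```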

zhanpenghe · 2019-05-30T21:29:01Z

Can you please make it multi-layer recurrent? A single layer sometimes does not work well for hard problems.

ahtsan · 2019-05-30T22:01:24Z

@zhanpenghe I think another PR will be more appropriate.

zhanpenghe

Sure

zhanpenghe · 2019-05-31T02:18:12Z

File an issue then.
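
For reference, the multi-layer request amounts to feeding each layer's hidden output into the next layer at every time step. The sketch below shows that stacking in plain Python. It is not garage code and not what a follow-up PR would necessarily do: `LSTMCell` and `run_stacked_lstm` are hypothetical names, weights are random, and bias terms are omitted to keep it short.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with random weights and no bias terms."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        n_in = input_dim + hidden_dim
        # One weight matrix per gate: input, forget, candidate, output.
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(hidden_dim)]
            for gate in ('i', 'f', 'g', 'o')
        }

    def _pre_activation(self, gate, z):
        return [sum(w * v for w, v in zip(row, z))
                for row in self.weights[gate]]

    def step(self, x, h, c):
        """Advance one time step; return the new hidden and cell state."""
        z = list(x) + list(h)
        i = [sigmoid(a) for a in self._pre_activation('i', z)]
        f = [sigmoid(a) for a in self._pre_activation('f', z)]
        g = [math.tanh(a) for a in self._pre_activation('g', z)]
        o = [sigmoid(a) for a in self._pre_activation('o', z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_stacked_lstm(cells, inputs):
    """Feed a sequence through stacked cells; return top-layer outputs."""
    states = [([0.0] * cell.hidden_dim, [0.0] * cell.hidden_dim)
              for cell in cells]
    outputs = []
    for x in inputs:
        layer_input = x
        for idx, cell in enumerate(cells):
            h, c = cell.step(layer_input, *states[idx])
            states[idx] = (h, c)
            layer_input = h  # hidden output feeds the next layer
        outputs.append(layer_input)
    return outputs


rng = random.Random(0)
# Two layers: 3-dimensional input, 4 hidden units per layer.
cells = [LSTMCell(3, 4, rng), LSTMCell(4, 4, rng)]
sequence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]
outputs = run_stacked_lstm(cells, sequence)
```

Each layer keeps its own hidden and cell state, so a multi-layer policy would have to reset and carry one state pair per layer instead of a single pair.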

This PR refactored GaussianLSTMPolicy with garage.tf.models.Model. It added two classes: GaussianLSTMModel and GaussianLSTMPolicyWithModel.

ahtsan requested review from CatherineSue, nish21 and ryanjulian May 23, 2019 04:33

ahtsan requested a review from a team as a code owner May 23, 2019 04:33

ryanjulian changed the title ~~Gaussian LSTM Policy with model~~ GaussianLSTMPolicy with model May 23, 2019