Add advantage normalization for recurrent policies #626
Merged
CatherineSue merged 2 commits into master on Apr 18, 2019
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #626      +/-  ##
=========================================
+ Coverage    60.6%   60.65%   +0.05%
=========================================
  Files         156      156
  Lines        9057     9069      +12
  Branches     1239     1241       +2
=========================================
+ Hits         5489     5501      +12
+ Misses       3259     3251       -8
- Partials      309      317       +8
Continue to review full report at Codecov.
ryanjulian
approved these changes
Apr 17, 2019
Force-pushed from 5e61e8a to e4e37b9
ahtsan
approved these changes
Apr 17, 2019
krzentner
pushed a commit
that referenced
this pull request
Jul 15, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files.

This is a backport of #626.
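As a rough illustration of what an advantage-normalization op computes, here is a minimal NumPy sketch (the function name and epsilon value are assumptions; the actual garage implementation builds the equivalent TensorFlow graph ops in `tensor_utils`):

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    """Center and rescale advantages to zero mean and unit variance.

    Normalizing advantages across a batch keeps the scale of the
    policy-gradient loss stable regardless of the reward magnitude.
    """
    mean = advantages.mean()
    std = advantages.std()
    return (advantages - mean) / (std + eps)

adv = np.array([1.0, 2.0, 3.0, 4.0])
norm = normalize_advantages(adv)
```

For recurrent policies the same op is applied over the valid (non-padded) timesteps of each batch of trajectories.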
krzentner
pushed a commit
that referenced
this pull request
Jul 17, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files.

This is a backport of #626.
krzentner
added a commit
that referenced
this pull request
Jul 17, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files.

This is a backport of #626.
krzentner
added a commit
that referenced
this pull request
Jul 19, 2019
* Fix entropy in tf/algos/npo.py. According to the PPO paper, the term `policy_ent_coeff * pol_ent` should be added to the policy gradient loss instead of the reward. With this fix, the policy stddev will decrease if policy_ent_coeff is very small. Also addresses issue #442. This is a partial backport of #579 so that later fixes can be backported.
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files. This is a backport of #626.
* Rewrite automatic versioning (#659). The previous automatic versioning script was flawed: it produced the correct package version for documentation builds and PyPI distributions, but an incorrect version when setup.py was run from the downloaded package. Unfortunately, Python environment managers (e.g. Pipenv, conda) resolve the package version by evaluating setup.py, not by using the PyPI version. This PR simplifies version generation by reading the version string from a plain file; automatic versioning from tags is handled by clobbering the version file from within the CI, rather than looking for a CI environment variable on every use.
* Fix sim_policy (#691). Recent updates to resume functionality changed keys in the checkpoint pickle files from `'policy'` to `'algo'` without updating the corresponding example. This PR updates the example to reflect the new API.
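The entropy fix can be sketched in a few lines. This is a simplified NumPy stand-in for the TensorFlow loss ops, with hypothetical names (`surrogate_loss`, `ratio`): the entropy bonus `policy_ent_coeff * pol_ent` is subtracted from the surrogate loss, rather than being folded into the rewards, so the advantage estimates are left undistorted:

```python
import numpy as np

def surrogate_loss(ratio, advantages, pol_ent, policy_ent_coeff):
    """Policy-gradient surrogate with the entropy bonus in the loss.

    Minimizing pg_loss - coeff * entropy encourages higher-entropy
    policies directly, instead of perturbing the reward signal.
    """
    pg_loss = -np.mean(ratio * advantages)
    return pg_loss - policy_ent_coeff * np.mean(pol_ent)

ratio = np.array([1.0, 1.1, 0.9])       # importance-sampling ratios
adv = np.array([0.5, -0.2, 0.3])        # (normalized) advantages
ent = np.array([1.2, 1.0, 1.1])         # per-sample policy entropy
loss_no_ent = surrogate_loss(ratio, adv, ent, 0.0)
loss_with_ent = surrogate_loss(ratio, adv, ent, 0.01)
```

With a positive coefficient and positive entropy, the loss is strictly lower than the plain policy-gradient loss, which is what lets the stddev shrink gracefully when `policy_ent_coeff` is very small.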
krzentner
pushed a commit
that referenced
this pull request
Jul 20, 2019
* Fix entropy in tf/algos/npo.py. According to the PPO paper, the term `policy_ent_coeff * pol_ent` should be added to the policy gradient loss instead of the reward. With this fix, the policy stddev will decrease if policy_ent_coeff is very small. Also addresses issue #442.
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files.

This is a partial backport of #579 and #626.
ryanjulian
pushed a commit
that referenced
this pull request
Nov 21, 2019
* Fix entropy in tf/algos/npo.py. According to the PPO paper, the term `policy_ent_coeff * pol_ent` should be added to the policy gradient loss instead of the reward. With this fix, the policy stddev will decrease if policy_ent_coeff is very small. Also addresses issue #442.
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files. This is a partial backport of #579 and #626.
* Rewrite automatic versioning (#659). The previous automatic versioning script was flawed: it produced the correct package version for documentation builds and PyPI distributions, but an incorrect version when setup.py was run from the downloaded package. Unfortunately, Python environment managers (e.g. Pipenv, conda) resolve the package version by evaluating setup.py, not by using the PyPI version. This PR simplifies version generation by reading the version string from a plain file; automatic versioning from tags is handled by clobbering the version file from within the CI, rather than looking for a CI environment variable on every use.
* Fix sim_policy (#691). Recent updates to resume functionality changed keys in the checkpoint pickle files from `'policy'` to `'algo'` without updating the corresponding example. This PR updates the example to reflect the new API.
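The file-based versioning approach described above can be sketched as follows. This is a hypothetical minimal version (file name `VERSION` and function name `read_version` are assumptions, not the repo's exact layout): setup.py reads the version string from a plain file that the CI clobbers when building from a tag:

```python
import os
import tempfile
from pathlib import Path

def read_version(version_file):
    """Return the package version stored in a plain text file.

    Reading a file is deterministic no matter how setup.py is invoked,
    unlike a script that inspects CI environment variables.
    """
    return Path(version_file).read_text().strip()

# Demo: write a version string (as the CI would), then read it back
# the way setup.py would at install time.
with tempfile.TemporaryDirectory() as d:
    vf = os.path.join(d, "VERSION")
    Path(vf).write_text("2019.10.1\n")
    version = read_version(vf)
```

Because environment managers like Pipenv and conda resolve versions by evaluating setup.py, this file read gives them the same answer that PyPI sees.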
No description provided.