Add advantage normalization for recurrent policies #626
Merged
CatherineSue merged 2 commits into master on Apr 18, 2019
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #626      +/-  ##
=========================================
+ Coverage    60.6%   60.65%   +0.05%
=========================================
  Files         156      156
  Lines        9057     9069      +12
  Branches     1239     1241       +2
=========================================
+ Hits         5489     5501      +12
+ Misses       3259     3251       -8
- Partials      309      317       +8
Continue to review full report at Codecov.
ryanjulian
approved these changes
Apr 17, 2019
Force-pushed from 5e61e8a to e4e37b9
ahtsan
approved these changes
Apr 17, 2019
krzentner
pushed a commit
that referenced
this pull request
Jul 15, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files.

This is a backport of #626.
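As a rough illustration of what an advantage-normalization op computes, here is a minimal NumPy sketch (the function name and epsilon value are assumptions; the actual garage implementation builds the equivalent TensorFlow graph ops in `tensor_utils`):

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    """Center and rescale advantages to zero mean and unit variance.

    Normalizing advantages across a batch keeps the scale of the
    policy-gradient loss stable regardless of the reward magnitude.
    """
    mean = advantages.mean()
    std = advantages.std()
    return (advantages - mean) / (std + eps)

adv = np.array([1.0, 2.0, 3.0, 4.0])
norm = normalize_advantages(adv)
```

For recurrent policies the same op is applied over the valid (non-padded) timesteps of each batch of trajectories.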
krzentner
pushed a commit
that referenced
this pull request
Jul 17, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files.

This is a backport of #626.
krzentner
added a commit
that referenced
this pull request
Jul 17, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files.

This is a backport of #626.
krzentner
added a commit
that referenced
this pull request
Jul 19, 2019
* Fix entropy in tf/algos/npo.py. According to the PPO paper, the term `policy_ent_coeff * pol_ent` should be added to the policy gradient loss instead of the reward. With this fix, the policy stddev will decrease if policy_ent_coeff is very small. Also addresses issue #442. This is a partial backport of #579 so that later fixes can be backported.
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files. This is a backport of #626.
* Rewrite automatic versioning (#659). The previous automatic versioning script was flawed: it produced the correct package version for documentation builds and PyPI distributions, but an incorrect version when setup.py was run from the downloaded package. Unfortunately, Python environment managers (e.g. Pipenv, conda) resolve the package version by evaluating setup.py, not by using the PyPI version. This PR simplifies version generation by reading the version string from a plain file; automatic versioning from tags is handled by clobbering the version file from within the CI, rather than looking for a CI environment variable on every use.
* Fix sim_policy (#691). Recent updates to resume functionality changed keys in the checkpoint pickle files from `'policy'` to `'algo'` without updating the corresponding example. This PR updates the example to reflect the new API.
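The entropy fix can be sketched in a few lines. This is a simplified NumPy stand-in for the TensorFlow loss ops, with hypothetical names (`surrogate_loss`, `ratio`): the entropy bonus `policy_ent_coeff * pol_ent` is subtracted from the surrogate loss, rather than being folded into the rewards, so the advantage estimates are left undistorted:

```python
import numpy as np

def surrogate_loss(ratio, advantages, pol_ent, policy_ent_coeff):
    """Policy-gradient surrogate with the entropy bonus in the loss.

    Minimizing pg_loss - coeff * entropy encourages higher-entropy
    policies directly, instead of perturbing the reward signal.
    """
    pg_loss = -np.mean(ratio * advantages)
    return pg_loss - policy_ent_coeff * np.mean(pol_ent)

ratio = np.array([1.0, 1.1, 0.9])       # importance-sampling ratios
adv = np.array([0.5, -0.2, 0.3])        # (normalized) advantages
ent = np.array([1.2, 1.0, 1.1])         # per-sample policy entropy
loss_no_ent = surrogate_loss(ratio, adv, ent, 0.0)
loss_with_ent = surrogate_loss(ratio, adv, ent, 0.01)
```

With a positive coefficient and positive entropy, the loss is strictly lower than the plain policy-gradient loss, which is what lets the stddev shrink gracefully when `policy_ent_coeff` is very small.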
krzentner
pushed a commit
that referenced
this pull request
Jul 20, 2019
* Fix entropy in tf/algos/npo.py. According to the PPO paper, the term `policy_ent_coeff * pol_ent` should be added to the policy gradient loss instead of the reward. With this fix, the policy stddev will decrease if policy_ent_coeff is very small. Also addresses issue #442.
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files.

This is a partial backport of #579 and #626.
ryanjulian
pushed a commit
that referenced
this pull request
Nov 21, 2019
* Fix entropy in tf/algos/npo.py. According to the PPO paper, the term `policy_ent_coeff * pol_ent` should be added to the policy gradient loss instead of the reward. With this fix, the policy stddev will decrease if policy_ent_coeff is very small. Also addresses issue #442.
* Add advantage normalization ops to recurrent policies in NPO. As these ops are common to all models, move them to `tensor_utils`.
* Enable the pre-commit hook double-quote-string-fixer to fix double-quoted strings in changed files. This is a partial backport of #579 and #626.
* Rewrite automatic versioning (#659). The previous automatic versioning script was flawed: it produced the correct package version for documentation builds and PyPI distributions, but an incorrect version when setup.py was run from the downloaded package. Unfortunately, Python environment managers (e.g. Pipenv, conda) resolve the package version by evaluating setup.py, not by using the PyPI version. This PR simplifies version generation by reading the version string from a plain file; automatic versioning from tags is handled by clobbering the version file from within the CI, rather than looking for a CI environment variable on every use.
* Fix sim_policy (#691). Recent updates to resume functionality changed keys in the checkpoint pickle files from `'policy'` to `'algo'` without updating the corresponding example. This PR updates the example to reflect the new API.
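The file-based versioning approach described above can be sketched as follows. This is a hypothetical minimal version (file name `VERSION` and function name `read_version` are assumptions, not the repo's exact layout): setup.py reads the version string from a plain file that the CI clobbers when building from a tag:

```python
import os
import tempfile
from pathlib import Path

def read_version(version_file):
    """Return the package version stored in a plain text file.

    Reading a file is deterministic no matter how setup.py is invoked,
    unlike a script that inspects CI environment variables.
    """
    return Path(version_file).read_text().strip()

# Demo: write a version string (as the CI would), then read it back
# the way setup.py would at install time.
with tempfile.TemporaryDirectory() as d:
    vf = os.path.join(d, "VERSION")
    Path(vf).write_text("2019.10.1\n")
    version = read_version(vf)
```

Because environment managers like Pipenv and conda resolve versions by evaluating setup.py, this file read gives them the same answer that PyPI sees.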
No description provided.