Add advantage normalization for recurrent policies#626

Merged
CatherineSue merged 2 commits into master from rec_center_adv on Apr 18, 2019

@CatherineSue
Member

No description provided.

CatherineSue requested a review from a team as a code owner on April 17, 2019 19:40
@codecov

codecov Bot commented Apr 17, 2019

Codecov Report

Merging #626 into master will increase coverage by 0.05%.
The diff coverage is 95.89%.

@@            Coverage Diff             @@
##           master     #626      +/-   ##
==========================================
+ Coverage    60.6%   60.65%   +0.05%     
==========================================
  Files         156      156              
  Lines        9057     9069      +12     
  Branches     1239     1241       +2     
==========================================
+ Hits         5489     5501      +12     
+ Misses       3259     3251       -8     
- Partials      309      317       +8

| Impacted Files | Coverage Δ | |
|---|---|---|
| garage/tf/misc/tensor_utils.py | 73.59% <100%> (+2.29%) | ⬆️ |
| garage/tf/algos/npo.py | 93.39% <94.73%> (-0.89%) | ⬇️ |
| garage/tf/policies/categorical_gru_policy.py | 80% <0%> (ø) | ⬆️ |
| ...arage/tf/samplers/off_policy_vectorized_sampler.py | 75.34% <0%> (ø) | ⬆️ |
| garage/tf/policies/gaussian_gru_policy.py | 78.67% <0%> (ø) | ⬆️ |
| garage/misc/krylov.py | 17.94% <0%> (ø) | ⬆️ |
| garage/sampler/stateful_pool.py | 38.63% <0%> (ø) | ⬆️ |
| garage/tf/policies/gaussian_lstm_policy.py | 78.83% <0%> (ø) | ⬆️ |
| garage/tf/policies/categorical_lstm_policy.py | 79.83% <0%> (ø) | ⬆️ |
| garage/tf/optimizers/first_order_optimizer.py | 68.05% <0%> (+2.77%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f1f106a...e4e37b9.

CatherineSue requested a review from ahtsan on April 17, 2019 21:48
CatherineSue merged commit c4020b5 into master on Apr 18, 2019
krzentner pushed a commit that referenced this pull request Jul 15, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these
   ops are common for all models, move them to `tensor_utils`.
* Enable pre-commit-hook double-quote-string-fixer for fixing double
   quote strings in changed files.

This is a backport of #626.
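
For readers unfamiliar with the ops this commit message refers to, the following is a minimal NumPy sketch of advantage normalization for a batch of padded trajectories, as used with recurrent policies. It is not the code from `tensor_utils`: the function name `center_advantages`, the `valids` mask, the `(n_paths, max_path_length)` layout, and the `eps` default are all assumptions made for illustration.

```python
import numpy as np


def center_advantages(advs, valids, eps=1e-8):
    """Normalize advantages to zero mean and unit variance over valid steps.

    advs:   array of shape (n_paths, max_path_length), zero-padded.
    valids: array of the same shape, 1.0 for real steps, 0.0 for padding.
    Hypothetical helper for illustration; not garage's implementation.
    """
    advs = np.asarray(advs, dtype=np.float64)
    valids = np.asarray(valids, dtype=np.float64)
    n_valid = valids.sum()
    # Statistics are computed over valid time steps only, so padding
    # does not bias the mean or the variance.
    mean = (advs * valids).sum() / n_valid
    var = (((advs - mean) ** 2) * valids).sum() / n_valid
    # Re-apply the mask so padded entries stay exactly zero.
    return (advs - mean) / np.sqrt(var + eps) * valids


# Two paths of length 3 and 1, padded to length 3.
advs = np.array([[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]])
valids = np.array([[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]])
normalized = center_advantages(advs, valids)
```

Masking matters here because recurrent batches keep the time dimension: without it, the zero padding would pull the mean toward zero and shrink the variance estimate.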
krzentner pushed a commit that referenced this pull request Jul 17, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these
   ops are common for all models, move them to `tensor_utils`.
* Enable pre-commit-hook double-quote-string-fixer for fixing double
   quote strings in changed files.

This is a backport of #626.
krzentner added a commit that referenced this pull request Jul 17, 2019
* Add advantage normalization ops to recurrent policies in NPO. As these
   ops are common for all models, move them to `tensor_utils`.
* Enable pre-commit-hook double-quote-string-fixer for fixing double
   quote strings in changed files.

This is a backport of #626.
krzentner added a commit that referenced this pull request Jul 19, 2019
* Fix entropy in tf/algos/npo.py

 * Fix entropy in tf/algos/npo.py. According to the PPO paper, the term
   `policy_ent_coeff * pol_ent` should be added to the policy gradient
   loss instead of the reward. By fixing this, the policy stddev will
   decrease if policy_ent_coeff is very small.
 * Also addressed issue #442.

This is a partial backport of #579 so that later fixes can be
backported.

* Add advantage norm. for recurrent policies

* Add advantage normalization ops to recurrent policies in NPO. As these
   ops are common for all models, move them to `tensor_utils`.
* Enable pre-commit-hook double-quote-string-fixer for fixing double
   quote strings in changed files.

This is a backport of #626.

* Rewrite automatic versioning (#659)

The previous automatic versioning script was flawed. It produced the
correct package version for documentation builds and building PyPI
distributions, but produced an incorrect version when you run setup.py
from the downloaded package. Unfortunately, Python environment managers
(e.g. Pipenv, conda) resolve package version by evaluating setup.py,
not using the PyPI version.

This PR makes version generation simpler by reading the version string
from a simple file. Automatic versioning from tags is handled by
clobbering the version file from within the CI, rather than looking for
a CI environment variable on every usage.

* Fix sim_policy (#691)

Recent updates to resume functionality changed keys in the checkpoint
pickle files from `'policy'` to `'algo'`, without updating the
corresponding example. This PR updates the example to reflect the new
API.
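
The key change described in the `sim_policy` fix above can be sketched as follows. This is an illustrative stand-in, not the updated example itself: `load_policy` is a hypothetical helper, and the assumption that the stored algorithm exposes its policy as a `policy` attribute is not stated in the commit message.

```python
import pickle
from types import SimpleNamespace


def load_policy(snapshot):
    """Return the policy from a snapshot dict (hypothetical helper).

    Assumes newer snapshots store the algorithm under 'algo' and that the
    algorithm exposes its policy as a `policy` attribute; falls back to
    the older top-level 'policy' key.
    """
    if 'algo' in snapshot:
        return snapshot['algo'].policy
    return snapshot['policy']


# Stand-in objects; a real checkpoint would hold the trained algorithm.
new_style = pickle.loads(
    pickle.dumps({'algo': SimpleNamespace(policy='trained-policy')}))
old_style = {'policy': 'trained-policy'}

new_policy = load_policy(new_style)
old_policy = load_policy(old_style)
```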
krzentner pushed a commit that referenced this pull request Jul 20, 2019
 * Fix entropy in tf/algos/npo.py. According to the PPO paper, the term
   `policy_ent_coeff * pol_ent` should be added to the policy gradient
   loss instead of the reward. By fixing this, the policy stddev will
   decrease if policy_ent_coeff is very small.
 * Also addressed issue #442.
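
As a rough illustration of the entropy fix described above, here is a NumPy sketch of a clipped surrogate loss in which the `policy_ent_coeff * pol_ent` term is added to the objective rather than to the rewards. It is not the code in `tf/algos/npo.py`: the function name `ppo_loss`, the `clip_range` parameter, and the sample values are assumptions.

```python
import numpy as np


def ppo_loss(ratio, advs, pol_ent, policy_ent_coeff=0.01, clip_range=0.2):
    """Clipped surrogate loss with an entropy bonus (to be minimized).

    ratio:   pi_new(a|s) / pi_old(a|s) per sample.
    advs:    advantage estimates per sample.
    pol_ent: policy entropy per sample.
    """
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    surrogate = np.minimum(ratio * advs, clipped * advs)
    # The entropy bonus enters the objective directly; the rewards
    # (and therefore the advantages) are left untouched.
    objective = surrogate + policy_ent_coeff * pol_ent
    return -np.mean(objective)


ratio = np.array([1.0, 1.5])
advs = np.array([1.0, 1.0])
pol_ent = np.array([0.5, 0.5])
loss = ppo_loss(ratio, advs, pol_ent, policy_ent_coeff=0.1)
```

With this placement, a very small `policy_ent_coeff` contributes almost nothing to the loss, so the optimizer has little incentive to keep the policy entropy high.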

Add advantage norm. for recurrent policies

* Add advantage normalization ops to recurrent policies in NPO. As these
   ops are common for all models, move them to `tensor_utils`.
* Enable pre-commit-hook double-quote-string-fixer for fixing double
   quote strings in changed files.

This is a partial backport of #579 and #626.
ryanjulian pushed a commit that referenced this pull request Nov 21, 2019
* Fix entropy in tf/algos/npo.py

 * Fix entropy in tf/algos/npo.py. According to the PPO paper, the term
   `policy_ent_coeff * pol_ent` should be added to the policy gradient
   loss instead of the reward. By fixing this, the policy stddev will
   decrease if policy_ent_coeff is very small.
 * Also addressed issue #442.

Add advantage norm. for recurrent policies

* Add advantage normalization ops to recurrent policies in NPO. As these
   ops are common for all models, move them to `tensor_utils`.
* Enable pre-commit-hook double-quote-string-fixer for fixing double
   quote strings in changed files.

This is a partial backport of #579 and #626.

* Rewrite automatic versioning (#659)

The previous automatic versioning script was flawed. It produced the
correct package version for documentation builds and building PyPI
distributions, but produced an incorrect version when you run setup.py
from the downloaded package. Unfortunately, Python environment managers
(e.g. Pipenv, conda) resolve package version by evaluating setup.py,
not using the PyPI version.

This PR makes version generation simpler by reading the version string
from a simple file. Automatic versioning from tags is handled by
clobbering the version file from within the CI, rather than looking for
a CI environment variable on every usage.

* Fix sim_policy (#691)

Recent updates to resume functionality changed keys in the checkpoint
pickle files from `'policy'` to `'algo'`, without updating the
corresponding example. This PR updates the example to reflect the new
API.
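
The versioning approach described in the #659 commit message above (read the version from a plain file, and let CI overwrite that file for tagged builds) might look roughly like the sketch below. The file name `VERSION`, the helper `read_version`, and the version strings are made up for illustration and are not taken from the repository.

```python
import pathlib
import tempfile


def read_version(version_file):
    """Return the version string stored in a plain-text file."""
    return pathlib.Path(version_file).read_text().strip()


# Simulate the checked-in version file, then a CI step that clobbers it
# with the release version before building. All values are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    version_file = pathlib.Path(tmp) / 'VERSION'
    version_file.write_text('0.1.0.dev0\n')
    dev_version = read_version(version_file)
    version_file.write_text('0.1.0\n')  # what a tagged CI build might write
    release_version = read_version(version_file)
```

Because `setup.py` would only read the file, evaluating it from a downloaded package yields the same version that was written at build time, with no dependence on CI environment variables.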

3 participants