
Use env_spec in algos #575

Merged
naeioi merged 13 commits into master from env_spec-in-algo
Mar 22, 2019
Conversation

@naeioi
Member

@naeioi naeioi commented Mar 7, 2019

This PR replaces env with env_spec in algorithm constructor.

This closes #513.
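As a rough sketch of the change (hypothetical class and argument names, not the actual garage signatures), the algorithm now receives only the spec, not the live environment:

```python
# Before: algo = DDPG(env=env, policy=policy, ...)
# After:  algo = DDPG(env_spec=env.spec, policy=policy, ...)

class EnvSpec:
    """Minimal stand-in for an environment spec: just the spaces."""
    def __init__(self, observation_space, action_space):
        self.observation_space = observation_space
        self.action_space = action_space

class Algo:
    """Hypothetical algorithm that depends only on the spec,
    so the environment itself stays out of the algorithm object."""
    def __init__(self, env_spec):
        self.env_spec = env_spec

spec = EnvSpec(observation_space=(4,), action_space=(2,))
algo = Algo(env_spec=spec)
```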

@naeioi naeioi requested a review from a team as a code owner March 7, 2019 22:10
Member

@ryanjulian ryanjulian left a comment


so how does my env get into the snapshot .pkl file?

Comment thread garage/experiment/local_tf_runner.py Outdated
Comment thread garage/tf/algos/ddpg.py Outdated
Comment thread garage/tf/samplers/on_policy_vectorized_sampler.py Outdated
@codecov

codecov Bot commented Mar 7, 2019

Codecov Report

Merging #575 into master will increase coverage by 0.11%.
The diff coverage is 93.1%.


@@            Coverage Diff             @@
##           master     #575      +/-   ##
==========================================
+ Coverage   58.55%   58.66%   +0.11%     
==========================================
  Files         139      139              
  Lines        9143     9124      -19     
  Branches     1338     1331       -7     
==========================================
- Hits         5354     5353       -1     
+ Misses       3404     3385      -19     
- Partials      385      386       +1
Impacted Files Coverage Δ
garage/tf/algos/npo.py 94.31% <ø> (ø) ⬆️
garage/tf/algos/reps.py 98.78% <100%> (ø) ⬆️
garage/tf/algos/batch_polopt.py 87.87% <100%> (ø) ⬆️
garage/tf/samplers/on_policy_vectorized_sampler.py 90.66% <100%> (+2.35%) ⬆️
garage/tf/algos/off_policy_rl_algorithm.py 90% <100%> (+7.07%) ⬆️
garage/experiment/local_tf_runner.py 76.28% <100%> (+0.75%) ⬆️
...arage/tf/samplers/off_policy_vectorized_sampler.py 75.34% <100%> (+2%) ⬆️
garage/sampler/base.py 16.66% <100%> (+0.93%) ⬆️
garage/tf/samplers/batch_sampler.py 77.01% <66.66%> (ø) ⬆️
garage/tf/algos/ddpg.py 85.81% <75%> (+1.92%) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f7ec41f...a757dae.

@naeioi
Member Author

naeioi commented Mar 7, 2019

@ryanjulian Because the snapshot is taken by the algorithm and I removed env from the pickled file, there is no longer a way to save env. My view is that saving the env is not very important, so I just removed it.

Also, I doubt that experiment resuming really worked previously, because algo.train() re-randomizes all variables. It doesn't work currently either, because run_experiment() still calls algo.train(), which is no longer valid:

if args.resume_from is not None:
    data = joblib.load(args.resume_from)
    assert 'algo' in data
    algo = data['algo']
    algo.train()

I plan to resolve this issue together with #516 .

@naeioi naeioi force-pushed the env_spec-in-algo branch from d511f52 to 6a32cff Compare March 7, 2019 23:45
Comment thread garage/experiment/local_tf_runner.py
@ryanjulian
Member

@naeioi it's quite important to serialize the environment, and we can't really merge this without saving the environment.

Look at https://github.com/rlworkgroup/garage/blob/master/examples/sim_policy.py -- this PR breaks it.

TODO: We should add a test to make sure the snapshotter produces complete .pkl files, so that this PR would fail in the CI immediately.

Reasons:

  1. There's no reason env can't have meaningful parameters which are saved/restored during pickle/unpickle -- for instance, what if I had a PointMass environment with a specifiable mass? Not all envs are just strings from openai/gym. The only interface we impose on env is that (1) env == pickle.loads(pickle.dumps(env)) and (2) isinstance(env, gym.Env).
  2. A .pkl file should be a total record of an experiment -- it should have everything I need to evaluate an agent in the environment, continue training the agent in the environment, etc. Removing env makes that untrue -- I would have to look at the launcher and be careful to recreate the environment. Consider how hard this would be if I decided to procedurally generate my environments!
  3. Perhaps (2) isn't the best way to save experiment state, but right now it's what we have.
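The pickle round-trip requirement in (1) can be checked in a few lines. `PointMass` below is a hypothetical environment with a configurable mass, borrowed from the example in the comment, not a real garage class:

```python
import pickle

class PointMass:
    """Hypothetical environment with a meaningful parameter (mass)
    that must survive pickling."""
    def __init__(self, mass=1.0):
        self.mass = mass

    def __eq__(self, other):
        return isinstance(other, PointMass) and self.mass == other.mass

env = PointMass(mass=2.5)
restored = pickle.loads(pickle.dumps(env))
assert env == restored  # the interface garage imposes on envs
```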

@naeioi naeioi force-pushed the env_spec-in-algo branch 4 times, most recently from 154a86c to 4b4595b Compare March 13, 2019 03:45
@naeioi
Member Author

naeioi commented Mar 13, 2019

@ryanjulian That makes sense. I added a test to verify the presence and integrity of both the environment and the policy in the snapshot. As a workaround for the algo no longer having access to the actual environment, the environment is now saved in save_snapshot() in the runner.
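A sketch of what such a presence/integrity check could look like (the snapshot dict keys and contents here are assumptions for illustration, not the exact garage snapshot format):

```python
import os
import pickle
import tempfile

# Pretend this is the snapshot the runner writes in save_snapshot().
snapshot = {'env': {'id': 'PointMass'}, 'policy': {'weights': [0.1, 0.2]}}

path = os.path.join(tempfile.mkdtemp(), 'params.pkl')
with open(path, 'wb') as f:
    pickle.dump(snapshot, f)

with open(path, 'rb') as f:
    loaded = pickle.load(f)

# Presence: both env and policy must be in the .pkl file.
assert 'env' in loaded and 'policy' in loaded
# Integrity: the round-trip must preserve the contents.
assert loaded == snapshot
```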

@ryanjulian
Member

awesome!

Comment thread tests/garage/experiment/test_snapshot.py Outdated
Member

@ryanjulian ryanjulian left a comment


awesome work. these PRs are making a huge positive impact on the codebase.

Comment thread garage/experiment/local_tf_runner.py
Comment thread garage/tf/algos/ddpg.py Outdated
@naeioi naeioi force-pushed the env_spec-in-algo branch from 86735a5 to 996a176 Compare March 14, 2019 23:09
Comment thread tests/garage/experiment/test_snapshot.py Outdated
Comment thread tests/garage/experiment/test_snapshot.py Outdated
@naeioi naeioi force-pushed the env_spec-in-algo branch from 8b18cd2 to a8821f0 Compare March 19, 2019 00:01
plot=False,
target_update_tau=0.05,
n_epoch_cycles=20,
max_path_length=100,
Member


I thought this line was deleted in the off-policy algorithm?

Member Author


Actually not. I keep n_epoch_cycles so that an algorithm can compute the epoch as iteration / n_epoch_cycles. This is a little awkward, but DDPG requires it.
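That is, something along these lines (illustrative numbers, not taken from the PR):

```python
n_epoch_cycles = 20  # sampler cycles per epoch

def epoch_of(iteration, cycles=n_epoch_cycles):
    """Recover the epoch index from a flat iteration counter."""
    return iteration // cycles

# Iterations 0..19 belong to epoch 0, 20..39 to epoch 1, and so on.
assert epoch_of(0) == 0
assert epoch_of(19) == 0
assert epoch_of(20) == 1
```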

Member


I don't quite understand. I meant: is max_path_length=100 not necessary here?

@ryanjulian
Member

bump. are there any hard blockers on this PR?

@naeioi
Member Author

naeioi commented Mar 21, 2019

I will have this merged after rebase.

naeioi added 7 commits March 21, 2019 14:22
This commit saves env in the runner because algo no longer
has access to the actual env. A test is also added to
verify the presence and integrity of both env and policy
in the snapshot.

Signed-off-by: Keren Zhu <naeioi@hotmail.com>
@naeioi naeioi force-pushed the env_spec-in-algo branch from 78254b4 to 7a8caa8 Compare March 21, 2019 21:23
Disable not-context-manager check. This is a long-standing pylint bug.
See pylint-dev/pylint#782.

Disable c-extension-no-member check. Some C extension modules don't report
their exported functions and variables to Python introspection. In our case,
baselines uses mpi4py, which makes pylint complain.
See https://travis-ci.com/rlworkgroup/garage/builds/105359651#L690.
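For reference, disabling these checks project-wide looks roughly like this in a .pylintrc (a sketch; the exact file name and location depend on how the repo configures pylint):

```ini
[MESSAGES CONTROL]
# not-context-manager: long-standing pylint false positive (pylint-dev/pylint#782)
# c-extension-no-member: mpi4py (pulled in via baselines) doesn't expose its
# members to pylint's static analysis
disable=not-context-manager,
        c-extension-no-member
```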
@naeioi
Member Author

naeioi commented Mar 21, 2019

I disabled two pylint checks.

@naeioi naeioi merged commit c4106d8 into master Mar 22, 2019
@naeioi naeioi deleted the env_spec-in-algo branch March 22, 2019 01:18

Development

Successfully merging this pull request may close these issues.

Use specs for constructing algorithms and primitives, not environments

3 participants