
Use env_spec in algos #575

Merged
naeioi merged 13 commits into master from env_spec-in-algo
Mar 22, 2019
Conversation

@naeioi
Member

@naeioi naeioi commented Mar 7, 2019

This PR replaces env with env_spec in algorithm constructor.

This closes #513.
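As a rough sketch of the change (hypothetical class and argument names, not the actual garage signatures), the algorithm now receives only the spec, not the live environment:

```python
# Before: algo = DDPG(env=env, policy=policy, ...)
# After:  algo = DDPG(env_spec=env.spec, policy=policy, ...)

class EnvSpec:
    """Minimal stand-in for an environment spec: just the spaces."""
    def __init__(self, observation_space, action_space):
        self.observation_space = observation_space
        self.action_space = action_space

class Algo:
    """Hypothetical algorithm that depends only on the spec,
    so the environment itself stays out of the algorithm object."""
    def __init__(self, env_spec):
        self.env_spec = env_spec

spec = EnvSpec(observation_space=(4,), action_space=(2,))
algo = Algo(env_spec=spec)
```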

@naeioi naeioi requested a review from a team as a code owner March 7, 2019 22:10
Member

@ryanjulian ryanjulian left a comment


so how does my env get into the snapshot .pkl file?

Comment thread garage/experiment/local_tf_runner.py Outdated
Comment thread garage/tf/algos/ddpg.py Outdated
Comment thread garage/tf/samplers/on_policy_vectorized_sampler.py Outdated
@codecov

codecov Bot commented Mar 7, 2019

Codecov Report

Merging #575 into master will increase coverage by 0.11%.
The diff coverage is 93.1%.


@@            Coverage Diff             @@
##           master     #575      +/-   ##
==========================================
+ Coverage   58.55%   58.66%   +0.11%     
==========================================
  Files         139      139              
  Lines        9143     9124      -19     
  Branches     1338     1331       -7     
==========================================
- Hits         5354     5353       -1     
+ Misses       3404     3385      -19     
- Partials      385      386       +1
Impacted Files Coverage Δ
garage/tf/algos/npo.py 94.31% <ø> (ø) ⬆️
garage/tf/algos/reps.py 98.78% <100%> (ø) ⬆️
garage/tf/algos/batch_polopt.py 87.87% <100%> (ø) ⬆️
garage/tf/samplers/on_policy_vectorized_sampler.py 90.66% <100%> (+2.35%) ⬆️
garage/tf/algos/off_policy_rl_algorithm.py 90% <100%> (+7.07%) ⬆️
garage/experiment/local_tf_runner.py 76.28% <100%> (+0.75%) ⬆️
...arage/tf/samplers/off_policy_vectorized_sampler.py 75.34% <100%> (+2%) ⬆️
garage/sampler/base.py 16.66% <100%> (+0.93%) ⬆️
garage/tf/samplers/batch_sampler.py 77.01% <66.66%> (ø) ⬆️
garage/tf/algos/ddpg.py 85.81% <75%> (+1.92%) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f7ec41f...a757dae.

@naeioi
Member Author

naeioi commented Mar 7, 2019

@ryanjulian Because the snapshot is taken by the algorithm and I removed env from the pickled file, there is no longer a way to save env. My view is that saving the env is not very important, so I just removed it.

Also, I doubt that experiment resuming really worked previously, because algo.train() re-randomizes all variables. It doesn't work currently either, because run_experiment() still calls algo.train(), which is no longer valid:

if args.resume_from is not None:
    data = joblib.load(args.resume_from)
    assert 'algo' in data
    algo = data['algo']
    algo.train()

I plan to resolve this issue together with #516 .

@naeioi naeioi force-pushed the env_spec-in-algo branch from d511f52 to 6a32cff Compare March 7, 2019 23:45
Comment thread garage/experiment/local_tf_runner.py
@ryanjulian
Member

@naeioi it's quite important to serialize the environment, and we can't really merge this without saving the environment.

Look at https://github.com/rlworkgroup/garage/blob/master/examples/sim_policy.py -- this PR breaks it.

TODO: We should add a test to make sure the snapshotter produces complete .pkl files, so that this PR would fail in the CI immediately.

Reasons:

  1. There's no reason env can't have meaningful parameters which are saved/restored during pickle/unpickle -- for instance, what if I had a PointMass environment with a specifiable mass? Not all envs are just strings from openai/gym. The only interface we impose on env is that (1) env == pickle.loads(pickle.dumps(env)) and (2) isinstance(env, gym.Env).
  2. A .pkl file should be a total record of an experiment -- it should have everything I need to evaluate an agent in the environment, continue training the agent in the environment, etc. Removing env makes that untrue -- I would have to look at the launcher and be careful to recreate the environment. Consider how hard this would be if I decided to procedurally generate my environments!
  3. Perhaps (2) isn't the best way to save experiment state, but right now it's what we have.
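The pickle round-trip requirement in (1) can be checked in a few lines. `PointMass` below is a hypothetical environment with a configurable mass, borrowed from the example in the comment, not a real garage class:

```python
import pickle

class PointMass:
    """Hypothetical environment with a meaningful parameter (mass)
    that must survive pickling."""
    def __init__(self, mass=1.0):
        self.mass = mass

    def __eq__(self, other):
        return isinstance(other, PointMass) and self.mass == other.mass

env = PointMass(mass=2.5)
restored = pickle.loads(pickle.dumps(env))
assert env == restored  # the interface garage imposes on envs
```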

@naeioi naeioi force-pushed the env_spec-in-algo branch 4 times, most recently from 154a86c to 4b4595b Compare March 13, 2019 03:45
@naeioi
Member Author

naeioi commented Mar 13, 2019

@ryanjulian That makes sense. I added a test to verify the presence and integrity of both the environment and the policy in the snapshot. As a workaround for the algo no longer having access to the actual environment, the environment is now saved in save_snapshot() in the runner.
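A sketch of what such a presence/integrity check could look like (the snapshot dict keys and contents here are assumptions for illustration, not the exact garage snapshot format):

```python
import os
import pickle
import tempfile

# Pretend this is the snapshot the runner writes in save_snapshot().
snapshot = {'env': {'id': 'PointMass'}, 'policy': {'weights': [0.1, 0.2]}}

path = os.path.join(tempfile.mkdtemp(), 'params.pkl')
with open(path, 'wb') as f:
    pickle.dump(snapshot, f)

with open(path, 'rb') as f:
    loaded = pickle.load(f)

# Presence: both env and policy must be in the .pkl file.
assert 'env' in loaded and 'policy' in loaded
# Integrity: the round-trip must preserve the contents.
assert loaded == snapshot
```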

@ryanjulian
Member

awesome!

Comment thread tests/garage/experiment/test_snapshot.py Outdated
Member

@ryanjulian ryanjulian left a comment


awesome work. these PRs are making a huge positive impact on the codebase.

Comment thread garage/experiment/local_tf_runner.py
Comment thread garage/tf/algos/ddpg.py Outdated
@naeioi naeioi force-pushed the env_spec-in-algo branch from 86735a5 to 996a176 Compare March 14, 2019 23:09
Comment thread tests/garage/experiment/test_snapshot.py Outdated
Comment thread tests/garage/experiment/test_snapshot.py Outdated
@naeioi naeioi force-pushed the env_spec-in-algo branch from 8b18cd2 to a8821f0 Compare March 19, 2019 00:01
plot=False,
target_update_tau=0.05,
n_epoch_cycles=20,
max_path_length=100,
Member


I thought this line was deleted in the off-policy algorithm?

Member Author


Actually not. I keep n_epoch_cycles so that an algorithm can compute the epoch as iteration / n_epoch_cycles. This is a little awkward, but DDPG requires it.
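That is, something along these lines (illustrative numbers, not taken from the PR):

```python
n_epoch_cycles = 20  # sampler cycles per epoch

def epoch_of(iteration, cycles=n_epoch_cycles):
    """Recover the epoch index from a flat iteration counter."""
    return iteration // cycles

# Iterations 0..19 belong to epoch 0, 20..39 to epoch 1, and so on.
assert epoch_of(0) == 0
assert epoch_of(19) == 0
assert epoch_of(20) == 1
```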

Member


I don't quite understand. I meant: is max_path_length=100 not necessary here?

@ryanjulian
Member

bump. are there any hard blockers on this PR?

@naeioi
Member Author

naeioi commented Mar 21, 2019

I will have this merged after rebase.

naeioi added 7 commits March 21, 2019 14:22
This commit saves env in the runner because algo no longer
has access to the actual env. A test is also added to
verify the presence and integrity of both env and policy
in the snapshot.

Signed-off-by: Keren Zhu <naeioi@hotmail.com>
@naeioi naeioi force-pushed the env_spec-in-algo branch from 78254b4 to 7a8caa8 Compare March 21, 2019 21:23
Disable not-context-manager check. This is a long-standing pylint bug.
See pylint-dev/pylint#782.

Disable c-extension-no-member check. Some C extension modules don't report
their exported functions and variables to Python introspection. In our case,
baselines uses mpi4py, which makes pylint complain.
See https://travis-ci.com/rlworkgroup/garage/builds/105359651#L690.
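For reference, disabling these checks project-wide looks roughly like this in a .pylintrc (a sketch; the exact file name and location depend on how the repo configures pylint):

```ini
[MESSAGES CONTROL]
# not-context-manager: long-standing pylint false positive (pylint-dev/pylint#782)
# c-extension-no-member: mpi4py (pulled in via baselines) doesn't expose its
# members to pylint's static analysis
disable=not-context-manager,
        c-extension-no-member
```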
@naeioi
Member Author

naeioi commented Mar 21, 2019

I disabled two pylint checks.

@naeioi naeioi merged commit c4106d8 into master Mar 22, 2019
@naeioi naeioi deleted the env_spec-in-algo branch March 22, 2019 01:18

Development

Successfully merging this pull request may close these issues.

Use specs for constructing algorithms and primitives, not environments

3 participants