Skip to content

Gather CV Results as Completed#433

Merged
TomAugspurger merged 88 commits into
dask:masterfrom
vecchp:gather_results_as_completed
May 20, 2019
Merged

Gather CV Results as Completed#433
TomAugspurger merged 88 commits into
dask:masterfrom
vecchp:gather_results_as_completed

Conversation

@vecchp

@vecchp vecchp commented Dec 1, 2018

Copy link
Copy Markdown

In progress PR to address #415.

@TomAugspurger

Copy link
Copy Markdown
Member

The CI failures can probably be ignored. I'll try to get them fixed next week.

@vecchp can you think of a way to test this? The best I've come up with is a custom estimator class, that when given a specific hyper parameter combination does something like

get_worker inside the task and closes the worker, to simulate it going away. https://distributed.dask.org/en/latest/api.html#distributed.get_worker. Haven't tested that out yet.

@jcrist will you have time next week to give feedback on this approach?

@vecchp

vecchp commented Dec 2, 2018

Copy link
Copy Markdown
Author

@TomAugspurger I'll think a little about the tests, though your current suggestion seems reasonable.

Comment thread dask_ml/model_selection/_search.py Outdated
Comment thread dask_ml/model_selection/_search.py Outdated
Comment thread dask_ml/model_selection/_search.py Outdated
Comment thread dask_ml/model_selection/_search.py Outdated
Comment thread dask_ml/model_selection/_search.py Outdated
@vecchp

vecchp commented Dec 5, 2018

Copy link
Copy Markdown
Author

@jcrist @TomAugspurger made some changes w.r.t the PR comments. On my end all tests pass except for test_cache_cv. I believe the failure is due to the cache being hit more than the previous expected values given the current changes.

build_graph is now broken into build_cv_graph & build_result_graph, I had to return main_token, X_name, y_name, weights, fit_params and then pass it to build_result_graph which seems a bit messy, but I haven't thought of a better way. What are your thoughts on these changes now?

I also haven't got around to confirming if the futures no longer need to be canceled or writing robust tests for the new as_completed feature.

@vecchp

vecchp commented Apr 18, 2019

Copy link
Copy Markdown
Author

@TomAugspurger checking in again to see what else should be done. All tests pass.

@TomAugspurger

TomAugspurger commented Apr 18, 2019 via email

Copy link
Copy Markdown
Member

@TomAugspurger

Copy link
Copy Markdown
Member

All tests pass.

It seems like coverage is decreasing. Does that make sense, or is this a false positive? https://codecov.io/gh/dask/dask-ml/compare/cfa57e5789fc29bdcee5d846b5116c07f5fbe292...9da5b8f0cfa5960a7ee70f216a684bff96a450b2/changes

@vecchp

vecchp commented May 4, 2019

Copy link
Copy Markdown
Author

All tests pass.

It seems like coverage is decreasing. Does that make sense, or is this a false positive? https://codecov.io/gh/dask/dask-ml/compare/cfa57e5789fc29bdcee5d846b5116c07f5fbe292...9da5b8f0cfa5960a7ee70f216a684bff96a450b2/changes

Makes sense, looks like it's dead code given the as_completed implementation no longer uses it. I'll update.

@vecchp

vecchp commented May 4, 2019

Copy link
Copy Markdown
Author

@TomAugspurger looks like all stages pass. I think codecov was removed but that should be back to normal as well given the dead code is now gone.

Comment thread dask_ml/model_selection/_search.py Outdated
from toolz import get, pluck

try:
from dask.distributed import as_completed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this try/except needed? distributed is a dependency now (#466).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with removing the optional check. Previous guidance was to make it option as it wasn't a dependency at the time.

@vecchp

vecchp commented May 7, 2019

Copy link
Copy Markdown
Author

@stsievert all optional distributed checks in this PR should be removed now.

@TomAugspurger any guidance left to get this merged?

@TomAugspurger TomAugspurger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vecchp, and apologies once again for the delay on reviewing.

This looks quite good. My only request is a small compatibility layer for build_graph.

return fit_params


def build_cv_graph(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide a compatibility shim with the old name? I believe that TPOT is using build_graph https://github.com/EpistasisLab/tpot/blob/b626271e6b5896a73fb9d7d29bebc7aa9100772e/tpot/gp_deap.py#L429

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just add back build_graph with a comment that it's going to be removed (not even a deprecation warning). This is a private module, but I was lazy when adding dask support to TPOT. Once this is in, I can cut a release and make a PR to tpot updating their call.

@TomAugspurger

Copy link
Copy Markdown
Member

I'm looking into the compatibility for tpot now. Will push an update.

@vecchp

vecchp commented May 20, 2019

Copy link
Copy Markdown
Author

Sounds good!

@TomAugspurger TomAugspurger merged commit 954d41f into dask:master May 20, 2019
@TomAugspurger

Copy link
Copy Markdown
Member

Thanks @vecchp!

@vecchp

vecchp commented May 21, 2019

Copy link
Copy Markdown
Author

@TomAugspurger thanks so much for getting this in!

@vecchp vecchp deleted the gather_results_as_completed branch May 21, 2019 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants