replace multi processing with joblib #477
you-n-g merged 11 commits into microsoft:nested_decision_exe
qlib/data/updateparallel.py
| require=None,
| maxtasksperchild=None,
| **kwargs)
| self._backend_args["maxtasksperchild"] = ["maxtasksperchild"]

self._backend_args["maxtasksperchild"] = maxtasksperchild

if isinstance(self._backend, MultiprocessingBackend):
    self._backend_args["maxtasksperchild"] = maxtasksperchild
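
The two suggestions above can be combined into one guarded assignment. The following is only a minimal sketch: `MultiprocessingBackend`, `ThreadingBackend`, and `ParallelSketch` are hypothetical stand-ins written for illustration, not joblib's real classes, so that the logic can be run without joblib installed.

```python
class MultiprocessingBackend:
    """Hypothetical stand-in for the multiprocessing backend class."""


class ThreadingBackend:
    """Hypothetical stand-in for any backend without maxtasksperchild."""


class ParallelSketch:
    """Hypothetical stand-in for the Parallel subclass under review."""

    def __init__(self, backend, maxtasksperchild=None):
        self._backend = backend
        self._backend_args = {}
        # Store the value itself (not the list ["maxtasksperchild"]),
        # and only for the backend that understands this option.
        if isinstance(self._backend, MultiprocessingBackend):
            self._backend_args["maxtasksperchild"] = maxtasksperchild


mp = ParallelSketch(MultiprocessingBackend(), maxtasksperchild=1)
th = ParallelSketch(ThreadingBackend(), maxtasksperchild=1)
print(mp._backend_args)
print(th._backend_args)
```

With the guard in place, a backend that does not accept `maxtasksperchild` never sees the extra key.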
qlib/data/updateparallel.py
| from joblib import Parallel
|
| class UpdateParallel(Parallel):

UpdateParallel moves to qlib/utils/__init__.py
UpdateParallel renamed to ParallelExt

https://github.com/microsoft/qlib/blob/main/qlib/utils/paral.py will be a better place
qlib/data/updateparallel.py
| maxtasksperchild=None,
| **kwargs
| ):
| super(UpdateParallel, self).__init__(n_jobs=n_jobs,

super(UpdateParallel, self).__init__(
    n_jobs=n_jobs,
    backend=backend,
    verbose=verbose,
    timeout=timeout,
    pre_dispatch=pre_dispatch,
    batch_size=batch_size,
    temp_folder=temp_folder,
    max_nbytes=max_nbytes,
    mmap_mode=mmap_mode,
    prefer=prefer,
    require=require,
)
qlib/data/updateparallel.py
| backend=None,
| verbose=0,
| timeout=None,
| pre_dispatch="2 * n_jobs",

Why not use *args, **kwargs instead of explicitly listing all the arguments?
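
A minimal sketch of what this suggestion could look like is below. `BaseParallel` is a hypothetical stand-in for `joblib.Parallel` with only three of its keyword arguments, used here so the example runs without joblib. The subclass name `ParallelExt` follows the rename mentioned earlier in this thread.

```python
class BaseParallel:
    """Hypothetical stand-in for joblib.Parallel (only a few arguments)."""

    def __init__(self, n_jobs=None, backend=None, verbose=0):
        self.n_jobs = n_jobs
        self.backend = backend
        self.verbose = verbose


class ParallelExt(BaseParallel):
    def __init__(self, *args, **kwargs):
        # Pop the extra option first so the base class never receives
        # a keyword argument it does not know about.
        self.maxtasksperchild = kwargs.pop("maxtasksperchild", None)
        # Everything else is forwarded unchanged.
        super().__init__(*args, **kwargs)


p = ParallelExt(n_jobs=4, backend="multiprocessing", maxtasksperchild=1)
print(p.n_jobs, p.backend, p.verbose, p.maxtasksperchild)
```

Forwarding with `*args, **kwargs` keeps the subclass in step with the base class signature, at the cost of a less explicit signature in the subclass itself.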
qlib/config.py
| "kernels": NUM_USABLE_CPU,
| # How many tasks belong to one process. Recommend 1 for high-frequency data and None for daily data.
| "maxtasksperchild": None,
| "joblib_backend" : None,

Can we set the default backend to multiprocessing if loky is very likely to OOM?
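
One way to read this question is as a fallback applied when the config value is left as `None`. The sketch below assumes exactly that. `resolve_backend` and `DEFAULT_BACKEND` are hypothetical names introduced for illustration and are not part of qlib; only the `"joblib_backend"` key comes from the diff above.

```python
# Hypothetical fallback used when the config leaves the backend unset.
DEFAULT_BACKEND = "multiprocessing"


def resolve_backend(config):
    """Return the configured joblib backend name, or the fallback."""
    backend = config.get("joblib_backend")
    return DEFAULT_BACKEND if backend is None else backend


unset = resolve_backend({"joblib_backend": None})
explicit = resolve_backend({"joblib_backend": "loky"})
print(unset, explicit)
```

An explicit setting such as `"loky"` still wins, so a test that needs loky could request it directly.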
qlib/data/updateparallel.py
| @@ -0,0 +1,41 @@
| from joblib import Parallel

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
tests/misc/test_get_multi_proc.py
| """
| For testing if it will raise error
| """
| qlib.init(provider_uri=TestAutoData.provider_uri, expression_cache=None, dataset_cache=None)

You have to use loky to pass the test
* replace multi processing with joblib
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* Fix Parallel support for maxtasksperchild

Co-authored-by: wangw <1666490690@qq.com>
Co-authored-by: zhupr <zhu.pengrong@foxmail.com>
Description
Multiprocessing has the following weaknesses
Joblib does not have these problems.
So we try to replace multiprocessing with joblib.
How Has This Been Tested?
pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
Types of changes