Add DataLoader Based on DataHandler & Add Rolling Process Example & Restructure the Config & Setup_data #374
Merged
Changes from 19 commits
23 commits
1ca3c6a add DataHandlerDL (bxdd)
b1a2835 black format (bxdd)
1fcfe8e add rolling process data (bxdd)
f6dc25b update rolling process (bxdd)
4ec3007 update rolling workflow (bxdd)
efe134e update workflow (bxdd)
a04c6bd balck format (bxdd)
68246b3 update workflow (bxdd)
e119c85 black format (bxdd)
9cc3b18 fix but (bxdd)
d6ff764 black format (bxdd)
194217f fix bug (bxdd)
5f60d18 fix config_data bug (bxdd)
4ee0240 black format (bxdd)
31bc85b restructure data layer config & setup (bxdd)
fb7f84f fix ubg (bxdd)
8743576 black format (bxdd)
d18c367 update README (bxdd)
1074284 fix docstring (bxdd)
136830b update comments (bxdd)
f8da79b fix readme (bxdd)
0236034 fix readme (bxdd)
7a2203f update comments (bxdd)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # Rolling Process Data | ||
|
|
||
| This workflow is an example of `Rolling Process Data`. | ||
|
|
||
| ## Background | ||
|
|
||
| When models are trained on a rolling basis, the data must also be generated for each rolling window. When the rolling window moves, the training data changes, and the processors' learnable state (such as the mean and standard deviation) changes as well. | ||
|
|
||
| To avoid regenerating data, this example uses the `DataHandler-based DataLoader` to load the raw features, which do not depend on the rolling window, and then uses Processors to generate the processed features that do depend on the rolling window. | ||
|
|
||
|
|
||
| ### Run the Code | ||
|
|
||
| Run the example with the following command: | ||
| ```bash | ||
| python workflow.py rolling_process | ||
| ``` |
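The idea in the Background section can be sketched without qlib. The snippet below is a minimal illustration, not the example's actual code: the `raw` values, `fit_zscore`, and `process_window` are hypothetical names. It keeps one copy of the raw features and, for each rolling window, refits only the normalization state (mean and standard deviation) on that window's fit range.

```python
from statistics import mean, pstdev

# Hypothetical raw feature values keyed by year; loaded once and reused by every window.
raw = {2010: 1.0, 2011: 2.0, 2012: 3.0, 2013: 4.0, 2014: 5.0, 2015: 6.0}


def fit_zscore(values):
    """Learn the processor state (mean, std) from the values in the fit window."""
    return mean(values), pstdev(values)


def process_window(raw, fit_years, all_years):
    """Normalize all years of a window with statistics learned on the fit years only."""
    mu, sigma = fit_zscore([raw[y] for y in fit_years])
    return {y: (raw[y] - mu) / sigma for y in all_years}


windows = []
for offset in range(2):
    fit_years = range(2010 + offset, 2013 + offset)  # 3-year fit (training) range
    all_years = range(2010 + offset, 2015 + offset)  # fit + validation + test range
    windows.append(process_window(raw, fit_years, all_years))

# The same raw value for 2012 is normalized differently in each window,
# because the learned mean moves with the window.
print(windows[0][2012], windows[1][2012])
```

Only the cheap refit step runs per window; the raw values are never recomputed.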
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| from qlib.data.dataset.handler import DataHandlerLP | ||
| from qlib.data.dataset.loader import DataLoaderDH | ||
| from qlib.contrib.data.handler import check_transform_proc | ||
|
|
||
|
|
||
| class RollingDataHandler(DataHandlerLP): | ||
|     def __init__( | ||
|         self, | ||
|         start_time=None, | ||
|         end_time=None, | ||
|         infer_processors=None, | ||
|         learn_processors=None, | ||
|         fit_start_time=None, | ||
|         fit_end_time=None, | ||
|         data_loader_kwargs=None, | ||
|     ): | ||
|         # None defaults avoid sharing mutable default arguments between instances. | ||
|         infer_processors = check_transform_proc(infer_processors or [], fit_start_time, fit_end_time) | ||
|         learn_processors = check_transform_proc(learn_processors or [], fit_start_time, fit_end_time) | ||
|
|
||
|         # Load the raw features through the DataHandler-based DataLoader. | ||
|         data_loader = { | ||
|             "class": "DataLoaderDH", | ||
|             "kwargs": {**(data_loader_kwargs or {})}, | ||
|         } | ||
|
|
||
|         super().__init__( | ||
|             instruments=None, | ||
|             start_time=start_time, | ||
|             end_time=end_time, | ||
|             data_loader=data_loader, | ||
|             infer_processors=infer_processors, | ||
|             learn_processors=learn_processors, | ||
|         ) |
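The handler above delegates loading to a loader that reads from an existing handler. The sketch below illustrates that delegation pattern in plain Python. It is an assumption-laden stand-in, not qlib's `DataLoaderDH` implementation: `PreloadedSource`, `DelegatingLoader`, and their `fetch`/`load` methods are hypothetical names.

```python
from datetime import date


class PreloadedSource:
    """Hypothetical stand-in for a handler that already holds the raw features."""

    def __init__(self, rows):
        self._rows = dict(rows)  # maps date -> feature value

    def fetch(self, start, end):
        """Return the stored rows whose date falls in [start, end], in date order."""
        return {d: v for d, v in sorted(self._rows.items()) if start <= d <= end}


class DelegatingLoader:
    """Hypothetical loader that slices a preloaded source instead of recomputing features."""

    def __init__(self, source):
        self._source = source

    def load(self, start, end):
        return self._source.fetch(start, end)


source = PreloadedSource(
    {date(2010, 1, 4): 0.1, date(2011, 1, 4): 0.2, date(2012, 1, 4): 0.3}
)
loader = DelegatingLoader(source)

# Each rolling window only slices the data that is already in memory.
window = loader.load(date(2011, 1, 1), date(2012, 12, 31))
print(window)
```

With this split, moving the window changes only the slice bounds passed to `load`, while the expensive feature computation behind the source happens once.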
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,141 @@ | ||
| # Copyright (c) Microsoft Corporation. | ||
| # Licensed under the MIT License. | ||
|
|
||
| import qlib | ||
| import fire | ||
| import pickle | ||
| import pandas as pd | ||
|
|
||
| from datetime import datetime | ||
| from qlib.config import REG_CN | ||
| from qlib.data.dataset.handler import DataHandlerLP | ||
| from qlib.contrib.data.handler import Alpha158 | ||
| from qlib.utils import exists_qlib_data, init_instance_by_config | ||
| from qlib.tests.data import GetData | ||
|
|
||
|
|
||
| class RollingDataWorkflow(object): | ||
|
|
||
|     MARKET = "csi300" | ||
|     start_time = "2010-01-01" | ||
|     end_time = "2019-12-31" | ||
|     rolling_cnt = 5 | ||
|
|
||
|     def _init_qlib(self): | ||
|         """initialize qlib""" | ||
|         # use the CN data under ~/.qlib; download it first if it is missing | ||
|         provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir | ||
|         if not exists_qlib_data(provider_uri): | ||
|             print(f"Qlib data is not found in {provider_uri}") | ||
|             GetData().qlib_data(target_dir=provider_uri, region=REG_CN) | ||
|         qlib.init(provider_uri=provider_uri, region=REG_CN) | ||
|
|
||
|     def _dump_pre_handler(self, path): | ||
|         # Build an Alpha158 handler without processors, so it holds only raw features. | ||
|         handler_config = { | ||
|             "class": "Alpha158", | ||
|             "module_path": "qlib.contrib.data.handler", | ||
|             "kwargs": { | ||
|                 "start_time": self.start_time, | ||
|                 "end_time": self.end_time, | ||
|                 "instruments": self.MARKET, | ||
|                 "infer_processors": [], | ||
|                 "learn_processors": [], | ||
|             }, | ||
|         } | ||
|         pre_handler = init_instance_by_config(handler_config) | ||
|         # dump_all=True so that the pickled handler keeps its loaded data | ||
|         pre_handler.config(dump_all=True) | ||
|         pre_handler.to_pickle(path) | ||
|
|
||
|     def _load_pre_handler(self, path): | ||
|         with open(path, "rb") as file_dataset: | ||
|             pre_handler = pickle.load(file_dataset) | ||
|         return pre_handler | ||
|
|
||
|     def rolling_process(self): | ||
|         self._init_qlib() | ||
|         self._dump_pre_handler("pre_handler.pkl") | ||
|         pre_handler = self._load_pre_handler("pre_handler.pkl") | ||
|
|
||
|         train_start_time = (2010, 1, 1) | ||
|         train_end_time = (2012, 12, 31) | ||
|         valid_start_time = (2013, 1, 1) | ||
|         valid_end_time = (2013, 12, 31) | ||
|         test_start_time = (2014, 1, 1) | ||
|         test_end_time = (2014, 12, 31) | ||
|
|
||
|         dataset_config = { | ||
|             "class": "DatasetH", | ||
|             "module_path": "qlib.data.dataset", | ||
|             "kwargs": { | ||
|                 "handler": { | ||
|                     "class": "RollingDataHandler", | ||
|                     "module_path": "rolling_handler", | ||
|                     "kwargs": { | ||
|                         "start_time": datetime(*train_start_time), | ||
|                         "end_time": datetime(*test_end_time), | ||
|                         "fit_start_time": datetime(*train_start_time), | ||
|                         "fit_end_time": datetime(*train_end_time), | ||
|                         "infer_processors": [ | ||
|                             {"class": "RobustZScoreNorm", "kwargs": {"fields_group": "feature"}}, | ||
|                         ], | ||
|                         "learn_processors": [ | ||
|                             {"class": "DropnaLabel"}, | ||
|                             {"class": "CSZScoreNorm", "kwargs": {"fields_group": "label"}}, | ||
|                         ], | ||
|                         "data_loader_kwargs": { | ||
|                             "handler_config": pre_handler, | ||
|                         }, | ||
|                     }, | ||
|                 }, | ||
|                 "segments": { | ||
|                     "train": (datetime(*train_start_time), datetime(*train_end_time)), | ||
|                     "valid": (datetime(*valid_start_time), datetime(*valid_end_time)), | ||
|                     "test": (datetime(*test_start_time), datetime(*test_end_time)), | ||
|                 }, | ||
|             }, | ||
|         } | ||
|
|
||
|         dataset = init_instance_by_config(dataset_config) | ||
|
|
||
|         for rolling_offset in range(self.rolling_cnt): | ||
|
|
||
|             print(f"===========rolling{rolling_offset} start===========") | ||
|             if rolling_offset: | ||
|                 dataset.config( | ||
|                     handler_kwargs={ | ||
|                         "start_time": datetime(train_start_time[0] + rolling_offset, *train_start_time[1:]), | ||
|                         "end_time": datetime(test_end_time[0] + rolling_offset, *test_end_time[1:]), | ||
|                         "processor_kwargs": { | ||
|                             "fit_start_time": datetime(train_start_time[0] + rolling_offset, *train_start_time[1:]), | ||
|                             "fit_end_time": datetime(train_end_time[0] + rolling_offset, *train_end_time[1:]), | ||
|                         }, | ||
|                     }, | ||
|                     segments={ | ||
|                         "train": ( | ||
|                             datetime(train_start_time[0] + rolling_offset, *train_start_time[1:]), | ||
|                             datetime(train_end_time[0] + rolling_offset, *train_end_time[1:]), | ||
|                         ), | ||
|                         "valid": ( | ||
|                             datetime(valid_start_time[0] + rolling_offset, *valid_start_time[1:]), | ||
|                             datetime(valid_end_time[0] + rolling_offset, *valid_end_time[1:]), | ||
|                         ), | ||
|                         "test": ( | ||
|                             datetime(test_start_time[0] + rolling_offset, *test_start_time[1:]), | ||
|                             datetime(test_end_time[0] + rolling_offset, *test_end_time[1:]), | ||
|                         ), | ||
|                     }, | ||
|                 ) | ||
|             dataset.setup_data( | ||
|                 handler_kwargs={ | ||
|                     "init_type": DataHandlerLP.IT_FIT_SEQ, | ||
|                 } | ||
|             ) | ||
|
|
||
|             dtrain, dvalid, dtest = dataset.prepare(["train", "valid", "test"]) | ||
|             print(dtrain, dvalid, dtest) | ||
|             # print or dump data | ||
|             print(f"===========rolling{rolling_offset} end===========") | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     fire.Fire(RollingDataWorkflow) | ||
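The workflow repeats the same year-shifting expression for every segment boundary. As a minimal sketch of that date arithmetic only, the snippet below computes the segments for each rolling offset with the standard library. `BASE_SEGMENTS`, `shift_years`, and `rolling_segments` are hypothetical helper names and are not part of the PR; the base dates and the count of five windows are taken from the workflow.

```python
from datetime import datetime

# Base (year, month, day) boundaries, copied from the workflow's first window.
BASE_SEGMENTS = {
    "train": ((2010, 1, 1), (2012, 12, 31)),
    "valid": ((2013, 1, 1), (2013, 12, 31)),
    "test": ((2014, 1, 1), (2014, 12, 31)),
}


def shift_years(ymd, offset):
    """Move a (year, month, day) tuple forward by `offset` years."""
    year, month, day = ymd
    return datetime(year + offset, month, day)


def rolling_segments(offset):
    """Return the train/valid/test date ranges for the given rolling offset."""
    return {
        name: (shift_years(start, offset), shift_years(end, offset))
        for name, (start, end) in BASE_SEGMENTS.items()
    }


for offset in range(5):  # same count as rolling_cnt in the workflow
    segments = rolling_segments(offset)
    print(offset, segments["train"][0].date(), segments["test"][1].date())
```

Shifting only the year works here because every boundary is January 1 or December 31; a February 29 boundary would raise `ValueError` in a non-leap target year.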