Add DataLoader Based on DataHandler & Add Rolling Process Example & Restructure the Config & Setup_data #374
Merged
Changes from 8 commits
23 commits
1ca3c6a add DataHandlerDL (bxdd)
b1a2835 black format (bxdd)
1fcfe8e add rolling process data (bxdd)
f6dc25b update rolling process (bxdd)
4ec3007 update rolling workflow (bxdd)
efe134e update workflow (bxdd)
a04c6bd balck format (bxdd)
68246b3 update workflow (bxdd)
e119c85 black format (bxdd)
9cc3b18 fix but (bxdd)
d6ff764 black format (bxdd)
194217f fix bug (bxdd)
5f60d18 fix config_data bug (bxdd)
4ee0240 black format (bxdd)
31bc85b restructure data layer config & setup (bxdd)
fb7f84f fix ubg (bxdd)
8743576 black format (bxdd)
d18c367 update README (bxdd)
1074284 fix docstring (bxdd)
136830b update comments (bxdd)
f8da79b fix readme (bxdd)
0236034 fix readme (bxdd)
7a2203f update comments (bxdd)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,17 @@ | ||
| # Rolling Process Data | ||
|
|
||
| This workflow is an example for `Rolling Process Data`. | ||
|
|
||
| ## Background | ||
|
|
||
| When models are trained on a rolling basis, the data must also be generated for each rolling window. When the rolling window moves, the training data changes, and the processor's learnable state (such as the mean and standard deviation) changes with it. | ||
|
|
||
| To avoid regenerating the data, this example uses the `DataHandler-based DataLoader` to load the raw features, which do not depend on the rolling window, and then uses Processors to generate the processed features that do depend on it. | ||
|
|
||
|
|
||
| ## Run the Code | ||
|
|
||
| Run the example with the following command: | ||
| ```bash | ||
| python workflow.py rolling_process | ||
| ``` |
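Reviewer note: the idea described in the README's Background section can be sketched as follows. This is a minimal illustration using only the standard library, not the code in this PR; `ZScoreNorm`, `load_raw`, and the integer window bounds are hypothetical stand-ins. The raw series is loaded once, and only the processor's learnable state (mean and standard deviation) is refit each time the window moves.

```python
from statistics import mean, pstdev


class ZScoreNorm:
    """Hypothetical processor whose learnable state is a mean and a std."""

    def __init__(self, fit_start, fit_end):
        self.fit_start = fit_start
        self.fit_end = fit_end
        self.mean_ = None
        self.std_ = None

    def config(self, **kwargs):
        # Update only attributes that the processor already defines.
        for key, value in kwargs.items():
            if hasattr(self, key):
                setattr(self, key, value)

    def fit(self, raw):
        # Learn the state from the current fit window only.
        window = raw[self.fit_start:self.fit_end]
        self.mean_ = mean(window)
        self.std_ = pstdev(window) or 1.0  # guard against a zero std

    def __call__(self, raw):
        return [(x - self.mean_) / self.std_ for x in raw]


def load_raw():
    # Stand-in for the expensive, window-independent raw feature load.
    return [float(i) for i in range(10)]


raw = load_raw()           # loaded once, reused by every window
proc = ZScoreNorm(0, 4)
states = []
for start in range(0, 6, 2):
    # Move the rolling window, then refit only the processor state.
    proc.config(fit_start=start, fit_end=start + 4)
    proc.fit(raw)
    states.append((proc.mean_, proc.std_))
    processed = proc(raw)  # window-dependent processed features
```

Each entry in `states` differs in its mean as the window slides, while `raw` is never reloaded.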
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,6 +6,7 @@ | |
| import bisect | ||
| import logging | ||
| import warnings | ||
| from inspect import getfullargspec | ||
| from typing import Union, Tuple, List, Iterator, Optional | ||
|
|
||
| import pandas as pd | ||
|
|
@@ -99,10 +100,10 @@ def __init__( | |
| self.fetch_orig = fetch_orig | ||
| if init_data: | ||
| with TimeInspector.logt("Init data"): | ||
| self.init() | ||
| self.setup_data() | ||
| super().__init__() | ||
|
|
||
| def conf_data(self, **kwargs): | ||
| def config(self, **kwargs): | ||
| """ | ||
| configuration of data. | ||
| # what data to be loaded from data source | ||
|
|
@@ -116,9 +117,15 @@ def conf_data(self, **kwargs): | |
| if k in attr_list: | ||
| setattr(self, k, v) | ||
|
|
||
| def init(self, enable_cache: bool = False): | ||
| for attr in attr_list: | ||
| if attr in kwargs: | ||
| kwargs.pop(attr) | ||
|
|
||
| super().config(**kwargs) | ||
|
|
||
| def setup_data(self, enable_cache: bool = False): | ||
| """ | ||
| initialize the data. | ||
| Set Up the data. | ||
| If initialization is run multiple times, it will do nothing after the first time. | ||
|
||
|
|
||
| It is responsible for maintaining following variable | ||
|
|
@@ -403,7 +410,7 @@ def process_data(self, with_fit: bool = False): | |
| if self.drop_raw: | ||
| del self._data | ||
|
|
||
| def conf_data(self, **kwargs): | ||
| def config(self, processor_kwargs: dict = None, **kwargs): | ||
| """ | ||
| configuration of data. | ||
| # what data to be loaded from data source | ||
|
|
@@ -412,27 +419,19 @@ def conf_data(self, **kwargs): | |
| The data will be initialized with different time range. | ||
|
|
||
| """ | ||
| attr_list = {"fit_start_time", "fit_end_time"} | ||
| for k, v in kwargs.items(): | ||
| if k in attr_list: | ||
| for infer_processor in self.infer_processors: | ||
| if getattr(infer_processor, k, None): | ||
| setattr(infer_processor, k, v) | ||
|
|
||
| for learn_processor in self.learn_processors: | ||
| if getattr(learn_processor, k, None): | ||
| setattr(learn_processor, k, v) | ||
|
|
||
| super().conf_data(**kwargs) | ||
| super().config(**kwargs) | ||
| if processor_kwargs is not None: | ||
| for processor in self.get_all_processors(): | ||
| processor.config(**processor_kwargs) | ||
|
|
||
| # init type | ||
| IT_FIT_SEQ = "fit_seq" # the input of `fit` will be the output of the previous processor | ||
| IT_FIT_IND = "fit_ind" # the input of `fit` will be the original df | ||
| IT_LS = "load_state" # The state of the object has been loaded by pickle | ||
|
|
||
| def init(self, init_type: str = IT_FIT_SEQ, enable_cache: bool = False): | ||
| def setup_data(self, init_type: str = IT_FIT_SEQ, **kwargs): | ||
| """ | ||
| Initialize the data of Qlib | ||
| Set up the data of Qlib | ||
|
|
||
| Parameters | ||
| ---------- | ||
|
|
@@ -447,7 +446,7 @@ def init(self, init_type: str = IT_FIT_SEQ, enable_cache: bool = False): | |
| when we call `init` next time | ||
| """ | ||
| # init raw data | ||
| super().init(enable_cache=enable_cache) | ||
| super().setup_data(**kwargs) | ||
|
|
||
| with TimeInspector.logt("fit & process data"): | ||
| if init_type == DataHandlerLP.IT_FIT_IND: | ||
|
|
||
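Reviewer note: the renamed `config` / `setup_data` pair in the diff above follows a cooperative pattern that can be sketched in isolation. The classes below (`SketchProcessor`, `SketchHandler`, `SketchHandlerLP`) are hypothetical and greatly simplified; they are not the Qlib classes and only assume the shape visible in the diff: the subclass forwards handler-level keyword arguments to its parent and passes `processor_kwargs` to every processor.

```python
class SketchProcessor:
    """Hypothetical processor with a configurable fit window."""

    def __init__(self, fit_start_time=None, fit_end_time=None):
        self.fit_start_time = fit_start_time
        self.fit_end_time = fit_end_time
        self.fit_count = 0

    def config(self, **kwargs):
        for key, value in kwargs.items():
            if key in {"fit_start_time", "fit_end_time"}:
                setattr(self, key, value)

    def fit(self, data):
        # A real processor would learn statistics here.
        self.fit_count += 1


class SketchHandler:
    """Hypothetical base handler: owns the time range and the raw data."""

    def __init__(self, start_time, end_time):
        self.start_time = start_time
        self.end_time = end_time
        self._data = None

    def config(self, **kwargs):
        for key, value in kwargs.items():
            if key in {"start_time", "end_time"}:
                setattr(self, key, value)

    def setup_data(self):
        # Stand-in for loading data from the data source.
        self._data = list(range(self.start_time, self.end_time))


class SketchHandlerLP(SketchHandler):
    """Hypothetical handler that also owns processors."""

    def __init__(self, processors, start_time, end_time):
        super().__init__(start_time, end_time)
        self.processors = processors

    def config(self, processor_kwargs=None, **kwargs):
        # Handler-level settings go to the parent class.
        super().config(**kwargs)
        # Processor-level settings are forwarded to every processor.
        if processor_kwargs is not None:
            for processor in self.processors:
                processor.config(**processor_kwargs)

    def setup_data(self, **kwargs):
        # Load the raw data first, then fit the processors on it.
        super().setup_data(**kwargs)
        for processor in self.processors:
            processor.fit(self._data)


proc = SketchProcessor()
handler = SketchHandlerLP([proc], start_time=0, end_time=10)
handler.config(
    start_time=2,
    end_time=8,
    processor_kwargs={"fit_start_time": 2, "fit_end_time": 5},
)
handler.setup_data()
```

Keeping processor settings in a separate `processor_kwargs` dict means the handler does not need to know which attributes each processor accepts, which is the main difference from the removed `conf_data` loop.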