| 1 | +# Data Collector |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +Scripts for data collection |
| 6 | + |
| 7 | +- yahoo: get *US/CN* stock data from *Yahoo Finance* |
| 8 | +- fund: get fund data from *http://fund.eastmoney.com* |
| 9 | +- cn_index: get *CN index* from *http://www.csindex.com.cn*, *CSI300*/*CSI100* |
| 10 | +- us_index: get *US index* from *https://en.wikipedia.org/wiki*, *SP500*/*NASDAQ100*/*DJIA*/*SP400* |
| 11 | +- contrib: scripts for some auxiliary functions |
| 12 | + |
| 13 | + |
| 14 | +## Custom Data Collection |
| 15 | + |
| 16 | +> For a reference implementation, see https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo |
| 17 | + |
| 18 | +1. Create a directory for the dataset's code in the current directory |
| 19 | +2. Add `collector.py` |
| 20 | + - add collector class: |
| 21 | + ```python |
| 22 | + CUR_DIR = Path(__file__).resolve().parent |
| 23 | + sys.path.append(str(CUR_DIR.parent.parent)) |
| 24 | + from data_collector.base import BaseCollector, BaseNormalize, BaseRun |
| 25 | + class UserCollector(BaseCollector): |
| 26 | +     ... |
| 27 | + ``` |
| 28 | + - add normalize class: |
| 29 | + ```python |
| 30 | + class UserNormalize(BaseNormalize): |
| 31 | +     ... |
| 32 | + ``` |
| 33 | + - add `CLI` class: |
| 34 | + ```python |
| 35 | + class Run(BaseRun): |
| 36 | +     ... |
| 37 | + ``` |
| 38 | +3. Add `README.md` |
| 39 | +4. Add `requirements.txt` |
| 40 | + |
| 41 | + |
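
The three classes from step 2 can be sketched as follows. This is a minimal, self-contained illustration only: `StubCollector` and `StubNormalize` are hypothetical stand-ins and do not reproduce the interface of `data_collector.base`, and the method names `get_data`, `normalize`, and `collect_and_normalize` are assumptions made for this sketch.

```python
from abc import ABC, abstractmethod


class StubCollector(ABC):
    """Hypothetical stand-in for data_collector.base.BaseCollector."""

    @abstractmethod
    def get_data(self, symbol):
        """Return raw rows (a list of dicts) for one symbol."""


class StubNormalize(ABC):
    """Hypothetical stand-in for data_collector.base.BaseNormalize."""

    @abstractmethod
    def normalize(self, rows):
        """Return cleaned rows."""


class UserCollector(StubCollector):
    def get_data(self, symbol):
        # A real collector would download data here; this returns fixed rows.
        return [
            {"symbol": symbol, "date": "2020-01-03", "close": 10.5},
            {"symbol": symbol, "date": "2020-01-02", "close": 10.0},
            {"symbol": symbol, "date": "2020-01-06", "close": None},
        ]


class UserNormalize(StubNormalize):
    def normalize(self, rows):
        # Drop rows without a close price, then sort by date.
        kept = [row for row in rows if row.get("close") is not None]
        return sorted(kept, key=lambda row: row["date"])


class Run:
    """Hypothetical entry point that wires the collector and normalizer together."""

    def __init__(self, collector=None, normalizer=None):
        self.collector = collector or UserCollector()
        self.normalizer = normalizer or UserNormalize()

    def collect_and_normalize(self, symbol):
        return self.normalizer.normalize(self.collector.get_data(symbol))
```

In a real dataset directory, the two user classes would subclass `BaseCollector` and `BaseNormalize` from `data_collector.base`, and `Run` would subclass `BaseRun`, as in the `yahoo` implementation linked above.
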
| 42 | +## Dataset Description |
| 43 | + |
| 44 | + | | Basic data | |
| 45 | + |------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------| |
| 46 | + | Features | **Price/Volume**: <br> - $close/$open/$low/$high/$volume/$change/$factor | |
| 47 | + | Calendar | **\<freq>.txt**: <br> - day.txt<br> - 1min.txt | |
| 48 | + | Instruments | **\<market>.txt**: <br> - required: **all.txt**; <br> - csi300.txt/csi500.txt/sp500.txt | |
| 49 | + |
| 50 | + - `Features`: data values must be **numeric** |
| 51 | + - if prices are not **adjusted**, set **factor=1** |
| 52 | + |
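
The `factor` rule can be illustrated with a short sketch. The function name `fill_factor` and the representation of rows as dictionaries are assumptions made for illustration; they are not part of the collector scripts.

```python
def fill_factor(rows, adjusted):
    """Return copies of rows that all carry a 'factor' field.

    Assumption: unadjusted prices get factor 1, as the note above states;
    adjusted rows are expected to already provide their own factor.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not modified
        if not adjusted:
            new_row["factor"] = 1.0
        elif "factor" not in new_row:
            raise ValueError("adjusted data must provide a factor")
        result.append(new_row)
    return result
```

For example, `fill_factor([{"close": 10.0}], adjusted=False)` adds `factor` to the row with value `1.0` and leaves the input list unchanged.
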
| 53 | +### Data-dependent components |
| 54 | + |
| 55 | +> Each component needs the data listed below in order to run correctly |
| 56 | + |
| 57 | + | Component | Required data |
| 58 | + |---------------------------------------------------|--------------------------------|
| 59 | + | Data retrieval | Features, Calendar, Instruments |
| 60 | + | Backtest | **Features[Price/Volume]**, Calendar, Instruments | |
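
The dependency table can also be expressed as a small lookup, which is handy for checking a dataset before running a component. This is a sketch under assumptions: `REQUIRED_DATA` and `missing_data` are hypothetical names, and the `Price/Volume` qualifier on Backtest's features is simplified to plain `Features`.

```python
# Required data per component, transcribed from the table above.
REQUIRED_DATA = {
    "Data retrieval": {"Features", "Calendar", "Instruments"},
    "Backtest": {"Features", "Calendar", "Instruments"},
}


def missing_data(component, available):
    """Return the sorted list of data kinds the component still needs."""
    return sorted(REQUIRED_DATA[component] - set(available))
```

For example, `missing_data("Backtest", ["Features"])` reports that `Calendar` and `Instruments` are still missing.
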