Merged
Changes from 1 commit
Commits
82 commits
49541c6
Init todo
you-n-g Jul 17, 2024
f3b097e
update all code
WinstonLiyt Jul 18, 2024
61dc8ce
update
WinstonLiyt Jul 18, 2024
6481278
Extract factors from financial reports loop finished
WinstonLiyt Jul 19, 2024
16e3b3a
Merge branch 'main' of https://github.com/microsoft/RD-Agent into fix…
WinstonLiyt Jul 19, 2024
f2d031e
Merge branch 'main' of https://github.com/microsoft/RD-Agent into fix…
WinstonLiyt Jul 19, 2024
ce30c04
Fix two small bugs.
WinstonLiyt Jul 19, 2024
aa2ffac
Delete rdagent/app/qlib_rd_loop/run_script.sh
WinstonLiyt Jul 19, 2024
cecb4c5
Minor mod
you-n-g Jul 19, 2024
61d352f
Delete rdagent/app/qlib_rd_loop/nohup.out
you-n-g Jul 19, 2024
367a1ce
Fix a small bug in file reading.
WinstonLiyt Jul 22, 2024
7887905
some updates
WinstonLiyt Jul 22, 2024
d5f36d9
Update the detailed process and prompt of factor loop.
WinstonLiyt Jul 22, 2024
b4594ef
Merge branch 'main' into fix_some_errors_when_debug_factor
WinstonLiyt Jul 22, 2024
aa4c7e5
Evaluation & dataset
taozhiwang Jul 23, 2024
6d022b8
Optimize the prompt for generating hypotheses and feedback in the fac…
WinstonLiyt Jul 23, 2024
c51a6f0
Generate new data
taozhiwang Jul 23, 2024
90bd7e3
dataset generation
taozhiwang Jul 24, 2024
4fd9733
Performed further optimizations on the factor loop and report extract…
WinstonLiyt Jul 24, 2024
1da2635
Merge branch 'main' into fix_some_errors_when_debug_factor
WinstonLiyt Jul 24, 2024
1d66f16
Update rdagent/components/coder/factor_coder/CoSTEER/evaluators.py
you-n-g Jul 24, 2024
b1bdfdd
Update package.txt for fitz.
WinstonLiyt Jul 24, 2024
50a8ff0
Merge branch 'fix_some_errors_when_debug_factor' of https://github.co…
WinstonLiyt Jul 24, 2024
864f5a0
add the result
taozhiwang Jul 24, 2024
048c6fe
Performed further optimizations on the factor loop and report extract…
WinstonLiyt Jul 24, 2024
f9b57b9
Analysis
taozhiwang Jul 24, 2024
b9d9194
Optimized log output.
WinstonLiyt Jul 24, 2024
9218e5f
Merge branch 'fix_some_errors_when_debug_factor' of https://github.co…
WinstonLiyt Jul 24, 2024
ec5cc64
Merge branch 'fix_some_errors_when_debug_factor' into main
WinstonLiyt Jul 24, 2024
db82b67
Factor update
taozhiwang Jul 24, 2024
dcb7e07
Optimized log output.
WinstonLiyt Jul 24, 2024
265b6b3
A draft of the "Quick Start" section for README
WinstonLiyt Jul 24, 2024
39282eb
Merge branch 'main' of https://github.com/microsoft/RD-Agent into doc…
WinstonLiyt Jul 24, 2024
68f0a75
Add scenario descriptions.
WinstonLiyt Jul 24, 2024
52dc938
Updates
taozhiwang Jul 25, 2024
11980dc
Adjust content
you-n-g Jul 25, 2024
12c0eba
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 25, 2024
c9809f2
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 25, 2024
98906af
Enable logging of backtesting in Qlib and store rich-text description…
WinstonLiyt Jul 25, 2024
b97f24f
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 25, 2024
b7a04c2
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 25, 2024
702c830
Reformat analysis.py
taozhiwang Jul 25, 2024
ac80c93
CI fix
taozhiwang Jul 25, 2024
eb1c04e
Refactor
you-n-g Jul 25, 2024
f9295e0
remove useless code
you-n-g Jul 25, 2024
cab4f46
Merge branch 'benchmark'
taozhiwang Jul 25, 2024
d2770c6
fix bugs (#111)
SH-Src Jul 25, 2024
f4d553a
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 25, 2024
22b176b
Fix two small bugs.
WinstonLiyt Jul 25, 2024
26f2f74
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 25, 2024
f44e4ae
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 25, 2024
fb1478e
Fix a merge bug.
WinstonLiyt Jul 25, 2024
09e2d88
Fix two small bugs.
WinstonLiyt Jul 26, 2024
33b70e2
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 26, 2024
cf568a5
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 26, 2024
05869ce
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 26, 2024
9c64f14
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 26, 2024
787450c
fix some bugs.
WinstonLiyt Jul 29, 2024
b36e1cf
Fix some format bugs.
WinstonLiyt Jul 29, 2024
3e42a7b
Restore a file.
WinstonLiyt Jul 29, 2024
87dba2d
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 29, 2024
fb28226
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 29, 2024
ad7d18d
Fix a format bug.
WinstonLiyt Jul 29, 2024
9384937
draft renew of evaluators
WinstonLiyt Jul 30, 2024
557f3a7
fix a small bug.
WinstonLiyt Jul 30, 2024
a06d7f4
fix a small bug
WinstonLiyt Jul 30, 2024
13df05e
Support Factor Report Loop
you-n-g Jul 30, 2024
0e7a90f
Update framework for extracting factors from research reports.
WinstonLiyt Jul 30, 2024
5860055
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 30, 2024
5f5675a
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 30, 2024
2a07947
Refactor report-based factor extraction and fix minor bugs.
WinstonLiyt Aug 1, 2024
f591636
fix a small bug of log.
WinstonLiyt Aug 1, 2024
4bdb1de
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Aug 1, 2024
4f743b2
Merge branch 'main' into docs_and_demo
WinstonLiyt Aug 1, 2024
34f335a
change some prompts
WinstonLiyt Aug 1, 2024
6dc4369
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Aug 1, 2024
a8cb022
Merge branch 'main' into docs_and_demo
WinstonLiyt Aug 1, 2024
f7046b0
improve factor_runner
WinstonLiyt Aug 2, 2024
ea5e114
fix a small bug
WinstonLiyt Aug 2, 2024
2ef60e9
change some prompts
WinstonLiyt Aug 2, 2024
6fd15a7
cancel some comments
WinstonLiyt Aug 2, 2024
a883a78
cancel some comments and fix some bugs
WinstonLiyt Aug 2, 2024
Generate new data
taozhiwang committed Jul 23, 2024
commit c51a6f08a2b73f547697940c44022b4b5567b769
63 changes: 63 additions & 0 deletions rdagent/components/benchmark/analysis.py
@@ -0,0 +1,63 @@
import pickle
import os
import pandas as pd
import matplotlib.pyplot as plt

# Function to load and process each pickle file
def process_pickle_file(file_path):
    try:
        with open(file_path, 'rb') as file:
            data = pickle.load(file)
            # Assuming data is a DataFrame or similar
            print(f"Data from {file_path} processed successfully.")
            return data
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return None

def analysis(folder_path):
    success_count = 0
    fail_count = 0

    # Keep each loaded object paired with its path so that failures are
    # logged against the file they came from, not the last file visited
    results = []

    # Processing each file in the directory
    for file_name in os.listdir(folder_path):
        file_path = os.path.join(folder_path, file_name)
        data = process_pickle_file(file_path)
        if data is not None:
            results.append((file_path, data))

    # Logging the errors; the context manager closes the log on exit
    with open("error_log.log", "w") as error_log:
        for file_path, df in results:
            # df[0] is expected to hold the execution feedback text
            if 'Execution succeeded' in df[0]:
                success_count += 1
            else:
                fail_count += 1
                error_log.write(f"{file_path}: \n{df[0]}\n")

    # Writing summary
    print(f"Number of successful files: {success_count}")
    print(f"Number of failed files: {fail_count}")

def view_pickle_file(folder_path):
    for file_name in os.listdir(folder_path):
        file_path = os.path.join(folder_path, file_name)

        print(f'the path of this file is: {file_path}\n')
        with open(file_path, 'rb') as file:
            data = pickle.load(file)
            for i in range(len(data)):
                print(data[i])


if __name__ == '__main__':
    folder_path = '/data/userdata/v-taozhiwang/RD-Agent/git_ignore_folder/factor_implementation_execution_cache'

    analysis(folder_path)
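As a reading aid for the script above, here is a minimal, self-contained sketch of the same tallying idea, run against a temporary folder rather than the hard-coded cache path. It assumes, as the script appears to, that each pickle holds an indexable object whose first element is the execution feedback text. The helper name `tally_results`, the `demo` function, and the sample payloads are hypothetical and not part of this commit. Unlike the script, the sketch counts unreadable files as failures.

```python
import os
import pickle
import tempfile


def tally_results(folder_path):
    """Count pickles whose first element reports success; collect the rest.

    Hypothetical helper: assumes each pickle holds an indexable object whose
    first element is the execution feedback string.
    """
    successes = 0
    failures = []  # (file_path, feedback) pairs, one per failed file
    for file_name in sorted(os.listdir(folder_path)):
        file_path = os.path.join(folder_path, file_name)
        try:
            with open(file_path, "rb") as fh:
                data = pickle.load(fh)
        except Exception as exc:  # unreadable or corrupt file
            failures.append((file_path, f"unreadable: {exc}"))
            continue
        if "Execution succeeded" in data[0]:
            successes += 1
        else:
            failures.append((file_path, data[0]))
    return successes, failures


def demo():
    # Build two made-up cache entries in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            "a.pkl": ("Execution succeeded without error.", None),
            "b.pkl": ("Traceback: KeyError 'close'", None),
        }
        for name, payload in samples.items():
            with open(os.path.join(tmp, name), "wb") as fh:
                pickle.dump(payload, fh)
        successes, failures = tally_results(tmp)
        # Report base names only, since the temporary path differs per run.
        return successes, [(os.path.basename(p), msg) for p, msg in failures]


if __name__ == "__main__":
    print(demo())
```

Returning the failures as `(path, feedback)` pairs keeps each message attached to its own file, which is the property the error log depends on.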
6 changes: 3 additions & 3 deletions rdagent/components/benchmark/example.json
@@ -6,7 +6,7 @@
"20-day turnover rate": "Average turnover rate over the past 20 days.",
"Market Capitalization": "Total market value of a company's outstanding shares."
},
- "gt_code": "import pandas as pd\n\ndata_f = pd.read_hdf('daily_f.h5')\n\ndata = data_f.reset_index()\nwindow_size = 20\n\nnominator=data.groupby('instrument')[['30\u65e5\u6362\u624b\u7387']].rolling(window=window_size).mean().reset_index(0, drop=True)\n# transfer to series\nnew=nominator['30\u65e5\u6362\u624b\u7387']\ndata['Turnover_Rate_Factor']=new/data['\u6d41\u901aA\u80a1']\n\n# # set the datetime and instrument as index and drop the original index\nresult=pd.DataFrame(data['Turnover_Rate_Factor']).set_index(data_f.index)\n\n# transfer the result to series\nresult=result['Turnover_Rate_Factor']\nresult.to_hdf(\"result.h5\", key=\"data\")\n"
+ "gt_code": "import pandas as pd\n\ndata_f = pd.read_hdf('daily_f.h5')\n\ndata = data_f.reset_index()\nwindow_size = 20\n\nnominator=data.groupby('instrument')[['TurnoverRate_30D']].rolling(window=window_size).mean().reset_index(0, drop=True)\n# transfer to series\nnew=nominator['TurnoverRate_30D']\ndata['Turnover_Rate_Factor']=new/data['TradableACapital']\n\n# set the datetime and instrument as index and drop the original index\nresult=pd.DataFrame(data['Turnover_Rate_Factor']).set_index(data_f.index)\n\n# transfer the result to series\nresult=result['Turnover_Rate_Factor']\nresult.to_hdf(\"result.h5\", key=\"data\")"
},
"PctTurn20": {
"description": "A factor representing the percentage change in turnover rate over the past 20 trading days, market-value neutralized.",
@@ -16,7 +16,7 @@
"Turnover_{i, t}": "Turnover of stock i at day t.",
"Turnover_{i, t-20}": "Turnover of stock i at day t-20."
},
- "gt_code": "import pandas as pd\nfrom statsmodels import api as sm\n\n\ndef fill_mean(s: pd.Series) -> pd.Series:\n return s.fillna(s.mean()).fillna(0.0)\n\n\ndef market_value_neutralize(s: pd.Series, mv: pd.Series) -> pd.Series:\n s = s.groupby(\"datetime\", group_keys=False).apply(fill_mean)\n mv = mv.groupby(\"datetime\", group_keys=False).apply(fill_mean)\n\n df_f = mv.to_frame(\"\u5e02\u503c\")\n df_f[\"const\"] = 1\n X = df_f[[\"\u5e02\u503c\", \"const\"]]\n\n # Perform the Ordinary Least Squares (OLS) regression\n model = sm.OLS(s, X)\n results = model.fit()\n\n # Calculate the residuals\n df_f[\"residual\"] = results.resid\n df_f[\"norm_resi\"] = df_f.groupby(level=\"datetime\", group_keys=False)[\"residual\"].apply(\n lambda x: (x - x.mean()) / x.std(),\n )\n return df_f[\"norm_resi\"]\n\n\n# get_turnover\ndf_pv = pd.read_hdf(\"daily_pv.h5\", key=\"data\")\ndf_f = pd.read_hdf(\"daily_f.h5\", key=\"data\")\nturnover = df_pv[\"$money\"] / df_f[\"\u6d41\u901a\u5e02\u503c\"]\n\nf = turnover.groupby(\"instrument\").pct_change(periods=20)\n\nf_neutralized = market_value_neutralize(f, df_f[\"\u6d41\u901a\u5e02\u503c\"])\n\nf_neutralized.to_hdf(\"result.h5\", key=\"data\")\n"
+ "gt_code": "import pandas as pd\nfrom statsmodels import api as sm\n\ndef fill_mean(s: pd.Series) -> pd.Series:\n return s.fillna(s.mean()).fillna(0.0)\n\ndef market_value_neutralize(s: pd.Series, mv: pd.Series) -> pd.Series:\n s = s.groupby(\"datetime\", group_keys=False).apply(fill_mean)\n mv = mv.groupby(\"datetime\", group_keys=False).apply(fill_mean)\n\n df_f = mv.to_frame(\"MarketValue\")\n df_f[\"const\"] = 1\n X = df_f[[\"MarketValue\", \"const\"]]\n\n # Perform the Ordinary Least Squares (OLS) regression\n model = sm.OLS(s, X)\n results = model.fit()\n\n # Calculate the residuals\n df_f[\"residual\"] = results.resid\n df_f[\"norm_resi\"] = df_f.groupby(level=\"datetime\", group_keys=False)[\"residual\"].apply(\n lambda x: (x - x.mean()) / x.std(),\n )\n return df_f[\"norm_resi\"]\n\n\n# get_turnover\ndf_pv = pd.read_hdf(\"daily_pv.h5\", key=\"data\")\ndf_f = pd.read_hdf(\"daily_f.h5\", key=\"data\")\nturnover = df_pv[\"$money\"] / df_f[\"TradableMarketValue\"]\n\nf = turnover.groupby(\"instrument\").pct_change(periods=20)\n\nf_neutralized = market_value_neutralize(f, df_f[\"TradableMarketValue\"])\n\nf_neutralized.to_hdf(\"result.h5\", key=\"data\")"
},
"PB_ROE": {
"description": "Constructed using the ranking difference between PB and ROE, with PB and ROE replacing original PB and ROE to obtain reconstructed factor values.",
@@ -25,6 +25,6 @@
"\\text{rank}(PB_t)": "Ranking PB on cross-section at time t.",
"\\text{rank}(ROE_t)": "Ranking single-quarter ROE on cross-section at time t."
},
- "gt_code": "#!/usr/bin/env python\n\nimport pandas as pd\n\ndata_f = pd.read_hdf('daily_f.h5')\n\ndata = data_f.reset_index()\n\n# Calculate the rank of PB and ROE\ndata['PB_rank'] = data.groupby('datetime')['B/P'].rank()\ndata['ROE_rank'] = data.groupby('datetime')['ROE'].rank()\n\n# Calculate the difference between the ranks\ndata['PB_ROE'] = data['PB_rank'] - data['ROE_rank']\n\n# set the datetime and instrument as index and drop the original index\nresult=pd.DataFrame(data['PB_ROE']).set_index(data_f.index)\n\n# transfer the result to series\nresult=result['PB_ROE']\nresult.to_hdf(\"result.h5\", key=\"data\")\n"
+ "gt_code": "#!/usr/bin/env python\n\nimport pandas as pd\n\ndata_f = pd.read_hdf('daily_f.h5')\n\ndata = data_f.reset_index()\n\n# Calculate the rank of PB and ROE\ndata['PB_rank'] = data.groupby('datetime')['B/P'].rank()\ndata['ROE_rank'] = data.groupby('datetime')['ROE'].rank()\n\n# Calculate the difference between the ranks\ndata['PB_ROE'] = data['PB_rank'] - data['ROE_rank']\n\n# set the datetime and instrument as index and drop the original index\nresult=pd.DataFrame(data['PB_ROE']).set_index(data_f.index)\n\n# transfer the result to series\nresult=result['PB_ROE']\nresult.to_hdf(\"result.h5\", key=\"data\")"
}
}
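Each `gt_code` value in this file is a whole Python script stored as an escaped JSON string, so an edit like the column renames in this diff can break the script without any visible sign in the JSON. The following is a minimal sketch of a syntax check that could be run over such a file. The sample entries, the `SAMPLE` constant, and the `check_gt_code` helper are hypothetical and not part of this commit. The check only parses the code and never executes it, so it cannot confirm that the renamed columns exist in the data.

```python
import ast
import json

# Made-up entries in the same shape as example.json (factor name -> fields).
SAMPLE = json.dumps({
    "Demo_Factor": {
        "description": "Hypothetical factor used only for this sketch.",
        "gt_code": "import pandas as pd\n\nresult = pd.Series([1.0, 2.0])\n",
    },
    "Broken_Factor": {
        "description": "Hypothetical entry with a deliberate syntax error.",
        "gt_code": "def f(:\n    pass\n",
    },
})


def check_gt_code(json_text):
    """Return {factor_name: bool} telling whether each gt_code parses as Python."""
    report = {}
    for name, entry in json.loads(json_text).items():
        try:
            ast.parse(entry["gt_code"])  # parses only; nothing is executed
            report[name] = True
        except SyntaxError:
            report[name] = False
    return report


if __name__ == "__main__":
    for name, ok in check_gt_code(SAMPLE).items():
        print(f"{name}: {'ok' if ok else 'syntax error'}")
```

To apply the same check to the real file, the text read from `example.json` could be passed to `check_gt_code` in place of `SAMPLE`, assuming every entry carries a `gt_code` key.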