Merged
Changes from 1 commit
82 commits
49541c6
Init todo
you-n-g Jul 17, 2024
f3b097e
update all code
WinstonLiyt Jul 18, 2024
61dc8ce
update
WinstonLiyt Jul 18, 2024
6481278
Extract factors from financial reports loop finished
WinstonLiyt Jul 19, 2024
16e3b3a
Merge branch 'main' of https://github.com/microsoft/RD-Agent into fix…
WinstonLiyt Jul 19, 2024
f2d031e
Merge branch 'main' of https://github.com/microsoft/RD-Agent into fix…
WinstonLiyt Jul 19, 2024
ce30c04
Fix two small bugs.
WinstonLiyt Jul 19, 2024
aa2ffac
Delete rdagent/app/qlib_rd_loop/run_script.sh
WinstonLiyt Jul 19, 2024
cecb4c5
Minor mod
you-n-g Jul 19, 2024
61d352f
Delete rdagent/app/qlib_rd_loop/nohup.out
you-n-g Jul 19, 2024
367a1ce
Fix a small bug in file reading.
WinstonLiyt Jul 22, 2024
7887905
some updates
WinstonLiyt Jul 22, 2024
d5f36d9
Update the detailed process and prompt of factor loop.
WinstonLiyt Jul 22, 2024
b4594ef
Merge branch 'main' into fix_some_errors_when_debug_factor
WinstonLiyt Jul 22, 2024
aa4c7e5
Evaluation & dataset
taozhiwang Jul 23, 2024
6d022b8
Optimize the prompt for generating hypotheses and feedback in the fac…
WinstonLiyt Jul 23, 2024
c51a6f0
Generate new data
taozhiwang Jul 23, 2024
90bd7e3
dataset generation
taozhiwang Jul 24, 2024
4fd9733
Performed further optimizations on the factor loop and report extract…
WinstonLiyt Jul 24, 2024
1da2635
Merge branch 'main' into fix_some_errors_when_debug_factor
WinstonLiyt Jul 24, 2024
1d66f16
Update rdagent/components/coder/factor_coder/CoSTEER/evaluators.py
you-n-g Jul 24, 2024
b1bdfdd
Update package.txt for fitz.
WinstonLiyt Jul 24, 2024
50a8ff0
Merge branch 'fix_some_errors_when_debug_factor' of https://github.co…
WinstonLiyt Jul 24, 2024
864f5a0
add the result
taozhiwang Jul 24, 2024
048c6fe
Performed further optimizations on the factor loop and report extract…
WinstonLiyt Jul 24, 2024
f9b57b9
Analysis
taozhiwang Jul 24, 2024
b9d9194
Optimized log output.
WinstonLiyt Jul 24, 2024
9218e5f
Merge branch 'fix_some_errors_when_debug_factor' of https://github.co…
WinstonLiyt Jul 24, 2024
ec5cc64
Merge branch 'fix_some_errors_when_debug_factor' into main
WinstonLiyt Jul 24, 2024
db82b67
Factor update
taozhiwang Jul 24, 2024
dcb7e07
Optimized log output.
WinstonLiyt Jul 24, 2024
265b6b3
A draft of the "Quick Start" section for README
WinstonLiyt Jul 24, 2024
39282eb
Merge branch 'main' of https://github.com/microsoft/RD-Agent into doc…
WinstonLiyt Jul 24, 2024
68f0a75
Add scenario descriptions.
WinstonLiyt Jul 24, 2024
52dc938
Updates
taozhiwang Jul 25, 2024
11980dc
Adjust content
you-n-g Jul 25, 2024
12c0eba
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 25, 2024
c9809f2
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 25, 2024
98906af
Enable logging of backtesting in Qlib and store rich-text description…
WinstonLiyt Jul 25, 2024
b97f24f
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 25, 2024
b7a04c2
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 25, 2024
702c830
Reformat analysis.py
taozhiwang Jul 25, 2024
ac80c93
CI fix
taozhiwang Jul 25, 2024
eb1c04e
Refactor
you-n-g Jul 25, 2024
f9295e0
remove useless code
you-n-g Jul 25, 2024
cab4f46
Merge branch 'benchmark'
taozhiwang Jul 25, 2024
d2770c6
fix bugs (#111)
SH-Src Jul 25, 2024
f4d553a
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 25, 2024
22b176b
Fix two small bugs.
WinstonLiyt Jul 25, 2024
26f2f74
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 25, 2024
f44e4ae
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 25, 2024
fb1478e
Fix a merge bug.
WinstonLiyt Jul 25, 2024
09e2d88
Fix two small bugs.
WinstonLiyt Jul 26, 2024
33b70e2
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 26, 2024
cf568a5
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 26, 2024
05869ce
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 26, 2024
9c64f14
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 26, 2024
787450c
fix some bugs.
WinstonLiyt Jul 29, 2024
b36e1cf
Fix some format bugs.
WinstonLiyt Jul 29, 2024
3e42a7b
Restore a file.
WinstonLiyt Jul 29, 2024
87dba2d
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 29, 2024
fb28226
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 29, 2024
ad7d18d
Fix a format bug.
WinstonLiyt Jul 29, 2024
9384937
draft renew of evaluators
WinstonLiyt Jul 30, 2024
557f3a7
fix a small bug.
WinstonLiyt Jul 30, 2024
a06d7f4
fix a small bug
WinstonLiyt Jul 30, 2024
13df05e
Support Factor Report Loop
you-n-g Jul 30, 2024
0e7a90f
Update framework for extracting factors from research reports.
WinstonLiyt Jul 30, 2024
5860055
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Jul 30, 2024
5f5675a
Merge branch 'main' into docs_and_demo
WinstonLiyt Jul 30, 2024
2a07947
Refactor report-based factor extraction and fix minor bugs.
WinstonLiyt Aug 1, 2024
f591636
fix a small bug of log.
WinstonLiyt Aug 1, 2024
4bdb1de
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Aug 1, 2024
4f743b2
Merge branch 'main' into docs_and_demo
WinstonLiyt Aug 1, 2024
34f335a
change some prompts
WinstonLiyt Aug 1, 2024
6dc4369
Merge branch 'main' of https://github.com/microsoft/RD-Agent into main
WinstonLiyt Aug 1, 2024
a8cb022
Merge branch 'main' into docs_and_demo
WinstonLiyt Aug 1, 2024
f7046b0
improve factor_runner
WinstonLiyt Aug 2, 2024
ea5e114
fix a small bug
WinstonLiyt Aug 2, 2024
2ef60e9
change some prompts
WinstonLiyt Aug 2, 2024
6fd15a7
cancel some comments
WinstonLiyt Aug 2, 2024
a883a78
cancel some comments and fix some bugs
WinstonLiyt Aug 2, 2024
Extract factors from financial reports loop finished
WinstonLiyt committed Jul 19, 2024
commit 64812781805699b58aa9239036f115e6b1fe0ee6
4 changes: 4 additions & 0 deletions rdagent/app/qlib_rd_loop/conf.py
Original file line number Diff line number Diff line change
@@ -30,5 +30,9 @@ class Config:
    py_bin: str = "/usr/bin/python"
    local_qlib_folder: Path = Path("/home/rdagent/qlib")

    origin_report_path: str = "data/report_origin"
    local_report_path: str = "data/report"
    report_result_json_file_path: str = "git_ignore_folder/res_dict.json"


PROP_SETTING = PropSetting()
122 changes: 122 additions & 0 deletions rdagent/app/qlib_rd_loop/factor_from_report.py
@@ -0,0 +1,122 @@
import json
from pathlib import Path
import pickle
from dotenv import load_dotenv
from jinja2 import Environment, StrictUndefined
import pandas as pd

from rdagent.app.qlib_rd_loop.conf import PROP_SETTING
from rdagent.components.document_reader.document_reader import load_and_process_pdfs_by_langchain
from rdagent.core.prompts import Prompts
from rdagent.core.scenario import Scenario
from rdagent.core.utils import import_class
from rdagent.log import rdagent_logger as logger
from rdagent.oai.llm_utils import APIBackend
from rdagent.scenarios.qlib.developer.factor_coder import QlibFactorCoSTEER
from rdagent.scenarios.qlib.experiment.factor_experiment import QlibFactorScenario, QlibFactorExperiment
from rdagent.scenarios.qlib.factor_experiment_loader.pdf_loader import (
    FactorExperimentLoaderFromPDFfiles,
    classify_report_from_dict,
)

from rdagent.core.proposal import (
    Hypothesis2Experiment,
    HypothesisExperiment2Feedback,
    HypothesisGen,
    Hypothesis,
    Trace,
)

from rdagent.core.exception import FactorEmptyException
from rdagent.core.developer import Developer

assert load_dotenv()

scen: Scenario = import_class(PROP_SETTING.factor_scen)()

hypothesis_gen: HypothesisGen = import_class(PROP_SETTING.factor_hypothesis_gen)(scen)

hypothesis2experiment: Hypothesis2Experiment = import_class(PROP_SETTING.factor_hypothesis2experiment)()

qlib_factor_coder: Developer = import_class(PROP_SETTING.factor_coder)(scen)

qlib_factor_runner: Developer = import_class(PROP_SETTING.factor_runner)(scen)

qlib_factor_summarizer: HypothesisExperiment2Feedback = import_class(PROP_SETTING.factor_summarizer)(scen)

with open(PROP_SETTING.report_result_json_file_path, 'r') as f:
    judge_pdf_data = json.load(f)

prompts_path = Path(__file__).parent / "prompts.yaml"
prompts = Prompts(file_path=prompts_path)

def generate_hypothesis(factor_result: dict, report_content: str) -> Hypothesis:
    # StrictUndefined makes rendering fail loudly if a template variable is missing.
    system_prompt = Environment(undefined=StrictUndefined).from_string(prompts["hypothesis_generation"]["system"]).render()
    user_prompt = Environment(undefined=StrictUndefined).from_string(prompts["hypothesis_generation"]["user"]).render(
        factor_descriptions=json.dumps(factor_result),
        report_content=report_content
    )

    response = APIBackend().build_messages_and_create_chat_completion(
        user_prompt=user_prompt,
        system_prompt=system_prompt,
        json_mode=True,
    )

    # Fall back to placeholder text when the model omits a key.
    response_json = json.loads(response)
    hypothesis_text = response_json.get("hypothesis", "No hypothesis generated.")
    reason_text = response_json.get("reason", "No reason provided.")

    return Hypothesis(hypothesis=hypothesis_text, reason=reason_text)

def extract_factors_and_implement(report_file_path: str) -> tuple:
    scenario = QlibFactorScenario()

    with logger.tag("extract_factors_and_implement"):
        with logger.tag("load_factor_tasks"):

            exp = FactorExperimentLoaderFromPDFfiles().load(report_file_path)
            if exp is None or exp.sub_tasks == []:
                return None, None

            docs_dict = load_and_process_pdfs_by_langchain(Path(report_file_path))

            factor_result = {
                task.factor_name: {
                    "description": task.factor_description,
                    "formulation": task.factor_formulation,
                    "variables": task.variables,
                    "resources": task.factor_resources
                }
                for task in exp.sub_tasks
            }

            report_content = "\n".join(docs_dict.values())
            hypothesis = generate_hypothesis(factor_result, report_content)

    return exp, hypothesis

trace = Trace(scen=scen)

for file_path, attributes in judge_pdf_data.items():
    if attributes["class"] == 1:
        report_file_path = Path(file_path.replace(PROP_SETTING.origin_report_path, PROP_SETTING.local_report_path))
        if report_file_path.exists():
            logger.info(f"Processing {report_file_path}")
            exp, hypothesis = extract_factors_and_implement(str(report_file_path))
            if exp is None:
                continue
            exp.based_experiments = [t[1] for t in trace.hist if t[2]]
            if len(exp.based_experiments) == 0:
                exp.based_experiments.append(QlibFactorExperiment(sub_tasks=[]))
            exp = qlib_factor_coder.develop(exp)
            exp = qlib_factor_runner.develop(exp)
            if exp is None:
                logger.error(f"Factor extraction failed for {report_file_path}. Skipping to the next report.")
                continue
            feedback = qlib_factor_summarizer.generateFeedback(exp, hypothesis, trace)

            trace.hist.append((hypothesis, exp, feedback))
            logger.info(f"Processed {report_file_path}: Result: {exp}")
        else:
            logger.error(f"File not found: {report_file_path}")
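The selection step at the top of the loop above (keep entries whose `class` is 1, then rewrite the origin path prefix to the local one) can be sketched in isolation. This is a minimal illustration, not the project's code: the helper name `select_local_reports` and the sample dictionary are assumptions, and only the standard library is used.

```python
from pathlib import Path


def select_local_reports(judge_data: dict, origin_root: str, local_root: str) -> list[Path]:
    """Return local paths for every entry whose "class" attribute equals 1."""
    selected = []
    for file_path, attributes in judge_data.items():
        if attributes.get("class") != 1:
            continue  # skip entries whose class is not 1
        # Same plain string substitution the loop uses to map origin paths to local ones.
        selected.append(Path(file_path.replace(origin_root, local_root)))
    return selected


# Hypothetical classification results, shaped like the mapping the loop iterates over.
judge_data = {
    "data/report_origin/a.pdf": {"class": 1},
    "data/report_origin/b.pdf": {"class": 0},
}
reports = select_local_reports(judge_data, "data/report_origin", "data/report")
print([p.as_posix() for p in reports])
```

Separating selection from processing like this would also make it possible to check which files are missing before any LLM call is made.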
147 changes: 147 additions & 0 deletions rdagent/app/qlib_rd_loop/factor_from_report_sh.py
@@ -0,0 +1,147 @@
import json
from pathlib import Path
import pickle
from dotenv import load_dotenv
from jinja2 import Environment, StrictUndefined
import pandas as pd

from rdagent.app.qlib_rd_loop.conf import PROP_SETTING
from rdagent.components.document_reader.document_reader import load_and_process_pdfs_by_langchain
from rdagent.core.prompts import Prompts
from rdagent.core.scenario import Scenario
from rdagent.core.utils import import_class
from rdagent.log import rdagent_logger as logger
from rdagent.oai.llm_utils import APIBackend
from rdagent.scenarios.qlib.developer.factor_coder import QlibFactorCoSTEER
from rdagent.scenarios.qlib.experiment.factor_experiment import QlibFactorScenario, QlibFactorExperiment
from rdagent.scenarios.qlib.factor_experiment_loader.pdf_loader import (
    FactorExperimentLoaderFromPDFfiles,
    classify_report_from_dict,
)

from rdagent.core.proposal import (
    Hypothesis2Experiment,
    HypothesisExperiment2Feedback,
    HypothesisGen,
    Hypothesis,
    Trace,
)

from rdagent.core.exception import FactorEmptyException
from rdagent.core.developer import Developer

assert load_dotenv()

scen: Scenario = import_class(PROP_SETTING.factor_scen)()

hypothesis_gen: HypothesisGen = import_class(PROP_SETTING.factor_hypothesis_gen)(scen)

hypothesis2experiment: Hypothesis2Experiment = import_class(PROP_SETTING.factor_hypothesis2experiment)()

qlib_factor_coder: Developer = import_class(PROP_SETTING.factor_coder)(scen)

qlib_factor_runner: Developer = import_class(PROP_SETTING.factor_runner)(scen)

qlib_factor_summarizer: HypothesisExperiment2Feedback = import_class(PROP_SETTING.factor_summarizer)(scen)

json_file_path = "/home/finco/v-yuanteli/RD-Agent/git_ignore_folder/res_dict.json"
with open(json_file_path, 'r') as f:
    judge_pdf_data = json.load(f)

prompts_path = Path(__file__).parent / "prompts.yaml"
prompts = Prompts(file_path=prompts_path)

progress_file = "/home/finco/v-yuanteli/RD-Agent/git_ignore_folder/progress.pkl"

def save_progress(trace, current_index):
    with open(progress_file, "wb") as f:
        pickle.dump((trace, current_index), f)

def load_progress():
    if Path(progress_file).exists():
        with open(progress_file, "rb") as f:
            return pickle.load(f)
    return Trace(scen=scen), 0

def generate_hypothesis(factor_result: dict, report_content: str) -> Hypothesis:
    # StrictUndefined makes rendering fail loudly if a template variable is missing.
    system_prompt = Environment(undefined=StrictUndefined).from_string(prompts["hypothesis_generation"]["system"]).render()
    user_prompt = Environment(undefined=StrictUndefined).from_string(prompts["hypothesis_generation"]["user"]).render(
        factor_descriptions=json.dumps(factor_result),
        report_content=report_content
    )

    response = APIBackend().build_messages_and_create_chat_completion(
        user_prompt=user_prompt,
        system_prompt=system_prompt,
        json_mode=True,
    )

    # Fall back to placeholder text when the model omits a key.
    response_json = json.loads(response)
    hypothesis_text = response_json.get("hypothesis", "No hypothesis generated.")
    reason_text = response_json.get("reason", "No reason provided.")

    return Hypothesis(hypothesis=hypothesis_text, reason=reason_text)

def extract_factors_and_implement(report_file_path: str) -> tuple:
    scenario = QlibFactorScenario()

    with logger.tag("extract_factors_and_implement"):
        with logger.tag("load_factor_tasks"):

            exp = FactorExperimentLoaderFromPDFfiles().load(report_file_path)
            if exp is None or exp.sub_tasks == []:
                return None, None

            docs_dict = load_and_process_pdfs_by_langchain(Path(report_file_path))

            factor_result = {
                task.factor_name: {
                    "description": task.factor_description,
                    "formulation": task.factor_formulation,
                    "variables": task.variables,
                    "resources": task.factor_resources
                }
                for task in exp.sub_tasks
            }

            report_content = "\n".join(docs_dict.values())
            hypothesis = generate_hypothesis(factor_result, report_content)

    return exp, hypothesis

trace, start_index = load_progress()

try:
    judge_pdf_data_items = list(judge_pdf_data.items())
    for index in range(start_index, len(judge_pdf_data_items)):
        if index > 1000:
            break
        file_path, attributes = judge_pdf_data_items[index]
        if attributes["class"] == 1:
            report_file_path = Path(file_path.replace("/data/home/xiaoyang/data/ftp/amc_origin_file/report", "/home/finco/data/report"))
            if report_file_path.exists():
                print(f"Processing {report_file_path}")
                exp, hypothesis = extract_factors_and_implement(str(report_file_path))
                if exp is None:
                    continue
                exp.based_experiments = [t[1] for t in trace.hist if t[2]]
                if len(exp.based_experiments) == 0:
                    exp.based_experiments.append(QlibFactorExperiment(sub_tasks=[]))
                exp = qlib_factor_coder.develop(exp)
                exp = qlib_factor_runner.develop(exp)
                if exp is None:
                    logger.error(f"Factor extraction failed for {report_file_path}. Skipping to the next report.")
                    continue
                feedback = qlib_factor_summarizer.generateFeedback(exp, hypothesis, trace)

                trace.hist.append((hypothesis, exp, feedback))
                print(f"Processed {report_file_path}: Result: {exp}")

                # Save progress after processing each report
                save_progress(trace, index + 1)
            else:
                print(f"File not found: {report_file_path}")
except Exception as e:
    logger.error(f"An error occurred: {e}")
    save_progress(trace, index)
    raise
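The checkpoint-and-resume pattern used by `save_progress` and `load_progress` above can be sketched on its own. The version below is a hedged variant rather than the committed code: it writes to a temporary file and swaps it into place with `os.replace`, so that a crash in the middle of a write should not leave a truncated pickle behind. The function signatures and the sample state are assumptions made for illustration.

```python
import os
import pickle
import tempfile
from pathlib import Path


def save_progress(progress_file: Path, state: object, next_index: int) -> None:
    """Write the checkpoint to a temp file, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=progress_file.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump((state, next_index), f)
    # os.replace is atomic on POSIX when both paths are on the same filesystem.
    os.replace(tmp_name, progress_file)


def load_progress(progress_file: Path, default_state: object) -> tuple:
    """Return (state, next_index), or (default_state, 0) if no checkpoint exists."""
    if progress_file.exists():
        with open(progress_file, "rb") as f:
            return pickle.load(f)
    return default_state, 0


with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt = Path(tmp_dir) / "progress.pkl"
    first = load_progress(ckpt, default_state=[])      # no checkpoint yet
    save_progress(ckpt, ["report_0"], 1)               # pretend one report was processed
    resumed = load_progress(ckpt, default_state=[])    # restart picks up at index 1
```

Passing the checkpoint path as a parameter, instead of reading a module-level constant, also avoids hard-coding a user-specific home directory.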
15 changes: 15 additions & 0 deletions rdagent/app/qlib_rd_loop/prompts.yaml
@@ -0,0 +1,15 @@
hypothesis_generation:
  system: |-
    You are an expert in financial analysis. Your task is to generate a well-reasoned hypothesis based on the provided financial factors and report content.
    Please ensure your response is in JSON format as shown below:
    {
      "hypothesis": "A clear and concise hypothesis based on the provided information.",
      "reason": "A detailed explanation supporting the generated hypothesis."
    }

  user: |-
    The following are the financial factors and their descriptions:
    {{ factor_descriptions }}

    The report content is as follows:
    {{ report_content }}
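The system prompt asks the model for a JSON object with `hypothesis` and `reason` keys. A minimal sketch of consuming such a reply defensively is shown here; the helper name `parse_hypothesis_response` is hypothetical, and the fallback to defaults on malformed JSON is an added assumption (the script in this commit calls `json.loads` directly and would raise instead).

```python
import json


def parse_hypothesis_response(raw: str) -> tuple[str, str]:
    """Extract (hypothesis, reason) from a JSON reply, using defaults for anything missing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}  # assumption: treat malformed JSON like an empty reply
    if not isinstance(data, dict):
        data = {}  # e.g. the reply was a bare JSON list or string
    return (
        data.get("hypothesis", "No hypothesis generated."),
        data.get("reason", "No reason provided."),
    )


complete = parse_hypothesis_response('{"hypothesis": "H", "reason": "R"}')
partial = parse_hypothesis_response('{"hypothesis": "H"}')
broken = parse_hypothesis_response("not json at all")
```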
24 changes: 24 additions & 0 deletions rdagent/app/qlib_rd_loop/run_script.sh
@@ -0,0 +1,24 @@
#!/bin/bash

max_retries=1000
count=0
log_file="/home/finco/v-yuanteli/RD-Agent/rdagent/app/qlib_rd_loop/run_script.log"

while [ "$count" -lt "$max_retries" ]; do
    echo "$(date) - Attempt $count of $max_retries" >> "$log_file"
    /home/finco/anaconda3/envs/rdagent/bin/python /home/finco/v-yuanteli/RD-Agent/rdagent/app/qlib_rd_loop/factor_from_report_sh.py >> "$log_file" 2>&1
    if [ $? -eq 0 ]; then
        echo "$(date) - Script completed successfully on attempt $count" >> "$log_file"
        break
    fi
    count=$((count + 1))
    echo "$(date) - Restarting script after crash... Attempt $count of $max_retries" >> "$log_file"
done

if [ "$count" -ge "$max_retries" ]; then
    echo "$(date) - Script failed after $max_retries attempts." >> "$log_file"
else
    echo "$(date) - Script completed successfully." >> "$log_file"
fi

# chmod +x /home/finco/v-yuanteli/RD-Agent/rdagent/app/qlib_rd_loop/run_script.sh
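The restart loop in the script above reduces to "rerun until the exit code is 0 or the retry budget is spent". A small Python sketch of that control flow follows; `run_with_retries` and the fake job are hypothetical names used only to make the logic testable without launching a process.

```python
from typing import Callable


def run_with_retries(run_once: Callable[[], int], max_retries: int) -> int:
    """Call run_once until it returns exit code 0; return the number of failed attempts."""
    failures = 0
    while failures < max_retries:
        if run_once() == 0:
            break  # success: stop retrying
        failures += 1
    return failures


# Hypothetical job that crashes twice, then succeeds.
exit_codes = iter([1, 1, 0])
failures = run_with_retries(lambda: next(exit_codes), max_retries=1000)
print(failures)
```

To drive a real command, `run_once` could be something like `lambda: subprocess.run(cmd).returncode`, where `cmd` is the argument list for the interpreter and script.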
@@ -71,6 +71,14 @@ def evolve(
        implementation_factors_per_round = int(
            FACTOR_IMPLEMENT_SETTINGS.select_ratio * len(to_be_finished_task_index)
        )

        # Ensure at least one task is selected
        if implementation_factors_per_round == 0:
            implementation_factors_per_round = 1

        if implementation_factors_per_round > len(to_be_finished_task_index):
            implementation_factors_per_round = len(to_be_finished_task_index)

        if FACTOR_IMPLEMENT_SETTINGS.select_method == "random":
            to_be_finished_task_index = RandomSelect(
                to_be_finished_task_index,
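The two guards added in this hunk clamp the per-round count to the range from 1 to the number of pending tasks (and to 0 when nothing is pending). A standalone sketch of the same arithmetic, with the hypothetical name `tasks_per_round` standing in for the inline logic:

```python
def tasks_per_round(select_ratio: float, n_pending: int) -> int:
    """Mirror the hunk's arithmetic: truncate, raise 0 to 1, then cap at n_pending."""
    count = int(select_ratio * n_pending)
    if count == 0:
        count = 1  # ensure at least one task is selected
    if count > n_pending:
        count = n_pending  # never select more tasks than are pending
    return count


print(tasks_per_round(0.1, 3))   # ratio too small to select anything before the fix
print(tasks_per_round(0.5, 10))
print(tasks_per_round(0.5, 0))   # nothing pending, so the cap brings it back to 0
```

The order of the guards matters: applying the cap last is what keeps an empty pending list from yielding a count of 1.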
3 changes: 2 additions & 1 deletion rdagent/components/coder/factor_coder/factor.py
@@ -109,7 +109,8 @@ def execute(self, store_result: bool = False, data_type: str = "Debug") -> Tuple
                 raise CodeFormatException(self.FB_CODE_NOT_SET)
             else:
                 # TODO: to make the interface compatible with previous code. I kept the original behavior.
-                raise ValueError(self.FB_CODE_NOT_SET)
+                # raise ValueError(self.FB_CODE_NOT_SET)
+                return self.FB_CODE_NOT_SET, None
         with FileLock(self.workspace_path / "execution.lock"):
             if FACTOR_IMPLEMENT_SETTINGS.enable_execution_cache:
                 # NOTE: cache the result for the same code and same data type
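With this change `execute` reports a missing implementation through its return value, a feedback string paired with `None`, instead of raising `ValueError`. A caller then has to check the second element before using it. The sketch below illustrates that calling convention only; the stand-in `execute`, the feedback text, and the list used in place of a real result are all assumptions.

```python
from typing import Optional, Tuple

FB_CODE_NOT_SET = "code is not set."  # hypothetical feedback text


def execute(code: Optional[str]) -> Tuple[str, Optional[list]]:
    """Stand-in for the real method: return (feedback, result) instead of raising."""
    if code is None:
        return FB_CODE_NOT_SET, None
    return "Execution succeeded.", [1.0, 2.0]


feedback, result = execute(None)
if result is None:
    status = f"skipped: {feedback}"  # no result to use, only the feedback message
else:
    status = f"ok: {len(result)} values"
print(status)
```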
4 changes: 3 additions & 1 deletion rdagent/scenarios/qlib/developer/factor_runner.py
@@ -60,7 +60,9 @@ def develop(self, exp: QlibFactorExperiment) -> QlibFactorExperiment:
         new_factors = self.process_factor_data(exp)
 
         if new_factors.empty:
-            raise FactorEmptyException("No valid factor data found to merge.")
+            # raise FactorEmptyException("No valid factor data found to merge.")
+            logger.error("No valid factor data found to merge.")
+            return None
 
         # Combine the SOTA factor and new factors if SOTA factor exists
         if SOTA_factor is not None and not SOTA_factor.empty:
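Because `develop` can now return `None` when no valid factor data is found, its effective return type is optional even though the annotation in the hunk header still names `QlibFactorExperiment`, and every caller needs a `None` check before continuing. A minimal sketch of that contract, using plain lists in place of experiments (all names here are illustrative assumptions):

```python
from typing import Optional


def develop(factor_rows: list) -> Optional[list]:
    """Stand-in runner: return None instead of raising when there is nothing to merge."""
    if not factor_rows:
        return None
    return sorted(factor_rows)


processed = []
for name, rows in [("report_a", [3, 1]), ("report_b", [])]:
    result = develop(rows)
    if result is None:
        continue  # skip this report and move on to the next one
    processed.append((name, result))
print(processed)
```

Annotating the return type as `Optional[...]` lets a type checker flag callers that forget the check.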