
feat: add LLM-finetune scenario #1314

Merged

Jensen246 merged 423 commits into main from finetune on Mar 2, 2026
Conversation

XianBW (Collaborator) commented Dec 15, 2025

Description

Motivation and Context

How Has This Been Tested?

  • If you are adding a new feature, test it with your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

📚 Documentation preview 📚: https://RDAgent--1314.org.readthedocs.build/en/1314/

peteryang1 and others added 30 commits November 25, 2025 08:54
* feat: add iterative evolve and evaluation support with partial chain stop

* feat: add FTDataEvaluator and support multiple implement functions in finetune
…1303)

* feat:(1) support for multi layer dataset extraction (2) add category.json for dataset in datasets/

* fix: fix bug for generate category.json

* feat: add get_dataset_folder_desc

* init data proposal and merge qzli/ft

* update data proposal prompts, add max_position_embeddings, and resolve conflicts

* remove sample counts in data proposal

* turn data and train to unified hypo_gen

* refine prompts

* remove category.json and add it to dataset_info

* fix jinja problem and proposal done

* lint

* add ai-generated description and raw readme into dataset_info.json

* update prompt for description

* add datasets

* initial fix for proposal of data

* final version for data proposal

* lint
* refactor(dataset): add stats into dataset_info.json, and remove dataset from gitignore_folder

* feat: enable data coder and run data process
* feat: implement finetune data coding, evaluation, and config improvements

* fix: deepspeed config path

* fix: dataset info columns

---------

Co-authored-by: Young <afe.young@gmail.com>
@XianBW XianBW marked this pull request as ready for review February 6, 2026 02:55
Copilot AI review requested due to automatic review settings February 6, 2026 02:55
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a comprehensive LLM fine-tuning system to RDAgent, adding support for automated fine-tuning experiments with benchmarking, dataset management, and evaluation pipelines. The changes span roughly 15,000 lines across multiple modules, including core framework modifications, scenario implementation, benchmarking infrastructure, and UI components.

Changes:

  • Added LLM fine-tuning scenario with training pipeline, benchmark evaluation, and dataset management
  • Modified core framework to support iterative evaluation and evolving strategies
  • Added Docker environments for training (LLaMA-Factory) and benchmarking (OpenCompass)
  • Implemented UI for monitoring fine-tuning jobs and experiments
  • Extended configuration system with fine-tuning specific settings

Reviewed changes

Copilot reviewed 115 out of 121 changed files in this pull request and generated 56 comments.

Summary per file:

  • rdagent/core/experiment.py: Changed stdout handling from truncated to full output
  • rdagent/core/evaluation.py: Made evaluate() method optional instead of abstract
  • rdagent/core/evolving_framework.py: Added iterative evaluation support
  • rdagent/core/evolving_agent.py: Implemented RAGEvaluator with evaluate_iter
  • rdagent/core/proposal.py: Added SOTA tracking and DAG parent synchronization
  • rdagent/core/exception.py: Added CodeBlockParseError for extraction failures
  • rdagent/utils/workflow/loop.py: Added skip_loop_error_stepname for error recovery
  • rdagent/components/coder/CoSTEER/*: Extended with iterative evolving and evaluation
  • rdagent/scenarios/finetune/*: Complete fine-tuning scenario implementation
  • rdagent/oai/backend/*: Enhanced code block parsing and token counting
  • test/*: Added test files for fine-tuning components
Comments suppressed due to low confidence (1)

rdagent/scenarios/data_science/dev/runner/eval.py:91

  • This assignment to 'stdout' is unnecessary as it is redefined before this value is used.


    """
    result = self.run(env, entry)
-   return result.get_truncated_stdout()  # NOTE: truncating just for aligning with the old code.
+   return result.stdout  # NOTE: truncating just for aligning with the old code.
Copilot AI Feb 6, 2026

The get_truncated_stdout() method calls have been replaced with direct stdout access. Verify that the stdout attribute contains the full output or is appropriately handled in all calling code, as this changes the behavior from truncated to full output which may cause issues with very large outputs.
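The risk flagged here can be sketched in isolation. The helper name matches the one in the diff, but its body and the 10,000-character cap are illustrative assumptions, not RDAgent's actual implementation:

```python
# Hypothetical sketch of truncated vs. full stdout handling; the cap and the
# truncation marker are assumptions for illustration only.
TRUNCATE_AT = 10_000

def get_truncated_stdout(stdout: str, limit: int = TRUNCATE_AT) -> str:
    """Cap stdout at `limit` characters, marking where the cut happened."""
    if len(stdout) <= limit:
        return stdout
    return stdout[:limit] + "\n... [truncated]"

big_output = "x" * 50_000
truncated = get_truncated_stdout(big_output)  # bounded, safe for prompts/logs
full = big_output                             # new behavior: unbounded size
```

With the old behavior, downstream consumers (e.g. prompts fed to an LLM) saw at most the capped string; with the new behavior they receive the full output, which is why the reviewer asks for verification on very large outputs.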

Comment on lines +86 to +106
def assign_code_list_to_evo(self, code_list: list[dict | None], evo) -> None:
    """Assign code modifications to evolving item.

    For runner, coder already generated full training config, so typically no modifications.
    But this method is required by the abstract base class.
    """
    for index in range(len(evo.sub_tasks)):
        if code_list[index] is None:
            continue
        if evo.sub_workspace_list[index] is None:
            evo.sub_workspace_list[index] = evo.experiment_workspace

        # If there are any modifications (usually empty for runner)
        if code_list[index]:
            # Handle change summary if present
            if self.KEY_CHANGE_SUMMARY in code_list[index]:
                evo.sub_workspace_list[index].change_summary = code_list[index].pop(self.KEY_CHANGE_SUMMARY)
            # Inject any modified files
            evo.sub_workspace_list[index].inject_files(**code_list[index])

    return evo
Copilot AI Feb 6, 2026

The duplicate method definition assign_code_list_to_evo at line 86 will override the abstract method at line 75. This appears to be a concrete implementation that should either replace the abstract method or be renamed. The duplicate definition will cause the abstract method to be shadowed.
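The shadowing behavior the reviewer describes is a general Python property, demonstrated below with a generic class rather than the actual RDAgent code:

```python
# Minimal demonstration of method shadowing: when a class body contains two
# `def`s with the same name, only the last binding survives.
from abc import ABC, abstractmethod

class Base(ABC):
    @abstractmethod
    def assign_code_list_to_evo(self, code_list, evo): ...

class Runner(Base):
    def assign_code_list_to_evo(self, code_list, evo):
        return "first definition"

    # This second def silently replaces the first; no error or warning.
    def assign_code_list_to_evo(self, code_list, evo):
        return "second definition"

print(Runner().assign_code_list_to_evo([], None))
```

Because class bodies are executed top to bottom as ordinary assignments, the earlier definition is simply overwritten; linters (e.g. flake8's F811) catch this, but the interpreter does not.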

Comment on lines +249 to +254
if self.skip_loop_error_stepname:
    next_step_idx = self.steps.index(self.skip_loop_error_stepname)
    if next_step_idx <= si:
        raise RuntimeError(
            f"Cannot skip backwards or to same step. Current: {si} ({name}), Target: {next_step_idx} ({self.skip_loop_error_stepname})"
        ) from e
Copilot AI Feb 6, 2026

In the skip_loop_error handling, when skip_loop_error_stepname is provided, the code raises a RuntimeError if the target step is before or at the current step. However, this exception is raised using from e, which chains it with the original skip_loop_error exception. This might lead to confusion about which exception caused the failure. Consider whether this is the intended behavior or if a separate exception type would be clearer.
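The chaining semantics at issue can be shown with a generic example (not RDAgent's actual exception types): `raise ... from e` attaches the original error as `__cause__`, so both tracebacks appear together.

```python
# Generic illustration of `raise ... from e` exception chaining.
def skip_step():
    try:
        raise ValueError("skip_loop_error from the failing step")
    except ValueError as e:
        # The new error carries the original as __cause__; tracebacks show
        # "The above exception was the direct cause of the following exception".
        raise RuntimeError("Cannot skip backwards or to same step") from e

try:
    skip_step()
except RuntimeError as exc:
    print(type(exc.__cause__).__name__)  # the chained original is preserved
```

Using `raise ... from None` instead would suppress the chain, and a dedicated exception type would make the two failure modes distinguishable to callers, which is the alternative the reviewer suggests considering.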

Comment on lines +169 to +173
# for path in Path(local_path).rglob("*"):
# p = str(path.relative_to(Path(local_path)))
# if p.startswith("__pycache__"):
# continue
# data_key.append(p)
Copilot AI Feb 6, 2026

This comment appears to contain commented-out code.

Comment on lines +439 to +440
# if entry.name.lower() in {"readme.md", "readme.txt"}:
# results.append(entry)
Copilot AI Feb 6, 2026

This comment appears to contain commented-out code.

dict: Merged configuration (model-specific overrides default)
Uses exact match first, then longest prefix match, finally default only.
"""
config_data = yaml.safe_load(open(Path(__file__).parent / "configs" / "models.yaml", "r"))
Copilot AI Feb 6, 2026

File is opened but is not closed.
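A sketch of the fix: wrap the open in a context manager so the handle is always released. The original uses `yaml.safe_load`; the stand-in below uses stdlib `json` and a temp file so it is self-contained, but the pattern is identical.

```python
# Stand-in for yaml.safe_load(open(...)): stdlib json and a temp file are
# used here only so the sketch runs without PyYAML or the real config.
import json
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp()) / "models.json"
tmp.write_text('{"default": {"learning_rate": 0.0001}}')

# Leaky pattern:  config_data = json.load(open(tmp))  -- never closed.
# Fixed pattern: the `with` block guarantees the handle is closed, even on error.
with tmp.open("r") as f:
    config_data = json.load(f)

print(config_data["default"]["learning_rate"])
```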

vllm>=0.12.0

# OpenCompass benchmark framework (custom fork with cascade eval support)
opencompass @ git+https://github.com/Jensen246/opencompass.git
Copilot AI Feb 6, 2026

The conda requirements file installs opencompass directly from a mutable GitHub URL (opencompass @ git+https://github.com/Jensen246/opencompass.git), so each environment build can pull and execute different code over time from that external repo. If the upstream repository or its default branch is compromised, attackers can introduce malicious code into your evaluation environment and potentially exfiltrate credentials (e.g., HF tokens) used there. Prefer pinning this VCS dependency to a specific commit SHA or signed release artifact, or mirroring it to a controlled internal registry.
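The suggested pinning looks like the fragment below; the SHA is a placeholder, not a real commit from that repository:

```
# Pin the VCS dependency to an immutable commit (the SHA here is a placeholder)
opencompass @ git+https://github.com/Jensen246/opencompass.git@<commit-sha>
```

pip's VCS support accepts a commit hash, tag, or branch after `@`; only a full commit SHA is immutable.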

@@ -0,0 +1,20 @@
FROM hiyouga/llamafactory:0.9.4
Copilot AI Feb 6, 2026

The Dockerfile bases the fine-tuning environment on the third-party image hiyouga/llamafactory:0.9.4 referenced only by a mutable tag, which is a single point of supply-chain trust for all subsequent workloads that may handle API keys or training data. If that image tag is ever replaced or compromised in the upstream registry, builds will silently pull a tampered image and execute attacker-controlled code. To mitigate this, pin the base image to a specific immutable digest (and/or mirror it to a trusted internal registry) so builds are reproducible and resilient to upstream tag hijacking.
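Digest pinning would look like the fragment below; the digest is a placeholder (a real one can be obtained with `docker images --digests` or from the registry):

```
# Pin the base image to an immutable digest instead of a mutable tag
# (sha256 value below is a placeholder)
FROM hiyouga/llamafactory:0.9.4@sha256:<digest>
```

With a digest present, Docker verifies the pulled content against it, so a replaced upstream tag can no longer change what the build runs.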

Comment on lines +44 to +46
BLOB_URL="https://${ACCOUNT}.blob.core.windows.net/${CONTAINER}/${REMOTE_PATH}?${TOKEN}"
echo "Full Blob URL:"
echo "$BLOB_URL"
Copilot AI Feb 6, 2026

gen_token.sh prints the full Azure Blob SAS URL including the TOKEN to stdout (echo "$BLOB_URL"), which can leak write-enabled storage credentials into shell history or centralized logs. Anyone with access to these logs could reuse the SAS URL to read, write or delete blob data for the configured container/path. To reduce exposure, avoid echoing the full SAS token/URL (or gate it behind an explicit debug flag) and ensure tokens are only written to controlled files or displayed interactively when absolutely necessary.
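One way to gate the echo behind a debug flag, as the comment suggests. The variable values below are placeholders standing in for gen_token.sh's real configuration:

```shell
# Placeholder values standing in for gen_token.sh's real variables
ACCOUNT="myaccount"; CONTAINER="data"; REMOTE_PATH="ckpt"; TOKEN="sv=secret"

BLOB_URL="https://${ACCOUNT}.blob.core.windows.net/${CONTAINER}/${REMOTE_PATH}?${TOKEN}"

# Only print the credential-bearing URL when explicitly requested
if [ "${DEBUG_SAS:-0}" = "1" ]; then
  echo "Full Blob URL: $BLOB_URL"
else
  echo "SAS URL generated for ${CONTAINER}/${REMOTE_PATH} (token hidden; set DEBUG_SAS=1 to print)"
fi
```

By default the token never reaches stdout, so shell history and centralized logs only record the non-sensitive path.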

Comment on lines +16 to +19
RUN git clone https://github.com/Jensen246/opencompass.git /opencompass
WORKDIR /opencompass

RUN pip install ".[vllm]" --no-cache-dir
Copilot AI Feb 6, 2026

The Dockerfile clones https://github.com/Jensen246/opencompass.git at the default branch and immediately runs pip install ".[vllm]", meaning image builds always execute mutable third-party code fetched directly from GitHub. If that repository or its default branch is compromised, an attacker can inject arbitrary code into your build and runtime environment with access to any secrets mounted into the container. To harden the supply chain, pin the dependency to an immutable reference (tagged release or commit SHA) and, if possible, vendor or mirror the code under tighter control.
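Pinning the clone to an immutable ref would look like the fragment below; the SHA is a placeholder, not an actual commit from that repository:

```
# Check out an immutable commit instead of the default branch
# (the SHA below is a placeholder)
RUN git clone https://github.com/Jensen246/opencompass.git /opencompass \
    && cd /opencompass \
    && git checkout <commit-sha>
WORKDIR /opencompass

RUN pip install ".[vllm]" --no-cache-dir
```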

@XianBW XianBW changed the title RDAgent Finetune LLM feat: add LLM-finetune scenario Feb 12, 2026
@Jensen246 Jensen246 force-pushed the finetune branch 3 times, most recently from cbb6281 to f986ad5 Compare February 13, 2026 05:45
When a skip_loop_error exception occurs and skip_loop_error_stepname is not
explicitly set, default to jumping to the 'feedback' step if it exists,
otherwise fall back to the last step (record).

This prevents KeyError when record step tries to access feedback data that
doesn't exist because we skipped the feedback phase.

Also removed redundant skip_loop_error_stepname from finetune loop since
it's now the default behavior.
peteryang1 (Contributor) commented:

How about merging this PR? @Jensen246

XianBW (Collaborator, Author) commented Feb 28, 2026, replying to the above:

Only the data science scenario still has a bug to fix.

@Jensen246 Jensen246 merged commit 1824e1c into main Mar 2, 2026
10 checks passed
@Jensen246 Jensen246 deleted the finetune branch March 2, 2026 11:04