DAPO Integration in Trinity: Algorithm Registration, Dynamic Sampling Pipeline, Overlong Handling, and Reviewer-Focused Docs by artaasd95 · Pull Request #552 · agentscope-ai/Trinity-RFT

artaasd95 · 2026-05-25T17:19:26Z

Description

PR Description
This PR introduces DAPO as a first-class algorithm in Trinity and wires it end-to-end across algorithm registration, reward processing, experience filtering, examples, and tests. The implementation is additive and does not change existing algorithm defaults.

As discussed in issue #551, the changes are summarized below:

Algorithm integration and defaults
Added DAPO algorithm type class at algorithm.py.
DAPO default configuration is defined in algorithm.py:
- repeat_times: 16
- advantage_fn: grpo
- sample_strategy: default
- policy_loss_fn: ppo
- kl_penalty_fn: none
- kl_loss_fn: none
- entropy_loss_fn: default
Registered algorithm key dapo in the global algorithm registry at init.py.
Added dapo to grouped-advantage handling in the config manager at algorithm_config_manager.py (two locations).
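
For reviewers who want the defaults in one place, the following is a minimal sketch of how the listed values could be bundled and registered under the key dapo. The names `ALGORITHM_REGISTRY`, `register`, and `DAPODefaults` are hypothetical stand-ins for illustration and are not the classes added in algorithm.py; only the field values come from the list above.

```python
from dataclasses import asdict, dataclass

# Hypothetical stand-in for the project's global algorithm registry.
ALGORITHM_REGISTRY = {}


def register(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        ALGORITHM_REGISTRY[name] = cls
        return cls
    return decorator


@register("dapo")
@dataclass(frozen=True)
class DAPODefaults:
    """Default values as listed in the PR description (illustrative container)."""
    repeat_times: int = 16
    advantage_fn: str = "grpo"
    sample_strategy: str = "default"
    policy_loss_fn: str = "ppo"
    kl_penalty_fn: str = "none"
    kl_loss_fn: str = "none"
    entropy_loss_fn: str = "default"


# Look the algorithm up by key and materialize its defaults as a dict.
defaults = asdict(ALGORITHM_REGISTRY["dapo"]())
```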
Dynamic sampling operator (DAPO behavior)
Added DAPODynamicSamplingFilter at reward_filter.py.
Outcome score extraction logic at reward_filter.py.
Correctness decision logic at reward_filter.py.
Group filtering conditions:
- Drop all-wrong groups at reward_filter.py
- Drop all-correct groups at reward_filter.py
Registered operator names in operator registry:
- dapo_dynamic_sampling at init.py
- mask_response_truncated at init.py
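
The group filtering rule above can be sketched as follows. This is an illustration under stated assumptions, not the code in reward_filter.py: experiences are modeled as plain dicts, and the field names `task_id` and `score` as well as the `threshold` parameter are hypothetical.

```python
from collections import defaultdict


def filter_mixed_groups(experiences, threshold=0.0):
    """Keep only task groups that contain both correct and incorrect responses.

    Assumption: each experience is a dict with a hypothetical `task_id` and an
    outcome `score`; a response counts as correct when score > threshold.
    """
    groups = defaultdict(list)
    for exp in experiences:
        groups[exp["task_id"]].append(exp)

    kept = []
    for group in groups.values():
        correct = [exp["score"] > threshold for exp in group]
        # All-correct and all-wrong groups are dropped.
        if all(correct) or not any(correct):
            continue
        kept.extend(group)
    return kept


batch = [
    {"task_id": "a", "score": 1.0},
    {"task_id": "a", "score": 1.0},   # group a: all correct, dropped
    {"task_id": "b", "score": -1.0},
    {"task_id": "b", "score": -1.0},  # group b: all wrong, dropped
    {"task_id": "c", "score": 1.0},
    {"task_id": "c", "score": -1.0},  # group c: mixed, kept
]
kept = filter_mixed_groups(batch)
```

With group-relative advantages, every response in a uniform group has the same reward, so its advantage is zero; dropping such groups removes samples that would not contribute a gradient.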
Overlong response handling
Added MaskResponseTruncatedOperator at reward_filter.py.
Workflow-level masking safety added in math workflows:
- Sync workflow path at customized_math_workflows.py (two locations)
- Async workflow path at customized_math_workflows.py (two locations)
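
The description does not spell out what masking does to a truncated response, so the sketch below rests on an assumption: a truncated response has its per-token mask zeroed so that it contributes nothing to the loss. The field names `truncated` and `action_mask` and the function `mask_truncated` are hypothetical and may differ from MaskResponseTruncatedOperator.

```python
def mask_truncated(experiences):
    """Return copies of the experiences with truncated responses fully masked.

    Assumption: `truncated` flags a response that hit the length limit, and
    `action_mask` holds one 0/1 entry per response token.
    """
    masked = []
    for exp in experiences:
        if exp.get("truncated", False):
            # Zero the mask so no token of this response enters the loss.
            exp = dict(exp, action_mask=[0] * len(exp["action_mask"]))
        masked.append(exp)
    return masked


batch = [
    {"truncated": True, "action_mask": [1, 1, 1]},
    {"truncated": False, "action_mask": [1, 1]},
]
masked_batch = mask_truncated(batch)
```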
Reward function updates for DAPO
Added MathDAPORewardFn class at dapo_reward.py.
Accuracy score uses symmetric rule:
- correctness thresholding at dapo_reward.py
- +1/-1 assignment at dapo_reward.py
Overlong penalty logic:
- enable flag branch at dapo_reward.py
- penalty function at dapo_reward.py
- expected length formula at dapo_reward.py
- hard overlong penalty return at dapo_reward.py
Registered reward type math_dapo_reward at init.py.
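
As a rough illustration of the reward shape described above, the sketch below combines a symmetric +1/-1 accuracy score with a length penalty. The penalty follows the soft overlong punishment from the DAPO paper (zero up to the expected length, linear inside the buffer, -1 beyond the maximum); whether MathDAPORewardFn uses exactly this form is an assumption, and all function and parameter names here are hypothetical.

```python
def accuracy_score(is_correct):
    """Symmetric rule: +1 for a correct answer, -1 otherwise."""
    return 1.0 if is_correct else -1.0


def overlong_penalty(response_len, max_len, buffer_len):
    """Soft overlong penalty in the style of the DAPO paper.

    expected_len = max_len - buffer_len is the length up to which no penalty
    applies; inside the buffer the penalty falls linearly from 0 to -1.
    """
    expected_len = max_len - buffer_len
    if response_len <= expected_len:
        return 0.0
    if response_len > max_len:
        return -1.0  # hard penalty once the maximum length is exceeded
    return (expected_len - response_len) / buffer_len


def dapo_reward(is_correct, response_len, max_len, buffer_len,
                enable_overlong_penalty=True):
    """Accuracy score plus the optional overlong penalty."""
    reward = accuracy_score(is_correct)
    if enable_overlong_penalty:
        reward += overlong_penalty(response_len, max_len, buffer_len)
    return reward


# Toy numbers: maximum length 100 with a buffer of 20, so expected length 80.
short_correct = dapo_reward(True, 80, max_len=100, buffer_len=20)
buffer_correct = dapo_reward(True, 90, max_len=100, buffer_len=20)
overlong_wrong = dapo_reward(False, 120, max_len=100, buffer_len=20)
```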
Example configuration and docs
End-to-end DAPO example config:
- algorithm type at dapo.yaml
- clip settings at dapo.yaml (two locations)
- token-level aggregation at dapo.yaml
- pipeline operators at dapo.yaml (two locations)
- overlong reward args at dapo.yaml (four locations)
- default reward binding at dapo.yaml
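
To make the clip settings and token-level aggregation entries more concrete, here is a small pure-Python sketch of a PPO-style clipped loss with separate lower and upper clip ranges, averaged over all tokens in the batch. It illustrates the general technique from the DAPO paper and is not the project's loss code; the function and parameter names are hypothetical, and the values 0.2 and 0.28 are the ones reported in the paper, not necessarily those in dapo.yaml.

```python
def clipped_token_losses(ratios, advantage, clip_low, clip_high):
    """Per-token clipped surrogate loss with decoupled clip ranges.

    `ratios` are new/old policy probability ratios for one response;
    the ratio is clipped to [1 - clip_low, 1 + clip_high].
    """
    losses = []
    for ratio in ratios:
        clipped = min(max(ratio, 1.0 - clip_low), 1.0 + clip_high)
        # Pessimistic (min) surrogate, negated to form a loss.
        losses.append(-min(ratio * advantage, clipped * advantage))
    return losses


def token_level_mean(per_response_losses):
    """Average over every token in the batch, so long responses weigh more."""
    total = sum(sum(losses) for losses in per_response_losses)
    count = sum(len(losses) for losses in per_response_losses)
    return total / count


# A ratio of 1.25 survives a wider upper range but is clipped by a symmetric one.
decoupled = clipped_token_losses([1.25], advantage=1.0, clip_low=0.2, clip_high=0.28)
symmetric = clipped_token_losses([1.25], advantage=1.0, clip_low=0.2, clip_high=0.2)

# Four tokens with loss 1.0 and one token with loss 0.0.
batch_loss = token_level_mean([[1.0, 1.0, 1.0, 1.0], [0.0]])
```

In the last example the token-level mean is 4/5, whereas averaging per response first would give 1/2; this is the difference the token-level aggregation setting controls.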
DAPO feature documentation:
- DAPO listed in main README at README.md
- Dataset perspective update (English) at example_dataset_perspective.md
- Dataset perspective update (Chinese) at example_dataset_perspective.md
- DAPO behavior note for dynamic sampling and std_threshold guidance at README.md
- Metrics guidance at README.md
Tests and platform compatibility
Added DAPO test suite at test_dapo_algorithm.py.
Covered cases:
- registry at test_dapo_algorithm.py
- default config at test_dapo_algorithm.py
- no-reference policy expectation at test_dapo_algorithm.py
- dynamic sampling all-correct at test_dapo_algorithm.py
- dynamic sampling all-wrong at test_dapo_algorithm.py
- dynamic sampling mixed correctness at test_dapo_algorithm.py
- overlong-different but all-correct behavior at test_dapo_algorithm.py
- truncation masking at test_dapo_algorithm.py
- symmetric reward correctness at test_dapo_algorithm.py
Windows collection compatibility for posix-only resource import:
- guard and mock setup at conftest.py (three locations)
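
One common way to write such a guard is sketched below. Python's standard `resource` module is only available on Unix-like platforms, so importing it during test collection fails on Windows; the sketch substitutes a mock in that case. The helper name `ensure_resource_module` is hypothetical, and the actual conftest.py changes may be structured differently.

```python
import sys
from unittest import mock


def ensure_resource_module():
    """Make `import resource` succeed even where the module does not exist.

    On POSIX the real module is imported; elsewhere a MagicMock is placed in
    sys.modules so that later imports during test collection do not fail.
    """
    try:
        import resource  # noqa: F401  (real module on Unix-like systems)
    except ImportError:
        sys.modules["resource"] = mock.MagicMock(name="resource")
    return sys.modules["resource"]


resource_module = ensure_resource_module()
```

The guard has to run before any test module that imports `resource`, which is why a top-level conftest.py is a natural place for it.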
Reviewer-focused behavior summary
This is additive: no default behavior changes for non-DAPO algorithms.
DAPO path is activated through algorithm_type dapo and dedicated pipeline/reward wiring.
Dynamic sampling keeps only mixed-correctness task groups, which can reduce effective sample count but should improve training signal quality.
Overlong handling is enforced both in pipeline operator and workflow mask path.
Documentation and examples are complete for reproducibility.

Validation

DAPO test command executed in meridian-test environment.
Result: 9 passed.

If you want, I can also give you a shorter reviewer checklist version (one page) plus an expanded architecture version (for design review) from this same material.
Note: Since this is my first PR in this project, please let me know if I have missed anything!

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

- Introduced DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) algorithm in the codebase.
- Updated example configurations and README files to reflect DAPO usage on the DAPO-Math-17k dataset.
- Added tests for DAPO algorithm registration and dynamic sampling filter functionality.
- Enhanced existing documentation to include DAPO techniques and implementation details.
- Updated experience processing to support DAPO-specific features such as dynamic sampling and response truncation handling.


artaasd95 added 4 commits May 25, 2026 16:47

Merge branch 'agentscope-ai:main' into main

1cc85eb

Add detailed docstrings for DAPOAlgorithm and related filters in reward processing

04052a4

Merge branch 'develop'

53ee1e9

artaasd95 wants to merge 4 commits into agentscope-ai:main from artaasd95:main
