DAPO Integration in Trinity: Algorithm Registration, Dynamic Sampling Pipeline, Overlong Handling, and Reviewer-Focused Docs#552
Open
artaasd95 wants to merge 4 commits into
- Introduced the DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) algorithm in the codebase.
- Updated example configurations and README files to reflect DAPO usage on the DAPO-Math-17k dataset.
- Added tests for DAPO algorithm registration and dynamic sampling filter functionality.
- Enhanced existing documentation to include DAPO techniques and implementation details.
- Updated experience processing to support DAPO-specific features such as dynamic sampling and response truncation handling.
Description
This PR introduces DAPO as a first-class algorithm in Trinity and wires it end-to-end across algorithm registration, reward processing, experience filtering, examples, and tests. The implementation is additive and does not change existing algorithm defaults.
This follows the discussion in issue #551. The changes are summarized below:
Algorithm integration and defaults
Added DAPO algorithm type class at algorithm.py.
DAPO default configuration is defined in algorithm.py:
Registered algorithm key dapo in the global algorithm registry at init.py.
Added dapo to grouped-advantage handling in the config manager, at two places in algorithm_config_manager.py.
Dynamic sampling operator (DAPO behavior)
Added DAPODynamicSamplingFilter at reward_filter.py.
Outcome score extraction logic at reward_filter.py.
Correctness decision logic at reward_filter.py.
Group filtering conditions:
Registered operator names in operator registry:
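To illustrate the dynamic sampling behavior for reviewers, here is a minimal sketch of a mixed-correctness group filter. It is not the code in reward_filter.py: the `Experience` record, the `is_correct` threshold, and the `filter_mixed_groups` helper are hypothetical names chosen for this example, and the rule that a reward above 0 counts as correct is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Experience:
    """Hypothetical stand-in for a rollout record; not Trinity's actual class."""
    task_id: str
    reward: float


def is_correct(exp: Experience, threshold: float = 0.0) -> bool:
    # Assumption: a reward strictly above the threshold counts as correct.
    return exp.reward > threshold


def filter_mixed_groups(experiences: list[Experience]) -> list[Experience]:
    """Keep only task groups that contain both correct and incorrect responses."""
    groups: dict[str, list[Experience]] = defaultdict(list)
    for exp in experiences:
        groups[exp.task_id].append(exp)

    kept: list[Experience] = []
    for group in groups.values():
        n_correct = sum(is_correct(e) for e in group)
        # All-correct and all-wrong groups give every response the same
        # group-relative advantage (zero), so they are dropped.
        if 0 < n_correct < len(group):
            kept.extend(group)
    return kept


batch = [
    Experience("t1", 1.0), Experience("t1", -1.0),   # mixed group
    Experience("t2", 1.0), Experience("t2", 1.0),    # all correct
    Experience("t3", -1.0), Experience("t3", -1.0),  # all wrong
]
kept = filter_mixed_groups(batch)
```

In this sketch only the two `t1` experiences survive, which shows how the filter can shrink the effective batch when many groups are uniformly correct or uniformly wrong.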
Overlong response handling
Added MaskResponseTruncatedOperator at reward_filter.py.
Workflow-level masking safety added in math workflows:
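The masking idea can be sketched as follows, assuming that "masking" means zeroing the per-token loss mask of any response that hit the length limit. The `Rollout` record, its field names, and `mask_truncated` are hypothetical and only illustrate the technique; the actual operator in reward_filter.py may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    """Hypothetical rollout record; field names are illustrative only."""
    response_tokens: list[int]
    truncated: bool  # True when generation stopped at the length limit
    loss_mask: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Default: every response token contributes to the loss.
        if not self.loss_mask:
            self.loss_mask = [1] * len(self.response_tokens)


def mask_truncated(rollouts: list[Rollout]) -> list[Rollout]:
    """Zero the loss mask of truncated responses so they add no gradient."""
    for r in rollouts:
        if r.truncated:
            r.loss_mask = [0] * len(r.response_tokens)
    return rollouts


rollouts = mask_truncated([
    Rollout(response_tokens=[11, 12, 13], truncated=False),
    Rollout(response_tokens=[21, 22], truncated=True),
])
```

Masking keeps the truncated sample in the batch (so shapes and group sizes are unchanged) while removing its contribution to the policy loss.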
Reward function updates for DAPO
Added MathDAPORewardFn class at dapo_reward.py.
Accuracy score uses symmetric rule:
Overlong penalty logic:
Registered reward type math_dapo_reward at init.py.
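For context, the sketch below shows one common formulation of these two reward terms, following the DAPO paper's description: a symmetric +1/-1 accuracy score and a soft overlong penalty that ramps linearly from 0 to -1 over a buffer before the maximum length. This is an assumption about the intended behavior, not a copy of MathDAPORewardFn; the function names, parameter names, and the example lengths are hypothetical.

```python
def accuracy_score(is_correct: bool) -> float:
    # Assumed symmetric rule: +1 for a correct answer, -1 otherwise.
    return 1.0 if is_correct else -1.0


def overlong_penalty(length: int, max_len: int, cache_len: int) -> float:
    """Soft length penalty in [-1, 0].

    No penalty up to (max_len - cache_len); a linear ramp down to -1
    across the buffer; a flat -1 beyond max_len.
    """
    if cache_len <= 0 or cache_len > max_len:
        raise ValueError("cache_len must be in (0, max_len]")
    soft_start = max_len - cache_len
    if length <= soft_start:
        return 0.0
    if length <= max_len:
        return (soft_start - length) / cache_len
    return -1.0


def dapo_style_reward(is_correct: bool, length: int,
                      max_len: int, cache_len: int) -> float:
    # Total reward is the accuracy term plus the (non-positive) length term.
    return accuracy_score(is_correct) + overlong_penalty(length, max_len, cache_len)


# Illustrative numbers only: limit of 100 tokens with a 20-token buffer.
short_ok = dapo_style_reward(True, 50, max_len=100, cache_len=20)
long_ok = dapo_style_reward(True, 90, max_len=100, cache_len=20)
too_long = dapo_style_reward(True, 120, max_len=100, cache_len=20)
```

With these illustrative numbers, a correct answer halfway into the buffer loses half a point, and a correct answer past the limit nets zero.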
Example configuration and docs
End-to-end DAPO example config:
DAPO feature documentation:
Tests and platform compatibility
Added DAPO test suite at test_dapo_algorithm.py.
Covered cases:
Windows collection compatibility for posix-only resource import:
Reviewer-focused behavior summary
This is additive: no default behavior changes for non-DAPO algorithms.
DAPO path is activated through algorithm_type dapo and dedicated pipeline/reward wiring.
Dynamic sampling keeps only mixed-correctness task groups, which can reduce effective sample count but should improve training signal quality.
Overlong handling is enforced both in pipeline operator and workflow mask path.
Documentation and examples are complete for reproducibility.
Validation
If you want, I can also give you a shorter reviewer checklist version (one page) plus an expanded architecture version (for design review) from this same material.
Note: This is my first PR in this project, so please let me know if I have missed anything!
Checklist
Please check the following items before code is ready to be reviewed.