
Conversation


@HaochenYuan HaochenYuan commented Dec 4, 2025

What does this PR do ?

Related issue: #2102
This PR removes the FP16 assert in moe_grouped_gemm & EP.
In RL, compared to BF16, FP16 shows faster convergence, higher accuracy, and a smaller mismatch between training and rollout. See details in verl-project/verl#4158
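The change itself is a one-line relaxation of a dtype guard. As a minimal sketch (all names here are hypothetical, not the actual Megatron-LM code), the assert that previously admitted only BF16 now admits FP16 as well:

```python
def check_moe_dtype(dtype: str) -> None:
    # Hypothetical guard illustrating the relaxed check: before this PR,
    # only "bf16" passed; "fp16" is now accepted too.
    supported = ("bf16", "fp16")
    if dtype not in supported:
        raise AssertionError(
            f"grouped GEMM supports {supported}, got {dtype}"
        )

# FP16 now passes the check instead of asserting.
check_moe_dtype("fp16")
check_moe_dtype("bf16")
```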

⚠️ For major changes (either in lines of code or in impact), please make sure to first discuss a design doc with the team.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
The Final Review may be declined if these requirements are not met.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@ko3n1g ko3n1g added this to the Core 0.16 milestone Dec 4, 2025
@HaochenYuan HaochenYuan changed the title enable fp16 in moe emove fp16 assert in grouped_moe_gemm & EP Dec 4, 2025
@HaochenYuan HaochenYuan changed the title emove fp16 assert in grouped_moe_gemm & EP remove fp16 assert in grouped_moe_gemm & EP Dec 4, 2025
@yaox12 yaox12 added the Expert Review Apply this label to indicate that your PR is ready for expert review. label Dec 10, 2025
@Phlip79
Copy link
Member

Phlip79 commented Dec 16, 2025

Why does GroupedMLP still require bf16?

@yaox12
Copy link
Member

yaox12 commented Dec 16, 2025

> Why does GroupedMLP still require bf16?

The legacy GroupedMLP depends on a third-party grouped GEMM implementation, which only supports BF16.
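In other words, the relaxed assert applies only to the non-legacy path; the legacy path must keep its BF16 restriction. A hedged sketch of that dispatch (function and path names are illustrative, not the actual Megatron-LM API):

```python
def select_grouped_gemm(impl: str, dtype: str) -> str:
    """Hypothetical dispatch: the legacy GroupedMLP path relies on a
    third-party grouped GEMM kernel that only handles bf16, so fp16
    requests must either use another path or be rejected."""
    if impl == "legacy" and dtype != "bf16":
        raise ValueError("legacy GroupedMLP supports bf16 only")
    return f"{impl}:{dtype}"

# fp16 is fine on the non-legacy path, but rejected on the legacy one.
select_grouped_gemm("te", "fp16")
select_grouped_gemm("legacy", "bf16")
```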

@Phlip79 Phlip79 added the Final Review Apply this label to indicate that your PR is ready for final review. label Dec 22, 2025
@ericharper ericharper removed the Expert Review Apply this label to indicate that your PR is ready for expert review. label Dec 22, 2025
@Victarry Victarry enabled auto-merge December 22, 2025 06:23
@Victarry Victarry added this pull request to the merge queue Dec 22, 2025
Merged via the queue into NVIDIA:main with commit 4193f3a Dec 22, 2025
50 of 52 checks passed
maanug-nv pushed a commit to maanug-nv/Megatron-LM that referenced this pull request Jan 10, 2026
Co-authored-by: Philip Petrakian <pgpetrak@gmail.com>