Fix ragged gather reduce correctness by NuojCheng · Pull Request #4262 · AI-Hypercomputer/maxtext

NuojCheng · 2026-06-24T22:01:18Z

Description

This PR does three things:

Add a mask to the ragged_gather_reduce kernel, fixing a correctness issue when it is used with the ragged sort kernels;
Remove the small-tensor fallback mechanism in the ragged_gather_reduce kernel, because it breaks our ragged sort unit test;
Add the fallback/cost estimate flags that were missing from the previous PRs.
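To illustrate why a mask can matter for a gather-reduce over ragged data, here is a minimal NumPy reference sketch. It is not the kernel from this PR: the function name `gather_reduce_reference`, its arguments, and the assumption that rows at or beyond `num_valid_rows` hold uninitialized padding are all hypothetical, chosen only to show the general technique.

```python
import numpy as np

def gather_reduce_reference(src, indices, weights, num_valid_rows):
    """Assumed semantics: out[t] = sum_k weights[t, k] * src[indices[t, k]],
    skipping any slot whose source row lies outside the valid prefix."""
    # A slot is valid only if it points inside the first num_valid_rows rows.
    valid = (indices >= 0) & (indices < num_valid_rows)
    # Redirect invalid slots to row 0 so the gather itself stays in bounds.
    safe_idx = np.where(valid, indices, 0)
    gathered = src[safe_idx]                      # [T, K, D]
    # Select with the mask instead of relying on zero weights:
    # 0.0 * NaN is NaN, so multiplication alone would not hide bad padding.
    contrib = np.where(valid[..., None], gathered * weights[..., None], 0.0)
    return contrib.sum(axis=1)                    # [T, D]

# Three valid rows followed by one padding row holding garbage (NaN here).
src = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [np.nan, np.nan]])
indices = np.array([[0, 2], [1, 3]])              # the last slot points at padding
weights = np.array([[0.25, 0.75], [1.0, 0.0]])

out = gather_reduce_reference(src, indices, weights, num_valid_rows=3)
# Same computation without the mask, for comparison.
unmasked = (src[indices] * weights[..., None]).sum(axis=1)
```

In this sketch the unmasked result for the second output row is contaminated by the padding row even though its weight is zero, while the masked version stays finite.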

Tests

CI tests

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-06-24T22:05:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

When prefuse_moe_weights=True (requires sparse_matmul=True), the two FFN1 expert weight matrices [G,K,N] are concatenated into [G,K,2N] and dispatched as a single grouped GEMM, then split. This halves FFN1 kernel launches and reads input activations from HBM once instead of twice. Backend-agnostic: works with Megablox, Tokamax, and jax.lax.ragged_dot. When attention=vllm_rpa the fused tensor is passed directly to the vLLM-TPU serving kernel.

correct sort kernel convergence
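The weight prefusion described above can be sketched with a plain NumPy reference for the grouped GEMM. This is an illustration under stated assumptions, not the MaxText implementation: `grouped_matmul`, `w_gate`, and `w_up` are hypothetical names, and the Python loop stands in for whichever backend (Megablox, Tokamax, or jax.lax.ragged_dot) actually runs the grouped matmul.

```python
import numpy as np

def grouped_matmul(x, w, group_sizes):
    """Reference grouped GEMM: rows of x are sorted by expert, and the
    g-th contiguous block of group_sizes[g] rows is multiplied by w[g]."""
    out = np.zeros((x.shape[0], w.shape[-1]), dtype=x.dtype)
    start = 0
    for g, size in enumerate(group_sizes):
        out[start:start + size] = x[start:start + size] @ w[g]
        start += size
    return out

rng = np.random.default_rng(0)
G, K, N = 3, 4, 5                          # experts, input dim, FFN1 output dim
group_sizes = np.array([2, 0, 3])          # tokens per expert (an empty group is allowed)
x = rng.standard_normal((int(group_sizes.sum()), K))

# The two FFN1 expert weights, each [G, K, N] (names are hypothetical).
w_gate = rng.standard_normal((G, K, N))
w_up = rng.standard_normal((G, K, N))

# Prefuse: concatenate along the output dim to get [G, K, 2N].
w_fused = np.concatenate([w_gate, w_up], axis=-1)

# One grouped GEMM over the fused weights, then split back into two halves.
fused_out = grouped_matmul(x, w_fused, group_sizes)
gate_out, up_out = np.split(fused_out, 2, axis=-1)

# Unfused baseline: two separate grouped GEMMs that each read x.
gate_ref = grouped_matmul(x, w_gate, group_sizes)
up_ref = grouped_matmul(x, w_up, group_sizes)
```

Because each output column depends only on its own weight column, the split halves match the two separate GEMMs; the fused path just issues one call and reads `x` once.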

NuojCheng added the pull ready label Jun 24, 2026

NuojCheng removed the pull ready label Jun 24, 2026

NuojCheng force-pushed the chengnuojin-ragged-followup branch 3 times, most recently from 3b816f6 to e4b450a Compare June 26, 2026 17:05

NuojCheng added pull ready and removed pull ready labels Jun 26, 2026

NuojCheng force-pushed the chengnuojin-ragged-followup branch from e4b450a to 8c14942 Compare June 27, 2026 00:13

NuojCheng added the pull ready label Jun 27, 2026

NuojCheng force-pushed the chengnuojin-ragged-followup branch 2 times, most recently from 64a11c0 to 9e10916 Compare June 29, 2026 18:03

abhinavgoel95 and others added 2 commits June 29, 2026 18:04

revert optimization from PR#4166

bd32f80

NuojCheng force-pushed the chengnuojin-ragged-followup branch from 9e10916 to bd32f80 Compare June 29, 2026 18:04

NuojCheng removed the pull ready label Jun 29, 2026

NuojCheng marked this pull request as ready for review June 29, 2026 18:45

NuojCheng requested review from A9isha, RissyRan, SurbhiJainUSC, abhinavclemson, aireenmei, bvandermoon, gagika, gobbleturk, hengtaoguo, khatwanimohit, richjames0, shralex and vipannalla as code owners June 29, 2026 18:45

NuojCheng requested review from NicoGrande, darisoy, dipannita08, igorts-git, jiangjy1982 and suexu1025 as code owners June 29, 2026 18:45

gobbleturk approved these changes Jun 29, 2026

Shuwen-Fang approved these changes Jun 29, 2026

NuojCheng added the pull ready label Jun 29, 2026

NuojCheng wants to merge 2 commits into main from chengnuojin-ragged-followup

4 participants
