Refactoring Mooncake TE as a shared distributed component by ShangmingCai · Pull Request #17810 · sgl-project/sglang

ShangmingCai · 2026-01-27T08:50:13Z

Motivation and Modifications

Migrating the Mooncake transfer engine from the disaggregation module to the distributed module lets advanced features such as PD, HiCache, and EPD reuse the same transfer engine instance, instead of each module initializing and using its own instance, which wastes resources.

TODO

ib_device setup refactor
mooncake store reuse mooncake TE
mooncake_ep reuse mooncake TE @UNIDY2002

Signed-off-by: Shangming Cai <csmthu@gmail.com>

gemini-code-assist · 2026-01-27T08:50:42Z

Summary of Changes

Hello @ShangmingCai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the Mooncake Transfer Engine, transitioning it from an isolated component within the disaggregation module to a shared, globally accessible entity within the distributed module. This change is designed to allow various advanced features, such as PD, HiCache, and EPD, to leverage a single, consistent instance of the transfer engine, thereby enhancing resource efficiency and simplifying its integration across the system. The modifications involve centralizing the engine's initialization and access mechanisms, ensuring that all dependent modules retrieve the same instance rather than creating their own.

Highlights

Shared Mooncake Transfer Engine: The Mooncake Transfer Engine has been refactored to operate as a shared, distributed component, moving from a module-specific implementation to a centralized one.
Centralized Initialization and Access: New functions, init_mooncake_transfer_engine and get_mooncake_transfer_engine, have been introduced to manage a single, global instance of the engine, ensuring consistent access across the system.
Resource Optimization: By reusing a single engine instance, the system avoids redundant initializations, leading to improved resource utilization for features like PD, HiCache, and EPD.
Codebase Structure: The MooncakeTransferEngine class has been relocated to python/sglang/srt/distributed/device_communicators/mooncake_transfer_engine.py to better reflect its role as a shared distributed component.
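
The init/get pair described in the highlights can be pictured as a module-level singleton. The following is a minimal sketch of that pattern only, not the code from this PR: `FakeTransferEngine`, `init_transfer_engine`, `get_transfer_engine`, and the constructor parameters are hypothetical stand-ins chosen for illustration.

```python
from typing import Optional


class FakeTransferEngine:
    """Stand-in for the real engine; it only records its constructor arguments."""

    def __init__(self, hostname: str, gpu_id: int, ib_device: Optional[str] = None):
        self.hostname = hostname
        self.gpu_id = gpu_id
        self.ib_device = ib_device


# Module-level slot holding the single shared instance (hypothetical name).
_ENGINE: Optional[FakeTransferEngine] = None


def init_transfer_engine(
    hostname: str, gpu_id: int, ib_device: Optional[str] = None
) -> FakeTransferEngine:
    """Create the shared engine on the first call; later calls return it unchanged."""
    global _ENGINE
    if _ENGINE is None:
        _ENGINE = FakeTransferEngine(hostname, gpu_id, ib_device)
    return _ENGINE


def get_transfer_engine() -> Optional[FakeTransferEngine]:
    """Return the shared engine, or None if init has not run yet."""
    return _ENGINE
```

In this sketch a second `init_transfer_engine` call is a no-op, so every caller ends up holding the same object regardless of which feature initialized it first.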

gemini-code-assist

Code Review

This pull request refactors the Mooncake Transfer Engine into a shared distributed component, allowing various advanced features like PD, HiCache, and EPD to reuse a single instance. This change effectively centralizes the management and initialization of the transfer engine, which should lead to better resource utilization and a more consistent approach across modules. The introduction of init_mooncake_transfer_engine and get_mooncake_transfer_engine functions in parallel_state.py and mooncake_transfer_engine.py is a good design choice for managing this shared resource. The changes are well-aligned with the stated motivation.

python/sglang/srt/distributed/device_communicators/mooncake_transfer_engine.py

python/sglang/srt/mem_cache/storage/mooncake_store/mooncake_store.py

ShangmingCai · 2026-01-27T08:53:53Z

/tag-and-rerun-ci

ShangmingCai · 2026-01-27T08:58:20Z

CC: @mickqian
Future use cases can modify `init_shared_mooncake_transfer_engine` and then use `from sglang.srt.distributed.parallel_state import get_mooncake_transfer_engine` to register and transfer data through Mooncake.
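
As a rough illustration of that usage, a consumer would fetch the shared engine through the getter instead of constructing its own. The sketch below uses a stub in place of Mooncake; `StubEngine`, its `register` method, `get_engine`, and both consumer classes are assumptions made for the example, not the sglang API.

```python
from typing import List, Tuple


class StubEngine:
    """Stand-in engine that records registered memory regions (not the real API)."""

    def __init__(self) -> None:
        self.regions: List[Tuple[int, int]] = []

    def register(self, ptr: int, length: int) -> None:
        self.regions.append((ptr, length))


_SHARED = StubEngine()


def get_engine() -> StubEngine:
    """Hypothetical getter playing the role of get_mooncake_transfer_engine."""
    return _SHARED


class KVCacheSender:
    """Hypothetical PD-style consumer: borrows the shared engine, never builds one."""

    def __init__(self) -> None:
        self.engine = get_engine()

    def expose(self, ptr: int, length: int) -> None:
        self.engine.register(ptr, length)


class HostCacheWriter:
    """Hypothetical HiCache-style consumer using the same shared engine."""

    def __init__(self) -> None:
        self.engine = get_engine()

    def expose(self, ptr: int, length: int) -> None:
        self.engine.register(ptr, length)
```

Both consumers register memory against one engine object, which is the reuse the PR is aiming for.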

ShangmingCai · 2026-01-31T06:36:51Z

/rerun-failed-ci

ShangmingCai · 2026-02-08T10:19:18Z

The only failed test is unrelated to this PR.

ShangmingCai · 2026-02-08T10:20:34Z

@yizhang2077 Do you have time to help me review this PR?

Refactoring Mooncake TE as a shared distributed component

5089cb8

Signed-off-by: Shangming Cai <csmthu@gmail.com>

ShangmingCai requested review from ByronHsu, Fridge003, Ying1123, ch-wan, hanming-lu, hnyls2002, ispobock, merrymercy, xiezhq-hermann and yizhang2077 as code owners January 27, 2026 08:50

upd

ec230dc

Signed-off-by: Shangming Cai <csmthu@gmail.com>

gemini-code-assist bot reviewed Jan 27, 2026

python/sglang/srt/distributed/device_communicators/mooncake_transfer_engine.py (resolved)

python/sglang/srt/mem_cache/storage/mooncake_store/mooncake_store.py (outdated, resolved)

github-actions bot added the run-ci label Jan 27, 2026

fix ascend

122e1a5

Signed-off-by: Shangming Cai <csmthu@gmail.com>

ShangmingCai requested review from iforgetmyname and ping1jing2 as code owners January 27, 2026 09:31

ShangmingCai added 5 commits January 27, 2026 17:56

move mc init before torch

3ca707a

Signed-off-by: Shangming Cai <csmthu@gmail.com>

fix lint

73741af

Signed-off-by: Shangming Cai <csmthu@gmail.com>

fix

99c6c4a

Signed-off-by: Shangming Cai <csmthu@gmail.com>

upd

2412970

Signed-off-by: Shangming Cai <csmthu@gmail.com>

diable hicache reuse temporary

37a4dd9

Signed-off-by: Shangming Cai <csmthu@gmail.com>

revert mooncake store resue TE

f268304

ShangmingCai added the high priority label Jan 31, 2026

ShangmingCai mentioned this pull request Feb 5, 2026

chore: bump mooncake version to 0.3.9 #18316

Merged

5 tasks

Merge branch 'main' into abstract_mc_te

de6667c

ping1jing2 approved these changes Feb 7, 2026

ShangmingCai assigned yizhang2077 Feb 8, 2026

yizhang2077 approved these changes Feb 9, 2026

ShangmingCai merged commit bffd765 into main Feb 9, 2026
349 of 364 checks passed

ShangmingCai deleted the abstract_mc_te branch February 9, 2026 02:53

ShangmingCai mentioned this pull request Feb 9, 2026

Reuse initialized transfer engine in mooncake store #18460

Merged

5 tasks

Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026

Refactoring Mooncake TE as a shared distributed component (sgl-projec…

e1201a2

…t#17810) Signed-off-by: Shangming Cai <csmthu@gmail.com>

ShangmingCai merged 10 commits into main from abstract_mc_te

3 participants
