Refactoring Mooncake TE as a shared distributed component#17810

Merged
ShangmingCai merged 10 commits into main from abstract_mc_te on Feb 9, 2026

Conversation

@ShangmingCai (Collaborator) commented on Jan 27, 2026

Motivation and Modifications

By migrating the Mooncake transfer engine from the disaggregation module to the distributed module, advanced features such as PD, HiCache, and EPD can reuse the same transfer engine instance instead of each module initializing and managing its own, which would otherwise waste resources.
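
The shared-instance design described above can be sketched as a small module-level singleton. This is an illustrative sketch, not sglang's actual implementation: the function names mirror this PR's init_mooncake_transfer_engine / get_mooncake_transfer_engine, but the MooncakeTransferEngine stub and its constructor arguments are assumptions.

```python
from typing import Optional


class MooncakeTransferEngine:
    """Stand-in for the real engine; the real class wraps Mooncake's RDMA
    transfer engine and lives under sglang.srt.distributed."""

    def __init__(self, hostname: str, gpu_id: int, ib_device: Optional[str] = None):
        self.hostname = hostname
        self.gpu_id = gpu_id
        self.ib_device = ib_device


_TRANSFER_ENGINE: Optional[MooncakeTransferEngine] = None


def init_mooncake_transfer_engine(
    hostname: str, gpu_id: int, ib_device: Optional[str] = None
) -> None:
    """Create the process-wide engine exactly once; later calls are no-ops."""
    global _TRANSFER_ENGINE
    if _TRANSFER_ENGINE is None:
        _TRANSFER_ENGINE = MooncakeTransferEngine(hostname, gpu_id, ib_device)


def get_mooncake_transfer_engine() -> MooncakeTransferEngine:
    """Return the shared instance; consumers never construct their own."""
    assert _TRANSFER_ENGINE is not None, "call init_mooncake_transfer_engine first"
    return _TRANSFER_ENGINE


init_mooncake_transfer_engine("node0", 0)
pd_engine = get_mooncake_transfer_engine()       # e.g. PD disaggregation
hicache_engine = get_mooncake_transfer_engine()  # e.g. HiCache
print(pd_engine is hicache_engine)  # True: one instance, shared by all modules
```

With this shape, every feature pays the engine's initialization cost at most once per process, rather than once per consuming module.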

TODO

  • Refactor the ib_device setup
  • Make mooncake store reuse the Mooncake TE
  • Make mooncake_ep reuse the Mooncake TE @UNIDY2002

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

Signed-off-by: Shangming Cai <csmthu@gmail.com>
@gemini-code-assist (Contributor)

Summary of Changes

Hello @ShangmingCai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the Mooncake Transfer Engine, transitioning it from an isolated component within the disaggregation module to a shared, globally accessible entity within the distributed module. This change is designed to allow various advanced features, such as PD, HiCache, and EPD, to leverage a single, consistent instance of the transfer engine, thereby enhancing resource efficiency and simplifying its integration across the system. The modifications involve centralizing the engine's initialization and access mechanisms, ensuring that all dependent modules retrieve the same instance rather than creating their own.

Highlights

  • Shared Mooncake Transfer Engine: The Mooncake Transfer Engine has been refactored to operate as a shared, distributed component, moving from a module-specific implementation to a centralized one.
  • Centralized Initialization and Access: New functions, init_mooncake_transfer_engine and get_mooncake_transfer_engine, have been introduced to manage a single, global instance of the engine, ensuring consistent access across the system.
  • Resource Optimization: By reusing a single engine instance, the system avoids redundant initializations, leading to improved resource utilization for features like PD, HiCache, and EPD.
  • Codebase Structure: The MooncakeTransferEngine class has been relocated to python/sglang/srt/distributed/device_communicators/mooncake_transfer_engine.py to better reflect its role as a shared distributed component.


@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors the Mooncake Transfer Engine into a shared distributed component, allowing various advanced features like PD, HiCache, and EPD to reuse a single instance. This change effectively centralizes the management and initialization of the transfer engine, which should lead to better resource utilization and a more consistent approach across modules. The introduction of init_mooncake_transfer_engine and get_mooncake_transfer_engine functions in parallel_state.py and mooncake_transfer_engine.py is a good design choice for managing this shared resource. The changes are well-aligned with the stated motivation.

@ShangmingCai (Collaborator, Author)

/tag-and-rerun-ci

@ShangmingCai (Collaborator, Author)

CC: @mickqian
Future use cases can adapt init_shared_mooncake_transfer_engine and use from sglang.srt.distributed.parallel_state import get_mooncake_transfer_engine to utilize Mooncake to register and transfer data.

@ShangmingCai (Collaborator, Author)

/rerun-failed-ci

@ShangmingCai (Collaborator, Author)

The only failed test is not related to this change (screenshot of the failing check omitted).

@ShangmingCai (Collaborator, Author)

@yizhang2077 Do you have time to help me review this PR?

ShangmingCai merged commit bffd765 into main on Feb 9, 2026
349 of 364 checks passed
ShangmingCai deleted the abstract_mc_te branch on February 9, 2026 at 02:53
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026
1StepForever pushed a commit to 1StepForever/sglang that referenced this pull request Feb 26, 2026