Skip to content

Hint users when wrongly execute it with partial ranks in dumper#19014

Merged
fzyzcjy merged 12 commits intosgl-project:mainfrom
fzyzcjy:ac8398/1
Feb 20, 2026
Merged

Hint users when wrongly execute it with partial ranks in dumper#19014
fzyzcjy merged 12 commits intosgl-project:mainfrom
fzyzcjy:ac8398/1

Conversation

@fzyzcjy
Copy link
Collaborator

@fzyzcjy fzyzcjy commented Feb 19, 2026

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

When SGLANG_DUMPER_CLEANUP=1, all existing sglang_dump_* directories
under the base dir are removed before the first dump write to disk.
Rank 0 performs the cleanup with a distributed barrier for sync.
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @fzyzcjy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the dumper utility by introducing an optional automatic cleanup mechanism for previously generated dump files. When enabled via a new environment variable, the dumper will now remove old sglang_dump_* directories before writing new data, helping to manage disk space and keep the debug output tidy.

Highlights

  • Automatic Dump Cleanup: Introduced an automatic cleanup feature for old dumper output directories, which can be enabled via a new environment variable.
  • New Environment Variable: Added SGLANG_DUMPER_CLEANUP_PREVIOUS environment variable to control the automatic cleanup behavior.
  • Cleanup Utility Function: Implemented a _cleanup_old_dumps utility function to safely remove previously generated sglang_dump_* directories.
  • Unit Tests: Included new unit tests to thoroughly verify the functionality and correct behavior of the automatic dump cleanup feature.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • python/sglang/srt/debug_utils/dumper.py
    • Added documentation for the new SGLANG_DUMPER_CLEANUP_PREVIOUS environment variable.
    • Introduced cleanup_previous parameter to the _Dumper constructor and initialized a corresponding internal state.
    • Integrated the cleanup_previous flag into the from_env class method to read from the environment variable.
    • Implemented a conditional cleanup call within _dump_single to remove old dumps before new data is written.
    • Defined a new private utility function _cleanup_old_dumps responsible for deleting sglang_dump_* directories, ensuring it runs only on rank 0 in distributed settings.
  • test/registered/debug_utils/test_dumper.py
    • Added a new test class TestCleanup to validate the automatic dump cleanup functionality.
    • Included a test case test_cleanup_removes_old_dumps to confirm that old dump directories are deleted when cleanup is enabled.
    • Added test_no_cleanup_by_default to ensure that cleanup does not occur when the feature is not explicitly enabled.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new feature to automatically clean up old dump directories from the dumper utility, controlled by the SGLANG_DUMPER_CLEANUP_PREVIOUS environment variable. The implementation correctly handles distributed environments by performing the cleanup on rank 0 and synchronizing with other ranks. New tests are added to verify this functionality.

My review includes a correction for a typo in a test case parameter and a suggestion to improve code style by moving an import statement to the top of the file, following PEP 8 guidelines.

old_dir.mkdir()
(old_dir / "dummy.pt").touch()

dumper = _make_test_dumper(tmp_path, needs_cleanup=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There's a typo in the parameter name. The _Dumper constructor expects cleanup_previous, but needs_cleanup is used here. This will prevent the test from correctly enabling the cleanup feature.

Suggested change
dumper = _make_test_dumper(tmp_path, needs_cleanup=True)
dumper = _make_test_dumper(tmp_path, cleanup_previous=True)



def _cleanup_old_dumps(base_dir: Path) -> None:
import shutil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to PEP 8, imports should be at the top of the file. Please remove this line and add import shutil to the file's top-level imports. This improves code readability and consistency.

Collective operations (broadcast_object_list, all_gather_object) hang
silently when not all ranks participate. Add a configurable timeout
(default 60s) that prints a warning if a collective op doesn't complete,
helping users diagnose missing rank participation.
Add unit test (TestCollectiveTimeout) verifying the watchdog fires and
prints a warning when a collective op exceeds the timeout, and a
distributed test in TestDumperDistributed exercising the real
on_forward_pass_start broadcast with staggered rank joins.
@fzyzcjy fzyzcjy merged commit fc1500a into sgl-project:main Feb 20, 2026
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant