Add concurrency tracking to runner utilization report#17963

Merged
Kangyan-Zhou merged 1 commit into sgl-project:main from Kangyan-Zhou:add-concurrency-tracking
Jan 30, 2026
Conversation

@Kangyan-Zhou
Collaborator

  • Add calculate_concurrency_metrics() using a sweep line algorithm to track:

    • Peak concurrent runners in use
    • Average concurrent runners over time
    • Saturation time (when all runners busy)
    • Peak queue depth (jobs waiting)
  • Use parallel API fetching with ThreadPoolExecutor for faster data collection

  • Add effective runner capacity based on observed peak (handles offline runners)

  • Add Concurrency Analysis section and Recommendations to report output
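The sweep line idea behind these metrics can be sketched as follows. This is a simplified illustration, not the PR's exact implementation; it assumes job dicts carrying `start`/`end` datetimes and computes only the peak and time-weighted average concurrency:

```python
from datetime import datetime, timedelta

def peak_and_avg_concurrency(jobs, window_start, window_end):
    """Sweep line: emit a +1 event at each job start and a -1 at each end,
    then walk the events in time order."""
    events = []
    for job in jobs:
        start = max(job["start"], window_start)
        end = min(job["end"], window_end)
        if start >= end:
            continue  # job falls entirely outside the window
        events.append((start, 1))
        events.append((end, -1))
    # Sort by time; at equal timestamps, process ends (-1) before starts (+1)
    events.sort(key=lambda e: (e[0], e[1]))
    peak = current = 0
    area = 0.0  # concurrency integrated over time (runner-seconds)
    prev_time = window_start
    for t, delta in events:
        area += current * (t - prev_time).total_seconds()
        current += delta
        peak = max(peak, current)
        prev_time = t
    window_seconds = (window_end - window_start).total_seconds()
    avg = area / window_seconds if window_seconds > 0 else 0.0
    return peak, avg
```

Saturation time falls out of the same walk by accumulating the intervals during which `current` equals the runner capacity.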



Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @Kangyan-Zhou, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the runner utilization report by integrating advanced concurrency tracking and optimization features. It provides a deeper understanding of how runners are being utilized, identifies bottlenecks through queue depth and saturation analysis, and offers practical recommendations for scaling runner pools. The changes also improve the report generation speed by parallelizing API calls, making the tool more efficient and insightful for CI/CD pipeline management.

Highlights

  • Concurrency Metrics: Introduced a new calculate_concurrency_metrics() function utilizing a sweep line algorithm to track key performance indicators such as peak concurrent runners, average concurrent runners over time, saturation time (when all runners are busy), and peak queue depth.
  • Parallel API Fetching: Implemented parallel API fetching for job data using ThreadPoolExecutor, significantly improving the speed of data collection for the runner utilization report.
  • Effective Runner Capacity: Added logic to determine 'effective runner capacity' based on observed peak concurrency, which provides a more realistic view of available resources by accounting for offline or underutilized runners.
  • Enhanced Reporting: The report output now includes a dedicated 'Concurrency Analysis' section and a 'Recommendations' section, offering actionable insights based on saturation levels and queue buildup.
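The parallel-fetching pattern the highlights describe can be sketched with the standard library. This is a generic illustration, not the PR's code: `fetch_page` and page-number pagination are hypothetical stand-ins for whatever API calls the report makes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all_pages(fetch_page, page_count, max_workers=8):
    """Fetch API pages concurrently. `fetch_page` is a hypothetical
    callable taking a page number and returning a list of jobs."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_page, p): p for p in range(page_count)}
        for future in as_completed(futures):
            page = futures[future]
            results[page] = future.result()  # re-raises any fetch error here
    # Flatten in page order so output is deterministic despite completion order
    return [job for p in sorted(results) for job in results[p]]
```

Since the work is I/O-bound HTTP, threads give near-linear speedup up to the API's rate limit; `as_completed` lets slow pages overlap instead of serializing.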




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly enhances the runner utilization report by introducing concurrency analysis, which is a great addition. The use of a sweep-line algorithm is well-suited for these metrics, and parallelizing the job fetching will substantially improve the script's performance. The introduction of an "effective runner capacity" is a clever way to achieve more accurate saturation metrics. My review includes a few suggestions to address a logic bug, enhance code clarity, and refactor for improved maintainability.

Comment on lines +376 to +384
        effective_runners = min(num_runners, concurrency_initial["peak_concurrent"])
        if effective_runners < num_runners and effective_runners > 0:
            # Recalculate with effective capacity for accurate saturation
            concurrency = calculate_concurrency_metrics(
                jobs, window_start, window_end, effective_runners
            )
        else:
            concurrency = concurrency_initial
            effective_runners = num_runners

Severity: high

There's a logic bug in the calculation of effective_runners. In the else block, effective_runners is incorrectly reset to num_runners. This happens when peak_concurrent is 0, causing effective_runners to be reported as num_runners instead of 0. The logic can be simplified to fix this bug and correctly handle all cases.

        effective_runners = min(num_runners, concurrency_initial["peak_concurrent"])
        if 0 < effective_runners < num_runners:
            # Recalculate with effective capacity for accurate saturation
            concurrency = calculate_concurrency_metrics(
                jobs, window_start, window_end, effective_runners
            )
        else:
            concurrency = concurrency_initial

Comment on lines +142 to +159
    if not jobs:
        return {
            "peak_concurrent": 0,
            "avg_concurrent": 0.0,
            "saturation_seconds": 0,
            "saturation_pct": 0.0,
            "peak_queue": 0,
        }

    window_seconds = (window_end - window_start).total_seconds()
    if window_seconds <= 0:
        return {
            "peak_concurrent": 0,
            "avg_concurrent": 0.0,
            "saturation_seconds": 0,
            "saturation_pct": 0.0,
            "peak_queue": 0,
        }

Severity: medium

To improve maintainability and reduce code duplication, you can define the dictionary for empty results as a constant at the beginning of the function and reuse it in the early return paths.

    EMPTY_RESULT = {
        "peak_concurrent": 0,
        "avg_concurrent": 0.0,
        "saturation_seconds": 0,
        "saturation_pct": 0.0,
        "peak_queue": 0,
    }
    if not jobs:
        return EMPTY_RESULT

    window_seconds = (window_end - window_start).total_seconds()
    if window_seconds <= 0:
        return EMPTY_RESULT

Comment on lines +161 to +186
    # Create events for running jobs: +1 at start, -1 at end
    running_events = []
    for job in jobs:
        start = job["start"]
        end = job["end"]
        # Clamp to window
        if end < window_start or start > window_end:
            continue
        clamped_start = max(start, window_start)
        clamped_end = min(end, window_end)
        running_events.append((clamped_start, 1, "start"))  # +1 for start
        running_events.append((clamped_end, -1, "end"))  # -1 for end

    # Create events for queue tracking (jobs created but not started)
    queue_events = []
    for job in jobs:
        created_at = job.get("created_at")
        started_at = job["start"]
        if created_at and created_at < started_at:
            # Clamp to window
            if started_at < window_start or created_at > window_end:
                continue
            clamped_created = max(created_at, window_start)
            clamped_started = min(started_at, window_end)
            queue_events.append((clamped_created, 1, "queued"))
            queue_events.append((clamped_started, -1, "dequeued"))

Severity: medium

For better performance and readability, the two separate loops over jobs to create running_events and queue_events can be combined into a single loop.

    # Create events for running and queued jobs
    running_events = []
    queue_events = []
    for job in jobs:
        start = job["start"]
        end = job["end"]

        # Running events
        if not (end < window_start or start > window_end):
            clamped_start = max(start, window_start)
            clamped_end = min(end, window_end)
            running_events.append((clamped_start, 1, "start"))
            running_events.append((clamped_end, -1, "end"))

        # Queue events
        created_at = job.get("created_at")
        if created_at and created_at < start:
            if not (start < window_start or created_at > window_end):
                clamped_created = max(created_at, window_start)
                clamped_started = min(start, window_end)
                queue_events.append((clamped_created, 1, "queued"))
                queue_events.append((clamped_started, -1, "dequeued"))
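For context, queue events of this shape are typically folded into a peak depth with a second small sweep. This is a sketch under the same event convention, not the PR's actual code; at equal timestamps the -1 (dequeue) sorts before the +1 (enqueue), so a job handed off at the same instant another arrives does not inflate the peak:

```python
def peak_queue_depth(queue_events):
    """Walk (timestamp, delta, label) queue events in time order and
    track the maximum number of jobs simultaneously waiting."""
    events = sorted(queue_events, key=lambda e: (e[0], e[1]))
    peak = current = 0
    for _, delta, _ in events:
        current += delta
        peak = max(peak, current)
    return peak
```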

Comment on lines +471 to +472
if not has_recommendations and results:
lines.append("All runner pools have healthy utilization.")

Severity: medium

The summary message "All runner pools have healthy utilization." is redundant because each healthy runner pool already gets a "✓ Healthy utilization..." message. Removing this summary line will make the recommendations section cleaner when all pools are healthy.

@Kangyan-Zhou Kangyan-Zhou merged commit 2cd2c31 into sgl-project:main Jan 30, 2026
55 of 59 checks passed
charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Jan 30, 2026
Chen-0210 pushed a commit to Chen-0210/sglang that referenced this pull request Jan 30, 2026
sfiisf pushed a commit to sfiisf/sglang that referenced this pull request Feb 5, 2026
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026