Add concurrency tracking to runner utilization report #17963

Kangyan-Zhou merged 1 commit into sgl-project:main
Conversation
Add calculate_concurrency_metrics() using sweep line algorithm to track:
- Peak concurrent runners in use
- Average concurrent runners over time
- Saturation time (when all runners busy)
- Peak queue depth (jobs waiting)

Use parallel API fetching with ThreadPoolExecutor for faster data collection
Add effective runner capacity based on observed peak (handles offline runners)
Add Concurrency Analysis section and Recommendations to report output

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
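The parallel fetching mentioned above can be sketched with `concurrent.futures.ThreadPoolExecutor`. The `fetch_page` function and page list here are hypothetical stand-ins, not the script's actual GitHub API client:

```python
# Sketch of the parallel-fetch approach described in the commit message.
# fetch_page is a placeholder for a real API call returning job dicts.
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_page(page):
    # Placeholder: a real implementation would call the GitHub API here.
    return [{"id": page * 100 + i} for i in range(3)]

def fetch_all(pages, max_workers=8):
    jobs = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_page, p) for p in pages]
        for fut in as_completed(futures):
            jobs.extend(fut.result())
    return jobs

all_jobs = fetch_all(range(4))
print(len(all_jobs))  # 12
```

Because the pages are independent read-only requests, thread-based parallelism is a natural fit despite the GIL: the threads spend their time waiting on network I/O.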
Summary of Changes

Hello @Kangyan-Zhou, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the runner utilization report by integrating concurrency tracking and optimization features. It provides a deeper understanding of how runners are being utilized, identifies bottlenecks through queue depth and saturation analysis, and offers practical recommendations for scaling runner pools. The changes also improve report generation speed by parallelizing API calls, making the tool more efficient for CI/CD pipeline management.
Code Review
This pull request significantly enhances the runner utilization report by introducing concurrency analysis, which is a great addition. The use of a sweep-line algorithm is well-suited for these metrics, and parallelizing the job fetching will substantially improve the script's performance. The introduction of an "effective runner capacity" is a clever way to achieve more accurate saturation metrics. My review includes a few suggestions to address a logic bug, enhance code clarity, and refactor for improved maintainability.
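The sweep-line approach referenced above can be illustrated in isolation. This is a simplified sketch, not the PR's actual `calculate_concurrency_metrics()`; the function name and tuple shapes are assumptions for illustration:

```python
# Minimal sweep line over job intervals: +1 at each start, -1 at each end.
# Sorting ends (-1) before starts (+1) at equal timestamps avoids counting
# back-to-back jobs as overlapping.
def peak_and_avg_concurrency(jobs, window_start, window_end):
    events = []
    for start, end in jobs:
        events.append((max(start, window_start), 1))
        events.append((min(end, window_end), -1))
    events.sort(key=lambda e: (e[0], e[1]))
    peak = current = 0
    area = 0.0  # integral of concurrency over time
    prev_t = window_start
    for t, delta in events:
        area += current * (t - prev_t)
        prev_t = t
        current += delta
        peak = max(peak, current)
    window = window_end - window_start
    return peak, (area / window if window > 0 else 0.0)

# Two overlapping jobs and one disjoint job in a 10-second window:
print(peak_and_avg_concurrency([(0, 4), (2, 6), (7, 9)], 0, 10))  # (2, 1.0)
```

The same event stream yields saturation time by accumulating the intervals during which `current` equals the runner capacity, which is how the PR's description frames its saturation metric.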
effective_runners = min(num_runners, concurrency_initial["peak_concurrent"])
if effective_runners < num_runners and effective_runners > 0:
    # Recalculate with effective capacity for accurate saturation
    concurrency = calculate_concurrency_metrics(
        jobs, window_start, window_end, effective_runners
    )
else:
    concurrency = concurrency_initial
    effective_runners = num_runners
There's a logic bug in the calculation of effective_runners. In the else block, effective_runners is incorrectly reset to num_runners. This happens when peak_concurrent is 0, causing effective_runners to be reported as num_runners instead of 0. The logic can be simplified to fix this bug and correctly handle all cases.
effective_runners = min(num_runners, concurrency_initial["peak_concurrent"])
if 0 < effective_runners < num_runners:
    # Recalculate with effective capacity for accurate saturation
    concurrency = calculate_concurrency_metrics(
        jobs, window_start, window_end, effective_runners
    )
else:
    concurrency = concurrency_initial

if not jobs:
    return {
        "peak_concurrent": 0,
        "avg_concurrent": 0.0,
        "saturation_seconds": 0,
        "saturation_pct": 0.0,
        "peak_queue": 0,
    }

window_seconds = (window_end - window_start).total_seconds()
if window_seconds <= 0:
    return {
        "peak_concurrent": 0,
        "avg_concurrent": 0.0,
        "saturation_seconds": 0,
        "saturation_pct": 0.0,
        "peak_queue": 0,
    }
To improve maintainability and reduce code duplication, you can define the dictionary for empty results as a constant at the beginning of the function and reuse it in the early return paths.
EMPTY_RESULT = {
    "peak_concurrent": 0,
    "avg_concurrent": 0.0,
    "saturation_seconds": 0,
    "saturation_pct": 0.0,
    "peak_queue": 0,
}
if not jobs:
    return EMPTY_RESULT
window_seconds = (window_end - window_start).total_seconds()
if window_seconds <= 0:
    return EMPTY_RESULT

# Create events for running jobs: +1 at start, -1 at end
running_events = []
for job in jobs:
    start = job["start"]
    end = job["end"]
    # Clamp to window
    if end < window_start or start > window_end:
        continue
    clamped_start = max(start, window_start)
    clamped_end = min(end, window_end)
    running_events.append((clamped_start, 1, "start"))  # +1 for start
    running_events.append((clamped_end, -1, "end"))  # -1 for end

# Create events for queue tracking (jobs created but not started)
queue_events = []
for job in jobs:
    created_at = job.get("created_at")
    started_at = job["start"]
    if created_at and created_at < started_at:
        # Clamp to window
        if started_at < window_start or created_at > window_end:
            continue
        clamped_created = max(created_at, window_start)
        clamped_started = min(started_at, window_end)
        queue_events.append((clamped_created, 1, "queued"))
        queue_events.append((clamped_started, -1, "dequeued"))
For better performance and readability, the two separate loops over jobs to create running_events and queue_events can be combined into a single loop.
# Create events for running and queued jobs
running_events = []
queue_events = []
for job in jobs:
    start = job["start"]
    end = job["end"]
    # Running events
    if not (end < window_start or start > window_end):
        clamped_start = max(start, window_start)
        clamped_end = min(end, window_end)
        running_events.append((clamped_start, 1, "start"))
        running_events.append((clamped_end, -1, "end"))
    # Queue events
    created_at = job.get("created_at")
    if created_at and created_at < start:
        if not (start < window_start or created_at > window_end):
            clamped_created = max(created_at, window_start)
            clamped_started = min(start, window_end)
            queue_events.append((clamped_created, 1, "queued"))
            queue_events.append((clamped_started, -1, "dequeued"))

if not has_recommendations and results:
    lines.append("All runner pools have healthy utilization.")
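The diffs above show how queue events are generated but not the pass that consumes them. A minimal sketch of that final sweep, assuming the `(timestamp, delta, label)` tuples built above (the function name `peak_queue_depth` is hypothetical):

```python
# Sweep sorted queue events to find peak queue depth. At equal timestamps,
# dequeues (-1) are applied before enqueues (+1) so a job handed off at the
# same instant another arrives is not double-counted as waiting.
def peak_queue_depth(queue_events):
    peak = depth = 0
    for _, delta, _ in sorted(queue_events, key=lambda e: (e[0], e[1])):
        depth += delta
        peak = max(peak, depth)
    return peak

events = [
    (1, 1, "queued"), (2, 1, "queued"),    # two jobs waiting by t=2
    (3, -1, "dequeued"), (4, 1, "queued"),
    (5, -1, "dequeued"), (6, -1, "dequeued"),
]
print(peak_queue_depth(events))  # 2
```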
Add calculate_concurrency_metrics() using sweep line algorithm to track peak concurrent runners, average concurrency, saturation time, and peak queue depth
Use parallel API fetching with ThreadPoolExecutor for faster data collection
Add effective runner capacity based on observed peak (handles offline runners)
Add Concurrency Analysis section and Recommendations to report output
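The "effective runner capacity" idea above, including the zero-peak edge case raised in review, can be illustrated in isolation (the helper name `effective_capacity` is hypothetical, not a function from the PR):

```python
# Clamp the saturation baseline to the observed peak: if some configured
# runners were never seen busy (e.g. offline), measuring saturation against
# the full configured count would understate how saturated the pool was.
# A peak of 0 must yield 0, not fall back to the configured count.
def effective_capacity(num_runners, peak_concurrent):
    return min(num_runners, peak_concurrent)

print(effective_capacity(10, 7))  # 7  (three runners never seen busy)
print(effective_capacity(10, 0))  # 0  (no jobs ran in the window)
print(effective_capacity(4, 9))   # 4  (capped by configured count)
```

In the script itself, a value strictly between 0 and the configured count triggers a recalculation of the concurrency metrics against the clamped capacity.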
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci