eval skill: auto-detect predefined per-cluster execution configs #1599
cjluo-nv wants to merge 2 commits into
Some NEL installs ship ready-made per-cluster execution configs as an internal/slurm/<cluster> group (optional nemo_evaluator_launcher_internal package). When present, the matching one pre-fills hostname/partition/gres (and node-exclusivity), so the user only sets account/output_dir/walltime.

SKILL.md Step 4 now runs a discovery check FIRST: list the available cluster->hostname pairs from the installed package at runtime and, on a match, use internal/slurm/<cluster> instead of slurm/default; otherwise fall back to slurm/default and fill hostname/account/output_dir manually.

Discovery-based by design: cluster names / hostnames / accounts are read from the install at runtime and never hardcoded here. example_eval.yaml gets a short name-free pointer to the check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
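The discovery check described in this commit message could be sketched in Python roughly as follows. This is a hedged illustration, not the snippet from SKILL.md: the helper names are hypothetical, and the on-disk layout (YAML files under some `internal/slurm/` directory inside the package, each with a top-level `hostname:` key) is an assumption inferred from the config group name. Nothing cluster-specific is hardcoded; everything is read from the install at runtime.

```python
import importlib.util
from pathlib import Path
from typing import Dict, Optional

PACKAGE = "nemo_evaluator_launcher_internal"  # package name as given in the PR text


def find_package_dir(package: str = PACKAGE) -> Optional[Path]:
    """Return the installed package directory, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def list_cluster_hostnames(root: Path) -> Dict[str, str]:
    """Map each <cluster>.yaml stem to its top-level `hostname:` value.

    Assumption: profiles live at <root>/**/internal/slurm/<cluster>.yaml.
    Files without a `hostname:` line are skipped.
    """
    pairs: Dict[str, str] = {}
    for path in sorted(root.glob("**/internal/slurm/*.yaml")):
        for line in path.read_text().splitlines():
            if line.startswith("hostname:"):
                pairs[path.stem] = line.split(":", 1)[1].strip()
                break
    return pairs


if __name__ == "__main__":
    package_dir = find_package_dir()
    pairs = list_cluster_hostnames(package_dir) if package_dir else {}
    for cluster, hostname in pairs.items():
        print(f"{cluster} -> {hostname}")
    if not pairs:
        print("no internal profiles found; keep slurm/default")
```

When the package is missing, `find_package_dir` returns `None` and the result is an empty mapping, which corresponds to the `slurm/default` fallback.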
📝 Walkthrough
Adds documentation and comments that explain how to discover and use optional internal per-cluster SLURM execution configs (internal/slurm/<cluster>) instead of the generic one.
Changes
SLURM Configuration Guidance
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 6 passed
Compress the Step 4 discovery guidance and the example_eval.yaml comment added in the prior commit; no behavior change (discovery snippet unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/skills/evaluation/SKILL.md:
- Line 205: Update the fallback instructions so the explicit fields users must
fill when no internal SLURM profile matches include partition, gres, and
walltime in addition to hostname/account/output_dir; specifically modify the
sentence about replacing `slurm/default` and removing `execution.hostname` (the
`defaults: - execution: internal/slurm/<cluster>` guidance) to state that if no
match is found you should keep `slurm/default` and manually set `hostname`,
`account`, `output_dir`, `partition`, `gres`, and `walltime` so it matches the
earlier claim that internal profiles auto-fill partition/gres/walltime.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: aa91707f-d38e-4d05-8a24-1445c9bc1e1e
📒 Files selected for processing (2)
- .claude/skills/evaluation/SKILL.md
- .claude/skills/evaluation/recipes/examples/example_eval.yaml
    echo "$(basename "$f" .yaml) -> $(grep -E '^hostname:' "$f" | awk '{print $2}')"; done
    ```

If a listed hostname matches the target SLURM cluster, set `defaults: - execution: internal/slurm/<cluster>` (replacing `slurm/default`), and remove any now-redundant `execution.hostname` (keep `account` / `output_dir` / `walltime` overrides). Confirm it resolves with a `--dry-run`. If the package isn't installed or nothing matches, keep `slurm/default` and fill `hostname` / `account` / `output_dir` manually.
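As a rough illustration of the config edit this paragraph describes, the sketch below applies it to a recipe that has already been loaded into a plain dictionary. The dictionary shape (a Hydra-style `defaults` list plus an `execution` mapping) is an assumption based on the keys named above, and `use_internal_profile` is a hypothetical helper, not part of NEL.

```python
import copy


def use_internal_profile(config: dict, cluster: str) -> dict:
    """Return a copy of `config` that selects internal/slurm/<cluster>.

    The input is left untouched so the original recipe can still be used
    as the slurm/default fallback.
    """
    updated = copy.deepcopy(config)
    # Swap the execution entry in the defaults list; leave other entries alone.
    updated["defaults"] = [
        {"execution": f"internal/slurm/{cluster}"}
        if isinstance(entry, dict) and "execution" in entry
        else entry
        for entry in updated.get("defaults", [])
    ]
    # The profile supplies the hostname, so the explicit override is redundant.
    # account / output_dir / walltime overrides are kept as they are.
    updated.get("execution", {}).pop("hostname", None)
    return updated
```

The result would still need to be confirmed with a `--dry-run`, as the paragraph notes; this sketch only rewrites the two keys.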
Align fallback fields with the fields you say are auto-filled
Line 205’s fallback omits partition/gres (and walltime mention), even though Line 196 says the internal profile pre-fills those. Please make the fallback list explicit and consistent so users don’t miss required manual values when no internal profile matches.
Suggested doc edit
-If the package isn't installed or nothing matches, keep `slurm/default` and fill `hostname` / `account` / `output_dir` manually.
+If the package isn't installed or nothing matches, keep `slurm/default` and fill `hostname` / `partition` / `gres` / `account` / `output_dir` / `walltime` manually.
📝 Committable suggestion
-If a listed hostname matches the target SLURM cluster, set `defaults: - execution: internal/slurm/<cluster>` (replacing `slurm/default`), and remove any now-redundant `execution.hostname` (keep `account` / `output_dir` / `walltime` overrides). Confirm it resolves with a `--dry-run`. If the package isn't installed or nothing matches, keep `slurm/default` and fill `hostname` / `account` / `output_dir` manually.
+If a listed hostname matches the target SLURM cluster, set `defaults: - execution: internal/slurm/<cluster>` (replacing `slurm/default`), and remove any now-redundant `execution.hostname` (keep `account` / `output_dir` / `walltime` overrides). Confirm it resolves with a `--dry-run`. If the package isn't installed or nothing matches, keep `slurm/default` and fill `hostname` / `partition` / `gres` / `account` / `output_dir` / `walltime` manually.
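The two branches in the suggested wording can be made concrete with a small sketch. The field lists below are copied from the suggestion text, not from NEL itself, so whether `slurm/default` actually requires each of them is an assumption; `plan_execution` and `missing_fields` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Tuple

# Fields the user still sets when an internal profile matches.
MATCH_FIELDS = ("account", "output_dir", "walltime")
# Fields to fill by hand on the slurm/default fallback, per the suggested edit.
FALLBACK_FIELDS = ("hostname", "partition", "gres", "account", "output_dir", "walltime")


def plan_execution(target_hostname: str, pairs: Dict[str, str]) -> Tuple[str, Tuple[str, ...]]:
    """Pick the execution group and the fields the user must provide.

    `pairs` is the cluster -> hostname mapping produced by the discovery step;
    an empty mapping (package not installed) falls through to slurm/default.
    """
    for cluster, hostname in pairs.items():
        if hostname == target_hostname:
            return f"internal/slurm/{cluster}", MATCH_FIELDS
    return "slurm/default", FALLBACK_FIELDS


def missing_fields(execution: dict, required: Iterable[str]) -> List[str]:
    """List required fields that are absent or empty in the execution section."""
    return [name for name in required if not execution.get(name)]
```

Checking a recipe against the returned field list before a `--dry-run` would surface the omission the review comment is concerned about.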
♻️ Duplicate comments (1)
.claude/skills/evaluation/SKILL.md (1)
205-205: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Fix fallback field list to match the pre-fill contract.
Line 205 still omits partition, gres, and walltime, which conflicts with Line 196’s statement about what internal profiles pre-fill. Please make the fallback explicit and consistent.
Proposed doc fix
-Hostname match → set `defaults: - execution: internal/slurm/<cluster>`, drop the redundant `execution.hostname` (keep account/output_dir/walltime), verify with `--dry-run`. Else keep `slurm/default` and fill hostname/account/output_dir manually.
+Hostname match → set `defaults: - execution: internal/slurm/<cluster>`, drop the redundant `execution.hostname` (keep account/output_dir/walltime), verify with `--dry-run`. Else keep `slurm/default` and fill hostname/partition/gres/account/output_dir/walltime manually.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/evaluation/SKILL.md at line 205, Update the fallback field list in SKILL.md so it matches the pre-fill contract: when hostname matches set defaults to "execution: internal/slurm/<cluster>" and remove the redundant "execution.hostname" entry, but explicitly include and pre-fill the fields account, output_dir, walltime, partition, and gres; note the user should verify with "--dry-run". Otherwise keep "slurm/default" and instruct filling hostname/account/output_dir/partition/gres/walltime manually. Ensure the text around the existing "Hostname match → set `defaults: - execution: internal/slurm/<cluster>`" line is changed to list the full fallback fields (account, output_dir, walltime, partition, gres) to be consistent with the statement at line 196.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e94bef91-0dc6-4ebb-972c-6ead79589133
📒 Files selected for processing (2)
- .claude/skills/evaluation/SKILL.md
- .claude/skills/evaluation/recipes/examples/example_eval.yaml
✅ Files skipped from review due to trivial changes (1)
- .claude/skills/evaluation/recipes/examples/example_eval.yaml
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1599   +/-   ##
=======================================
  Coverage   77.38%   77.38%
=======================================
  Files         479      479
  Lines       52435    52435
=======================================
  Hits        40578    40578
  Misses      11857    11857
meenchen left a comment
Bot review — DM the bot to share feedback.
Docs-only change (+13/-0) adding a "check FIRST" discovery snippet for optional nemo_evaluator_launcher_internal per-cluster execution configs, plus a pointer comment in example_eval.yaml. No cluster-specific data is hardcoded — discovery is at runtime and falls back gracefully to slurm/default when the package isn't installed. No correctness, licensing, or test concerns.
What does this PR do?
Type of change: documentation
Some NEL installs ship ready-made per-cluster execution configs as an internal/slurm/<cluster> group (via the optional nemo_evaluator_launcher_internal package). When present, the matching one pre-fills the cluster's hostname/partition/gres (and node-exclusivity), so the user only sets account/output_dir/walltime instead of hand-entering the hostname.

- SKILL.md Step 4: adds a "check FIRST" step: run a discovery snippet that lists the available cluster → hostname pairs from the installed package at runtime; on a hostname match, use defaults: - execution: internal/slurm/<cluster> (replacing slurm/default) and drop the now-redundant execution.hostname. If the package isn't installed or nothing matches, fall back to slurm/default and fill the fields manually.
- example_eval.yaml: a short, name-free comment on the defaults block pointing to that check.

Discovery-based by design (no internal data committed): cluster names, hostnames, and accounts are read from the install at runtime and are not hardcoded here. The only internal references in the repo are the package name nemo_evaluator_launcher_internal and the generic internal/slurm/<cluster> group pattern. External users without the package degrade gracefully (import fails → slurm/default).

Usage
N/A — documentation / skill guidance only.
Testing
Ran the discovery snippet verbatim (lists cluster → hostname from the installed package) and confirmed internal/slurm/<cluster> resolves via a --dry-run. pre-commit run passes (markdownlint + YAML format). Grep-verified no cluster names/hostnames are hardcoded in the changed files.
Before your PR is "Ready for review"
slurm/default)
CONTRIBUTING.md: N/A
🤖 Generated with Claude Code