[gh-aw] Refactor nightly-fix-finder into per-category scripts#11451
[gh-aw] Refactor nightly-fix-finder into per-category scripts#11451jonathanpeppers wants to merge 6 commits into
Conversation
Adopts three improvements from mono/SkiaSharp#3975: 1. **Scripts-folder architecture**: each category lives in its own .sh file under`.github/workflows/nightly-fix-finder/`. Adding a category = drop a new file; `shuf -n 1` picks one per run. No more two-places-to-edit between the YAML case statement and the prompt's category table. 2. **Self-documenting scripts**: each script prints its own `GUIDANCE` heredoc (what to look for / how to fix / what NOT to flag) right next to its `grep`. The agent reads one combined output instead of cross-referencing a separate prompt table. 3. **Confidence-gate scoring**: Phase 3 adds a 30-point rubric (actionability + safety + scope). Below 22/30 or safety < 6 ⇒ `noop`. Stops the worst failure mode — noisy or risky issues getting auto-assigned to Copilot. All 9 existing categories ported verbatim. Bash syntax-checked, smoke-tested locally. lock.yml regenerated via `gh aw compile`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Refactors the nightly fix-finder agentic workflow from an inlined case statement into a per-category script architecture, moving category guidance into the scripts themselves and adding a confidence-gate rubric to reduce low-quality/unsafe auto-filed issues.
Changes:
- Introduced
.github/workflows/nightly-fix-finder/*.shper-category scripts that emit guidance + scan data. - Updated
nightly-fix-finder.mdto select and run a script (randomly or viaworkflow_dispatchchoice) and added a 30-point confidence rubric gate before filing issues. - Added
.gitattributesenforcement for LF endings on*.shand regeneratednightly-fix-finder.lock.yml.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/nightly-fix-finder.md | Picks a script (random or chosen), runs it to generate scan results, and updates the agent prompt with rubric gating + script-based guidance. |
| .github/workflows/nightly-fix-finder.lock.yml | Regenerated compiled workflow reflecting the new script selection + tool allowlist changes. |
| .github/workflows/nightly-fix-finder/00-todo-fixme-hack.sh | Category script: emits guidance + TODO/FIXME/HACK scan sample and counts. |
| .github/workflows/nightly-fix-finder/01-nullable-reference-types.sh | Category script: emits nullable opt-in guidance + sample candidates and count. |
| .github/workflows/nightly-fix-finder/02-obsolete-api-usage.sh | Category script: emits obsolete-API guidance + sample matches. |
| .github/workflows/nightly-fix-finder/03-performance-antipatterns.sh | Category script: emits perf guidance + samples for concat-in-loops / sync-over-async / LINQ allocs / formatting. |
| .github/workflows/nightly-fix-finder/04-missing-xml-docs.sh | Category script: emits Mono.Android XML docs guidance + sample public declarations. |
| .github/workflows/nightly-fix-finder/05-general-mistakes.sh | Category script: emits “read for real bugs” guidance + random file sample list. |
| .github/workflows/nightly-fix-finder/06-unused-using-directives.sh | Category script: emits guidance + reports files with high using counts. |
| .github/workflows/nightly-fix-finder/07-error-handling.sh | Category script: emits guidance + sample catch (Exception) matches. |
| .github/workflows/nightly-fix-finder/08-string-literal-error-messages.sh | Category script: emits resources/localization guidance + sample hardcoded log-string matches. |
| .gitattributes | Forces LF for *.sh to keep scripts consistent across platforms. |
Three fixes from the Copilot review bot: 1. Guard against an empty scripts directory — emit a clear error before basename gets an empty arg. 2. Run scripts with `bash -o pipefail` so the `| shuf | head -N || echo 'None found'` fallback fires when grep matches nothing. Without pipefail, head exits 0 even when its upstream returned no rows, swallowing the fallback. 3. Drop `bash:*` from the tools allowlist — the scan step runs before the agent starts, so the agent never needs to invoke bash itself. Stay least-privilege. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/review |
|
✅ Android PR Reviewer completed successfully! |
There was a problem hiding this comment.
🤖 Review Summary
Verdict: ✅ LGTM (with one minor docs fix)
Nice refactor — the per-script architecture is a clear improvement over the monolithic case statement. The self-documenting GUIDANCE heredocs, the confidence-gate rubric, and the improved exclusion filters (now properly filtering *.generated.cs/*.Designer.cs across all categories) are all solid wins.
Issues found
| Severity | Count | Category |
|---|---|---|
| 1 | Documentation — "Adding a New Category" section omits the dropdown sync step |
Positive callouts
- 🎯 The confidence rubric (Phase 3) with hard safety floor is a smart addition — prevents risky auto-filed issues
- 🧹 Scripts now consistently exclude generated files, which is an improvement over the old inline code
- ✅ Error handling in the script runner (file-not-found check, empty-dir check) is thorough
- 📋
bash -o pipefailon the script invocation is a good safety net
Generated by Android PR Reviewer for issue #11451 · ● 44.4M
…ory' Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rowIfNull regressions PR #11455 emitted `ArgumentNullException.ThrowIfNull` into `Xamarin.Android.Build.Tasks.csproj` (which targets `netstandard2.0` where that API doesn't exist), breaking the build. Two changes to make sure that class of bug can't slip through again: 1. **01-nullable-reference-types.sh** now walks up from each candidate `.cs` file to find its owning `*.csproj` and prints the project's `<TargetFramework[s]>` value next to it. The agent literally cannot miss the TFM now — it's right there in the scan data. 2. **Phase 2.5 (TFM / Language-Version Sanity Check)** added to the prompt. A small `API → minimum TFM` table covers the most common compile-break sources (`ArgumentNullException.ThrowIfNull`/.NET 6, `string.Contains (char)`/netstandard2.1, collection expressions/C# 12, etc.) with safe fallbacks for older TFMs. The instruction explicitly cites PR #11455 so the precedent is on record. The script's nullable guidance heredoc also gained a CRITICAL section restating the .NET 6 rule with the regression's PR number so the agent reading the scan output sees the warning before the candidates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Issue #11457 was filed against `src/Xamarin.Android.Build.Tasks/Tests/Xamarin.ProjectTools/Common/SolutionBuilder.cs` — test infrastructure, not shipped code. We don't spend cycles enabling NRT on non-shipped code. Fix: `01-nullable-reference-types.sh` now excludes `/Tests/`, `/Test/`, `/tests/`, `*Test.cs`, and `*Tests.cs` from both the candidate pick and the total count. Guidance text updated to make the scope (shipped code only) explicit and to instruct the agent to `noop` rather than file an issue for a test file it somehow encounters. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why
The nightly
fix-finderagentic workflow has been growing organically: every category lives as a branch in one giantcasestatement inside.github/workflows/nightly-fix-finder.md, and the per-category guidance lives separately in the prompt prose. The two have started drifting (e.g. category 5 "General Mistakes" has prose guidance but the bash just dumps 5 random files). The workflow also files issues with no objective quality bar.mono/SkiaSharp#3975 hit the same problems and landed a clean refactor. This PR adopts the top three ideas from it.
What changed
Scripts-folder architecture. Each category is now a self-contained
.shfile under.github/workflows/nightly-fix-finder/. The workflow's bash step picks one at random withshuf -n 1(or honors thescriptdropdown input on manual dispatch). Adding a category = drop a new.shfile + add its name to the dropdown options. No morecasestatement.Self-documenting scripts. Each script prints a
GUIDANCEheredoc (what to look for / how to fix / what NOT to flag) directly above itsgrep. The agent reads one combined output; the prompt no longer carries a parallel category table that can drift.Confidence-gate scoring (Phase 3). The prompt adds a 30-point rubric across
actionability,safety, andscope. The agent must score the proposed fix andnoopif the total is below 22/30 orsafetyis below 6. The PR description in SkiaSharp#3975 includes validation data showing the rubric correctly stopped a risky behavior-change fix; the same protection now applies here, where issues are auto-assigned to Copilot.All 9 existing categories were ported verbatim into individual scripts; behavior is otherwise unchanged.
.gitattributesgains*.sh eol=lfso the scripts stay LF on Windows checkouts.nightly-fix-finder.lock.ymlregenerated viagh aw compile(0 errors, 0 warnings).Notes for reviewers
workflow_dispatchscriptinput is a dropdown (kept astype: choice) so it remains usable from the GitHub Actions UI. The script list there must stay in sync with files in.github/workflows/nightly-fix-finder/-- that is the one place duplication still exists.bash -nsyntax-checked, and00-todo-fixme-hack.shwas smoke-run locally and produced real candidates fromsrc/.