Skip to content

[gh-aw] Refactor nightly-fix-finder into per-category scripts#11451

Open
jonathanpeppers wants to merge 6 commits into
mainfrom
jonathanpeppers/nightly-fix-finder-scripts
Open

[gh-aw] Refactor nightly-fix-finder into per-category scripts#11451
jonathanpeppers wants to merge 6 commits into
mainfrom
jonathanpeppers/nightly-fix-finder-scripts

Conversation

@jonathanpeppers
Copy link
Copy Markdown
Member

Why

The nightly fix-finder agentic workflow has been growing organically: every category lives as a branch in one giant case statement inside .github/workflows/nightly-fix-finder.md, and the per-category guidance lives separately in the prompt prose. The two have started drifting (e.g. category 5 "General Mistakes" has prose guidance but the bash just dumps 5 random files). The workflow also files issues with no objective quality bar.

mono/SkiaSharp#3975 hit the same problems and landed a clean refactor. This PR adopts the top three ideas from it.

What changed

  1. Scripts-folder architecture. Each category is now a self-contained .sh file under .github/workflows/nightly-fix-finder/. The workflow's bash step picks one at random with shuf -n 1 (or honors the script dropdown input on manual dispatch). Adding a category = drop a new .sh file + add its name to the dropdown options. No more case statement.

  2. Self-documenting scripts. Each script prints a GUIDANCE heredoc (what to look for / how to fix / what NOT to flag) directly above its grep. The agent reads one combined output; the prompt no longer carries a parallel category table that can drift.

  3. Confidence-gate scoring (Phase 3). The prompt adds a 30-point rubric across actionability, safety, and scope. The agent must score the proposed fix and noop if the total is below 22/30 or safety is below 6. The PR description in SkiaSharp#3975 includes validation data showing the rubric correctly stopped a risky behavior-change fix; the same protection now applies here, where issues are auto-assigned to Copilot.

All 9 existing categories were ported verbatim into individual scripts; behavior is otherwise unchanged. .gitattributes gains *.sh eol=lf so the scripts stay LF on Windows checkouts. nightly-fix-finder.lock.yml regenerated via gh aw compile (0 errors, 0 warnings).

Notes for reviewers

  • The workflow_dispatch script input is a dropdown (kept as type: choice) so it remains usable from the GitHub Actions UI. The script list there must stay in sync with files in .github/workflows/nightly-fix-finder/ -- that is the one place duplication still exists.
  • All 9 scripts were bash -n syntax-checked, and 00-todo-fixme-hack.sh was smoke-run locally and produced real candidates from src/.
  • No unit tests (this is workflow infrastructure, not product code).

jonathanpeppers and others added 2 commits May 22, 2026 10:01
Adopts three improvements from mono/SkiaSharp#3975:

1. **Scripts-folder architecture**: each category lives in its own .sh file under`.github/workflows/nightly-fix-finder/`. Adding a category = drop a new file; `shuf -n 1` picks one per run. No more two-places-to-edit between the YAML case statement and the prompt's category table.

2. **Self-documenting scripts**: each script prints its own `GUIDANCE` heredoc (what to look for / how to fix / what NOT to flag) right next to its `grep`. The agent reads one combined output instead of cross-referencing a separate prompt table.

3. **Confidence-gate scoring**: Phase 3 adds a 30-point rubric (actionability + safety + scope). Below 22/30 or safety < 6 ⇒ `noop`. Stops the worst failure mode — noisy or risky issues getting auto-assigned to Copilot.

All 9 existing categories ported verbatim. Bash syntax-checked, smoke-tested locally. lock.yml regenerated via `gh aw compile`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the nightly fix-finder agentic workflow from an inlined case statement into a per-category script architecture, moving category guidance into the scripts themselves and adding a confidence-gate rubric to reduce low-quality/unsafe auto-filed issues.

Changes:

  • Introduced .github/workflows/nightly-fix-finder/*.sh per-category scripts that emit guidance + scan data.
  • Updated nightly-fix-finder.md to select and run a script (randomly or via workflow_dispatch choice) and added a 30-point confidence rubric gate before filing issues.
  • Added .gitattributes enforcement for LF endings on *.sh and regenerated nightly-fix-finder.lock.yml.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
.github/workflows/nightly-fix-finder.md Picks a script (random or chosen), runs it to generate scan results, and updates the agent prompt with rubric gating + script-based guidance.
.github/workflows/nightly-fix-finder.lock.yml Regenerated compiled workflow reflecting the new script selection + tool allowlist changes.
.github/workflows/nightly-fix-finder/00-todo-fixme-hack.sh Category script: emits guidance + TODO/FIXME/HACK scan sample and counts.
.github/workflows/nightly-fix-finder/01-nullable-reference-types.sh Category script: emits nullable opt-in guidance + sample candidates and count.
.github/workflows/nightly-fix-finder/02-obsolete-api-usage.sh Category script: emits obsolete-API guidance + sample matches.
.github/workflows/nightly-fix-finder/03-performance-antipatterns.sh Category script: emits perf guidance + samples for concat-in-loops / sync-over-async / LINQ allocs / formatting.
.github/workflows/nightly-fix-finder/04-missing-xml-docs.sh Category script: emits Mono.Android XML docs guidance + sample public declarations.
.github/workflows/nightly-fix-finder/05-general-mistakes.sh Category script: emits “read for real bugs” guidance + random file sample list.
.github/workflows/nightly-fix-finder/06-unused-using-directives.sh Category script: emits guidance + reports files with high using counts.
.github/workflows/nightly-fix-finder/07-error-handling.sh Category script: emits guidance + sample catch (Exception) matches.
.github/workflows/nightly-fix-finder/08-string-literal-error-messages.sh Category script: emits resources/localization guidance + sample hardcoded log-string matches.
.gitattributes Forces LF for *.sh to keep scripts consistent across platforms.

Comment thread .github/workflows/nightly-fix-finder.md
Comment thread .github/workflows/nightly-fix-finder.md Outdated
Comment thread .github/workflows/nightly-fix-finder.md Outdated
Three fixes from the Copilot review bot:

1. Guard against an empty scripts directory — emit a clear error before basename gets an empty arg.
2. Run scripts with `bash -o pipefail` so the `| shuf | head -N || echo 'None found'` fallback fires when grep matches nothing. Without pipefail, head exits 0 even when its upstream returned no rows, swallowing the fallback.
3. Drop `bash:*` from the tools allowlist — the scan step runs before the agent starts, so the agent never needs to invoke bash itself. Stay least-privilege.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jonathanpeppers
Copy link
Copy Markdown
Member Author

/review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 22, 2026

Android PR Reviewer completed successfully!

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Review Summary

Verdict: ✅ LGTM (with one minor docs fix)

Nice refactor — the per-script architecture is a clear improvement over the monolithic case statement. The self-documenting GUIDANCE heredocs, the confidence-gate rubric, and the improved exclusion filters (now properly filtering *.generated.cs/*.Designer.cs across all categories) are all solid wins.

Issues found

Severity Count Category
⚠️ warning 1 Documentation — "Adding a New Category" section omits the dropdown sync step

Positive callouts

  • 🎯 The confidence rubric (Phase 3) with hard safety floor is a smart addition — prevents risky auto-filed issues
  • 🧹 Scripts now consistently exclude generated files, which is an improvement over the old inline code
  • ✅ Error handling in the script runner (file-not-found check, empty-dir check) is thorough
  • 📋 bash -o pipefail on the script invocation is a good safety net

Generated by Android PR Reviewer for issue #11451 · ● 44.4M

Comment thread .github/workflows/nightly-fix-finder.md Outdated
jonathanpeppers and others added 2 commits May 22, 2026 11:04
…ory'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rowIfNull regressions

PR #11455 emitted `ArgumentNullException.ThrowIfNull` into `Xamarin.Android.Build.Tasks.csproj` (which targets `netstandard2.0` where that API doesn't exist), breaking the build. Two changes to make sure that class of bug can't slip through again:

1. **01-nullable-reference-types.sh** now walks up from each candidate `.cs` file to find its owning `*.csproj` and prints the project's `<TargetFramework[s]>` value next to it. The agent literally cannot miss the TFM now — it's right there in the scan data.

2. **Phase 2.5 (TFM / Language-Version Sanity Check)** added to the prompt. A small `API → minimum TFM` table covers the most common compile-break sources (`ArgumentNullException.ThrowIfNull`/.NET 6, `string.Contains (char)`/netstandard2.1, collection expressions/C# 12, etc.) with safe fallbacks for older TFMs. The instruction explicitly cites PR #11455 so the precedent is on record.

The script's nullable guidance heredoc also gained a CRITICAL section restating the .NET 6 rule with the regression's PR number so the agent reading the scan output sees the warning before the candidates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Issue #11457 was filed against `src/Xamarin.Android.Build.Tasks/Tests/Xamarin.ProjectTools/Common/SolutionBuilder.cs` — test infrastructure, not shipped code. We don't spend cycles enabling NRT on non-shipped code.

Fix: `01-nullable-reference-types.sh` now excludes `/Tests/`, `/Test/`, `/tests/`, `*Test.cs`, and `*Tests.cs` from both the candidate pick and the total count. Guidance text updated to make the scope (shipped code only) explicit and to instruct the agent to `noop` rather than file an issue for a test file it somehow encounters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants