[ci-scan] Skip stackoverflowtester under interpreter mode (refs #127899) #128737
github-actions[bot] wants to merge 2 commits into
The stackoverflowtester test hits a StackFrameIterator assert (m_crawl.GetCodeInfo()->IsValid()) when running under the CoreCLR interpreter, causing the process to fail-fast with 0xC0000602 instead of the expected stack overflow exit code. This has been consistently failing on windows-arm64 and windows-x64 in the runtime-interpreter pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @JulieLeeMSFT, @BrzVlad, @janvorli, @kg
This issue needs to be understood; there is no inherent reason why the codeinfo would be invalid. The cases where it occurred in CI no longer have dumps or logs available, so I don't think we should disable the test without trying to understand the problem.
Windows runtime tests were completely broken until recently, so these failures have only now started being reported. Here is the failure as I'm seeing it in the latest run:

I'm also very aggressive with fixing these pipelines without disabling any tests. I would normally take a look at this next week, unless you want to look into this, @janvorli.
@BrzVlad I'll try to take a look today.
…enced-issue checks (#128760)

## Description

Three prompt edits to `.github/workflows/ci-failure-scan.md` proposed by the ci-failure-scan-feedback meta-loop in #128755.

- Step 4.7 now also reads the body and latest 5 comments of any issue referenced in the KBE, so maintainer "do not disable" signals on the root-cause issue override the KBE candidate (motivated by #128737).
- The `ActiveIssue` section gains the full 4-parameter overload `(string, TestPlatforms, TargetFrameworkMonikers, TestRuntimes)` with an explicit warning not to place a `TestRuntimes` value in the `TargetFrameworkMonikers` slot (motivated by #128469, which failed to compile from that exact swap).
- A short pre-emit compile-validation checklist is added for test-disable PRs.

No `gh aw compile` needed: the scanner workflow loads this file at runtime via `{{#runtime-import}}`.

Closes #128755.

Co-authored-by: Milos Kotlar <miloskotlar@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…signal (#128837)

## Description

Extends Step 4.7 of `.github/workflows/ci-failure-scan.md` so the CI failure scanner respects maintainer closes of prior `[ci-scan]` test-disable PRs. A `MEMBER`/`OWNER` close of a recent (within 30 days) test-disable PR for the same test or KBE is now treated as a do-not-disable signal, and re-filing requires fresh evidence such as a new maintainer comment on the KBE greenlighting the disable or a clearly different failure signature.

PR #128793 re-filed the `stackoverflowtester` interpreter-mode disable that @BrzVlad had closed the day before in PR #128737 with a pushback to investigate the assert rather than mute the test. The earlier prompt fix in PR #128760 did not catch this case for two reasons:

1. It was merged on 2026-06-01, after the scanner run on 2026-05-30 that produced PR #128793.
2. Even if applied, it only inspects the KBE issue body and the bodies of issues referenced from it. KBE #127899 has no maintainer comments. The do-not-disable signal lived on closed PR #128737, not on the KBE.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
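The Step 4.7 rule described above can be sketched roughly as follows. This is only an illustration of the stated logic, not the scanner's implementation (the scanner is driven by a prompt file, not by this code). The `ClosedDisablePr` record, its field names, and both function names are hypothetical, and the dates and the `MEMBER` association in the usage lines are illustrative values chosen to mirror the #128737 / #128793 episode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Values taken from the description: MEMBER/OWNER closes within 30 days.
MAINTAINER_ASSOCIATIONS = frozenset({"MEMBER", "OWNER"})
LOOKBACK = timedelta(days=30)


@dataclass(frozen=True)
class ClosedDisablePr:
    """Hypothetical record for a previously closed [ci-scan] test-disable PR."""
    number: int
    test_name: str
    kbe_number: Optional[int]
    closer_association: str  # e.g. "MEMBER", "OWNER", "CONTRIBUTOR", "NONE"
    closed_at: datetime
    merged: bool


def is_do_not_disable_signal(pr: ClosedDisablePr, test_name: str,
                             kbe_number: Optional[int], now: datetime) -> bool:
    """True when a maintainer closed (without merging) a recent disable PR
    for the same test or the same KBE."""
    same_target = pr.test_name == test_name or (
        kbe_number is not None and pr.kbe_number == kbe_number
    )
    age = now - pr.closed_at
    return (
        same_target
        and not pr.merged
        and pr.closer_association in MAINTAINER_ASSOCIATIONS
        and timedelta(0) <= age <= LOOKBACK
    )


def should_skip_disable(prior_prs: Iterable[ClosedDisablePr], test_name: str,
                        kbe_number: Optional[int], now: datetime,
                        fresh_evidence: bool = False) -> bool:
    """Skip re-filing unless fresh evidence (for example a new maintainer
    comment greenlighting the disable) has been found."""
    if fresh_evidence:
        return False
    return any(
        is_do_not_disable_signal(pr, test_name, kbe_number, now)
        for pr in prior_prs
    )


# Illustrative values only: a maintainer close one day before the scan.
closed = ClosedDisablePr(128737, "stackoverflowtester", 127899, "MEMBER",
                         datetime(2026, 5, 29, tzinfo=timezone.utc), False)
scan_time = datetime(2026, 5, 30, tzinfo=timezone.utc)
print(should_skip_disable([closed], "stackoverflowtester", 127899, scan_time))
```

Under these assumptions the final line prints `True`, meaning the re-file would have been suppressed. Passing `fresh_evidence=True` models the escape hatch the description mentions.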
Rewrites Step 7 of ci-failure-scan-feedback.md to produce a smaller, plainer tracker focused on what maintainers actually want to see: opened/closed counts, a single wrong-closure rate, and a red/green table of outage signals on the analyzed CI.

Dropped:
- Acceptance classification with Wilson 95% lower bound. The bucket rules were opinionated and the Wilson math was overkill for an N=27 sample. Replaced by a flat 'wrong-closure rate' in the Quality block.
- 90-day window. Identical to 30d while the scanner is < 30 days old; will become useful later but adds rows now.
- CI environment section sourced from agent-log tally extraction. This rendered as '— (tally extraction unavailable)' most ticks because the parser was fragile; drop it rather than ship n/a.
- Coverage and time-to-KBE percentiles. Also rendered as n/a.
- Per-artifact rubric scorecard table in the agent log.

Kept:
- Tracker bootstrap, window-start caching, in-scope search, integrity-gated comment reads.

Added: an explicit Outage signals table reflecting the health of the CI being monitored (not the scanner workflow). Five signals with fixed thresholds and a 🔴/🟢 status icon:
- New-KBE burst (day > 2x trailing 30d median)
- Build-break spike (>= 2 in any 24h)
- Multi-pipeline outage (>= 3 distinct pipelines in 24h)
- KBE re-filed after maintainer close (any in 7d); would have caught dotnet#128737/dotnet#128793
- Wrong-closure rate >= 30% with N>=10

Red rows must append a 'details:' line citing the offending artifact so maintainers can jump straight to it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Description

Rewrites Step 7 of `.github/workflows/ci-failure-scan-feedback.md` to produce a smaller, plainer KPI tracker (#128742) focused on what maintainers actually want to see at a glance.

## Motivation

Current tracker headline metrics are overengineered for a scanner that's < 30 days old and mostly render as `n/a` (tally extraction unavailable, distinct_signatures < 10, etc.). Maintainers want simple counts plus a clear signal when CI itself is degrading. Per direct feedback: simple measures for now, plus signals for bigger outages.

## What changes

### Removed

- Acceptance classification with Wilson 95% lower bound. The bucket rules were opinionated and the Wilson math is overkill at N=27. Replaced by a flat wrong-closure rate.
- 90-day window. Identical to 30d while the scanner is < 30 days old.
- CI environment section sourced from agent-log tally extraction. Renders as `— (tally extraction unavailable)` most ticks; dropped rather than shipped as n/a.
- Coverage ratio and time-to-KBE percentiles. Also n/a most ticks.
- Per-artifact rubric scorecard table appended to the agent log.

### Kept

- Tracker bootstrap, window-start caching, in-scope search, integrity-gated comment reads.

### Added: Outage signals (analyzed CI)

Five signals with fixed thresholds and a 🔴/🟢 status icon. These reflect the health of the **CI being monitored**, not the scanner workflow itself:

| signal | threshold |
|---|---|
| New-KBE burst | day > 2x trailing 30d median (min 3) |
| Build-break spike | >= 2 in any 24h |
| Multi-pipeline outage | >= 3 distinct pipelines in 24h |
| KBE re-filed after maintainer close | any in 7d |
| Wrong-closure rate | >= 30% with N>=10 |

The re-file signal would have flagged the #128737 / #128793 episode. When a row is 🔴, the tracker appends a `details:` line citing the offending artifact so maintainers can jump straight to it.
## New body shape

```
## Snapshot — <UTC timestamp>

### Activity (last 7d)
| artifact | opened | closed (good) | closed (wrong) |
| Issues | … | completed | not_planned/duplicate |
| PRs | … | merged | closed unmerged |

### Quality (last 30d)
| metric | count | rate |
| Total artifacts opened | | — |
| Wrong closures | | % |
| Maintainer rejection comments | | — |
| Duplicate KBEs | | — |

### Outage signals (analyzed CI)
| signal | threshold | 24h | 7d | status |
| New-KBE burst | day > 2x 30d median (min 3) | | | |
| Build-break spike | >= 2 in any 24h | | | |
| Multi-pipeline outage | >= 3 distinct in 24h | | | |
| KBE re-filed after maintainer close | any in 7d | | | |
| Wrong-closure rate (30d) | >= 30% with N>=10 | — | | |
```

cc @kg

> [!NOTE]
> This PR was authored with assistance from GitHub Copilot CLI.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
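The five outage-signal thresholds listed in this description can be expressed as simple predicates. The sketch below is one possible reading, not the tracker's actual logic: all function names are hypothetical, and "(min 3)" is interpreted here as "the day's count must also be at least 3", which is an assumption the source does not spell out.

```python
from statistics import median
from typing import Iterable, Sequence


def new_kbe_burst(day_count: int, trailing_30d_daily_counts: Sequence[int],
                  minimum: int = 3) -> bool:
    """Red when the day's new-KBE count exceeds 2x the trailing 30d median.
    Assumption: 'min 3' means the count must also reach `minimum`."""
    baseline = median(trailing_30d_daily_counts) if trailing_30d_daily_counts else 0
    return day_count >= minimum and day_count > 2 * baseline


def build_break_spike(build_breaks_in_24h: int) -> bool:
    """Red at 2 or more build breaks in any 24h window."""
    return build_breaks_in_24h >= 2


def multi_pipeline_outage(failing_pipelines_in_24h: Iterable[str]) -> bool:
    """Red at 3 or more distinct failing pipelines in 24h."""
    return len(set(failing_pipelines_in_24h)) >= 3


def kbe_refiled_after_close(refiles_in_7d: int) -> bool:
    """Red on any re-file after a maintainer close in the last 7 days."""
    return refiles_in_7d > 0


def wrong_closure_rate_high(wrong_closures: int, total_closed: int) -> bool:
    """Red at a wrong-closure rate of 30% or more, only once N >= 10.
    Integer arithmetic avoids floating-point edge cases at exactly 30%."""
    return total_closed >= 10 and wrong_closures * 100 >= 30 * total_closed


def status_icon(is_red: bool) -> str:
    """Map a signal result to the tracker's red/green status icon."""
    return "🔴" if is_red else "🟢"


# Illustrative inputs only; none of these counts come from the real tracker.
print(status_icon(new_kbe_burst(5, [1, 2, 2, 1, 3])))
print(status_icon(multi_pipeline_outage(["runtime-interpreter", "runtime-interpreter"])))
```

Keeping each signal as its own predicate makes it straightforward to attach the `details:` line for a red row, since the caller knows which input tripped the threshold.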
…line dedup (#128840)

Two precision fixes to the CI Outer-Loop Failure Scanner prompt (`.github/workflows/ci-failure-scan.md`), driven by observed scanner misbehavior: a test-disable PR rejected by maintainers who wanted to investigate (PR #128737), and a duplicate KBE filed for the same signature under two pipeline definitions (#128697, dup of #128531).

### Step 4.7 — Stronger do-not-disable detection

- Added six rejection phrases to the MEMBER/OWNER phrase list: `trying to understand`, `without disabling`, `i'm fixing`, `i am fixing`, `understand the problem`, `root cause`.
- Added a heuristic catch-all so the scanner skips a test-disable when any MEMBER/OWNER comment reads as investigation or fix-forward intent, even when no exact phrase matches.

### Step 4.0 — Cross-definition dedup

- Added a second dedup check on the definition-independent key `<queue>|<stress_mode>|<signature_norm>`. Since a KBE matches on signature text regardless of pipeline definition, the same signature surfacing under different definition IDs in one run is now skipped as a cross-def dup instead of filed as a second KBE.
- Clarified scope: cross-def dedup applies to KBE filing (Branch A); a distinct test-disable PR per definition is still permitted.

Markdown-only change to the workflow prompt. The body is read from the repo at runtime and is not embedded in `ci-failure-scan.lock.yml`, and frontmatter is unchanged, so the generated lock file is unaffected.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kotlarmilos <11523312+kotlarmilos@users.noreply.github.com>
Co-authored-by: Milos Kotlar <kotlarmilos@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Milos Kotlar <115233127+kotlarmilos@users.noreply.github.com>
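Both fixes in this change can be illustrated with a small sketch. It is a rough model only: the scanner applies these rules through its prompt, the phrase tuple holds just the six phrases named above (the rest of the list is not shown in the source), and the signature normalization (lowercasing and whitespace collapsing) is an assumption, since the source does not define `signature_norm`. All function names and dictionary keys are hypothetical.

```python
from typing import Dict, Iterable, List

MAINTAINER_ASSOCIATIONS = frozenset({"MEMBER", "OWNER"})

# Only the six phrases named in the change; the full list is not in the source.
REJECTION_PHRASES = (
    "trying to understand",
    "without disabling",
    "i'm fixing",
    "i am fixing",
    "understand the problem",
    "root cause",
)


def has_rejection_phrase(comment_body: str, author_association: str) -> bool:
    """True when a MEMBER/OWNER comment contains one of the exact phrases."""
    if author_association not in MAINTAINER_ASSOCIATIONS:
        return False
    # Normalize curly apostrophes and case before substring matching.
    text = comment_body.replace("\u2019", "'").lower()
    return any(phrase in text for phrase in REJECTION_PHRASES)


def dedup_key(queue: str, stress_mode: str, signature: str) -> str:
    """Build the definition-independent key <queue>|<stress_mode>|<signature_norm>.
    Assumption: normalization lowercases and collapses whitespace."""
    signature_norm = " ".join(signature.lower().split())
    return f"{queue}|{stress_mode}|{signature_norm}"


def drop_cross_definition_dups(failures: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first failure per key, regardless of pipeline definition ID."""
    seen = set()
    kept = []
    for failure in failures:
        key = dedup_key(failure["queue"], failure["stress_mode"], failure["signature"])
        if key in seen:
            continue  # same signature under another definition: cross-def dup
        seen.add(key)
        kept.append(failure)
    return kept


comment = ("I don't think we should disable the test without trying "
           "to understand the problem.")
print(has_rejection_phrase(comment, "MEMBER"))
```

The sample comment is the maintainer pushback quoted earlier in this thread, which matches two of the six phrases. The heuristic catch-all for investigation or fix-forward intent is a judgment made by the agent and is not modeled here.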
Reasoning
The `stackoverflowtester` test hits a `StackFrameIterator` assert (`m_crawl.GetCodeInfo()->IsValid()`) when running under the CoreCLR interpreter. The interpreter does not produce valid `CodeInfo` for its frames during stack overflow unwinding, causing the process to fail-fast (exit code 0xC0000602) instead of producing the expected stack overflow exit code (0xC00000FD). This is a known interpreter-mode incompatibility, not a product bug in the non-interpreter path.

Linked KBE: #127899

Match verification (from Step 4.8):
- `baseservices/exceptions/stackoverflow/stackoverflowtester/stackoverflowtester.cmd` in runtime-interpreter pipeline
- `m_crawl.GetCodeInfo()->IsValid()` assert followed by 0xC0000602 exit code (2 matches in failure.log)

Impact on platforms
Errors log
First build it occurred
Linked issue
#127899
Filed by
`ci-failure-scan`, which scans dnceng-public outer-loop pipelines on `main` and converts stable failures into KBEs and test-disable PRs. Comment here or on the workflow file to suggest changes; `ci-failure-scan-feedback` reads in-scope feedback daily and opens (or updates) a PR with prompt edits.

Note
🔒 Integrity filter blocked 1 item
The following item was blocked because it doesn't meet the GitHub integrity level.
search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower `min-integrity` in your GitHub frontmatter: