…view (#5819)
## Summary
Two minimal resilience patches for the dominant transient failures
observed in the daily-regen audit (2026-05-06):
1. **`impl-review.yml`: "Extract PR info" step** (5 failures in 24h).
Wrap `gh pr view` in a 3-attempt retry with exponential backoff. When
the GitHub API blips, the entire review job aborts and the PR ends up
unlabeled, blocking the review → repair → merge cascade. This caused
4 of the 14 stuck PRs we recovered today (#5696, #5789, #5796, and
#5801 had no labels at all because review never made it past step 1).
2. **`impl-generate.yml`: "Create library metadata file" step** (6
failures in 24h). Wrap the final `git push origin "$BRANCH"` in a
3-attempt retry that does `fetch + rebase` between attempts. The
dominant failure mode is a race against Claude's earlier push to the
same branch: when the metadata commit hits a non-fast-forward
rejection, the whole generation aborts and the PR never opens.
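
As an illustration of the first patch, here is a minimal sketch of an
inline retry with exponential backoff. The function name
`retry_with_backoff`, the `MAX_ATTEMPTS` and `BASE_DELAY` variables, the
delay values, and the `--json` field list are assumptions made for this
example; the actual step in `impl-review.yml` may be written differently.

```shell
# Hypothetical helper (not copied from the workflow): run a command up to
# MAX_ATTEMPTS times, doubling the wait between failed attempts.
retry_with_backoff() {
  local max_attempts="${MAX_ATTEMPTS:-3}"  # assumed default: 3 attempts
  local delay="${BASE_DELAY:-2}"           # assumed initial wait in seconds
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      # Hard-fail so persistent problems still surface in the job result.
      echo "failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Possible use in the step (PR_NUMBER is assumed to be set by the caller):
# PR_JSON="$(retry_with_backoff gh pr view "$PR_NUMBER" --json number,headRefName,labels)"
```

Progress messages go to stderr, so capturing stdout with `$(...)` still
yields only the output of the wrapped command.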
Both fixes are inline bash retries with no new action dependency. Each
step still hard-fails after 3 attempts, so persistent issues still
surface (we don't want to mask real bugs).
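
For the second patch, the push retry with `fetch + rebase` between
attempts could look roughly like the sketch below. The function name
`push_with_rebase` and the `MAX_ATTEMPTS` variable are hypothetical, and
aborting on a rebase conflict is an assumption of this example rather
than confirmed behavior of `impl-generate.yml`.

```shell
# Hypothetical helper (not copied from the workflow): push a branch, and on
# rejection fetch the remote branch and rebase onto it before trying again.
push_with_rebase() {
  local branch="$1"
  local max_attempts="${MAX_ATTEMPTS:-3}"  # assumed default: 3 attempts
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if git push origin "$branch"; then
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "push attempt ${attempt} rejected; fetching and rebasing" >&2
      git fetch origin "$branch" || return 1
      # Assumption: a conflicting rebase is a real problem, so stop retrying.
      git rebase "origin/${branch}" || { git rebase --abort; return 1; }
    fi
    attempt=$((attempt + 1))
  done
  # Hard-fail so a persistent rejection still fails the step.
  echo "push failed after ${max_attempts} attempts" >&2
  return 1
}

# Possible use in the step (BRANCH is assumed to be set by the caller):
# push_with_rebase "$BRANCH"
```

Rebasing rather than force-pushing keeps the earlier commit on the
branch intact and replays the metadata commit on top of it.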
## Out of scope (deferred)
- **`daily-regen.yml` "pick" job clean exit**: the existing logic
already writes `count=0` when no specs are eligible, and downstream is
gated on `if: needs.pick.outputs.count != '0'`. The 2 reported
"cancellations" in the audit period look unrelated to the pick step
itself (likely scheduler-level events).
- **Auto-retry on `ai-review-failed`**: would require a more invasive
`if: failure()` job-level step. Holding until we see whether the
Extract-PR-info retry alone reduces the rate enough.
## Context
This branch is the Stage 5 follow-up to today's recovery work, which
manually shepherded 13 of 14 stuck PRs through the review/repair/merge
pipeline. The full investigation + recovery plan lives at
`/home/tirao/.claude/plans/bitte-schaue-dir-alle-peppy-bunny.md`
(local).
## Test plan
- [x] YAML-validate both edited workflows (passes)
- [x] No-op for the happy path: first attempt of each retry preserves
existing behavior exactly
- [ ] CI green on this PR
- [ ] Post-merge: watch the next 24h of impl-generate / impl-review run
conclusions; expect failure rate to drop from ~10% to near 0% on these
two steps
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implementation: heatmap-annotated - python/bokeh

Implements the python/bokeh version of heatmap-annotated.

File: plots/heatmap-annotated/implementations/python/bokeh.py

Parent Issue: #1824

🤖 impl-generate workflow