feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep#565
feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep#565carlos-alm merged 66 commits intomainfrom
Conversation
…housekeep Four recurring maintenance routines as Claude Code skills: - /deps-audit: vulnerability scanning, staleness, unused deps, license checks - /bench-check: benchmark regression detection against saved baselines - /test-health: flaky test detection, dead tests, coverage gap analysis - /housekeep: clean worktrees, dirt files, sync main, prune branches
|
Claude finished @carlos-alm's task —— View job PR Review: Maintenance SkillsReviewing this PR that adds 4 new maintenance skills for Claude Code. Todo List:
Root Cause AnalysisProblem Being Solved: This PR addresses maintenance debt and technical hygiene gaps that accumulate in active development repositories. The skills automate routine maintenance tasks that are typically done manually or inconsistently. Why This Approach: These are documentation-only additions that define Claude Code skills (not executable code), following the established Risk Assessment: Low risk - these are pure documentation files that define maintenance workflows. No code changes, no dependencies added. Backlog Compliance
Critical ConcernsNone identified. The skills are:
Quality AssessmentExcellent Design Patterns:
Minor Observations:
Technical Validation:
Final Recommendation
|
Greptile SummaryThis PR introduces four maintenance skills ( Key changes in this diff:
Issues found:
Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
START([skill invoked]) --> P0[Phase 0 — Pre-flight\nparse args, check repo]
P0 --> BC_P1[bench-check\nPhase 1: run benchmarks\ntimeout 300 each]
BC_P1 --> BC_ABORT{metrics empty?}
BC_ABORT -- yes --> BC_P6A[Phase 6: ABORTED report]
BC_ABORT -- no --> BC_P3{SAVE_ONLY\nor no baseline?}
BC_P3 -- yes --> BC_P4S[Phase 4: First-run\nor Save-baseline verdict]
BC_P3 -- no --> BC_P3C[Phase 3: compare\ncompute delta_pct]
BC_P3C --> BC_P4V{regressions?}
BC_P4V -- yes --> BC_P6F[Phase 6: FAILED report\nno baseline update]
BC_P4V -- no --> BC_P5{COMPARE_ONLY?}
BC_P5 -- no --> BC_P5S[Phase 5: save baseline\ngit commit files]
BC_P5 -- yes --> BC_P6P[Phase 6: PASSED report\nno baseline update]
BC_P5S --> BC_P6P
BC_P4S --> BC_P6B[Phase 6: BASELINE SAVED report]
BC_P6A --> BC_P7[Phase 7: print summary]
BC_P6F --> BC_P7
BC_P6P --> BC_P7
BC_P6B --> BC_P7
P0 --> DA_P1[deps-audit\nPhase 0: stash if --fix\ncapture STASH_REF by name]
DA_P1 --> DA_P2[Phases 1–5: audit\nsecurity/outdated/unused\nlicense/duplicates]
DA_P2 --> DA_P6[Phase 6: report]
DA_P6 --> DA_P7{AUTO_FIX?}
DA_P7 -- no --> DA_END([done])
DA_P7 -- yes --> DA_TEST[npm test]
DA_TEST --> DA_PASS{pass?}
DA_PASS -- yes + STASH_REF non-empty --> DA_POP[git stash pop\nnpm install\nre-run npm test]
DA_PASS -- yes + STASH_REF empty --> DA_END
DA_POP --> DA_END
DA_PASS -- no + STASH_REF non-empty --> DA_RESTORE[git checkout HEAD\ngit stash pop\nnpm ci]
DA_PASS -- no + STASH_REF empty --> DA_REVERT[git checkout\nnpm ci]
DA_RESTORE --> DA_END
DA_REVERT --> DA_END
P0 --> TH_P1[test-health\nPhase 1: mktemp RUN_DIR\nrun FLAKY_RUNS × vitest\ntimeout 180 each]
TH_P1 --> TH_P1A[exclude invalid runs\nmin 2 valid runs\nfor flaky detection]
TH_P1A --> TH_P2[Phase 2: dead tests\nPhase 3: coverage json-summary\nPhase 4: structure]
TH_P2 --> TH_P5[Phase 5: report\nrm -rf RUN_DIR]
P0 --> HK_P1[housekeep\nPhase 1: worktree prune\n+ confirm stale removal]
HK_P1 --> HK_P2[Phase 2: dirt files\ndual discovery\nlsof lock guard]
HK_P2 --> HK_P3[Phase 3: git fetch\ngit pull --no-rebase]
HK_P3 --> HK_P4[Phase 4: prune merged branches\nconfirm each]
HK_P4 --> HK_P5[Phase 5: source-repo guard\nalways skipped here]
HK_P5 --> HK_P6[Phase 6: health checks\ncoverage / graph / git fsck]
HK_P6 --> HK_P7[Phase 7: console report]
|
.claude/skills/bench-check/SKILL.md
Outdated
| - `nodeCount`, `edgeCount` — graph size | ||
|
|
||
| ### 1b. Incremental Benchmark | ||
|
|
||
| ```bash | ||
| node scripts/incremental-benchmark.js 2>/dev/null | ||
| ``` | ||
|
|
||
| Extract: | ||
| - `noOpRebuild` (ms) — time for no-change rebuild | ||
| - `singleFileRebuild` (ms) — time after one file change | ||
| - `importResolution` (ms) — resolution throughput | ||
|
|
||
| ### 1c. Query Depth Benchmark | ||
|
|
||
| ```bash | ||
| node scripts/query-benchmark.js 2>/dev/null | ||
| ``` | ||
|
|
||
| Extract: | ||
| - `fnDeps` scaling by depth | ||
| - `fnImpact` scaling by depth | ||
| - `diffImpact` latency | ||
|
|
||
| ### 1d. Embedding Benchmark (optional) | ||
|
|
||
| ```bash | ||
| node scripts/embedding-benchmark.js 2>/dev/null | ||
| ``` | ||
|
|
||
| Extract: | ||
| - `embeddingTime` (ms) | ||
| - `recall` at Hit@1, Hit@3, Hit@5, Hit@10 | ||
|
|
||
| > **Timeout:** Each benchmark gets 5 minutes max. If it times out, record `"timeout"` for that suite and continue. |
There was a problem hiding this comment.
2>/dev/null discards error messages that should be recorded
All four benchmark invocations pipe stderr to /dev/null, yet the Rules section (line 222) and the error-handling note on line 90 both say to record "error: <message>" when a script exits non-zero. With stderr silenced there is no message to capture — the skill can only detect that a failure occurred, not why.
Replace the silent discard with a captured-stderr pattern:
# Instead of:
node scripts/benchmark.js 2>/dev/null
# Use:
output=$(node scripts/benchmark.js 2>&1)
exit_code=$?
if [ $exit_code -ne 0 ]; then
# record "error: $output" and continue
fiThis applies to all four benchmark invocations (lines 48, 59, 70, 81).
There was a problem hiding this comment.
Fixed in commit a562b52. All four benchmark invocations now use output=$(... 2>&1) plus an exit_code check, so stderr is captured and recorded as "error: $output" rather than silently discarded.
There was a problem hiding this comment.
Fixed in ce5d811 — added git diff --cached --quiet || guard before git commit in Phase 5, so the commit is skipped when the baseline values are identical to the previously committed version.
| - **Improvement**: metric decreased by more than `THRESHOLD`% (time) or increased (quality) | ||
| - **Stable**: within threshold | ||
|
|
||
| > **Direction awareness:** For latency metrics (ms), higher = worse. For recall/quality metrics, higher = better. For count metrics (nodes, edges), changes are informational only — not regressions. |
There was a problem hiding this comment.
Division-by-zero when baseline metric is
0
delta_pct = ((current - baseline) / baseline) * 100
If baseline is 0 (e.g. graph.edges on the very first build, or embedding.recall.hit1 on a cold model), this formula produces NaN or Infinity. The skill should guard against a zero baseline before computing the percentage:
if baseline == 0:
mark as "N/A — baseline was zero"
else:
delta_pct = ((current - baseline) / baseline) * 100
There was a problem hiding this comment.
Fixed in commit a562b52. Phase 3 now guards before dividing: when baseline == 0, the delta is marked "N/A — baseline was zero" and the metric is treated as informational only (no regression/improvement classification).
There was a problem hiding this comment.
The division-by-zero guard was already added in a562b52 (Round 2). The current Phase 3 text at step 2 reads: "Guard against division-by-zero: if baseline == 0, mark the delta as N/A and treat the metric as informational only (not a regression or improvement)." No further change needed.
…line - Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all four benchmark invocations so error messages are captured and recorded - Add division-by-zero guard in Phase 3: when baseline == 0, mark delta as "N/A — baseline was zero" (informational only, not a regression) - Add git add + git commit step in Phase 5 so the baseline file is actually committed after each save, matching the documented rule
- After reverting package.json + package-lock.json on --fix test failure, also run `npm ci` to resync node_modules/ with the restored lock file; without this the manifest is reverted but installed packages are not - Add explanatory comment on @anthropic-ai/tokenizer skip-list entry clarifying it is a peer dependency of @anthropic-ai/sdk and may be required at runtime without an explicit import in our code
…erion - Phase 5 (Update Codegraph): add source-repo guard that skips the self-update logic when running inside the codegraph source repo; comparing the dev version to the published release and running npm install is a no-op since codegraph is not one of its own deps - Phase 1b stale-worktree criterion: replace "created more than 7 days ago" (not determinable via git worktree list) with "last commit on the branch is more than 7 days old AND branch has no commits ahead of origin/main", using `git log -1 --format=%ci <branch>`
…at/maintenance-skills
|
Addressing the 5 items from the "Comments Outside Diff" section of the Greptile review: 1. deps-audit — incomplete revert (line 388–392): Fixed in commit 4fc994d. The Phase 7 revert path now runs 2. housekeep — Phase 5 self-update incoherent in source repo (line 572–607): Fixed in commit 89aef6b. Added a source-repo guard at the top of Phase 5: when running inside the codegraph source repo (confirmed by pre-flight), the phase is skipped entirely and prints 3. housekeep — stale-worktree creation timestamp not available (line 445–451): Fixed in commit 89aef6b. Replaced "created more than 7 days ago with no commits since" with a criterion that is actually determinable: the branch's last commit is more than 7 days old (via 4. bench-check — baseline declared committed but no git add/commit (line 226–227): Fixed in commit a562b52. Phase 5 now includes explicit 5. deps-audit — |
| ## Phase 6 — Report | ||
|
|
||
| Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`: | ||
|
|
||
| ```markdown |
There was a problem hiding this comment.
Auto-commit fails silently when baseline values are unchanged
If the benchmark results are numerically identical to the stored baseline (e.g. a second back-to-back run on the same commit), generated/bench-check/baseline.json and history.ndjson will still be rewritten with a new timestamp. However, git commit will exit non-zero with "nothing to commit" if git detects the content hash is the same — causing the Phase 5 step to produce a spurious error.
Guard with --allow-empty (not ideal) or, better, only run the commit when the files actually changed:
git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
if ! git diff --cached --quiet; then
git commit -m "chore: update bench-check baseline (<gitRef>)"
fiThis keeps Phase 5 idempotent and avoids a noisy error on successive clean runs.
There was a problem hiding this comment.
Fixed in ce5d811 — added git diff --cached --quiet || guard before git commit in Phase 5. When the baseline values are identical to the previously committed version, git add stages nothing and the commit is now skipped instead of erroring out.
|
Addressing Round 3 outside-diff feedback (ce5d811): test-health/SKILL.md � Changed both the flaky-detection loop and the coverage run from housekeep/SKILL.md � Changed Phase 4a from deps-audit/SKILL.md � Replaced the |
.claude/skills/deps-audit/SKILL.md
Outdated
| **Before running any auto-fix** (in Phase 1/2), save the original manifests so pre-existing unstaged changes are preserved: | ||
| ```bash | ||
| git stash push -m "deps-audit-backup" -- package.json package-lock.json | ||
| ``` | ||
|
|
||
| Summarize all changes made: | ||
| 1. List each package updated/fixed | ||
| 2. Run `npm test` to verify nothing broke | ||
| 3. If tests pass: drop the saved state (`git stash drop`) | ||
| 4. If tests fail: | ||
| - Restore the saved manifests: `git stash pop` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed |
There was a problem hiding this comment.
Stash runs too late — backup is taken after files are already modified
The git stash push is described inside Phase 7, which executes after Phase 1 (npm audit fix) and Phase 2 (npm update) have already modified package.json and package-lock.json. An agent following the phases sequentially will reach Phase 7 with the manifests already changed, making the stash a backup of the modified state rather than the original state. If tests then fail and git stash pop is invoked, it restores the already-modified files — the recovery is a no-op.
The stash command must run inside Phase 0 (immediately after AUTO_FIX=true is detected), before any npm commands touch the manifests. For example, add a step to Phase 0:
4. Parse `$ARGUMENTS` — set `AUTO_FIX=true` if `--fix` is present
5. **If `AUTO_FIX` is set:** stash the original manifests now, before any modifications:
```bash
git stash push -m "deps-audit-backup" -- package.json package-lock.json
Then remove the stash instruction from Phase 7 (keeping only the summary/rollback logic there).
There was a problem hiding this comment.
Fixed — moved the stash command to Phase 0, step 5, so manifests are backed up before any npm commands run. Removed the stash from Phase 7.
| For stale worktrees with merged branches: | ||
| - List them and ask the user for confirmation before removing | ||
| - If confirmed (or `--full` without `--dry-run`): | ||
| ```bash | ||
| git worktree remove <path> | ||
| git branch -d <branch> # only if fully merged | ||
| ``` | ||
|
|
||
| **If `DRY_RUN`:** Just list what would be removed, don't do it. | ||
|
|
||
| > **Never force-remove** a worktree with uncommitted changes. List it as "has uncommitted work" and skip. |
There was a problem hiding this comment.
Default
--full silently removes stale worktrees without user confirmation
The cleanup condition at line 57 reads: "If confirmed (or --full without --dry-run)". Because --full is the default mode (no arguments = full cleanup), every default run of /housekeep will remove stale worktrees without asking. The project requires every Claude Code session to start with /worktree (CLAUDE.md: "Every session must start with /worktree to get an isolated copy of the repo"), meaning there may be active worktrees belonging to concurrent sessions that the "7-day old last commit" heuristic could incorrectly classify as stale — especially when a session hasn't committed anything yet but the branch tip is old.
The "never force-remove a worktree with uncommitted changes" guard (line 65) only catches worktrees with locally-modified files; it does not protect sessions that branched off main, haven't committed anything yet, but have no staged or unstaged changes either.
The confirmation prompt should always be required for worktree removal (regardless of --full), since this is a potentially destructive, irreversible action. Reserve auto-removal only for explicitly prunable worktrees (missing directory — git worktree prune), which are already guaranteed safe:
For stale worktrees with merged branches:
- **Always** list them and ask the user for confirmation before removing
(even with `--full` — worktree deletion is irreversible)
- If confirmed (and not `--dry-run`): ...There was a problem hiding this comment.
Fixed — worktree removal now always requires user confirmation, regardless of --full mode. Only git worktree prune (missing-directory cleanup) remains automatic.
| { | ||
| "savedAt": "<ISO 8601>", | ||
| "version": "<package version>", | ||
| "gitRef": "<HEAD short SHA>", |
There was a problem hiding this comment.
Hardcoded threshold in saved
baseline.json
Line 180 writes "threshold": 15 literally into baseline.json, regardless of the --threshold N argument parsed in Phase 0. If the user runs /bench-check --threshold 20, the stored baseline will record 15, making the artifact misleading when inspected later. While the skill always re-parses the threshold from $ARGUMENTS rather than reading it from the file, an operator debugging a regression from the JSON file will see the wrong value.
Replace the literal with the parsed variable:
| "gitRef": "<HEAD short SHA>", | |
| "threshold": <THRESHOLD>, |
There was a problem hiding this comment.
Fixed — baseline.json now writes "threshold": $THRESHOLD (the parsed value) instead of the hardcoded 15.
| ### First run (no baseline) | ||
| - Print: `BENCH-CHECK — initial baseline saved` | ||
| - Save current results as baseline | ||
|
|
There was a problem hiding this comment.
--compare-only doesn't guard the "First run" baseline save
Phase 4 has two result paths:
- "No regressions found" correctly guards with "If not
COMPARE_ONLY: update baseline." - "First run (no baseline)" has no such guard — it unconditionally saves a baseline.
An agent running /bench-check --compare-only against a repo with no prior baseline will fall through to the "First run" path and save a baseline, contradicting the --compare-only semantics ("compare against baseline without updating it").
Add the same guard to the first-run path:
### First run (no baseline)
- If `COMPARE_ONLY`: print a warning that no baseline exists and exit
- Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baselineThere was a problem hiding this comment.
Fixed — the first-run path now checks COMPARE_ONLY: if set, prints a warning that no baseline exists and exits without saving. Otherwise proceeds to save the initial baseline as before.
.claude/skills/test-health/SKILL.md
Outdated
| ```bash | ||
| for i in $(seq 1 $FLAKY_RUNS); do | ||
| npx vitest run --reporter=json 2>&1 | ||
| done | ||
| ``` | ||
|
|
||
| For each run, parse the JSON reporter output to get per-test results. |
There was a problem hiding this comment.
Flaky detection loop discards output — nothing to parse
The Phase 1 loop runs:
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json 2>&1
doneThe JSON reporter output streams to stdout and is not captured in any variable, so there is no in-memory data for the "parse the JSON reporter output to get per-test results" step immediately below. Each iteration's output will be mixed into the session's terminal output and lost before comparison is possible.
Each run's output must be captured and stored for cross-run comparison. For example:
mkdir -p /tmp/test-health-runs
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json > /tmp/test-health-runs/run-$i.json 2>/tmp/test-health-runs/run-$i.err
doneThen read /tmp/test-health-runs/run-*.json to build the per-test pass/fail matrix.
There was a problem hiding this comment.
Fixed — the flaky-detection loop now redirects each run's output to /tmp/test-health-runs/run-.json (stdout) and /tmp/test-health-runs/run-.err (stderr). The parsing step reads from these per-run files for cross-run comparison.
| If `AUTO_FIX` was set: | ||
|
|
||
| Summarize all changes made: | ||
| 1. List each package updated/fixed | ||
| 2. Run `npm test` to verify nothing broke | ||
| 3. If tests pass: drop the saved state (`git stash drop`) | ||
| 4. If tests fail: | ||
| - Restore the saved manifests: `git stash pop` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed |
There was a problem hiding this comment.
Stash pop/drop operates on wrong entry when Phase 0 stash was a no-op
git stash push -- package.json package-lock.json is a no-op when neither file has any local modifications (the common case — a fresh branch before running the skill). Git outputs "No local changes to save" and exits non-zero without creating a stash entry.
Phase 7 then unconditionally calls either git stash drop (success path) or git stash pop (failure path). With no deps-audit-backup entry on the stack:
- On success:
git stash dropsilently pops a pre-existing, unrelated stash entry (data loss). - On failure:
git stash poprestores the wrong stash, while the brokenpackage.json/package-lock.jsonfrom the failednpm audit fix/npm updaterun remain on disk.
The stash guard in Phase 0 should track whether the stash was actually created:
5. **If `AUTO_FIX` is set:**
```bash
git stash push -m "deps-audit-backup" -- package.json package-lock.json
STASH_CREATED=$?
# STASH_CREATED=0 → stash entry exists; STASH_CREATED=1 → nothing was stashed
Then in Phase 7, only call `git stash drop`/`git stash pop` when `STASH_CREATED` was `0`. When `STASH_CREATED` was `1`, the manifests were already clean and can simply be restored with `git checkout -- package.json package-lock.json` after a test failure.
There was a problem hiding this comment.
Fixed — Phase 0 now captures STASH_CREATED=$? after the stash push. Phase 7 branches on this: when STASH_CREATED=0 (stash was created), uses stash drop/pop as before; when STASH_CREATED=1 (no-op), uses git checkout -- package.json package-lock.json instead. The Rules section was updated to match.
.claude/skills/test-health/SKILL.md
Outdated
| mkdir -p /tmp/test-health-runs | ||
| for i in $(seq 1 $FLAKY_RUNS); do | ||
| npx vitest run --reporter=json > /tmp/test-health-runs/run-$i.json 2>/tmp/test-health-runs/run-$i.err | ||
| done |
There was a problem hiding this comment.
Hardcoded
/tmp/test-health-runs/ path corrupts results under concurrent sessions
CLAUDE.md explicitly states that "Multiple Claude Code instances run concurrently in this repo." If two sessions invoke /test-health simultaneously, both write to the same /tmp/test-health-runs/run-$i.json files. Run files from session A get overwritten by session B mid-loop, and the cross-run comparison in Phase 1 silently mixes results from two independent executions — producing false flaky-test results.
Use an isolated, unique working directory per invocation:
RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX)
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json > "$RUN_DIR/run-$i.json" 2>"$RUN_DIR/run-$i.err"
doneUsing mktemp -d guarantees each invocation gets its own directory, preventing cross-session pollution. Clean up with rm -rf "$RUN_DIR" at the end of Phase 1 (or Phase 5 after the report is written).
There was a problem hiding this comment.
Fixed — Replaced hardcoded /tmp/test-health-runs/ with RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX). All references now use $RUN_DIR. Added rm -rf "$RUN_DIR" cleanup after analysis is complete.
| ## Phase 4 — Verdict | ||
|
|
||
| Based on comparison results: | ||
|
|
||
| ### No regressions found | ||
| - Print: `BENCH-CHECK PASSED — no regressions beyond {THRESHOLD}% threshold` | ||
| - If not `COMPARE_ONLY`: update baseline with current results | ||
|
|
||
| ### Regressions found | ||
| - Print: `BENCH-CHECK FAILED — {N} regressions detected` | ||
| - List each regression with metric name, baseline value, current value, delta % | ||
| - Do NOT update the baseline | ||
| - Suggest investigation: | ||
| - `git log --oneline <baseline-ref>..HEAD` to find what changed | ||
| - `codegraph diff-impact <baseline-ref> -T` to find structural changes | ||
| - Re-run individual benchmarks to confirm (not flaky) | ||
|
|
||
| ### First run (no baseline) | ||
| - If `COMPARE_ONLY` is set: print a warning that no baseline exists and exit without saving | ||
| - Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baseline |
There was a problem hiding this comment.
Missing verdict path when
--save-baseline is passed and a baseline already exists
Phase 3 is correctly skipped when SAVE_ONLY=true, but Phase 4 only defines three verdict paths:
- "No regressions found" — only reachable after a comparison run
- "Regressions found" — only reachable after a comparison run
- "First run (no baseline)" — only applies when no baseline file was found
When --save-baseline is passed against a repo that already has baseline.json, none of these paths apply. The agent has no clear verdict to follow and may fall through to an undefined state (e.g. silently skipping Phase 5 and never saving the baseline, or incorrectly entering the "First run" path even though a baseline was present).
Add a fourth explicit path:
### Save-baseline run (`--save-baseline`)
- Print: `BENCH-CHECK — baseline updated (previous baseline overwritten)`
- Proceed directly to Phase 5 to save the current results as the new baselineThis ensures the skill has a deterministic branch for every combination of flags × baseline-file-present.
There was a problem hiding this comment.
Fixed — Added a 4th explicit verdict path: "Save-baseline with existing baseline (--save-baseline)" which prints BENCH-CHECK — baseline overwritten (previous: <old gitRef>, new: <new gitRef>) and saves current results as the new baseline.
.claude/skills/bench-check/SKILL.md
Outdated
| For each metric in the current run: | ||
|
|
||
| 1. Look up the same metric in the baseline | ||
| 2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as **informational only** (not a regression or improvement) |
There was a problem hiding this comment.
Corrupted em-dash character in "N/A" string
The string "N/A � baseline was zero" contains a corrupted em-dash (the UTF-8 replacement character U+FFFD). This garbled text will appear verbatim in the regression table written to BENCH_REPORT_*.md and in the baseline.json marker, making the report hard to read.
| 2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as **informational only** (not a regression or improvement) | |
| 2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A — baseline was zero"` and treat the metric as **informational only** (not a regression or improvement) |
There was a problem hiding this comment.
Fixed — Replaced the corrupted character with a proper em-dash. The string now reads "N/A — baseline was zero".
| ```bash | ||
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | ||
| git diff --cached --quiet || git commit -m "chore: update bench-check baseline (<gitRef>)" | ||
| ``` |
There was a problem hiding this comment.
Commit command deviates from project's "specific file paths" convention
CLAUDE.md requires: "Commit with specific file paths: git commit <files> -m "msg"". The current Phase 5 does git add <files> followed by a bare git commit -m "...", which stages then commits everything in the index — potentially picking up unrelated staged changes from an ongoing session.
| ```bash | |
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | |
| git diff --cached --quiet || git commit -m "chore: update bench-check baseline (<gitRef>)" | |
| ``` | |
| After writing both files, commit the baseline so it is a shared reference point: | |
| ```bash | |
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | |
| git diff --cached --quiet || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" |
There was a problem hiding this comment.
Fixed — Replaced git add + git commit -m with explicit file paths per project convention: git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "...".
When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists.
…ent corruption Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis.
…plicit commit paths Add 4th verdict path for --save-baseline when baseline already exists. Replace corrupted em-dash character in N/A string. Change commit command to use explicit file paths per project convention.
…ress Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file paths. Updated to reflect actual state: 32 of 269 source modules migrated (~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about 14 stale .js counterparts of already-migrated .ts files needing deletion.
.claude/skills/deps-audit/SKILL.md
Outdated
| 3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`) | ||
| If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists) |
There was a problem hiding this comment.
Success +
STASH_CREATED=1 incorrectly reverts npm changes
When STASH_CREATED=1, it means the files had no pre-existing modifications before the skill ran — the stash push was a no-op. The npm commands then modified package.json/package-lock.json, and tests passed. In this success case the npm changes should be kept, but step 3 runs git checkout -- package.json package-lock.json, which reverts the files to HEAD — silently undoing every fix made by npm audit fix and npm update. This makes --fix a complete no-op on a clean branch.
The git checkout here was correct for the failure path (step 5), not the success path. The success path for STASH_CREATED=1 should be a no-op:
| 3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`) | |
| If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists) | |
| 3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`) | |
| If tests pass and `STASH_CREATED` is `1`: nothing to do — npm changes are already on disk and there is no stash entry to clean up |
There was a problem hiding this comment.
Fixed — the success path (STASH_CREATED=1, tests pass) no longer runs git checkout to discard manifest changes. The npm audit fix changes are good when tests pass, so no action is needed. Only the failure path reverts.
.claude/skills/bench-check/SKILL.md
Outdated
|
|
||
| After writing both files, commit the baseline so it is a shared reference point: | ||
| ```bash | ||
| git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" |
There was a problem hiding this comment.
git diff --quiet misses new files — baseline never committed on first run
git diff [<paths>] (without --cached or a tree-ish) compares the working tree to the index. On the very first run, baseline.json and history.ndjson are freshly created and are untracked — they have no index entry at all. For untracked files, git diff produces no output and exits 0, so the || short-circuits and git commit is never reached. The baseline is written to disk but silently never committed to git, contradicting the rule "The baseline file is committed to git."
Use git diff HEAD to compare the working tree against the last commit, which correctly shows new/untracked files as additions:
| git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" | |
| git diff HEAD --quiet -- generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" |
This exits non-zero for both new files (first run) and modified files (subsequent runs), so the commit fires whenever there is genuinely new content.
There was a problem hiding this comment.
Fixed — replaced git diff --quiet with git add followed by git diff --cached --quiet. This ensures newly created files (first run) are staged and detected, so the baseline commit is not silently skipped.
| **If `DRY_RUN`:** List branches that would be deleted. | ||
|
|
||
| **Otherwise:** Delete merged branches: | ||
| ```bash | ||
| git branch -d <branch> # safe delete, only if fully merged | ||
| ``` | ||
|
|
||
| > **Never use `git branch -D`** (force delete). If `-d` fails, the branch has unmerged work — skip it. |
There was a problem hiding this comment.
Branch deletion silently skips user confirmation unlike worktree removal
Phase 1c (worktree removal) always requires user confirmation, even for clearly stale worktrees. Phase 4d ("Otherwise: Delete merged branches") runs git branch -d immediately with no equivalent confirmation step. While git branch -d is safe (refuses to delete unmerged work), users may still want to review the list of branches about to be pruned — especially in a repo where multiple concurrent sessions may have feature branches.
Consider adding a brief "List branches to be deleted and ask for confirmation" step before the git branch -d loop, consistent with Phase 1c's pattern.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed — branch deletion now requires user confirmation before each delete, consistent with worktree removal in Phase 1c.
When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix.
.claude/skills/bench-check/SKILL.md
Outdated
|
|
||
| ## Phase 6 — Report | ||
|
|
||
| **Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed.** That case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved. |
There was a problem hiding this comment.
ABORTED +
--compare-only + no baseline silently produces no report
When all of the following are true simultaneously:
COMPARE_ONLY=true- No baseline exists (first run)
- All benchmark suites fail or timeout (ABORTED)
The flow is:
- Phase 4 ABORTED pre-condition fires: "Stop here and skip to Phase 6"
- Phase 6 skip guard: "Skip this phase if
COMPARE_ONLYwas set and no baseline existed"
Because the ABORTED pre-condition is orthogonal to the baseline existence check, the Phase 6 skip guard fires on the ABORTED result too. The agent writes no report and prints no ABORTED message — the run silently completes with no output. This is particularly confusing in CI contexts where a broken benchmarking environment would go undetected.
The Phase 6 skip guard should be narrowed to exclude the ABORTED case:
| **Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed.** That case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved. | |
| **Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed, AND the ABORTED pre-condition was not triggered.** That early-exit case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved. When ABORTED, always write the ABORTED report regardless of other flags. |
There was a problem hiding this comment.
Fixed — added explicit ABORTED skip guard to Phase 5, and narrowed the Phase 6 skip condition to exclude the ABORTED case. When ABORTED, Phase 5 is now skipped (no empty baseline written) and Phase 6 always writes the ABORTED report regardless of COMPARE_ONLY or baseline state.
| - If the pop applies cleanly: | ||
| a. Run `npm install` to re-sync `node_modules/` with the merged manifest. | ||
| b. Re-run `npm test` to confirm nothing broke with the merged dependency state. | ||
| c. If tests still pass: confirm the project is consistent. | ||
| d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes. |
There was a problem hiding this comment.
No recovery path when tests fail after clean-pop +
npm install
When the stash pops cleanly and npm install re-syncs node_modules/, the skill re-runs npm test. If those tests fail (step d), the skill warns the user that pre-existing changes conflict with the audit fixes — but the stash has already been consumed by git stash pop. There is no way to return to either prior state:
- The stash entry is gone, so the pre-existing manifest state cannot be automatically restored.
npm audit fix/npm updatechanges are already merged with the pre-existing changes in the working tree.
Without a recovery path, the user is left with a mixed, broken state and must manually reconstruct which changes to keep. Add explicit recovery guidance:
d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.
Recovery options:
- To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci`
- To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci`
- To keep only the pre-existing changes and discard the audit fixes: re-run `/deps-audit` without `--fix`There was a problem hiding this comment.
Fixed — step 169d now lists three explicit recovery options: undo all changes (git checkout + npm ci), keep only audit fixes (manual edit + npm ci), or keep only pre-existing changes (re-run without --fix).
…Script (Phase 5.4) (#579) * feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep Four recurring maintenance routines as Claude Code skills: - /deps-audit: vulnerability scanning, staleness, unused deps, license checks - /bench-check: benchmark regression detection against saved baselines - /test-health: flaky test detection, dead tests, coverage gap analysis - /housekeep: clean worktrees, dirt files, sync main, prune branches * fix(bench-check): capture stderr, guard division-by-zero, commit baseline - Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all four benchmark invocations so error messages are captured and recorded - Add division-by-zero guard in Phase 3: when baseline == 0, mark delta as "N/A — baseline was zero" (informational only, not a regression) - Add git add + git commit step in Phase 5 so the baseline file is actually committed after each save, matching the documented rule * fix(deps-audit): run npm ci after revert, document tokenizer skip reason - After reverting package.json + package-lock.json on --fix test failure, also run `npm ci` to resync node_modules/ with the restored lock file; without this the manifest is reverted but installed packages are not - Add explanatory comment on @anthropic-ai/tokenizer skip-list entry clarifying it is a peer dependency of @anthropic-ai/sdk and may be required at runtime without an explicit import in our code * fix(housekeep): guard Phase 5 in source repo, fix stale-worktree criterion - Phase 5 (Update Codegraph): add source-repo guard that skips the self-update logic when running inside the codegraph source repo; comparing the dev version to the published release and running npm install is a no-op since codegraph is not one of its own deps - Phase 1b stale-worktree criterion: replace "created more than 7 days ago" (not determinable via git worktree list) with "last commit on the branch is more than 7 days old AND branch has no commits ahead of origin/main", using `git log -1 --format=%ci <branch>` * fix: address Round 3 Greptile review feedback * fix: move deps-audit stash to Phase 0, before npm commands modify manifests * fix: capture flaky-detection loop output to per-run files for comparison * fix: always require confirmation for stale worktree removal * fix: use parsed threshold in baseline.json, guard --compare-only on first run * fix(deps-audit): track stash creation to avoid operating on wrong entry When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists. * fix(test-health): use mktemp for flaky-run directory to avoid concurrent corruption Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis. * fix(bench-check): add save-baseline verdict path, fix em-dash, use explicit commit paths Add 4th verdict path for --save-baseline when baseline already exists. Replace corrupted em-dash character in N/A string. Change commit command to use explicit file paths per project convention. * docs(roadmap): update Phase 5 TypeScript migration with accurate progress Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file paths. Updated to reflect actual state: 32 of 269 source modules migrated (~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about 14 stale .js counterparts of already-migrated .ts files needing deletion. * fix: deps-audit success path should keep npm changes, not revert (#565) When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix. * fix: bench-check use git add + diff --cached to detect new files (#565) git diff --quiet ignores untracked files, so on the first run when baseline.json and history.ndjson are newly created, the commit was skipped. Stage first with git add, then check with --cached. * fix: housekeep require confirmation before branch deletion (#565) Branch deletion now asks for user confirmation before each delete, consistent with worktree removal in Phase 1c. * fix: scope git diff --cached to bench-check files only (#565) * fix: use json-summary reporter to match coverage-summary.json output (#565) * fix: capture stash ref by name to avoid position-based targeting (#565) * fix: remove unreachable Phase 5 subphases since source-repo guard always skips (#565) * fix: use dynamic threshold variable in bench-check Phase 6 report template (#565) * fix: address open review items in maintenance skills (#565) - bench-check: add timeout 300 wrappers to all 4 benchmark invocations with exit code 124 check for timeout detection - bench-check: add explicit COMPARE_ONLY guard at Phase 5 entry - housekeep: fix grep portability — use grep -cE instead of GNU \| syntax - test-health: add timeout 180 wrapper in flaky detection loop - test-health: fix find command -o precedence with grouping parentheses * fix: add COVERAGE_ONLY guards to Phase 2 and Phase 4 in test-health * fix: add regression skip guard to bench-check Phase 5, expand deps-audit search dirs * fix: add empty-string guard for stat size check in housekeep (#565) When both stat variants (GNU and BSD) fail, $size is empty and the arithmetic comparison errors out. Add a [ -z "$size" ] && continue guard so the loop skips files whose size cannot be determined. * fix: add BASELINE SAVED verdict path and clarify if/else-if in bench-check (#565) Phase 6: when SAVE_ONLY or first-run (no prior baseline), write a shortened report with "Verdict: BASELINE SAVED" instead of the full comparison report. Phases 1a-1d: replace ambiguous "If timeout / If non-zero" with explicit "If timeout / Else if non-zero" so the two conditions are clearly mutually exclusive. * docs(roadmap): mark Phase 4 complete, update Phase 5 progress (5 of 7) Phase 4 (Resolution Accuracy) had all 6 sub-phases merged but status still said "In Progress". Phase 5 (TypeScript Migration) had 5.3-5.5 merged via PRs #553, #554, #555, #566 but was listed with stale counts. Updated both to reflect actual state: Phase 4 complete, Phase 5 at 5/7 with 76 of 283 modules migrated (~27%). * docs(roadmap): correct Phase 5 progress — 5.3/5.4/5.5 still in progress Previous commit incorrectly marked 5.3-5.5 as complete. In reality 76 of 283 src files are .ts (~27%) while 207 remain .js (~73%). PRs #553, #554, #555, #566 migrated a first wave but left substantial work in each step: 4 leaf files, 39 core files, 159 orchestration files. Updated each step with accurate migrated/remaining counts. * fix(skill): ban untracked deferrals in /review skill The /review skill allowed replying "acknowledged as follow-up" to reviewer comments without tracking them anywhere. These deferrals get lost — nobody revisits PR comment threads after merge. Now: if a fix is genuinely out of scope, the skill must create a GitHub issue with the follow-up label before replying. The reply must include the issue link. A matching rule in the Rules section reinforces the ban. * feat(types): migrate db, graph algorithms/builders, and domain/queries to TypeScript (Phase 5.5) Migrate 19 remaining JS files to TypeScript across db/, graph/, and domain/: - db/: connection, migrations, query-builder, index barrel - graph/algorithms/leiden/: adapter, cpm, modularity, optimiser, partition, index - graph/algorithms/: louvain, index barrel - graph/builders/: dependency, structure, temporal, index barrel - graph/classifiers/: index barrel - graph/: index barrel - domain/: queries barrel Key type additions: - GraphAdapter, Partition, DetectClustersResult interfaces for Leiden - LockedDatabase type for advisory-locked DB instances - DependencyGraphOptions, TemporalGraphOptions for graph builders - Generic Statement<TRow> in vendor.d.ts for type-safe DB queries Also fixes pre-existing type errors in module-map.ts (untyped prepare calls) and generator.ts (null vs undefined argument). * feat(types): migrate builder stages, search, and graph domain to TypeScript (Phase 5.4) Migrate 24 JS files to strict TypeScript across three domains: - domain/graph/builder/stages/ (9 files): all build pipeline stages - domain/graph/builder/incremental.ts + cycles.ts + journal.ts + change-journal.ts - domain/search/search/ (6 files): hybrid, semantic, keyword, filters, prepare, cli-formatter - domain/search/stores/ (2 files): fts5, sqlite-blob - domain/search/strategies/ (3 files): source, structured, text-utils Key type fixes: ParseChange stat type alignment (mtimeMs→mtime conversion), IncrementalStmts duck-typed interfaces for watcher compatibility, NodeWithId relaxed to Pick<NodeRow> for partial node queries, ESCAPE clause backslash fix. Passes tsc --noEmit with zero errors. No test regressions. * chore: delete stale .js files replaced by TypeScript counterparts Remove cycles.js and sqlite-blob.js — their .ts replacements were committed in the previous commit. * fix(types): annotate implicit any parameters in resolve-imports barrel handling The `parseFilesAuto` return type is `Map<string, any>`, causing `imp` parameters in `.filter()` and `.map()` callbacks to have implicit `any` type — failing strict TypeScript checks in CI.
.claude/skills/bench-check/SKILL.md
Outdated
| @@ -227,6 +237,22 @@ Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`. | |||
| <!-- Full JSON output from each benchmark --> | |||
| ``` | |||
|
|
|||
| **If the ABORTED pre-condition was triggered (no valid benchmark results):** write a minimal report: | |||
|
|
|||
| ```markdown | |||
| # Benchmark Report — <date> | |||
|
|
|||
| **Version:** X.Y.Z | **Git ref:** abc1234 | **Threshold:** $THRESHOLD% | |||
|
|
|||
| ## Verdict: ABORTED — no valid benchmark results | |||
|
|
|||
| All benchmark suites failed or timed out. See Phase 1 error records for details. | |||
|
|
|||
| ## Raw Results | |||
|
|
|||
| <!-- Error/timeout records from each suite --> | |||
| ``` | |||
There was a problem hiding this comment.
ABORTED branch checked after SAVE_ONLY — misleading "BASELINE SAVED" report
Phase 6 checks conditions in order. The SAVE_ONLY / first-run branch fires before the ABORTED branch, so when both conditions are true the wrong report is emitted:
Scenario A — --save-baseline + all benchmarks fail:
SAVE_ONLY=true,metrics={}, ABORTED fired- Phase 5 skipped (ABORTED guard — no baseline written)
- Phase 6:
SAVE_ONLY is set→ true → writes "Verdict: BASELINE SAVED — no comparison performed" - But no baseline was actually saved.
Scenario B — First run (no baseline) + all benchmarks fail:
- No baseline exists,
metrics={}, ABORTED fired - Phase 5 skipped (ABORTED guard)
- Phase 6:
no prior baseline existed (first run)→ true → writes "Verdict: BASELINE SAVED" - Again, no baseline was saved.
The ABORTED check must come before the SAVE_ONLY / first-run check so it can short-circuit:
**If the ABORTED pre-condition was triggered (no valid benchmark results):** write a minimal report:
...
**If `SAVE_ONLY` is set or no prior baseline existed (first run):** write a shortened report — ...
**Otherwise (comparison was performed):** write the full report ...There was a problem hiding this comment.
Fixed — moved the ABORTED condition check before SAVE_ONLY/first-run in Phase 6, so when all benchmarks fail on a --save-baseline or first run, the correct ABORTED report is written instead of a misleading "BASELINE SAVED" report.
There was a problem hiding this comment.
Already fixed in prior commit 7aab540 — Phase 6 already checks ABORTED before SAVE_ONLY/first-run, preventing the misleading BASELINE SAVED report when all benchmarks fail.
.claude/skills/housekeep/SKILL.md
Outdated
| if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then | ||
| if [ "$DRY_RUN" = "true" ]; then | ||
| echo "[DRY RUN] Would remove stale lock: $f" | ||
| else | ||
| echo "Removing stale lock: $f" | ||
| rm "$f" |
There was a problem hiding this comment.
lsof unavailable treated as "no process holds the file"
When lsof is not installed (common in minimal Docker/CI containers), lsof "$f" exits with code 127 ("command not found"). The condition ! lsof "$f" > /dev/null 2>&1 becomes true, so the stale lock file is silently deleted — even though we have no information about whether a process holds it. This is the opposite of the intended "require user confirmation when check is unavailable" behaviour stated in the prose above the snippet.
The elif branch ("file is held — ask user") is only reached when lsof exits 0 (found the file). It never fires for exit code 127.
Add an explicit lsof availability check before the condition:
for f in .codegraph/*.lock; do
[ -f "$f" ] || continue
age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) ))
[ -z "$age" ] && continue
if [ "$age" -gt 3600 ]; then
if ! command -v lsof > /dev/null 2>&1; then
echo "Lock file $f is old but cannot verify if held (lsof unavailable) — ask user before removing"
elif ! lsof "$f" > /dev/null 2>&1; then
if [ "$DRY_RUN" = "true" ]; then
echo "[DRY RUN] Would remove stale lock: $f"
else
echo "Removing stale lock: $f"
rm "$f"
fi
else
echo "Lock file $f is old but still held by a process — ask user before removing"
fi
fi
doneThis correctly separates three cases: lsof not installed → ask user; lsof finds no holder → safe to delete; lsof finds a holder → ask user.
There was a problem hiding this comment.
Fixed in 933c0d3 — the lock file removal now explicitly checks command -v lsof before attempting to use it. If lsof is not installed (exit 127 in Docker/CI), lock file removal is skipped entirely with a warning, rather than assuming no process holds the file.
| 4. If tests fail and `STASH_REF` is non-empty: | ||
| - Restore the saved manifests: `git stash pop $STASH_REF` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed |
There was a problem hiding this comment.
Failure-path
git stash pop applied to npm-modified manifests will conflict
When tests fail and STASH_REF is non-empty, the stash was created from:
working tree = HEAD + pre-existing changes
By this point in step 4, the working tree contains:
HEAD + npm's modifications (audit fix / update)
git stash pop $STASH_REF applies the stash as a patch on top of the current state — it does not restore the working tree to a previous snapshot. Since the stash patch (pre-existing changes) and the current state (npm modifications) both modify package.json/package-lock.json from the same base (HEAD), git will almost certainly report conflicts, leaving the manifests with conflict markers and node_modules/ in an undefined state.
The correct two-step restore is:
- First, reset the manifest files to HEAD (undoing npm's changes):
git checkout HEAD -- package.json package-lock.json - Then re-apply the pre-existing changes cleanly from HEAD:
git stash pop $STASH_REF
At step 2 the working tree matches HEAD, so the stash applies exactly as it was originally created — no conflicts.
Suggested replacement for step 4:
4. If tests fail and `STASH_REF` is non-empty:
- Reset manifests to HEAD first (undoes npm changes):
`git checkout HEAD -- package.json package-lock.json`
- Then re-apply the pre-existing changes cleanly:
`git stash pop $STASH_REF`
- Restore `node_modules/` to match the reverted lock file: `npm ci`
- Report what failedNote that the success path (step 3) intentionally does a merge (pop on the npm-modified state) to preserve both sets of changes. The failure path semantics are different — we want a full restore — so the merge approach is wrong here.
There was a problem hiding this comment.
Fixed — the failure path now resets manifests to HEAD first (git checkout HEAD -- package.json package-lock.json) before popping the stash. This ensures the stash applies cleanly against the same base it was created from, avoiding conflicts. Also updated the Rules section to match.
There was a problem hiding this comment.
Fixed in prior commit 0b08a2b — the failure path already resets manifests to HEAD first (git checkout HEAD -- package.json package-lock.json) before popping the stash, preventing conflicts between npm-modified and stashed versions.
…565) When tests fail after npm audit fix, the working tree has npm's modifications. Popping the stash directly would conflict because both sides modified package.json from the same base. Reset to HEAD first so the stash applies cleanly.
.claude/skills/deps-audit/SKILL.md
Outdated
| d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes. | ||
| Recovery options: | ||
| - To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci` | ||
| - To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci` | ||
| - To keep only the pre-existing changes and discard the audit fixes: re-run `/deps-audit` without `--fix` |
There was a problem hiding this comment.
Recovery option 3 cannot achieve its stated goal — stash already consumed
At the point where step 3d fires (clean pop succeeded → npm install ran → npm test failed), the stash entry for deps-audit-backup was already consumed by git stash pop $STASH_REF in step 3. The pre-existing changes are now merged into the working tree alongside the npm audit fixes.
Recovery option 3 reads:
"To keep only the pre-existing changes and discard the audit fixes: re-run
/deps-auditwithout--fix"
This advice is incorrect. Re-running the skill without --fix audits the current (mixed) manifest — it performs no rollback. The npm fixes remain in the working tree untouched. The user is left with the same mixed state, potentially misled into thinking re-running the skill undid the audit changes.
The correct recovery for "keep only pre-existing changes" at this point is manual, since the stash was consumed:
- To keep only the pre-existing changes and discard the audit fixes:
1. Run `git diff HEAD -- package.json package-lock.json` to identify the full set of changes
(this shows npm fixes AND pre-existing changes merged together)
2. Manually edit `package.json`/`package-lock.json` to remove the audit-fix deltas, keeping only your original changes
3. Run `npm ci` to restore `node_modules/` to match the edited lock file
— Note: the pre-existing stash entry was consumed by the `git stash pop` above; it cannot be re-appliedAlternatively, the cleanest approach is:
- To start fresh (discard everything — both audit fixes and pre-existing changes):
`git checkout HEAD -- package.json package-lock.json && npm ci`
Then re-apply your original changes manually (the stash entry is gone).The current option 3 text should be corrected or removed to avoid misdirecting the user.
There was a problem hiding this comment.
Fixed in 005f806 — replaced the unachievable "re-run /deps-audit without --fix" with accurate manual recovery steps: \Unknown command: ci"
…#565) When lsof is not installed (Docker/CI), it exits 127 and the lock file would be deleted assuming no process holds it. Now explicitly checks command -v lsof first and skips lock file removal entirely if unavailable.
Recovery option 3 said "re-run /deps-audit without --fix" but at that point the stash is already consumed, so re-running doesn't undo npm changes. Replaced with accurate manual recovery steps: revert manifests to HEAD with git checkout and npm ci, then re-apply pre-existing changes.
#565) When a run produces {"timeout":true} or {"error":true}, it has no per-test data. If counted as "all tests failed", every stable test looks flaky. Now skips invalid runs and requires minimum 2 valid runs for flaky detection to be conclusive.
|
Addressed latest Greptile feedback:
|
.claude/skills/housekeep/SKILL.md
Outdated
| ```bash | ||
| for f in .codegraph/*.lock; do | ||
| [ -f "$f" ] || continue | ||
| age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) )) | ||
| [ -z "$age" ] && continue | ||
| if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then | ||
| if [ "$DRY_RUN" = "true" ]; then | ||
| echo "[DRY RUN] Would remove stale lock: $f" | ||
| else | ||
| echo "Removing stale lock: $f" | ||
| rm "$f" | ||
| fi | ||
| elif [ "$age" -gt 3600 ]; then | ||
| echo "Lock file $f is old but still held by a process — ask user before removing" | ||
| fi | ||
| done |
There was a problem hiding this comment.
command -v lsof guard described in prose but absent from code snippet
The prose above the snippet correctly states: "Before removing, first check if lsof is available (command -v lsof). If lsof is not installed … skip lock file removal entirely."
However, the code snippet itself has no such guard. The if condition on line 93 is:
if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; thenWhen lsof is absent (exits 127 — command not found), ! lsof "$f" evaluates to true (because 127 is non-zero). The entire if branch fires and rm "$f" executes — exactly the unsafe behavior the prose was meant to prevent. An agent following the code block literally, rather than the preceding prose, will silently delete stale lock files in Docker/CI containers without any process-ownership check.
The command -v lsof guard needs to be embedded in the code snippet itself so both paths are consistently enforced:
if ! command -v lsof > /dev/null 2>&1; then
echo "lsof not available — skipping lock file cleanup (cannot verify no process holds the file)"
else
for f in .codegraph/*.lock; do
[ -f "$f" ] || continue
age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) ))
[ -z "$age" ] && continue
if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then
if [ "$DRY_RUN" = "true" ]; then
echo "[DRY RUN] Would remove stale lock: $f"
else
echo "Removing stale lock: $f"
rm "$f"
fi
elif [ "$age" -gt 3600 ]; then
echo "Lock file $f is old but still held by a process — ask user before removing"
fi
done
fiThere was a problem hiding this comment.
Already fixed in prior commit — the code snippet at lines 89-107 already wraps the entire lock file loop inside if ! command -v lsof > /dev/null 2>&1, matching the suggested fix exactly. The prose and code are now consistent.
|
Fixed the housekeep SKILL.md lock file removal code snippet — embedded the |
* feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep Four recurring maintenance routines as Claude Code skills: - /deps-audit: vulnerability scanning, staleness, unused deps, license checks - /bench-check: benchmark regression detection against saved baselines - /test-health: flaky test detection, dead tests, coverage gap analysis - /housekeep: clean worktrees, dirt files, sync main, prune branches * fix(bench-check): capture stderr, guard division-by-zero, commit baseline - Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all four benchmark invocations so error messages are captured and recorded - Add division-by-zero guard in Phase 3: when baseline == 0, mark delta as "N/A — baseline was zero" (informational only, not a regression) - Add git add + git commit step in Phase 5 so the baseline file is actually committed after each save, matching the documented rule * fix(deps-audit): run npm ci after revert, document tokenizer skip reason - After reverting package.json + package-lock.json on --fix test failure, also run `npm ci` to resync node_modules/ with the restored lock file; without this the manifest is reverted but installed packages are not - Add explanatory comment on @anthropic-ai/tokenizer skip-list entry clarifying it is a peer dependency of @anthropic-ai/sdk and may be required at runtime without an explicit import in our code * fix(housekeep): guard Phase 5 in source repo, fix stale-worktree criterion - Phase 5 (Update Codegraph): add source-repo guard that skips the self-update logic when running inside the codegraph source repo; comparing the dev version to the published release and running npm install is a no-op since codegraph is not one of its own deps - Phase 1b stale-worktree criterion: replace "created more than 7 days ago" (not determinable via git worktree list) with "last commit on the branch is more than 7 days old AND branch has no commits ahead of origin/main", using `git log -1 --format=%ci <branch>` * fix: address Round 3 Greptile review feedback * fix: move deps-audit stash to Phase 0, before npm commands modify manifests * fix: capture flaky-detection loop output to per-run files for comparison * fix: always require confirmation for stale worktree removal * fix: use parsed threshold in baseline.json, guard --compare-only on first run * fix(deps-audit): track stash creation to avoid operating on wrong entry When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists. * fix(test-health): use mktemp for flaky-run directory to avoid concurrent corruption Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis. * fix(bench-check): add save-baseline verdict path, fix em-dash, use explicit commit paths Add 4th verdict path for --save-baseline when baseline already exists. Replace corrupted em-dash character in N/A string. Change commit command to use explicit file paths per project convention. * docs(roadmap): update Phase 5 TypeScript migration with accurate progress Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file paths. Updated to reflect actual state: 32 of 269 source modules migrated (~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about 14 stale .js counterparts of already-migrated .ts files needing deletion. * fix: deps-audit success path should keep npm changes, not revert (#565) When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix. * fix: bench-check use git add + diff --cached to detect new files (#565) git diff --quiet ignores untracked files, so on the first run when baseline.json and history.ndjson are newly created, the commit was skipped. Stage first with git add, then check with --cached. * fix: housekeep require confirmation before branch deletion (#565) Branch deletion now asks for user confirmation before each delete, consistent with worktree removal in Phase 1c. * fix: scope git diff --cached to bench-check files only (#565) * fix: use json-summary reporter to match coverage-summary.json output (#565) * fix: capture stash ref by name to avoid position-based targeting (#565) * fix: remove unreachable Phase 5 subphases since source-repo guard always skips (#565) * fix: use dynamic threshold variable in bench-check Phase 6 report template (#565) * fix: address open review items in maintenance skills (#565) - bench-check: add timeout 300 wrappers to all 4 benchmark invocations with exit code 124 check for timeout detection - bench-check: add explicit COMPARE_ONLY guard at Phase 5 entry - housekeep: fix grep portability — use grep -cE instead of GNU \| syntax - test-health: add timeout 180 wrapper in flaky detection loop - test-health: fix find command -o precedence with grouping parentheses * fix: add COVERAGE_ONLY guards to Phase 2 and Phase 4 in test-health * fix: add regression skip guard to bench-check Phase 5, expand deps-audit search dirs * fix: add empty-string guard for stat size check in housekeep (#565) When both stat variants (GNU and BSD) fail, $size is empty and the arithmetic comparison errors out. Add a [ -z "$size" ] && continue guard so the loop skips files whose size cannot be determined. * fix: add BASELINE SAVED verdict path and clarify if/else-if in bench-check (#565) Phase 6: when SAVE_ONLY or first-run (no prior baseline), write a shortened report with "Verdict: BASELINE SAVED" instead of the full comparison report. Phases 1a-1d: replace ambiguous "If timeout / If non-zero" with explicit "If timeout / Else if non-zero" so the two conditions are clearly mutually exclusive. * docs(roadmap): mark Phase 4 complete, update Phase 5 progress (5 of 7) Phase 4 (Resolution Accuracy) had all 6 sub-phases merged but status still said "In Progress". Phase 5 (TypeScript Migration) had 5.3-5.5 merged via PRs #553, #554, #555, #566 but was listed with stale counts. Updated both to reflect actual state: Phase 4 complete, Phase 5 at 5/7 with 76 of 283 modules migrated (~27%). * docs(roadmap): correct Phase 5 progress — 5.3/5.4/5.5 still in progress Previous commit incorrectly marked 5.3-5.5 as complete. In reality 76 of 283 src files are .ts (~27%) while 207 remain .js (~73%). PRs #553, #554, #555, #566 migrated a first wave but left substantial work in each step: 4 leaf files, 39 core files, 159 orchestration files. Updated each step with accurate migrated/remaining counts. * fix(skill): ban untracked deferrals in /review skill The /review skill allowed replying "acknowledged as follow-up" to reviewer comments without tracking them anywhere. These deferrals get lost — nobody revisits PR comment threads after merge. Now: if a fix is genuinely out of scope, the skill must create a GitHub issue with the follow-up label before replying. The reply must include the issue link. A matching rule in the Rules section reinforces the ban. * fix(skill): add --repo flag, multi-endpoint reply for deferrals, and scope guidance (#568) * fix(skill): guard follow-up label creation before gh issue create (#568) gh issue create --label "follow-up" fails if the label doesn't exist in the repo. Add a gh label create guard step that is safe to re-run. * feat(skill): parallelize /review with one subagent per PR * fix: correct heredoc terminator indentation in review skill (#568) * fix(skill): capture gh issue create output before referencing issue number gh issue create prints the new issue URL to stdout — capture it and extract the number so reply templates can reference it unambiguously. * fix(skill): surface follow-up issues in review result format and summary table Add "Issues Created" field to the subagent result format and an "Issues" column to the Step 3 summary table, so deferred out-of-scope items are visible in the final report. * fix(skill): require verbatim rule propagation to subagents * fix(skill): align Issues field name between 2i result and Step 3 table (#568)
.claude/skills/deps-audit/SKILL.md
Outdated
| 3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`) — this restores any pre-existing uncommitted changes alongside the npm fix results. | ||
| - If the pop applies cleanly: | ||
| a. Run `npm install` to re-sync `node_modules/` with the merged manifest. | ||
| b. Re-run `npm test` to confirm nothing broke with the merged dependency state. | ||
| c. If tests still pass: confirm the project is consistent. | ||
| d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes. | ||
| Recovery options: | ||
| - To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci` | ||
| - To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci` | ||
| - To keep only the pre-existing changes and discard the audit fixes: `git checkout HEAD -- package.json package-lock.json && npm ci` to revert manifests to their clean state, then manually re-apply only your pre-existing changes | ||
| - If the pop causes conflicts in `package.json`/`package-lock.json`: warn the user, leave conflict markers for manual resolution, and instruct: "After you resolve the conflicts, run `npm install` to re-sync `node_modules/` with the resolved lock file before committing." | ||
| - For conflicts in other files, resolve them by keeping both the npm fixes and the pre-existing changes. | ||
| If tests pass and `STASH_REF` is empty: no action needed — the npm changes are good and no stash entry exists to clean up |
There was a problem hiding this comment.
Initial
npm test result may be stale after clean pop + npm install
Step 2 runs npm test before the stash pop. Step 3 then says "If tests pass and STASH_REF is non-empty" — but the test run that passed in step 2 was against the npm-audited manifest WITHOUT the pre-existing stash changes applied. After the clean pop and npm install in step 3a, node_modules/ may differ from what step 2's tests used. The skill correctly re-runs npm test in step 3b to catch this, which is good.
However, the prose in step 3 is ordered as: "If tests pass … pop and merge … npm install … re-run npm test." An agent could interpret this as: "tests already passed, so pop and install, then re-test as a formality." The intent should be clearer that the step-2 test result is only a prerequisite for attempting the pop — the step-3b re-test is the authoritative result that determines whether the final state is consistent.
Consider making this explicit:
| 3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`) — this restores any pre-existing uncommitted changes alongside the npm fix results. | |
| - If the pop applies cleanly: | |
| a. Run `npm install` to re-sync `node_modules/` with the merged manifest. | |
| b. Re-run `npm test` to confirm nothing broke with the merged dependency state. | |
| c. If tests still pass: confirm the project is consistent. | |
| d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes. | |
| Recovery options: | |
| - To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci` | |
| - To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci` | |
| - To keep only the pre-existing changes and discard the audit fixes: `git checkout HEAD -- package.json package-lock.json && npm ci` to revert manifests to their clean state, then manually re-apply only your pre-existing changes | |
| - If the pop causes conflicts in `package.json`/`package-lock.json`: warn the user, leave conflict markers for manual resolution, and instruct: "After you resolve the conflicts, run `npm install` to re-sync `node_modules/` with the resolved lock file before committing." | |
| - For conflicts in other files, resolve them by keeping both the npm fixes and the pre-existing changes. | |
| If tests pass and `STASH_REF` is empty: no action needed — the npm changes are good and no stash entry exists to clean up | |
| 3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`) — this restores any pre-existing uncommitted changes alongside the npm fix results. Note: the step 2 test run validated the npm changes alone; step 3b is the authoritative test of the final merged state. | |
| - If the pop applies cleanly: | |
| a. Run `npm install` to re-sync `node_modules/` with the merged manifest. | |
| b. Re-run `npm test` to confirm the merged state is consistent (this is the authoritative check). | |
| c. If tests still pass: confirm the project is consistent. | |
| d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes. |
There was a problem hiding this comment.
Fixed — added explicit note that the step 2 test run validates npm changes alone, and that step 3b is the authoritative test of the final merged state. The step 3b description now says '(this is the authoritative check — step 2 only validated the npm changes in isolation)'.
Summary
/deps-audit— Audit dependencies for vulnerabilities (npm audit), staleness (npm outdated), unused packages, and license compliance. Produces a scored health report. Optional--fixfor safe auto-updates./bench-check— Run benchmark suite against a saved baseline, detect regressions beyond a configurable threshold, maintain a history log for trend tracking. Guards against silent performance degradation./test-health— Detect flaky tests (multi-run), dead/trivial tests (no assertions), coverage gaps on core modules, and structural issues (oversized files, missing cleanup). Read-only audit with prioritized fix suggestions./housekeep— Local repo spring cleaning: prune stale worktrees, delete temp/dirt files, sync with main, update codegraph, prune merged branches, verify graph and node_modules integrity. Supports--dry-run.Test plan
/deps-auditand verify report is generated ingenerated/deps-audit//deps-audit --fixand verify safe updates are applied, tests pass/bench-check --save-baselineto create initial baseline/bench-checkagain to verify comparison works/test-health --quickand verify report ingenerated/test-health//test-health(full) and verify flaky detection runs N times/housekeep --dry-runand verify no modifications are made/housekeepand verify cleanup actions