feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep#565

Merged
carlos-alm merged 66 commits into main from feat/maintenance-skills
Mar 24, 2026

@carlos-alm
Contributor

Summary

  • /deps-audit — Audit dependencies for vulnerabilities (npm audit), staleness (npm outdated), unused packages, and license compliance. Produces a scored health report. Optional --fix for safe auto-updates.
  • /bench-check — Run benchmark suite against a saved baseline, detect regressions beyond a configurable threshold, maintain a history log for trend tracking. Guards against silent performance degradation.
  • /test-health — Detect flaky tests (multi-run), dead/trivial tests (no assertions), coverage gaps on core modules, and structural issues (oversized files, missing cleanup). Read-only audit with prioritized fix suggestions.
  • /housekeep — Local repo spring cleaning: prune stale worktrees, delete temp/dirt files, sync with main, update codegraph, prune merged branches, verify graph and node_modules integrity. Supports --dry-run.

Test plan

  • Run /deps-audit and verify report is generated in generated/deps-audit/
  • Run /deps-audit --fix and verify safe updates are applied, tests pass
  • Run /bench-check --save-baseline to create initial baseline
  • Run /bench-check again to verify comparison works
  • Run /test-health --quick and verify report in generated/test-health/
  • Run /test-health (full) and verify flaky detection runs N times
  • Run /housekeep --dry-run and verify no modifications are made
  • Run /housekeep and verify cleanup actions

…housekeep

Four recurring maintenance routines as Claude Code skills:
- /deps-audit: vulnerability scanning, staleness, unused deps, license checks
- /bench-check: benchmark regression detection against saved baselines
- /test-health: flaky test detection, dead tests, coverage gap analysis
- /housekeep: clean worktrees, dirt files, sync main, prune branches
@claude

claude bot commented Mar 21, 2026

Claude finished @carlos-alm's task


PR Review: Maintenance Skills

Reviewing this PR that adds 4 new maintenance skills for Claude Code.

Todo List:

  • Read and analyze all 4 skill files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity
  • Phase 4: Code quality assessment
  • Phase 5: Backlog compliance evaluation
  • Provide final structured assessment

Root Cause Analysis

Problem Being Solved: This PR addresses maintenance debt and technical hygiene gaps that accumulate in active development repositories. The skills automate routine maintenance tasks that are typically done manually or inconsistently.

Why This Approach: These are documentation-only additions that define Claude Code skills (not executable code), following the established .claude/skills/ pattern. Each skill is comprehensive and well-structured with clear phases and safety rules.

Risk Assessment: Low risk - these are pure documentation files that define maintenance workflows. No code changes, no dependencies added.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies; builds on existing npm/node/git/codegraph infrastructure
  • Foundation-aligned: ✓ — Aligns with P8 (optimize for planning/developing/refactoring); maintenance automation supports core development workflow
  • Problem-fit: 2 — Useful maintenance automation but doesn't directly address the "lost AI in large codebases" core problem
  • Breaking: No — Purely additive documentation files
  • Tier: 1

Critical Concerns

None identified. The skills are:

  • Well-structured with clear phases and safety rules
  • Infrastructure-aware (all referenced scripts exist)
  • Safety-focused (dry-run modes, rollback procedures, test verification)
  • Documentation-only (no executable code changes)

Quality Assessment

Excellent Design Patterns:

  • Phase-based structure makes complex workflows manageable
  • Safety rules prevent destructive operations (--dry-run, test verification)
  • Comprehensive coverage (deps, perf, tests, cleanup)
  • Rollback procedures for failed operations
  • Configurable thresholds for different project needs

Minor Observations:

  • /deps-audit references some packages that may not apply to all projects (acceptable - skill provides guidance)
  • /bench-check assumes specific benchmark script structure (verified to exist in this repo)
  • All skills generate reports in generated/ directory (good pattern)

Technical Validation:

  • ✅ All referenced scripts exist (scripts/benchmark*.js)
  • ✅ Uses existing dev dependencies (@vitest/coverage-v8, better-sqlite3)
  • ✅ No new runtime dependencies required
  • ✅ Follows existing skill format and patterns

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality maintenance automation that follows established patterns, includes proper safety measures, and provides valuable workflow automation for development teams. While not directly addressing the core "lost AI" problem (hence Tier 1), these skills will improve development velocity and code quality.

@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR introduces four maintenance skills (/bench-check, /deps-audit, /test-health, /housekeep) as agent instruction files, alongside a roadmap update marking TypeScript migration step 5.3 complete. The skills have been through extensive prior review iterations that addressed ~25 issues; this round finds the previous fixes well-applied with two remaining concerns.

Key changes in this diff:

  • bench-check: Adds the ABORTED pre-condition (empty metrics guard), explicit Phase 5/6 skip guards for ABORTED, narrows the Phase 6 skip condition so ABORTED always produces a report, updates Phase 7 summary line, and corrects the stale "always commits" rule.
  • deps-audit: Replaces STASH_CREATED=$?-based branching with STASH_REF-presence checks (correctly accounting for git 2.16+ returning 0 even when nothing is stashed), expands clean-pop success path with npm install + re-test + recovery options, and fixes the failure-path to reset to HEAD before popping.
  • housekeep: Adds dual dirt-file discovery (gitignored vs. untracked), lsof-guarded lock-file removal with DRY_RUN awareness, [ -d "$f" ] && continue guard for directory entries in the large-file scan, and git pull --no-rebase.
  • test-health: Captures exit code, uses jq for safe JSON encoding of stderr, excludes invalid runs from flaky analysis with a minimum-valid-runs check, and switches to origin/main for the coverage diff.
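
A minimal sketch of the STASH_REF-presence check described in the deps-audit bullet above, assuming the backup stash is created with the message `deps-audit-backup` (as in the review threads). The helper name `find_stash_ref` is hypothetical and not taken from the skill file; it only parses `git stash list` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `git stash list` output on stdin and print the
# ref (e.g. stash@{0}) of the first entry whose message ends with $1.
# Returns non-zero when no such entry exists.
find_stash_ref() {
  local msg="$1" line
  while IFS= read -r line; do
    case "$line" in
      *": $msg") printf '%s\n' "${line%%:*}"; return 0 ;;
    esac
  done
  return 1
}

# Usage inside a repository (assumed flow, not the skill's exact text):
#   git stash push -m "deps-audit-backup" -- package.json package-lock.json
#   STASH_REF=$(git stash list | find_stash_ref "deps-audit-backup" || true)
#   if [ -n "$STASH_REF" ]; then git stash pop "$STASH_REF"; fi
```

Branching on whether `STASH_REF` is empty avoids relying on the exit code of `git stash push`, which is the point made in the bullet.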

Issues found:

  • housekeep Phase 1c: git worktree prune runs unconditionally even when --dry-run is active, violating the "DRY_RUN is sacred" rule. Use git worktree prune --dry-run in dry-run mode.
  • bench-check Phase 4: The "No regressions found" verdict path appears before "First run" and "Save-baseline" paths and is vacuously true whenever Phase 3 was skipped (SAVE_ONLY or first-run). This can cause an agent to emit a misleading BENCH-CHECK PASSED message (though the underlying baseline-save action would still be correct). Adding an explicit applicability note to that path removes the ambiguity.

Confidence Score: 3/5

  • Safe to merge after addressing the two flagged issues; both are correctness concerns in agent-facing instruction documents rather than executable code.
  • The prior 25-issue review cycle has been thoroughly addressed. Two new issues remain: one is a clear DRY_RUN violation in housekeep (P1 — a running /housekeep --dry-run would still prune worktree refs), the other is a verdict-message ambiguity in bench-check (P1 — wrong message but correct action). Neither causes data loss or security issues, but both are observable behavioral deviations from the documented contracts.
  • .claude/skills/housekeep/SKILL.md (Phase 1c DRY_RUN guard) and .claude/skills/bench-check/SKILL.md (Phase 4 verdict ordering)

Important Files Changed

| Filename | Overview |
| --- | --- |
| .claude/skills/bench-check/SKILL.md | Addresses many previous review issues (ABORTED pre-condition, Phase 5/6 skip guards, COMPARE_ONLY early exit, Phase 7 summary line). One remaining issue: the "No regressions found" path is ambiguous when Phase 3 was skipped (SAVE_ONLY or first-run), potentially causing a misleading "PASSED" verdict message. |
| .claude/skills/deps-audit/SKILL.md | Extensive Phase 7 rework: STASH_REF-based branching replaces fragile exit-code checks, clean-pop success path now does npm install + re-test with recovery options, failure path resets to HEAD before popping stash. No new issues found. |
| .claude/skills/housekeep/SKILL.md | Good fixes for dirt-file discovery (dual category approach), lock-file lsof guard, git pull --no-rebase, and DRY_RUN for lock files; however, git worktree prune in Phase 1c still runs unconditionally without a DRY_RUN guard, violating the "DRY_RUN is sacred" rule. |
| .claude/skills/test-health/SKILL.md | Flaky detection loop now captures exit codes, uses jq for safe JSON encoding, applies mktemp isolation, excludes invalid runs from analysis, and uses origin/main for coverage diff; no new issues found. |
| docs/roadmap/ROADMAP.md | Bookkeeping update marking step 5.3 as complete, updating remaining counts for 5.4, and noting step 5.3 completion status; no issues found. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    START([skill invoked]) --> P0[Phase 0 — Pre-flight\nparse args, check repo]

    P0 --> BC_P1[bench-check\nPhase 1: run benchmarks\ntimeout 300 each]
    BC_P1 --> BC_ABORT{metrics empty?}
    BC_ABORT -- yes --> BC_P6A[Phase 6: ABORTED report]
    BC_ABORT -- no --> BC_P3{SAVE_ONLY\nor no baseline?}
    BC_P3 -- yes --> BC_P4S[Phase 4: First-run\nor Save-baseline verdict]
    BC_P3 -- no --> BC_P3C[Phase 3: compare\ncompute delta_pct]
    BC_P3C --> BC_P4V{regressions?}
    BC_P4V -- yes --> BC_P6F[Phase 6: FAILED report\nno baseline update]
    BC_P4V -- no --> BC_P5{COMPARE_ONLY?}
    BC_P5 -- no --> BC_P5S[Phase 5: save baseline\ngit commit files]
    BC_P5 -- yes --> BC_P6P[Phase 6: PASSED report\nno baseline update]
    BC_P5S --> BC_P6P
    BC_P4S --> BC_P6B[Phase 6: BASELINE SAVED report]
    BC_P6A --> BC_P7[Phase 7: print summary]
    BC_P6F --> BC_P7
    BC_P6P --> BC_P7
    BC_P6B --> BC_P7

    P0 --> DA_P1[deps-audit\nPhase 0: stash if --fix\ncapture STASH_REF by name]
    DA_P1 --> DA_P2[Phases 1–5: audit\nsecurity/outdated/unused\nlicense/duplicates]
    DA_P2 --> DA_P6[Phase 6: report]
    DA_P6 --> DA_P7{AUTO_FIX?}
    DA_P7 -- no --> DA_END([done])
    DA_P7 -- yes --> DA_TEST[npm test]
    DA_TEST --> DA_PASS{pass?}
    DA_PASS -- yes + STASH_REF non-empty --> DA_POP[git stash pop\nnpm install\nre-run npm test]
    DA_PASS -- yes + STASH_REF empty --> DA_END
    DA_POP --> DA_END
    DA_PASS -- no + STASH_REF non-empty --> DA_RESTORE[git checkout HEAD\ngit stash pop\nnpm ci]
    DA_PASS -- no + STASH_REF empty --> DA_REVERT[git checkout\nnpm ci]
    DA_RESTORE --> DA_END
    DA_REVERT --> DA_END

    P0 --> TH_P1[test-health\nPhase 1: mktemp RUN_DIR\nrun FLAKY_RUNS × vitest\ntimeout 180 each]
    TH_P1 --> TH_P1A[exclude invalid runs\nmin 2 valid runs\nfor flaky detection]
    TH_P1A --> TH_P2[Phase 2: dead tests\nPhase 3: coverage json-summary\nPhase 4: structure]
    TH_P2 --> TH_P5[Phase 5: report\nrm -rf RUN_DIR]

    P0 --> HK_P1[housekeep\nPhase 1: worktree prune\n+ confirm stale removal]
    HK_P1 --> HK_P2[Phase 2: dirt files\ndual discovery\nlsof lock guard]
    HK_P2 --> HK_P3[Phase 3: git fetch\ngit pull --no-rebase]
    HK_P3 --> HK_P4[Phase 4: prune merged branches\nconfirm each]
    HK_P4 --> HK_P5[Phase 5: source-repo guard\nalways skipped here]
    HK_P5 --> HK_P6[Phase 6: health checks\ncoverage / graph / git fsck]
    HK_P6 --> HK_P7[Phase 7: console report]

Comments Outside Diff (2)

  1. .claude/skills/housekeep/SKILL.md, line 50-53

    P1 git worktree prune runs unconditionally in --dry-run mode

    Phase 1c runs git worktree prune without a DRY_RUN guard, even though the Rules section explicitly states "--dry-run is sacred — it must NEVER modify anything, only report." The prose note "If DRY_RUN: Just list what would be removed, don't do it" appears after the prune code block and — structurally — applies only to the "stale worktrees with merged branches" section, leaving git worktree prune unconditionally executed.

    git worktree prune modifies git's internal administrative state (removes stale worktree refs under .git/worktrees/). Running it in --dry-run mode violates the invariant. git worktree prune --dry-run exists precisely for this case.

    bash

    # In dry-run mode, report what would be pruned without modifying anything
    if [ "$DRY_RUN" = "true" ]; then
      git worktree prune --dry-run
    else
      git worktree prune
    fi

    
    **Context Used:** CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=7111ef63-eefd-42cc-82a1-a1c617a968ee))
    
    
  2. .claude/skills/bench-check/SKILL.md, line 165-168

    P1 First-run / SAVE_ONLY verdict path ordering ambiguity

    Phase 4's verdict paths are checked in textual order: "No regressions found" comes before "First run (no baseline)" and "Save-baseline with existing baseline". Both of the latter cases have Phase 3 skipped (Phase 3 guards: "Skip if SAVE_ONLY=true or no baseline exists"), meaning there are no comparison results — which makes "No regressions found" vacuously true.

    An LLM agent evaluating the paths top-down for a --save-baseline run against an existing baseline could match "No regressions found" and emit:

    BENCH-CHECK PASSED — no regressions beyond X% threshold

    …instead of the correct:

    BENCH-CHECK — baseline overwritten (previous: <old>, new: <new>)

    The action taken (baseline saved) would still be correct since "If not COMPARE_ONLY: update baseline" fires, but the summary message in Phase 7 and the report header in Phase 6 would say "PASSED" when no comparison was actually performed.

    Add an explicit precondition to the "No regressions found" path so agents don't inadvertently match it when Phase 3 was skipped:

    ### No regressions found
    *(Only applicable when Phase 3 ran — i.e., a baseline existed and `SAVE_ONLY` was not set)*
    - Print: `BENCH-CHECK PASSED — no regressions beyond {THRESHOLD}% threshold`
    - If not `COMPARE_ONLY`: update baseline with current results
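
One way to remove the ordering ambiguity is to test the "Phase 3 skipped" cases before the regression check. The sketch below is illustrative only: `bench_verdict` and its status tokens are hypothetical names, and the real skill is prose instructions for an agent rather than a script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose the Phase 4 verdict so that first-run and
# save-baseline cases can never be reported as PASSED.
# Arguments are "true"/"false": SAVE_ONLY, baseline exists, COMPARE_ONLY,
# regressions found.
bench_verdict() {
  local save_only="$1" has_baseline="$2" compare_only="$3" regressions="$4"
  if [ "$has_baseline" != "true" ]; then
    if [ "$compare_only" = "true" ]; then
      echo "NO_BASELINE_WARNING"   # nothing to compare, nothing saved
    else
      echo "FIRST_RUN"             # initial baseline saved
    fi
  elif [ "$save_only" = "true" ]; then
    echo "BASELINE_OVERWRITTEN"    # Phase 3 was skipped
  elif [ "$regressions" = "true" ]; then
    echo "FAILED"
  else
    echo "PASSED"                  # only reachable when Phase 3 ran
  fi
}

# Example: --save-baseline against an existing baseline.
bench_verdict true true false false
```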

Reviews (28): Last reviewed commit: "fix: address Greptile review feedback (#..."

Comment on lines +48 to +82
- `nodeCount`, `edgeCount` — graph size

### 1b. Incremental Benchmark

```bash
node scripts/incremental-benchmark.js 2>/dev/null
```

Extract:
- `noOpRebuild` (ms) — time for no-change rebuild
- `singleFileRebuild` (ms) — time after one file change
- `importResolution` (ms) — resolution throughput

### 1c. Query Depth Benchmark

```bash
node scripts/query-benchmark.js 2>/dev/null
```

Extract:
- `fnDeps` scaling by depth
- `fnImpact` scaling by depth
- `diffImpact` latency

### 1d. Embedding Benchmark (optional)

```bash
node scripts/embedding-benchmark.js 2>/dev/null
```

Extract:
- `embeddingTime` (ms)
- `recall` at Hit@1, Hit@3, Hit@5, Hit@10

> **Timeout:** Each benchmark gets 5 minutes max. If it times out, record `"timeout"` for that suite and continue.
Contributor

P1 2>/dev/null discards error messages that should be recorded

All four benchmark invocations pipe stderr to /dev/null, yet the Rules section (line 222) and the error-handling note on line 90 both say to record "error: <message>" when a script exits non-zero. With stderr silenced there is no message to capture — the skill can only detect that a failure occurred, not why.

Replace the silent discard with a captured-stderr pattern:

# Instead of:
node scripts/benchmark.js 2>/dev/null

# Use:
output=$(node scripts/benchmark.js 2>&1)
exit_code=$?
if [ $exit_code -ne 0 ]; then
  # record "error: $output" and continue
fi

This applies to all four benchmark invocations (lines 48, 59, 70, 81).
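
The captured-stderr pattern can be combined with the 5-minute limit from the skill text. The following is a sketch, assuming GNU coreutils `timeout` is available (it exits with status 124 when the limit is hit); `run_suite` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one benchmark command under a time limit and print
# its output on success, "timeout" on a timeout, or "error: <message>" when
# the command exits non-zero.
# $1 = limit in seconds (the skill uses 300); remaining args = command.
run_suite() {
  local limit="$1" output exit_code=0
  shift
  # Capture stdout and stderr together so failures carry a message.
  output=$(timeout "$limit" "$@" 2>&1) || exit_code=$?
  if [ "$exit_code" -eq 124 ]; then
    echo "timeout"
  elif [ "$exit_code" -ne 0 ]; then
    echo "error: $output"
  else
    printf '%s\n' "$output"
  fi
}

# Usage (script path taken from the review comment above):
#   result=$(run_suite 300 node scripts/benchmark.js)
```

Merging stderr into stdout keeps the example short; a benchmark that prints warnings to stderr on success would need the two streams captured separately before parsing.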

Contributor Author

Fixed in commit a562b52. All four benchmark invocations now use output=$(... 2>&1) plus an exit_code check, so stderr is captured and recorded as "error: $output" rather than silently discarded.

Contributor Author

Fixed in ce5d811 — added git diff --cached --quiet || guard before git commit in Phase 5, so the commit is skipped when the baseline values are identical to the previously committed version.

- **Improvement**: metric decreased by more than `THRESHOLD`% (time) or increased (quality)
- **Stable**: within threshold

> **Direction awareness:** For latency metrics (ms), higher = worse. For recall/quality metrics, higher = better. For count metrics (nodes, edges), changes are informational only — not regressions.
Contributor

P1 Division-by-zero when baseline metric is 0

delta_pct = ((current - baseline) / baseline) * 100

If baseline is 0 (e.g. graph.edges on the very first build, or embedding.recall.hit1 on a cold model), this formula produces NaN or Infinity. The skill should guard against a zero baseline before computing the percentage:

if baseline == 0:
    mark as "N/A — baseline was zero"
else:
    delta_pct = ((current - baseline) / baseline) * 100
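
As a runnable illustration of the guard together with the direction-awareness note, here is a sketch using `awk` for the floating-point arithmetic. `classify_metric` and its `lower`/`higher` direction labels are hypothetical; count metrics, which the skill treats as informational only, are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one metric against its baseline.
# $1 = baseline, $2 = current, $3 = threshold in percent,
# $4 = "lower" (latency: lower is better) or "higher" (recall: higher is better)
classify_metric() {
  awk -v b="$1" -v c="$2" -v t="$3" -v dir="$4" 'BEGIN {
    # Guard first: a zero baseline makes the percentage undefined.
    if (b == 0) { print "N/A (baseline was zero)"; exit }
    delta = ((c - b) / b) * 100
    # Flip the sign for quality metrics so positive always means "got worse".
    worse = (dir == "higher") ? -delta : delta
    if (worse > t)       status = "regression"
    else if (worse < -t) status = "improvement"
    else                 status = "stable"
    printf "%s %+.1f%%\n", status, delta
  }'
}

classify_metric 100 120 15 lower    # latency went up by 20%
classify_metric 0 5 15 lower        # zero baseline is reported, not divided
```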

Contributor Author

Fixed in commit a562b52. Phase 3 now guards before dividing: when baseline == 0, the delta is marked "N/A — baseline was zero" and the metric is treated as informational only (no regression/improvement classification).

Contributor Author

The division-by-zero guard was already added in a562b52 (Round 2). The current Phase 3 text at step 2 reads: "Guard against division-by-zero: if baseline == 0, mark the delta as N/A and treat the metric as informational only (not a regression or improvement)." No further change needed.

…line

- Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all
  four benchmark invocations so error messages are captured and recorded
- Add division-by-zero guard in Phase 3: when baseline == 0, mark delta
  as "N/A — baseline was zero" (informational only, not a regression)
- Add git add + git commit step in Phase 5 so the baseline file is
  actually committed after each save, matching the documented rule
- After reverting package.json + package-lock.json on --fix test failure,
  also run `npm ci` to resync node_modules/ with the restored lock file;
  without this the manifest is reverted but installed packages are not
- Add explanatory comment on @anthropic-ai/tokenizer skip-list entry
  clarifying it is a peer dependency of @anthropic-ai/sdk and may be
  required at runtime without an explicit import in our code
…erion

- Phase 5 (Update Codegraph): add source-repo guard that skips the
  self-update logic when running inside the codegraph source repo;
  comparing the dev version to the published release and running
  npm install is a no-op since codegraph is not one of its own deps
- Phase 1b stale-worktree criterion: replace "created more than 7 days
  ago" (not determinable via git worktree list) with "last commit on the
  branch is more than 7 days old AND branch has no commits ahead of
  origin/main", using `git log -1 --format=%ci <branch>`
@carlos-alm
Contributor Author

Addressing the 5 items from the "Comments Outside Diff" section of the Greptile review:

1. deps-audit — incomplete revert (line 388–392): Fixed in commit 4fc994d. The Phase 7 revert path now runs npm ci after git checkout -- package.json package-lock.json to resync node_modules/ with the restored lock file. The Rules section was also updated to reflect this.

2. housekeep — Phase 5 self-update incoherent in source repo (line 572–607): Fixed in commit 89aef6b. Added a source-repo guard at the top of Phase 5: when running inside the codegraph source repo (confirmed by pre-flight), the phase is skipped entirely and prints Codegraph: skipped (running inside source repo — update via git pull / branch sync instead).

3. housekeep — stale-worktree creation timestamp not available (line 445–451): Fixed in commit 89aef6b. Replaced "created more than 7 days ago with no commits since" with a criterion that is actually determinable: the branch's last commit is more than 7 days old (via git log -1 --format=%ci <branch>) AND the branch has no commits ahead of origin/main. The note that git worktree list does not expose creation timestamps is included inline.

4. bench-check — baseline declared committed but no git add/commit (line 226–227): Fixed in commit a562b52. Phase 5 now includes explicit git add and git commit steps after writing baseline.json and history.ndjson. The corresponding Rule was updated to say "Phase 5 always commits it".

5. deps-audit — @anthropic-ai/tokenizer undocumented (line 306–307): Fixed in commit 4fc994d. The skip-list entry now explains it is a peer dependency of @anthropic-ai/sdk that may be required at runtime without an explicit import, and notes to verify against package.json before removing. (Confirmed: the package is not currently in package.json, so this entry acts as a conservative safety note.)
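
For item 3, the stale-worktree criterion can be sketched as below. This is an illustration, not the skill's text: `is_stale` is a hypothetical helper, and it uses `%ct` (epoch seconds) instead of the `%ci` format quoted above so that the age can be computed with integer arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when a branch is stale, i.e. its last
# commit is more than 7 days old AND it has no commits ahead of origin/main.
# $1 = last-commit time (epoch seconds), $2 = commits ahead of origin/main,
# $3 = current time in epoch seconds (defaults to now).
is_stale() {
  local last_commit="$1" ahead="$2" now="${3:-$(date +%s)}"
  local max_age=$((7 * 24 * 60 * 60))
  [ $((now - last_commit)) -gt "$max_age" ] && [ "$ahead" -eq 0 ]
}

# Usage inside a repository (assumes origin/main was fetched beforehand):
#   last=$(git log -1 --format=%ct "$branch")
#   ahead=$(git rev-list --count "origin/main..$branch")
#   if is_stale "$last" "$ahead"; then echo "$branch looks stale"; fi
```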

@carlos-alm
Contributor Author

@greptileai

Comment on lines +199 to +203
## Phase 6 — Report

Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`:

```markdown
Contributor

P2 Auto-commit fails silently when baseline values are unchanged

If the benchmark results are numerically identical to the stored baseline (e.g. a second back-to-back run on the same commit), generated/bench-check/baseline.json and history.ndjson will still be rewritten with a new timestamp. However, git commit will exit non-zero with "nothing to commit" if git detects the content hash is the same — causing the Phase 5 step to produce a spurious error.

Guard with --allow-empty (not ideal) or, better, only run the commit when the files actually changed:

git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
if ! git diff --cached --quiet; then
  git commit -m "chore: update bench-check baseline (<gitRef>)"
fi

This keeps Phase 5 idempotent and avoids a noisy error on successive clean runs.

Contributor Author

Fixed in ce5d811 — added git diff --cached --quiet || guard before git commit in Phase 5. When the baseline values are identical to the previously committed version, git add stages nothing and the commit is now skipped instead of erroring out.

@carlos-alm
Contributor Author

Addressing Round 3 outside-diff feedback (ce5d811):

test-health/SKILL.md: Changed both the flaky-detection loop and the coverage run from 2>/dev/null to 2>&1, so stderr is captured in the output variable for diagnosis instead of being discarded.

housekeep/SKILL.md: Changed Phase 4a from git branch --merged main to git branch --merged origin/main so it uses the freshly-fetched remote ref from Phase 3 instead of the potentially stale local main.

deps-audit/SKILL.md: Replaced the git checkout -- package.json package-lock.json revert pattern with a stash-based approach: save the original manifests before running auto-fix, then restore from the saved state on failure. Updated both Phase 7 instructions and the corresponding rule in the Rules section.

@carlos-alm
Contributor Author

@greptileai

Comment on lines +154 to +166
**Before running any auto-fix** (in Phase 1/2), save the original manifests so pre-existing unstaged changes are preserved:
```bash
git stash push -m "deps-audit-backup" -- package.json package-lock.json
```

Summarize all changes made:
1. List each package updated/fixed
2. Run `npm test` to verify nothing broke
3. If tests pass: drop the saved state (`git stash drop`)
4. If tests fail:
- Restore the saved manifests: `git stash pop`
- Restore `node_modules/` to match the reverted lock file: `npm ci`
- Report what failed
Contributor

P1 Stash runs too late — backup is taken after files are already modified

The git stash push is described inside Phase 7, which executes after Phase 1 (npm audit fix) and Phase 2 (npm update) have already modified package.json and package-lock.json. An agent following the phases sequentially will reach Phase 7 with the manifests already changed, making the stash a backup of the modified state rather than the original state. If tests then fail and git stash pop is invoked, it restores the already-modified files — the recovery is a no-op.

The stash command must run inside Phase 0 (immediately after AUTO_FIX=true is detected), before any npm commands touch the manifests. For example, add a step to Phase 0:

4. Parse `$ARGUMENTS` — set `AUTO_FIX=true` if `--fix` is present
5. **If `AUTO_FIX` is set:** stash the original manifests now, before any modifications:
   ```bash
   git stash push -m "deps-audit-backup" -- package.json package-lock.json
   ```

Then remove the stash instruction from Phase 7 (keeping only the summary/rollback logic there).

Contributor Author

Fixed — moved the stash command to Phase 0, step 5, so manifests are backed up before any npm commands run. Removed the stash from Phase 7.

Comment on lines +55 to +65
For stale worktrees with merged branches:
- List them and ask the user for confirmation before removing
- If confirmed (or `--full` without `--dry-run`):
```bash
git worktree remove <path>
git branch -d <branch> # only if fully merged
```

**If `DRY_RUN`:** Just list what would be removed, don't do it.

> **Never force-remove** a worktree with uncommitted changes. List it as "has uncommitted work" and skip.
Contributor

P1 Default --full silently removes stale worktrees without user confirmation

The cleanup condition at line 57 reads: "If confirmed (or --full without --dry-run)". Because --full is the default mode (no arguments = full cleanup), every default run of /housekeep will remove stale worktrees without asking. The project requires every Claude Code session to start with /worktree (CLAUDE.md: "Every session must start with /worktree to get an isolated copy of the repo"), meaning there may be active worktrees belonging to concurrent sessions that the "7-day old last commit" heuristic could incorrectly classify as stale — especially when a session hasn't committed anything yet but the branch tip is old.

The "never force-remove a worktree with uncommitted changes" guard (line 65) only catches worktrees with locally-modified files; it does not protect sessions that branched off main, haven't committed anything yet, but have no staged or unstaged changes either.

The confirmation prompt should always be required for worktree removal (regardless of --full), since this is a potentially destructive, irreversible action. Reserve auto-removal only for explicitly prunable worktrees (missing directory — git worktree prune), which are already guaranteed safe:

For stale worktrees with merged branches:
- **Always** list them and ask the user for confirmation before removing
  (even with `--full` — worktree deletion is irreversible)
- If confirmed (and not `--dry-run`): ...
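
The "always confirm" rule can be pictured as a small decision helper. This is only a sketch: in the skill the confirmation is a question the agent puts to the user, whereas here it is modeled as a `y/N` prompt on stdin, and `worktree_action` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide what to do with one stale worktree.
# Prints "would-remove" (dry run), "remove" (user confirmed), or "skip".
# $1 = worktree path, $2 = DRY_RUN ("true"/"false"). There is deliberately
# no --full parameter: that mode never bypasses the prompt.
worktree_action() {
  local wt="$1" dry_run="$2" answer=""
  if [ "$dry_run" = "true" ]; then
    echo "would-remove"
    return 0
  fi
  printf 'Remove stale worktree %s? [y/N] ' "$wt" >&2
  read -r answer || true   # EOF or an empty answer counts as "no"
  case "$answer" in
    y|Y|yes|YES) echo "remove" ;;
    *)           echo "skip" ;;
  esac
}

# Usage (assumed flow):
#   if [ "$(worktree_action "$path" "$DRY_RUN")" = "remove" ]; then
#     git worktree remove "$path"
#   fi
```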

Contributor Author

Fixed — worktree removal now always requires user confirmation, regardless of --full mode. Only git worktree prune (missing-directory cleanup) remains automatic.

{
"savedAt": "<ISO 8601>",
"version": "<package version>",
"gitRef": "<HEAD short SHA>",
Contributor

P2 Hardcoded threshold in saved baseline.json

Line 180 writes "threshold": 15 literally into baseline.json, regardless of the --threshold N argument parsed in Phase 0. If the user runs /bench-check --threshold 20, the stored baseline will record 15, making the artifact misleading when inspected later. While the skill always re-parses the threshold from $ARGUMENTS rather than reading it from the file, an operator debugging a regression from the JSON file will see the wrong value.

Replace the literal with the parsed variable:

Suggested change
"gitRef": "<HEAD short SHA>",
"threshold": <THRESHOLD>,

Contributor Author

Fixed — baseline.json now writes "threshold": $THRESHOLD (the parsed value) instead of the hardcoded 15.
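
A sketch of writing the parsed value into the header fields shown in the quoted snippet. It assumes an integer threshold and that the version and git ref contain no characters needing JSON escaping (a JSON-aware tool such as jq would be the safer choice); `baseline_header` is a hypothetical helper and the real file also carries the metric data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the baseline header with the parsed threshold.
# $1 = threshold (non-negative integer), $2 = package version, $3 = git ref.
baseline_header() {
  local threshold="$1" version="$2" git_ref="$3"
  # Reject anything that is not a plain integer before it reaches the JSON.
  case "$threshold" in
    ''|*[!0-9]*) echo "invalid threshold: $threshold" >&2; return 1 ;;
  esac
  printf '{"savedAt": "%s", "version": "%s", "gitRef": "%s", "threshold": %s}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$version" "$git_ref" "$threshold"
}

# Usage (assumed): THRESHOLD comes from parsing --threshold N in Phase 0.
#   baseline_header "$THRESHOLD" "$VERSION" "$(git rev-parse --short HEAD)"
```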

Comment on lines +167 to +170
### First run (no baseline)
- Print: `BENCH-CHECK — initial baseline saved`
- Save current results as baseline

Contributor

P2 --compare-only doesn't guard the "First run" baseline save

Phase 4 has two result paths:

  • "No regressions found" correctly guards with "If not COMPARE_ONLY: update baseline."
  • "First run (no baseline)" has no such guard — it unconditionally saves a baseline.

An agent running /bench-check --compare-only against a repo with no prior baseline will fall through to the "First run" path and save a baseline, contradicting the --compare-only semantics ("compare against baseline without updating it").

Add the same guard to the first-run path:

### First run (no baseline)
- If `COMPARE_ONLY`: print a warning that no baseline exists and exit
- Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baseline

Contributor Author

Fixed — the first-run path now checks COMPARE_ONLY: if set, prints a warning that no baseline exists and exits without saving. Otherwise proceeds to save the initial baseline as before.

Comment on lines +40 to +46
```bash
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json 2>&1
done
```

For each run, parse the JSON reporter output to get per-test results.
Contributor

P1 Flaky detection loop discards output — nothing to parse

The Phase 1 loop runs:

for i in $(seq 1 $FLAKY_RUNS); do
  npx vitest run --reporter=json 2>&1
done

The JSON reporter output streams to stdout and is not captured in any variable, so there is no in-memory data for the "parse the JSON reporter output to get per-test results" step immediately below. Each iteration's output will be mixed into the session's terminal output and lost before comparison is possible.

Each run's output must be captured and stored for cross-run comparison. For example:

mkdir -p /tmp/test-health-runs
for i in $(seq 1 $FLAKY_RUNS); do
  npx vitest run --reporter=json > /tmp/test-health-runs/run-$i.json 2>/tmp/test-health-runs/run-$i.err
done

Then read /tmp/test-health-runs/run-*.json to build the per-test pass/fail matrix.
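
The cross-run comparison could look roughly like the sketch below. It is not the skill's actual parser: it assumes each run's JSON has already been flattened into tab-separated `test name<TAB>status` lines, and the `run-N.tsv` file names and the `find_flaky` function are hypothetical. A test is reported when its status differs between any two runs.

```bash
# find_flaky DIR: print the names of tests whose status is not identical
# across all run-*.tsv files in DIR. Each line is "<test name><TAB><status>".
find_flaky() {
  cat "$1"/run-*.tsv |
    awk -F '\t' '
      # Remember the first status seen for each test; flag any later mismatch.
      !($1 in first) { first[$1] = $2; next }
      first[$1] != $2 { flaky[$1] = 1 }
      END { for (name in flaky) print name }
    ' | sort
}
```

A test that passes in every run, or fails in every run, produces no output; only mixed results are listed.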

Contributor Author

Fixed — the flaky-detection loop now redirects each run's output to /tmp/test-health-runs/run-$i.json (stdout) and /tmp/test-health-runs/run-$i.err (stderr). The parsing step reads from these per-run files for cross-run comparison.

@carlos-alm
Contributor Author

@greptileai

Comment on lines +156 to +165
If `AUTO_FIX` was set:

Summarize all changes made:
1. List each package updated/fixed
2. Run `npm test` to verify nothing broke
3. If tests pass: drop the saved state (`git stash drop`)
4. If tests fail:
- Restore the saved manifests: `git stash pop`
- Restore `node_modules/` to match the reverted lock file: `npm ci`
- Report what failed

Contributor

P1 Stash pop/drop operates on wrong entry when Phase 0 stash was a no-op

git stash push -- package.json package-lock.json is a no-op when neither file has any local modifications (the common case — a fresh branch before running the skill). Git outputs "No local changes to save" and exits non-zero without creating a stash entry.

Phase 7 then unconditionally calls either git stash drop (success path) or git stash pop (failure path). With no deps-audit-backup entry on the stack:

  • On success: git stash drop silently pops a pre-existing, unrelated stash entry (data loss).
  • On failure: git stash pop restores the wrong stash, while the broken package.json/package-lock.json from the failed npm audit fix/npm update run remain on disk.

The stash guard in Phase 0 should track whether the stash was actually created:

5. **If `AUTO_FIX` is set:**
   ```bash
   git stash push -m "deps-audit-backup" -- package.json package-lock.json
   STASH_CREATED=$?
   # STASH_CREATED=0 → stash entry exists; STASH_CREATED=1 → nothing was stashed
   ```

Then in Phase 7, only call `git stash drop`/`git stash pop` when `STASH_CREATED` was `0`. When `STASH_CREATED` was `1`, the manifests were already clean and can simply be restored with `git checkout -- package.json package-lock.json` after a test failure.
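
One caveat on detecting the no-op: in practice `git stash push` often exits 0 even when it prints "No local changes to save", so the exit status alone may not tell the two cases apart. A minimal sketch that compares `refs/stash` before and after the push instead is shown below. The `stash_manifests` function name and its `created`/`none` output are hypothetical, not part of the skill.

```bash
# stash_manifests FILE...: stash the given files and report whether a stash
# entry was actually created. Prints "created" or "none".
stash_manifests() {
  # refs/stash is absent (empty output) when the stash stack is empty.
  before=$(git rev-parse -q --verify refs/stash || true)
  git stash push -q -m "deps-audit-backup" -- "$@" >/dev/null 2>&1 || true
  after=$(git rev-parse -q --verify refs/stash || true)
  if [ "$before" != "$after" ]; then
    echo created
  else
    echo none
  fi
}
```

Phase 7 could then branch on the printed value rather than on `$?`, for example `STASH_STATE=$(stash_manifests package.json package-lock.json)`.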

Contributor Author

Fixed — Phase 0 now captures STASH_CREATED=$? after the stash push. Phase 7 branches on this: when STASH_CREATED=0 (stash was created), uses stash drop/pop as before; when STASH_CREATED=1 (no-op), uses git checkout -- package.json package-lock.json instead. The Rules section was updated to match.

Comment on lines +41 to +44
mkdir -p /tmp/test-health-runs
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json > /tmp/test-health-runs/run-$i.json 2>/tmp/test-health-runs/run-$i.err
done

Contributor

P1 Hardcoded /tmp/test-health-runs/ path corrupts results under concurrent sessions

CLAUDE.md explicitly states that "Multiple Claude Code instances run concurrently in this repo." If two sessions invoke /test-health simultaneously, both write to the same /tmp/test-health-runs/run-$i.json files. Run files from session A get overwritten by session B mid-loop, and the cross-run comparison in Phase 1 silently mixes results from two independent executions — producing false flaky-test results.

Use an isolated, unique working directory per invocation:

RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX)
for i in $(seq 1 $FLAKY_RUNS); do
  npx vitest run --reporter=json > "$RUN_DIR/run-$i.json" 2>"$RUN_DIR/run-$i.err"
done

Using mktemp -d guarantees each invocation gets its own directory, preventing cross-session pollution. Clean up with rm -rf "$RUN_DIR" at the end of Phase 1 (or Phase 5 after the report is written).
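
To make the cleanup hard to forget, the create-and-remove steps could be wrapped together. The sketch below is one possible shape, assuming a hypothetical `with_run_dir` helper; it passes the private directory to a command as its last argument and removes it afterwards, whether or not the command succeeded.

```bash
# with_run_dir CMD...: run CMD with a fresh private directory appended as its
# last argument, then delete the directory whatever CMD's exit status was.
with_run_dir() {
  run_dir=$(mktemp -d /tmp/test-health-XXXXXX) || return 1
  "$@" "$run_dir"
  status=$?
  rm -rf "$run_dir"
  return "$status"
}
```

A call such as `with_run_dir run_flaky_loop` (where `run_flaky_loop` is a hypothetical function that writes `run-$i.json` files into the directory it receives) gives each session its own directory.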

Contributor Author

Fixed — Replaced hardcoded /tmp/test-health-runs/ with RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX). All references now use $RUN_DIR. Added rm -rf "$RUN_DIR" cleanup after analysis is complete.

Comment on lines +150 to +169
## Phase 4 — Verdict

Based on comparison results:

### No regressions found
- Print: `BENCH-CHECK PASSED — no regressions beyond {THRESHOLD}% threshold`
- If not `COMPARE_ONLY`: update baseline with current results

### Regressions found
- Print: `BENCH-CHECK FAILED — {N} regressions detected`
- List each regression with metric name, baseline value, current value, delta %
- Do NOT update the baseline
- Suggest investigation:
- `git log --oneline <baseline-ref>..HEAD` to find what changed
- `codegraph diff-impact <baseline-ref> -T` to find structural changes
- Re-run individual benchmarks to confirm (not flaky)

### First run (no baseline)
- If `COMPARE_ONLY` is set: print a warning that no baseline exists and exit without saving
- Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baseline

Contributor

P2 Missing verdict path when --save-baseline is passed and a baseline already exists

Phase 3 is correctly skipped when SAVE_ONLY=true, but Phase 4 only defines three verdict paths:

  1. "No regressions found" — only reachable after a comparison run
  2. "Regressions found" — only reachable after a comparison run
  3. "First run (no baseline)" — only applies when no baseline file was found

When --save-baseline is passed against a repo that already has baseline.json, none of these paths apply. The agent has no clear verdict to follow and may fall through to an undefined state (e.g. silently skipping Phase 5 and never saving the baseline, or incorrectly entering the "First run" path even though a baseline was present).

Add a fourth explicit path:

### Save-baseline run (`--save-baseline`)
- Print: `BENCH-CHECK — baseline updated (previous baseline overwritten)`
- Proceed directly to Phase 5 to save the current results as the new baseline

This ensures the skill has a deterministic branch for every combination of flags × baseline-file-present.
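
The flags × baseline matrix can be written out as a small dispatch function to check that every combination lands on exactly one path. This is only an illustrative sketch: the `verdict_path` function and the path names it prints are hypothetical, and the combination of `--save-baseline` with `--compare-only` is left unhandled because the thread does not define it.

```bash
# verdict_path SAVE_ONLY COMPARE_ONLY BASELINE_EXISTS REGRESSIONS
# Flags are "true"/"false"; REGRESSIONS is a count. Prints the verdict path.
verdict_path() {
  save_only=$1 compare_only=$2 baseline_exists=$3 regressions=$4
  if [ "$save_only" = true ] && [ "$baseline_exists" = true ]; then
    echo "save-baseline"
  elif [ "$baseline_exists" != true ]; then
    # No baseline: --compare-only must warn and exit instead of saving.
    if [ "$compare_only" = true ]; then
      echo "no-baseline-warning"
    else
      echo "first-run"
    fi
  elif [ "$regressions" -gt 0 ]; then
    echo "regressions-found"
  else
    echo "no-regressions"
  fi
}
```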

Contributor Author

Fixed — Added a 4th explicit verdict path: "Save-baseline with existing baseline (--save-baseline)" which prints BENCH-CHECK — baseline overwritten (previous: <old gitRef>, new: <new gitRef>) and saves current results as the new baseline.

For each metric in the current run:

1. Look up the same metric in the baseline
2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as **informational only** (not a regression or improvement)

Contributor

P2 Corrupted em-dash character in "N/A" string

The string "N/A � baseline was zero" contains a corrupted em-dash (the UTF-8 replacement character U+FFFD). This garbled text will appear verbatim in the regression table written to BENCH_REPORT_*.md and in the baseline.json marker, making the report hard to read.

Suggested change
2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as **informational only** (not a regression or improvement)
2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A — baseline was zero"` and treat the metric as **informational only** (not a regression or improvement)

Contributor Author

Fixed — Replaced the corrupted character with a proper em-dash. The string now reads "N/A — baseline was zero".
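
For reference, the guarded delta computation might be sketched as follows. The `delta_pct` function name and the one-decimal output format are assumptions made for illustration; only the zero-baseline marker string comes from the skill text.

```bash
# delta_pct BASELINE CURRENT: print the percentage change from BASELINE to
# CURRENT, or the informational marker when BASELINE is zero.
delta_pct() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    # A zero baseline would divide by zero, so report it as informational.
    if (b == 0) { print "N/A — baseline was zero"; exit }
    printf "%.1f\n", (c - b) / b * 100
  }'
}
```

A caller would compare the numeric result against `$THRESHOLD` and skip the comparison entirely when the marker is returned.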

Comment on lines +194 to +197
```bash
git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
git diff --cached --quiet || git commit -m "chore: update bench-check baseline (<gitRef>)"
```

Contributor

P2 Commit command deviates from project's "specific file paths" convention

CLAUDE.md requires: "Commit with specific file paths: git commit <files> -m "msg"". The current Phase 5 does git add <files> followed by a bare git commit -m "...", which stages then commits everything in the index — potentially picking up unrelated staged changes from an ongoing session.

Suggested change
```bash
git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
git diff --cached --quiet || git commit -m "chore: update bench-check baseline (<gitRef>)"
```
After writing both files, commit the baseline so it is a shared reference point:
```bash
git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
git diff --cached --quiet || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"
```

Contributor Author

Fixed — Replaced git add + git commit -m with explicit file paths per project convention: git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "...".

When Phase 0 stash push is a no-op (manifests unchanged), Phase 7
was calling stash drop/pop on the wrong entry. Track STASH_CREATED
exit code and branch on it: use git checkout when no stash exists.
…ent corruption

Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel
sessions get isolated directories. Add cleanup at end of analysis.
…plicit commit paths

Add 4th verdict path for --save-baseline when baseline already exists.
Replace corrupted em-dash character in N/A string. Change commit command
to use explicit file paths per project convention.
@carlos-alm
Contributor Author

@greptileai

…ress

Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file
paths. Updated to reflect actual state: 32 of 269 source modules migrated
(~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified
counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about
14 stale .js counterparts of already-migrated .ts files needing deletion.
Comment on lines +163 to +164
3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`)
If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists)

Contributor

P1 Success + STASH_CREATED=1 incorrectly reverts npm changes

When STASH_CREATED=1, it means the files had no pre-existing modifications before the skill ran — the stash push was a no-op. The npm commands then modified package.json/package-lock.json, and tests passed. In this success case the npm changes should be kept, but step 3 runs git checkout -- package.json package-lock.json, which reverts the files to HEAD — silently undoing every fix made by npm audit fix and npm update. This makes --fix a complete no-op on a clean branch.

The git checkout here was correct for the failure path (step 5), not the success path. The success path for STASH_CREATED=1 should be a no-op:

Suggested change
3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`)
If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists)
3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`)
If tests pass and `STASH_CREATED` is `1`: nothing to do — npm changes are already on disk and there is no stash entry to clean up

Contributor Author

Fixed — the success path (STASH_CREATED=1, tests pass) no longer runs git checkout to discard manifest changes. The npm audit fix changes are good when tests pass, so no action is needed. Only the failure path reverts.


After writing both files, commit the baseline so it is a shared reference point:
```bash
git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"
```

Contributor

P1 git diff --quiet misses new files — baseline never committed on first run

git diff [<paths>] (without --cached or a tree-ish) compares the working tree to the index. On the very first run, baseline.json and history.ndjson are freshly created and are untracked — they have no index entry at all. For untracked files, git diff produces no output and exits 0, so the || short-circuits and git commit is never reached. The baseline is written to disk but silently never committed to git, contradicting the rule "The baseline file is committed to git."

Use git diff HEAD to compare the working tree against the last commit, which correctly shows new/untracked files as additions:

Suggested change
git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"
git diff HEAD --quiet -- generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"

This exits non-zero for both new files (first run) and modified files (subsequent runs), so the commit fires whenever there is genuinely new content.

Contributor Author

Fixed — replaced git diff --quiet with git add followed by git diff --cached --quiet. This ensures newly created files (first run) are staged and detected, so the baseline commit is not silently skipped.
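
Staging first is the more dependable route here, because `git diff` generally does not report untracked files whether or not it is given `HEAD`. A minimal sketch of the stage-then-check pattern, scoped to explicit paths, is shown below; the `commit_if_changed` helper name is hypothetical.

```bash
# commit_if_changed MESSAGE FILE...: stage the files, then commit only those
# paths if staging produced a difference from HEAD.
commit_if_changed() {
  msg=$1
  shift
  git add -- "$@"
  # --cached compares the index to HEAD, so newly added files count as changes.
  git diff --cached --quiet -- "$@" || git commit -q -m "$msg" -- "$@"
}
```

Called as `commit_if_changed "chore: update bench-check baseline (<gitRef>)" generated/bench-check/baseline.json generated/bench-check/history.ndjson`, it commits on the first run, when both files are new, and does nothing on a run that leaves them unchanged.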

Comment on lines +158 to +165
**If `DRY_RUN`:** List branches that would be deleted.

**Otherwise:** Delete merged branches:
```bash
git branch -d <branch> # safe delete, only if fully merged
```

> **Never use `git branch -D`** (force delete). If `-d` fails, the branch has unmerged work — skip it.

Contributor

P2 Branch deletion silently skips user confirmation unlike worktree removal

Phase 1c (worktree removal) always requires user confirmation, even for clearly stale worktrees. Phase 4d ("Otherwise: Delete merged branches") runs git branch -d immediately with no equivalent confirmation step. While git branch -d is safe (refuses to delete unmerged work), users may still want to review the list of branches about to be pruned — especially in a repo where multiple concurrent sessions may have feature branches.

Consider adding a brief "List branches to be deleted and ask for confirmation" step before the git branch -d loop, consistent with Phase 1c's pattern.


Contributor Author

Fixed — branch deletion now requires user confirmation before each delete, consistent with worktree removal in Phase 1c.
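
A per-branch confirmation loop might be sketched like this. It is not the skill's actual text: the `prune_merged` function and its prompt wording are hypothetical, and branch names are assumed to contain no whitespace. Deletion uses only the safe `git branch -d`, in line with the rule quoted above.

```bash
# prune_merged BASE: for each local branch fully merged into BASE, ask on
# stdin before deleting it with the safe `git branch -d`.
prune_merged() {
  base=$1
  current=$(git symbolic-ref --short -q HEAD || true)
  for branch in $(git for-each-ref --format='%(refname:short)' --merged "$base" refs/heads); do
    # Never offer the base branch or the checked-out branch.
    [ "$branch" = "$base" ] && continue
    [ "$branch" = "$current" ] && continue
    printf 'Delete merged branch %s? [y/N] ' "$branch"
    read -r answer || answer=n
    case $answer in
      y|Y) git branch -d "$branch" ;;
      *)   echo "Skipped $branch" ;;
    esac
  done
}
```

Anything other than `y` or `Y`, including end of input, leaves the branch in place.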

When STASH_CREATED=1 and tests pass, the npm audit fix changes are
good — no action needed. Previously it ran git checkout to discard
them, which undid the successful fix.

## Phase 6 — Report

**Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed.** That case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved.

Contributor

P1 ABORTED + --compare-only + no baseline silently produces no report

When all of the following are true simultaneously:

  1. COMPARE_ONLY=true
  2. No baseline exists (first run)
  3. All benchmark suites fail or timeout (ABORTED)

The flow is:

  • Phase 4 ABORTED pre-condition fires: "Stop here and skip to Phase 6"
  • Phase 6 skip guard: "Skip this phase if COMPARE_ONLY was set and no baseline existed"

Because the ABORTED pre-condition is orthogonal to the baseline existence check, the Phase 6 skip guard fires on the ABORTED result too. The agent writes no report and prints no ABORTED message — the run silently completes with no output. This is particularly confusing in CI contexts where a broken benchmarking environment would go undetected.

The Phase 6 skip guard should be narrowed to exclude the ABORTED case:

Suggested change
**Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed.** That case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved.
**Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed, AND the ABORTED pre-condition was not triggered.** That early-exit case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved. When ABORTED, always write the ABORTED report regardless of other flags.

Contributor Author

Fixed — added explicit ABORTED skip guard to Phase 5, and narrowed the Phase 6 skip condition to exclude the ABORTED case. When ABORTED, Phase 5 is now skipped (no empty baseline written) and Phase 6 always writes the ABORTED report regardless of COMPARE_ONLY or baseline state.

Comment on lines +165 to +169
- If the pop applies cleanly:
a. Run `npm install` to re-sync `node_modules/` with the merged manifest.
b. Re-run `npm test` to confirm nothing broke with the merged dependency state.
c. If tests still pass: confirm the project is consistent.
d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.

Contributor

P1 No recovery path when tests fail after clean-pop + npm install

When the stash pops cleanly and npm install re-syncs node_modules/, the skill re-runs npm test. If those tests fail (step d), the skill warns the user that pre-existing changes conflict with the audit fixes — but the stash has already been consumed by git stash pop. There is no way to return to either prior state:

  • The stash entry is gone, so the pre-existing manifest state cannot be automatically restored.
  • npm audit fix/npm update changes are already merged with the pre-existing changes in the working tree.

Without a recovery path, the user is left with a mixed, broken state and must manually reconstruct which changes to keep. Add explicit recovery guidance:

   d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.
      Recovery options:
      - To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci`
      - To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci`
      - To keep only the pre-existing changes and discard the audit fixes: re-run `/deps-audit` without `--fix`

Contributor Author

Fixed — step 169d now lists three explicit recovery options: undo all changes (git checkout + npm ci), keep only audit fixes (manual edit + npm ci), or keep only pre-existing changes (re-run without --fix).

carlos-alm added a commit that referenced this pull request Mar 24, 2026
…Script (Phase 5.4) (#579)

* feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep

Four recurring maintenance routines as Claude Code skills:
- /deps-audit: vulnerability scanning, staleness, unused deps, license checks
- /bench-check: benchmark regression detection against saved baselines
- /test-health: flaky test detection, dead tests, coverage gap analysis
- /housekeep: clean worktrees, dirt files, sync main, prune branches

* fix(bench-check): capture stderr, guard division-by-zero, commit baseline

- Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all
  four benchmark invocations so error messages are captured and recorded
- Add division-by-zero guard in Phase 3: when baseline == 0, mark delta
  as "N/A — baseline was zero" (informational only, not a regression)
- Add git add + git commit step in Phase 5 so the baseline file is
  actually committed after each save, matching the documented rule

* fix(deps-audit): run npm ci after revert, document tokenizer skip reason

- After reverting package.json + package-lock.json on --fix test failure,
  also run `npm ci` to resync node_modules/ with the restored lock file;
  without this the manifest is reverted but installed packages are not
- Add explanatory comment on @anthropic-ai/tokenizer skip-list entry
  clarifying it is a peer dependency of @anthropic-ai/sdk and may be
  required at runtime without an explicit import in our code

* fix(housekeep): guard Phase 5 in source repo, fix stale-worktree criterion

- Phase 5 (Update Codegraph): add source-repo guard that skips the
  self-update logic when running inside the codegraph source repo;
  comparing the dev version to the published release and running
  npm install is a no-op since codegraph is not one of its own deps
- Phase 1b stale-worktree criterion: replace "created more than 7 days
  ago" (not determinable via git worktree list) with "last commit on the
  branch is more than 7 days old AND branch has no commits ahead of
  origin/main", using `git log -1 --format=%ci <branch>`

* fix: address Round 3 Greptile review feedback

* fix: move deps-audit stash to Phase 0, before npm commands modify manifests

* fix: capture flaky-detection loop output to per-run files for comparison

* fix: always require confirmation for stale worktree removal

* fix: use parsed threshold in baseline.json, guard --compare-only on first run

* fix(deps-audit): track stash creation to avoid operating on wrong entry

When Phase 0 stash push is a no-op (manifests unchanged), Phase 7
was calling stash drop/pop on the wrong entry. Track STASH_CREATED
exit code and branch on it: use git checkout when no stash exists.

* fix(test-health): use mktemp for flaky-run directory to avoid concurrent corruption

Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel
sessions get isolated directories. Add cleanup at end of analysis.

* fix(bench-check): add save-baseline verdict path, fix em-dash, use explicit commit paths

Add 4th verdict path for --save-baseline when baseline already exists.
Replace corrupted em-dash character in N/A string. Change commit command
to use explicit file paths per project convention.

* docs(roadmap): update Phase 5 TypeScript migration with accurate progress

Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file
paths. Updated to reflect actual state: 32 of 269 source modules migrated
(~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified
counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about
14 stale .js counterparts of already-migrated .ts files needing deletion.

* fix: deps-audit success path should keep npm changes, not revert (#565)

When STASH_CREATED=1 and tests pass, the npm audit fix changes are
good — no action needed. Previously it ran git checkout to discard
them, which undid the successful fix.

* fix: bench-check use git add + diff --cached to detect new files (#565)

git diff --quiet ignores untracked files, so on the first run when
baseline.json and history.ndjson are newly created, the commit was
skipped. Stage first with git add, then check with --cached.

* fix: housekeep require confirmation before branch deletion (#565)

Branch deletion now asks for user confirmation before each delete,
consistent with worktree removal in Phase 1c.

* fix: scope git diff --cached to bench-check files only (#565)

* fix: use json-summary reporter to match coverage-summary.json output (#565)

* fix: capture stash ref by name to avoid position-based targeting (#565)

* fix: remove unreachable Phase 5 subphases since source-repo guard always skips (#565)

* fix: use dynamic threshold variable in bench-check Phase 6 report template (#565)

* fix: address open review items in maintenance skills (#565)

- bench-check: add timeout 300 wrappers to all 4 benchmark invocations
  with exit code 124 check for timeout detection
- bench-check: add explicit COMPARE_ONLY guard at Phase 5 entry
- housekeep: fix grep portability — use grep -cE instead of GNU \| syntax
- test-health: add timeout 180 wrapper in flaky detection loop
- test-health: fix find command -o precedence with grouping parentheses

* fix: add COVERAGE_ONLY guards to Phase 2 and Phase 4 in test-health

* fix: add regression skip guard to bench-check Phase 5, expand deps-audit search dirs

* fix: add empty-string guard for stat size check in housekeep (#565)

When both stat variants (GNU and BSD) fail, $size is empty and the
arithmetic comparison errors out. Add a [ -z "$size" ] && continue
guard so the loop skips files whose size cannot be determined.

* fix: add BASELINE SAVED verdict path and clarify if/else-if in bench-check (#565)

Phase 6: when SAVE_ONLY or first-run (no prior baseline), write a
shortened report with "Verdict: BASELINE SAVED" instead of the full
comparison report.

Phases 1a-1d: replace ambiguous "If timeout / If non-zero" with
explicit "If timeout / Else if non-zero" so the two conditions are
clearly mutually exclusive.

* docs(roadmap): mark Phase 4 complete, update Phase 5 progress (5 of 7)

Phase 4 (Resolution Accuracy) had all 6 sub-phases merged but status
still said "In Progress". Phase 5 (TypeScript Migration) had 5.3-5.5
merged via PRs #553, #554, #555, #566 but was listed with stale counts.
Updated both to reflect actual state: Phase 4 complete, Phase 5 at 5/7
with 76 of 283 modules migrated (~27%).

* docs(roadmap): correct Phase 5 progress — 5.3/5.4/5.5 still in progress

Previous commit incorrectly marked 5.3-5.5 as complete. In reality
76 of 283 src files are .ts (~27%) while 207 remain .js (~73%).
PRs #553, #554, #555, #566 migrated a first wave but left substantial
work in each step: 4 leaf files, 39 core files, 159 orchestration
files. Updated each step with accurate migrated/remaining counts.

* fix(skill): ban untracked deferrals in /review skill

The /review skill allowed replying "acknowledged as follow-up" to
reviewer comments without tracking them anywhere. These deferrals
get lost — nobody revisits PR comment threads after merge.

Now: if a fix is genuinely out of scope, the skill must create a
GitHub issue with the follow-up label before replying. The reply
must include the issue link. A matching rule in the Rules section
reinforces the ban.

* feat(types): migrate db, graph algorithms/builders, and domain/queries to TypeScript (Phase 5.5)

Migrate 19 remaining JS files to TypeScript across db/, graph/, and domain/:
- db/: connection, migrations, query-builder, index barrel
- graph/algorithms/leiden/: adapter, cpm, modularity, optimiser, partition, index
- graph/algorithms/: louvain, index barrel
- graph/builders/: dependency, structure, temporal, index barrel
- graph/classifiers/: index barrel
- graph/: index barrel
- domain/: queries barrel

Key type additions:
- GraphAdapter, Partition, DetectClustersResult interfaces for Leiden
- LockedDatabase type for advisory-locked DB instances
- DependencyGraphOptions, TemporalGraphOptions for graph builders
- Generic Statement<TRow> in vendor.d.ts for type-safe DB queries

Also fixes pre-existing type errors in module-map.ts (untyped prepare
calls) and generator.ts (null vs undefined argument).

* feat(types): migrate builder stages, search, and graph domain to TypeScript (Phase 5.4)

Migrate 24 JS files to strict TypeScript across three domains:

- domain/graph/builder/stages/ (9 files): all build pipeline stages
- domain/graph/builder/incremental.ts + cycles.ts + journal.ts + change-journal.ts
- domain/search/search/ (6 files): hybrid, semantic, keyword, filters, prepare, cli-formatter
- domain/search/stores/ (2 files): fts5, sqlite-blob
- domain/search/strategies/ (3 files): source, structured, text-utils

Key type fixes: ParseChange stat type alignment (mtimeMs→mtime conversion),
IncrementalStmts duck-typed interfaces for watcher compatibility, NodeWithId
relaxed to Pick<NodeRow> for partial node queries, ESCAPE clause backslash fix.

Passes tsc --noEmit with zero errors. No test regressions.

* chore: delete stale .js files replaced by TypeScript counterparts

Remove cycles.js and sqlite-blob.js — their .ts replacements were
committed in the previous commit.

* fix(types): annotate implicit any parameters in resolve-imports barrel handling

The `parseFilesAuto` return type is `Map<string, any>`, causing `imp` parameters
in `.filter()` and `.map()` callbacks to have implicit `any` type — failing strict
TypeScript checks in CI.
@carlos-alm
Contributor Author

@greptileai

Comment on lines +226 to +254
@@ -227,6 +237,22 @@ Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`.
<!-- Full JSON output from each benchmark -->
```

**If the ABORTED pre-condition was triggered (no valid benchmark results):** write a minimal report:

```markdown
# Benchmark Report — <date>

**Version:** X.Y.Z | **Git ref:** abc1234 | **Threshold:** $THRESHOLD%

## Verdict: ABORTED — no valid benchmark results

All benchmark suites failed or timed out. See Phase 1 error records for details.

## Raw Results

<!-- Error/timeout records from each suite -->
```
Contributor

P1 ABORTED branch checked after SAVE_ONLY — misleading "BASELINE SAVED" report

Phase 6 checks conditions in order. The SAVE_ONLY / first-run branch fires before the ABORTED branch, so when both conditions are true the wrong report is emitted:

Scenario A — --save-baseline + all benchmarks fail:

  • SAVE_ONLY=true, metrics={}, ABORTED fired
  • Phase 5 skipped (ABORTED guard — no baseline written)
  • Phase 6: SAVE_ONLY is set → true → writes "Verdict: BASELINE SAVED — no comparison performed"
  • But no baseline was actually saved.

Scenario B — First run (no baseline) + all benchmarks fail:

  • No baseline exists, metrics={}, ABORTED fired
  • Phase 5 skipped (ABORTED guard)
  • Phase 6: no prior baseline existed (first run) → true → writes "Verdict: BASELINE SAVED"
  • Again, no baseline was saved.

The ABORTED check must come before the SAVE_ONLY / first-run check so it can short-circuit:

**If the ABORTED pre-condition was triggered (no valid benchmark results):** write a minimal report:
...

**If `SAVE_ONLY` is set or no prior baseline existed (first run):** write a shortened report — ...

**Otherwise (comparison was performed):** write the full report ...
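
The ordering requested here can be sketched as a single if/elif chain in which the ABORTED test comes first. This is a minimal illustration, not the skill's actual text: the function name `choose_verdict` and the `aborted` and `has_baseline` flags are assumptions introduced for the example; only `SAVE_ONLY` is named in this thread.

```bash
# Hypothetical helper: pick the Phase 6 verdict from three true/false flags.
# Argument order: aborted, save_only, has_baseline (illustrative names).
choose_verdict() {
  local aborted="$1" save_only="$2" has_baseline="$3"
  if [ "$aborted" = "true" ]; then
    # Tested first so it short-circuits the save-only / first-run branch.
    echo "ABORTED"
  elif [ "$save_only" = "true" ] || [ "$has_baseline" != "true" ]; then
    echo "BASELINE SAVED"
  else
    echo "COMPARED"
  fi
}

choose_verdict true true false    # Scenario A: prints ABORTED
choose_verdict true false false   # Scenario B: prints ABORTED
choose_verdict false true true    # prints BASELINE SAVED
```

Because the branches are mutually exclusive, exactly one report is written per run, and a failed run can no longer fall through to the "BASELINE SAVED" text.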

Contributor Author

Fixed — moved the ABORTED condition check before SAVE_ONLY/first-run in Phase 6, so when all benchmarks fail on a --save-baseline or first run, the correct ABORTED report is written instead of a misleading "BASELINE SAVED" report.

Contributor Author

Already fixed in prior commit 7aab540 — Phase 6 already checks ABORTED before SAVE_ONLY/first-run, preventing the misleading BASELINE SAVED report when all benchmarks fail.

@carlos-alm
Contributor Author

@greptileai

Comment on lines +93 to +98
if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then
if [ "$DRY_RUN" = "true" ]; then
echo "[DRY RUN] Would remove stale lock: $f"
else
echo "Removing stale lock: $f"
rm "$f"
Contributor

P1 lsof unavailable treated as "no process holds the file"

When lsof is not installed (common in minimal Docker/CI containers), lsof "$f" exits with code 127 ("command not found"). The condition ! lsof "$f" > /dev/null 2>&1 becomes true, so the stale lock file is silently deleted — even though we have no information about whether a process holds it. This is the opposite of the intended "require user confirmation when check is unavailable" behaviour stated in the prose above the snippet.

The elif branch ("file is held — ask user") is only reached when lsof exits 0 (found the file). It never fires for exit code 127.

Add an explicit lsof availability check before the condition:

for f in .codegraph/*.lock; do
  [ -f "$f" ] || continue
  age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) ))
  [ -z "$age" ] && continue
  if [ "$age" -gt 3600 ]; then
    if ! command -v lsof > /dev/null 2>&1; then
      echo "Lock file $f is old but cannot verify if held (lsof unavailable) — ask user before removing"
    elif ! lsof "$f" > /dev/null 2>&1; then
      if [ "$DRY_RUN" = "true" ]; then
        echo "[DRY RUN] Would remove stale lock: $f"
      else
        echo "Removing stale lock: $f"
        rm "$f"
      fi
    else
      echo "Lock file $f is old but still held by a process — ask user before removing"
    fi
  fi
done

This correctly separates three cases: lsof not installed → ask user; lsof finds no holder → safe to delete; lsof finds a holder → ask user.
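
The three-way split can also be shown in isolation. The sketch below is illustrative only: `lock_state` is a hypothetical helper, and the checker command is passed as a parameter purely so that the "tool missing" case can be demonstrated on a machine that does have lsof.

```bash
# Hypothetical helper: classify a lock file as "unknown", "free", or "held".
# In the skill the checker would be lsof; it is a parameter here for demo purposes.
lock_state() {
  local checker="$1" file="$2"
  if ! command -v "$checker" > /dev/null 2>&1; then
    echo "unknown"   # tool missing: no information, so do not delete
  elif ! "$checker" "$file" > /dev/null 2>&1; then
    echo "free"      # tool ran and reported no holder
  else
    echo "held"      # tool found at least one process
  fi
}

# A command name assumed not to exist on this machine.
lock_state definitely-not-a-real-command-xyz /tmp/example.lock   # prints unknown
```

Without the `command -v` branch, the missing-tool case would land in "free", because a negated exit status of 127 is true.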

Contributor Author

Fixed in 933c0d3 — the lock file removal now explicitly checks command -v lsof before attempting to use it. If lsof is not installed (exit 127 in Docker/CI), lock file removal is skipped entirely with a warning, rather than assuming no process holds the file.

Comment on lines +177 to 180
4. If tests fail and `STASH_REF` is non-empty:
- Restore the saved manifests: `git stash pop $STASH_REF`
- Restore `node_modules/` to match the reverted lock file: `npm ci`
- Report what failed
Contributor

P1 Failure-path git stash pop applied to npm-modified manifests will conflict

When tests fail and STASH_REF is non-empty, the stash was created from:

working tree = HEAD + pre-existing changes

By this point in step 4, the working tree contains:

HEAD + npm's modifications (audit fix / update)

git stash pop $STASH_REF applies the stash as a patch on top of the current state — it does not restore the working tree to a previous snapshot. Since the stash patch (pre-existing changes) and the current state (npm modifications) both modify package.json/package-lock.json from the same base (HEAD), git will almost certainly report conflicts, leaving the manifests with conflict markers and node_modules/ in an undefined state.

The correct two-step restore is:

  1. First, reset the manifest files to HEAD (undoing npm's changes):
    git checkout HEAD -- package.json package-lock.json
  2. Then re-apply the pre-existing changes cleanly from HEAD:
    git stash pop $STASH_REF

At step 2 the working tree matches HEAD, so the stash applies exactly as it was originally created — no conflicts.

Suggested replacement for step 4:

4. If tests fail and `STASH_REF` is non-empty:
   - Reset manifests to HEAD first (undoes npm changes):
     `git checkout HEAD -- package.json package-lock.json`
   - Then re-apply the pre-existing changes cleanly:
     `git stash pop $STASH_REF`
   - Restore `node_modules/` to match the reverted lock file: `npm ci`
   - Report what failed

Note that the success path (step 3) intentionally does a merge (pop on the npm-modified state) to preserve both sets of changes. The failure path semantics are different — we want a full restore — so the merge approach is wrong here.
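
As a rough sketch of the failure-path sequence above, the three commands can be wrapped in one function so that each step runs only if the previous one succeeded. The function name `restore_after_failed_fix` is hypothetical; `STASH_REF` is the variable named in the skill, and the sketch assumes it holds a stash reference that `git stash pop` accepts.

```bash
# Hypothetical helper wrapping the failure-path restore.
# $1 is the stash reference (may be empty when no stash was created).
restore_after_failed_fix() {
  local stash_ref="$1"
  # 1. Undo npm's edits so the manifests match the base the stash was made from.
  git checkout HEAD -- package.json package-lock.json || return 1
  # 2. Re-apply the pre-existing changes on top of that clean base.
  if [ -n "$stash_ref" ]; then
    git stash pop "$stash_ref" || return 1
  fi
  # 3. Resync node_modules/ with the restored lock file.
  npm ci
}

# Usage (not executed here): restore_after_failed_fix "$STASH_REF"
```

The early returns matter: if the checkout fails, popping the stash would reintroduce the conflict scenario this comment describes.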

Contributor Author

Fixed — the failure path now resets manifests to HEAD first (git checkout HEAD -- package.json package-lock.json) before popping the stash. This ensures the stash applies cleanly against the same base it was created from, avoiding conflicts. Also updated the Rules section to match.

Contributor Author

Fixed in prior commit 0b08a2b — the failure path already resets manifests to HEAD first (git checkout HEAD -- package.json package-lock.json) before popping the stash, preventing conflicts between npm-modified and stashed versions.

…565)

When tests fail after npm audit fix, the working tree has npm's
modifications. Popping the stash directly would conflict because both
sides modified package.json from the same base. Reset to HEAD first
so the stash applies cleanly.
@carlos-alm
Contributor Author

@greptileai

Comment on lines +169 to +173
d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.
Recovery options:
- To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci`
- To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci`
- To keep only the pre-existing changes and discard the audit fixes: re-run `/deps-audit` without `--fix`
Contributor

P1 Recovery option 3 cannot achieve its stated goal — stash already consumed

At the point where step 3d fires (clean pop succeeded → npm install ran → npm test failed), the stash entry for deps-audit-backup was already consumed by git stash pop $STASH_REF in step 3. The pre-existing changes are now merged into the working tree alongside the npm audit fixes.

Recovery option 3 reads:

"To keep only the pre-existing changes and discard the audit fixes: re-run /deps-audit without --fix"

This advice is incorrect. Re-running the skill without --fix audits the current (mixed) manifest — it performs no rollback. The npm fixes remain in the working tree untouched. The user is left with the same mixed state, potentially misled into thinking re-running the skill undid the audit changes.

The correct recovery for "keep only pre-existing changes" at this point is manual, since the stash was consumed:

- To keep only the pre-existing changes and discard the audit fixes:
  1. Run `git diff HEAD -- package.json package-lock.json` to identify the full set of changes
     (this shows npm fixes AND pre-existing changes merged together)
  2. Manually edit `package.json`/`package-lock.json` to remove the audit-fix deltas, keeping only your original changes
  3. Run `npm ci` to restore `node_modules/` to match the edited lock file
  — Note: the pre-existing stash entry was consumed by the `git stash pop` above; it cannot be re-applied

Alternatively, the cleanest approach is:

- To start fresh (discard everything — both audit fixes and pre-existing changes):
  `git checkout HEAD -- package.json package-lock.json && npm ci`
  Then re-apply your original changes manually (the stash entry is gone).

The current option 3 text should be corrected or removed to avoid misdirecting the user.

Contributor Author

Fixed in 005f806: replaced the unachievable "re-run /deps-audit without --fix" with accurate manual recovery steps (revert the manifests to HEAD with git checkout and npm ci, then re-apply the pre-existing changes manually).

…#565)

When lsof is not installed (Docker/CI), it exits 127 and the lock file
would be deleted assuming no process holds it. Now explicitly checks
command -v lsof first and skips lock file removal entirely if unavailable.
Recovery option 3 said "re-run /deps-audit without --fix" but at that
point the stash is already consumed, so re-running doesn't undo npm
changes. Replaced with accurate manual recovery steps: revert manifests
to HEAD with git checkout and npm ci, then re-apply pre-existing changes.
#565)

When a run produces {"timeout":true} or {"error":true}, it has no
per-test data. If counted as "all tests failed", every stable test
looks flaky. Now skips invalid runs and requires minimum 2 valid runs
for flaky detection to be conclusive.
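
One way to picture the exclusion rule is a helper that counts only the runs carrying real per-test data. This is a sketch under assumptions: `count_valid_runs` is a hypothetical name, and the one-JSON-file-per-run layout and the file names are invented for illustration; only the `{"timeout":true}` and `{"error":true}` markers come from the commit message.

```bash
# Hypothetical helper: count per-run result files that contain real test data.
count_valid_runs() {
  local dir="$1" valid=0 f
  for f in "$dir"/*.json; do
    [ -f "$f" ] || continue
    if grep -qE '"(timeout|error)"[[:space:]]*:[[:space:]]*true' "$f"; then
      continue   # invalid run: no per-test data, so leave it out
    fi
    valid=$((valid + 1))
  done
  echo "$valid"
}

runs=$(mktemp -d)
echo '{"timeout":true}'                   > "$runs/run-1.json"
echo '{"tests":[{"name":"a","ok":true}]}' > "$runs/run-2.json"
echo '{"tests":[{"name":"a","ok":true}]}' > "$runs/run-3.json"

if [ "$(count_valid_runs "$runs")" -lt 2 ]; then
  echo "inconclusive: fewer than 2 valid runs"
else
  echo "flaky analysis can proceed"
fi
rm -rf "$runs"
```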
@carlos-alm
Contributor Author

Addressed latest Greptile feedback:

  • housekeep: Added explicit command -v lsof guard for lock file removal — skips entirely if lsof unavailable instead of assuming no process holds the file (933c0d3)
  • deps-audit: Replaced unachievable recovery option 3 ("re-run without --fix") with accurate manual recovery steps since the stash is already consumed at that point (005f806)
  • test-health: Added error/timeout run exclusion in flaky analysis with minimum 2 valid runs requirement — prevents invalid runs from making every stable test look flaky (2e6d37d)
  • deps-audit failure path: Already fixed in prior commit (0b08a2b) — resets manifests to HEAD before stash pop
  • bench-check ABORTED ordering: Already fixed in prior commit (7aab540) — ABORTED check comes before SAVE_ONLY in Phase 6

@carlos-alm
Contributor Author

@greptileai

Comment on lines +88 to +103
```bash
for f in .codegraph/*.lock; do
[ -f "$f" ] || continue
age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) ))
[ -z "$age" ] && continue
if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then
if [ "$DRY_RUN" = "true" ]; then
echo "[DRY RUN] Would remove stale lock: $f"
else
echo "Removing stale lock: $f"
rm "$f"
fi
elif [ "$age" -gt 3600 ]; then
echo "Lock file $f is old but still held by a process — ask user before removing"
fi
done
Contributor

P1 command -v lsof guard described in prose but absent from code snippet

The prose above the snippet correctly states: "Before removing, first check if lsof is available (command -v lsof). If lsof is not installed, skip lock file removal entirely."

However, the code snippet itself has no such guard. The if condition on line 93 is:

if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then

When lsof is absent (exits 127 — command not found), ! lsof "$f" evaluates to true (because 127 is non-zero). The entire if branch fires and rm "$f" executes — exactly the unsafe behavior the prose was meant to prevent. An agent following the code block literally, rather than the preceding prose, will silently delete stale lock files in Docker/CI containers without any process-ownership check.

The command -v lsof guard needs to be embedded in the code snippet itself so both paths are consistently enforced:

if ! command -v lsof > /dev/null 2>&1; then
  echo "lsof not available — skipping lock file cleanup (cannot verify no process holds the file)"
else
  for f in .codegraph/*.lock; do
    [ -f "$f" ] || continue
    age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) ))
    [ -z "$age" ] && continue
    if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then
      if [ "$DRY_RUN" = "true" ]; then
        echo "[DRY RUN] Would remove stale lock: $f"
      else
        echo "Removing stale lock: $f"
        rm "$f"
      fi
    elif [ "$age" -gt 3600 ]; then
      echo "Lock file $f is old but still held by a process — ask user before removing"
    fi
  done
fi

Contributor Author

Already fixed in prior commit — the code snippet at lines 89-107 already wraps the entire lock file loop inside if ! command -v lsof > /dev/null 2>&1, matching the suggested fix exactly. The prose and code are now consistent.

@carlos-alm
Contributor Author

Fixed the housekeep SKILL.md lock file removal code snippet — embedded the command -v lsof guard directly in the code block, wrapping the entire lock file loop. When lsof is absent, the snippet now skips lock file cleanup entirely and prints a warning instead of proceeding with unsafe deletion.

@greptileai

carlos-alm added a commit that referenced this pull request Mar 24, 2026
* feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep

Four recurring maintenance routines as Claude Code skills:
- /deps-audit: vulnerability scanning, staleness, unused deps, license checks
- /bench-check: benchmark regression detection against saved baselines
- /test-health: flaky test detection, dead tests, coverage gap analysis
- /housekeep: clean worktrees, dirt files, sync main, prune branches

* fix(bench-check): capture stderr, guard division-by-zero, commit baseline

- Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all
  four benchmark invocations so error messages are captured and recorded
- Add division-by-zero guard in Phase 3: when baseline == 0, mark delta
  as "N/A — baseline was zero" (informational only, not a regression)
- Add git add + git commit step in Phase 5 so the baseline file is
  actually committed after each save, matching the documented rule
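
A minimal sketch of the zero-baseline guard, assuming the delta is a percentage change relative to the baseline. The helper name `pct_delta` and the exact wording of the N/A string are illustrative, not copied from the skill.

```bash
# Hypothetical helper: percentage change from baseline to current.
# A zero baseline is reported as N/A instead of dividing by zero.
pct_delta() {
  local baseline="$1" current="$2"
  awk -v b="$baseline" -v c="$current" 'BEGIN {
    if (b == 0) { print "N/A (baseline was zero)"; exit }
    printf "%.1f%%\n", (c - b) / b * 100
  }'
}

pct_delta 200 230   # prints 15.0%
pct_delta 0 5       # prints N/A (baseline was zero)
```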

* fix(deps-audit): run npm ci after revert, document tokenizer skip reason

- After reverting package.json + package-lock.json on --fix test failure,
  also run `npm ci` to resync node_modules/ with the restored lock file;
  without this the manifest is reverted but installed packages are not
- Add explanatory comment on @anthropic-ai/tokenizer skip-list entry
  clarifying it is a peer dependency of @anthropic-ai/sdk and may be
  required at runtime without an explicit import in our code

* fix(housekeep): guard Phase 5 in source repo, fix stale-worktree criterion

- Phase 5 (Update Codegraph): add source-repo guard that skips the
  self-update logic when running inside the codegraph source repo;
  comparing the dev version to the published release and running
  npm install is a no-op since codegraph is not one of its own deps
- Phase 1b stale-worktree criterion: replace "created more than 7 days
  ago" (not determinable via git worktree list) with "last commit on the
  branch is more than 7 days old AND branch has no commits ahead of
  origin/main", using `git log -1 --format=%ci <branch>`

* fix: address Round 3 Greptile review feedback

* fix: move deps-audit stash to Phase 0, before npm commands modify manifests

* fix: capture flaky-detection loop output to per-run files for comparison

* fix: always require confirmation for stale worktree removal

* fix: use parsed threshold in baseline.json, guard --compare-only on first run

* fix(deps-audit): track stash creation to avoid operating on wrong entry

When Phase 0 stash push is a no-op (manifests unchanged), Phase 7
was calling stash drop/pop on the wrong entry. Track STASH_CREATED
exit code and branch on it: use git checkout when no stash exists.

* fix(test-health): use mktemp for flaky-run directory to avoid concurrent corruption

Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel
sessions get isolated directories. Add cleanup at end of analysis.

* fix(bench-check): add save-baseline verdict path, fix em-dash, use explicit commit paths

Add 4th verdict path for --save-baseline when baseline already exists.
Replace corrupted em-dash character in N/A string. Change commit command
to use explicit file paths per project convention.

* docs(roadmap): update Phase 5 TypeScript migration with accurate progress

Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file
paths. Updated to reflect actual state: 32 of 269 source modules migrated
(~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified
counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about
14 stale .js counterparts of already-migrated .ts files needing deletion.

* fix: deps-audit success path should keep npm changes, not revert (#565)

When STASH_CREATED=1 and tests pass, the npm audit fix changes are
good — no action needed. Previously it ran git checkout to discard
them, which undid the successful fix.

* fix: bench-check use git add + diff --cached to detect new files (#565)

git diff --quiet ignores untracked files, so on the first run when
baseline.json and history.ndjson are newly created, the commit was
skipped. Stage first with git add, then check with --cached.
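
The stage-then-check pattern can be sketched as follows. `commit_if_changed` is a hypothetical helper and the paths in the usage comment are placeholders; the sketch passes explicit paths to every git command so that unrelated staged files are not swept into the commit.

```bash
# Hypothetical helper: commit the given paths only if staging them produces
# a difference from HEAD. Staging first makes brand-new files visible.
commit_if_changed() {
  local message="$1"; shift
  git add -- "$@" || return 1
  if git diff --cached --quiet -- "$@"; then
    echo "nothing to commit"
  else
    git commit -m "$message" -- "$@"
  fi
}

# Usage (placeholder paths, not executed here):
# commit_if_changed "chore: update benchmark baseline" path/to/baseline.json path/to/history.ndjson
```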

* fix: housekeep require confirmation before branch deletion (#565)

Branch deletion now asks for user confirmation before each delete,
consistent with worktree removal in Phase 1c.

* fix: scope git diff --cached to bench-check files only (#565)

* fix: use json-summary reporter to match coverage-summary.json output (#565)

* fix: capture stash ref by name to avoid position-based targeting (#565)

* fix: remove unreachable Phase 5 subphases since source-repo guard always skips (#565)

* fix: use dynamic threshold variable in bench-check Phase 6 report template (#565)

* fix: address open review items in maintenance skills (#565)

- bench-check: add timeout 300 wrappers to all 4 benchmark invocations
  with exit code 124 check for timeout detection
- bench-check: add explicit COMPARE_ONLY guard at Phase 5 entry
- housekeep: fix grep portability — use grep -cE instead of GNU \| syntax
- test-health: add timeout 180 wrapper in flaky detection loop
- test-health: fix find command -o precedence with grouping parentheses

* fix: add COVERAGE_ONLY guards to Phase 2 and Phase 4 in test-health

* fix: add regression skip guard to bench-check Phase 5, expand deps-audit search dirs

* fix: add empty-string guard for stat size check in housekeep (#565)

When both stat variants (GNU and BSD) fail, $size is empty and the
arithmetic comparison errors out. Add a [ -z "$size" ] && continue
guard so the loop skips files whose size cannot be determined.
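
A small sketch of that guard, assuming the same GNU-then-BSD `stat` fallback used elsewhere in the skill. The helper name `file_size`, the 3-byte threshold, and the sample paths are made up for the demonstration.

```bash
# Hypothetical helper: print a file's size in bytes, trying GNU stat first
# and BSD stat second; prints nothing if both fail.
file_size() {
  stat --format='%s' "$1" 2>/dev/null || stat -f '%z' "$1" 2>/dev/null
}

tmp=$(mktemp)
printf 'hello' > "$tmp"
for f in "$tmp" /nonexistent/path; do
  size=$(file_size "$f")
  [ -z "$size" ] && continue          # size unknown: skip instead of erroring
  if [ "$size" -gt 3 ]; then
    echo "large: $size bytes"
  fi
done
rm -f "$tmp"
```

Capturing the size into a variable before any arithmetic is what makes the empty-string test meaningful; the numeric comparison is never reached with an empty operand.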

* fix: add BASELINE SAVED verdict path and clarify if/else-if in bench-check (#565)

Phase 6: when SAVE_ONLY or first-run (no prior baseline), write a
shortened report with "Verdict: BASELINE SAVED" instead of the full
comparison report.

Phases 1a-1d: replace ambiguous "If timeout / If non-zero" with
explicit "If timeout / Else if non-zero" so the two conditions are
clearly mutually exclusive.

* docs(roadmap): mark Phase 4 complete, update Phase 5 progress (5 of 7)

Phase 4 (Resolution Accuracy) had all 6 sub-phases merged but status
still said "In Progress". Phase 5 (TypeScript Migration) had 5.3-5.5
merged via PRs #553, #554, #555, #566 but was listed with stale counts.
Updated both to reflect actual state: Phase 4 complete, Phase 5 at 5/7
with 76 of 283 modules migrated (~27%).

* docs(roadmap): correct Phase 5 progress — 5.3/5.4/5.5 still in progress

Previous commit incorrectly marked 5.3-5.5 as complete. In reality
76 of 283 src files are .ts (~27%) while 207 remain .js (~73%).
PRs #553, #554, #555, #566 migrated a first wave but left substantial
work in each step: 4 leaf files, 39 core files, 159 orchestration
files. Updated each step with accurate migrated/remaining counts.

* fix(skill): ban untracked deferrals in /review skill

The /review skill allowed replying "acknowledged as follow-up" to
reviewer comments without tracking them anywhere. These deferrals
get lost — nobody revisits PR comment threads after merge.

Now: if a fix is genuinely out of scope, the skill must create a
GitHub issue with the follow-up label before replying. The reply
must include the issue link. A matching rule in the Rules section
reinforces the ban.

* fix(skill): add --repo flag, multi-endpoint reply for deferrals, and scope guidance (#568)

* fix(skill): guard follow-up label creation before gh issue create (#568)

gh issue create --label "follow-up" fails if the label doesn't exist
in the repo. Add a gh label create guard step that is safe to re-run.

* feat(skill): parallelize /review with one subagent per PR

* fix: correct heredoc terminator indentation in review skill (#568)

* fix(skill): capture gh issue create output before referencing issue number

gh issue create prints the new issue URL to stdout — capture it and
extract the number so reply templates can reference it unambiguously.
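
Extracting the number from the captured URL needs no extra tooling. The sketch below is illustrative: `issue_number_from_url` is a hypothetical helper and the URL is a made-up example, assuming only that the issue number is the last path segment.

```bash
# Hypothetical helper: extract the trailing issue number from an issue URL.
issue_number_from_url() {
  local url="$1"
  local num="${url##*/}"          # strip everything up to the last slash
  case "$num" in
    ''|*[!0-9]*) return 1 ;;      # reject empty or non-numeric tails
    *) echo "$num" ;;
  esac
}

issue_number_from_url "https://github.com/example-org/example-repo/issues/123"   # prints 123
```

Returning non-zero on an unexpected tail lets the caller stop before posting a reply with a blank issue reference.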

* fix(skill): surface follow-up issues in review result format and summary table

Add "Issues Created" field to the subagent result format and an "Issues" column
to the Step 3 summary table, so deferred out-of-scope items are visible in the
final report.

* fix(skill): require verbatim rule propagation to subagents

* fix(skill): align Issues field name between 2i result and Step 3 table (#568)
Comment on lines +164 to +176
3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`) — this restores any pre-existing uncommitted changes alongside the npm fix results.
- If the pop applies cleanly:
a. Run `npm install` to re-sync `node_modules/` with the merged manifest.
b. Re-run `npm test` to confirm nothing broke with the merged dependency state.
c. If tests still pass: confirm the project is consistent.
d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.
Recovery options:
- To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci`
- To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci`
- To keep only the pre-existing changes and discard the audit fixes: `git checkout HEAD -- package.json package-lock.json && npm ci` to revert manifests to their clean state, then manually re-apply only your pre-existing changes
- If the pop causes conflicts in `package.json`/`package-lock.json`: warn the user, leave conflict markers for manual resolution, and instruct: "After you resolve the conflicts, run `npm install` to re-sync `node_modules/` with the resolved lock file before committing."
- For conflicts in other files, resolve them by keeping both the npm fixes and the pre-existing changes.
If tests pass and `STASH_REF` is empty: no action needed — the npm changes are good and no stash entry exists to clean up
Contributor

P1 Initial npm test result may be stale after clean pop + npm install

Step 2 runs npm test before the stash pop. Step 3 then says "If tests pass and STASH_REF is non-empty" — but the test run that passed in step 2 was against the npm-audited manifest WITHOUT the pre-existing stash changes applied. After the clean pop and npm install in step 3a, node_modules/ may differ from what step 2's tests used. The skill correctly re-runs npm test in step 3b to catch this, which is good.

However, the prose in step 3 is ordered as: "If tests pass … pop and merge … npm install … re-run npm test." An agent could interpret this as: "tests already passed, so pop and install, then re-test as a formality." The intent should be clearer that the step-2 test result is only a prerequisite for attempting the pop — the step-3b re-test is the authoritative result that determines whether the final state is consistent.

Consider making this explicit:

Suggested change
3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`) — this restores any pre-existing uncommitted changes alongside the npm fix results. Note: the step 2 test run validated the npm changes alone; step 3b is the authoritative test of the final merged state.
- If the pop applies cleanly:
a. Run `npm install` to re-sync `node_modules/` with the merged manifest.
b. Re-run `npm test` to confirm the merged state is consistent (this is the authoritative check).
c. If tests still pass: confirm the project is consistent.
d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.

Contributor Author

Fixed — added explicit note that the step 2 test run validates npm changes alone, and that step 3b is the authoritative test of the final merged state. The step 3b description now says '(this is the authoritative check — step 2 only validated the npm changes in isolation)'.

@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit b4f8010 into main Mar 24, 2026
16 checks passed
@carlos-alm carlos-alm deleted the feat/maintenance-skills branch March 24, 2026 06:07
@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2026