feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep by carlos-alm · Pull Request #565 · optave/ops-codegraph-tool

carlos-alm · 2026-03-21T11:07:50Z

Summary

/deps-audit — Audit dependencies for vulnerabilities (npm audit), staleness (npm outdated), unused packages, and license compliance. Produces a scored health report. Optional --fix for safe auto-updates.
/bench-check — Run benchmark suite against a saved baseline, detect regressions beyond a configurable threshold, maintain a history log for trend tracking. Guards against silent performance degradation.
/test-health — Detect flaky tests (multi-run), dead/trivial tests (no assertions), coverage gaps on core modules, and structural issues (oversized files, missing cleanup). Read-only audit with prioritized fix suggestions.
/housekeep — Local repo spring cleaning: prune stale worktrees, delete temp/dirt files, sync with main, update codegraph, prune merged branches, verify graph and node_modules integrity. Supports --dry-run.

Test plan

Run /deps-audit and verify report is generated in generated/deps-audit/
Run /deps-audit --fix and verify safe updates are applied, tests pass
Run /bench-check --save-baseline to create initial baseline
Run /bench-check again to verify comparison works
Run /test-health --quick and verify report in generated/test-health/
Run /test-health (full) and verify flaky detection runs N times
Run /housekeep --dry-run and verify no modifications are made
Run /housekeep and verify cleanup actions

…housekeep

Four recurring maintenance routines as Claude Code skills:

- /deps-audit: vulnerability scanning, staleness, unused deps, license checks
- /bench-check: benchmark regression detection against saved baselines
- /test-health: flaky test detection, dead tests, coverage gap analysis
- /housekeep: clean worktrees, dirt files, sync main, prune branches

claude · 2026-03-21T11:08:02Z

Claude finished @carlos-alm's task

PR Review: Maintenance Skills

Reviewing this PR that adds 4 new maintenance skills for Claude Code.

Todo List:

Read and analyze all 4 skill files
Phase 1: Root cause analysis
Phase 2: Configuration change detection
Phase 3: Test integrity
Phase 4: Code quality assessment
Phase 5: Backlog compliance evaluation
Provide final structured assessment

Root Cause Analysis

Problem Being Solved: This PR addresses maintenance debt and technical hygiene gaps that accumulate in active development repositories. The skills automate routine maintenance tasks that are typically done manually or inconsistently.

Why This Approach: These are documentation-only additions that define Claude Code skills (not executable code), following the established .claude/skills/ pattern. Each skill is comprehensive and well-structured with clear phases and safety rules.

Risk Assessment: Low risk - these are pure documentation files that define maintenance workflows. No code changes, no dependencies added.

Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies; builds on existing npm/node/git/codegraph infrastructure
Foundation-aligned: ✓ — Aligns with P8 (optimize for planning/developing/refactoring); maintenance automation supports core development workflow
Problem-fit: 2 — Useful maintenance automation but doesn't directly address the "lost AI in large codebases" core problem
Breaking: No — Purely additive documentation files
Tier: 1

Critical Concerns

None identified. The skills are:

Well-structured with clear phases and safety rules
Infrastructure-aware (all referenced scripts exist)
Safety-focused (dry-run modes, rollback procedures, test verification)
Documentation-only (no executable code changes)

Quality Assessment

Excellent Design Patterns:

Phase-based structure makes complex workflows manageable
Safety rules prevent destructive operations (--dry-run, test verification)
Comprehensive coverage (deps, perf, tests, cleanup)
Rollback procedures for failed operations
Configurable thresholds for different project needs

Minor Observations:

/deps-audit references some packages that may not apply to all projects (acceptable - skill provides guidance)
/bench-check assumes specific benchmark script structure (verified to exist in this repo)
All skills generate reports in generated/ directory (good pattern)

Technical Validation:

✅ All referenced scripts exist (scripts/benchmark*.js)
✅ Uses existing dev dependencies (@vitest/coverage-v8, better-sqlite3)
✅ No new runtime dependencies required
✅ Follows existing skill format and patterns

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: High-quality maintenance automation that follows established patterns, includes proper safety measures, and provides valuable workflow automation for development teams. While not directly addressing the core "lost AI" problem (hence Tier 1), these skills will improve development velocity and code quality.

greptile-apps · 2026-03-21T11:11:36Z

Greptile Summary

This PR introduces four maintenance skills (/bench-check, /deps-audit, /test-health, /housekeep) as agent instruction files, alongside a roadmap update marking TypeScript migration step 5.3 complete. The skills have been through extensive prior review iterations that addressed ~25 issues; this round finds the previous fixes well-applied with two remaining concerns.

Key changes in this diff:

bench-check: Adds the ABORTED pre-condition (empty metrics guard), explicit Phase 5/6 skip guards for ABORTED, narrows the Phase 6 skip condition so ABORTED always produces a report, updates Phase 7 summary line, and corrects the stale "always commits" rule.
deps-audit: Replaces STASH_CREATED=$?-based branching with STASH_REF-presence checks (correctly accounting for git 2.16+ returning 0 even when nothing is stashed), expands clean-pop success path with npm install + re-test + recovery options, and fixes the failure-path to reset to HEAD before popping.
housekeep: Adds dual dirt-file discovery (gitignored vs. untracked), lsof-guarded lock-file removal with DRY_RUN awareness, [ -d "$f" ] && continue guard for directory entries in the large-file scan, and git pull --no-rebase.
test-health: Captures exit code, uses jq for safe JSON encoding of stderr, excludes invalid runs from flaky analysis with a minimum-valid-runs check, and switches to origin/main for the coverage diff.

Issues found:

housekeep Phase 1c: git worktree prune runs unconditionally even when --dry-run is active, violating the "DRY_RUN is sacred" rule. Use git worktree prune --dry-run in dry-run mode.
bench-check Phase 4: The "No regressions found" verdict path appears before "First run" and "Save-baseline" paths and is vacuously true whenever Phase 3 was skipped (SAVE_ONLY or first-run). This can cause an agent to emit a misleading BENCH-CHECK PASSED message (though the underlying baseline-save action would still be correct). Adding an explicit applicability note to that path removes the ambiguity.

Confidence Score: 3/5

Safe to merge after addressing the two flagged issues; both are correctness concerns in agent-facing instruction documents rather than executable code.
The prior 25-issue review cycle has been thoroughly addressed. Two new issues remain: one is a clear DRY_RUN violation in housekeep (P1 — a running /housekeep --dry-run would still prune worktree refs), the other is a verdict-message ambiguity in bench-check (P1 — wrong message but correct action). Neither causes data loss or security issues, but both are observable behavioral deviations from the documented contracts.
.claude/skills/housekeep/SKILL.md (Phase 1c DRY_RUN guard) and .claude/skills/bench-check/SKILL.md (Phase 4 verdict ordering)

Important Files Changed

| Filename | Overview |
| --- | --- |
| .claude/skills/bench-check/SKILL.md | Addresses many previous review issues (ABORTED pre-condition, Phase 5/6 skip guards, COMPARE_ONLY early exit, Phase 7 summary line); one remaining issue — "No regressions found" path is ambiguous when Phase 3 was skipped (SAVE_ONLY or first-run), potentially causing a misleading "PASSED" verdict message. |
| .claude/skills/deps-audit/SKILL.md | Extensive Phase 7 rework: STASH_REF-based branching replaces fragile exit-code checks, clean-pop success path now does npm install + re-test with recovery options, failure path resets to HEAD before popping stash. No new issues found. |
| .claude/skills/housekeep/SKILL.md | Good fixes for dirt-file discovery (dual category approach), lock-file lsof guard, git pull --no-rebase, and DRY_RUN for lock files; however, `git worktree prune` in Phase 1c still runs unconditionally without a DRY_RUN guard, violating the "DRY_RUN is sacred" rule. |
| .claude/skills/test-health/SKILL.md | Flaky detection loop now captures exit codes, uses jq for safe JSON encoding, applies mktemp isolation, excludes invalid runs from analysis, and uses origin/main for coverage diff; no new issues found. |
| docs/roadmap/ROADMAP.md | Bookkeeping update marking step 5.3 as complete, updating remaining counts for 5.4, and noting step 5.3 completion status; no issues found. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    START([skill invoked]) --> P0[Phase 0 — Pre-flight\nparse args, check repo]

    P0 --> BC_P1[bench-check\nPhase 1: run benchmarks\ntimeout 300 each]
    BC_P1 --> BC_ABORT{metrics empty?}
    BC_ABORT -- yes --> BC_P6A[Phase 6: ABORTED report]
    BC_ABORT -- no --> BC_P3{SAVE_ONLY\nor no baseline?}
    BC_P3 -- yes --> BC_P4S[Phase 4: First-run\nor Save-baseline verdict]
    BC_P3 -- no --> BC_P3C[Phase 3: compare\ncompute delta_pct]
    BC_P3C --> BC_P4V{regressions?}
    BC_P4V -- yes --> BC_P6F[Phase 6: FAILED report\nno baseline update]
    BC_P4V -- no --> BC_P5{COMPARE_ONLY?}
    BC_P5 -- no --> BC_P5S[Phase 5: save baseline\ngit commit files]
    BC_P5 -- yes --> BC_P6P[Phase 6: PASSED report\nno baseline update]
    BC_P5S --> BC_P6P
    BC_P4S --> BC_P6B[Phase 6: BASELINE SAVED report]
    BC_P6A --> BC_P7[Phase 7: print summary]
    BC_P6F --> BC_P7
    BC_P6P --> BC_P7
    BC_P6B --> BC_P7

    P0 --> DA_P1[deps-audit\nPhase 0: stash if --fix\ncapture STASH_REF by name]
    DA_P1 --> DA_P2[Phases 1–5: audit\nsecurity/outdated/unused\nlicense/duplicates]
    DA_P2 --> DA_P6[Phase 6: report]
    DA_P6 --> DA_P7{AUTO_FIX?}
    DA_P7 -- no --> DA_END([done])
    DA_P7 -- yes --> DA_TEST[npm test]
    DA_TEST --> DA_PASS{pass?}
    DA_PASS -- yes + STASH_REF non-empty --> DA_POP[git stash pop\nnpm install\nre-run npm test]
    DA_PASS -- yes + STASH_REF empty --> DA_END
    DA_POP --> DA_END
    DA_PASS -- no + STASH_REF non-empty --> DA_RESTORE[git checkout HEAD\ngit stash pop\nnpm ci]
    DA_PASS -- no + STASH_REF empty --> DA_REVERT[git checkout\nnpm ci]
    DA_RESTORE --> DA_END
    DA_REVERT --> DA_END

    P0 --> TH_P1[test-health\nPhase 1: mktemp RUN_DIR\nrun FLAKY_RUNS × vitest\ntimeout 180 each]
    TH_P1 --> TH_P1A[exclude invalid runs\nmin 2 valid runs\nfor flaky detection]
    TH_P1A --> TH_P2[Phase 2: dead tests\nPhase 3: coverage json-summary\nPhase 4: structure]
    TH_P2 --> TH_P5[Phase 5: report\nrm -rf RUN_DIR]

    P0 --> HK_P1[housekeep\nPhase 1: worktree prune\n+ confirm stale removal]
    HK_P1 --> HK_P2[Phase 2: dirt files\ndual discovery\nlsof lock guard]
    HK_P2 --> HK_P3[Phase 3: git fetch\ngit pull --no-rebase]
    HK_P3 --> HK_P4[Phase 4: prune merged branches\nconfirm each]
    HK_P4 --> HK_P5[Phase 5: source-repo guard\nalways skipped here]
    HK_P5 --> HK_P6[Phase 6: health checks\ncoverage / graph / git fsck]
    HK_P6 --> HK_P7[Phase 7: console report]

Comments Outside Diff (2)

.claude/skills/housekeep/SKILL.md, line 50-53

git worktree prune runs unconditionally in --dry-run mode

Phase 1c runs git worktree prune without a DRY_RUN guard, even though the Rules section explicitly states "--dry-run is sacred — it must NEVER modify anything, only report." The prose note "If DRY_RUN: Just list what would be removed, don't do it" appears after the prune code block and — structurally — applies only to the "stale worktrees with merged branches" section, leaving git worktree prune unconditionally executed.

git worktree prune modifies git's internal administrative state (removes stale worktree refs under .git/worktrees/). Running it in --dry-run mode violates the invariant. git worktree prune --dry-run exists precisely for this case.

```bash
# In dry-run mode, report what would be pruned without modifying anything
if [ "$DRY_RUN" = "true" ]; then
  git worktree prune --dry-run
else
  git worktree prune
fi
```

**Context Used:** CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=7111ef63-eefd-42cc-82a1-a1c617a968ee))

.claude/skills/bench-check/SKILL.md, line 165-168

First-run / SAVE_ONLY verdict path ordering ambiguity

Phase 4's verdict paths are checked in textual order: "No regressions found" comes before "First run (no baseline)" and "Save-baseline with existing baseline". Both of the latter cases have Phase 3 skipped (Phase 3 guards: "Skip if SAVE_ONLY=true or no baseline exists"), meaning there are no comparison results — which makes "No regressions found" vacuously true.

An LLM agent evaluating the paths top-down for a --save-baseline run against an existing baseline could match "No regressions found" and emit:

BENCH-CHECK PASSED — no regressions beyond X% threshold

…instead of the correct:

BENCH-CHECK — baseline overwritten (previous: <old>, new: <new>)

The action taken (baseline saved) would still be correct since "If not COMPARE_ONLY: update baseline" fires, but the summary message in Phase 7 and the report header in Phase 6 would say "PASSED" when no comparison was actually performed.

Add an explicit precondition to the "No regressions found" path so agents don't inadvertently match it when Phase 3 was skipped:
```
### No regressions found
*(Only applicable when Phase 3 ran — i.e., a baseline existed and `SAVE_ONLY` was not set)*
- Print: `BENCH-CHECK PASSED — no regressions beyond {THRESHOLD}% threshold`
- If not `COMPARE_ONLY`: update baseline with current results
```

Reviews (28): Last reviewed commit: "fix: address Greptile review feedback (#..."

greptile-apps · 2026-03-21T11:11:40Z

.claude/skills/bench-check/SKILL.md

+- `nodeCount`, `edgeCount` — graph size
+
+### 1b. Incremental Benchmark
+
+```bash
+node scripts/incremental-benchmark.js 2>/dev/null
+```
+
+Extract:
+- `noOpRebuild` (ms) — time for no-change rebuild
+- `singleFileRebuild` (ms) — time after one file change
+- `importResolution` (ms) — resolution throughput
+
+### 1c. Query Depth Benchmark
+
+```bash
+node scripts/query-benchmark.js 2>/dev/null
+```
+
+Extract:
+- `fnDeps` scaling by depth
+- `fnImpact` scaling by depth
+- `diffImpact` latency
+
+### 1d. Embedding Benchmark (optional)
+
+```bash
+node scripts/embedding-benchmark.js 2>/dev/null
+```
+
+Extract:
+- `embeddingTime` (ms)
+- `recall` at Hit@1, Hit@3, Hit@5, Hit@10
+
+> **Timeout:** Each benchmark gets 5 minutes max. If it times out, record `"timeout"` for that suite and continue.


2>/dev/null discards error messages that should be recorded

All four benchmark invocations pipe stderr to /dev/null, yet the Rules section (line 222) and the error-handling note on line 90 both say to record "error: <message>" when a script exits non-zero. With stderr silenced there is no message to capture — the skill can only detect that a failure occurred, not why.

Replace the silent discard with a captured-stderr pattern:

```bash
# Instead of:
node scripts/benchmark.js 2>/dev/null

# Use:
output=$(node scripts/benchmark.js 2>&1)
exit_code=$?
if [ "$exit_code" -ne 0 ]; then
  # record "error: $output" and continue
  :
fi
```

This applies to all four benchmark invocations (lines 48, 59, 70, 81).

Fixed in commit a562b52. All four benchmark invocations now use output=$(... 2>&1) plus an exit_code check, so stderr is captured and recorded as "error: $output" rather than silently discarded.
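A minimal sketch of the capture pattern described in this thread, wrapped in a helper so that one failing suite does not stop the others. The function name `run_suite` and the one-line output format are illustrative assumptions, not taken from the skill file.

```shell
# run_suite NAME CMD [ARGS...]   (hypothetical helper)
# Runs CMD, captures stdout and stderr together, and records the outcome.
run_suite() {
  local name="$1"
  shift
  local output exit_code
  # Declaring the variables first keeps $? tied to the command, not to `local`.
  output=$("$@" 2>&1)
  exit_code=$?
  if [ "$exit_code" -ne 0 ]; then
    # The message is preserved because stderr was captured, not discarded.
    printf '%s: error: %s\n' "$name" "$output"
  else
    printf '%s: ok\n' "$name"
  fi
}

# Example wiring (assumes the script paths named in the review):
#   run_suite incremental node scripts/incremental-benchmark.js
#   run_suite query       node scripts/query-benchmark.js
```

The helper always returns zero, so a caller looping over suites continues after a failure and can inspect the recorded line afterwards.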

Fixed in ce5d811 — added git diff --cached --quiet || guard before git commit in Phase 5, so the commit is skipped when the baseline values are identical to the previously committed version.

greptile-apps · 2026-03-21T11:11:41Z

.claude/skills/bench-check/SKILL.md

+   - **Improvement**: metric decreased by more than `THRESHOLD`% (time) or increased (quality)
+   - **Stable**: within threshold
+
+> **Direction awareness:** For latency metrics (ms), higher = worse. For recall/quality metrics, higher = better. For count metrics (nodes, edges), changes are informational only — not regressions.


Division-by-zero when baseline metric is 0

delta_pct = ((current - baseline) / baseline) * 100

If baseline is 0 (e.g. graph.edges on the very first build, or embedding.recall.hit1 on a cold model), this formula produces NaN or Infinity. The skill should guard against a zero baseline before computing the percentage:

```
if baseline == 0:
    mark as "N/A — baseline was zero"
else:
    delta_pct = ((current - baseline) / baseline) * 100
```

Fixed in commit a562b52. Phase 3 now guards before dividing: when baseline == 0, the delta is marked "N/A — baseline was zero" and the metric is treated as informational only (no regression/improvement classification).

The division-by-zero guard was already added in a562b52 (Round 2). The current Phase 3 text at step 2 reads: "Guard against division-by-zero: if baseline == 0, mark the delta as N/A and treat the metric as informational only (not a regression or improvement)." No further change needed.
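The guard and the direction-awareness note can be sketched together as two small shell helpers. The names `delta_pct` and `classify`, the metric kinds `latency`/`quality`/`count`, and the symmetric use of the threshold for improvements are assumptions made for illustration; the skill file may word these differently.

```shell
# delta_pct BASELINE CURRENT   (hypothetical helper)
# Prints the percentage change, or "N/A" when the baseline is zero.
delta_pct() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    if (b == 0) { print "N/A"; exit }   # guard: never divide by a zero baseline
    printf "%.2f\n", ((c - b) / b) * 100
  }'
}

# classify KIND DELTA THRESHOLD   (hypothetical helper)
# KIND is latency (higher is worse), quality (higher is better), or count.
classify() {
  awk -v k="$1" -v d="$2" -v t="$3" 'BEGIN {
    if (d == "N/A" || k == "count") { print "informational"; exit }
    if (k == "quality") d = -d          # flip sign so "positive" always means worse
    if (d > t)       print "regression"
    else if (d < -t) print "improvement"
    else             print "stable"
  }'
}

# Example wiring:
#   classify latency "$(delta_pct 100 120)" 15
```

Keeping the zero-baseline case as a distinct `N/A` value, instead of a number, means the classifier cannot mistake it for a regression or an improvement.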

…line

- Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all four benchmark invocations so error messages are captured and recorded
- Add division-by-zero guard in Phase 3: when baseline == 0, mark delta as "N/A — baseline was zero" (informational only, not a regression)
- Add git add + git commit step in Phase 5 so the baseline file is actually committed after each save, matching the documented rule

- After reverting package.json + package-lock.json on --fix test failure, also run `npm ci` to resync node_modules/ with the restored lock file; without this the manifest is reverted but installed packages are not
- Add explanatory comment on @anthropic-ai/tokenizer skip-list entry clarifying it is a peer dependency of @anthropic-ai/sdk and may be required at runtime without an explicit import in our code

…erion

- Phase 5 (Update Codegraph): add source-repo guard that skips the self-update logic when running inside the codegraph source repo; comparing the dev version to the published release and running npm install is a no-op since codegraph is not one of its own deps
- Phase 1b stale-worktree criterion: replace "created more than 7 days ago" (not determinable via git worktree list) with "last commit on the branch is more than 7 days old AND branch has no commits ahead of origin/main", using `git log -1 --format=%ci <branch>`

…at/maintenance-skills

carlos-alm · 2026-03-21T11:27:18Z

Addressing the 5 items from the "Comments Outside Diff" section of the Greptile review:

1. deps-audit — incomplete revert (line 388–392): Fixed in commit 4fc994d. The Phase 7 revert path now runs npm ci after git checkout -- package.json package-lock.json to resync node_modules/ with the restored lock file. The Rules section was also updated to reflect this.

2. housekeep — Phase 5 self-update incoherent in source repo (line 572–607): Fixed in commit 89aef6b. Added a source-repo guard at the top of Phase 5: when running inside the codegraph source repo (confirmed by pre-flight), the phase is skipped entirely and prints Codegraph: skipped (running inside source repo — update via git pull / branch sync instead).

3. housekeep — stale-worktree creation timestamp not available (line 445–451): Fixed in commit 89aef6b. Replaced "created more than 7 days ago with no commits since" with a criterion that is actually determinable: the branch's last commit is more than 7 days old (via git log -1 --format=%ci <branch>) AND the branch has no commits ahead of origin/main. The note that git worktree list does not expose creation timestamps is included inline.

4. bench-check — baseline declared committed but no git add/commit (line 226–227): Fixed in commit a562b52. Phase 5 now includes explicit git add and git commit steps after writing baseline.json and history.ndjson. The corresponding Rule was updated to say "Phase 5 always commits it".

5. deps-audit — @anthropic-ai/tokenizer undocumented (line 306–307): Fixed in commit 4fc994d. The skip-list entry now explains it is a peer dependency of @anthropic-ai/sdk that may be required at runtime without an explicit import, and notes to verify against package.json before removing. (Confirmed: the package is not currently in package.json, so this entry acts as a conservative safety note.)
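The revised stale-worktree criterion from item 3 can be sketched as a pure predicate. The helper name `is_stale` is hypothetical, and the sketch swaps `%ci` (ISO date) for `%ct` (Unix timestamp) so the age check is plain integer arithmetic; the skill file itself is described as using `%ci`.

```shell
# is_stale LAST_COMMIT_EPOCH AHEAD_COUNT [NOW_EPOCH]   (hypothetical helper)
# Succeeds only when the branch tip is older than 7 days AND the branch
# has no commits ahead of origin/main.
is_stale() {
  local last_commit_epoch="$1" ahead="$2"
  local now="${3:-$(date +%s)}"
  local max_age=$((7 * 24 * 60 * 60))
  [ $((now - last_commit_epoch)) -gt "$max_age" ] && [ "$ahead" -eq 0 ]
}

# Example wiring inside a repository (assumed, not from the skill file):
#   last=$(git log -1 --format=%ct "$branch")
#   ahead=$(git rev-list --count "origin/main..$branch")
#   if is_stale "$last" "$ahead"; then echo "$branch looks stale"; fi
```

Passing the current time as an optional third argument keeps the predicate deterministic when it is exercised outside a live repository.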

carlos-alm · 2026-03-21T11:27:26Z

@greptileai

greptile-apps · 2026-03-21T11:31:24Z

.claude/skills/bench-check/SKILL.md

+## Phase 6 — Report
+
+Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`:
+
+```markdown


Auto-commit fails silently when baseline values are unchanged

If the benchmark results are numerically identical to the stored baseline (e.g. a second back-to-back run on the same commit), generated/bench-check/baseline.json and history.ndjson will still be rewritten with a new timestamp. However, git commit will exit non-zero with "nothing to commit" if git detects the content hash is the same — causing the Phase 5 step to produce a spurious error.

Guard with --allow-empty (not ideal) or, better, only run the commit when the files actually changed:

```bash
git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
if ! git diff --cached --quiet; then
  git commit -m "chore: update bench-check baseline (<gitRef>)"
fi
```

This keeps Phase 5 idempotent and avoids a noisy error on successive clean runs.

Fixed in ce5d811 — added git diff --cached --quiet || guard before git commit in Phase 5. When the baseline values are identical to the previously committed version, git add stages nothing and the commit is now skipped instead of erroring out.

carlos-alm · 2026-03-22T01:36:01Z

Addressing Round 3 outside-diff feedback (ce5d811):

test-health/SKILL.md: Changed both the flaky-detection loop and the coverage run from 2>/dev/null to 2>&1, so stderr is captured in the output variable for diagnosis instead of being discarded.

housekeep/SKILL.md: Changed Phase 4a from git branch --merged main to git branch --merged origin/main so it uses the freshly-fetched remote ref from Phase 3 instead of the potentially stale local main.

deps-audit/SKILL.md: Replaced the git checkout -- package.json package-lock.json revert pattern with a stash-based approach: save the original manifests before running auto-fix, then restore from the saved state on failure. Updated both Phase 7 instructions and the corresponding rule in the Rules section.

carlos-alm · 2026-03-22T01:36:11Z

@greptileai

greptile-apps · 2026-03-22T01:40:12Z

.claude/skills/deps-audit/SKILL.md

+**Before running any auto-fix** (in Phase 1/2), save the original manifests so pre-existing unstaged changes are preserved:
+```bash
+git stash push -m "deps-audit-backup" -- package.json package-lock.json
+```
+
+Summarize all changes made:
+1. List each package updated/fixed
+2. Run `npm test` to verify nothing broke
+3. If tests pass: drop the saved state (`git stash drop`)
+4. If tests fail:
+   - Restore the saved manifests: `git stash pop`
+   - Restore `node_modules/` to match the reverted lock file: `npm ci`
+   - Report what failed


Stash runs too late — backup is taken after files are already modified

The git stash push is described inside Phase 7, which executes after Phase 1 (npm audit fix) and Phase 2 (npm update) have already modified package.json and package-lock.json. An agent following the phases sequentially will reach Phase 7 with the manifests already changed, making the stash a backup of the modified state rather than the original state. If tests then fail and git stash pop is invoked, it restores the already-modified files — the recovery is a no-op.

The stash command must run inside Phase 0 (immediately after AUTO_FIX=true is detected), before any npm commands touch the manifests. For example, add a step to Phase 0:

4. Parse `$ARGUMENTS` — set `AUTO_FIX=true` if `--fix` is present
5. **If `AUTO_FIX` is set:** stash the original manifests now, before any modifications:
   ```bash
   git stash push -m "deps-audit-backup" -- package.json package-lock.json
   ```

Then remove the stash instruction from Phase 7 (keeping only the summary/rollback logic there).

Fixed — moved the stash command to Phase 0, step 5, so manifests are backed up before any npm commands run. Removed the stash from Phase 7.

greptile-apps · 2026-03-22T01:40:13Z

.claude/skills/housekeep/SKILL.md

+For stale worktrees with merged branches:
+- List them and ask the user for confirmation before removing
+- If confirmed (or `--full` without `--dry-run`):
+  ```bash
+  git worktree remove <path>
+  git branch -d <branch>  # only if fully merged
+  ```
+
+**If `DRY_RUN`:** Just list what would be removed, don't do it.
+
+> **Never force-remove** a worktree with uncommitted changes. List it as "has uncommitted work" and skip.


Default --full silently removes stale worktrees without user confirmation

The cleanup condition at line 57 reads: "If confirmed (or --full without --dry-run)". Because --full is the default mode (no arguments = full cleanup), every default run of /housekeep will remove stale worktrees without asking. The project requires every Claude Code session to start with /worktree (CLAUDE.md: "Every session must start with /worktree to get an isolated copy of the repo"), meaning there may be active worktrees belonging to concurrent sessions that the "7-day old last commit" heuristic could incorrectly classify as stale — especially when a session hasn't committed anything yet but the branch tip is old.

The "never force-remove a worktree with uncommitted changes" guard (line 65) only catches worktrees with locally-modified files; it does not protect sessions that branched off main, haven't committed anything yet, but have no staged or unstaged changes either.

The confirmation prompt should always be required for worktree removal (regardless of --full), since this is a potentially destructive, irreversible action. Reserve auto-removal only for explicitly prunable worktrees (missing directory — git worktree prune), which are already guaranteed safe:

```
For stale worktrees with merged branches:
- **Always** list them and ask the user for confirmation before removing (even with `--full` — worktree deletion is irreversible)
- If confirmed (and not `--dry-run`): ...
```

Fixed — worktree removal now always requires user confirmation, regardless of --full mode. Only git worktree prune (missing-directory cleanup) remains automatic.

greptile-apps · 2026-03-22T01:40:14Z

.claude/skills/bench-check/SKILL.md

+{
+  "savedAt": "<ISO 8601>",
+  "version": "<package version>",
+  "gitRef": "<HEAD short SHA>",


Hardcoded threshold in saved baseline.json

Line 180 writes "threshold": 15 literally into baseline.json, regardless of the --threshold N argument parsed in Phase 0. If the user runs /bench-check --threshold 20, the stored baseline will record 15, making the artifact misleading when inspected later. While the skill always re-parses the threshold from $ARGUMENTS rather than reading it from the file, an operator debugging a regression from the JSON file will see the wrong value.

Replace the literal with the parsed variable:

Suggested change

"gitRef": "<HEAD short SHA>",

"threshold": <THRESHOLD>,

Fixed — baseline.json now writes "threshold": $THRESHOLD (the parsed value) instead of the hardcoded 15.
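A minimal sketch of how the parsed value could flow from the arguments into the saved file. The helper name `parse_threshold` is hypothetical, and the default of 15 is an assumption inferred from the literal that the review flagged.

```shell
# parse_threshold [ARGS...]   (hypothetical helper)
# Prints the value following --threshold, or the assumed default of 15.
parse_threshold() {
  local threshold=15
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --threshold)
        # Only consume a value if one is actually present.
        if [ "$#" -ge 2 ]; then
          threshold="$2"
          shift
        fi
        shift
        ;;
      *)
        shift
        ;;
    esac
  done
  printf '%s\n' "$threshold"
}

# Example wiring: write the parsed value, not a literal, into the JSON field.
#   THRESHOLD=$(parse_threshold $ARGUMENTS)
#   printf '  "threshold": %s,\n' "$THRESHOLD"
```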

greptile-apps · 2026-03-22T01:40:15Z

.claude/skills/bench-check/SKILL.md

+### First run (no baseline)
+- Print: `BENCH-CHECK — initial baseline saved`
+- Save current results as baseline
+


--compare-only doesn't guard the "First run" baseline save

Phase 4 has two result paths:

"No regressions found" correctly guards with "If not COMPARE_ONLY: update baseline."

"First run (no baseline)" has no such guard — it unconditionally saves a baseline.

An agent running /bench-check --compare-only against a repo with no prior baseline will fall through to the "First run" path and save a baseline, contradicting the --compare-only semantics ("compare against baseline without updating it").

Add the same guard to the first-run path:

```
### First run (no baseline)
- If `COMPARE_ONLY`: print a warning that no baseline exists and exit
- Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baseline
```

Fixed — the first-run path now checks COMPARE_ONLY: if set, prints a warning that no baseline exists and exits without saving. Otherwise proceeds to save the initial baseline as before.
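The Phase 4 paths discussed in this review can be sketched as one decision function that checks whether a comparison happened before it can report a pass. The function name `verdict`, its argument order, and the short labels it prints are illustrative assumptions; the ABORTED case (empty metrics) is not covered here.

```shell
# verdict BASELINE_EXISTS SAVE_ONLY COMPARE_ONLY REGRESSION_COUNT   (hypothetical)
# The first three arguments are the strings "true" or "false".
verdict() {
  local baseline_exists="$1" save_only="$2" compare_only="$3" regressions="$4"
  if [ "$save_only" = "true" ] || [ "$baseline_exists" != "true" ]; then
    # No comparison ran, so "no regressions" must not be reported here.
    if [ "$baseline_exists" != "true" ] && [ "$compare_only" = "true" ]; then
      echo "NO BASELINE"               # warn and exit without saving
    elif [ "$baseline_exists" = "true" ]; then
      echo "BASELINE OVERWRITTEN"
    else
      echo "INITIAL BASELINE SAVED"
    fi
  elif [ "$regressions" -gt 0 ]; then
    echo "FAILED"                      # baseline is left untouched
  else
    echo "PASSED"                      # update baseline unless COMPARE_ONLY
  fi
}
```

Testing the "comparison was skipped" condition first is one way to make the PASSED branch reachable only when a baseline existed and was actually compared against.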

greptile-apps · 2026-03-22T01:40:16Z

.claude/skills/test-health/SKILL.md

+```bash
+for i in $(seq 1 $FLAKY_RUNS); do
+  npx vitest run --reporter=json 2>&1
+done
+```
+
+For each run, parse the JSON reporter output to get per-test results.


Flaky detection loop discards output — nothing to parse

The Phase 1 loop runs:

```bash
for i in $(seq 1 $FLAKY_RUNS); do
  npx vitest run --reporter=json 2>&1
done
```

The JSON reporter output streams to stdout and is not captured in any variable, so there is no in-memory data for the "parse the JSON reporter output to get per-test results" step immediately below. Each iteration's output will be mixed into the session's terminal output and lost before comparison is possible.

Each run's output must be captured and stored for cross-run comparison. For example:

```bash
mkdir -p /tmp/test-health-runs
for i in $(seq 1 $FLAKY_RUNS); do
  npx vitest run --reporter=json > /tmp/test-health-runs/run-$i.json 2>/tmp/test-health-runs/run-$i.err
done
```

Then read /tmp/test-health-runs/run-*.json to build the per-test pass/fail matrix.

Fixed — the flaky-detection loop now redirects each run's output to /tmp/test-health-runs/run-$i.json (stdout) and /tmp/test-health-runs/run-$i.err (stderr). The parsing step reads from these per-run files for cross-run comparison.
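Once each run is stored separately, the cross-run comparison reduces to finding tests that both passed and failed. The sketch below assumes each run has already been reduced to a tab-separated file of `status<TAB>test name` lines; that intermediate format and the helper name `find_flaky` are hypothetical, and extracting those lines from the JSON reporter output is left out.

```shell
# find_flaky FILE...   (hypothetical helper)
# Each FILE holds one "status<TAB>test name" line per test for a single run.
# Prints, sorted, every test name seen as both "passed" and "failed".
find_flaky() {
  awk -F'\t' '
    $1 == "passed" { pass[$2]++ }
    $1 == "failed" { fail[$2]++ }
    END {
      for (t in pass)
        if (t in fail) print t
    }
  ' "$@" | sort
}

# Example wiring (assumed file naming):
#   find_flaky "$RUN_DIR"/run-*.tsv
```

A test that fails in every run never appears in the `pass` table, so it is reported as a plain failure elsewhere and not as flaky.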

…ifests

…irst run

carlos-alm · 2026-03-22T05:34:18Z

@greptileai

greptile-apps · 2026-03-22T05:38:46Z

.claude/skills/deps-audit/SKILL.md

+If `AUTO_FIX` was set:
+
+Summarize all changes made:
+1. List each package updated/fixed
+2. Run `npm test` to verify nothing broke
+3. If tests pass: drop the saved state (`git stash drop`)
+4. If tests fail:
+   - Restore the saved manifests: `git stash pop`
+   - Restore `node_modules/` to match the reverted lock file: `npm ci`
+   - Report what failed


Stash pop/drop operates on wrong entry when Phase 0 stash was a no-op

git stash push -- package.json package-lock.json is a no-op when neither file has any local modifications (the common case — a fresh branch before running the skill). Git outputs "No local changes to save" and exits non-zero without creating a stash entry.

Phase 7 then unconditionally calls either git stash drop (success path) or git stash pop (failure path). With no deps-audit-backup entry on the stack:

On success: git stash drop silently drops a pre-existing, unrelated stash entry (data loss).

On failure: git stash pop restores the wrong stash, while the broken package.json/package-lock.json from the failed npm audit fix/npm update run remain on disk.

The stash guard in Phase 0 should track whether the stash was actually created:

5. **If `AUTO_FIX` is set:**
   ```bash
   git stash push -m "deps-audit-backup" -- package.json package-lock.json
   STASH_CREATED=$?
   # STASH_CREATED=0 → stash entry exists; STASH_CREATED=1 → nothing was stashed
   ```

Then in Phase 7, only call `git stash drop`/`git stash pop` when `STASH_CREATED` was `0`. When `STASH_CREATED` was `1`, the manifests were already clean and can simply be restored with `git checkout -- package.json package-lock.json` after a test failure.

Fixed — Phase 0 now captures STASH_CREATED=$? after the stash push. Phase 7 branches on this: when STASH_CREATED=0 (stash was created), uses stash drop/pop as before; when STASH_CREATED=1 (no-op), uses git checkout -- package.json package-lock.json instead. The Rules section was updated to match.
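The review summary for this PR notes that the exit-code check was later replaced with a STASH_REF presence check, because `git stash push` can return 0 even when nothing was stashed. A minimal sketch of that idea, looking the entry up by its message; the helper name `find_stash_ref` is hypothetical and the exact lookup used in the skill file is not shown in this thread.

```shell
# find_stash_ref LABEL   (hypothetical helper)
# Reads `git stash list` output on stdin and prints the ref (e.g. stash@{1})
# of the first entry whose message contains LABEL. Prints nothing if absent.
find_stash_ref() {
  awk -v label="$1" -F': ' 'index($0, label) { print $1; exit }'
}

# Example wiring (assumed):
#   git stash push -m "deps-audit-backup" -- package.json package-lock.json
#   STASH_REF=$(git stash list | find_stash_ref "deps-audit-backup")
#   if [ -n "$STASH_REF" ]; then git stash pop "$STASH_REF"; fi
```

Popping or dropping by an explicit ref means an unrelated entry sitting at the top of the stack is never touched.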

greptile-apps · 2026-03-22T05:38:47Z

.claude/skills/test-health/SKILL.md

+mkdir -p /tmp/test-health-runs
+for i in $(seq 1 $FLAKY_RUNS); do
+  npx vitest run --reporter=json > /tmp/test-health-runs/run-$i.json 2>/tmp/test-health-runs/run-$i.err
+done


Hardcoded /tmp/test-health-runs/ path corrupts results under concurrent sessions

CLAUDE.md explicitly states that "Multiple Claude Code instances run concurrently in this repo." If two sessions invoke /test-health simultaneously, both write to the same /tmp/test-health-runs/run-$i.json files. Run files from session A get overwritten by session B mid-loop, and the cross-run comparison in Phase 1 silently mixes results from two independent executions — producing false flaky-test results.

Use an isolated, unique working directory per invocation:

    RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX)
    for i in $(seq 1 $FLAKY_RUNS); do
      npx vitest run --reporter=json > "$RUN_DIR/run-$i.json" 2>"$RUN_DIR/run-$i.err"
    done

Using mktemp -d guarantees each invocation gets its own directory, preventing cross-session pollution. Clean up with rm -rf "$RUN_DIR" at the end of Phase 1 (or Phase 5 after the report is written).

Fixed — Replaced hardcoded /tmp/test-health-runs/ with RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX). All references now use $RUN_DIR. Added rm -rf "$RUN_DIR" cleanup after analysis is complete.
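
The isolation-plus-cleanup pattern discussed above can be sketched as two small helpers. This is an illustration under assumptions, not the skill's actual text: `make_run_dir` and `run_repeated` are hypothetical names, and the test command is passed in as arguments so the loop is not tied to vitest.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a private directory for one /test-health invocation.
make_run_dir() {
  mktemp -d "${TMPDIR:-/tmp}/test-health-XXXXXX"
}

# Run a command <count> times, capturing stdout and stderr of each run separately.
# Usage: run_repeated <run_dir> <count> <command...>
run_repeated() {
  local dir=$1 count=$2 i
  shift 2
  for i in $(seq 1 "$count"); do
    # A failing run is expected data for flaky detection, so do not abort on it
    "$@" > "$dir/run-$i.json" 2> "$dir/run-$i.err" || true
  done
}

# Intended use (commented out so that sourcing this file has no side effects):
#   RUN_DIR=$(make_run_dir)
#   trap 'rm -rf "$RUN_DIR"' EXIT
#   run_repeated "$RUN_DIR" "$FLAKY_RUNS" npx vitest run --reporter=json
```

Registering the cleanup with `trap ... EXIT` removes the directory even when the analysis stops early, which a trailing `rm -rf` alone would not do.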

greptile-apps · 2026-03-22T05:38:48Z

.claude/skills/bench-check/SKILL.md

+## Phase 4 — Verdict
+
+Based on comparison results:
+
+### No regressions found
+- Print: `BENCH-CHECK PASSED — no regressions beyond {THRESHOLD}% threshold`
+- If not `COMPARE_ONLY`: update baseline with current results
+
+### Regressions found
+- Print: `BENCH-CHECK FAILED — {N} regressions detected`
+- List each regression with metric name, baseline value, current value, delta %
+- Do NOT update the baseline
+- Suggest investigation:
+  - `git log --oneline <baseline-ref>..HEAD` to find what changed
+  - `codegraph diff-impact <baseline-ref> -T` to find structural changes
+  - Re-run individual benchmarks to confirm (not flaky)
+
+### First run (no baseline)
+- If `COMPARE_ONLY` is set: print a warning that no baseline exists and exit without saving
+- Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baseline


Missing verdict path when --save-baseline is passed and a baseline already exists

Phase 3 is correctly skipped when SAVE_ONLY=true, but Phase 4 only defines three verdict paths:

"No regressions found" — only reachable after a comparison run

"Regressions found" — only reachable after a comparison run

"First run (no baseline)" — only applies when no baseline file was found

When --save-baseline is passed against a repo that already has baseline.json, none of these paths apply. The agent has no clear verdict to follow and may fall through to an undefined state (e.g. silently skipping Phase 5 and never saving the baseline, or incorrectly entering the "First run" path even though a baseline was present).

Add a fourth explicit path:

### Save-baseline run (`--save-baseline`)
- Print: `BENCH-CHECK — baseline updated (previous baseline overwritten)`
- Proceed directly to Phase 5 to save the current results as the new baseline

This ensures the skill has a deterministic branch for every combination of flags × baseline-file-present.

Fixed — Added a 4th explicit verdict path: "Save-baseline with existing baseline (--save-baseline)" which prints BENCH-CHECK — baseline overwritten (previous: <old gitRef>, new: <new gitRef>) and saves current results as the new baseline.
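
To check that every combination of flags and baseline state lands on exactly one verdict, the paths can be written as a single decision function. A sketch only: `bench_verdict` and the short labels it prints are hypothetical, and the precedence chosen here (missing baseline first, then `--save-baseline`, then the comparison result) is an assumption about how the skill orders its checks.

```bash
#!/usr/bin/env bash
# Hypothetical decision function for the Phase 4 verdict.
# Usage: bench_verdict <save_only> <compare_only> <baseline_exists> <regression_count>
bench_verdict() {
  local save_only=$1 compare_only=$2 baseline_exists=$3 regressions=$4
  if [ "$baseline_exists" != "true" ]; then
    if [ "$compare_only" = "true" ]; then
      echo "NO_BASELINE_WARNING"      # warn and exit without saving
    else
      echo "INITIAL_BASELINE_SAVED"   # first run
    fi
  elif [ "$save_only" = "true" ]; then
    echo "BASELINE_OVERWRITTEN"       # the fourth path added in this review round
  elif [ "$regressions" -gt 0 ]; then
    echo "FAILED"
  else
    echo "PASSED"
  fi
}
```

Because the branches form one `if`/`elif` chain, no input can fall through without a verdict.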

greptile-apps · 2026-03-22T05:38:49Z

.claude/skills/bench-check/SKILL.md

+For each metric in the current run:
+
+1. Look up the same metric in the baseline
+2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as **informational only** (not a regression or improvement)


Corrupted em-dash character in "N/A" string

The string "N/A ï¿½ baseline was zero" contains a corrupted em-dash (the UTF-8 replacement character U+FFFD). This garbled text will appear verbatim in the regression table written to BENCH_REPORT_*.md and in the baseline.json marker, making the report hard to read.

Suggested change

2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as **informational only** (not a regression or improvement)

2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A — baseline was zero"` and treat the metric as **informational only** (not a regression or improvement)

Fixed — Replaced the corrupted character with a proper em-dash. The string now reads "N/A — baseline was zero".
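
For reference, a minimal sketch of the zero-baseline guard itself, assuming metrics are plain numbers and the delta is a percentage relative to the baseline. `percent_delta` is a hypothetical helper name, and the short `N/A` marker stands in for the full string quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage change from baseline to current.
# Usage: percent_delta <baseline> <current>
percent_delta() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    # Guard first: a zero baseline has no meaningful relative change
    if (b + 0 == 0) { print "N/A"; exit }
    printf "%.1f\n", (c - b) / b * 100
  }'
}
```

For example, `percent_delta 200 150` prints `-25.0`, while `percent_delta 0 5` prints `N/A` and should be treated as informational only.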

greptile-apps · 2026-03-22T05:38:50Z

.claude/skills/bench-check/SKILL.md

+```bash
+git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
+git diff --cached --quiet || git commit -m "chore: update bench-check baseline (<gitRef>)"
+```


Commit command deviates from project's "specific file paths" convention

CLAUDE.md requires: "Commit with specific file paths: `git commit <files> -m "msg"`". The current Phase 5 does `git add <files>` followed by a bare `git commit -m "..."`, which stages the files and then commits everything in the index, potentially picking up unrelated staged changes from an ongoing session.

Suggested change

```bash
git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
git diff --cached --quiet || git commit -m "chore: update bench-check baseline (<gitRef>)"
```

After writing both files, commit the baseline so it is a shared reference point:

```bash
git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
git diff --cached --quiet || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"
```

Fixed — Replaced git add + git commit -m with explicit file paths per project convention: git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "...".

When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists.

…ent corruption Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis.

…plicit commit paths Add 4th verdict path for --save-baseline when baseline already exists. Replace corrupted em-dash character in N/A string. Change commit command to use explicit file paths per project convention.

carlos-alm · 2026-03-22T05:56:29Z

@greptileai

…ress Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file paths. Updated to reflect actual state: 32 of 269 source modules migrated (~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about 14 stale .js counterparts of already-migrated .ts files needing deletion.

greptile-apps · 2026-03-22T06:05:48Z

.claude/skills/deps-audit/SKILL.md

+3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`)
+   If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists)


Success + STASH_CREATED=1 incorrectly reverts npm changes

When STASH_CREATED=1, it means the files had no pre-existing modifications before the skill ran — the stash push was a no-op. The npm commands then modified package.json/package-lock.json, and tests passed. In this success case the npm changes should be kept, but step 3 runs git checkout -- package.json package-lock.json, which reverts the files to HEAD — silently undoing every fix made by npm audit fix and npm update. This makes --fix a complete no-op on a clean branch.

The git checkout here was correct for the failure path (step 5), not the success path. The success path for STASH_CREATED=1 should be a no-op:

Suggested change

3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`)

If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists)

3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`)

If tests pass and `STASH_CREATED` is `1`: nothing to do — npm changes are already on disk and there is no stash entry to clean up

Fixed — the success path (STASH_CREATED=1, tests pass) no longer runs git checkout to discard manifest changes. The npm audit fix changes are good when tests pass, so no action is needed. Only the failure path reverts.

greptile-apps · 2026-03-22T06:05:49Z

.claude/skills/bench-check/SKILL.md

+
+After writing both files, commit the baseline so it is a shared reference point:
+```bash
+git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"


git diff --quiet misses new files — baseline never committed on first run

git diff [<paths>] (without --cached or a tree-ish) compares the working tree to the index. On the very first run, baseline.json and history.ndjson are freshly created and are untracked — they have no index entry at all. For untracked files, git diff produces no output and exits 0, so the || short-circuits and git commit is never reached. The baseline is written to disk but silently never committed to git, contradicting the rule "The baseline file is committed to git."

Use git diff HEAD to compare the working tree against the last commit, which correctly shows new/untracked files as additions:

Suggested change

git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"

git diff HEAD --quiet -- generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"

This exits non-zero for both new files (first run) and modified files (subsequent runs), so the commit fires whenever there is genuinely new content.

Fixed — replaced git diff --quiet with git add followed by git diff --cached --quiet. This ensures newly created files (first run) are staged and detected, so the baseline commit is not silently skipped.
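
A sketch of the stage-then-check approach, scoped to explicit paths so unrelated staged changes are neither tested nor committed. `commit_if_changed` is a hypothetical helper name. Staging first matters because untracked files are generally invisible to `git diff` in any of its forms, `git diff HEAD` included, until they have an index entry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: commit the given paths only when they differ from HEAD.
# Usage: commit_if_changed <message> <path...>
# Returns 1 when there was nothing to commit.
commit_if_changed() {
  local msg=$1
  shift
  # Stage first so brand-new (untracked) files get an index entry
  git add -- "$@"
  if git diff --cached --quiet -- "$@"; then
    return 1   # index matches HEAD for these paths
  fi
  # Naming the paths keeps other staged changes out of this commit
  git commit -q -m "$msg" -- "$@"
}
```

On a first run this commits the freshly created files; on a run that changes nothing it returns 1 without creating an empty commit.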

greptile-apps · 2026-03-22T06:05:50Z

.claude/skills/housekeep/SKILL.md

+**If `DRY_RUN`:** List branches that would be deleted.
+
+**Otherwise:** Delete merged branches:
+```bash
+git branch -d <branch>  # safe delete, only if fully merged
+```
+
+> **Never use `git branch -D`** (force delete). If `-d` fails, the branch has unmerged work — skip it.


Branch deletion silently skips user confirmation unlike worktree removal

Phase 1c (worktree removal) always requires user confirmation, even for clearly stale worktrees. Phase 4d ("Otherwise: Delete merged branches") runs git branch -d immediately with no equivalent confirmation step. While git branch -d is safe (refuses to delete unmerged work), users may still want to review the list of branches about to be pruned — especially in a repo where multiple concurrent sessions may have feature branches.

Consider adding a brief "List branches to be deleted and ask for confirmation" step before the git branch -d loop, consistent with Phase 1c's pattern.


Fixed — branch deletion now requires user confirmation before each delete, consistent with worktree removal in Phase 1c.
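
A rough sketch of per-branch confirmation, assuming a plain shell prompt stands in for however the skill actually asks the user. `confirm_and_delete_branches` is a hypothetical name; only the safe `git branch -d` is used, as the skill requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ask before deleting each merged branch.
# Usage: confirm_and_delete_branches <branch...>   (answers are read from stdin)
confirm_and_delete_branches() {
  local branch answer
  for branch in "$@"; do
    printf 'Delete merged branch %s? [y/N] ' "$branch"
    read -r answer || answer=""
    case "$answer" in
      y|Y)
        # -d refuses to delete unmerged work; never escalate to -D
        git branch -d "$branch" || echo "Skipped $branch (not fully merged)"
        ;;
      *)
        echo "Kept $branch"
        ;;
    esac
  done
}
```

Anything other than an explicit `y` keeps the branch, so an empty answer or closed stdin never deletes anything.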

When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix.

greptile-apps · 2026-03-24T01:01:58Z

.claude/skills/bench-check/SKILL.md


 ## Phase 6 — Report

+**Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed.** That case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved.


ABORTED + --compare-only + no baseline silently produces no report

When all of the following are true simultaneously:

COMPARE_ONLY=true

No baseline exists (first run)

All benchmark suites fail or timeout (ABORTED)

The flow is:

Phase 4 ABORTED pre-condition fires: "Stop here and skip to Phase 6"

Phase 6 skip guard: "Skip this phase if COMPARE_ONLY was set and no baseline existed"

Because the ABORTED pre-condition is orthogonal to the baseline existence check, the Phase 6 skip guard fires on the ABORTED result too. The agent writes no report and prints no ABORTED message — the run silently completes with no output. This is particularly confusing in CI contexts where a broken benchmarking environment would go undetected.

The Phase 6 skip guard should be narrowed to exclude the ABORTED case:

Suggested change

**Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed.** That case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved.

**Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed, AND the ABORTED pre-condition was not triggered.** That early-exit case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved. When ABORTED, always write the ABORTED report regardless of other flags.

Fixed — added explicit ABORTED skip guard to Phase 5, and narrowed the Phase 6 skip condition to exclude the ABORTED case. When ABORTED, Phase 5 is now skipped (no empty baseline written) and Phase 6 always writes the ABORTED report regardless of COMPARE_ONLY or baseline state.

greptile-apps · 2026-03-24T01:01:59Z

.claude/skills/deps-audit/SKILL.md

+   - If the pop applies cleanly:
+     a. Run `npm install` to re-sync `node_modules/` with the merged manifest.
+     b. Re-run `npm test` to confirm nothing broke with the merged dependency state.
+     c. If tests still pass: confirm the project is consistent.
+     d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.


No recovery path when tests fail after clean-pop + npm install

When the stash pops cleanly and npm install re-syncs node_modules/, the skill re-runs npm test. If those tests fail (step d), the skill warns the user that pre-existing changes conflict with the audit fixes — but the stash has already been consumed by git stash pop. There is no way to return to either prior state:

The stash entry is gone, so the pre-existing manifest state cannot be automatically restored.

npm audit fix/npm update changes are already merged with the pre-existing changes in the working tree.

Without a recovery path, the user is left with a mixed, broken state and must manually reconstruct which changes to keep. Add explicit recovery guidance:

d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.
   Recovery options:
   - To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci`
   - To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci`
   - To keep only the pre-existing changes and discard the audit fixes: re-run `/deps-audit` without `--fix`

Fixed — step 169d now lists three explicit recovery options: undo all changes (git checkout + npm ci), keep only audit fixes (manual edit + npm ci), or keep only pre-existing changes (re-run without --fix).

…Script (Phase 5.4) (#579)

* feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep
  Four recurring maintenance routines as Claude Code skills:
  - /deps-audit: vulnerability scanning, staleness, unused deps, license checks
  - /bench-check: benchmark regression detection against saved baselines
  - /test-health: flaky test detection, dead tests, coverage gap analysis
  - /housekeep: clean worktrees, dirt files, sync main, prune branches
* fix(bench-check): capture stderr, guard division-by-zero, commit baseline
  - Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all four benchmark invocations so error messages are captured and recorded
  - Add division-by-zero guard in Phase 3: when baseline == 0, mark delta as "N/A — baseline was zero" (informational only, not a regression)
  - Add git add + git commit step in Phase 5 so the baseline file is actually committed after each save, matching the documented rule
* fix(deps-audit): run npm ci after revert, document tokenizer skip reason
  - After reverting package.json + package-lock.json on --fix test failure, also run `npm ci` to resync node_modules/ with the restored lock file; without this the manifest is reverted but installed packages are not
  - Add explanatory comment on @anthropic-ai/tokenizer skip-list entry clarifying it is a peer dependency of @anthropic-ai/sdk and may be required at runtime without an explicit import in our code
* fix(housekeep): guard Phase 5 in source repo, fix stale-worktree criterion
  - Phase 5 (Update Codegraph): add source-repo guard that skips the self-update logic when running inside the codegraph source repo; comparing the dev version to the published release and running npm install is a no-op since codegraph is not one of its own deps
  - Phase 1b stale-worktree criterion: replace "created more than 7 days ago" (not determinable via git worktree list) with "last commit on the branch is more than 7 days old AND branch has no commits ahead of origin/main", using `git log -1 --format=%ci <branch>`
* fix: address Round 3 Greptile review feedback
* fix: move deps-audit stash to Phase 0, before npm commands modify manifests
* fix: capture flaky-detection loop output to per-run files for comparison
* fix: always require confirmation for stale worktree removal
* fix: use parsed threshold in baseline.json, guard --compare-only on first run
* fix(deps-audit): track stash creation to avoid operating on wrong entry
  When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists.
* fix(test-health): use mktemp for flaky-run directory to avoid concurrent corruption
  Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis.
* fix(bench-check): add save-baseline verdict path, fix em-dash, use explicit commit paths
  Add 4th verdict path for --save-baseline when baseline already exists. Replace corrupted em-dash character in N/A string. Change commit command to use explicit file paths per project convention.
* docs(roadmap): update Phase 5 TypeScript migration with accurate progress
  Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file paths. Updated to reflect actual state: 32 of 269 source modules migrated (~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about 14 stale .js counterparts of already-migrated .ts files needing deletion.
* fix: deps-audit success path should keep npm changes, not revert (#565)
  When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix.
* fix: bench-check use git add + diff --cached to detect new files (#565)
  git diff --quiet ignores untracked files, so on the first run when baseline.json and history.ndjson are newly created, the commit was skipped. Stage first with git add, then check with --cached.
* fix: housekeep require confirmation before branch deletion (#565)
  Branch deletion now asks for user confirmation before each delete, consistent with worktree removal in Phase 1c.
* fix: scope git diff --cached to bench-check files only (#565)
* fix: use json-summary reporter to match coverage-summary.json output (#565)
* fix: capture stash ref by name to avoid position-based targeting (#565)
* fix: remove unreachable Phase 5 subphases since source-repo guard always skips (#565)
* fix: use dynamic threshold variable in bench-check Phase 6 report template (#565)
* fix: address open review items in maintenance skills (#565)
  - bench-check: add timeout 300 wrappers to all 4 benchmark invocations with exit code 124 check for timeout detection
  - bench-check: add explicit COMPARE_ONLY guard at Phase 5 entry
  - housekeep: fix grep portability — use grep -cE instead of GNU \| syntax
  - test-health: add timeout 180 wrapper in flaky detection loop
  - test-health: fix find command -o precedence with grouping parentheses
* fix: add COVERAGE_ONLY guards to Phase 2 and Phase 4 in test-health
* fix: add regression skip guard to bench-check Phase 5, expand deps-audit search dirs
* fix: add empty-string guard for stat size check in housekeep (#565)
  When both stat variants (GNU and BSD) fail, $size is empty and the arithmetic comparison errors out. Add a [ -z "$size" ] && continue guard so the loop skips files whose size cannot be determined.
* fix: add BASELINE SAVED verdict path and clarify if/else-if in bench-check (#565)
  Phase 6: when SAVE_ONLY or first-run (no prior baseline), write a shortened report with "Verdict: BASELINE SAVED" instead of the full comparison report. Phases 1a-1d: replace ambiguous "If timeout / If non-zero" with explicit "If timeout / Else if non-zero" so the two conditions are clearly mutually exclusive.
* docs(roadmap): mark Phase 4 complete, update Phase 5 progress (5 of 7)
  Phase 4 (Resolution Accuracy) had all 6 sub-phases merged but status still said "In Progress". Phase 5 (TypeScript Migration) had 5.3-5.5 merged via PRs #553, #554, #555, #566 but was listed with stale counts. Updated both to reflect actual state: Phase 4 complete, Phase 5 at 5/7 with 76 of 283 modules migrated (~27%).
* docs(roadmap): correct Phase 5 progress — 5.3/5.4/5.5 still in progress
  Previous commit incorrectly marked 5.3-5.5 as complete. In reality 76 of 283 src files are .ts (~27%) while 207 remain .js (~73%). PRs #553, #554, #555, #566 migrated a first wave but left substantial work in each step: 4 leaf files, 39 core files, 159 orchestration files. Updated each step with accurate migrated/remaining counts.
* fix(skill): ban untracked deferrals in /review skill
  The /review skill allowed replying "acknowledged as follow-up" to reviewer comments without tracking them anywhere. These deferrals get lost — nobody revisits PR comment threads after merge. Now: if a fix is genuinely out of scope, the skill must create a GitHub issue with the follow-up label before replying. The reply must include the issue link. A matching rule in the Rules section reinforces the ban.
* feat(types): migrate db, graph algorithms/builders, and domain/queries to TypeScript (Phase 5.5)
  Migrate 19 remaining JS files to TypeScript across db/, graph/, and domain/:
  - db/: connection, migrations, query-builder, index barrel
  - graph/algorithms/leiden/: adapter, cpm, modularity, optimiser, partition, index
  - graph/algorithms/: louvain, index barrel
  - graph/builders/: dependency, structure, temporal, index barrel
  - graph/classifiers/: index barrel
  - graph/: index barrel
  - domain/: queries barrel
  Key type additions:
  - GraphAdapter, Partition, DetectClustersResult interfaces for Leiden
  - LockedDatabase type for advisory-locked DB instances
  - DependencyGraphOptions, TemporalGraphOptions for graph builders
  - Generic Statement<TRow> in vendor.d.ts for type-safe DB queries
  Also fixes pre-existing type errors in module-map.ts (untyped prepare calls) and generator.ts (null vs undefined argument).
* feat(types): migrate builder stages, search, and graph domain to TypeScript (Phase 5.4)
  Migrate 24 JS files to strict TypeScript across three domains:
  - domain/graph/builder/stages/ (9 files): all build pipeline stages
  - domain/graph/builder/incremental.ts + cycles.ts + journal.ts + change-journal.ts
  - domain/search/search/ (6 files): hybrid, semantic, keyword, filters, prepare, cli-formatter
  - domain/search/stores/ (2 files): fts5, sqlite-blob
  - domain/search/strategies/ (3 files): source, structured, text-utils
  Key type fixes: ParseChange stat type alignment (mtimeMs→mtime conversion), IncrementalStmts duck-typed interfaces for watcher compatibility, NodeWithId relaxed to Pick<NodeRow> for partial node queries, ESCAPE clause backslash fix. Passes tsc --noEmit with zero errors. No test regressions.
* chore: delete stale .js files replaced by TypeScript counterparts
  Remove cycles.js and sqlite-blob.js — their .ts replacements were committed in the previous commit.
* fix(types): annotate implicit any parameters in resolve-imports barrel handling
  The `parseFilesAuto` return type is `Map<string, any>`, causing `imp` parameters in `.filter()` and `.map()` callbacks to have implicit `any` type — failing strict TypeScript checks in CI.

…Phase 6 skip

carlos-alm · 2026-03-24T01:19:52Z

@greptileai

greptile-apps · 2026-03-24T01:23:17Z

.claude/skills/bench-check/SKILL.md

@@ -227,6 +237,22 @@ Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`.
 <!-- Full JSON output from each benchmark -->
 ```

+**If the ABORTED pre-condition was triggered (no valid benchmark results):** write a minimal report:
+
+```markdown
+# Benchmark Report — <date>
+
+**Version:** X.Y.Z | **Git ref:** abc1234 | **Threshold:** $THRESHOLD%
+
+## Verdict: ABORTED — no valid benchmark results
+
+All benchmark suites failed or timed out. See Phase 1 error records for details.
+
+## Raw Results
+
+<!-- Error/timeout records from each suite -->
+```


ABORTED branch checked after SAVE_ONLY — misleading "BASELINE SAVED" report

Phase 6 checks conditions in order. The SAVE_ONLY / first-run branch fires before the ABORTED branch, so when both conditions are true the wrong report is emitted:

Scenario A — --save-baseline + all benchmarks fail:

SAVE_ONLY=true, metrics={}, ABORTED fired

Phase 5 skipped (ABORTED guard — no baseline written)

Phase 6: SAVE_ONLY is set → true → writes "Verdict: BASELINE SAVED — no comparison performed"

But no baseline was actually saved.

Scenario B — First run (no baseline) + all benchmarks fail:

No baseline exists, metrics={}, ABORTED fired

Phase 5 skipped (ABORTED guard)

Phase 6: no prior baseline existed (first run) → true → writes "Verdict: BASELINE SAVED"

Again, no baseline was saved.

The ABORTED check must come before the SAVE_ONLY / first-run check so it can short-circuit:

**If the ABORTED pre-condition was triggered (no valid benchmark results):** write a minimal report: ...

**If `SAVE_ONLY` is set or no prior baseline existed (first run):** write a shortened report — ...

**Otherwise (comparison was performed):** write the full report ...

Fixed — moved the ABORTED condition check before SAVE_ONLY/first-run in Phase 6, so when all benchmarks fail on a --save-baseline or first run, the correct ABORTED report is written instead of a misleading "BASELINE SAVED" report.

Already fixed in prior commit 7aab540 — Phase 6 already checks ABORTED before SAVE_ONLY/first-run, preventing the misleading BASELINE SAVED report when all benchmarks fail.
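
The ordering issue is easy to see when the Phase 6 choice is written as one chain with ABORTED tested first. A sketch under assumptions: `report_kind` and its output labels are hypothetical, and the second branch encodes the narrowed skip guard from the earlier comment (compare-only with no baseline writes no report unless the run was aborted).

```bash
#!/usr/bin/env bash
# Hypothetical selector for which Phase 6 report to write.
# Usage: report_kind <aborted> <save_only> <baseline_existed> <compare_only>
report_kind() {
  local aborted=$1 save_only=$2 baseline_existed=$3 compare_only=$4
  if [ "$aborted" = "true" ]; then
    echo "ABORTED"          # must win over every other condition
  elif [ "$compare_only" = "true" ] && [ "$baseline_existed" != "true" ]; then
    echo "NONE"             # early exit already handled in Phase 4
  elif [ "$save_only" = "true" ] || [ "$baseline_existed" != "true" ]; then
    echo "BASELINE_SAVED"
  else
    echo "FULL"
  fi
}
```

With this order, Scenario A (`--save-baseline` with every suite failing) and Scenario B (first run with every suite failing) both yield `ABORTED`.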

carlos-alm · 2026-03-24T03:21:17Z

@greptileai

greptile-apps · 2026-03-24T03:27:13Z

.claude/skills/housekeep/SKILL.md

+  if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then
+    if [ "$DRY_RUN" = "true" ]; then
+      echo "[DRY RUN] Would remove stale lock: $f"
+    else
+      echo "Removing stale lock: $f"
+      rm "$f"


lsof unavailable treated as "no process holds the file"

When lsof is not installed (common in minimal Docker/CI containers), lsof "$f" exits with code 127 ("command not found"). The condition ! lsof "$f" > /dev/null 2>&1 becomes true, so the stale lock file is silently deleted — even though we have no information about whether a process holds it. This is the opposite of the intended "require user confirmation when check is unavailable" behaviour stated in the prose above the snippet.

The elif branch ("file is held — ask user") is only reached when lsof exits 0 (found the file). It never fires for exit code 127.

Add an explicit lsof availability check before the condition:

    for f in .codegraph/*.lock; do
      [ -f "$f" ] || continue
      # Read the mtime first (GNU stat, then BSD stat); skip the file if neither works,
      # so the arithmetic below never sees an empty operand
      mtime=$(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null)
      [ -z "$mtime" ] && continue
      age=$(( $(date +%s) - mtime ))
      if [ "$age" -gt 3600 ]; then
        if ! command -v lsof > /dev/null 2>&1; then
          echo "Lock file $f is old but cannot verify if held (lsof unavailable) — ask user before removing"
        elif ! lsof "$f" > /dev/null 2>&1; then
          if [ "$DRY_RUN" = "true" ]; then
            echo "[DRY RUN] Would remove stale lock: $f"
          else
            echo "Removing stale lock: $f"
            rm "$f"
          fi
        else
          echo "Lock file $f is old but still held by a process — ask user before removing"
        fi
      fi
    done

This correctly separates three cases: lsof not installed → ask user; lsof finds no holder → safe to delete; lsof finds a holder → ask user.

Fixed in 933c0d3 — the lock file removal now explicitly checks command -v lsof before attempting to use it. If lsof is not installed (exit 127 in Docker/CI), lock file removal is skipped entirely with a warning, rather than assuming no process holds the file.
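
The three cases can also be isolated in a small classifier, which makes the "tool missing" branch explicit and testable. This is only a sketch: `lock_holder_state` is a hypothetical name, and the optional second argument (the checker command, `lsof` by default) exists purely so the logic can be exercised without `lsof` installed.

```bash
#!/usr/bin/env bash
# Hypothetical classifier: report whether a lock file appears to be held.
# Usage: lock_holder_state <file> [checker-command]
# Prints "unknown" (checker missing), "held", or "free".
lock_holder_state() {
  local file=$1 checker=${2:-lsof}
  if ! command -v "$checker" > /dev/null 2>&1; then
    echo "unknown"   # cannot verify; never treat this as safe to delete
  elif "$checker" "$file" > /dev/null 2>&1; then
    echo "held"      # checker found an open handle
  else
    echo "free"      # checker ran and reported nothing
  fi
}
```

Only a `free` result should allow automatic removal; `unknown` and `held` both fall back to asking the user or skipping.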

greptile-apps · 2026-03-24T03:27:14Z

.claude/skills/deps-audit/SKILL.md

+4. If tests fail and `STASH_REF` is non-empty:
   - Restore the saved manifests: `git stash pop $STASH_REF`
   - Restore `node_modules/` to match the reverted lock file: `npm ci`
   - Report what failed


Failure-path git stash pop applied to npm-modified manifests will conflict

When tests fail and STASH_REF is non-empty, the stash was created from:

working tree = HEAD + pre-existing changes

By this point in step 4, the working tree contains:

HEAD + npm's modifications (audit fix / update)

git stash pop $STASH_REF applies the stash as a patch on top of the current state — it does not restore the working tree to a previous snapshot. Since the stash patch (pre-existing changes) and the current state (npm modifications) both modify package.json/package-lock.json from the same base (HEAD), git will almost certainly report conflicts, leaving the manifests with conflict markers and node_modules/ in an undefined state.

The correct two-step restore is:

First, reset the manifest files to HEAD (undoing npm's changes):
git checkout HEAD -- package.json package-lock.json

Then re-apply the pre-existing changes cleanly from HEAD:
git stash pop $STASH_REF

At step 2 the working tree matches HEAD, so the stash applies exactly as it was originally created — no conflicts.

Suggested replacement for step 4:

4. If tests fail and `STASH_REF` is non-empty:
   - Reset manifests to HEAD first (undoes npm changes): `git checkout HEAD -- package.json package-lock.json`
   - Then re-apply the pre-existing changes cleanly: `git stash pop $STASH_REF`
   - Restore `node_modules/` to match the reverted lock file: `npm ci`
   - Report what failed

Note that the success path (step 3) intentionally does a merge (pop on the npm-modified state) to preserve both sets of changes. The failure path semantics are different — we want a full restore — so the merge approach is wrong here.

Fixed — the failure path now resets manifests to HEAD first (git checkout HEAD -- package.json package-lock.json) before popping the stash. This ensures the stash applies cleanly against the same base it was created from, avoiding conflicts. Also updated the Rules section to match.

Fixed in prior commit 0b08a2b — the failure path already resets manifests to HEAD first (git checkout HEAD -- package.json package-lock.json) before popping the stash, preventing conflicts between npm-modified and stashed versions.

…565) When tests fail after npm audit fix, the working tree has npm's modifications. Popping the stash directly would conflict because both sides modified package.json from the same base. Reset to HEAD first so the stash applies cleanly.

carlos-alm · 2026-03-24T04:27:09Z

@greptileai

greptile-apps · 2026-03-24T04:34:31Z

.claude/skills/deps-audit/SKILL.md

+     d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.
+        Recovery options:
+        - To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci`
+        - To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci`
+        - To keep only the pre-existing changes and discard the audit fixes: re-run `/deps-audit` without `--fix`


Recovery option 3 cannot achieve its stated goal — stash already consumed

At the point where step 3d fires (clean pop succeeded → npm install ran → npm test failed), the stash entry for deps-audit-backup was already consumed by git stash pop $STASH_REF in step 3. The pre-existing changes are now merged into the working tree alongside the npm audit fixes.

Recovery option 3 reads:

"To keep only the pre-existing changes and discard the audit fixes: re-run /deps-audit without --fix"

This advice is incorrect. Re-running the skill without --fix audits the current (mixed) manifest — it performs no rollback. The npm fixes remain in the working tree untouched. The user is left with the same mixed state, potentially misled into thinking re-running the skill undid the audit changes.

The correct recovery for "keep only pre-existing changes" at this point is manual, since the stash was consumed:

- To keep only the pre-existing changes and discard the audit fixes: 1. Run `git diff HEAD -- package.json package-lock.json` to identify the full set of changes (this shows npm fixes AND pre-existing changes merged together) 2. Manually edit `package.json`/`package-lock.json` to remove the audit-fix deltas, keeping only your original changes 3. Run `npm ci` to restore `node_modules/` to match the edited lock file — Note: the pre-existing stash entry was consumed by the `git stash pop` above; it cannot be re-applied

Alternatively, the cleanest approach is:

- To start fresh (discard everything — both audit fixes and pre-existing changes): `git checkout HEAD -- package.json package-lock.json && npm ci` Then re-apply your original changes manually (the stash entry is gone).

The current option 3 text should be corrected or removed to avoid misdirecting the user.
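That a successful `git stash pop` consumes the entry, leaving nothing to re-apply later, can be checked in a scratch repository. This is a small sketch assuming `git` is available; the manifest contents and stash message are placeholders rather than anything the skill produces.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"
git config commit.gpgsign false

echo '{ "name": "demo" }' > package.json
git add package.json
git commit -q -m "base"

# Pre-existing change saved to the stash, as in Phase 0 of the skill.
echo '{ "name": "demo", "private": true }' > package.json
git stash push -q -m "deps-audit-backup"
before=$(git stash list | wc -l | tr -d ' ')

# A clean pop applies the change and drops the entry.
git stash pop -q
after=$(git stash list | wc -l | tr -d ' ')

# A second pop has nothing left to work with and exits non-zero.
if git stash pop -q > /dev/null 2>&1; then
  second_pop="applied"
else
  second_pop="nothing to pop"
fi

echo "entries before pop: $before, after pop: $after, second pop: $second_pop"
```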

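Fixed in 005f806 — replaced the unachievable "re-run /deps-audit without --fix" with accurate manual recovery steps: revert the manifests to HEAD with `git checkout HEAD -- package.json package-lock.json && npm ci`, then manually re-apply the pre-existing changes.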

…#565) When lsof is not installed (Docker/CI), it exits 127 and the lock file would be deleted assuming no process holds it. Now explicitly checks command -v lsof first and skips lock file removal entirely if unavailable.

Recovery option 3 said "re-run /deps-audit without --fix" but at that point the stash is already consumed, so re-running doesn't undo npm changes. Replaced with accurate manual recovery steps: revert manifests to HEAD with git checkout and npm ci, then re-apply pre-existing changes.

#565) When a run produces {"timeout":true} or {"error":true}, it has no per-test data. If counted as "all tests failed", every stable test looks flaky. Now skips invalid runs and requires minimum 2 valid runs for flaky detection to be conclusive.

carlos-alm · 2026-03-24T04:58:03Z

Addressed latest Greptile feedback:

housekeep: Added explicit command -v lsof guard for lock file removal — skips entirely if lsof unavailable instead of assuming no process holds the file (933c0d3)
deps-audit: Replaced unachievable recovery option 3 ("re-run without --fix") with accurate manual recovery steps since the stash is already consumed at that point (005f806)
test-health: Added error/timeout run exclusion in flaky analysis with minimum 2 valid runs requirement — prevents invalid runs from making every stable test look flaky (2e6d37d)
deps-audit failure path: Already fixed in prior commit (0b08a2b) — resets manifests to HEAD before stash pop
bench-check ABORTED ordering: Already fixed in prior commit (7aab540) — ABORTED check comes before SAVE_ONLY in Phase 6
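The test-health rule (skip runs that only contain a timeout or error marker, and require at least two valid runs) might look roughly like the sketch below. The per-run file layout (`<test-name> <pass|fail>` per line), the file names, and the variable names are assumptions made for illustration; only the `{"timeout":true}` / `{"error":true}` markers and the two-run minimum come from the thread.

```shell
#!/usr/bin/env bash
set -eu

runs_dir=$(mktemp -d)

# Hypothetical per-run results: one "<test-name> <pass|fail>" line per test.
printf 'parse_config pass\nresolve_imports pass\n' > "$runs_dir/run-1.txt"
printf '{"timeout":true}\n'                        > "$runs_dir/run-2.txt"
printf 'parse_config pass\nresolve_imports fail\n' > "$runs_dir/run-3.txt"

valid_runs=0
: > "$runs_dir/valid-results.txt"
for f in "$runs_dir"/run-*.txt; do
  # A timed-out or errored run has no per-test data, so it is skipped
  # instead of being counted as "every test failed".
  if grep -qE '"(timeout|error)":[[:space:]]*true' "$f"; then
    continue
  fi
  valid_runs=$((valid_runs + 1))
  cat "$f" >> "$runs_dir/valid-results.txt"
done

if [ "$valid_runs" -lt 2 ]; then
  verdict="inconclusive"
  flaky=""
else
  verdict="conclusive"
  # A test is flaky if it both passed and failed across the valid runs.
  flaky=$(awk '
    $2 == "pass" { passed[$1] = 1 }
    $2 == "fail" { failed[$1] = 1 }
    END { for (t in passed) if (t in failed) print t }
  ' "$runs_dir/valid-results.txt" | sort)
fi

echo "valid runs: $valid_runs, verdict: $verdict, flaky: ${flaky:-none}"
rm -rf "$runs_dir"
```

If run 2 were counted as an all-fail run, `parse_config` would also be reported as flaky even though it passed in every run that produced data.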

carlos-alm · 2026-03-24T04:58:10Z

@greptileai

greptile-apps · 2026-03-24T05:02:35Z

.claude/skills/housekeep/SKILL.md

+```bash
+for f in .codegraph/*.lock; do
+  [ -f "$f" ] || continue
+  age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) ))
+  [ -z "$age" ] && continue
+  if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then
+    if [ "$DRY_RUN" = "true" ]; then
+      echo "[DRY RUN] Would remove stale lock: $f"
+    else
+      echo "Removing stale lock: $f"
+      rm "$f"
+    fi
+  elif [ "$age" -gt 3600 ]; then
+    echo "Lock file $f is old but still held by a process — ask user before removing"
+  fi
+done


command -v lsof guard described in prose but absent from code snippet

The prose above the snippet correctly states: "Before removing, first check if lsof is available (command -v lsof). If lsof is not installed … skip lock file removal entirely."

However, the code snippet itself has no such guard. The if condition on line 93 is:

if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then

When lsof is absent (exits 127 — command not found), ! lsof "$f" evaluates to true (because 127 is non-zero). The entire if branch fires and rm "$f" executes — exactly the unsafe behavior the prose was meant to prevent. An agent following the code block literally, rather than the preceding prose, will silently delete stale lock files in Docker/CI containers without any process-ownership check.

The command -v lsof guard needs to be embedded in the code snippet itself so both paths are consistently enforced:

if ! command -v lsof > /dev/null 2>&1; then
  echo "lsof not available — skipping lock file cleanup (cannot verify no process holds the file)"
else
  for f in .codegraph/*.lock; do
    [ -f "$f" ] || continue
    age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) ))
    [ -z "$age" ] && continue
    if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then
      if [ "$DRY_RUN" = "true" ]; then
        echo "[DRY RUN] Would remove stale lock: $f"
      else
        echo "Removing stale lock: $f"
        rm "$f"
      fi
    elif [ "$age" -gt 3600 ]; then
      echo "Lock file $f is old but still held by a process — ask user before removing"
    fi
  done
fi
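The exit-status reasoning above is easy to confirm in isolation. In this sketch a deliberately nonexistent command name (`lsof-not-installed-here`, a made-up placeholder) stands in for a missing `lsof`, and the lock path is never touched; the variables only record which branch each variant would take.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for an lsof binary that is not installed.
checker="lsof-not-installed-here"

# A missing command makes the shell report status 127.
"$checker" /tmp/example.lock > /dev/null 2>&1
status=$?

# Unguarded form: the negated failure looks like "no process holds the file".
if ! "$checker" /tmp/example.lock > /dev/null 2>&1; then
  unguarded="rm would run"
else
  unguarded="lock kept"
fi

# Guarded form: availability is checked before the result is trusted.
if ! command -v "$checker" > /dev/null 2>&1; then
  guarded="cleanup skipped"
elif ! "$checker" /tmp/example.lock > /dev/null 2>&1; then
  guarded="rm would run"
else
  guarded="lock kept"
fi

echo "status=$status unguarded='$unguarded' guarded='$guarded'"
```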

Already fixed in prior commit — the code snippet at lines 89-107 already wraps the entire lock file loop inside if ! command -v lsof > /dev/null 2>&1, matching the suggested fix exactly. The prose and code are now consistent.

carlos-alm · 2026-03-24T05:20:20Z

Fixed the housekeep SKILL.md lock file removal code snippet — embedded the command -v lsof guard directly in the code block, wrapping the entire lock file loop. When lsof is absent, the snippet now skips lock file cleanup entirely and prints a warning instead of proceeding with unsafe deletion.

@greptileai

* feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep
  Four recurring maintenance routines as Claude Code skills:
  - /deps-audit: vulnerability scanning, staleness, unused deps, license checks
  - /bench-check: benchmark regression detection against saved baselines
  - /test-health: flaky test detection, dead tests, coverage gap analysis
  - /housekeep: clean worktrees, dirt files, sync main, prune branches
* fix(bench-check): capture stderr, guard division-by-zero, commit baseline
  - Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all four benchmark invocations so error messages are captured and recorded
  - Add division-by-zero guard in Phase 3: when baseline == 0, mark delta as "N/A — baseline was zero" (informational only, not a regression)
  - Add git add + git commit step in Phase 5 so the baseline file is actually committed after each save, matching the documented rule
* fix(deps-audit): run npm ci after revert, document tokenizer skip reason
  - After reverting package.json + package-lock.json on --fix test failure, also run `npm ci` to resync node_modules/ with the restored lock file; without this the manifest is reverted but installed packages are not
  - Add explanatory comment on @anthropic-ai/tokenizer skip-list entry clarifying it is a peer dependency of @anthropic-ai/sdk and may be required at runtime without an explicit import in our code
* fix(housekeep): guard Phase 5 in source repo, fix stale-worktree criterion
  - Phase 5 (Update Codegraph): add source-repo guard that skips the self-update logic when running inside the codegraph source repo; comparing the dev version to the published release and running npm install is a no-op since codegraph is not one of its own deps
  - Phase 1b stale-worktree criterion: replace "created more than 7 days ago" (not determinable via git worktree list) with "last commit on the branch is more than 7 days old AND branch has no commits ahead of origin/main", using `git log -1 --format=%ci <branch>`
* fix: address Round 3 Greptile review feedback
* fix: move deps-audit stash to Phase 0, before npm commands modify manifests
* fix: capture flaky-detection loop output to per-run files for comparison
* fix: always require confirmation for stale worktree removal
* fix: use parsed threshold in baseline.json, guard --compare-only on first run
* fix(deps-audit): track stash creation to avoid operating on wrong entry
  When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists.
* fix(test-health): use mktemp for flaky-run directory to avoid concurrent corruption
  Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis.
* fix(bench-check): add save-baseline verdict path, fix em-dash, use explicit commit paths
  Add 4th verdict path for --save-baseline when baseline already exists. Replace corrupted em-dash character in N/A string. Change commit command to use explicit file paths per project convention.
* docs(roadmap): update Phase 5 TypeScript migration with accurate progress
  Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file paths. Updated to reflect actual state: 32 of 269 source modules migrated (~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about 14 stale .js counterparts of already-migrated .ts files needing deletion.
* fix: deps-audit success path should keep npm changes, not revert (#565)
  When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix.
* fix: bench-check use git add + diff --cached to detect new files (#565)
  git diff --quiet ignores untracked files, so on the first run when baseline.json and history.ndjson are newly created, the commit was skipped. Stage first with git add, then check with --cached.
* fix: housekeep require confirmation before branch deletion (#565)
  Branch deletion now asks for user confirmation before each delete, consistent with worktree removal in Phase 1c.
* fix: scope git diff --cached to bench-check files only (#565)
* fix: use json-summary reporter to match coverage-summary.json output (#565)
* fix: capture stash ref by name to avoid position-based targeting (#565)
* fix: remove unreachable Phase 5 subphases since source-repo guard always skips (#565)
* fix: use dynamic threshold variable in bench-check Phase 6 report template (#565)
* fix: address open review items in maintenance skills (#565)
  - bench-check: add timeout 300 wrappers to all 4 benchmark invocations with exit code 124 check for timeout detection
  - bench-check: add explicit COMPARE_ONLY guard at Phase 5 entry
  - housekeep: fix grep portability — use grep -cE instead of GNU \| syntax
  - test-health: add timeout 180 wrapper in flaky detection loop
  - test-health: fix find command -o precedence with grouping parentheses
* fix: add COVERAGE_ONLY guards to Phase 2 and Phase 4 in test-health
* fix: add regression skip guard to bench-check Phase 5, expand deps-audit search dirs
* fix: add empty-string guard for stat size check in housekeep (#565)
  When both stat variants (GNU and BSD) fail, $size is empty and the arithmetic comparison errors out. Add a [ -z "$size" ] && continue guard so the loop skips files whose size cannot be determined.
* fix: add BASELINE SAVED verdict path and clarify if/else-if in bench-check (#565)
  Phase 6: when SAVE_ONLY or first-run (no prior baseline), write a shortened report with "Verdict: BASELINE SAVED" instead of the full comparison report. Phases 1a-1d: replace ambiguous "If timeout / If non-zero" with explicit "If timeout / Else if non-zero" so the two conditions are clearly mutually exclusive.
* docs(roadmap): mark Phase 4 complete, update Phase 5 progress (5 of 7)
  Phase 4 (Resolution Accuracy) had all 6 sub-phases merged but status still said "In Progress". Phase 5 (TypeScript Migration) had 5.3-5.5 merged via PRs #553, #554, #555, #566 but was listed with stale counts. Updated both to reflect actual state: Phase 4 complete, Phase 5 at 5/7 with 76 of 283 modules migrated (~27%).
* docs(roadmap): correct Phase 5 progress — 5.3/5.4/5.5 still in progress
  Previous commit incorrectly marked 5.3-5.5 as complete. In reality 76 of 283 src files are .ts (~27%) while 207 remain .js (~73%). PRs #553, #554, #555, #566 migrated a first wave but left substantial work in each step: 4 leaf files, 39 core files, 159 orchestration files. Updated each step with accurate migrated/remaining counts.
* fix(skill): ban untracked deferrals in /review skill
  The /review skill allowed replying "acknowledged as follow-up" to reviewer comments without tracking them anywhere. These deferrals get lost — nobody revisits PR comment threads after merge. Now: if a fix is genuinely out of scope, the skill must create a GitHub issue with the follow-up label before replying. The reply must include the issue link. A matching rule in the Rules section reinforces the ban.
* fix(skill): add --repo flag, multi-endpoint reply for deferrals, and scope guidance (#568)
* fix(skill): guard follow-up label creation before gh issue create (#568)
  gh issue create --label "follow-up" fails if the label doesn't exist in the repo. Add a gh label create guard step that is safe to re-run.
* feat(skill): parallelize /review with one subagent per PR
* fix: correct heredoc terminator indentation in review skill (#568)
* fix(skill): capture gh issue create output before referencing issue number
  gh issue create prints the new issue URL to stdout — capture it and extract the number so reply templates can reference it unambiguously.
* fix(skill): surface follow-up issues in review result format and summary table
  Add "Issues Created" field to the subagent result format and an "Issues" column to the Step 3 summary table, so deferred out-of-scope items are visible in the final report.
* fix(skill): require verbatim rule propagation to subagents
* fix(skill): align Issues field name between 2i result and Step 3 table (#568)

greptile-apps · 2026-03-24T05:26:49Z

.claude/skills/deps-audit/SKILL.md

+3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`) — this restores any pre-existing uncommitted changes alongside the npm fix results.
+   - If the pop applies cleanly:
+     a. Run `npm install` to re-sync `node_modules/` with the merged manifest.
+     b. Re-run `npm test` to confirm nothing broke with the merged dependency state.
+     c. If tests still pass: confirm the project is consistent.
+     d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.
+        Recovery options:
+        - To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci`
+        - To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci`
+        - To keep only the pre-existing changes and discard the audit fixes: `git checkout HEAD -- package.json package-lock.json && npm ci` to revert manifests to their clean state, then manually re-apply only your pre-existing changes
+   - If the pop causes conflicts in `package.json`/`package-lock.json`: warn the user, leave conflict markers for manual resolution, and instruct: "After you resolve the conflicts, run `npm install` to re-sync `node_modules/` with the resolved lock file before committing."
+   - For conflicts in other files, resolve them by keeping both the npm fixes and the pre-existing changes.
+   If tests pass and `STASH_REF` is empty: no action needed — the npm changes are good and no stash entry exists to clean up


Initial npm test result may be stale after clean pop + npm install

Step 2 runs npm test before the stash pop. Step 3 then says "If tests pass and STASH_REF is non-empty" — but the test run that passed in step 2 was against the npm-audited manifest WITHOUT the pre-existing stash changes applied. After the clean pop and npm install in step 3a, node_modules/ may differ from what step 2's tests used. The skill correctly re-runs npm test in step 3b to catch this, which is good.

However, the prose in step 3 is ordered as: "If tests pass … pop and merge … npm install … re-run npm test." An agent could interpret this as: "tests already passed, so pop and install, then re-test as a formality." The intent should be clearer that the step-2 test result is only a prerequisite for attempting the pop — the step-3b re-test is the authoritative result that determines whether the final state is consistent.

Consider making this explicit:

Suggested change

3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`) — this restores any pre-existing uncommitted changes alongside the npm fix results. Note: the step 2 test run validated the npm changes alone; step 3b is the authoritative test of the final merged state.

- If the pop applies cleanly:

a. Run `npm install` to re-sync `node_modules/` with the merged manifest.

b. Re-run `npm test` to confirm the merged state is consistent (this is the authoritative check).

c. If tests still pass: confirm the project is consistent.

d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.

Fixed — added explicit note that the step 2 test run validates npm changes alone, and that step 3b is the authoritative test of the final merged state. The step 3b description now says '(this is the authoritative check — step 2 only validated the npm changes in isolation)'.

carlos-alm · 2026-03-24T05:45:21Z

@greptileai

Merge branch 'main' into feat/maintenance-skills

b187fe1

greptile-apps bot reviewed Mar 21, 2026


carlos-alm added 4 commits March 21, 2026 05:25

Merge remote-tracking branch 'origin/feat/maintenance-skills' into fe…

3e892d1

…at/maintenance-skills

greptile-apps bot reviewed Mar 21, 2026


fix: address Round 3 Greptile review feedback

ce5d811

greptile-apps bot reviewed Mar 22, 2026


carlos-alm added 4 commits March 21, 2026 23:32

fix: move deps-audit stash to Phase 0, before npm commands modify man…

01b5110

…ifests

fix: capture flaky-detection loop output to per-run files for comparison

3b0e293

fix: always require confirmation for stale worktree removal

52de495

fix: use parsed threshold in baseline.json, guard --compare-only on f…

8be5cec

…irst run

greptile-apps bot reviewed Mar 22, 2026


carlos-alm added 4 commits March 21, 2026 23:50

Merge branch 'main' into feat/maintenance-skills

0691ffc

fix(deps-audit): track stash creation to avoid operating on wrong entry

87d9213

When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists.

fix(test-health): use mktemp for flaky-run directory to avoid concurr…

65d9836

…ent corruption Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis.

greptile-apps bot reviewed Mar 22, 2026


fix: deps-audit success path should keep npm changes, not revert (#565)

5bda6ba

When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix.

greptile-apps bot reviewed Mar 24, 2026


Merge branch 'main' into feat/maintenance-skills

0dc2605

carlos-alm added 2 commits March 23, 2026 19:18

fix(skill): add ABORTED skip guard to bench-check Phase 5 and narrow …

aa3e1f4

…Phase 6 skip

fix(skill): add recovery options for deps-audit clean-pop test failure

75350c7

greptile-apps bot reviewed Mar 24, 2026


fix(skill): check ABORTED before SAVE_ONLY in bench-check Phase 6 (#565)

7aab540

greptile-apps bot reviewed Mar 24, 2026


carlos-alm added 4 commits March 23, 2026 22:42

Merge branch 'main' into feat/maintenance-skills

e4f8c3d

greptile-apps bot reviewed Mar 24, 2026


fix(skill): embed lsof guard in housekeep lock removal snippet (#565)

854f248

greptile-apps bot reviewed Mar 24, 2026


carlos-alm added 2 commits March 23, 2026 23:36

Merge branch 'main' into feat/maintenance-skills

0db47cb

fix: address Greptile review feedback (#565)

dcbe349

carlos-alm merged commit b4f8010 into main Mar 24, 2026
16 checks passed

carlos-alm deleted the feat/maintenance-skills branch March 24, 2026 06:07

github-actions bot locked and limited conversation to collaborators Mar 24, 2026

	2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as informational only (not a regression or improvement)
	2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A — baseline was zero"` and treat the metric as informational only (not a regression or improvement)
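A guarded percentage-delta helper along these lines could look like the sketch below. The function names `pct_delta` and `is_regression`, the shortened `N/A` marker, and the "higher is worse" reading of the metric are assumptions for illustration; the skill's actual wording and comparison logic may differ.

```shell
#!/usr/bin/env bash

# Hypothetical helper: percentage change from baseline ($1) to current ($2).
# Prints "N/A" instead of dividing when the baseline is zero.
pct_delta() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    if (b + 0 == 0) { print "N/A"; exit }
    printf "%+.1f%%\n", (c - b) / b * 100
  }'
}

# Hypothetical helper: succeeds when the delta string ($1) exceeds the
# threshold percent ($2). An "N/A" delta is informational only, never a regression.
is_regression() {
  case "$1" in
    N/A) return 1 ;;
  esac
  awk -v d="${1%\%}" -v t="$2" 'BEGIN { exit !(d + 0 > t + 0) }'
}

for pair in "100 110" "0 5"; do
  set -- $pair
  delta=$(pct_delta "$1" "$2")
  if is_regression "$delta" 5; then
    echo "baseline=$1 current=$2 delta=$delta -> regression"
  else
    echo "baseline=$1 current=$2 delta=$delta -> ok"
  fi
done
```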

		3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`)
		If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists)

	git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"
	git diff HEAD --quiet -- generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)"
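Both forms above compare only tracked paths, so a brand-new baseline file goes unnoticed. The sketch below shows that in a scratch repository and contrasts it with staging first and checking `--cached`; it assumes `git` is available, and the baseline contents are a made-up placeholder.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"
git config commit.gpgsign false
git commit -q --allow-empty -m "base"

# First run: the baseline file is new, so it is still untracked.
mkdir -p generated/bench-check
echo '{"threshold": 10}' > generated/bench-check/baseline.json

# git diff against HEAD ignores untracked files, so it reports "no change".
if git diff HEAD --quiet -- generated/bench-check/baseline.json; then
  unstaged_check="no changes seen"
else
  unstaged_check="changes seen"
fi

# Staging first makes the new file visible to the --cached comparison.
git add generated/bench-check/baseline.json
if git diff --cached --quiet -- generated/bench-check/baseline.json; then
  staged_check="no changes seen"
else
  staged_check="changes seen"
fi

echo "diff HEAD: $unstaged_check; add + diff --cached: $staged_check"
```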


		## Phase 6 — Report

		Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed. That case was already handled in Phase 4 with an early exit — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved.


greptile-apps bot commented Mar 21, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Comments Outside Diff (2)

In dry-run mode, report what would be pruned without modifying anything
