---
name: dogfood
description: Run a full dogfooding session against a published codegraph release — install from npm, test all commands, compare engines, find bugs, and write a report
argument-hint: "<version>"
allowed-tools: Bash, Read, Write, Glob, Grep, Task, Edit
---
You are running a comprehensive dogfooding session for codegraph v$ARGUMENTS. Your goal is to install the published package, exercise every feature, compare engines, find bugs, and produce a structured report.
Reference: Read `generated/DOGFOOD-REPORT-2.1.0.md` and `generated/DOGFOOD_REPORT_v2.2.0.md` (if present) for the format and depth expected. Match or exceed that quality.
- Create a temporary working directory (e.g., `/tmp/dogfood-$ARGUMENTS` or a system temp).
- Run `npm init -y` there, then `npm install @optave/codegraph@$ARGUMENTS`.
- Verify the install: `npx codegraph --version` should print `$ARGUMENTS`.
- Verify the native binary installed. The native Rust addon is delivered as a platform-specific optional dependency. Check that the correct one exists: `ls node_modules/@optave/codegraph-*/`

  Expected packages by platform:
  - Windows x64: `@optave/codegraph-win32-x64-msvc`
  - macOS ARM: `@optave/codegraph-darwin-arm64`
  - macOS x64: `@optave/codegraph-darwin-x64`
  - Linux x64: `@optave/codegraph-linux-x64-gnu`

  If the native package is missing or a different version than `$ARGUMENTS`, file it as a bug — the `optionalDependencies` in `package.json` may have a pinned version mismatch. Verify by checking: `node -e "const p = require('@optave/codegraph/package.json'); console.log(p.optionalDependencies)"`

  Then confirm the native engine actually loads: `npx codegraph info`

  This should report `engine: native`. If it falls back to `wasm`, record why.
- Record: platform, OS version, Node version, native binary package name + version, engine reported by `info`.
- Do NOT build the graph yet. The first phase tests commands against the codegraph source repo without a pre-existing graph.
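The platform-to-package mapping above can be made scriptable with a small helper. This is a sketch: the function name, fallback string, and usage lines are illustrative, not part of codegraph.

```shell
# Map a Node-style platform/arch pair to the expected native package name.
expected_native_pkg() {
  case "$1-$2" in
    win32-x64)    echo "@optave/codegraph-win32-x64-msvc" ;;
    darwin-arm64) echo "@optave/codegraph-darwin-arm64" ;;
    darwin-x64)   echo "@optave/codegraph-darwin-x64" ;;
    linux-x64)    echo "@optave/codegraph-linux-x64-gnu" ;;
    *)            echo "unsupported" ;;
  esac
}

# Intended use during the verify step (hypothetical):
#   pkg="$(expected_native_pkg "$(node -p process.platform)" "$(node -p process.arch)")"
#   ls "node_modules/$pkg" >/dev/null || echo "BUG: native package missing"
```

An `unsupported` result on a platform from the list above is itself worth recording in the report.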
Using the installed binary (`npx codegraph` from the temp dir, pointed at the codegraph source repo):

- Self-discover all commands: Run `npx codegraph --help` and extract every command and subcommand (including `registry list|add|remove|prune`).
- Run each command before running `build`. Record which ones:
  - Fail gracefully with a helpful message (PASS)
  - Crash with a stack trace (BUG)
  - Silently return empty/wrong results without warning (BUG)
- Then run `npx codegraph build <path-to-codegraph-repo>` and record: file count, node count, edge count, time taken, engine used.
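The PASS/BUG triage above can be captured as a tiny classifier over each run's exit code and stderr. The stack-trace regex and verdict labels are assumptions for illustration, not codegraph behavior:

```shell
# Classify a cold-start run from its exit code and captured stderr.
classify_cold_start() {
  local exit_code="$1" stderr="$2"
  # A JS stack frame looks like: "    at fn (/path/file.js:12:3)"
  if printf '%s' "$stderr" | grep -q 'at .*(.*:[0-9]*:[0-9]*)'; then
    echo "BUG: stack trace"
  elif [ "$exit_code" -ne 0 ]; then
    echo "PASS: failed gracefully"
  else
    echo "CHECK: exited 0 without a graph — verify output is not silently empty"
  fi
}

# Intended use (hypothetical):
#   err="$(npx codegraph stats 2>&1 >/dev/null)"; classify_cold_start "$?" "$err"
```

Anything landing in the `CHECK` bucket still needs a manual look — a zero exit with empty results is the "silent wrong answer" case the sweep is hunting for.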
After the graph is built, exercise every command and subcommand. Discover the list from `--help`, but use this reference to ensure thorough flag coverage.

Test each with `-j/--json` and `-T/--no-tests` where supported:
| Command | Key flags to exercise |
|---|---|
| `query <name>` | `--depth <n>`, `--db <path>` |
| `impact <file>` | |
| `map` | `-n/--limit <number>` |
| `stats` | |
| `deps <file>` | |
| `fn <name>` | `--depth <n>`, `-f/--file <path>`, `-k/--kind <kind>` |
| `fn-impact <name>` | `--depth <n>`, `-f/--file`, `-k/--kind` |
| `context <name>` | `--depth <n>`, `-f/--file`, `-k/--kind`, `--no-source`, `--include-tests` |
| `explain <target>` | test with both a file path and a function name |
| `where <name>` | also test `where -f <file>` for file-overview mode |
| `diff-impact [ref]` | `--staged`; test vs `main`, vs `HEAD`, and with no arg (unstaged) |
| `cycles` | `--functions` for function-level cycles |
| `structure [dir]` | `--depth <n>`, `--sort cohesion\|fan-in\|fan-out\|density\|files` |
| `hotspots` | `--metric fan-in\|fan-out\|density\|coupling`, `--level file\|directory`, `-n/--limit` |
| Command | Flags |
|---|---|
| `export` | `-f dot`, `-f mermaid`, `-f json`, `--functions`, `-o <file>` |
| Command | Flags |
|---|---|
| `models` | (no flags) |
| `embed [dir]` | `-m minilm` (use this — the `jina-code` default requires HF auth) |
| `search <query>` | `-n/--limit`, `--min-score`, `-k/--kind`, `--file <pattern>`, multi-query with `;` separator, `--rrf-k` |
| Command | Flags |
|---|---|
| `info` | |
| `--version` | |
| `watch [dir]` | start, verify it detects a file change, then Ctrl+C |
| `registry list` | `-j/--json` |
| `registry add <dir>` | `-n/--name <custom>` |
| `registry remove <name>` | |
| `registry prune` | `--ttl <days>` |
| `mcp` | initialize via JSON-RPC stdin, verify tool list response |
| Scenario | Expected |
|---|---|
| Non-existent symbol: `query nonexistent` | Graceful "No results" message |
| Non-existent file: `deps nonexistent.js` | Graceful "No file matching" message |
| Non-existent function: `fn nonexistent` | Graceful message |
| `structure .` | Should work (was a bug in v2.2.0 — verify fix) |
| `--json` on every command that supports it | Valid JSON output |
| `--no-tests` effect: compare counts with/without | Test file count should drop |
| `--kind` with an invalid kind | Graceful error or ignored |
| `--verbose` on `build` | Should show per-file parsing details |
| `build --no-incremental` | Force full rebuild |
| `search` with no embeddings | Should warn, not crash |
| `embed` then `search` with a dimension-mismatch model | Should warn about mismatch |
| Pipe output: `codegraph map --json \| head -1` | Clean JSON, no status messages in stdout |
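The pipe-output scenario can be smoke-checked without a full JSON parser. The heuristic below is an assumption for illustration — clean JSON starts with `{` or `[`, while status chatter does not — so treat a pass as "looks clean", not proof:

```shell
# Cheap smoke check: does a captured stdout line look like JSON?
looks_like_json() {
  case "$1" in
    '{'*|'['*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Intended use (hypothetical):
#   first="$(npx codegraph map --json | head -1)"
#   looks_like_json "$first" || echo "BUG: status text leaked into stdout"
```

For the "valid JSON" edge case proper, pipe the full output through `node -e 'JSON.parse(require("fs").readFileSync(0))'` instead.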
Test that incremental rebuilds, full rebuilds, and cross-feature state remain consistent. Codegraph uses three-tier change detection: journal → mtime+size → content hash.
- Incremental no-op: Run `build` again with no file changes. It should report "Graph is up to date" and touch nothing. Verify node/edge counts are identical.
- Incremental with change: Touch or slightly modify one source file, run `build` again. Verify with `--verbose`:
  - Only the changed file is re-parsed
  - Node IDs for unchanged symbols remain stable
  - Edge counts are consistent
  - The journal (`.codegraph/` directory) tracks the change
- Force full rebuild: Run `build --no-incremental`. Compare node/edge counts with the incremental result — they should match exactly.
- Embed then rebuild: Run `embed --model minilm`, then run `build` again (even with no changes). After the rebuild:
  - Run `search "build graph"` — do results still return? If embeddings reference stale node IDs, search will return 0 results.
  - Compare embedding `node_id`s against actual node IDs in the graph (use `--json` outputs).
- Embed, modify, rebuild, search: Modify a file, `build` again (incremental), then `search` without re-running `embed`. This is the most likely path to stale embeddings. Record whether results are correct, empty, or wrong.
- Full rebuild after incremental: Delete `.codegraph/graph.db`, rebuild from scratch, then verify `search` still works (it shouldn't — embeddings should be gone too, or the tool should warn).
- Watch mode integration: Start `watch`, modify a file in another terminal, verify the watcher detects the change and incrementally updates (check output). Then run a query to verify the graph reflects the change. Stop the watcher with Ctrl+C and verify graceful shutdown (journal flush).
- Revert any file modifications before continuing.
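The no-op and full-rebuild comparisons above reduce to asserting that two node/edge count pairs match. A tiny helper (names and output format are illustrative) keeps the report honest and greppable:

```shell
# Compare node/edge counts from two builds; print a verdict line for the report.
compare_counts() {
  local label="$1" nodes_a="$2" edges_a="$3" nodes_b="$4" edges_b="$5"
  if [ "$nodes_a" = "$nodes_b" ] && [ "$edges_a" = "$edges_b" ]; then
    echo "$label: MATCH ($nodes_a nodes, $edges_a edges)"
  else
    echo "$label: MISMATCH (nodes $nodes_a vs $nodes_b, edges $edges_a vs $edges_b)"
  fi
}

# Intended use (hypothetical), feeding counts parsed from `stats --json`:
#   compare_counts "no-op rebuild" "$n1" "$e1" "$n2" "$e2"
```

A MISMATCH on the no-op or the incremental-vs-full comparison is a bug by definition and should go straight into the report's bug list.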
- Build with `--engine wasm`: record nodes, edges, time.
- Build with `--engine native`: record nodes, edges, time.
- Compare in a table: node count, edge count, function count, call edges, call confidence (from `stats`), graph quality score.
- Note any significant parity gaps (>5% difference in any metric).
- Run the same set of queries with both engines and flag any result differences:
  - `fn buildGraph` — compare callers/callees
  - `context parseFileAuto` — compare source extraction
  - `cycles --functions` — compare cycle detection
  - `stats --json` — full metric comparison
  - `hotspots --metric fan-in --json` — compare rankings
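The >5% parity threshold can be checked with integer arithmetic, avoiding floating point in the shell. This is a sketch; the metric names and output format are placeholders:

```shell
# Flag a metric whose native and WASM values differ by more than 5%.
parity_gap() {
  local metric="$1" native="$2" wasm="$3" diff
  diff=$(( native - wasm ))
  if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi
  # Compare diff/native against 5% without floats: diff*100 > native*5
  if [ $(( diff * 100 )) -gt $(( native * 5 )) ]; then
    echo "$metric: GAP ($native native vs $wasm wasm)"
  else
    echo "$metric: OK"
  fi
}
```

Run it once per row of the comparison table; every GAP line belongs in the "parity gaps" analysis.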
Run all four benchmark scripts from the codegraph source repo (not the temp install dir) and record results. These detect performance regressions between releases.
| Benchmark | Script | What it measures | When it matters |
|---|---|---|---|
| Build | `node scripts/benchmark.js` | Build speed (native vs WASM), query latency | Always |
| Incremental | `node scripts/incremental-benchmark.js` | Incremental build tiers, import resolution throughput | Always |
| Query | `node scripts/query-benchmark.js` | Query depth scaling, diff-impact latency | Always |
| Embedding | `node scripts/embedding-benchmark.js` | Search recall (Hit@1/3/5/10) across models | Always |
- Run all four from the codegraph source repo directory.
- Record the JSON output from each.
- Compare with the previous release's numbers in `generated/BUILD-BENCHMARKS.md` (build benchmark) and previous dogfood reports.
- Flag any regressions:
  - Build time per file >10% slower → investigate
  - Query latency >2x slower → investigate
  - Embedding recall (Hit@5) drops by >2% → investigate
  - Incremental no-op >10ms → investigate
- Include a Performance Benchmarks section in the report with tables for each benchmark.
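The regression gates above share one shape: "current exceeds baseline by more than N percent". A single helper covers both the 10% build gate and the 2x (i.e., 100%) query-latency gate; how the baseline numbers are extracted from the benchmark JSON is left as an assumption:

```shell
# Return "REGRESSION" if current exceeds baseline by more than pct percent.
regression_gate() {
  local baseline="$1" current="$2" pct="$3"
  # Integer comparison: current*100 > baseline*(100+pct)
  if [ $(( current * 100 )) -gt $(( baseline * (100 + pct) )) ]; then
    echo "REGRESSION"
  else
    echo "ok"
  fi
}

# e.g. build time per file (10% gate):  regression_gate "$prev_ms" "$cur_ms" 10
# e.g. query latency (2x gate):         regression_gate "$prev_ms" "$cur_ms" 100
```

Values sitting exactly on the threshold pass the gate; only strictly worse results are flagged.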
Note: The native engine may not be available in the dev repo (no prebuilt binary in `node_modules`). Record WASM results at minimum. If native is available, record both.

IMPORTANT: If your bug-fix PR touches code covered by a benchmark (`builder.js`, `parser.js`, `queries.js`, `resolve.js`, `db.js`, `embedder.js`, `journal.js`), you must run the relevant benchmarks before and after your changes and include the comparison in the PR description.
- Read `CHANGELOG.md` to identify what changed in v$ARGUMENTS vs the previous version.
- Read `package.json` for the previous version tag.
- For every feature added and bug fixed in this release, write a targeted test:
  - Verify the feature works as described
  - Verify the bug is actually fixed
  - Try to break it with edge cases
- Present results in a "Release-Specific Tests" table.
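Finding the current and previous versions can be scripted by pulling the first two version headings out of `CHANGELOG.md`. The `## [x.y.z]` heading style is an assumption about the changelog format — adjust the pattern if the file differs:

```shell
# Print the two most recent version numbers found in a changelog file.
latest_two_versions() {
  grep -Eo '^## \[?[0-9][0-9.]+' "$1" \
    | grep -Eo '[0-9][0-9.]+' \
    | head -2
}

# Intended use (hypothetical):
#   prev="$(latest_two_versions CHANGELOG.md | tail -1)"
```

Cross-check the result against the previous version tag from `package.json` before relying on it.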
Before writing the report, stop and think about:

- What testing approaches am I missing?
- Cross-command pipelines: Have I tested `build` → `embed` → `search` → modify → `build` → `search`? Have I tested `watch` detecting changes, then `diff-impact`?
- MCP server: Have I tested the `mcp` command? Initialize via JSON-RPC on stdin, send `tools/list`, verify all 17 tools are present. Test single-repo mode (default — `list_repos` should be absent, no `repo` parameter on tools) vs `--multi-repo` mode.
- Programmatic API: Have I tested `require('@optave/codegraph')` or `import` from `index.js`? Key exports to verify: `buildGraph`, `loadConfig`, `openDb`, `findDbPath`, `contextData`, `explainData`, `whereData`, `fnDepsData`, `diffImpactData`, `statsData`, `isNativeAvailable`, `EXTENSIONS`, `IGNORE_DIRS`, `ALL_SYMBOL_KINDS`, `MODELS`.
- Config options: Have I tested `.codegraphrc.json`? Create one with `include`/`exclude` patterns, custom `aliases`, `build.incremental: false`, `query.defaultDepth`, `search.defaultMinScore`. Verify overrides work.
- Env var overrides: `CODEGRAPH_LLM_PROVIDER`, `CODEGRAPH_LLM_API_KEY`, `CODEGRAPH_LLM_MODEL`, `CODEGRAPH_REGISTRY_PATH`.
- Credential resolution: `apiKeyCommand` in config — does it shell out via `execFileSync` correctly? Test with a simple `echo` command.
- Multi-repo registry flow: `registry add .`, `registry list`, `mcp --repos <name>`, `registry remove <name>`, `registry prune --ttl 0`.
- Concurrent usage: Two builds at once, build while watching.
- Different repo: Have I tested on a repo besides codegraph itself? Try a small open-source project.
- False-positive filtering: Does `stats` report false positives? Are `FALSE_POSITIVE_NAMES` (run, get, set, init, main, etc.) filtered from high-caller warnings?
- Symbol kinds: Test `--kind` with all valid kinds: function, method, class, interface, type, struct, enum, trait, record, module.
- Database migrations: If testing an upgrade path (older `graph.db` → new version), do schema migrations (v1→v4) run correctly?
- What would a real user hit that I haven't simulated?
Add any additional tests you identify here and run them before writing the final report.
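The MCP check can be driven from the shell by piping newline-delimited JSON-RPC requests to the server's stdin. The `initialize` params below are a minimal sketch of the MCP handshake, not codegraph-specific values:

```shell
# Emit the two JSON-RPC requests the MCP smoke test needs, one per line.
mcp_requests() {
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"dogfood","version":"0.0.0"}}}' \
    '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}

# Intended use (hypothetical):
#   mcp_requests | npx codegraph mcp
# then grep the tools/list response for each expected tool name.
```

Count the tool names in the response rather than eyeballing it — the expectation above is "all 17 tools present".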
For each bug found during testing, first check whether an issue already exists:

```bash
gh issue list --repo optave/codegraph --state open --search "<bug title keywords>"
```

If a matching issue already exists, skip creating a new one. Add a comment with your findings if you have new information.
For each new bug, create a GitHub issue:

````bash
gh issue create --repo optave/codegraph \
  --title "bug: <concise title>" \
  --label "bug,dogfood" \
  --body "$(cat <<'ISSUE_EOF'
## Found during dogfooding v$ARGUMENTS

**Severity:** <Critical | High | Medium | Low>
**Command:** `codegraph <command that failed>`

## Reproduction

```bash
<exact commands to reproduce>
```

## Expected behavior

<what should happen>

## Actual behavior

<what actually happens — include output/stack traces>

## Root cause

<analysis if known>

## Suggested fix

<approach if known>
ISSUE_EOF
)"
````

Record the issue number for each bug.
For each bug you can fix in this session:

- Create a branch: `git checkout -b fix/dogfood-<short-description> main`
- Implement the fix.
- Run `npm test` to verify no regressions.
- Run `npm run lint` to verify code style.
- Run benchmarks before and after if your fix touches code covered by a benchmark (see the Phase 4b table). Include the comparison in the PR body.
- Commit with a message referencing the issue:

  ```
  fix(<scope>): <description>

  Closes #<issue-number>
  ```

  The `Closes #N` footer tells GitHub to auto-close the issue when the PR merges.
- Push and open a PR. If benchmarks were run, include them in the body:

  ```bash
  gh pr create --base main \
    --title "fix(<scope>): <description>" \
    --body "$(cat <<'PR_EOF'
  ## Summary
  <what was wrong and how this fixes it>

  ## Found during Dogfooding v$ARGUMENTS — see #<issue-number>

  ## Benchmark results
  <before/after table if applicable — see Phase 4b>

  ## Test plan
  - [ ] <how to verify the fix>
  PR_EOF
  )"
  ```
- Return to the main working branch before continuing to the next bug.
If a bug is too complex to fix in this session, leave the issue open and note it in the report.
If the entire dogfooding session finds zero bugs, the release is validated. Update the native binary version pins in the main repository to match:
- In the codegraph repo (not the temp dir), edit `package.json` `optionalDependencies` to pin all `@optave/codegraph-*` packages to `$ARGUMENTS`.
- Run `npm install` to update the lockfile.
- Create a PR to update the native binary pins:

  ```bash
  git checkout -b chore/pin-native-binaries-v$ARGUMENTS main
  git add package.json package-lock.json
  git commit -m "chore: pin native binaries to v$ARGUMENTS after clean dogfood"
  gh pr create --base main \
    --title "chore: pin native binaries to v$ARGUMENTS" \
    --body "Validated in dogfooding session — zero bugs found."
  ```
This signals that v$ARGUMENTS has been manually verified end-to-end.
- Delete the temporary directory.
- Confirm no artifacts were left behind.
Write the report to `generated/DOGFOOD_REPORT_v$ARGUMENTS.md` with this structure:
# Dogfooding Report: @optave/codegraph@$ARGUMENTS
**Date:** <today>
**Platform:** <OS, arch, Node version>
**Native binary:** <package name and version, or "not available">
**Active engine:** <auto-detected engine>
**Target repo:** codegraph itself (<file count> files)
---
## 1. Setup & Installation
<Install results, native binary verification, any issues>
## 2. Cold Start (Pre-Build)
<Table of commands tested without a graph>
## 3. Full Command Sweep
<Table: Command | Status | Notes>
### Edge Cases Tested
<Table: Scenario | Result>
## 4. Rebuild & Staleness
<Incremental rebuild results>
<Embed-rebuild-search pipeline results>
<Watch mode results>
## 5. Engine Comparison
<Table: Metric | Native | WASM | Delta>
<Analysis of parity gaps>
<Per-query comparison results>
## 6. Release-Specific Tests
<What changed in this version>
<Table: Feature/Fix | Test | Result>
## 7. Additional Testing
<MCP server testing results>
<Programmatic API testing results>
<Config/env var testing results>
<Multi-repo registry testing results>
<Any other tests from Phase 6 thinking space>
## 8. Bugs Found
### BUG 1: <title> (<severity>)
- **Issue:** #<number> (link)
- **PR:** #<number> (link) or "open — too complex for this session"
- **Symptoms:**
- **Root cause:**
- **Fix applied:**
## 9. Suggestions for Improvement
### 9.1 <suggestion>
### 9.2 <suggestion>
## 10. Testing Plan
### General Testing Plan (Any Release)
<Checklist of standard tests every release should pass>
### Release-Specific Testing Plan (v$ARGUMENTS)
<Focused checklist based on CHANGELOG changes>
### Proposed Additional Tests
<Tests you thought of in Phase 6 that should be added to future dogfooding>
## 11. Overall Assessment
<Summary paragraph>
<Rating: X/10 with justification>
## 12. Issues & PRs Created
| Type | Number | Title | Status |
|------|--------|-------|--------|
| Issue | #N | ... | open / closed via PR |
| PR | #N | ... | open / merged |

The dogfood report must be committed to the repository — do not leave it as an untracked file.
- If bug-fix PRs were created during Phase 7: add the report to the first PR's branch:

  ```bash
  git checkout <first-pr-branch>
  git add generated/DOGFOOD_REPORT_v$ARGUMENTS.md
  git commit -m "docs: add dogfood report for v$ARGUMENTS"
  git push
  ```
- If no PRs were created (zero bugs / green path): create a dedicated PR for the report:

  ```bash
  git checkout -b docs/dogfood-report-v$ARGUMENTS main
  git add generated/DOGFOOD_REPORT_v$ARGUMENTS.md
  git commit -m "docs: add dogfood report for v$ARGUMENTS"
  git push -u origin docs/dogfood-report-v$ARGUMENTS
  gh pr create --base main \
    --title "docs: add dogfood report for v$ARGUMENTS" \
    --body "Dogfooding report for v$ARGUMENTS. See generated/DOGFOOD_REPORT_v$ARGUMENTS.md for full details."
  ```
- Verify the report file appears in the PR diff before moving on.
- Be thorough but honest. Don't inflate the rating.
- If codegraph crashes or produces wrong results when analyzing itself, file it as a bug — don't work around it.
- Report the raw truth. A dogfood report that finds 0 bugs is suspicious.
- Include exact command invocations and outputs for any bugs found.
- The report should be useful to a developer who wasn't in the session.
This skill lives at .claude/skills/dogfood/SKILL.md in the codegraph repo. If during the dogfooding session you discover that this skill is missing steps, has outdated command references, or could be improved — you are encouraged to edit it. Fix inaccuracies, add missing test cases, update flag lists, and improve the phase instructions so the next dogfood run benefits from what you learned. Commit any skill improvements alongside the dogfood report.