Each flow below lists its entry point, steps, key files, and failure modes. There is no authentication/login flow: codeiq has no user-facing service, and the only auth-adjacent code is detector logic that finds auth patterns in scanned codebases.
Entry point: `internal/cli/index.go` → `analyzer.Run()`.
Steps:
- File discovery via `git ls-files` (fallback: `filepath.Walk` with `DefaultExcludeDirs`, excluding `node_modules`, `vendor`, `target`, `.git`, `dist`, `build`, `.gradle`, `.idea`, `__pycache__`, `.tox`, `.eggs`, `venv`, `.venv`).
- Extension → `parser.Language` mapping in `parser.LanguageFromExtension`.
- Worker pool (default `2 × GOMAXPROCS`, override with `--workers`). Per file:
  - Read content.
  - Parse: tree-sitter for {Java, Python, TypeScript, Go}; structured parser for YAML/JSON/TOML/INI/properties; regex-only fallback otherwise.
  - Iterate every `Detector` whose `SupportedLanguages()` covers the file's language, passing `detector.Context{FilePath, Language, Content, Tree, …}`.
- GraphBuilder (`graph_builder.go`):
  - `mergeNode` performs a confidence-aware union: the donor only fills keys the survivor doesn't have, so a `Spring` detector's `framework=spring` stamp survives a generic `auth` detector's overwrite attempt.
  - Edges are deduped on the canonical `(source_id, target_id, kind)` triple.
- Snapshot sorts nodes + edges by ID for determinism and drops "phantom edges" (edges whose endpoint isn't in the node set). The count is visible via `analyzer.Stats.DroppedEdges`.
- Cache write in batches (`--batch-size`, default 500). Nodes + edges go to the `nodes`/`edges` tables keyed by file content hash, JSON-serialized in the `data` column.
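The GraphBuilder semantics above can be sketched as follows: a fill-only merge, edge dedup keyed on the triple, and a sorted snapshot that drops phantom edges. The `Node`, `Edge`, and `Builder` types are hypothetical stand-ins, not codeiq's API. In this sketch the survivor is simply the first node added for an ID, whereas the real `mergeNode` presumably chooses it by confidence. Edges are also sorted by their triple rather than by a stored edge ID.

```go
package main

import (
	"fmt"
	"sort"
)

// Node and Edge are hypothetical stand-ins for codeiq's graph types.
type Node struct {
	ID    string
	Props map[string]string
}

type Edge struct {
	Source, Target, Kind string
}

// Builder keys nodes by ID and edges by their canonical (source, target, kind) triple.
type Builder struct {
	nodes map[string]*Node
	edges map[Edge]struct{}
}

func NewBuilder() *Builder {
	return &Builder{nodes: map[string]*Node{}, edges: map[Edge]struct{}{}}
}

// AddNode merges n into an existing node with the same ID. The survivor
// (here: the first node added) keeps its values; the donor only fills keys
// the survivor lacks.
func (b *Builder) AddNode(n Node) {
	survivor, ok := b.nodes[n.ID]
	if !ok {
		props := make(map[string]string, len(n.Props))
		for k, v := range n.Props {
			props[k] = v
		}
		b.nodes[n.ID] = &Node{ID: n.ID, Props: props}
		return
	}
	for k, v := range n.Props {
		if _, exists := survivor.Props[k]; !exists {
			survivor.Props[k] = v
		}
	}
}

// AddEdge dedupes on the triple: adding the same edge twice keeps one copy.
func (b *Builder) AddEdge(e Edge) { b.edges[e] = struct{}{} }

// Snapshot returns sorted nodes and edges and counts the phantom edges it
// dropped because an endpoint is missing from the node set.
func (b *Builder) Snapshot() (nodes []Node, edges []Edge, dropped int) {
	for _, n := range b.nodes {
		nodes = append(nodes, *n)
	}
	sort.Slice(nodes, func(i, j int) bool { return nodes[i].ID < nodes[j].ID })
	for e := range b.edges {
		_, srcOK := b.nodes[e.Source]
		_, dstOK := b.nodes[e.Target]
		if !srcOK || !dstOK {
			dropped++
			continue
		}
		edges = append(edges, e)
	}
	sort.Slice(edges, func(i, j int) bool {
		x, y := edges[i], edges[j]
		if x.Source != y.Source {
			return x.Source < y.Source
		}
		if x.Target != y.Target {
			return x.Target < y.Target
		}
		return x.Kind < y.Kind
	})
	return nodes, edges, dropped
}

func main() {
	b := NewBuilder()
	b.AddNode(Node{ID: "cls:Foo", Props: map[string]string{"framework": "spring"}})
	b.AddNode(Node{ID: "cls:Foo", Props: map[string]string{"framework": "generic", "auth": "true"}})
	b.AddNode(Node{ID: "cls:Bar"})
	b.AddEdge(Edge{Source: "cls:Foo", Target: "cls:Bar", Kind: "calls"})
	b.AddEdge(Edge{Source: "cls:Foo", Target: "cls:Bar", Kind: "calls"})     // duplicate
	b.AddEdge(Edge{Source: "cls:Foo", Target: "cls:Missing", Kind: "calls"}) // phantom
	nodes, edges, dropped := b.Snapshot()
	foo := nodes[1] // sorted order: cls:Bar, cls:Foo
	fmt.Println(foo.ID, foo.Props["framework"], foo.Props["auth"], len(edges), dropped)
	// prints: cls:Foo spring true 1 1
}
```

Keeping the survivor's values and only filling gaps means the order in which the builder sees detectors decides which value wins. That is why a real implementation would tie survivorship to confidence rather than arrival order.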
Key files:
- `internal/cli/index.go`: flag parsing + analyzer wiring
- `internal/analyzer/analyzer.go`: pipeline loop
- `internal/analyzer/file_discovery.go`: `git ls-files` + filesystem fallback
- `internal/parser/parser.go`: language detection
- `internal/detector/detector.go`: `Detector` interface + `Default` registry
- `internal/analyzer/graph_builder.go`: dedup + snapshot
- `internal/cache/cache.go`: batched writes
Failure modes:
- Empty registry: a detector category that is not blank-imported in `detectors_register.go` yields 0 emissions for that language. Symptom: `codeiq plugins list` doesn't show the detector. This has bitten the project before; the auto-import check is one of the most important correctness invariants.
- Tree-sitter parse error: the detector falls back to the regex-only path, and some emissions degrade in fidelity (e.g. `framework` may be missing). Logged at `-v`.
- Large file: tree-sitter memory cost is roughly O(file_size), so worker-pool concurrency × tree size can OOM if `--workers` is too high. The default `2 × GOMAXPROCS` is safe up to ~50k files on 15 GiB hosts.
- Cache write contention: SQLite WAL handles concurrent reads plus one writer. Writes are batched on a single channel; backpressure shows up as a slow `index`.
Entry point: `internal/cli/enrich.go` → `analyzer.RunEnrich(EnrichOptions)`.
Steps:
- Open the SQLite cache read-only.
- Stream every cached node + edge into a `GraphBuilder` to re-snapshot (sort).
- Linkers (`internal/analyzer/linker/`): `TopicLinker`, `EntityLinker`, and `ModuleContainmentLinker` emit cross-file edges by name matching (e.g. a `produces topic="users.created"` and a `consumes topic="users.created"` get linked even though they live in different files).
- `LayerClassifier` stamps `layer = frontend | backend | infra | shared | unknown` on every node using filename heuristics + framework hints.
- Intelligence layer (`internal/intelligence/extractor/`):
  - `ExtractFromTree` runs once per file (tree-sitter parses each file once, not once per node; Phase A OOM fix).
  - Surfaces `prop_lex_comment` (doc comments / JSDoc / docstring text) and `prop_lex_config_keys` (key lists extracted from YAML/JSON config files).
  - The per-file goroutine pool is bounded at `2 × GOMAXPROCS` (Phase A OOM fix).
- ServiceDetector (`service_detector.go`) walks the FS for build files (`pom.xml`, `package.json`, `go.mod`, `Cargo.toml`, `pyproject.toml`, `setup.py`, `Gemfile`, `composer.json`, `Package.swift`, `mix.exs`, `pubspec.yaml`, `stack.yaml`, `build.zig`, `dune-project`, `DESCRIPTION`, `BUILD`, `BUILD.bazel`, plus `.csproj`/`.fsproj`/`.vbproj`/`.gemspec`/`.cabal`/`.nimble` suffixes). It emits one `SERVICE` node per module plus `CONTAINS` edges to its child nodes. IDs are path-qualified (`service:<dir>:<name>`).
- Kuzu BulkLoad (`internal/graph/bulk.go`):
  - Open Kuzu writable with `BufferPoolBytes` capped at 2 GiB (override `--max-buffer-pool=N`).
  - Apply the schema (idempotent: a single `CodeNode` table + 28 REL tables).
  - Write CSV staging files with a `csv.Writer` whose `Comma` is `'|'`.
  - `COPY <table> FROM '<csv>' (header=false, DELIM='|', QUOTE='"', ESCAPE='"')`: explicit QUOTE + ESCAPE so Kuzu honors Go's RFC-4180 quoting.
  - Batches of 50,000 rows (override via the `CODEIQ_BULK_BATCH_SIZE` env var).
- FTS (`internal/graph/indexes.go`):
  - `INSTALL fts; LOAD EXTENSION fts;`
  - `CALL DROP_FTS_INDEX('CodeNode', '<name>');` (idempotent)
  - `CALL CREATE_FTS_INDEX('CodeNode', 'code_node_label_fts', ['label', 'fqn_lower']);`
  - `CALL CREATE_FTS_INDEX('CodeNode', 'code_node_lexical_fts', ['prop_lex_comment', 'prop_lex_config_keys']);`
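Name-based linking, as the `TopicLinker` step describes, can be sketched as a join on the topic string. `TopicRef`, `Link`, `linkTopics`, and the node IDs in the demo are hypothetical. The real linker's edge kind and direction are not given in the text, so producer → consumer is an assumption here.

```go
package main

import (
	"fmt"
	"sort"
)

// TopicRef is a hypothetical record of one produces/consumes emission.
type TopicRef struct {
	NodeID string
	Topic  string
	Role   string // "produces" or "consumes"
}

// Link is a hypothetical cross-file edge from a producer to a consumer.
type Link struct {
	From, To, Topic string
}

// linkTopics pairs every producer with every consumer of the same topic name,
// regardless of which file either emission came from.
func linkTopics(refs []TopicRef) []Link {
	producers := map[string][]string{}
	for _, r := range refs {
		if r.Role == "produces" {
			producers[r.Topic] = append(producers[r.Topic], r.NodeID)
		}
	}
	var links []Link
	for _, r := range refs {
		if r.Role != "consumes" {
			continue
		}
		for _, p := range producers[r.Topic] {
			links = append(links, Link{From: p, To: r.NodeID, Topic: r.Topic})
		}
	}
	// Sort for deterministic output, matching the snapshot's determinism goal.
	sort.Slice(links, func(i, j int) bool {
		a, b := links[i], links[j]
		if a.From != b.From {
			return a.From < b.From
		}
		if a.To != b.To {
			return a.To < b.To
		}
		return a.Topic < b.Topic
	})
	return links
}

func main() {
	refs := []TopicRef{
		{NodeID: "orders/Publisher.java#send", Topic: "users.created", Role: "produces"},
		{NodeID: "mail/Listener.py#on_user", Topic: "users.created", Role: "consumes"},
		{NodeID: "billing/Listener.go#handle", Topic: "invoices.paid", Role: "consumes"},
	}
	fmt.Println(linkTopics(refs))
	// [{orders/Publisher.java#send mail/Listener.py#on_user users.created}]
}
```

A consumer with no matching producer (`invoices.paid` above) simply produces no edge. Exact string equality is also why topic names must be normalized consistently across detectors.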
Tunable knobs (CLI flags on `enrich`):
- `--memprofile=<path>`: writes a Go heap profile (`pprof.WriteHeapProfile`). Analyze with `go tool pprof -top -inuse_space <path>`.
- `--max-buffer-pool=N`: Kuzu `BufferPoolSize` override (bytes). Default 2 GiB.
- `--copy-threads=N`: Kuzu `MaxNumThreads`. Default `min(4, GOMAXPROCS)`.
Failure modes:
- Duplicate primary key on COPY: historically hit on `service:<name>` collisions across modules. Fixed by path-qualified IDs (#151). Symptom: `Copy exception: Found duplicated primary key value service:checkout`.
- CSV "expected N values per row, but got more": caused by JSON property values containing commas (#150 added the pipe delimiter) or pipes (#153 added explicit `QUOTE`/`ESCAPE`). All known instances are fixed.
- TOML quoted-key emission: `"check_sha" = ...` made it through with literal quotes in node IDs, breaking edge PK lookup. Fixed in `parseTOML` via `unquote()` on the key (#152).
- OOM: the Phase A+B+C fix has landed (parse once per file, bounded extractor pool, 2 GiB BufferPool cap, and `Snapshot()` nils the dedup maps). Verified at ~/projects/-scale (49k files): peak RSS 1.8–2.2 GiB.
- FTS extension missing: Kuzu 0.11.3+ bundles it, so `INSTALL fts` is a no-op there. Pre-0.11.3 graphs fall through to `CONTAINS` predicates via the fallback path.
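The pipe delimiter plus explicit `QUOTE`/`ESCAPE` fixes the "expected N values per row" errors for a simple reason. Go's `encoding/csv` quotes a field only when it contains the delimiter, a double quote, or a line break, or begins with whitespace, and it escapes embedded quotes by doubling them. With `'|'` as the delimiter, commas in JSON no longer split fields, and `ESCAPE='"'` tells the reader that `""` is a literal quote. `stageRows` and `copyStatement` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// stageRows renders rows as pipe-delimited CSV. With Comma = '|', encoding/csv
// quotes a field only if it contains '|', '"', '\r' or '\n' (or starts with
// whitespace); embedded quotes are doubled. A bare comma is left unquoted.
func stageRows(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = '|'
	if err := w.WriteAll(rows); err != nil { // WriteAll also flushes
		return "", err
	}
	return buf.String(), nil
}

// copyStatement builds the COPY clause quoted in the text. ESCAPE='"' matches
// the doubled-quote escaping that encoding/csv produces.
func copyStatement(table, csvPath string) string {
	return fmt.Sprintf(`COPY %s FROM '%s' (header=false, DELIM='|', QUOTE='"', ESCAPE='"')`, table, csvPath)
}

func main() {
	out, err := stageRows([][]string{
		{"n1", `{"tags":"a,b"}`},
		{"n2", `{"cmd":"x|y"}`},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
	// n1|"{""tags"":""a,b""}"
	// n2|"{""cmd"":""x|y""}"
	fmt.Println(copyStatement("CodeNode", "/tmp/nodes.csv"))
}
```

Without `ESCAPE='"'`, a reader expecting backslash escapes would treat the first `""` inside a field as the end of the field. The rest of the row would then spill into extra columns, which is exactly the "got more" symptom.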
Entry point: `internal/cli/mcp.go` → `mcp.Server.Serve()`.
Steps:
- Open Kuzu read-only (`graph.OpenReadOnly(path, query_timeout)`). The mutation gate is active for every `s.Cypher(...)` call.
- Build `mcp.Deps` (store + intelligence + flow + review + max-results + max-depth caps).
- Register tools via 3 helper functions:
  - `RegisterGraphUserFacing(srv, d)` → `run_cypher` + `read_file`.
  - `RegisterFlow(srv, d)` → `generate_flow`.
  - `RegisterConsolidated(srv, d)` → 6 mode-driven tools + `review_changes`.
- Bind transport: `mcpsdk.StdioTransport{}` (the zero value binds `os.Stdin`/`os.Stdout`).
- `Server.Serve(ctx, transport)` blocks until stdin closes or the context is cancelled.
Tool list (10 user-facing):
| Tool | Modes / params |
|---|---|
| `graph_summary` | overview / categories / capabilities / provenance |
| `find_in_graph` | nodes / edges / text / fuzzy / by_file / by_endpoint |
| `inspect_node` | neighbors / ego / evidence / source |
| `trace_relationships` | callers / consumers / producers / dependencies / dependents / shortest_path |
| `analyze_impact` | blast_radius / trace / cycles / circular_deps / dead_code / dead_services / bottlenecks |
| `topology_view` | summary / service / service_deps / service_dependents / flow |
| `run_cypher` | Escape hatch: read-only Cypher. `CALL QUERY_FTS_INDEX` is allow-listed. |
| `read_file` | Read source file content, path-sandboxed to the indexed root. Full file or line range. |
| `generate_flow` | Architecture-flow diagrams. Views: overview / ci / deploy / runtime / auth. Formats: json / mermaid / dot / yaml. |
| `review_changes` | LLM-driven git-diff review via Ollama. Reads the graph and shells out to `git`; never writes to `.codeiq/`. |
Key files:
- `internal/mcp/server.go`: `Server`, `Registry`, `Serve()`
- `internal/mcp/tool.go`: `Tool` struct + `asSDKTool` conversion (special-cases string returns for `generate_flow`)
- `internal/mcp/tools_consolidated.go`: 6 mode-driven tools
- `internal/mcp/tools_graph.go`: narrow tool builders (Go-API delegation targets) + `run_cypher` + `read_file`
- `internal/graph/mutation.go`: `MutationKeyword` regex gate
Failure modes:
- `run_cypher` blocked: the query contains CREATE/DELETE/SET/REMOVE/MERGE/DROP/FOREACH/LOAD CSV/COPY/DETACH or a non-allow-listed CALL. Surfaced as a regular tool-call error that names the blocked keyword.
- Cypher binder error: Kuzu surfaces "Variable n is not in scope" or "Parameter X not found in EXISTS subquery" for known binder limitations. The query layer codes around these (e.g. `properties(nodes(p), 'id')` instead of a list comprehension).
- Path traversal in `read_file`: reads are sandboxed to the indexed root, so a `../` path that resolves outside the root returns an error envelope.
- MCP arg-name mismatches: historically the 6 consolidated tools delegated with wrong arg names (fixed in PR #149). Parity tests in `internal/mcp/tools_consolidated_parity_test.go` lock the names down.
Entry point: `internal/cli/review.go` → `review.NewService(...).Review(ctx, ...)`.
Steps:
- Shell out to `git diff <base>..<head>` for the diff.
- Parse the diff into hunks (`internal/review/diff.go`, Inference: based on the filename).
- For each touched file path, query Kuzu for evidence:
  - Nodes defined in the file
  - Inbound semantic edges to those nodes (callers, depends-on)
- Build the LLM prompt: diff + evidence + review-style guidance.
- POST to Ollama `/v1/chat/completions` (OpenAI-compatible). Default base URL `http://localhost:11434`. If `OLLAMA_API_KEY` is set, switch to Ollama Cloud.
- Parse the response into structured review JSON, or render it as Markdown if `--format markdown` is passed.
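Hunk parsing for the review step could look like the sketch below. It reads unified-diff `@@ -a,b +c,d @@` headers and attributes each one to the preceding `+++ b/<path>` line. `Hunk`, `count`, and `parseHunks` are hypothetical. Renames, binary diffs, and added lines whose content itself starts with `++ ` are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Hunk is a hypothetical record of one unified-diff hunk header.
type Hunk struct {
	File               string
	OldStart, OldLines int
	NewStart, NewLines int
}

var hunkHeader = regexp.MustCompile(`^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@`)

// count parses an optional line count; unified diff omits it when it is 1.
func count(s string) int {
	if s == "" {
		return 1
	}
	n, _ := strconv.Atoi(s)
	return n
}

// parseHunks returns one Hunk per @@ header, attributed to the path on the
// most recent "+++ b/<path>" line.
func parseHunks(diff string) []Hunk {
	var hunks []Hunk
	file := ""
	for _, line := range strings.Split(diff, "\n") {
		if strings.HasPrefix(line, "+++ ") {
			file = strings.TrimPrefix(strings.TrimPrefix(line, "+++ "), "b/")
			continue
		}
		m := hunkHeader.FindStringSubmatch(line)
		if m == nil {
			continue // context, added, or removed line
		}
		oldStart, _ := strconv.Atoi(m[1])
		newStart, _ := strconv.Atoi(m[3])
		hunks = append(hunks, Hunk{File: file, OldStart: oldStart, OldLines: count(m[2]), NewStart: newStart, NewLines: count(m[4])})
	}
	return hunks
}

func main() {
	diff := "diff --git a/x.go b/x.go\n--- a/x.go\n+++ b/x.go\n" +
		"@@ -10,3 +10,4 @@ func f() {\n ctx\n+added\n" +
		"@@ -40 +41 @@\n-old\n+new\n"
	for _, h := range parseHunks(diff) {
		fmt.Printf("%+v\n", h)
	}
	// {File:x.go OldStart:10 OldLines:3 NewStart:10 NewLines:4}
	// {File:x.go OldStart:40 OldLines:1 NewStart:41 NewLines:1}
}
```

The `File` field is what the evidence step would key its Kuzu lookups on ("nodes defined in the file").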
Key files:
- `internal/review/client.go`: Inference: HTTP client wrapping `/v1/chat/completions`
- `internal/review/service.go`: Inference: orchestration glue
- `internal/review/graphctx.go`: Kuzu queries for change-context evidence
Failure modes:
- No Ollama running: connection refused on `localhost:11434`. The client fails fast with a clear error rather than hanging.
- Model unavailable: the Ollama API returns 404 for an unknown model, which is surfaced as a clean error.
- HTTP/2 SETTINGS infinite-loop CVE: the Go 1.25.10 toolchain pin includes the fix for GO-2026-4918, which is reachable via `review.Client.Review` (per a comment in `.github/workflows/go-ci.yml`).
- Stale graph evidence: if the diff touches files that haven't been re-indexed, the evidence is partial. The review still runs; judging its quality is the operator's responsibility.
There is no centralized error-handling module. Conventions:
| Layer | Pattern |
|---|---|
| CLI subcommands | Return `error` from `RunE`. Cobra prints it + sets the exit code (1 for usage errors, 2 for runtime errors). |
| Detector | `Detect(ctx) *Result`, nil-tolerant. Detectors return `EmptyResult()` on no match and never panic on malformed input. |
| Graph layer | Every `s.Cypher(...)` returns `(rows, error)`. Mutation-gate rejections surface as `graph: write query rejected on read-only store (blocked keyword: X)`. |
| MCP tool handler | Catches errors and wraps them in `NewErrorEnvelope(code, err, RequestID(ctx))` so the MCP protocol surface stays well-formed. |
| Logging | `fmt.Fprintln(os.Stderr, ...)` with verbosity controlled by the root `-v` flag. No structured-logging library. Inference: kept minimal to stay supply-chain-clean. |
codeiq does not run background jobs. Every action is operator-driven (`codeiq <cmd>`). The CI perf-gate is the closest thing to a scheduled job: it runs `index` + `enrich` against `testdata/fixture-multi-lang` on every PR.