|
| 1 | +# ADR-080: npx ruvector Deep Capability Audit |
| 2 | + |
| 3 | +**Status:** Accepted |
| 4 | +**Date:** 2026-03-03 |
| 5 | +**Author:** ruvnet |
| 6 | + |
| 7 | +## Context |
| 8 | + |
| 9 | +The `ruvector` npm package (v0.2.5) is the primary CLI and MCP entry point for the ruvector ecosystem, providing `npx ruvector` access to vector database operations, self-learning hooks, brain AGI subsystems, edge compute, and 91+ MCP tools. This ADR documents a comprehensive audit of all capabilities, coverage gaps, and security findings. |
| 10 | + |
| 11 | +## Package Overview |
| 12 | + |
| 13 | +| Field | Value | |
| 14 | +|-------|-------| |
| 15 | +| **Package** | `ruvector` on npm | |
| 16 | +| **Version** | 0.2.5 | |
| 17 | +| **CLI entry** | `bin/cli.js` (8,911 lines) | |
| 18 | +| **MCP entry** | `bin/mcp-server.js` (~3,816 lines) | |
| 19 | +| **Node.js** | >=18.0.0 | |
| 20 | +| **Dependencies** | 8 required, 1 optional, 3 peer (optional) | |
| 21 | +| **Published files** | `bin/`, `dist/`, `README.md`, `LICENSE` | |
| 22 | + |
| 23 | +## CLI Inventory |
| 24 | + |
| 25 | +### Summary |
| 26 | + |
| 27 | +- **Total commands**: ~179 registered, ~145 unique |
| 28 | +- **Command groups**: 15 main groups + standalone commands |
| 29 | +- **Lazy-loaded modules**: GNN, Attention, ora, ruvector core, pi-brain, ruvllm |
| 30 | +- **Startup time**: ~55ms (lazy loading optimization) |
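
The lazy-loading optimization noted in the summary above can be illustrated with a small memoizing wrapper. This is a minimal sketch, not the code in `bin/cli.js`: the helper name `lazy` is hypothetical, and the built-in `zlib` module stands in for a heavy optional package such as the GNN or Attention modules.

```javascript
// Hypothetical helper: defers a require() until the first call, then caches the result.
function lazy(loader) {
  let cached;
  let loaded = false;
  return function get() {
    if (!loaded) {
      cached = loader();
      loaded = true;
    }
    return cached;
  };
}

// A built-in module stands in for a heavy optional dependency here.
let loadCount = 0;
const getZlib = lazy(() => {
  loadCount += 1;
  return require('zlib');
});

// Nothing is loaded at startup; the cost is paid only by commands that call getZlib().
function compress(text) {
  return getZlib().gzipSync(Buffer.from(text));
}
```

With this shape, commands that never touch the wrapped module do not pay its load time, which is presumably what keeps startup near the reported ~55ms.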
| 31 | + |
| 32 | +### Command Groups (15) |
| 33 | + |
| 34 | +| Group | Subcommands | Description | |
| 35 | +|-------|-------------|-------------| |
| 36 | +| **hooks** | 55 | Self-learning intelligence hooks — routing, memory, trajectories, AST, diff, coverage, compression, learning algorithms | |
| 37 | +| **brain** | 22 | Shared intelligence — search, share, vote, sync, AGI subsystems (SONA, GWT, temporal, meta-learning, midstream) | |
| 38 | +| **workers** | 14 | Background analysis — dispatch, presets, phases, custom workers | |
| 39 | +| **rvf** | 11 | RuVector Format — create, ingest, query, derive, segments, examples, download | |
| 40 | +| **sona** | 6 | SONA adaptive learning — status, patterns, train, export | |
| 41 | +| **embed** | 5 | Embeddings — text, adaptive LoRA, ONNX, neural, benchmark | |
| 42 | +| **attention** | 5 | Attention mechanisms — compute, benchmark, hyperbolic, list | |
| 43 | +| **edge** | 5 | Distributed P2P compute — status, join, balance, tasks, dashboard | |
| 44 | +| **native** | 4 | Native ONNX/VectorDB workers — run, benchmark, list, compare | |
| 45 | +| **mcp** | 4 | MCP server — start, info, tools, test | |
| 46 | +| **gnn** | 4 | Graph Neural Networks — layer, compress, search, info | |
| 47 | +| **identity** | 4 | Pi key management — generate, show, export, import | |
| 48 | +| **llm** | 4 | LLM embeddings/inference via ruvllm | |
| 49 | +| **midstream** | 4 | Real-time streaming — status, attractor, scheduler, benchmark | |
| 50 | +| **route** | 3 | Semantic routing — classify, benchmark, info | |
| 51 | + |
| 52 | +### Standalone Commands (15) |
| 53 | + |
| 54 | +`create`, `insert`, `search`, `stats`, `benchmark`, `info`, `install`, `graph`, `router`, `server`, `cluster`, `export`, `import`, `doctor`, `setup` |
| 55 | + |
| 56 | +### Stub/Coming-Soon Commands (4) |
| 57 | + |
| 58 | +| Command | Status | Note | |
| 59 | +|---------|--------|------| |
| 60 | +| `router` | Coming Soon | npm package in development | |
| 61 | +| `server` | Coming Soon | HTTP/gRPC server planned | |
| 62 | +| `cluster` | Coming Soon | Distributed cluster planned | |
| 63 | +| `graph` | Requires @ruvector/graph-node | Optional package not installed by default | |
| 64 | + |
| 65 | +### External API Commands |
| 66 | + |
| 67 | +| Commands | Service | URL | |
| 68 | +|----------|---------|-----| |
| 69 | +| `brain *` (16 commands) | pi.ruv.io | `https://pi.ruv.io` | |
| 70 | +| `brain agi *` (6 commands) | pi.ruv.io AGI endpoints | `/v1/sona`, `/v1/temporal`, `/v1/explore`, `/v1/midstream` | |
| 71 | +| `edge *` (5 commands) | Edge genesis node | Cloud Run endpoint | |
| 72 | +| `midstream attractor` | pi.ruv.io | `/v1/midstream` | |
| 73 | +| `rvf download` | GCS + GitHub | Storage + raw GitHub | |
| 74 | + |
| 75 | +## MCP Server Inventory |
| 76 | + |
| 77 | +### Summary |
| 78 | + |
| 79 | +- **Total tools**: 91 (base) + 12 (AGI/midstream) = 103 registered inputSchemas |
| 80 | +- **Transport modes**: stdio (default), SSE (HTTP) |
| 81 | +- **Version**: 0.2.5 (hardcoded in 2 locations) |
| 82 | + |
| 83 | +### Tool Groups (9) |
| 84 | + |
| 85 | +| Group | Tools | Description | |
| 86 | +|-------|-------|-------------| |
| 87 | +| **hooks** | 49 | Intelligence, memory, routing, learning, compression, AST, diff, coverage, security, RAG | |
| 88 | +| **workers** | 12 | Background analysis dispatch, presets, phases, custom workers | |
| 89 | +| **rvf** | 10 | Vector store CRUD, compact, derive, segments, examples | |
| 90 | +| **brain** | 11 | Shared knowledge search, share, vote, sync, partition, transfer | |
| 91 | +| **brain_agi** | 6 | AGI diagnostics — SONA, temporal, explore, midstream, flags | |
| 92 | +| **midstream** | 6 | Real-time analysis — status, attractor, scheduler, benchmark, search, health | |
| 93 | +| **edge** | 4 | Distributed compute — status, join, balance, tasks | |
| 94 | +| **rvlite** | 3 | SQL/Cypher/SPARQL query engines over vector data | |
| 95 | +| **identity** | 2 | Pi key generation and display | |
| 96 | + |
| 97 | +### Stub Tools (~6 of 91, ~7%) |
| 98 | + |
| 99 | +These tools return hardcoded fallback data when their backing packages are unavailable: `hooks_attention_info`, `hooks_gnn_info`, `workers_triggers`, `workers_presets`, `workers_phases`. The Brain AGI tools require the external pi.ruv.io service. |
| 100 | + |
| 101 | +### Functional Tools (~85 of 91, ~93%) |
| 102 | + |
| 103 | +All hooks intelligence, RVF CRUD, brain services, edge network, identity crypto, worker dispatch, and query engine tools have real implementations. |
| 104 | + |
| 105 | +## Security Findings |
| 106 | + |
| 107 | +### Strong Defenses |
| 108 | + |
| 109 | +| Defense | Coverage | |
| 110 | +|---------|----------| |
| 111 | +| **Path validation** (`validateRvfPath()`) | All RVF tools — null byte check, realpath resolution, CWD confinement, blocked system paths | |
| 112 | +| **Shell sanitization** (`sanitizeShellArg()`) | All hooks/workers using execSync — removes metacharacters, backticks, `$()`, pipes, semicolons | |
| 113 | +| **Numeric validation** (`sanitizeNumericArg()`) | Hooks/workers with numeric args — parseInt with NaN fallback | |
| 114 | +| **Null byte defense** | Both path and shell sanitizers strip `\0` | |
| 115 | +| **Chalk ESM fix** | Consistent `_chalk.default \|\| _chalk` pattern at lines 7-8 | |
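
The path-validation row above lists a null byte check, realpath resolution, and CWD confinement. The sketch below shows one way those three checks can fit together. It is an assumption-laden illustration: `confinePath` is a hypothetical stand-in, the real `validateRvfPath()` may differ, and the blocked-system-paths check is omitted.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical stand-in for validateRvfPath(); not the audited implementation.
function confinePath(userPath, baseDir = process.cwd()) {
  // 1. Reject non-strings and embedded null bytes.
  if (typeof userPath !== 'string' || userPath.includes('\0')) {
    throw new Error('Invalid path');
  }
  const base = fs.realpathSync(baseDir);
  const resolved = path.resolve(base, userPath);

  // 2. Resolve symlinks on the deepest existing ancestor, so paths to
  //    files that do not exist yet can still be checked.
  let existing = resolved;
  while (!fs.existsSync(existing)) {
    existing = path.dirname(existing);
  }
  const real = path.join(fs.realpathSync(existing), path.relative(existing, resolved));

  // 3. Confine the result to the base directory.
  const rel = path.relative(base, real);
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error('Path escapes base directory');
  }
  return real;
}
```

Dangling symlinks are not handled specially in this sketch, so a production validator would need an additional `lstat`-based check.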
| 116 | + |
| 117 | +### Concerns (10 findings) |
| 118 | + |
| 119 | +| # | Finding | Severity | Location | |
| 120 | +|---|---------|----------|----------| |
| 121 | +| 1 | execSync with shell invocation despite sanitization | Medium | hooks_init, hooks_pretrain, analysis tools | |
| 122 | +| 2 | Intelligence data load/save paths not validated by `validateRvfPath()` | Medium | mcp-server.js lines 171-191 | |
| 123 | +| 3 | No fetch timeout on brain/edge/midstream API calls | Medium | Brain, edge, midstream tools (requests can hang indefinitely; DoS risk) | |
| 124 | +| 4 | No rate limiting on external API calls | Medium | Brain, edge, midstream tools | |
| 125 | +| 5 | Environment variable values used unsanitized in fetch/crypto | Medium | BRAIN_URL, PI, EDGE_GENESIS_URL | |
| 126 | +| 6 | Pi key prefix logged in responses | High | identity_show, mcp-server.js line 3555 | |
| 127 | +| 7 | No limits on vector dimensions or query result sizes | Medium | rvf_create, rvf_query, rvlite_sql | |
| 128 | +| 8 | 51% of MCP tools lack input validation | Medium | hooks_remember, hooks_recall, brain tools | |
| 129 | +| 9 | workers_dispatch returns `success: true` on error | Low | mcp-server.js line 2730 | |
| 130 | +| 10 | Inconsistent `isError` flag usage across tools | Low | Error response formatting | |
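
Findings 9 and 10 both concern how tool failures are reported. One possible remedy is to route every handler through a single pair of result builders, sketched below. The helper names (`toolOk`, `toolError`, `safeHandler`) are hypothetical and do not come from `mcp-server.js`; the result shape follows the MCP convention of a `content` array plus an `isError` flag.

```javascript
// Hypothetical helpers; illustrative only.
function toolOk(payload) {
  return {
    content: [{ type: 'text', text: JSON.stringify({ success: true, ...payload }) }],
  };
}

function toolError(err) {
  const message = err instanceof Error ? err.message : String(err);
  return {
    content: [{ type: 'text', text: JSON.stringify({ success: false, error: message }) }],
    isError: true,
  };
}

// Wraps a tool handler so a thrown error can never be reported as success.
function safeHandler(handler) {
  return async (args) => {
    try {
      return toolOk(await handler(args));
    } catch (err) {
      return toolError(err);
    }
  };
}
```

Centralizing the two shapes means `success` and `isError` cannot disagree, which is the inconsistency the table describes.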
| 131 | + |
| 132 | +## Test Coverage Analysis |
| 133 | + |
| 134 | +### Test Suite |
| 135 | + |
| 136 | +| File | Tests | Quality | |
| 137 | +|------|-------|---------| |
| 138 | +| `test/cli-commands.js` | 63 active + 6 dynamic | Mixed — many help-only | |
| 139 | +| `test/integration.js` | 6 test groups | Good — module, types, structure | |
| 140 | +| `test/benchmark-cli.js` | 7 benchmark commands | Good — latency + lazy loading | |
| 141 | + |
| 142 | +### Coverage Matrix |
| 143 | + |
| 144 | +| Capability | CLI Test | Integration Test | Benchmark | |
| 145 | +|-----------|----------|-----------------|-----------| |
| 146 | +| create/insert/search/stats | **None** | **None** | **None** | |
| 147 | +| GNN operations | Help only | No | No | |
| 148 | +| Attention operations | Help only | No | No | |
| 149 | +| Hooks routing/memory | Basic | No | No | |
| 150 | +| Brain AGI commands | Help only | No | No | |
| 151 | +| Midstream commands | Help only | No | No | |
| 152 | +| Module loading | No | Yes | No | |
| 153 | +| Type definitions | No | Yes | No | |
| 154 | +| MCP tool count | No | Yes (103) | No | |
| 155 | +| CLI startup latency | No | No | Yes (<100ms budget) | |
| 156 | +| Lazy loading overhead | No | No | Yes | |
| 157 | + |
| 158 | +### Critical Gaps |
| 159 | + |
| 160 | +1. **No functional database tests** — `create`, `insert`, `search`, `stats` are the primary documented use case but have zero test coverage |
| 161 | +2. **Performance claims unvalidated** — "sub-millisecond queries", "52,000 inserts/sec", "150x HNSW speedup" have no benchmarks |
| 162 | +3. **MCP tool functionality untested** — only tool count validated, not individual tool behavior |
| 163 | +4. **Brain AGI connectivity untested** — commands only tested for `--help` output |
| 164 | + |
| 165 | +## Code Quality |
| 166 | + |
| 167 | +### Strengths |
| 168 | + |
| 169 | +- Well-organized 15-group command hierarchy |
| 170 | +- Consistent lazy-loading pattern (GNN, Attention, ora, ruvector core) |
| 171 | +- Graceful degradation when optional packages missing |
| 172 | +- Version sourced from package.json (not hardcoded in cli.js) |
| 173 | +- Comprehensive hooks system (55 subcommands covering full dev lifecycle) |
| 174 | +- RVF path validation is thorough |
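
The graceful-degradation strength listed above can be sketched as a guarded `require`. This is an illustration under assumptions: `tryRequire` and `describeGraphSupport` are hypothetical names, and the returned message is invented for the example rather than copied from the CLI.

```javascript
// Hypothetical helper: returns null when an optional package is not installed,
// but rethrows any other failure so real bugs are not hidden.
function tryRequire(name) {
  try {
    return require(name);
  } catch (err) {
    if (err && err.code === 'MODULE_NOT_FOUND') {
      return null;
    }
    throw err;
  }
}

// Reports availability instead of crashing when the optional module is absent.
function describeGraphSupport(graphModule) {
  if (graphModule === null) {
    return { available: false, hint: 'Install @ruvector/graph-node to enable graph commands' };
  }
  return { available: true };
}

const graphSupport = describeGraphSupport(tryRequire('@ruvector/graph-node'));
```

Rethrowing everything except `MODULE_NOT_FOUND` avoids the broad error suppression that the Issues table flags in the brain/edge catch blocks.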
| 175 | + |
| 176 | +### Issues |
| 177 | + |
| 178 | +| # | Issue | Severity | Location | |
| 179 | +|---|-------|----------|----------| |
| 180 | +| 1 | Dead code in router command (unreachable block) | Low | cli.js line 1807 | |
| 181 | +| 2 | brain page/node actions return "not yet available" | Low | cli.js lines 8120-8180 | |
| 182 | +| 3 | Uninitialized variables in conditional blocks | Low | cli.js lines 4757, 4769 | |
| 183 | +| 4 | Error suppression in brain/edge catch blocks | Low | cli.js lines 7907-7908 | |
| 184 | + |
| 185 | +## Decision |
| 186 | + |
| 187 | +Document findings and prioritize fixes: |
| 188 | + |
| 189 | +### P0 — Security (address before next publish) |
| 190 | +- Add fetch timeout (30s) to all external API calls (brain, edge, midstream) |
| 191 | +- Stop logging Pi key prefix in identity_show responses |
| 192 | +- Add `validateRvfPath()` to intelligence data load/save paths |
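
The 30s fetch timeout from the first P0 item could look roughly like the wrapper below. It is a sketch, not the planned patch: `fetchWithTimeout` is a hypothetical name, and the `fetchImpl` parameter is an assumption added so the function can be exercised without network access. It relies on the global `fetch` and `AbortController` available in Node.js 18+.

```javascript
// Hypothetical wrapper around fetch that aborts after timeoutMs.
async function fetchWithTimeout(url, options = {}, timeoutMs = 30_000, fetchImpl = globalThis.fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } catch (err) {
    // Translate the abort into a clearer error; pass other failures through.
    if (controller.signal.aborted) {
      throw new Error(`Request to ${url} timed out after ${timeoutMs}ms`);
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

Note that the timer is cleared once the response headers arrive, so a slow response body is not covered by this sketch.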
| 193 | + |
| 194 | +### P1 — Test Coverage (next sprint) |
| 195 | +- Add functional tests for `create`, `insert`, `search`, `stats` commands |
| 196 | +- Add MCP tool functional tests (at least one per group) |
| 197 | +- Add connectivity test for brain AGI endpoints (mock or live) |
| 198 | + |
| 199 | +### P2 — Code Quality (backlog) |
| 200 | +- Remove dead code in router command |
| 201 | +- Add input validation to remaining 51% of MCP tools |
| 202 | +- Add resource limits (max dimensions, max result count) |
| 203 | +- Fix workers_dispatch error reporting |
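
The resource-limit item above (max dimensions, max result count) could be enforced with a small bounded-integer check at the start of each handler. Everything in this sketch is an assumption: the ceilings in `LIMITS`, the argument names `dimensions` and `k`, and the helper names are illustrative and not taken from `rvf_create` or `rvf_query`.

```javascript
// Illustrative ceilings; appropriate values depend on the deployment.
const LIMITS = { maxDimensions: 4096, maxResults: 1000 };

// Accepts integers (or integer-like strings) within [min, max]; rejects everything else.
function boundedInt(value, { min, max, name }) {
  const n = Number(value);
  if (!Number.isInteger(n) || n < min || n > max) {
    throw new RangeError(`${name} must be an integer between ${min} and ${max}`);
  }
  return n;
}

// Hypothetical argument validators for create- and query-style tools.
function validateCreateArgs(args) {
  return {
    dimensions: boundedInt(args.dimensions, { min: 1, max: LIMITS.maxDimensions, name: 'dimensions' }),
  };
}

function validateQueryArgs(args) {
  return {
    k: boundedInt(args.k ?? 10, { min: 1, max: LIMITS.maxResults, name: 'k' }),
  };
}
```

Rejecting out-of-range values outright, rather than silently clamping them, keeps the caller aware that a limit was hit.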
| 204 | + |
| 205 | +### P3 — Documentation (backlog) |
| 206 | +- Add performance benchmarks to validate README claims |
| 207 | +- Mark stub commands more clearly in README |
| 208 | +- Document external service dependencies and fallback behavior |
| 209 | + |
| 210 | +## Consequences |
| 211 | + |
| 212 | +- Full visibility into the npm package's surface area: ~145 unique CLI commands and 103 MCP tools (91 base + 12 AGI/midstream) |
| 213 | +- 10 security findings documented with severity and fix priority |
| 214 | +- Test coverage gaps identified — core database operations completely untested |
| 215 | +- Clear prioritized action plan for hardening before next publish |
| 216 | + |
| 217 | +## Appendix: Dependency Tree |
| 218 | + |
| 219 | +### Required |
| 220 | +``` |
| 221 | +@modelcontextprotocol/sdk ^1.0.0 |
| 222 | +@ruvector/attention ^0.1.3 |
| 223 | +@ruvector/core ^0.1.25 |
| 224 | +@ruvector/gnn ^0.1.22 |
| 225 | +@ruvector/sona ^0.1.4 |
| 226 | +chalk ^4.1.2 (CJS compat via .default || fallback) |
| 227 | +commander ^11.1.0 |
| 228 | +ora ^5.4.1 (lazy loaded) |
| 229 | +``` |
| 230 | + |
| 231 | +### Optional |
| 232 | +``` |
| 233 | +@ruvector/rvf ^0.1.0 |
| 234 | +``` |
| 235 | + |
| 236 | +### Peer (all optional) |
| 237 | +``` |
| 238 | +@ruvector/pi-brain >=0.1.0 (brain commands) |
| 239 | +@ruvector/ruvllm >=2.0.0 (llm commands) |
| 240 | +@ruvector/router >=0.1.0 (router command, not yet published) |
| 241 | +``` |
| 242 | + |
| 243 | +### External Services |
| 244 | +``` |
| 245 | +https://pi.ruv.io — Brain AGI, midstream (Cloud Run) |
| 246 | +edge-net-genesis (Cloud Run) — Edge compute network |
| 247 | +storage.googleapis.com — RVF examples |
| 248 | +raw.githubusercontent.com — RVF manifest fallback |
| 249 | +``` |