
test(life_capture): recall@k + MRR aggregator on retrieval eval#950

Open
jwalin-shah wants to merge 41 commits into tinyhumansai:main from jwalin-shah:followup/recall-eval-metrics

Conversation

@jwalin-shah
Contributor

@jwalin-shah jwalin-shah commented Apr 26, 2026

Summary

Stacks on #817. Adds graded relevance labels + recall@k/MRR aggregator on top of the eval scaffolding that #817 introduced.

  • Labels the 10 in-scope queries in tests/fixtures/life_capture/corpus.json with relevant: [{ext_id, grade}] arrays. (q-src-01 stays pending, q-neg-01 is purely negative-assertion.)
  • Adds score_query + print_metrics_summary to tests/life_capture_retrieval_eval.rs. Per-query + mean recall@1/3/5 + MRR, plus per-kind breakdown.
  • Existing assert_* smoke checks unchanged.

Baseline (from this run)

                R@1   R@3   R@5   MRR
MEAN           0.77  0.91  0.93  1.00
keyword        0.75  0.88  0.88  1.00
semantic       0.57  0.87  0.93  1.00
mixed          1.00  1.00  1.00  1.00

Numbers are deterministic (toy embedder is feature-hashing, no model). Now any retrieval change has to move them — regressions become a CI signal.
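
The aggregation itself is small. Below is a sketch of recall@k and MRR under the binary-relevance reading used here (grades are stored but ignored; any listed ext_id counts as relevant). The function names are illustrative, not the test's actual API:

```rust
// Hypothetical sketch — not the PR's score_query implementation.
// recall@k: fraction of the query's relevant items found in the top k.
fn recall_at_k(ranked: &[&str], relevant: &[&str], k: usize) -> f64 {
    if relevant.is_empty() {
        return 0.0; // the eval excludes such queries from aggregation
    }
    let hits = ranked
        .iter()
        .take(k)
        .filter(|id| relevant.contains(*id))
        .count();
    hits as f64 / relevant.len() as f64
}

// MRR: reciprocal rank of the first relevant hit (0.0 if none surfaces).
fn mrr(ranked: &[&str], relevant: &[&str]) -> f64 {
    ranked
        .iter()
        .position(|id| relevant.contains(id))
        .map(|rank0| 1.0 / (rank0 as f64 + 1.0))
        .unwrap_or(0.0)
}

fn main() {
    let ranked = ["a", "x", "b", "y", "z"];
    let relevant = ["a", "b"];
    assert!((recall_at_k(&ranked, &relevant, 1) - 0.5).abs() < 1e-9);
    assert!((recall_at_k(&ranked, &relevant, 3) - 1.0).abs() < 1e-9);
    assert!((mrr(&ranked, &relevant) - 1.0).abs() < 1e-9);
}
```

Because MRR only looks at the first relevant hit, a suite where every query surfaces at least one relevant item at rank 1 reports MRR 1.00 even while recall@1 lags, which matches the baseline table above.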

Note on diff size

This PR currently shows ~all of #817's commits because it's stacked. Once #817 merges to main, this PR collapses to just the 2 files actually changed here.

Test plan

  • cargo test --test life_capture_retrieval_eval -- --nocapture — 4 passed, metrics print
  • CI green

Summary by CodeRabbit

  • New Features

    • Added persistent memory management with read, add, replace, and remove operations.
    • Added personal data indexing and search with semantic search capabilities powered by embeddings.
    • Enhanced installer validation with smoke testing for asset resolution.
  • Bug Fixes

    • Improved installer asset resolver with better error handling and reachability verification.
  • Documentation

    • Added architectural specifications and implementation roadmaps for new features.
  • Tests

    • Expanded test coverage across new functionality and existing agent systems.

Drafts the wedge for OpenHuman: continuous ingestion of Gmail/Calendar/Slack/
iMessage into a local personal_index.db, with a Today brief and retrieval-
tuned chat as the two surfaces. Three composable privacy modes (Convenience /
Hybrid / Fully local) sharing one code path. Track 1 unblocks the ship
pipeline; Track 2 builds the spine over 4 weeks. Reuses ingestion logic from
~/projects/inbox.
Two implementation plans drafted from the life-capture spec:

- Track 1 (ship pipeline): fix Ubuntu installer smoke, land in-flight PRs
  tinyhumansai#806/tinyhumansai#786/tinyhumansai#788/tinyhumansai#797, wire Tauri auto-updater + signed Mac/Windows builds.
- Life-Capture #1 (foundation): SQLite + sqlite-vec personal_index.db,
  Embedder trait + HostedEmbedder (OpenAI), PII redaction, quoted-thread
  stripping, hybrid retrieval (vector + keyword + recency), controller
  schema + RPC. End-to-end test with synthetic items. No ingestion or UI
  yet — those are subsequent milestone plans.
Refactors scripts/install.sh to expose resolve_asset_url and
verify_asset_reachable. Adds scripts/test_install.sh that exercises
the resolver against a committed fixture latest.json. Failures now
report the resolved URL and the parse error instead of dying silently.
Adopts the Hermes pattern (NousResearch/hermes-agent, MIT) of an
agent-writable, char-bounded curated memory file pair. F15 builds the
char-bounded store with atomic writes; F16 wires a session-start
snapshot into the system prompt (preserves prefix cache); F17 exposes
memory.{add,replace,remove,read} through the controller dispatch so
both the agent loop and skills can use it.

Sits between TinyHumans synthesis (volatile) and personal_index.db
(raw events) — fills the "deliberately curated facts" gap that
neither covered.
Adds src/openhuman/life_capture/ module tree (stubs for embedder,
index, migrations, quote_strip, redact, rpc, schemas, types, plus
tests/) per Foundation Plan F1. Wires the module in src/openhuman/mod.rs.

Adds sqlite-vec to deps and httpmock as a dev-dep. Reuses existing
rusqlite/regex/once_cell/async-trait/tempfile already in tree
(Option C — avoids sqlx/rusqlite libsqlite3-sys conflict).
…egex

Previously the country-code group required at least one digit (\+?\d{1,3}),
so '(415) 555-0123' (no leading country code) wouldn't match. Wrapped the
whole prefix in a non-capturing optional group so parenthesized area codes
match without a leading country code.
… table

Loads sqlite_vec via sqlite3_auto_extension exactly once per process so every
new connection picks up the vec0 module. PersonalIndex wraps a single rusqlite
Connection in Arc<Mutex<_>> for async sharing; open() runs migrations and sets
WAL+foreign_keys, open_in_memory() is for tests.

vec_version() round-trip and a 1536-dim INSERT + MATCH query both verified.
…_id) dedupe

Adds IndexWriter with upsert (ON CONFLICT(source, external_id) DO UPDATE — FTS
stays in sync via triggers) and upsert_vector (DELETE + INSERT since vec0 has
no ON CONFLICT for virtual table primary keys). Both wrap rusqlite work in
spawn_blocking and share the PersonalIndex's Arc<Mutex<Connection>>.

Exposes ensure_vec_extension_registered() crate-wide so the migrations test
can register vec0 before opening its in-memory connection.
- keyword_search: FTS5 ranked by negated bm25 with snippet markers (« » …),
  plus the user:local ACL EXISTS clause so the same query shape works for the
  multi-token team v2 ACL without rewrites.
- vector_search: sqlite-vec MATCH with k = ? clause (vec0 KNN requirement)
  and ORDER BY distance; score = 1/(1 + distance) so callers can blend it
  with keyword bm25 on the same monotonic scale.

Shared ItemRow + into_hit() helper lets both queries reuse the same row
shape; rusqlite query_map closures hand-build it because rusqlite has no
sqlx FromRow equivalent.
Pulls oversampled candidates (k*3 min 20) from both keyword and vector legs,
normalises each independently to [0,1], then re-ranks with
0.55 * vec_norm + 0.35 * kw_norm + 0.10 * exp(-age_days/30).

Documents present in only one leg get 0 for the missing signal but still
compete on the others; only items with neither signal are dropped. Same-vector
twin items now break their tie by recency.
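
The fusion described above can be sketched as follows. The weights and the 30-day recency window come from the text; the helper names and candidate shape are illustrative assumptions:

```rust
use std::collections::{HashMap, HashSet};

// Min-max normalise one retrieval leg's scores to [0,1].
fn min_max_norm<'a>(scores: &HashMap<&'a str, f64>) -> HashMap<&'a str, f64> {
    let (min, max) = scores
        .values()
        .fold((f64::MAX, f64::MIN), |(lo, hi), &s| (lo.min(s), hi.max(s)));
    scores
        .iter()
        .map(|(&id, &s)| {
            let n = if max > min { (s - min) / (max - min) } else { 1.0 };
            (id, n)
        })
        .collect()
}

// Blend the two legs with a recency boost; items missing from one leg
// get 0 for that signal but still compete on the others.
fn fuse(
    vec_leg: &HashMap<&str, f64>,
    kw_leg: &HashMap<&str, f64>,
    age_days: &HashMap<&str, f64>,
) -> Vec<(String, f64)> {
    let v = min_max_norm(vec_leg);
    let k = min_max_norm(kw_leg);
    let mut out: Vec<(String, f64)> = v
        .keys()
        .chain(k.keys())
        .collect::<HashSet<_>>()
        .into_iter()
        .map(|&id| {
            let recency = (-age_days.get(id).copied().unwrap_or(365.0) / 30.0).exp();
            let score = 0.55 * v.get(id).copied().unwrap_or(0.0)
                + 0.35 * k.get(id).copied().unwrap_or(0.0)
                + 0.10 * recency;
            (id.to_string(), score)
        })
        .collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let vec_leg: HashMap<&str, f64> = [("a", 0.9), ("b", 0.1)].into_iter().collect();
    let kw_leg: HashMap<&str, f64> = [("b", 5.0)].into_iter().collect();
    let ages: HashMap<&str, f64> = [("a", 0.0), ("b", 0.0)].into_iter().collect();
    let ranked = fuse(&vec_leg, &kw_leg, &ages);
    assert_eq!(ranked[0].0, "a"); // vector-dominant weight wins the tie
}
```

Equal-age items with identical vectors get identical vector and keyword terms, so only a recency difference can separate them, which is the tie-break behaviour described above.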
…rch)

Adds two controllers exposed via the registry:
- life_capture.get_stats — total items, per-source counts, last-ingest ts.
- life_capture.search    — hybrid (vector + keyword + recency) search with
  embed-then-rank, optional k (default 10, capped at 100).

Runtime state (PersonalIndex + dyn Embedder) lives in life_capture::runtime
behind a tokio OnceCell because handlers are stateless fn(Map<String,Value>)
and have no per-call context object. F14 will call runtime::init() at app
startup; until then handlers return a structured 'not initialised' error so
the failure mode is loud, not silent.

Schemas registered in core/all.rs alongside cron.
… → search)

FakeEmbedder hashes input bytes into a deterministic sparse 1536-dim vector
so the same text round-trips identically through vec0 — keeps the e2e test
hermetic, no network call. Verifies that:
- quote-strip drops the 'On … wrote:' block before indexing,
- redact masks the email address before indexing,
- the upserted item resurfaces via hybrid_search with both transformations
  preserved on the returned text.
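
A minimal sketch of the quote-strip step, assuming a plain "On … wrote:" line marks the start of the quoted tail (the real quote_strip module is presumably more robust about header variants):

```rust
// Hypothetical sketch: truncate an email body at the first line that
// looks like a quoted-reply header ("On ... wrote:").
fn strip_quoted_thread(body: &str) -> String {
    let mut kept = Vec::new();
    for line in body.lines() {
        let t = line.trim_start();
        if t.starts_with("On ") && t.trim_end().ends_with("wrote:") {
            break; // everything from here down is the quoted thread
        }
        kept.push(line);
    }
    kept.join("\n").trim_end().to_string()
}

fn main() {
    let body = "Sounds good, see you then.\n\nOn Tue, Apr 21, Alice wrote:\n> old thread";
    assert_eq!(strip_quoted_thread(body), "Sounds good, see you then.");
}
```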
After Config::load_or_init succeeds, open the personal_index.db at
config.workspace_dir and register the life_capture runtime. Embedder is
env-gated:
  OPENHUMAN_EMBEDDINGS_KEY > OPENAI_API_KEY
  OPENHUMAN_EMBEDDINGS_URL (default: https://api.openai.com/v1)
  OPENHUMAN_EMBEDDINGS_MODEL (default: text-embedding-3-small)

If no key is set we still open the index file (so Plan #2's ingestion
worker can write to it) but skip runtime registration — controllers then
return the 'not initialised' error from runtime::get() instead of silently
calling a misconfigured embedder.

Config-schema integration deferred to a follow-up; env-driven keeps F14
non-invasive while we land the rest of the foundation.
…ection

snapshot_pair(memory, user) returns a MemorySnapshot containing plain
strings — no reference back to the stores — so taking a snapshot at session
start and reusing it across turns gives a stable system-prompt prefix and
lets the LLM prefix-cache hit on every subsequent turn.

Plan #1 also calls for wiring this into chat.rs' OpenClaw context loader,
but openhuman assembles agent prompts per-agent under
src/openhuman/agent/agents/*/prompt.rs (not the plan's stale
src-tauri/src/commands/chat.rs path); the prompt-builder integration is
deferred to Plan #2 alongside the agent-context refactor.
… controllers

Adds four controllers under the memory_curated namespace (the 'memory'
namespace is already owned by the long-term memory subsystem):
- memory_curated.read    — read MEMORY.md or USER.md
- memory_curated.add     — append entry, char-bounded
- memory_curated.replace — substring replace, char-bounded
- memory_curated.remove  — drop entries containing a needle

'file' input is a typed enum (memory|user) so adapters reject anything
else at validation time. Runtime state lives in curated_memory::runtime
behind a tokio OnceCell, mirroring life_capture's pattern: startup in
jsonrpc.rs constructs both stores at <workspace>/memories/ with Hermes'
char limits (memory: 2200, user: 1375) and registers the runtime.
Handlers return 'not initialised' until init runs, so failure is loud.
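
The char-bounded contract can be sketched as a reject-on-overflow append. The limits come from the text above (memory: 2200, user: 1375); the API shape is an illustrative assumption, and the real store additionally does atomic temp+rename writes:

```rust
// Hypothetical sketch of a char-bounded curated-memory store: writes that
// would exceed the limit are rejected rather than silently truncated.
struct BoundedStore {
    contents: String,
    char_limit: usize, // e.g. 2200 for MEMORY.md, 1375 for USER.md
}

impl BoundedStore {
    fn add(&mut self, entry: &str) -> Result<(), String> {
        let added = entry.chars().count() + 1; // +1 for the trailing newline
        if self.contents.chars().count() + added > self.char_limit {
            return Err(format!("entry would exceed {}-char limit", self.char_limit));
        }
        self.contents.push_str(entry);
        self.contents.push('\n');
        Ok(())
    }
}

fn main() {
    let mut store = BoundedStore { contents: String::new(), char_limit: 10 };
    assert!(store.add("abc").is_ok());       // 4 chars used
    assert!(store.add("abcdefgh").is_err()); // 4 + 9 > 10: rejected
}
```

Bounding by chars rather than entries is what lets the snapshot feed a fixed-size region of the system prompt without blowing the prefix budget.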
…ings

Findings from Codex + Gemini second-opinion review:

1. IndexWriter::upsert orphaned vectors on re-ingest (Codex). On
   (source, external_id) conflict the row's id was kept but the caller's
   fresh UUID was used for upsert_vector — the vector wrote under an id no
   row joined to. Fix: explicit SELECT-then-UPDATE-or-INSERT in the same
   transaction; mutates caller's Item.id to the canonical id (so the next
   upsert_vector lands on the right row) and orphan-deletes any vector
   already written under the wrong id. Signature change: upsert(&[Item])
   -> upsert(&mut [Item]).

2. upsert_vector DELETE+INSERT was outside a transaction (Gemini) — a
   failed INSERT permanently lost the item's vector. Now wrapped in tx.

3. Runtime over-gated get_stats on the embedder (Codex). Split runtime
   into separate INDEX + EMBEDDER OnceCells: get_stats only requires
   index, search requires both. Index initialises whenever the workspace
   dir is reachable; embedder is env-gated as before.

4. Startup race (both reviewers): runtime init lived inside the post-
   serve tokio::spawn block, so axum::serve was already accepting before
   the OnceCells were set. Hoisted both bootstraps (curated_memory and
   life_capture) into helpers called inline before axum::serve.

5. runtime::get error message lied (Codex) — said 'set embeddings.api_key
   in config' but startup actually reads env vars. Fixed text.

Bonus: rename controller namespace memory_curated -> curated_memory
(Codex preference; nothing depends on it yet so renamed before clients
do).

Adds regression test for #1 (reingest_with_fresh_uuid_keeps_vector_findable).
4696 lib tests pass.
…emory namespace rename

Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.
Critical:
- curated_memory/store.rs: propagate read errors instead of `unwrap_or_default`,
  which could rewrite MEMORY.md / USER.md from an empty baseline on transient
  I/O or permission failures. Reject empty needle in replace/remove.
- curated_memory/rpc.rs: belt-and-suspenders empty-needle guard at the RPC
  boundary (remove with "" deletes every entry).
- life_capture/index.rs: upsert_vector rejects orphan ids (items row missing),
  which would have inserted vectors that never join in vector_search.
- life_capture/embedder.rs: validate response length matches input, indices
  are contiguous 0..n, and every vector matches dim() — prevents silent
  misalignment or wrong-dimensional vectors flowing into the 1536-wide
  sqlite-vec table. Added 30s request + 10s connect timeouts.

Retrieval hardening:
- life_capture/index.rs: keyword_search now escapes FTS5 operators by
  tokenizing on whitespace and wrapping each token as a quoted literal.
  Prevents errors or unintended matches from stray quotes, AND/OR/NEAR,
  column specifiers in user input.
- life_capture/index.rs: hybrid_search applies q.sources / q.since / q.until
  via post-filter (single consistent pass across keyword+vector fusion).
- life_capture/rpc.rs: handle_search validates embedder dim and response
  length against the fixed 1536-wide index column — clear RPC error instead
  of a cryptic sqlite-vec failure when the embeddings model is swapped.
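
The FTS5 escaping strategy above can be sketched directly, using FTS5's rule that a double quote inside a string literal is escaped by doubling it:

```rust
// Sketch of the keyword_search sanitiser: split on whitespace and wrap
// each token as a quoted FTS5 string literal, so AND/OR/NEAR, stray
// quotes, and column: specifiers in user input are matched as plain text.
fn sanitize_fts5_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|tok| format!("\"{}\"", tok.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    assert_eq!(sanitize_fts5_query("AND title:x"), "\"AND\" \"title:x\"");
    // Empty output signals the caller to skip the MATCH entirely.
    assert_eq!(sanitize_fts5_query(""), "");
}
```

The empty-output case matters: a later commit on this branch skips keyword_search when the sanitised query is empty, avoiding SQLite's empty-MATCH error.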

Docs:
- docs/event-bus.md: remove erroneous `.await` on register_native_global
  (it is sync; the handler closure is async).
- life-capture design spec: soften PII redaction claim — only regex is
  implemented today; light NER is flagged as a future enhancement.
Rendered prompts now surface runtime curated-memory writes:

- `PromptContext` gains `curated_snapshot: Option<&MemorySnapshot>`.
- `UserFilesSection` prefers the snapshot over the workspace-file
  loader when one is attached, and injects `USER.md` alongside
  `MEMORY.md` using a byte-compatible `inject_snapshot_content` helper.
- `Session` carries `Option<Arc<MemorySnapshot>>`, populated by
  `ensure_curated_snapshot` on the first turn from
  `curated_memory::runtime::get()`. Reused across turns so prompt
  bytes stay frozen (KV-cache prefix contract) while mid-session
  `curated_memory.add/replace/remove` writes land on the NEXT session.
- `ParentExecutionContext` inherits the snapshot so sub-agents render
  identical `MEMORY.md`/`USER.md` blocks as the parent.
- Legacy workspace-file fallback preserved for embeds that don't
  initialise the curated-memory runtime (pure unit tests).
Loads a 32-item corpus across gmail/calendar/imessage/slack into an
in-memory PersonalIndex and runs 12 queries through keyword / vector /
hybrid paths, asserting must_contain / must_not_contain within per-query
top-K prefixes.

The vector leg uses a deterministic FNV-1a feature-hashing embedder
(1536 dims, L2-normalized) so the test is offline and reproducible; real
embedder swaps behind one call. Fixture reserves `relevant` and
`pending` fields so recall@k / MRR / nDCG bolt onto the same JSON later.

q-src-01 is marked pending: hybrid_search currently ignores
Query.sources/since/until. Flip to false once filtering lands.
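
A feature-hashing embedder of this shape can be sketched in a few lines; the exact tokenisation and signing details of the test's FNV-1a embedder may differ, but this shows why the output is deterministic and offline:

```rust
const DIM: usize = 1536;

// FNV-1a hash over a token's bytes (standard 64-bit offset/prime).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

// Hash each whitespace token into one of 1536 buckets, then L2-normalise,
// so identical text always round-trips to the identical sparse vector.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0f32; DIM];
    for tok in text.split_whitespace() {
        let h = fnv1a(tok.as_bytes());
        v[(h % DIM as u64) as usize] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

fn main() {
    let v = embed("meeting with alex about the roadmap");
    assert_eq!(v.len(), DIM);
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    assert!((norm - 1.0).abs() < 1e-4);
    assert_eq!(v, embed("meeting with alex about the roadmap")); // deterministic
}
```

L2-normalising keeps cosine-style distances in a stable range, so the eval's recall@k / MRR numbers stay byte-for-byte reproducible across CI runs.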
Previously every read grabbed the same `Arc<Mutex<Connection>>` as the
writer, so `IndexReader` calls serialised behind in-flight ingests even
though WAL would allow parallel readers. Split `PersonalIndex` into:

- `writer: Arc<Mutex<Connection>>` — unchanged single-writer model.
- `reader_pool: Option<Arc<r2d2::Pool<SqliteConnectionManager>>>` —
  four-connection read pool on file-backed indexes, built alongside the
  writer after migrations run. Each pooled connection gets
  `query_only=ON` as a belt-and-suspenders guard; sqlite-vec is loaded
  automatically via the process-wide auto-extension.

`IndexReader` routes `keyword_search` / `vector_search` through a new
`with_read_conn` helper that picks the pool when present and falls
back to the writer lock otherwise. In-memory handles keep the
single-connection layout — shared-cache URIs buy nothing at
test-fixture scale and would force every test to grow a unique name.
jwalinsshah and others added 11 commits April 23, 2026 01:11
…ndings

- Move bootstrap_curated_memory + bootstrap_life_capture out of
  src/core/jsonrpc.rs into their respective domain runtime modules
  (curated_memory::runtime::bootstrap, life_capture::runtime::bootstrap).
  jsonrpc.rs now calls the thin orchestration entrypoints instead of
  owning domain construction details, per CLAUDE.md controller-registry
  separation rule. (addresses @coderabbitai on src/core/jsonrpc.rs:670)

- Add tracing debug! at entry/exit of embed_batch in HostedEmbedder,
  logging text count, URL, model, and returned vector count.
  (addresses @coderabbitai on src/openhuman/life_capture/embedder.rs:68)

- Add tracing debug!/trace! to keyword_search, vector_search, and
  hybrid_search: entry logs query_len/k/vec_dim, exit logs hit count,
  and hybrid_search emits trace! per-hit fused score components (kw_rrf,
  vc_rrf, recency, final score) without logging raw query text.
  (addresses @coderabbitai on src/openhuman/life_capture/index.rs:624)

- Add tracing debug!/warn! to handle_get_stats and handle_search:
  entry/exit debug! with text_len, k, total_items, hit count; warn! on
  all error branches (dim mismatch, embed_batch failure, empty vec,
  hybrid_search failure, task panic).
  (addresses @coderabbitai on src/openhuman/life_capture/rpc.rs:117)
…CodeRabbit items

- src/core/all.rs: add namespace_description() match arms for
  life_capture and curated_memory namespaces (item 1)

- docs/superpowers/specs/2026-04-22-life-capture-layer-design.md:
  add 'text' language tag to fenced code block to fix MD040 (item 2)

- AGENTS.md: fix cargo test example to use -p openhuman_core (the
  actual library crate name) instead of -p openhuman (item 3)

- src/openhuman/life_capture/redact.rs: tighten PHONE regex to require
  a mandatory separator after the area code when no country code is
  present, preventing 8-digit invoice/order IDs from being redacted;
  add test asserting 8-digit IDs are NOT redacted (item 4)

- src/openhuman/life_capture/migrations/0001_init.sql: drop unused
  text_keyword column — FTS triggers only populate subject+text and
  nothing writes to text_keyword, so the column is dead weight (item 5)

- scripts/install.sh: move SOURCE_ONLY early-exit guard to immediately
  after the flag is parsed (before variable definitions, color setup,
  function definitions, and arg-parsing loop) so sourcing with
  --source-only is a true no-op with no side effects (item 8)

Items 6 (store.rs empty needle + error propagation), 7 (embedder
timeout + logging + validation), 9 (upsert_vector orphan check), 11
(keyword_search FTS injection), 12 (hybrid_search filters), and 13
(rpc.rs dimension check) were already addressed in prior commits.

https://claude.ai/code/session_01BD65GFgzQtMcdemXNeBjwv
… 10)

ItemRow::into_hit previously coerced malformed DB rows to silent
defaults (uuid::Uuid::nil(), Source::Gmail, chrono default timestamp),
masking data corruption. Change the return type to
anyhow::Result<Hit> and propagate errors:

- Unknown source string -> Err with context showing the bad value
- Malformed UUID in items.id -> Err with context
- Out-of-range unix timestamp -> Err with message

Both callsites (keyword_search, vector_search) now collect with
.collect::<anyhow::Result<Vec<_>>>()? so a single bad row surfaces
as an RPC error rather than a silently wrong Hit.

Items 9 (upsert_vector orphan guard), 11 (FTS injection via
fts5_quote), 12 (hybrid_search filter application), and 13
(rpc.rs dimension-mismatch check) were already implemented in
prior commits on this branch.

https://claude.ai/code/session_01BD65GFgzQtMcdemXNeBjwv
cargo fmt pedantry on life_capture/{index,redact}.rs (no behavior change).

Move install.sh --source-only early-exit back to *after* function
definitions. The earlier relocation (b0048c2) returned before
resolve_asset_url was defined, breaking scripts/test_install.sh on the
macOS smoke job with "command not found".
Transport file shouldn't own domain-specific construction details. Adds
core::all::bootstrap_controller_runtimes(ws) that holds the domain list
(curated_memory + life_capture today) and is called once from jsonrpc.rs
before axum::serve. Ordering invariant preserved.

Addresses CodeRabbit PR tinyhumansai#817 src/core/jsonrpc.rs:670.
Rust's `regex` crate doesn't support look-around; clippy's
`invalid_regex` deny-lint blocked the build.

Mandatory separators in the phone pattern already guard against
mid-number matches (e.g. 8-digit invoice IDs), and `redact()`
runs CC before PHONE so long digit runs are replaced first.
- curated_memory/store: seed new store from legacy root-level MEMORY.md on first
  upgrade, truncated to char_limit. Prevents user memory loss across the file
  layout change.
- life_capture/index: skip keyword_search early when sanitized query is empty
  to avoid SQLite empty-MATCH errors / accidental full scans.
- life_capture/rpc: wrap search response as {"hits": [...]} for forward-compat
  with paging/metadata fields.
- install.sh: verify asset URL is reachable (with DRY_RUN respected) before
  attempting download so failures surface earlier.
- test_install.sh: stop swallowing source-only error output.
…ation)

Resolves conflicts after main absorbed:
  - tinyhumansai#835 inline-test extraction to sibling _tests.rs files
  - tinyhumansai#803 memory namespaces (Memory trait gained `namespace` arg + `namespace_summaries`)
  - tinyhumansai#839 memory_tree retrieval RPC handlers
  - install.sh resolver / smoke test changes

Conflict resolutions:
  - src/openhuman/agent/prompts/mod.rs + mod_tests.rs:
      kept branch's snapshot_pair production code (UserFilesSection now prefers
      session-frozen MemorySnapshot over workspace files), then applied tinyhumansai#835's
      extraction by moving the inline mod tests block to mod_tests.rs sibling.
      Tests retain the `curated_snapshot: None` additions from 1976ba4.
  - src/openhuman/agent/harness/subagent_runner/ops.rs + ops_tests.rs: same
      pattern — kept branch's session-snapshot wiring, extracted inline tests
      to ops_tests.rs sibling.
  - src/core/all.rs: kept both webview_apis (main) and life_capture +
      curated_memory (branch) controller registrations.
  - scripts/install.sh: kept branch's clearer "Resolved URL was empty…" diag
      and the test_install.sh resolver-unit-test job in installer-smoke.yml.

Compatibility patches:
  - ops_tests.rs NoopMemory updated to new Memory trait (namespace arg on
    store/get/list/forget, RecallOpts on recall, added namespace_summaries).
  - help/prompt.rs test fixture gets curated_snapshot: None.

cargo build --tests passes locally on darwin-aarch64.
Adds graded relevance labels to the 10 in-scope queries in
`tests/fixtures/life_capture/corpus.json` (q-src-01 stays pending,
q-neg-01 has no relevant items by design) and computes recall@1/3/5
+ MRR per query, mean over the suite, and per-kind breakdowns.

Output runs under `--nocapture`:

    [retrieval_eval] per-query metrics (10q):
      id           kind        R@1    R@3    R@5    MRR
      q-kw-01      keyword    0.50   0.50   0.50   1.00
      ...
      MEAN                    0.77   0.91   0.93   1.00
      mean-keyword            0.75   0.88   0.88   1.00
      mean-semantic           0.57   0.87   0.93   1.00
      mean-mixed              1.00   1.00   1.00   1.00

This gives us a number that any retrieval change has to move. The
fixture already reserved a `relevant: [{ext_id, grade}]` field, so
no schema change — grades are stored but not yet used (binary
relevance only). Switching to nDCG later requires no fixture rewrite.

`assert_*` smoke checks are unchanged. Empty `relevant` => the query
is silently excluded from recall/MRR aggregation.
@jwalin-shah jwalin-shah requested a review from a team April 26, 2026 19:43
@coderabbitai
Contributor

coderabbitai Bot commented Apr 26, 2026

📝 Walkthrough

Walkthrough

This PR introduces two new domains—life_capture for personal indexing with embeddings and hybrid search, and curated_memory for runtime MEMORY.md/USER.md stores—plus updates agent session plumbing to thread curated snapshots through prompt contexts. The installer flow is hardened with asset-resolution testing.

Changes

Cohort / File(s) Summary
Installer & Workflow
.github/workflows/installer-smoke.yml, scripts/install.sh, scripts/test_install.sh, scripts/fixtures/latest.json
Add --source-only mode to install script, introduce asset resolver with reachability checks, and add dedicated smoke test for resolver validation.
Life Capture — Core Module
src/openhuman/life_capture/types.rs, src/openhuman/life_capture/embedder.rs, src/openhuman/life_capture/quote_strip.rs, src/openhuman/life_capture/redact.rs
Define item/query/hit types; implement hosted embedder trait with OpenAI-compatible HTTP endpoint; add email reply stripping and PII redaction utilities.
Life Capture — Index & Storage
src/openhuman/life_capture/index.rs, src/openhuman/life_capture/migrations.rs, src/openhuman/life_capture/migrations/*.sql
Implement SQLite-backed personal index with r2d2 read pool, FTS5 keyword search, sqlite-vec vector search, and hybrid ranking with recency boosting; define schema and migration infrastructure.
Life Capture — Integration
src/openhuman/life_capture/rpc.rs, src/openhuman/life_capture/runtime.rs, src/openhuman/life_capture/schemas.rs, src/openhuman/life_capture/mod.rs
Add RPC handlers for get_stats and search; introduce runtime singleton for index/embedder init; define controller schemas; expose domain via module entrypoint.
Curated Memory — Core
src/openhuman/curated_memory/types.rs, src/openhuman/curated_memory/store.rs
Define MemoryFile enum and MemorySnapshot struct; implement MemoryStore with persistent async file I/O, entry append/replace/remove, and snapshot capture.
Curated Memory — Integration
src/openhuman/curated_memory/runtime.rs, src/openhuman/curated_memory/rpc.rs, src/openhuman/curated_memory/schemas.rs, src/openhuman/curated_memory/mod.rs
Add runtime singletons for memory/user stores; implement RPC handlers for read/add/replace/remove; define controller schemas and module exports.
Agent Session Plumbing
src/openhuman/agent/harness/session/types.rs, src/openhuman/agent/harness/session/builder.rs, src/openhuman/agent/harness/session/runtime.rs, src/openhuman/agent/harness/session/turn.rs, src/openhuman/agent/harness/fork_context.rs, src/openhuman/agent/harness/subagent_runner/ops.rs, src/openhuman/agent/harness/subagent_runner/ops_tests.rs
Thread curated_snapshot (Arc\<MemorySnapshot\>) through Agent session lifecycle; populate snapshot before first turn; pass to PromptContext and ParentExecutionContext for sub-agent inheritance.
Prompt Context & Agent Tests
src/openhuman/agent/prompts/types.rs, src/openhuman/agent/prompts/mod.rs, src/openhuman/agent/prompts/mod_tests.rs, src/openhuman/agent/agents/\\*/prompt.rs (17 files), src/openhuman/agent/agents/loader.rs, src/openhuman/agent/debug/mod.rs, src/openhuman/agent/triage/escalation.rs, src/openhuman/agent/triage/evaluator.rs
Add curated_snapshot: Option<&MemorySnapshot> field to PromptContext; update UserFilesSection to prefer snapshot MEMORY.md/USER.md over workspace files; remove global style suffix; initialize field in all test contexts.
Server Startup & Bootstrap
src/core/all.rs, src/core/jsonrpc.rs
Register life_capture and curated_memory controller domains; introduce bootstrap_controller_runtimes() async function; call bootstrap during server startup after config load.
Module Declarations
src/openhuman/mod.rs
Add pub mod curated_memory and pub mod life_capture declarations.
Planning & Specification
docs/superpowers/specs/2026-04-22-life-capture-layer-design.md, docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md, docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md, AGENTS.md
Document life-capture layer design spec, foundation milestone plan, and Track 1 ship pipeline; update Rust test target from openhuman to openhuman_core.
Dependencies
Cargo.toml
Add r2d2, r2d2_sqlite for connection pooling; add sqlite-vec for vector storage; add httpmock for test mocking.
Tests
tests/life_capture_retrieval_eval.rs, tests/fixtures/life_capture/corpus.json, src/openhuman/life_capture/tests/e2e.rs, src/openhuman/life_capture/tests/mod.rs, tests/agent_harness_public.rs
Add life-capture retrieval evaluation test with recall/MRR metrics; introduce fixture corpus with 25 items and 11 queries; add end-to-end deterministic embedder test; update agent harness test context.

Sequence Diagrams

sequenceDiagram
    participant User as User/App
    participant Runtime as Life Capture<br/>Runtime
    participant Index as Personal<br/>Index
    participant Embedder as Embedder<br/>(OpenAI)
    participant DB as SQLite +<br/>sqlite-vec

    User->>Runtime: bootstrap(workspace_dir)
    Runtime->>Index: open(db_path)
    Index->>DB: migrate() + init schema
    Index->>DB: register vec0 extension
    Runtime->>Embedder: new(api_key, model)
    Runtime->>Runtime: store singletons

    User->>Runtime: search("what was that?", k=10)
    Runtime->>Embedder: embed_batch(["what was that?"])
    Embedder-->>Runtime: [1536D vector]
    Runtime->>Index: hybrid_search(text, vector, k=10)
    Index->>DB: keyword_search (FTS5)
    Index->>DB: vector_search (sqlite-vec distance)
    Index->>Index: RRF + recency ranking
    Index-->>Runtime: [Hit{score,snippet,item}]
    Runtime-->>User: {hits: [...]}
sequenceDiagram
    participant Agent as Agent<br/>Session
    participant Turn as Turn<br/>Runtime
    participant CuratedMem as Curated<br/>Memory Runtime
    participant Store as Memory<br/>Store
    participant PromptCtx as Prompt<br/>Context

    Agent->>Agent: build() → new Agent
    Agent->>Turn: first turn setup
    Turn->>CuratedMem: get() runtime singleton
    CuratedMem->>Store: snapshot_pair()
    Store-->>CuratedMem: MemorySnapshot{memory,user}
    Turn->>Turn: store snapshot in agent
    
    Turn->>PromptCtx: build context
    PromptCtx->>PromptCtx: curated_snapshot = Some(snapshot)
    PromptCtx->>Turn: ready for prompt rendering
    
    Turn->>Turn: render system prompt
    Turn-->>Agent: exec turn with frozen snapshot

    alt Sub-agent Call
        Turn->>Agent: spawn_subagent()
        Agent->>Agent: ParentExecutionContext{curated_snapshot}
        Agent-->>Turn: sub-agent inherits snapshot
    end
sequenceDiagram
    participant Runtime as Curated<br/>Memory Runtime
    participant RPC as RPC<br/>Handler
    participant Store as Memory<br/>Store
    participant FS as Filesystem

    Runtime->>Runtime: bootstrap(workspace_dir)
    Runtime->>Store: open(memories/memory)
    Store->>FS: create/migrate MEMORY.md
    Store-->>Runtime: MemoryStore(memory)
    Runtime->>Store: open(memories/user)
    Store->>FS: create/migrate USER.md
    Store-->>Runtime: MemoryStore(user)

    Client->>RPC: handle_add(file="memory", entry="...")
    RPC->>Store: add(entry)
    Store->>FS: append (atomic temp+rename)
    Store-->>RPC: Ok(())
    RPC-->>Client: {file, ok: true}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • senamakel

Poem

🐰 A rabbit hops through memories, embeddings in paw,
Life-capture and curated dreams without flaw,
Snapshots frozen, snapshots flow, searches soar and seek,
From fuzzy vectors to keywords chic—
One session, one snapshot, precise and unique! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly matches the main change: adding recall@k and MRR metrics aggregation to the life_capture retrieval evaluation test.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/agent/debug/mod.rs (1)

381-396: ⚠️ Potential issue | 🟡 Minor

Verify the dump still matches production when integrations_agent is spawned as a subagent with inherited parent context.

The module doc and DumpedPrompt::text comment claim the dump is byte-identical to what the LLM sees on turn 1. However, integrations_agent is never a turn-1 agent in production — it's always spawned as a typed subagent via spawn_subagent(agent_id="integrations_agent", toolkit=…) by the orchestrator or other agents. In that subagent scenario (per subagent_runner/ops.rs:533), the child inherits parent.curated_snapshot.as_deref() and renders byte-identical MEMORY.md / USER.md blocks per the ParentExecutionContext contract.

The debug dump creates a fresh Agent::from_config_for_agent(…) without parent context and hardcodes curated_snapshot: None, so it shows a baseline prompt that never occurs in production. If the goal is to reflect what a live spawned integrations_agent actually receives, either:

  1. Accept an optional parent_context: Option<&ParentExecutionContext> parameter to render_integrations_agent and thread parent.curated_snapshot through, or
  2. Update the module doc to clarify that the dump shows the base prompt before parent-context inheritance (distinct from the production subagent runtime).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/debug/mod.rs` around lines 381 - 396, The debug dump
builds a PromptContext with curated_snapshot: None which doesn't match
production for integrations_agent spawned as a subagent; modify
render_integrations_agent (or the function that calls
Agent::from_config_for_agent) to accept an Option<&ParentExecutionContext>
parent_context and, when present, set PromptContext.curated_snapshot =
parent_context.curated_snapshot.as_deref(), ensuring you thread the
parent_context through to where PromptContext is constructed (referencing
PromptContext, render_integrations_agent, Agent::from_config_for_agent, and
curated_snapshot); alternatively, if you prefer not to change the API, update
the module doc and DumpedPrompt::text comment to explicitly state the dump is
the base prompt before parent-context inheritance.
🧹 Nitpick comments (16)
src/openhuman/life_capture/redact.rs (1)

34-45: Best-effort CC pattern can match ISBN-13 / similar structured IDs.

\b(?:\d[ \-]?){12,18}\d\b will redact e.g. an ISBN-13 like 978-3-16-148410-0 or a 14-digit GTIN as <CC>. That's typically fine for a personal-indexing redactor (false-positive PII redaction is preferred over leaking), but worth a one-line note in the doc-comment so future readers don't try to "fix" it as a bug.
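To make the false positive concrete, here is a minimal sketch of the pattern's digit-run semantics in plain Rust (digit counting stands in for the actual regex engine; `looks_like_cc` is a hypothetical helper, not the redactor's API):

```rust
// Mirrors the best-effort pattern \b(?:\d[ \-]?){12,18}\d\b: a run of
// 13-19 digits, each optionally followed by a single space or hyphen.
fn looks_like_cc(run: &str) -> bool {
    let only_sep = run
        .chars()
        .all(|c| c.is_ascii_digit() || c == ' ' || c == '-');
    let digits = run.chars().filter(|c| c.is_ascii_digit()).count();
    only_sep && (13..=19).contains(&digits)
}

fn main() {
    // ISBN-13: 13 digits, so the best-effort CC pattern redacts it too.
    assert!(looks_like_cc("978-3-16-148410-0"));
    // 16-digit card number matches, as intended.
    assert!(looks_like_cc("4111 1111 1111 1111"));
    // A 10-digit phone number does not.
    assert!(!looks_like_cc("415-555-0123"));
}
```

The over-matching is the point of the suggested doc-comment: for a PII redactor, eating an ISBN is cheaper than leaking a card number.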

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/redact.rs` around lines 34 - 45, Add a one-line
clarifying comment to the redact function's doc-comment explaining that the CC
regex (static CC) is intentionally broad and may match ISBN-13/GTIN or other
long numeric identifiers (e.g., "978-3-16-148410-0"), and that false-positive
redaction is preferred to leaking PII; update the comment near the CC
declaration or above pub fn redact to mention this so future readers won't treat
those matches as a bug.
src/openhuman/agent/agents/morning_briefing/prompt.rs (1)

64-64: LGTM.

Same mechanical curated_snapshot: None test-fixture update.

Side note (optional, applies repo-wide, not blocking): the six agents (archivist, tool_maker, trigger_triage, morning_briefing, critic, integrations_agent's ctx_with) now share an identical PromptContext test fixture body modulo agent_id. A small test_support helper that returns a baseline PromptContext would collapse all of these and make the next field addition a one-line change instead of N. Not for this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/agents/morning_briefing/prompt.rs` at line 64, The test
fixture update simply sets curated_snapshot: None in the PromptContext instance
used by the morning_briefing agent; keep that change but also consider
(non-blocking) extracting a shared test helper function (e.g., a
test_support::baseline_prompt_context) that returns a baseline PromptContext to
avoid duplicated fixtures across agents (archivist, tool_maker, trigger_triage,
morning_briefing, critic, integrations_agent::ctx_with); for this PR just ensure
the curated_snapshot: None assignment is applied to the PromptContext used in
morning_briefing's tests and leave the refactor to a separate change.
src/core/jsonrpc.rs (1)

667-675: Include the underlying error in the bootstrap-failure log.

Err(_) discards the error; if this branch ever fires (corrupt config, IO error, schema migration), an operator sees only the generic message. Surface {err} so the cause is greppable in logs alongside the existing [core] prefix already used elsewhere on this path.

♻️ Proposed diff
-    match crate::openhuman::config::Config::load_or_init().await {
-        Ok(config) => {
-            crate::core::all::bootstrap_controller_runtimes(&config.workspace_dir).await;
-        }
-        Err(_) => log::warn!(
-            "[core] config load failed during runtime bootstrap; \
-             controllers with pre-serve runtimes will return 'not initialised'"
-        ),
-    }
+    match crate::openhuman::config::Config::load_or_init().await {
+        Ok(config) => {
+            crate::core::all::bootstrap_controller_runtimes(&config.workspace_dir).await;
+        }
+        Err(err) => log::warn!(
+            "[core] config load failed during runtime bootstrap ({err}); \
+             controllers with pre-serve runtimes will return 'not initialised'"
+        ),
+    }

Based on learnings: "Add substantial, development-oriented logs on new/changed flows for end-to-end debugging; use structured, grep-friendly context".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/jsonrpc.rs` around lines 667 - 675, The Err arm of the async match
over crate::openhuman::config::Config::load_or_init() currently discards the
error; change Err(_) to capture the error (e.g., Err(err)) and include that
error in the log message emitted from the failing branch so the operator can see
the underlying cause. Update the log::warn! call that follows the failed load so
it appends the captured error (using a readable formatter such as {:?} or {})
while preserving the existing "[core]" context and keep
bootstrap_controller_runtimes(&config.workspace_dir).await called only on
Ok(config).
scripts/test_install.sh (1)

23-27: Optional: assert specific exit code (3) for missing platform.

resolve_asset_url is documented to return exit code 3 specifically on missing platform (vs 2 parse error / 4 unreachable). Asserting only "non-zero" lets a future regression where a parse error masquerades as a missing-platform failure slip through.

♻️ Tighten the assertion
-# Also test a missing platform produces exit code 3
-if resolve_asset_url "$FIXTURE" "linux" "aarch64" 2>/dev/null; then
-  echo "FAIL: expected non-zero exit for missing platform linux-aarch64"
-  exit 1
-fi
+# Missing platform should produce the documented exit code 3.
+set +e
+resolve_asset_url "$FIXTURE" "linux" "aarch64" 2>/dev/null
+rc=$?
+set -e
+if [[ "$rc" -ne 3 ]]; then
+  echo "FAIL: expected exit code 3 for missing platform linux-aarch64, got $rc"
+  exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/test_install.sh` around lines 23 - 27, The test should assert the
exact exit code 3 from resolve_asset_url for a missing platform instead of just
non-zero; run resolve_asset_url "$FIXTURE" "linux" "aarch64", capture its exit
status (e.g., ret=$?), and fail the test unless ret == 3, emitting a clear
message if it differs; reference the existing test invocation of
resolve_asset_url and the FIXTURE variable to locate where to replace the
generic non-zero check with this exact-status assertion.
src/openhuman/agent/prompts/mod.rs (1)

949-980: Optional: extract shared truncation helper.

inject_snapshot_content and inject_workspace_file_capped (lines 1011-1029) now share the exact char-bounded truncation + header + truncation-marker logic. Worth pulling that into a small inject_labeled_block(prompt, label, content, max_chars) helper so the two callers can't drift on truncation semantics.

♻️ Sketch
+fn inject_labeled_block(prompt: &mut String, label: &str, content: &str, max_chars: usize) {
+    let trimmed = content.trim();
+    if trimmed.is_empty() {
+        return;
+    }
+    let _ = writeln!(prompt, "### {label}\n");
+    let truncated = if trimmed.chars().count() > max_chars {
+        trimmed
+            .char_indices()
+            .nth(max_chars)
+            .map(|(idx, _)| &trimmed[..idx])
+            .unwrap_or(trimmed)
+    } else {
+        trimmed
+    };
+    prompt.push_str(truncated);
+    if truncated.len() < trimmed.len() {
+        let _ = writeln!(
+            prompt,
+            "\n\n[... truncated at {max_chars} chars — use `read` for full file]\n"
+        );
+    } else {
+        prompt.push_str("\n\n");
+    }
+}

inject_snapshot_content becomes a thin wrapper; inject_workspace_file_capped calls it after the file read.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/prompts/mod.rs` around lines 949 - 980,
inject_snapshot_content and inject_workspace_file_capped duplicate the same
char-bounded truncation, header and truncation-marker logic; extract that shared
logic into a small helper function (e.g., inject_labeled_block(prompt: &mut
String, label: &str, content: &str, max_chars: usize)) and have both
inject_snapshot_content and inject_workspace_file_capped call it. The helper
should preserve current behavior: skip empty/whitespace content, write the "###
{label}\n" header, truncate by Unicode character count using
char_indices().nth(max_chars) to slice safely, append the truncated text, and
emit the same "\n\n[... truncated at {max_chars} chars — use `read` for full
file]\n" marker when truncated (otherwise append "\n\n"). Ensure existing
callers pass the same max_chars and that string trimming and newline behavior
remain identical.
tests/life_capture_retrieval_eval.rs (1)

308-343: Minor: spell out the recall_at_1 cap in the doc comment.

For queries where relevant.len() > 1, recall_at_1 is bounded above by 1/|relevant|, so the MEAN R@1 mechanically can't reach 1.0 even when the top hit is always a relevant doc (which is what the baseline MRR=1.00 already implies). Worth a sentence on QueryMetrics so future readers don't read R@1=0.77 as "23% of top-1 results are wrong" when it can also mean "queries average ~1.3 relevant docs and we only inspect one slot."

This is purely a clarity nit — the math is correct.
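To make the R@1 cap concrete, here is an illustrative computation (hypothetical helper names, not the PR's actual `score_query`), assuming recall@k divides by the total relevant count:

```rust
use std::collections::HashSet;

/// Fraction of all relevant ids found in the top k of the ranking.
fn recall_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    let hits = ranked
        .iter()
        .take(k)
        .filter(|id| relevant.contains(*id))
        .count();
    hits as f64 / relevant.len() as f64
}

/// Reciprocal rank of the first relevant id (0.0 if none found).
fn mrr(ranked: &[&str], relevant: &HashSet<&str>) -> f64 {
    ranked
        .iter()
        .position(|id| relevant.contains(id))
        .map(|i| 1.0 / (i as f64 + 1.0))
        .unwrap_or(0.0)
}

fn main() {
    let relevant: HashSet<&str> = ["a", "b"].into_iter().collect();
    let ranked = ["a", "x", "b"];
    // Top hit is relevant, so MRR = 1.0 ...
    assert_eq!(mrr(&ranked, &relevant), 1.0);
    // ... yet R@1 is capped at 1/|relevant| = 0.5: only one slot is inspected.
    assert_eq!(recall_at_k(&ranked, &relevant, 1), 0.5);
    assert_eq!(recall_at_k(&ranked, &relevant, 3), 1.0);
}
```

This is exactly the "R@1 < 1.0 while MRR = 1.00" shape in the baseline table.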

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/life_capture_retrieval_eval.rs` around lines 308 - 343, Add a
one-sentence clarification to the QueryMetrics struct doc comment next to the
recall_at_1 field (or the struct-level comment) stating that for queries with
more than one relevant document recall_at_1 is bounded above by 1/|relevant| (so
mean R@1 cannot reach 1.0 even if the top hit is always relevant), and
optionally note that MRR=1.00 implies top-hit relevance while R@1 is constrained
by the average number of relevant docs; update the comment near QueryMetrics and
the recall_at_1 reference accordingly.
scripts/install.sh (1)

229-249: Optional: fold LATEST_VERSION extraction into resolve_asset_url.

resolve_from_latest_json now invokes python3 twice on the same latest.json — once via resolve_asset_url and once at lines 245-249 to fish out version. A single python helper that prints version\nurl (and parses both in one json.load) avoids the double parse and the second open-file path.

Not blocking — the second call is best-effort with || true.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/install.sh` around lines 229 - 249, The script currently calls
python3 twice on the same latest.json: once via resolve_asset_url and again to
set LATEST_VERSION; change resolve_asset_url to parse latest.json once and
return both the asset URL and the version (e.g. output "url|version" or two
newline-separated values), then update the caller so ASSET_URL/ASSET_NAME are
derived from resolve_asset_url's first value and LATEST_VERSION from its second;
update resolve_asset_url's callers and error handling to preserve the existing
exit codes/behavior.
src/openhuman/life_capture/migrations.rs (1)

30-47: Consider using unchecked_transaction() instead of manual transaction handling for consistency.

rusqlite::Connection::unchecked_transaction() is already used throughout the codebase and provides implicit rollback on Drop. This simplifies the code and avoids the closure pattern while maintaining the same atomicity guarantees.

♻️ Suggested change
-        // Run the migration SQL and record it atomically.
-        conn.execute_batch("BEGIN")?;
-        let result = (|| -> Result<()> {
-            conn.execute_batch(sql)?;
-            conn.execute(
-                "INSERT INTO _life_capture_migrations(name, applied_at) \
-                 VALUES (?1, CAST(strftime('%s','now') AS INTEGER))",
-                rusqlite::params![name],
-            )?;
-            Ok(())
-        })();
-        match result {
-            Ok(()) => conn.execute_batch("COMMIT")?,
-            Err(e) => {
-                let _ = conn.execute_batch("ROLLBACK");
-                return Err(e);
-            }
-        }
+        let tx = conn.unchecked_transaction()?;
+        tx.execute_batch(sql)?;
+        tx.execute(
+            "INSERT INTO _life_capture_migrations(name, applied_at) \
+             VALUES (?1, CAST(strftime('%s','now') AS INTEGER))",
+            rusqlite::params![name],
+        )?;
+        tx.commit()?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/migrations.rs` around lines 30 - 47, Replace the
manual BEGIN/COMMIT/ROLLBACK and closure pattern with rusqlite's
unchecked_transaction() to mirror the rest of the codebase: start an unchecked
transaction via conn.unchecked_transaction(), run the migration SQL and the
INSERT into _life_capture_migrations using the transaction handle (instead of
conn.execute_batch/conn.execute), and then call tx.commit()? on success so
Drop-based rollback handles failures; update references to the closure and
explicit ROLLBACK/COMMIT logic around conn.execute_batch and the inner INSERT to
use the transaction API instead.
src/openhuman/life_capture/rpc.rs (1)

78-83: Optional: simplify &Arc<dyn Embedder> to &dyn Embedder.

handle_search doesn't need to clone the Arc, only call methods on it, so the borrow is sufficient and slightly more idiomatic. Callers would change from &embedder (where embedder: Arc<dyn Embedder>) to embedder.as_ref() or &*embedder.
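A minimal sketch of the call-site change (toy `Embedder` trait; the real trait and `handle_search` signature live in this module):

```rust
use std::sync::Arc;

trait Embedder {
    fn dim(&self) -> usize;
}

struct Toy;
impl Embedder for Toy {
    fn dim(&self) -> usize {
        8
    }
}

// Borrowing the trait object is enough when no clone of the Arc is needed.
fn handle_search(embedder: &dyn Embedder) -> usize {
    embedder.dim()
}

fn main() {
    let embedder: Arc<dyn Embedder> = Arc::new(Toy);
    // Callers pass `embedder.as_ref()` (or `&*embedder`) instead of `&embedder`.
    assert_eq!(handle_search(embedder.as_ref()), 8);
    assert_eq!(handle_search(&*embedder), 8);
}
```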

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/rpc.rs` around lines 78 - 83, Change the
handle_search signature to take a borrowed trait object instead of a reference
to an Arc: replace the parameter type "&Arc<dyn Embedder>" with "&dyn Embedder"
in the function signature (pub async fn handle_search(..., embedder: &dyn
Embedder, ...)). Update all call sites that currently pass "&embedder" where
embedder: Arc<dyn Embedder> to pass "embedder.as_ref()" or "&*embedder" (or
simply "embedder" if they already have &Arc) so the function receives a &dyn
Embedder; no other logic needs to change inside handle_search since it only
calls methods on the embedder trait.
src/openhuman/curated_memory/runtime.rs (1)

34-65: Recommended: extract char limits to named constants.

2200 and 1375 are unexplained magic numbers right at the boundary where stores are opened. A pair of named constants would document intent (which file gets which budget and why), and would also be reusable from tests/migrations.

♻️ Proposed refactor
+const MEMORY_CHAR_LIMIT: usize = 2200;
+const USER_CHAR_LIMIT: usize = 1375;
+
 pub async fn bootstrap(workspace_dir: &std::path::Path) {
     let mem_dir = workspace_dir.join("memories");
     let memory_store = crate::openhuman::curated_memory::MemoryStore::open(
         &mem_dir,
         crate::openhuman::curated_memory::MemoryFile::Memory,
-        2200,
+        MEMORY_CHAR_LIMIT,
     );
     let user_store = crate::openhuman::curated_memory::MemoryStore::open(
         &mem_dir,
         crate::openhuman::curated_memory::MemoryFile::User,
-        1375,
+        USER_CHAR_LIMIT,
     );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/curated_memory/runtime.rs` around lines 34 - 65, The bootstrap
function uses unexplained magic numbers 2200 and 1375 when calling
MemoryStore::open for MemoryFile::Memory and MemoryFile::User; extract these
into clearly named constants (e.g., MEMORY_CHAR_LIMIT, USER_CHAR_LIMIT) near the
top of the module and replace the numeric literals in bootstrap so the intent
and reuse from tests/migrations are clear; update any references to
MemoryStore::open in this file to use the new constants and ensure they are
public if tests/migrations outside the module need access.
src/openhuman/curated_memory/store.rs (2)

79-93: Minor: replace error drops the actual limit value, unlike add.

add() reports format!("char limit {} exceeded", self.char_limit), but replace() returns the bare string "char limit". The asymmetry makes the error harder to act on (no number, no verb). Same fix would apply to a future remove() if it ever gains a limit check.

♻️ Proposed change
-        if next.chars().count() > self.char_limit {
-            return Err(std::io::Error::new(std::io::ErrorKind::Other, "char limit"));
-        }
+        if next.chars().count() > self.char_limit {
+            return Err(std::io::Error::new(
+                std::io::ErrorKind::Other,
+                format!("char limit {} exceeded", self.char_limit),
+            ));
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/curated_memory/store.rs` around lines 79 - 93, The replace()
method's error message for exceeding char_limit is terse; update the Err
returned in replace() so it matches add() by including the limit value and a
verb (e.g. format!("char limit {} exceeded", self.char_limit)); locate the Err
created after the next.chars().count() > self.char_limit check in replace() and
replace the bare "char limit" string with a formatted message using
self.char_limit (and apply the same pattern to any future remove() limit
checks).

14-56: Optional: MemoryStore::open does sync std::fs I/O from async callers.

bootstrap(...) in curated_memory/runtime.rs is async fn, but it calls this sync open(...), which performs create_dir_all, read_to_string, and write directly on the runtime thread. It only runs at startup and the files are tiny, so impact is minimal — but switching to tokio::fs would keep the module consistent with the async I/O used in read/add/replace/remove and avoid surprising future callers who invoke open later in a request path. Also, the migration silently drops read_to_string errors via .ok(); a log::warn! on Err would help diagnose seed-migration failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/curated_memory/store.rs` around lines 14 - 56,
MemoryStore::open is doing blocking std::fs I/O from async callers (e.g.,
curated_memory::runtime::bootstrap) and also swallows read_to_string errors;
change open to be async (pub async fn open(...)->std::io::Result<Self>) and
replace blocking calls with tokio::fs equivalents (tokio::fs::create_dir_all,
tokio::fs::read_to_string, tokio::fs::write) awaiting each call, keep the same
migration/truncation logic for seed, and update any callers (notably bootstrap)
to await the new async open; additionally, when the legacy read_to_string
returns Err, emit log::warn! including the error instead of silently dropping
it, while preserving the rest of the flow and existing symbols (open, bootstrap,
and the read/add/replace/remove async API consistency).
src/openhuman/life_capture/schemas.rs (1)

165-173: Minor: as usize cast silently truncates on 32-bit targets, though the downstream clamp masks it.

read_optional_u64(...).unwrap_or(10) as usize truncates rather than rejects values larger than usize::MAX. The downstream k.clamp(1, 100) in rpc::handle_search makes this safe today, but it would be clearer to validate at the controller boundary so the behavior does not depend on downstream clamping.

♻️ Optional defensive validation
-        let k = read_optional_u64(&params, "k")?.unwrap_or(10) as usize;
+        let k_u64 = read_optional_u64(&params, "k")?.unwrap_or(10);
+        let k = usize::try_from(k_u64).unwrap_or(usize::MAX).min(100).max(1);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/schemas.rs` around lines 165 - 173, In
handle_search, the expression read_optional_u64(...).unwrap_or(10) as usize can
overflow on 32-bit targets; validate the optional u64 before casting to usize
instead of relying on rpc::handle_search's clamp. Change the logic in
handle_search (where read_optional_u64 is used) to: obtain the Option<u64>, use
unwrap_or(10) to get a u64, then convert via try_into() (or check value <=
usize::MAX) and return an Err string if it doesn't fit; pass the safely
converted usize into rpc::handle_search so the controller enforces valid bounds
independent of rpc::handle_search's k.clamp(1,100).
src/openhuman/life_capture/embedder.rs (1)

22-26: Optional: don't silently fall back to a timeout-less client.

unwrap_or_else(|_| reqwest::Client::new()) produces a client with no timeouts if the configured builder ever fails. That defeats the purpose of the timeouts and can manifest as request-thread hangs. Prefer fail-fast at startup:

♻️ Proposed change
-        let http = reqwest::Client::builder()
+        let http = reqwest::Client::builder()
             .timeout(std::time::Duration::from_secs(30))
             .connect_timeout(std::time::Duration::from_secs(10))
             .build()
-            .unwrap_or_else(|_| reqwest::Client::new());
+            .expect("[life_capture] failed to build embedder HTTP client");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/embedder.rs` around lines 22 - 26, The current
code builds the reqwest client into `http` using `reqwest::Client::builder()`
but silently falls back to `reqwest::Client::new()` via `unwrap_or_else(|_|
...)`, which yields a client without the configured timeouts; change this to
fail-fast instead: remove the silent fallback on `.build()` and propagate or
panic on error (e.g., replace `unwrap_or_else(|_| reqwest::Client::new())` with
an explicit `expect`/`?`-style handling) so that `http` is never created without
the specified timeouts and startup fails clearly if the builder cannot construct
the client.
src/openhuman/life_capture/index.rs (2)

169-178: Silent fallback to empty source on serde failure can corrupt lookups.

If serde_json::to_value(item.source) returns Err or a non-string value, source becomes "" and the subsequent SELECT … WHERE source = ?1 AND external_id = ?2 matches an unintended row class instead of failing loudly. Same shape for author_json/metadata_json: a serialize failure silently writes literals "null"/"{}". For a Source enum with #[serde(rename_all = …)] this is currently a no-op, but it’s a quiet footgun the day someone adds a non-unit variant.

♻️ Surface the error instead of swallowing it
-                    let source = serde_json::to_value(item.source)
-                        .ok()
-                        .and_then(|v| v.as_str().map(str::to_string))
-                        .unwrap_or_default();
-                    let author_json = item
-                        .author
-                        .as_ref()
-                        .map(|a| serde_json::to_string(a).unwrap_or_else(|_| "null".into()));
-                    let metadata_json = serde_json::to_string(&item.metadata)
-                        .unwrap_or_else(|_| "{}".into());
+                    let source = match serde_json::to_value(item.source)
+                        .context("serialize Item::source")?
+                    {
+                        serde_json::Value::String(s) => s,
+                        other => anyhow::bail!("Source did not serialize to a JSON string: {other:?}"),
+                    };
+                    let author_json = item
+                        .author
+                        .as_ref()
+                        .map(serde_json::to_string)
+                        .transpose()
+                        .context("serialize Item::author")?;
+                    let metadata_json = serde_json::to_string(&item.metadata)
+                        .context("serialize Item::metadata")?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/index.rs` around lines 169 - 178, The current code
silently replaces serialization failures for source, author_json, and
metadata_json
(serde_json::to_value(item.source).ok().and_then(...).unwrap_or_default() and
the unwrap_or_else fallbacks) with empty/"null"/"{}" which can corrupt lookups
(the SELECT WHERE source = ?1 ... may match the wrong row); instead change the
code to propagate/return a serialization error when
serde_json::to_value(item.source) or serde_json::to_string(...) for
item.author/item.metadata fails (e.g., use the ? operator or map_err to convert
into the function's Result error type) so the operation fails fast and no silent
defaults are written. Ensure the variables named source, author_json, and
metadata_json are produced only after successful serialization and that callers
handle the Result accordingly.

571-616: Hybrid post-filtering can under-fill results when sources/since/until are restrictive.

oversample = (q.k * 3).max(20) pulls ≤2*oversample unique candidates into the fusion map; apply_query_filters then drops anything outside the requested sources/time window before truncating to q.k. With a narrow filter (e.g., sources=[Slack] while most candidates are Gmail), the caller can receive far fewer than q.k hits even when plenty of qualifying rows exist further down the rank list.

Either push these filters into the SQL of keyword_search / vector_search (so oversampling is over the filtered candidate space), or grow oversample adaptively when filters are non-empty. SQL pushdown is the cleaner option and removes a Rust-side scan over hits you’ll discard.
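The interim (grow-oversample) option can be sketched as follows; `filtered_top_k` and `keep` are hypothetical stand-ins for the fused candidate list and apply_query_filters, not the module's actual API:

```rust
// Widen the candidate window until k filtered hits are collected or the
// ranked candidate list is exhausted.
fn filtered_top_k<T: Clone>(
    candidates: &[T],
    keep: impl Fn(&T) -> bool,
    k: usize,
) -> Vec<T> {
    let mut oversample = (k * 3).max(20); // same starting window as the code
    loop {
        let window = &candidates[..oversample.min(candidates.len())];
        let hits: Vec<T> = window.iter().filter(|c| keep(*c)).cloned().collect();
        if hits.len() >= k || oversample >= candidates.len() {
            return hits.into_iter().take(k).collect();
        }
        oversample *= 2; // widen and re-filter
    }
}

fn main() {
    // 100 ranked candidates; a restrictive filter keeps only multiples of 7.
    let candidates: Vec<u32> = (0..100).collect();
    let hits = filtered_top_k(&candidates, |c| *c % 7 == 0, 10);
    // The fixed (k * 3).max(20) = 30 window would yield only 5 hits;
    // growing the window fills all 10 slots.
    assert_eq!(hits.len(), 10);
    assert_eq!(hits[9], 63);
}
```

SQL pushdown remains the cleaner fix, since it avoids re-scanning discarded rows entirely.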

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/index.rs` around lines 571 - 616, Hybrid search
currently oversamples unfiltered candidates then drops many via
apply_query_filters, causing under-filled results when q.sources/q.since/q.until
are restrictive; update the search to push those filters into the candidate-side
queries or adapt oversample. Specifically, change keyword_search and
vector_search to accept the query filters (e.g., pass q or
q.sources/q.since/q.until) and apply them in their SQL WHERE clauses so
oversample is applied to the filtered corpus; if you prefer the interim fix,
detect non-empty q.sources or time bounds in hybrid_search and increase
oversample (e.g., multiply by a factor or use an adaptive loop to grow
oversample until you have ≥q.k filtered hits) instead of only using (q.k *
3).max(20); adjust the call sites where keyword_search/vector_search are invoked
and remove redundant Rust-side scanning that discards filtered rows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Line 459: The README/guide currently instructs running "cargo test -p
openhuman_core" which is incorrect because openhuman_core is a lib target, not a
package; update the command to "cargo test -p openhuman" (or remove -p entirely
and use just "cargo test" if you prefer running tests for the whole workspace)
so the -p flag references the actual package name ("openhuman") instead of the
lib name ("openhuman_core"); change the occurrence of cargo test -p
openhuman_core to cargo test -p openhuman in the document text.

In `@Cargo.toml`:
- Around line 58-61: The init closure used to initialize the SQLite pool only
sets PRAGMA via c.pragma_update (the closure that configures connections) but
never loads the sqlite-vec extension; add a call to sqlite_vec::load(&conn)?
inside that same init closure (where c.pragma_update(None, "query_only", "ON")
is invoked) so each new connection has the extension registered (use the
existing conn variable and propagate the error as needed); this ensures
sqlite_vec::load is called for r2d2_sqlite/rusqlite pooled connections before
they are returned.

In `@scripts/install.sh`:
- Around line 6-14: The top-level "set -euo pipefail" leaks into the caller when
this script is sourced; change the install guard so that either (a) do not apply
"set -euo pipefail" until after you check SOURCE_ONLY (the --source-only loop)
and only enable it when SOURCE_ONLY is not set, or (b) save the caller's shell
options (e.g., via "old_opts=$(set +o)" or "old_status=$(set -o)" style) before
applying "set -euo pipefail" and restore them before returning; update the code
around the SOURCE_ONLY check and the install-body entry points (references: the
SOURCE_ONLY variable, the --source-only argument handling loop, and the
top-level set -euo pipefail) so sourcing with --source-only does not mutate the
parent shell options.

In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs`:
- Around line 211-214: The function signature for make_parent is currently split
across multiple lines causing rustfmt failure; collapse the signature to a
single line so rustfmt accepts it — update the declaration of fn make_parent to
place provider: Arc<dyn Provider>, tools: Vec<Box<dyn Tool>> and the return type
-> ParentExecutionContext all on one line (keeping the existing body unchanged)
and ensure the symbols make_parent, Provider, Tool, and ParentExecutionContext
are used exactly as in the file.
- Around line 123-125: The multi-line use declaration and the multi-line
function signature cause rustfmt failures; change the multi-line use of
crate::openhuman::providers to a single-line import (use
crate::openhuman::providers::{ChatRequest as PChatRequest, ChatResponse,
Provider, ToolCall};) and collapse the make_parent function signature (the
make_parent definition spanning lines 211–214) into a single line signature to
match rustfmt expectations; update only the formatting, keeping identifiers and
types unchanged.

In `@src/openhuman/curated_memory/rpc.rs`:
- Around line 31-39: The handle_add function must reject empty entries: before
calling pick(&file, rt) and store.add(&entry).await, check that entry is not
empty (e.g., entry.is_empty() or entry.trim().is_empty() per project convention)
and return an Err(String) with a clear message (consistent with other handlers
like remove/replace, e.g., format!("add: empty entry") or "entry must be
non-empty") instead of appending a stray separator; only call store.add when the
entry is non-empty.

In `@src/openhuman/life_capture/embedder.rs`:
- Around line 12-34: HostedEmbedder currently hard-codes dim() to 1536 while
new() accepts any model string, causing mismatch with INDEX_VECTOR_DIM; update
HostedEmbedder to carry the embedding dimension as configuration (add a dim:
usize field and expose a dim() -> usize method), change HostedEmbedder::new(...)
to accept a dim parameter (or a Model→dim resolver in runtime.rs that maps
OPENHUMAN_EMBEDDINGS_MODEL to the correct dimension), and update sites that
construct HostedEmbedder (runtime.rs) and any checks in rpc::handle_search to
use the new dim field so the pre-flight comparison against INDEX_VECTOR_DIM is
accurate; ensure embed_batch callers still work with the actual returned
dimension.
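A minimal sketch of the suggested shape, assuming a model→dimension resolver; the struct fields, method names, and model list are illustrative (the dimensions match OpenAI's published embedding models), not the crate's actual API:

```rust
/// Sketch: carry the dimension as configuration instead of hard-coding 1536.
struct HostedEmbedder {
    model: String,
    dim: usize,
}

impl HostedEmbedder {
    fn new(model: &str, dim: usize) -> Self {
        Self { model: model.to_string(), dim }
    }

    /// Used by the pre-flight comparison against INDEX_VECTOR_DIM.
    fn dim(&self) -> usize {
        self.dim
    }
}

/// Map known embedding models to their output dimension (OpenAI defaults);
/// unknown models must supply an explicit dim rather than silently getting 1536.
fn embedding_dim_for(model: &str) -> Option<usize> {
    match model {
        "text-embedding-3-small" | "text-embedding-ada-002" => Some(1536),
        "text-embedding-3-large" => Some(3072),
        _ => None,
    }
}
```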

In `@src/openhuman/life_capture/index.rs`:
- Around line 1-336: The file is too large and should be split into focused
modules: keep the public types and process-wide init in index/mod.rs
(PersonalIndex, ensure_vec_extension_registered, ReaderPool, READER_POOL_SIZE),
move writer logic into index/writer.rs (IndexWriter, its upsert and
upsert_vector, and any writer-side helpers like Source/serde_json fallbacks),
move read/query logic into index/reader.rs (IndexReader, ItemRow,
with_read_conn, fts5_quote, apply_query_filters and reader helpers), and
relocate each #[cfg(test)] mod next to the code it tests; update mod
declarations and use statements accordingly so all referenced symbols
(PersonalIndex, IndexWriter, IndexReader, ItemRow, with_read_conn, fts5_quote,
apply_query_filters, VEC_REGISTERED/ensure_vec_extension_registered) remain
accessible.

In `@src/openhuman/life_capture/runtime.rs`:
- Around line 96-106: The current env lookup treats an empty
OPENHUMAN_EMBEDDINGS_KEY as a valid value and thus never falls back to
OPENAI_API_KEY; update the retrieval so empty strings are treated as unset: call
std::env::var("OPENHUMAN_EMBEDDINGS_KEY") and std::env::var("OPENAI_API_KEY")
but filter out empty results (e.g. map to Option and .filter(|s| !s.is_empty())
or convert empty Ok("") into None) before applying the precedence logic that
assigns to the local api_key and the subsequent let Some(api_key) = api_key else
{ ... } block so a blank OPENHUMAN_EMBEDDINGS_KEY will correctly fall back to
OPENAI_API_KEY.
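The filter-empty pattern can be sketched as a small helper; `resolve_api_key` is a hypothetical name for illustration:

```rust
/// Treat empty environment variables as unset so a blank
/// OPENHUMAN_EMBEDDINGS_KEY still falls back to OPENAI_API_KEY.
fn non_empty_env(name: &str) -> Option<String> {
    std::env::var(name).ok().filter(|s| !s.is_empty())
}

/// Hypothetical wrapper showing the precedence logic.
fn resolve_api_key() -> Option<String> {
    non_empty_env("OPENHUMAN_EMBEDDINGS_KEY").or_else(|| non_empty_env("OPENAI_API_KEY"))
}
```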

In `@tests/life_capture_retrieval_eval.rs`:
- Around line 197-226: run_query currently requests only q.k hits so R@5
collapses to R@q.k for queries with k<5; change the search call to request a
fetch_k = q.k.max(5) (use fetch_k when calling reader.keyword_search,
vector_search and hybrid_search) and still return hit_ext_ids(&hits) (leave any
budget-aware assertions like must_contain_in_top using q.k unchanged). Apply the
same change in the other occurrence identified around lines 318-343 (use fetch_k
= q.k.max(5) for retrievals, but keep downstream checks that rely on q.k).
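For reference, the metric definitions the aggregator computes can be sketched over plain slices (function names are illustrative, not the test file's helpers). Note that with more than one relevant document, recall@1 is bounded above by 1/|relevant| even when the top hit is always relevant, which is how MRR can sit at 1.00 while mean R@1 does not:

```rust
/// recall@k: fraction of the relevant set found in the top-k hits.
fn recall_at_k(hits: &[&str], relevant: &[&str], k: usize) -> f64 {
    if relevant.is_empty() {
        return 0.0;
    }
    let found = hits.iter().take(k).filter(|h| relevant.contains(*h)).count();
    found as f64 / relevant.len() as f64
}

/// Reciprocal rank of the first relevant hit (0.0 if none appears).
fn reciprocal_rank(hits: &[&str], relevant: &[&str]) -> f64 {
    hits.iter()
        .position(|h| relevant.contains(h))
        .map_or(0.0, |rank| 1.0 / (rank as f64 + 1.0))
}
```

With two relevant docs and the first of them at rank 2, R@1 is 0.0 but MRR contribution is 0.5 — requesting `fetch_k = q.k.max(5)` ensures R@5 is computed over a real top-5 rather than collapsing to R@q.k.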

---

Outside diff comments:
In `@src/openhuman/agent/debug/mod.rs`:
- Around line 381-396: The debug dump builds a PromptContext with
curated_snapshot: None which doesn't match production for integrations_agent
spawned as a subagent; modify render_integrations_agent (or the function that
calls Agent::from_config_for_agent) to accept an Option<&ParentExecutionContext>
parent_context and, when present, set PromptContext.curated_snapshot =
parent_context.curated_snapshot.as_deref(), ensuring you thread the
parent_context through to where PromptContext is constructed (referencing
PromptContext, render_integrations_agent, Agent::from_config_for_agent, and
curated_snapshot); alternatively, if you prefer not to change the API, update
the module doc and DumpedPrompt::text comment to explicitly state the dump is
the base prompt before parent-context inheritance.

---

Nitpick comments:
In `@scripts/install.sh`:
- Around line 229-249: The script currently calls python3 twice on the same
latest.json: once via resolve_asset_url and again to set LATEST_VERSION; change
resolve_asset_url to parse latest.json once and return both the asset URL and
the version (e.g. output "url|version" or two newline-separated values), then
update the caller so ASSET_URL/ASSET_NAME are derived from resolve_asset_url's
first value and LATEST_VERSION from its second; update resolve_asset_url's
callers and error handling to preserve the existing exit codes/behavior.
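One way to sketch the single-parse resolver; the JSON shape below is a stand-in for illustration, not the real scripts/fixtures/latest.json:

```shell
# Sketch: parse latest.json once, emitting the asset URL and version on two
# newline-separated lines, then split them at the call site.
resolve_asset_url() {
  python3 - "$1" <<'PY'
import json, sys

with open(sys.argv[1]) as f:
    data = json.load(f)
print(data["url"])       # assumed key, for illustration only
print(data["version"])   # assumed key, for illustration only
PY
}

printf '{"version": "1.2.3", "url": "https://example.com/app.tar.gz"}' > /tmp/latest_demo.json
out=$(resolve_asset_url /tmp/latest_demo.json)
ASSET_URL=$(printf '%s\n' "$out" | sed -n 1p)
LATEST_VERSION=$(printf '%s\n' "$out" | sed -n 2p)
echo "$ASSET_URL $LATEST_VERSION"
```

The error handling and exit codes of the real resolver would wrap the python3 invocation; only the one-parse/two-values shape is shown here.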

In `@scripts/test_install.sh`:
- Around line 23-27: The test should assert the exact exit code 3 from
resolve_asset_url for a missing platform instead of just non-zero; run
resolve_asset_url "$FIXTURE" "linux" "aarch64", capture its exit status (e.g.,
ret=$?), and fail the test unless ret == 3, emitting a clear message if it
differs; reference the existing test invocation of resolve_asset_url and the
FIXTURE variable to locate where to replace the generic non-zero check with this
exact-status assertion.
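A sketch of the exact-status assertion; `resolve_asset_url` is stubbed here (in the real test it comes from sourcing install.sh) and the stub's exit code stands in for the missing-platform path:

```shell
# Stub standing in for the real resolver's missing-platform exit path.
resolve_asset_url() { return 3; }
FIXTURE=scripts/fixtures/latest.json

# Capture the exit status via if/else so this also works under `set -e`.
if resolve_asset_url "$FIXTURE" "linux" "aarch64"; then
  ret=0
else
  ret=$?
fi

if [ "$ret" -ne 3 ]; then
  echo "expected exit 3 for missing platform, got $ret" >&2
  exit 1
fi
echo "ok: missing platform exits 3"
```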

In `@src/core/jsonrpc.rs`:
- Around line 667-675: The Err arm of the async match over
crate::openhuman::config::Config::load_or_init() currently discards the error;
change Err(_) to capture the error (e.g., Err(err)) and include that error in
the log message emitted from the failing branch so the operator can see the
underlying cause. Update the log::warn! call that follows the failed load so it
appends the captured error (using a readable formatter such as {:?} or {}) while
preserving the existing "[core]" context and keep
bootstrap_controller_runtimes(&config.workspace_dir).await called only on
Ok(config).

In `@src/openhuman/agent/agents/morning_briefing/prompt.rs`:
- Line 64: The test fixture update simply sets curated_snapshot: None in the
PromptContext instance used by the morning_briefing agent; keep that change but
also consider (non-blocking) extracting a shared test helper function (e.g., a
test_support::baseline_prompt_context) that returns a baseline PromptContext to
avoid duplicated fixtures across agents (archivist, tool_maker, trigger_triage,
morning_briefing, critic, integrations_agent::ctx_with); for this PR just ensure
the curated_snapshot: None assignment is applied to the PromptContext used in
morning_briefing's tests and leave the refactor to a separate change.

In `@src/openhuman/agent/prompts/mod.rs`:
- Around line 949-980: inject_snapshot_content and inject_workspace_file_capped
duplicate the same char-bounded truncation, header and truncation-marker logic;
extract that shared logic into a small helper function (e.g.,
inject_labeled_block(prompt: &mut String, label: &str, content: &str, max_chars:
usize)) and have both inject_snapshot_content and inject_workspace_file_capped
call it. The helper should preserve current behavior: skip empty/whitespace
content, write the "### {label}\n" header, truncate by Unicode character count
using char_indices().nth(max_chars) to slice safely, append the truncated text,
and emit the same "\n\n[... truncated at {max_chars} chars — use `read` for full
file]\n" marker when truncated (otherwise append "\n\n"). Ensure existing
callers pass the same max_chars and that string trimming and newline behavior
remain identical.
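The proposed helper could look like this; the name and marker text mirror the comment's description and should be treated as assumptions:

```rust
/// Shared char-bounded injection used by both inject_snapshot_content and
/// inject_workspace_file_capped (sketch).
fn inject_labeled_block(prompt: &mut String, label: &str, content: &str, max_chars: usize) {
    if content.trim().is_empty() {
        return; // skip empty/whitespace-only content entirely
    }
    prompt.push_str(&format!("### {label}\n"));
    // char_indices().nth(max_chars) yields the byte offset of the first char
    // past the budget, so the slice below always lands on a char boundary.
    match content.char_indices().nth(max_chars) {
        Some((cut, _)) => {
            prompt.push_str(&content[..cut]);
            prompt.push_str(&format!(
                "\n\n[... truncated at {max_chars} chars — use `read` for full file]\n"
            ));
        }
        None => {
            prompt.push_str(content);
            prompt.push_str("\n\n");
        }
    }
}
```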

In `@src/openhuman/curated_memory/runtime.rs`:
- Around line 34-65: The bootstrap function uses unexplained magic numbers 2200
and 1375 when calling MemoryStore::open for MemoryFile::Memory and
MemoryFile::User; extract these into clearly named constants (e.g.,
MEMORY_CHAR_LIMIT, USER_CHAR_LIMIT) near the top of the module and replace the
numeric literals in bootstrap so the intent and reuse from tests/migrations are
clear; update any references to MemoryStore::open in this file to use the new
constants and ensure they are public if tests/migrations outside the module need
access.

In `@src/openhuman/curated_memory/store.rs`:
- Around line 79-93: The replace() method's error message for exceeding
char_limit is terse; update the Err returned in replace() so it matches add() by
including the limit value and a verb (e.g. format!("char limit {} exceeded",
self.char_limit)); locate the Err created after the next.chars().count() >
self.char_limit check in replace() and replace the bare "char limit" string with
a formatted message using self.char_limit (and apply the same pattern to any
future remove() limit checks).
- Around line 14-56: MemoryStore::open is doing blocking std::fs I/O from async
callers (e.g., curated_memory::runtime::bootstrap) and also swallows
read_to_string errors; change open to be async (pub async fn
open(...)->std::io::Result<Self>) and replace blocking calls with tokio::fs
equivalents (tokio::fs::create_dir_all, tokio::fs::read_to_string,
tokio::fs::write) awaiting each call, keep the same migration/truncation logic
for seed, and update any callers (notably bootstrap) to await the new async
open; additionally, when the legacy read_to_string returns Err, emit log::warn!
including the error instead of silently dropping it, while preserving the rest
of the flow and existing symbols (open, bootstrap, and the
read/add/replace/remove async API consistency).

In `@src/openhuman/life_capture/embedder.rs`:
- Around line 22-26: The current code builds the reqwest client into `http`
using `reqwest::Client::builder()` but silently falls back to
`reqwest::Client::new()` via `unwrap_or_else(|_| ...)`, which yields a client
without the configured timeouts; change this to fail-fast instead: remove the
silent fallback on `.build()` and propagate or panic on error (e.g., replace
`unwrap_or_else(|_| reqwest::Client::new())` with an explicit `expect`/`?`-style
handling) so that `http` is never created without the specified timeouts and
startup fails clearly if the builder cannot construct the client.

In `@src/openhuman/life_capture/index.rs`:
- Around line 169-178: The current code silently replaces serialization failures
for source, author_json, and metadata_json
(serde_json::to_value(item.source).ok().and_then(...).unwrap_or_default() and
the unwrap_or_else fallbacks) with empty/"null"/"{}" which can corrupt lookups
(the SELECT WHERE source = ?1 ... may match the wrong row); instead change the
code to propagate/return a serialization error when
serde_json::to_value(item.source) or serde_json::to_string(...) for
item.author/item.metadata fails (e.g., use the ? operator or map_err to convert
into the function's Result error type) so the operation fails fast and no silent
defaults are written. Ensure the variables named source, author_json, and
metadata_json are produced only after successful serialization and that callers
handle the Result accordingly.
- Around line 571-616: Hybrid search currently oversamples unfiltered candidates
then drops many via apply_query_filters, causing under-filled results when
q.sources/q.since/q.until are restrictive; update the search to push those
filters into the candidate-side queries or adapt oversample. Specifically,
change keyword_search and vector_search to accept the query filters (e.g., pass
q or q.sources/q.since/q.until) and apply them in their SQL WHERE clauses so
oversample is applied to the filtered corpus; if you prefer the interim fix,
detect non-empty q.sources or time bounds in hybrid_search and increase
oversample (e.g., multiply by a factor or use an adaptive loop to grow
oversample until you have ≥q.k filtered hits) instead of only using (q.k *
3).max(20); adjust the call sites where keyword_search/vector_search are invoked
and remove redundant Rust-side scanning that discards filtered rows.

In `@src/openhuman/life_capture/migrations.rs`:
- Around line 30-47: Replace the manual BEGIN/COMMIT/ROLLBACK and closure
pattern with rusqlite's unchecked_transaction() to mirror the rest of the
codebase: start an unchecked transaction via conn.unchecked_transaction(), run
the migration SQL and the INSERT into _life_capture_migrations using the
transaction handle (instead of conn.execute_batch/conn.execute), and then call
tx.commit()? on success so Drop-based rollback handles failures; update
references to the closure and explicit ROLLBACK/COMMIT logic around
conn.execute_batch and the inner INSERT to use the transaction API instead.

In `@src/openhuman/life_capture/redact.rs`:
- Around line 34-45: Add a one-line clarifying comment to the redact function's
doc-comment explaining that the CC regex (static CC) is intentionally broad and
may match ISBN-13/GTIN or other long numeric identifiers (e.g.,
"978-3-16-148410-0"), and that false-positive redaction is preferred to leaking
PII; update the comment near the CC declaration or above pub fn redact to
mention this so future readers won't treat those matches as a bug.

In `@src/openhuman/life_capture/rpc.rs`:
- Around line 78-83: Change the handle_search signature to take a borrowed trait
object instead of a reference to an Arc: replace the parameter type "&Arc<dyn
Embedder>" with "&dyn Embedder" in the function signature (pub async fn
handle_search(..., embedder: &dyn Embedder, ...)). Update all call sites that
currently pass "&embedder" where embedder: Arc<dyn Embedder> to pass
"embedder.as_ref()" or "&*embedder" (or simply "embedder" if they already have
&Arc) so the function receives a &dyn Embedder; no other logic needs to change
inside handle_search since it only calls methods on the embedder trait.

In `@src/openhuman/life_capture/schemas.rs`:
- Around line 165-173: In handle_search, the expression
read_optional_u64(...).unwrap_or(10) as usize can overflow on 32-bit targets;
validate the optional u64 before casting to usize instead of relying on
rpc::handle_search's clamp. Change the logic in handle_search (where
read_optional_u64 is used) to: obtain the Option<u64>, use unwrap_or(10) to get
a u64, then convert via try_into() (or check value <= usize::MAX) and return an
Err string if it doesn't fit; pass the safely converted usize into
rpc::handle_search so the controller enforces valid bounds independent of
rpc::handle_search's k.clamp(1,100).
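A sketch of the checked conversion (names illustrative):

```rust
/// Validate the optional u64 before converting, instead of `as usize`,
/// which silently truncates on 32-bit targets.
fn k_param(raw: Option<u64>) -> Result<usize, String> {
    let k = raw.unwrap_or(10);
    usize::try_from(k).map_err(|_| format!("k={k} does not fit in usize on this target"))
}
```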

In `@tests/life_capture_retrieval_eval.rs`:
- Around line 308-343: Add a one-sentence clarification to the QueryMetrics
struct doc comment next to the recall_at_1 field (or the struct-level comment)
stating that for queries with more than one relevant document recall_at_1 is
bounded above by 1/|relevant| (so mean R@1 cannot reach 1.0 even if the top hit
is always relevant), and optionally note that MRR=1.00 implies top-hit relevance
while R@1 is constrained by the average number of relevant docs; update the
comment near QueryMetrics and the recall_at_1 reference accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dbaedfd5-9e5a-4435-bb3d-54cffc304752

📥 Commits

Reviewing files that changed from the base of the PR and between 2e4f192 and 23237de.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (65)
  • .github/workflows/installer-smoke.yml
  • AGENTS.md
  • Cargo.toml
  • docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md
  • docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md
  • docs/superpowers/specs/2026-04-22-life-capture-layer-design.md
  • scripts/fixtures/latest.json
  • scripts/install.sh
  • scripts/test_install.sh
  • src/core/all.rs
  • src/core/jsonrpc.rs
  • src/openhuman/agent/agents/archivist/prompt.rs
  • src/openhuman/agent/agents/code_executor/prompt.rs
  • src/openhuman/agent/agents/critic/prompt.rs
  • src/openhuman/agent/agents/help/prompt.rs
  • src/openhuman/agent/agents/integrations_agent/prompt.rs
  • src/openhuman/agent/agents/loader.rs
  • src/openhuman/agent/agents/morning_briefing/prompt.rs
  • src/openhuman/agent/agents/orchestrator/prompt.rs
  • src/openhuman/agent/agents/planner/prompt.rs
  • src/openhuman/agent/agents/researcher/prompt.rs
  • src/openhuman/agent/agents/summarizer/prompt.rs
  • src/openhuman/agent/agents/tool_maker/prompt.rs
  • src/openhuman/agent/agents/tools_agent/prompt.rs
  • src/openhuman/agent/agents/trigger_reactor/prompt.rs
  • src/openhuman/agent/agents/trigger_triage/prompt.rs
  • src/openhuman/agent/agents/welcome/prompt.rs
  • src/openhuman/agent/debug/mod.rs
  • src/openhuman/agent/harness/fork_context.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/types.rs
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/harness/subagent_runner/ops_tests.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/agent/prompts/mod_tests.rs
  • src/openhuman/agent/prompts/types.rs
  • src/openhuman/agent/triage/escalation.rs
  • src/openhuman/agent/triage/evaluator.rs
  • src/openhuman/curated_memory/mod.rs
  • src/openhuman/curated_memory/rpc.rs
  • src/openhuman/curated_memory/runtime.rs
  • src/openhuman/curated_memory/schemas.rs
  • src/openhuman/curated_memory/store.rs
  • src/openhuman/curated_memory/types.rs
  • src/openhuman/learning/prompt_sections.rs
  • src/openhuman/life_capture/embedder.rs
  • src/openhuman/life_capture/index.rs
  • src/openhuman/life_capture/migrations.rs
  • src/openhuman/life_capture/migrations/0001_init.sql
  • src/openhuman/life_capture/migrations/0002_vec.sql
  • src/openhuman/life_capture/mod.rs
  • src/openhuman/life_capture/quote_strip.rs
  • src/openhuman/life_capture/redact.rs
  • src/openhuman/life_capture/rpc.rs
  • src/openhuman/life_capture/runtime.rs
  • src/openhuman/life_capture/schemas.rs
  • src/openhuman/life_capture/tests/e2e.rs
  • src/openhuman/life_capture/tests/mod.rs
  • src/openhuman/life_capture/types.rs
  • src/openhuman/mod.rs
  • tests/agent_harness_public.rs
  • tests/fixtures/life_capture/corpus.json
  • tests/life_capture_retrieval_eval.rs

Comment thread AGENTS.md

1. **Specify against the current codebase** — Ground the design in **existing** domains, controller/registry patterns, and JSON-RPC naming (`openhuman.<namespace>_<function>`). Reuse or extend documented flows in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) and sibling guides; avoid parallel architectures.
-2. **Implement in Rust** — Add domain logic under `src/openhuman/<domain>/`, wire **schemas + registered handlers** into the shared registry, and land **unit tests** in the crate (`cargo test -p openhuman`, focused modules) until the feature is correct in isolation.
+2. **Implement in Rust** — Add domain logic under `src/openhuman/<domain>/`, wire **schemas + registered handlers** into the shared registry, and land **unit tests** in the crate (`cargo test -p openhuman_core`, focused modules) until the feature is correct in isolation.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether any Cargo.toml in the repo declares a package named `openhuman_core`,
# vs. just the lib target name. Also list all workspace members and package names.
fd -t f 'Cargo.toml' --hidden --exclude target | while read -r f; do
  echo "=== $f ==="
  awk '
    /^\[package\]/ {p=1; next}
    /^\[/ {p=0}
    p && /^[[:space:]]*name[[:space:]]*=/ {print "package.name:", $0}
  ' "$f"
  awk '
    /^\[lib\]/ {l=1; next}
    /^\[/ {l=0}
    l && /^[[:space:]]*name[[:space:]]*=/ {print "lib.name:", $0}
  ' "$f"
  awk '
    /^\[workspace\]/ {w=1; next}
    /^\[/ {w=0}
    w {print "workspace:", $0}
  ' "$f"
done

Repository: tinyhumansai/openhuman

Length of output: 247


Fix the incorrect cargo test command. Line 459 shows cargo test -p openhuman_core, but the repository only has packages named openhuman and OpenHuman. The -p flag requires a package name, not a lib target name. openhuman_core is the lib name within the openhuman package, so the command will fail with "package ID specification did not match any packages". Change to cargo test -p openhuman.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` at line 459, The README/guide currently instructs running "cargo
test -p openhuman_core" which is incorrect because openhuman_core is a lib
target, not a package; update the command to "cargo test -p openhuman" (or
remove -p entirely and use just "cargo test" if you prefer running tests for the
whole workspace) so the -p flag references the actual package name ("openhuman")
instead of the lib name ("openhuman_core"); change the occurrence of cargo test
-p openhuman_core to cargo test -p openhuman in the document text.

Comment thread Cargo.toml
Comment on lines 58 to +61
rusqlite = { version = "0.37", features = ["bundled"] }
r2d2 = "0.8"
r2d2_sqlite = "0.31"
sqlite-vec = "0.1"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What rusqlite version does r2d2_sqlite 0.31.0 depend on?

💡 Result:

r2d2_sqlite 0.31.0 depends on rusqlite ^0.37.



🏁 Script executed:

#!/bin/bash
# Confirm only one rusqlite version ends up in the lock graph and that
# it matches across rusqlite + r2d2_sqlite re-export.
if [ -f Cargo.lock ]; then
  echo "=== rusqlite versions in Cargo.lock ==="
  awk '/^\[\[package\]\]/{p=0} /name = "rusqlite"/{p=1} p' Cargo.lock | head -40
  echo "=== r2d2_sqlite version in Cargo.lock ==="
  awk '/^\[\[package\]\]/{p=0} /name = "r2d2_sqlite"/{p=1} p' Cargo.lock | head -10
fi

# Check call-site for extension loading hooks (load_extension / sqlite_vec::load).
rg -nP --type=rust -C2 '\b(load_extension|sqlite_vec::load|SqliteConnectionManager|with_init)\b'

Repository: tinyhumansai/openhuman

Length of output: 2079


🏁 Script executed:

#!/bin/bash
# Get full init closure context (more lines)
sed -n '85,105p' src/openhuman/life_capture/index.rs

# Check if sqlite_vec::load is called anywhere
rg -n 'sqlite_vec::load' --type=rust

# Check feature flags in Cargo.toml for bundled and load_extension
rg -n 'features|load_extension|loadable_extension' Cargo.toml

Repository: tinyhumansai/openhuman

Length of output: 4159


🌐 Web query:

Does sqlite-vec require explicit sqlite_vec::load() call or does it auto-register when loaded as an extension?

💡 Result:

sqlite-vec requires an explicit load call to register its functions and virtual tables. It does not auto-register when loaded as an extension. For direct SQLite CLI use, execute .load ./vec0 (or the path to vec0.so/dll) before creating vec0 virtual tables. Official repo sample: .load ./vec0 followed by create virtual table ... using vec0(...). Sources 1,17. In Python bindings (pip install sqlite-vec), call sqlite_vec.load(db) after db.enable_load_extension(True). Official Python docs explicitly state: "use the sqlite_vec.load function to load sqlite-vec SQL functions into a SQLite connection." Sources 2,10. In Rust bindings (cargo add sqlite-vec), explicitly register via unsafe { sqlite3_auto_extension(Some(std::mem::transmute(sqlite3_vec_init as *const ))); } using rusqlite's ffi. This uses SQLite's auto-extension mechanism but requires deliberate caller action (one-time global registration). No evidence of fully automatic registration without code. Official Rust example and docs confirm this explicit step. Sources 23,24. SQLite extensions generally require explicit loading via .load, load_extension, or equivalent API calls; they do not auto-register on mere presence. sqlite-vec follows this, with language-specific wrappers (like sqlite_vec::load) providing convenience but still requiring invocation. Source 3. No sources indicate auto-registration without explicit load/register call.



🌐 Web query:

Does rusqlite bundled feature support load_extension for dynamically loading SQLite extensions?

💡 Result:

Yes, the rusqlite "bundled" feature supports load_extension for dynamically loading SQLite extensions.



The sqlite-vec extension is not being loaded; vector queries will fail at runtime.

r2d2_sqlite = 0.31 correctly pairs with rusqlite = 0.37 (verified), and the bundled feature supports load_extension. However, the init closure at lines 89–91 only calls c.pragma_update(None, "query_only", "ON") and does not call sqlite_vec::load(&conn). The comment suggesting the extension "registers on the connection at open time" is incorrect—sqlite-vec requires an explicit load call and will not auto-register. Any attempt to query vector tables will fail at runtime because the extension is not available on any connection from the pool. Add sqlite_vec::load(&conn)?; to the init closure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Cargo.toml` around lines 58 - 61, The init closure used to initialize the
SQLite pool only sets PRAGMA via c.pragma_update (the closure that configures
connections) but never loads the sqlite-vec extension; add a call to
sqlite_vec::load(&conn)? inside that same init closure (where
c.pragma_update(None, "query_only", "ON") is invoked) so each new connection has
the extension registered (use the existing conn variable and propagate the error
as needed); this ensures sqlite_vec::load is called for r2d2_sqlite/rusqlite
pooled connections before they are returned.

Comment thread scripts/install.sh
Comment on lines 6 to +14
set -euo pipefail

# Allow tests to source this file without executing the install flow.
SOURCE_ONLY=0
for _arg in "$@"; do
if [[ "$_arg" == "--source-only" ]]; then
SOURCE_ONLY=1
fi
done

⚠️ Potential issue | 🟡 Minor

set -euo pipefail leaks to the caller when this file is sourced.

The --source-only guard at lines 8-14/361-364 only short-circuits the install body, but set -euo pipefail on line 6 already mutated the parent shell's options by the time the early return runs. A test that sources install.sh --source-only will silently inherit errexit/nounset/pipefail, which can change the behavior of subsequent test code (e.g. unset vars become fatal).

Consider either gating the set on SOURCE_ONLY or saving/restoring the prior shell options.

🛡️ Sketch of the gate
-set -euo pipefail
-
-# Allow tests to source this file without executing the install flow.
-SOURCE_ONLY=0
-for _arg in "$@"; do
-  if [[ "$_arg" == "--source-only" ]]; then
-    SOURCE_ONLY=1
-  fi
-done
+# Allow tests to source this file without executing the install flow.
+SOURCE_ONLY=0
+for _arg in "$@"; do
+  if [[ "$_arg" == "--source-only" ]]; then
+    SOURCE_ONLY=1
+  fi
+done
+if [[ "${SOURCE_ONLY}" != "1" ]]; then
+  set -euo pipefail
+fi

Also applies to: 361-364

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/install.sh` around lines 6 - 14, The top-level "set -euo pipefail"
leaks into the caller when this script is sourced; change the install guard so
that either (a) do not apply "set -euo pipefail" until after you check
SOURCE_ONLY (the --source-only loop) and only enable it when SOURCE_ONLY is not
set, or (b) save the caller's shell options (e.g., via "old_opts=$(set +o)" or
"old_status=$(set -o)" style) before applying "set -euo pipefail" and restore
them before returning; update the code around the SOURCE_ONLY check and the
install-body entry points (references: the SOURCE_ONLY variable, the
--source-only argument handling loop, and the top-level set -euo pipefail) so
sourcing with --source-only does not mutate the parent shell options.

Comment on lines +123 to +125
use crate::openhuman::providers::{
ChatRequest as PChatRequest, ChatResponse, Provider, ToolCall,
};

⚠️ Potential issue | 🔴 Critical

cargo fmt --check fails — collapse to single-line use.

The multi-line use fits within rustfmt's maximum line width, so rustfmt collapses it to a single line; CI Type Check is currently failing on this. The same fix is needed at the make_parent signature below (lines 211-214). As per coding guidelines: "Run cargo fmt and cargo check on all Rust code changes before merging".

🛠️ Proposed fix
-use crate::openhuman::providers::{
-    ChatRequest as PChatRequest, ChatResponse, Provider, ToolCall,
-};
+use crate::openhuman::providers::{ChatRequest as PChatRequest, ChatResponse, Provider, ToolCall};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs` around lines 123 -
125, The multi-line use declaration and the multi-line function signature cause
rustfmt failures; change the multi-line use of crate::openhuman::providers to a
single-line import (use crate::openhuman::providers::{ChatRequest as
PChatRequest, ChatResponse, Provider, ToolCall};) and collapse the make_parent
function signature (the make_parent definition spanning lines 211–214) into a
single line signature to match rustfmt expectations; update only the formatting,
keeping identifiers and types unchanged.

Comment on lines +211 to +214
fn make_parent(
provider: Arc<dyn Provider>,
tools: Vec<Box<dyn Tool>>,
) -> ParentExecutionContext {

⚠️ Potential issue | 🔴 Critical

cargo fmt --check fails — collapse make_parent signature.

Same root cause as the use block above; rustfmt fits this signature on one line.

🛠️ Proposed fix
-fn make_parent(
-    provider: Arc<dyn Provider>,
-    tools: Vec<Box<dyn Tool>>,
-) -> ParentExecutionContext {
+fn make_parent(provider: Arc<dyn Provider>, tools: Vec<Box<dyn Tool>>) -> ParentExecutionContext {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/subagent_runner/ops_tests.rs` around lines 211 -
214, The function signature for make_parent is currently split across multiple
lines causing rustfmt failure; collapse the signature to a single line so
rustfmt accepts it — update the declaration of fn make_parent to place provider:
Arc<dyn Provider>, tools: Vec<Box<dyn Tool>> and the return type ->
ParentExecutionContext all on one line (keeping the existing body unchanged) and
ensure the symbols make_parent, Provider, Tool, and ParentExecutionContext are
used exactly as in the file.

Comment on lines +31 to +39
pub async fn handle_add(
rt: &CuratedMemoryRuntime,
file: String,
entry: String,
) -> Result<RpcOutcome<Value>, String> {
let store = pick(&file, rt)?;
store.add(&entry).await.map_err(|e| format!("add: {e}"))?;
Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
}

⚠️ Potential issue | 🟡 Minor

Minor: handle_add should reject empty entry.

store.add("") will append ENTRY_SEP to a non-empty file, leaving a stray empty entry between separators. That entry can never be matched by remove(needle) (an empty entry contains no non-empty needle) and bloats the file toward char_limit. Rejecting it aligns with how replace/remove already validate a non-empty needle.

🛡️ Proposed validation
 pub async fn handle_add(
     rt: &CuratedMemoryRuntime,
     file: String,
     entry: String,
 ) -> Result<RpcOutcome<Value>, String> {
+    if entry.trim().is_empty() {
+        return Err("entry must not be empty".into());
+    }
     let store = pick(&file, rt)?;
     store.add(&entry).await.map_err(|e| format!("add: {e}"))?;
     Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
 }
📝 Committable suggestion


Suggested change
pub async fn handle_add(
rt: &CuratedMemoryRuntime,
file: String,
entry: String,
) -> Result<RpcOutcome<Value>, String> {
let store = pick(&file, rt)?;
store.add(&entry).await.map_err(|e| format!("add: {e}"))?;
Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
}
pub async fn handle_add(
rt: &CuratedMemoryRuntime,
file: String,
entry: String,
) -> Result<RpcOutcome<Value>, String> {
if entry.trim().is_empty() {
return Err("entry must not be empty".into());
}
let store = pick(&file, rt)?;
store.add(&entry).await.map_err(|e| format!("add: {e}"))?;
Ok(RpcOutcome::new(json!({ "file": file, "ok": true }), vec![]))
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/curated_memory/rpc.rs` around lines 31 - 39, The handle_add
function must reject empty entries: before calling pick(&file, rt) and
store.add(&entry).await, check that entry is not empty (e.g., entry.is_empty()
or entry.trim().is_empty() per project convention) and return an Err(String)
with a clear message (consistent with other handlers like remove/replace, e.g.,
format!("add: empty entry") or "entry must be non-empty") instead of appending a
stray separator; only call store.add when the entry is non-empty.

Comment on lines +12 to +34
#[derive(Clone)]
pub struct HostedEmbedder {
base_url: String,
api_key: String,
model: String,
http: reqwest::Client,
}

impl HostedEmbedder {
pub fn new(base_url: String, api_key: String, model: String) -> Self {
let http = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(30))
.connect_timeout(std::time::Duration::from_secs(10))
.build()
.unwrap_or_else(|_| reqwest::Client::new());
Self {
base_url,
api_key,
model,
http,
}
}
}

⚠️ Potential issue | 🟡 Minor

Hard-coded dim() = 1536 is a footgun when model is configurable.

HostedEmbedder::new(...) accepts an arbitrary model string (and runtime.rs reads it from OPENHUMAN_EMBEDDINGS_MODEL), but dim() is hard-coded. If a user sets the env var to e.g. text-embedding-3-large (3072) or text-embedding-ada-002 (1536), the dim() answer is decoupled from reality. The pre-flight check in rpc::handle_search (embedder.dim() != INDEX_VECTOR_DIM) will silently pass even when the actual returned vector won't fit the index, and the failure will surface only mid-request with the less-actionable "embedding at index 0 has dim 3072 (expected 1536)" error from embed_batch.

Either pin the model to one whose dim is known to match the index, or carry the dim as configuration:

♻️ Proposed fix — accept dim in the constructor
 pub struct HostedEmbedder {
     base_url: String,
     api_key: String,
     model: String,
+    dim: usize,
     http: reqwest::Client,
 }

 impl HostedEmbedder {
-    pub fn new(base_url: String, api_key: String, model: String) -> Self {
+    pub fn new(base_url: String, api_key: String, model: String, dim: usize) -> Self {
         let http = reqwest::Client::builder()
             .timeout(std::time::Duration::from_secs(30))
             .connect_timeout(std::time::Duration::from_secs(10))
             .build()
             .unwrap_or_else(|_| reqwest::Client::new());
         Self {
             base_url,
             api_key,
             model,
+            dim,
             http,
         }
     }
 }
@@
-    fn dim(&self) -> usize {
-        1536
-    }
+    fn dim(&self) -> usize {
+        self.dim
+    }

runtime.rs would then need a small model→dim resolver (or a third env var).

Also applies to: 106-108
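A minimal sketch of the model→dim resolver that runtime.rs would need, assuming the published dimensions of the OpenAI embeddings models; the function name `embedding_dim`, the `INDEX_VECTOR_DIM` constant, and the startup check are illustrative, not the project's actual API:

```rust
/// Map a known embeddings model name to its vector dimension.
/// Returns None for unknown models so the caller can fail fast at
/// startup instead of discovering the mismatch mid-request.
fn embedding_dim(model: &str) -> Option<usize> {
    match model {
        "text-embedding-3-small" => Some(1536),
        "text-embedding-ada-002" => Some(1536),
        "text-embedding-3-large" => Some(3072),
        _ => None,
    }
}

fn main() {
    // Hypothetical pre-flight check against the index's fixed dimension.
    const INDEX_VECTOR_DIM: usize = 1536;
    let model = "text-embedding-3-small"; // would come from OPENHUMAN_EMBEDDINGS_MODEL
    match embedding_dim(model) {
        Some(d) if d == INDEX_VECTOR_DIM => println!("ok: {model} -> {d}"),
        Some(d) => eprintln!("dim mismatch: {model} is {d}, index expects {INDEX_VECTOR_DIM}"),
        None => eprintln!("unknown embeddings model: {model}"),
    }
}
```

Returning `Option` (rather than a default) keeps the pre-flight check in rpc::handle_search honest: an unrecognized model fails loudly at configuration time instead of passing a stale hard-coded dim.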

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/embedder.rs` around lines 12 - 34, HostedEmbedder
currently hard-codes dim() to 1536 while new() accepts any model string, causing
mismatch with INDEX_VECTOR_DIM; update HostedEmbedder to carry the embedding
dimension as configuration (add a dim: usize field and expose a dim() -> usize
method), change HostedEmbedder::new(...) to accept a dim parameter (or a
Model→dim resolver in runtime.rs that maps OPENHUMAN_EMBEDDINGS_MODEL to the
correct dimension), and update sites that construct HostedEmbedder (runtime.rs)
and any checks in rpc::handle_search to use the new dim field so the pre-flight
comparison against INDEX_VECTOR_DIM is accurate; ensure embed_batch callers
still work with the actual returned dimension.

Comment on lines +1 to +336
use anyhow::Context;
use once_cell::sync::OnceCell;
use r2d2_sqlite::SqliteConnectionManager;
use rusqlite::{ffi, Connection, OpenFlags, OptionalExtension};
use std::path::Path;
use std::sync::Arc;
use tokio::sync::Mutex;
use tracing::{debug, trace};

/// r2d2 pool of read-only SQLite connections. Used for file-backed indexes
/// in production so WAL actually buys us concurrent readers; in-memory test
/// indexes keep the single-connection layout since a shared-cache URI adds
/// ceremony for no throughput win at test-fixture scale.
pub(crate) type ReaderPool = r2d2::Pool<SqliteConnectionManager>;

/// Personal index — SQLite database with FTS5 + sqlite-vec virtual tables loaded.
///
/// Internally split into a dedicated writer connection (serialised behind
/// `tokio::sync::Mutex`) and an optional pool of read-only connections. Writes
/// go through the writer (single-writer SQLite model); searches use the pool
/// when present so concurrent RPC callers don't block on each other or on a
/// long-running ingest. In-memory handles leave the pool `None` — both
/// readers and writers share the single connection, matching the pre-pool
/// behaviour tests depend on.
pub struct PersonalIndex {
/// Writer connection — always present. Also used as the sole reader when
/// `reader_pool` is `None` (in-memory handles).
pub writer: Arc<Mutex<Connection>>,
/// Optional read-only connection pool. `Some` for file-backed indexes;
/// `None` for in-memory ones so the writer's schema is visible to readers
/// without a shared-cache URI.
pub(crate) reader_pool: Option<Arc<ReaderPool>>,
}

static VEC_REGISTERED: OnceCell<()> = OnceCell::new();

/// Register `sqlite3_vec_init` as a SQLite auto-extension exactly once per process.
/// Every connection opened after this point loads the vec0 module automatically.
pub(crate) fn ensure_vec_extension_registered() {
VEC_REGISTERED.get_or_init(|| unsafe {
let init: unsafe extern "C" fn() = sqlite_vec::sqlite3_vec_init;
let entry: unsafe extern "C" fn(
*mut ffi::sqlite3,
*mut *mut std::os::raw::c_char,
*const ffi::sqlite3_api_routines,
) -> std::os::raw::c_int = std::mem::transmute(init as *const ());
let rc = ffi::sqlite3_auto_extension(Some(entry));
if rc != ffi::SQLITE_OK {
panic!("sqlite3_auto_extension(sqlite_vec) failed: rc={rc}");
}
});
}

/// Default pool size — small enough to stay within SQLite's default
/// connection fan-out without heroics, large enough that a handful of
/// concurrent RPC searches don't queue up behind each other while an
/// ingest holds the writer. Override is deliberately not exposed: the
/// numbers here are only loosely tuned and are best treated as an
/// implementation detail, not a knob.
const READER_POOL_SIZE: u32 = 4;

impl PersonalIndex {
/// Open (or create) the personal index at `path`. Loads sqlite-vec, runs
/// migrations on the writer, then spins up a read-only r2d2 pool against
/// the same file so searches can run concurrently with an in-flight
/// write.
pub async fn open(path: &Path) -> anyhow::Result<Self> {
ensure_vec_extension_registered();
let path_buf = path.to_path_buf();
let (writer, pool) =
tokio::task::spawn_blocking(move || -> anyhow::Result<(Connection, ReaderPool)> {
// Writer: exclusive RW connection, WAL journal so readers
// don't block on it, migrations run here.
let writer = Connection::open(&path_buf).context("open sqlite writer")?;
writer.pragma_update(None, "journal_mode", "WAL")?;
writer.pragma_update(None, "foreign_keys", "ON")?;
super::migrations::run(&writer).context("run life_capture migrations")?;

// Reader pool: read-only connections on the same file. Each
// pooled connection gets `query_only = 1` as a belt-and-
// suspenders guard so a mis-routed write here fails loudly
// instead of racing the writer. Foreign-key enforcement is
// writer-only in SQLite, so readers skip it.
//
// `SQLITE_OPEN_READ_ONLY` alone would also work, but keeping
// the connection RW + `query_only` lets sqlite-vec register
// on the connection at open time (the extension's own init
// writes to the sqlite_sequence table on first use).
let manager = SqliteConnectionManager::file(&path_buf).with_init(|c| {
c.pragma_update(None, "query_only", "ON")?;
Ok(())
});
let pool = r2d2::Pool::builder()
.max_size(READER_POOL_SIZE)
.build(manager)
.context("build reader pool")?;
Ok((writer, pool))
})
.await
.context("open task panicked")??;
Ok(Self {
writer: Arc::new(Mutex::new(writer)),
reader_pool: Some(Arc::new(pool)),
})
}

/// Open an in-memory index (for tests). Single connection shared between
/// reader and writer — keeps the fixture path free of shared-cache URI
/// ceremony that buys no real concurrency at test-fixture scale.
pub async fn open_in_memory() -> anyhow::Result<Self> {
ensure_vec_extension_registered();
let conn = tokio::task::spawn_blocking(|| -> anyhow::Result<Connection> {
let conn = Connection::open_with_flags(
":memory:",
OpenFlags::SQLITE_OPEN_READ_WRITE
| OpenFlags::SQLITE_OPEN_CREATE
| OpenFlags::SQLITE_OPEN_NO_MUTEX,
)
.context("open in-memory sqlite")?;
conn.pragma_update(None, "foreign_keys", "ON")?;
super::migrations::run(&conn).context("run life_capture migrations")?;
Ok(conn)
})
.await
.context("open task panicked")??;
Ok(Self {
writer: Arc::new(Mutex::new(conn)),
reader_pool: None,
})
}
}

/// Writer for the personal index. Upserts items keyed by (source, external_id)
/// so re-ingesting the same source row updates in place; vectors are written
/// separately via `upsert_vector` since embeddings come from a remote call.
pub struct IndexWriter {
conn: Arc<Mutex<Connection>>,
}

impl IndexWriter {
pub fn new(idx: &PersonalIndex) -> Self {
Self {
conn: Arc::clone(&idx.writer),
}
}

/// Upsert items by (source, external_id). FTS rows are kept in sync via
/// the SQL triggers defined in 0001_init.sql.
///
/// Mutates `items[i].id` to the canonical (existing) UUID when a row with
/// the same (source, external_id) already exists — so callers can write the
/// vector under the correct id without orphaning the previous one. Cascades
/// `item_vectors` deletes for any caller-supplied id that loses the race.
pub async fn upsert(
&self,
items: &mut [crate::openhuman::life_capture::types::Item],
) -> anyhow::Result<()> {
if items.is_empty() {
return Ok(());
}
// Clone across the blocking boundary, write canonical ids back below.
let mut owned = items.to_vec();
let conn = Arc::clone(&self.conn);
let result: anyhow::Result<Vec<crate::openhuman::life_capture::types::Item>> =
tokio::task::spawn_blocking(move || {
let mut guard = conn.blocking_lock();
let tx = guard.transaction().context("begin upsert tx")?;
for item in owned.iter_mut() {
let source = serde_json::to_value(item.source)
.ok()
.and_then(|v| v.as_str().map(str::to_string))
.unwrap_or_default();
let author_json = item
.author
.as_ref()
.map(|a| serde_json::to_string(a).unwrap_or_else(|_| "null".into()));
let metadata_json = serde_json::to_string(&item.metadata)
.unwrap_or_else(|_| "{}".into());
let new_id = item.id.to_string();

// Look up the existing canonical id, if any.
let existing_id: Option<String> = tx
.query_row(
"SELECT id FROM items WHERE source = ?1 AND external_id = ?2",
rusqlite::params![source, item.external_id],
|row| row.get(0),
)
.optional()
.context("lookup existing item by (source, external_id)")?;

match existing_id {
Some(canonical) => {
// Existing row wins. If the caller supplied a
// different id, drop any vector they may have
// already written under that wrong id.
if canonical != new_id {
tx.execute(
"DELETE FROM item_vectors WHERE item_id = ?1",
rusqlite::params![new_id],
)
.context("orphan-clean stale vector by caller id")?;
}
tx.execute(
"UPDATE items \
SET ts = ?1, author_json = ?2, subject = ?3, \
text = ?4, metadata_json = ?5 \
WHERE id = ?6",
rusqlite::params![
item.ts.timestamp(),
author_json,
item.subject.as_deref(),
item.text,
metadata_json,
canonical,
],
)
.context("update existing item row")?;
// Mutate the caller's item so subsequent
// upsert_vector writes under the canonical id.
item.id = uuid::Uuid::parse_str(&canonical)
.context("existing items.id is not a valid uuid")?;
}
None => {
tx.execute(
"INSERT INTO items(id, source, external_id, ts, author_json, subject, text, metadata_json) \
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8)",
rusqlite::params![
new_id,
source,
item.external_id,
item.ts.timestamp(),
author_json,
item.subject.as_deref(),
item.text,
metadata_json,
],
)
.context("insert new item row")?;
}
}
}
tx.commit().context("commit upsert tx")?;
Ok(owned)
})
.await
.context("upsert task panicked")?;

let updated = result?;
// Move canonical ids back into the caller's slice.
for (slot, item) in items.iter_mut().zip(updated.into_iter()) {
*slot = item;
}
Ok(())
}

/// Replace the vector for an item atomically: DELETE + INSERT inside a
/// single transaction so a failed INSERT doesn't permanently remove the
/// item's vector. (vec0 doesn't support ON CONFLICT on its primary key.)
pub async fn upsert_vector(&self, item_id: &uuid::Uuid, vector: &[f32]) -> anyhow::Result<()> {
let id = item_id.to_string();
let v_json = serde_json::to_string(vector).context("serialize vector")?;
let conn = Arc::clone(&self.conn);
tokio::task::spawn_blocking(move || -> anyhow::Result<()> {
let mut guard = conn.blocking_lock();
let tx = guard.transaction().context("begin upsert_vector tx")?;
let exists: i64 = tx
.query_row(
"SELECT COUNT(1) FROM items WHERE id = ?1",
rusqlite::params![id],
|row| row.get(0),
)
.context("check item exists")?;
if exists == 0 {
anyhow::bail!("cannot upsert vector for unknown item {id}");
}
tx.execute(
"DELETE FROM item_vectors WHERE item_id = ?1",
rusqlite::params![id],
)
.context("delete prior vector")?;
tx.execute(
"INSERT INTO item_vectors(item_id, embedding) VALUES (?1, ?2)",
rusqlite::params![id, v_json],
)
.context("insert vector")?;
tx.commit().context("commit upsert_vector tx")?;
Ok(())
})
.await
.context("upsert_vector task panicked")?
}
}

#[cfg(test)]
mod tests {
use super::*;

#[tokio::test]
async fn vec_extension_loads_and_reports_version() {
let idx = PersonalIndex::open_in_memory().await.expect("open");
let conn = idx.writer.lock().await;
let version: String = conn
.query_row("SELECT vec_version()", [], |row| row.get(0))
.expect("vec_version");
assert!(
version.starts_with('v'),
"unexpected vec_version: {version}"
);
}

#[tokio::test]
async fn vec0_table_accepts_insert_and_returns_match() {
let idx = PersonalIndex::open_in_memory().await.expect("open");
let conn = idx.writer.lock().await;

let v: Vec<f32> = (0..1536).map(|i| (i as f32) * 0.001).collect();
let v_json = serde_json::to_string(&v).unwrap();

conn.execute(
"INSERT INTO item_vectors(item_id, embedding) VALUES (?1, ?2)",
rusqlite::params!["00000000-0000-0000-0000-000000000001", v_json],
)
.expect("insert vec");

let id: String = conn
.query_row(
"SELECT item_id FROM item_vectors \
WHERE embedding MATCH ?1 \
ORDER BY distance LIMIT 1",
rusqlite::params![v_json],
|row| row.get(0),
)
.expect("vec MATCH query");
assert_eq!(id, "00000000-0000-0000-0000-000000000001");
}
}

🛠️ Refactor suggestion | 🟠 Major

File exceeds the ~500-line module-size guideline (≈927 lines).

This module bundles PersonalIndex setup, the writer, the reader, FTS sanitization, query post-filtering, and four test modules in one file. Splitting reduces cognitive load and keeps each unit sharply focused.

Suggested layout under src/openhuman/life_capture/:

  • index/mod.rs — re-exports + PersonalIndex, ensure_vec_extension_registered, ReaderPool, pool sizing.
  • index/writer.rsIndexWriter + writer-side helpers (the Source serialization, serde_json fallbacks).
  • index/reader.rsIndexReader, ItemRow, with_read_conn, fts5_quote, apply_query_filters.
  • Existing #[cfg(test)] mod blocks move next to the code they exercise.

As per coding guidelines: "Rust source files should be kept to approximately ≤500 lines per file; split modules when growing beyond this size".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/index.rs` around lines 1 - 336, The file is too
large and should be split into focused modules: keep the public types and
process-wide init in index/mod.rs (PersonalIndex,
ensure_vec_extension_registered, ReaderPool, READER_POOL_SIZE), move writer
logic into index/writer.rs (IndexWriter, its upsert and upsert_vector, and any
writer-side helpers like Source/serde_json fallbacks), move read/query logic
into index/reader.rs (IndexReader, ItemRow, with_read_conn, fts5_quote,
apply_query_filters and reader helpers), and relocate each #[cfg(test)] mod next
to the code it tests; update mod declarations and use statements accordingly so
all referenced symbols (PersonalIndex, IndexWriter, IndexReader, ItemRow,
with_read_conn, fts5_quote, apply_query_filters,
VEC_REGISTERED/ensure_vec_extension_registered) remain accessible.

Comment on lines +96 to +106
let api_key = std::env::var("OPENHUMAN_EMBEDDINGS_KEY")
.or_else(|_| std::env::var("OPENAI_API_KEY"))
.ok();
let Some(api_key) = api_key else {
log::info!(
"[life_capture] embedder not configured (set OPENAI_API_KEY or \
OPENHUMAN_EMBEDDINGS_KEY); life_capture.search will return \
'embedder not configured', life_capture.get_stats remains available"
);
return;
};

⚠️ Potential issue | 🟡 Minor

Minor: empty OPENHUMAN_EMBEDDINGS_KEY shadows OPENAI_API_KEY.

std::env::var(...) returns Ok("") when the variable is set but empty, so .or_else doesn't trigger. A user with OPENHUMAN_EMBEDDINGS_KEY= (e.g. exported with no value in a shell init) and a valid OPENAI_API_KEY will hit the auth-failure path instead of falling back. Filter out empty strings to make precedence intuitive:

🛡️ Proposed change
-    let api_key = std::env::var("OPENHUMAN_EMBEDDINGS_KEY")
-        .or_else(|_| std::env::var("OPENAI_API_KEY"))
-        .ok();
+    let api_key = std::env::var("OPENHUMAN_EMBEDDINGS_KEY")
+        .ok()
+        .filter(|s| !s.is_empty())
+        .or_else(|| std::env::var("OPENAI_API_KEY").ok())
+        .filter(|s| !s.is_empty());
📝 Committable suggestion


Suggested change
let api_key = std::env::var("OPENHUMAN_EMBEDDINGS_KEY")
.or_else(|_| std::env::var("OPENAI_API_KEY"))
.ok();
let Some(api_key) = api_key else {
log::info!(
"[life_capture] embedder not configured (set OPENAI_API_KEY or \
OPENHUMAN_EMBEDDINGS_KEY); life_capture.search will return \
'embedder not configured', life_capture.get_stats remains available"
);
return;
};
let api_key = std::env::var("OPENHUMAN_EMBEDDINGS_KEY")
.ok()
.filter(|s| !s.is_empty())
.or_else(|| std::env::var("OPENAI_API_KEY").ok())
.filter(|s| !s.is_empty());
let Some(api_key) = api_key else {
log::info!(
"[life_capture] embedder not configured (set OPENAI_API_KEY or \
OPENHUMAN_EMBEDDINGS_KEY); life_capture.search will return \
'embedder not configured', life_capture.get_stats remains available"
);
return;
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/runtime.rs` around lines 96 - 106, The current env
lookup treats an empty OPENHUMAN_EMBEDDINGS_KEY as a valid value and thus never
falls back to OPENAI_API_KEY; update the retrieval so empty strings are treated
as unset: call std::env::var("OPENHUMAN_EMBEDDINGS_KEY") and
std::env::var("OPENAI_API_KEY") but filter out empty results (e.g. map to Option
and .filter(|s| !s.is_empty()) or convert empty Ok("") into None) before
applying the precedence logic that assigns to the local api_key and the
subsequent let Some(api_key) = api_key else { ... } block so a blank
OPENHUMAN_EMBEDDINGS_KEY will correctly fall back to OPENAI_API_KEY.

Comment on lines +197 to +226
async fn run_query(reader: &IndexReader, q: &FixtureQuery) -> Result<Vec<String>> {
let hits = match q.kind {
QueryKind::Keyword => reader
.keyword_search(&q.text, q.k)
.await
.with_context(|| format!("keyword_search failed for {}", q.id))?,
QueryKind::Semantic => {
let v = embed(&q.text);
reader
.vector_search(&v, q.k)
.await
.with_context(|| format!("vector_search failed for {}", q.id))?
}
QueryKind::Mixed => {
let v = embed(&q.text);
let query = Query {
text: q.text.clone(),
k: q.k,
sources: q.sources.clone(),
since: q.since,
until: q.until,
};
reader
.hybrid_search(&query, &v)
.await
.with_context(|| format!("hybrid_search failed for {}", q.id))?
}
};
Ok(hit_ext_ids(&hits))
}

⚠️ Potential issue | 🟡 Minor

R@5 collapses to R@q.k when the query asks for fewer than 5 hits.

run_query only retrieves q.k results, and found_at clamps with n = k.min(hits_ext.len()). So for any fixture query with k < 5, the reported recall_at_5 is identical to recall_at_k — positions 4-5 are never inspected, even when more relevant items exist there. Today the baseline survives (R@5 > R@3 → at least some queries set k≥5), but as new queries are added with smaller k the per-kind/MEAN R@5 numbers will silently understate recall.

Decoupling the retrieval cutoff from the metric cutoff fixes this without changing the fixture:

♻️ Suggested fix in run_query
 async fn run_query(reader: &IndexReader, q: &FixtureQuery) -> Result<Vec<String>> {
+    // Always retrieve enough to score the largest reported R@k, regardless of q.k.
+    let retrieve_k = q.k.max(5);
     let hits = match q.kind {
         QueryKind::Keyword => reader
-            .keyword_search(&q.text, q.k)
+            .keyword_search(&q.text, retrieve_k)
             .await
             .with_context(|| format!("keyword_search failed for {}", q.id))?,
         QueryKind::Semantic => {
             let v = embed(&q.text);
             reader
-                .vector_search(&v, q.k)
+                .vector_search(&v, retrieve_k)
                 .await
                 .with_context(|| format!("vector_search failed for {}", q.id))?
         }
         QueryKind::Mixed => {
             let v = embed(&q.text);
             let query = Query {
                 text: q.text.clone(),
-                k: q.k,
+                k: retrieve_k,
                 sources: q.sources.clone(),
                 since: q.since,
                 until: q.until,
             };
             reader
                 .hybrid_search(&query, &v)
                 .await
                 .with_context(|| format!("hybrid_search failed for {}", q.id))?
         }
     };
     Ok(hit_ext_ids(&hits))
 }

Note that this changes what must_contain_in_top and must_not_contain_in_top see — they should keep using q.k (already do, via q.must_contain_in_top.unwrap_or(q.k)), so the assertions remain query-budget-faithful while the metrics get an honest top-5.

Also applies to: 318-343
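For reference, the metrics this eval aggregates can be sketched as below; the helper names are hypothetical (the actual aggregator lives in score_query), graded relevance is treated as binary for recall, and MRR takes the first relevant hit:

```rust
/// Recall@k: fraction of the labeled relevant ids that appear in the top-k hits.
fn recall_at_k(hits: &[&str], relevant: &[&str], k: usize) -> f64 {
    if relevant.is_empty() {
        return 0.0; // convention for unlabeled queries (assumption)
    }
    let top = &hits[..k.min(hits.len())];
    let found = relevant.iter().filter(|r| top.contains(r)).count();
    found as f64 / relevant.len() as f64
}

/// Per-query MRR contribution: 1/rank of the first relevant hit, else 0.
fn reciprocal_rank(hits: &[&str], relevant: &[&str]) -> f64 {
    hits.iter()
        .position(|h| relevant.iter().any(|r| r == h))
        .map(|i| 1.0 / (i as f64 + 1.0))
        .unwrap_or(0.0)
}

fn main() {
    // "a" and "c" are relevant; retrieval returned b, a, c in that order.
    let hits = ["b", "a", "c"];
    let rel = ["a", "c"];
    println!(
        "R@1={} R@3={} RR={}",
        recall_at_k(&hits, &rel, 1),
        recall_at_k(&hits, &rel, 3),
        reciprocal_rank(&hits, &rel)
    ); // prints: R@1=0 R@3=1 RR=0.5
}
```

This also makes the comment's point concrete: recall_at_k only ever inspects `hits[..k.min(hits.len())]`, so if retrieval returns fewer than 5 hits, R@5 degenerates to R@len(hits) regardless of the cutoff passed in.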

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/life_capture_retrieval_eval.rs` around lines 197 - 226, run_query
currently requests only q.k hits so R@5 collapses to R@q.k for queries with k<5;
change the search call to request a fetch_k = q.k.max(5) (use fetch_k when
calling reader.keyword_search, vector_search and hybrid_search) and still return
hit_ext_ids(&hits) (leave any budget-aware assertions like must_contain_in_top
using q.k unchanged). Apply the same change in the other occurrence identified
around lines 318-343 (use fetch_k = q.k.max(5) for retrievals, but keep
downstream checks that rely on q.k).
