feat(life_capture): A1 iMessage ingest controller #818
jwalin-shah wants to merge 59 commits into tinyhumansai:main
Conversation
📝 Walkthrough
Adds foundational life-capture and curated-memory subsystems plus people/eventkit domains, wiring index/embedder/ingest/storage into server startup and agent prompt contexts; also adds an installer resolver test and macOS permissions, many new modules, schemas, RPC handlers, and extensive tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Gmail as Gmail Webhook
    participant Composio as ComposioBridge
    participant Embedder as HostedEmbedder
    participant Index as PersonalIndex
    participant Writer as IndexWriter
    participant RPC as life_capture.rpc
    Gmail->>Composio: Trigger event (messageId, payload)
    Composio->>Composio: Normalize → Item{source,external_id,ts,subject,text,metadata}
    Composio->>Embedder: embed_batch([text])
    Embedder-->>Composio: vector
    Composio->>Writer: upsert(Item)
    Writer->>Index: write/transaction (dedupe by source+external_id)
    Index-->>Writer: item_id, replaced
    Writer-->>Composio: upsert result
    Composio->>RPC: return ingest outcome (item_id,replaced)
```
```mermaid
sequenceDiagram
    participant User as Agent Session
    participant Agent as Agent
    participant Snap as CuratedSnapshot
    participant Prompt as PromptBuilder
    participant Reader as IndexReader
    participant Index as PersonalIndex
    User->>Agent: start conversation (first turn)
    Agent->>Snap: ensure_curated_snapshot()
    Snap-->>Agent: optional snapshot
    Agent->>Prompt: build(PromptContext{curated_snapshot})
    Prompt->>Reader: hybrid_search(query, snapshot-aware)
    Reader->>Index: read FTS + vectors (frozen snapshot used for user files)
    Index-->>Reader: hits
    Reader-->>Agent: inject citations
    Agent->>User: respond with cited items
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Drafts the wedge for OpenHuman: continuous ingestion of Gmail/Calendar/Slack/iMessage into a local personal_index.db, with a Today brief and retrieval-tuned chat as the two surfaces. Three composable privacy modes (Convenience / Hybrid / Fully local) share one code path. Track 1 unblocks the ship pipeline; Track 2 builds the spine over 4 weeks. Reuses ingestion logic from ~/projects/inbox.
Two implementation plans drafted from the life-capture spec:
- Track 1 (ship pipeline): fix Ubuntu installer smoke, land in-flight PRs tinyhumansai#806/tinyhumansai#786/tinyhumansai#788/tinyhumansai#797, wire Tauri auto-updater + signed Mac/Windows builds.
- Life-Capture #1 (foundation): SQLite + sqlite-vec personal_index.db, Embedder trait + HostedEmbedder (OpenAI), PII redaction, quoted-thread stripping, hybrid retrieval (vector + keyword + recency), controller schema + RPC. End-to-end test with synthetic items.
No ingestion or UI yet — those are subsequent milestone plans.
Refactors scripts/install.sh to expose resolve_asset_url and verify_asset_reachable. Adds scripts/test_install.sh that exercises the resolver against a committed fixture latest.json. Failures now report the resolved URL and the parse error instead of dying silently.
…le, keyword projection column
Adopts the Hermes pattern (NousResearch/hermes-agent, MIT) of an
agent-writable, char-bounded curated memory file pair. F15 builds the
char-bounded store with atomic writes; F16 wires a session-start
snapshot into the system prompt (preserves prefix cache); F17 exposes
memory.{add,replace,remove,read} through the controller dispatch so
both the agent loop and skills can use it.
Sits between TinyHumans synthesis (volatile) and personal_index.db
(raw events) — fills the "deliberately curated facts" gap that
neither covered.
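The char-bounded store described above can be sketched in a few lines. This is an illustrative simplification, not the real F15 API: the struct and method names are invented here, only the Hermes character budgets (2200 for MEMORY.md, 1375 for USER.md) and the reject-rather-than-truncate and empty-needle-guard behaviors come from the commits.

```rust
/// Hypothetical sketch of a char-bounded curated-memory store (names invented;
/// limits from the commit text: MEMORY.md 2200 chars, USER.md 1375 chars).
pub struct CuratedStore {
    content: String,
    max_chars: usize,
}

impl CuratedStore {
    pub fn new(max_chars: usize) -> Self {
        Self { content: String::new(), max_chars }
    }

    /// Append one entry as its own line; fail loudly instead of truncating.
    pub fn add(&mut self, entry: &str) -> Result<(), String> {
        let added = entry.chars().count() + 1; // +1 for the newline
        if self.content.chars().count() + added > self.max_chars {
            return Err(format!("entry would exceed the {}-char budget", self.max_chars));
        }
        self.content.push_str(entry);
        self.content.push('\n');
        Ok(())
    }

    /// Drop every entry (line) containing the needle. An empty needle matches
    /// everything, so it is rejected up front (the guard later hardened in review).
    pub fn remove(&mut self, needle: &str) -> Result<usize, String> {
        if needle.is_empty() {
            return Err("empty needle".into());
        }
        let before = self.content.lines().count();
        self.content = self
            .content
            .lines()
            .filter(|l| !l.contains(needle))
            .map(|l| format!("{l}\n"))
            .collect();
        Ok(before - self.content.lines().count())
    }
}
```

The real store adds atomic writes and a substring `replace`; the point here is only the budget check and the loud failure mode.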
Adds src/openhuman/life_capture/ module tree (stubs for embedder, index, migrations, quote_strip, redact, rpc, schemas, types, plus tests/) per Foundation Plan F1. Wires the module in src/openhuman/mod.rs. Adds sqlite-vec to deps and httpmock as a dev-dep. Reuses existing rusqlite/regex/once_cell/async-trait/tempfile already in tree (Option C — avoids sqlx/rusqlite libsqlite3-sys conflict).
…egex
Previously the country-code group required at least one digit (\+?\d{1,3}),
so '(415) 555-0123' (no leading country code) wouldn't match. Wrapped the
whole prefix in a non-capturing optional group so parenthesized area codes
match without a leading country code.
…te, char-bounded)
… table Loads sqlite_vec via sqlite3_auto_extension exactly once per process so every new connection picks up the vec0 module. PersonalIndex wraps a single rusqlite Connection in Arc<Mutex<_>> for async sharing; open() runs migrations and sets WAL+foreign_keys, open_in_memory() is for tests. vec_version() round-trip and a 1536-dim INSERT + MATCH query both verified.
…_id) dedupe Adds IndexWriter with upsert (ON CONFLICT(source, external_id) DO UPDATE — FTS stays in sync via triggers) and upsert_vector (DELETE + INSERT since vec0 has no ON CONFLICT for virtual table primary keys). Both wrap rusqlite work in spawn_blocking and share the PersonalIndex's Arc<Mutex<Connection>>. Exposes ensure_vec_extension_registered() crate-wide so the migrations test can register vec0 before opening its in-memory connection.
- keyword_search: FTS5 ranked by negated bm25 with snippet markers (« » …), plus the user:local ACL EXISTS clause so the same query shape works for the multi-token team v2 ACL without rewrites.
- vector_search: sqlite-vec MATCH with k = ? clause (vec0 KNN requirement) and ORDER BY distance; score = 1/(1 + distance) so callers can blend it with keyword bm25 on the same monotonic scale.
Shared ItemRow + into_hit() helper lets both queries reuse the same row shape; rusqlite query_map closures hand-build it because rusqlite has no sqlx FromRow equivalent.
Pulls oversampled candidates (k*3 min 20) from both keyword and vector legs, normalises each independently to [0,1], then re-ranks with 0.55 * vec_norm + 0.35 * kw_norm + 0.10 * exp(-age_days/30). Documents present in only one leg get 0 for the missing signal but still compete on the others; only items with neither signal are dropped. Same-vector twin items now break their tie by recency.
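The fusion rule above can be restated as a small, dependency-free sketch. The weights (0.55/0.35/0.10) and the 30-day recency decay come from the commit text; the `Candidate` struct and function names are invented for illustration and the real re-ranker may normalise differently.

```rust
/// Illustrative sketch of the hybrid re-rank described above (names invented).
struct Candidate {
    id: u64,
    vec_score: Option<f64>, // 1/(1+distance); None if absent from the vector leg
    kw_score: Option<f64>,  // negated bm25; None if absent from the keyword leg
    age_days: f64,
}

/// Min-max normalise one leg's scores to [0, 1]; a missing signal scores 0.
fn normalise(scores: &[Option<f64>]) -> Vec<f64> {
    let present: Vec<f64> = scores.iter().flatten().copied().collect();
    let (min, max) = present
        .iter()
        .fold((f64::MAX, f64::MIN), |(lo, hi), &s| (lo.min(s), hi.max(s)));
    scores
        .iter()
        .map(|s| match s {
            Some(v) if max > min => (v - min) / (max - min),
            Some(_) => 1.0, // all present scores identical in this leg
            None => 0.0,
        })
        .collect()
}

/// 0.55 * vec + 0.35 * kw + 0.10 * exp(-age_days / 30), descending.
/// Items with neither signal are dropped, matching the commit's behavior.
fn fuse(cands: &[Candidate]) -> Vec<(u64, f64)> {
    let vecs = normalise(&cands.iter().map(|c| c.vec_score).collect::<Vec<_>>());
    let kws = normalise(&cands.iter().map(|c| c.kw_score).collect::<Vec<_>>());
    let mut out: Vec<(u64, f64)> = cands
        .iter()
        .enumerate()
        .filter(|(_, c)| c.vec_score.is_some() || c.kw_score.is_some())
        .map(|(i, c)| (c.id, 0.55 * vecs[i] + 0.35 * kws[i] + 0.10 * (-c.age_days / 30.0).exp()))
        .collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

With this formula, two items carrying the same vector score differ only in the recency term, which is exactly the twin-tie-break behavior the commit describes.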
…rch)
Adds two controllers exposed via the registry:
- life_capture.get_stats — total items, per-source counts, last-ingest ts.
- life_capture.search — hybrid (vector + keyword + recency) search with embed-then-rank, optional k (default 10, capped at 100).
Runtime state (PersonalIndex + dyn Embedder) lives in life_capture::runtime behind a tokio OnceCell because handlers are stateless fn(Map<String,Value>) and have no per-call context object. F14 will call runtime::init() at app startup; until then handlers return a structured 'not initialised' error so the failure mode is loud, not silent. Schemas registered in core/all.rs alongside cron.
… → search) FakeEmbedder hashes input bytes into a deterministic sparse 1536-dim vector so the same text round-trips identically through vec0 — keeps the e2e test hermetic, no network call. Verifies that: - quote-strip drops the 'On … wrote:' block before indexing, - redact masks the email address before indexing, - the upserted item resurfaces via hybrid_search with both transformations preserved on the returned text.
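The hermetic-test idea above (hash input bytes into a deterministic sparse 1536-dim vector) can be sketched with only the standard library. The hashing scheme here is invented for illustration; the real FakeEmbedder may derive its vector differently, but the property being tested is the same: identical text always yields an identical embedding, with no network call.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Matches the fixed width of the sqlite-vec column in this design.
const DIM: usize = 1536;

/// Illustrative FakeEmbedder: a few salted hashes of the input pick sparse
/// (index, weight) pairs, so the same text round-trips identically through vec0.
fn fake_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for salt in 0u64..8 {
        let mut h = DefaultHasher::new();
        salt.hash(&mut h);
        text.as_bytes().hash(&mut h);
        let x = h.finish();
        // Low bits choose the dimension, high bits choose a weight in [-0.5, 0.5).
        v[(x as usize) % DIM] = ((x >> 32) as u32 as f32 / u32::MAX as f32) - 0.5;
    }
    v
}
```

Because the vector depends only on the bytes, the end-to-end test can assert that the quote-stripped, redacted text resurfaces through hybrid search without ever calling a hosted embedder.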
After Config::load_or_init succeeds, open the personal_index.db at config.workspace_dir and register the life_capture runtime. Embedder is env-gated:
- OPENHUMAN_EMBEDDINGS_KEY > OPENAI_API_KEY
- OPENHUMAN_EMBEDDINGS_URL (default: https://api.openai.com/v1)
- OPENHUMAN_EMBEDDINGS_MODEL (default: text-embedding-3-small)
If no key is set we still open the index file (so Plan #2's ingestion worker can write to it) but skip runtime registration — controllers then return the 'not initialised' error from runtime::get() instead of silently calling a misconfigured embedder. Config-schema integration deferred to a follow-up; env-driven keeps F14 non-invasive while we land the rest of the foundation.
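The precedence and defaults above reduce to a small pure function. In the real startup these values come from std::env; parameterising them (with an invented struct and function name) makes the resolution order testable without touching the process environment.

```rust
/// Illustrative sketch of the env-gated embedder resolution (names invented;
/// precedence and defaults are from the commit text).
struct EmbedderConfig {
    key: String,
    url: String,
    model: String,
}

fn resolve_embedder(
    openhuman_key: Option<String>, // OPENHUMAN_EMBEDDINGS_KEY
    openai_key: Option<String>,    // OPENAI_API_KEY
    url: Option<String>,           // OPENHUMAN_EMBEDDINGS_URL
    model: Option<String>,         // OPENHUMAN_EMBEDDINGS_MODEL
) -> Option<EmbedderConfig> {
    // OPENHUMAN_EMBEDDINGS_KEY wins over OPENAI_API_KEY; no key at all means
    // no embedder is registered (the index file is still opened by the caller).
    let key = openhuman_key.or(openai_key)?;
    Some(EmbedderConfig {
        key,
        url: url.unwrap_or_else(|| "https://api.openai.com/v1".into()),
        model: model.unwrap_or_else(|| "text-embedding-3-small".into()),
    })
}
```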
…ection snapshot_pair(memory, user) returns a MemorySnapshot containing plain strings — no reference back to the stores — so taking a snapshot at session start and reusing it across turns gives a stable system-prompt prefix and lets the LLM prefix-cache hit on every subsequent turn. Plan #1 also calls for wiring this into chat.rs' OpenClaw context loader, but openhuman assembles agent prompts per-agent under src/openhuman/agent/agents/*/prompt.rs (not the plan's stale src-tauri/src/commands/chat.rs path); the prompt-builder integration is deferred to Plan #2 alongside the agent-context refactor.
… controllers
Adds four controllers under the memory_curated namespace (the 'memory' namespace is already owned by the long-term memory subsystem):
- memory_curated.read — read MEMORY.md or USER.md
- memory_curated.add — append entry, char-bounded
- memory_curated.replace — substring replace, char-bounded
- memory_curated.remove — drop entries containing a needle
'file' input is a typed enum (memory|user) so adapters reject anything else at validation time. Runtime state lives in curated_memory::runtime behind a tokio OnceCell, mirroring life_capture's pattern: startup in jsonrpc.rs constructs both stores at <workspace>/memories/ with Hermes' char limits (memory: 2200, user: 1375) and registers the runtime. Handlers return 'not initialised' until init runs, so failure is loud.
…ings
Findings from Codex + Gemini second-opinion review:
1. IndexWriter::upsert orphaned vectors on re-ingest (Codex). On (source, external_id) conflict the row's id was kept but the caller's fresh UUID was used for upsert_vector — the vector wrote under an id no row joined to. Fix: explicit SELECT-then-UPDATE-or-INSERT in the same transaction; mutates caller's Item.id to the canonical id (so the next upsert_vector lands on the right row) and orphan-deletes any vector already written under the wrong id. Signature change: upsert(&[Item]) -> upsert(&mut [Item]).
2. upsert_vector DELETE+INSERT was outside a transaction (Gemini) — a failed INSERT permanently lost the item's vector. Now wrapped in tx.
3. Runtime over-gated get_stats on the embedder (Codex). Split runtime into separate INDEX + EMBEDDER OnceCells: get_stats only requires index, search requires both. Index initialises whenever the workspace dir is reachable; embedder is env-gated as before.
4. Startup race (both reviewers): runtime init lived inside the post-serve tokio::spawn block, so axum::serve was already accepting before the OnceCells were set. Hoisted both bootstraps (curated_memory and life_capture) into helpers called inline before axum::serve.
5. runtime::get error message lied (Codex) — said 'set embeddings.api_key in config' but startup actually reads env vars. Fixed text.
Bonus: rename controller namespace memory_curated -> curated_memory (Codex preference; nothing depends on it yet so renamed before clients do). Adds regression test for #1 (reingest_with_fresh_uuid_keeps_vector_findable). 4696 lib tests pass.
…emory namespace rename Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.
Critical:
- curated_memory/store.rs: propagate read errors instead of `unwrap_or_default`, which could rewrite MEMORY.md / USER.md from an empty baseline on transient I/O or permission failures. Reject empty needle in replace/remove.
- curated_memory/rpc.rs: belt-and-suspenders empty-needle guard at the RPC boundary (remove with "" deletes every entry).
- life_capture/index.rs: upsert_vector rejects orphan ids (items row missing), which would have inserted vectors that never join in vector_search.
- life_capture/embedder.rs: validate response length matches input, indices are contiguous 0..n, and every vector matches dim() — prevents silent misalignment or wrong-dimensional vectors flowing into the 1536-wide sqlite-vec table. Added 30s request + 10s connect timeouts.
Retrieval hardening:
- life_capture/index.rs: keyword_search now escapes FTS5 operators by tokenizing on whitespace and wrapping each token as a quoted literal. Prevents errors or unintended matches from stray quotes, AND/OR/NEAR, column specifiers in user input.
- life_capture/index.rs: hybrid_search applies q.sources / q.since / q.until via post-filter (single consistent pass across keyword+vector fusion).
- life_capture/rpc.rs: handle_search validates embedder dim and response length against the fixed 1536-wide index column — clear RPC error instead of a cryptic sqlite-vec failure when the embeddings model is swapped.
Docs:
- docs/event-bus.md: remove erroneous `.await` on register_native_global (it is sync; the handler closure is async).
- life-capture design spec: soften PII redaction claim — only regex is implemented today; light NER is flagged as a future enhancement.
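The FTS5 escaping described above (tokenize on whitespace, wrap each token as a quoted literal) is simple enough to sketch. This is an assumed shape, not the shipped function: FTS5 string literals are double-quoted and embedded quotes are escaped by doubling them, which is what the sketch does.

```rust
/// Illustrative sketch of the keyword_search hardening: treat user input as
/// plain literals so AND/OR/NEAR, stray quotes, and column: specifiers cannot
/// be interpreted as FTS5 query syntax. (Function name invented.)
fn escape_fts5_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|tok| format!("\"{}\"", tok.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        .join(" ")
}
```

A query like `meeting AND "budget` becomes `"meeting" "AND" ""budget` quoted per token, so FTS5 matches the words rather than erroring or applying boolean operators.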
Rendered prompts now surface runtime curated-memory writes: - `PromptContext` gains `curated_snapshot: Option<&MemorySnapshot>`. - `UserFilesSection` prefers the snapshot over the workspace-file loader when one is attached, and injects `USER.md` alongside `MEMORY.md` using a byte-compatible `inject_snapshot_content` helper. - `Session` carries `Option<Arc<MemorySnapshot>>`, populated by `ensure_curated_snapshot` on the first turn from `curated_memory::runtime::get()`. Reused across turns so prompt bytes stay frozen (KV-cache prefix contract) while mid-session `curated_memory.add/replace/remove` writes land on the NEXT session. - `ParentExecutionContext` inherits the snapshot so sub-agents render identical `MEMORY.md`/`USER.md` blocks as the parent. - Legacy workspace-file fallback preserved for embeds that don't initialise the curated-memory runtime (pure unit tests).
…ctness (A7 review blockers)
Block lifetime (calendar.rs, reminders.rs): replace the UB cast `&*block as *const _ as *mut _` with `RcBlock::as_ptr(&block).cast()`. `RcBlock::as_ptr` returns a proper `*mut Block<F>` owned by the RcBlock; holding `block` alive until after `blocking_recv()` returns guarantees the block outlives EventKit's retention window with no UB.

EKEventStore caching (runtime.rs, calendar.rs, reminders.rs): add a process-global `Arc<Mutex<EventStoreHandle>>` singleton via `OnceLock`. `EventStoreHandle` is a newtype over `Retained<EKEventStore>` with manual `Send + Sync` — EKEventStore is Apple-documented as thread-safe and is always guarded by the Mutex at call sites. `fetch_events` and `save_reminder` now accept the cached store instead of constructing their own, matching Apple's one-store-per-process recommendation. runtime::get() is also changed from panic to `Result<_, String>` so RPC callers receive a clean error instead of a core crash on DB open/migration failure.

Dedupe correctness (calendar.rs): chose option (a) — skip events that lack `calendarItemExternalIdentifier` entirely (log at `info`). The fallback to the local `calendarItemIdentifier` caused re-emission: before iCloud syncs, the local id yields hash A; after sync the external id yields hash B — same event appears twice. Skipping once is safer than silently duping; the log makes the miss rate observable.

Validation (rpc.rs): `handle_create_reminder` now rejects empty/blank title, priority > 9, and malformed RFC 3339 due_date with structured error strings. Past due_date logs a warn but does not reject (lenient for clock skew).
Tests added:
- dedup: `no_dupe_on_resync_with_same_external_id` + `different_external_ids_stored_separately`
- runtime: `get_error_on_bad_path` (no panic on bad DB path)
- rpc validation: empty title, whitespace title, priority > 9, accepts priority 9, malformed due_date, valid future due_date, past due_date warn-not-reject
- non-macOS stubs: `list_events` and `create_reminder` stub tests compile but are only active on non-macOS targets; macOS CI runs the live path
All 20 eventkit tests pass. `cargo check --bin openhuman-core` is clean.
# Conflicts:
#	Cargo.lock
#	Cargo.toml
Moves the inline #[cfg(test)] mod tests block out of prompts/mod.rs into prompts/tests.rs to bring the production file under the project's <=500 line guideline and reduce mod.rs weight per the "light mod.rs" rule. No behavior change.
Moves the inline #[cfg(test)] mod tests block out of channel.rs into channel_tests.rs to bring the production file under the project's <=500 line guideline. No behavior change.
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/openhuman/agent/triage/escalation.rs (1)
125-176: ⚠️ Potential issue | 🔴 Critical
Escalated sub-agents will receive `None` for curated_snapshot and never populate it.
`dispatch_target_agent` builds a fresh Agent via `Agent::from_config()`, which initializes `curated_snapshot: None` (the session builder is synchronous and defers population to the first turn via `ensure_curated_snapshot()`). The snapshot is captured at line 175 before `run_subagent()` executes. However, `run_subagent()` dispatches to either `run_typed_mode` or `run_fork_mode`, neither of which executes a full session turn — they only run a provider loop. Therefore, `ensure_curated_snapshot()` is never called, and the escalated sub-agent will never have access to curated memory, unlike normal subagent dispatch where the parent has already populated its snapshot during turn execution.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/triage/escalation.rs` around lines 125 - 176, dispatch_target_agent builds an Agent via Agent::from_config() but captures curated_snapshot before it can be populated, so escalated sub-agents get None; call Agent::ensure_curated_snapshot().await (or the synchronous accessor that forces population) on the newly built Agent before constructing ParentExecutionContext so curated_snapshot is populated, then use agent.curated_snapshot() for the ParentExecutionContext; update dispatch_target_agent to await that population step prior to calling run_subagent/run_typed_mode/run_fork_mode.
app/src-tauri/src/imessage_scanner/mod.rs (2)
3-6: ⚠️ Potential issue | 🟡 Minor
Update the scanner rustdoc for life-capture ingest.
The module docs still describe `openhuman.memory_doc_ingest` and the old playbook convention, but the scanner now writes to `openhuman.life_capture_ingest`. Based on learnings, new/changed behavior must ship with matching rustdoc / code comments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src-tauri/src/imessage_scanner/mod.rs` around lines 3 - 6, The module-level rustdoc in imessage_scanner (top of mod.rs) is outdated: it references `openhuman.memory_doc_ingest` and the old playbook convention; update the documentation to reflect current behavior by replacing mentions of `openhuman.memory_doc_ingest` with `openhuman.life_capture_ingest`, remove or update the reference to `docs/webview-integration-playbook.md`/WhatsApp convention, and briefly describe that the scanner emits one `openhuman.life_capture_ingest` JSON-RPC call per `(chat_identifier, day)` group (read-only from `~/Library/Messages/chat.db`) so the comments match the actual runtime behavior.
436-443: ⚠️ Potential issue | 🟠 Major
Treat JSON-RPC errors as ingest failures before advancing the cursor.
JSON-RPC controller failures can arrive as HTTP 200 with an `"error"` envelope. This currently logs success and lets the tick advance the cursor, so failed life-capture writes can be skipped permanently. Also avoid logging `key`, since it can include a phone number or email chat identifier.

Suggested envelope check

```diff
-    let res = http_client().post(&url).json(&body).send().await?;
+    let res = http_client().post(&url).json(&body).send().await?;
+    let status = res.status();
+    let response_body = res.text().await?;

-    if !res.status().is_success() {
-        anyhow::bail!("core rpc {}: {}", res.status(), res.text().await?);
+    if !status.is_success() {
+        anyhow::bail!("core rpc {}: {}", status, response_body);
+    }
+
+    let env: serde_json::Value = serde_json::from_str(&response_body)?;
+    if let Some(error) = env.get("error") {
+        anyhow::bail!("life_capture_ingest rpc error: {}", error);
     }

-    log::info!("[imessage] life_capture ingest ok key={}", key);
+    log::info!("[imessage] life_capture ingest ok");
```

As per coding guidelines, never log secrets or full PII — redact.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src-tauri/src/imessage_scanner/mod.rs` around lines 436 - 443, The HTTP response handling after http_client().post(&url).json(&body).send().await? (res) must detect JSON-RPC error envelopes even when HTTP 200, so parse the response body as JSON (e.g., serde_json::Value) and if it contains an "error" field (non-null) treat it as a failure (anyhow::bail!) instead of logging success and advancing the cursor; also remove logging of the raw key value in the success log (redact or omit it) so life_capture ingest success uses a non-PII message and failures include the error details from the JSON envelope.
src/openhuman/agent/prompts/mod.rs (1)
718-720: ⚠️ Potential issue | 🟠 Major
Thread the curated snapshot into the direct sub-agent renderer too.
`UserFilesSection` now honors `ctx.curated_snapshot`, but this renderer still re-reads `MEMORY.md` from disk. Sub-agent prompts can diverge mid-session after curated-memory writes, violating the session-frozen prompt/cache contract documented above.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/prompts/mod.rs` around lines 718 - 720, The sub-agent renderer still calls inject_workspace_file_capped(&mut out, workspace_dir, "MEMORY.md", USER_FILE_MAX_CHARS) when options.include_memory_md, causing a disk read instead of using the session-curated memory; change this branch to check ctx.curated_snapshot (or equivalent field) and, if present, inject the curated snapshot contents into the renderer output (or call an overload of inject_workspace_file_capped that accepts raw content) instead of reading MEMORY.md from workspace_dir so UserFilesSection and the direct sub-agent renderer both use the same in-memory curated snapshot.
🟡 Minor comments (14)
src/openhuman/life_capture/redact.rs-8-17 (1)
8-17: ⚠️ Potential issue | 🟡 Minor
PHONE regex matches false positives (10-digit order/tracking IDs), but the proposed `\b` fix breaks the parentheses format.
The pattern correctly identifies the false-positive risk: plain 10-digit runs like `1234567890` match regardless of context (`order#1234567890`, `tracking 1234567890 shipped`), getting redacted before indexing and becoming unsearchable. However, the proposed `\b` fix breaks the existing test case — `(415) 555-0123` fails to match because `\b` between the opening paren and first digit prevents the match. The parentheses format must remain matchable per the test suite (line 46).
A more nuanced fix is needed — such as requiring at least one separator/paren form, using negative lookahead/lookbehind to exclude bare digit runs, or anchoring to word boundaries only where appropriate (after parens, around separators). Consider what formats should legitimately match when preceded/followed by alphanumerics (e.g., `order#1234567890` should not, but `tel:(415)555-0123` might).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/redact.rs` around lines 8 - 17, PHONE's regex is matching bare 10-digit runs (false positives) and adding a simple \b breaks parenthesized numbers; update the PHONE Regex (the static PHONE Lazy<Regex> / Regex::new call) to require either explicit separators or parentheses around the area code or to forbid matching a continuous digit-run with no separators by using alternation plus lookarounds (e.g., one branch that matches formats with separators or \(...\) and another branch that uses negative lookbehind/negative lookahead to reject a preceding/following alphanumeric digit when matching a plain 10-digit sequence); implement this by modifying the pattern inside Regex::new so it only matches numbers with separators/parentheses or ensures the 10-digit branch is not adjacent to word characters.
src/core/all.rs-287-296 (1)
287-296: ⚠️ Potential issue | 🟡 Minor
Add descriptions for the other newly registered namespaces.
`life_capture` and `curated_memory` are now registered controllers, but `namespace_description()` only adds `people` and `eventkit`, so CLI/help discovery will show no description for two new user-facing namespaces.

Proposed fix

```diff
     "learning" => Some(
         "User context enrichment — LinkedIn profile scraping and onboarding intelligence.",
     ),
+    "life_capture" => Some(
+        "Personal index ingestion, search, stats, and retrieval for captured life data.",
+    ),
+    "curated_memory" => Some(
+        "Curated MEMORY.md and USER.md scratchpad reads and writes.",
+    ),
     "people" => Some(
         "Contact resolution and recency × frequency × reciprocity × depth scoring.",
     ),
```
Verify each finding against the current code and only fix it if needed. In `@src/core/all.rs` around lines 287 - 296, namespace_description() only returns descriptions for "people", "notification", and "eventkit", so the newly registered namespaces "life_capture" and "curated_memory" appear undocumented in CLI/help; update the match in namespace_description() to include arms for "life_capture" and "curated_memory" returning concise user-facing descriptions (similar style to "people" and "eventkit") so those controllers surface meaningful help text; reference the function namespace_description() and add the two new string keys "life_capture" and "curated_memory" with appropriate description strings.
src/openhuman/life_capture/quote_strip.rs-14-33 (1)
14-33: ⚠️ Potential issue | 🟡 Minor
Preserve true no-op behavior when no quote is present.
`strip_quoted_reply()` currently trims and rewrites line endings even when there is no marker and no quoted `>` line, so "passthrough" inputs can still be changed.

Proposed fix

```diff
 pub fn strip_quoted_reply(body: &str) -> String {
     // Cut at the earliest of the two marker types.
     let mut cut: Option<usize> = None;
     for re in [&*ON_DATE_WROTE, &*OUTLOOK_SEP] {
         if let Some(m) = re.find(body) {
             cut = Some(cut.map_or(m.start(), |c| c.min(m.start())));
         }
     }
     let head = if let Some(idx) = cut { &body[..idx] } else { body };

+    if cut.is_none() && !head.lines().any(|l| l.trim_start().starts_with('>')) {
+        return body.to_string();
+    }
+
     // Drop any line starting with '>'.
     let kept: Vec<&str> = head
         .lines()
         .filter(|l| !l.trim_start().starts_with('>'))
         .collect();

     kept.join("\n").trim().to_string()
 }
```
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/quote_strip.rs` around lines 14 - 33, strip_quoted_reply currently always rebuilds the string which can change line endings and whitespace even when there is nothing to strip; modify strip_quoted_reply so that if no marker from ON_DATE_WROTE or OUTLOOK_SEP is found and no line begins with '>' it returns body.to_string() immediately (preserving exact input). Otherwise keep the existing logic that computes cut, slices head, filters '>' lines, joins and trims; reference the function strip_quoted_reply and the regex markers ON_DATE_WROTE and OUTLOOK_SEP to locate where to add the early-return check.
src/openhuman/life_capture/migrations/0001_init.sql-9-9 (1)
9-9: ⚠️ Potential issue | 🟡 Minor
Index `text_keyword` in FTS or remove the column.
`text_keyword` is documented as the raw FTS/BM25 projection, but `items_fts` and its triggers index `text` only. Connectors that populate `text_keyword` will not affect keyword search.

Suggested migration adjustment

```diff
 CREATE VIRTUAL TABLE IF NOT EXISTS items_fts USING fts5(
     subject,
-    text,
+    text_keyword,
     content='items',
     content_rowid='rowid',
     tokenize='porter unicode61'
 );
@@
-    INSERT INTO items_fts(rowid, subject, text)
-    VALUES (new.rowid, COALESCE(new.subject, ''), new.text);
+    INSERT INTO items_fts(rowid, subject, text_keyword)
+    VALUES (new.rowid, COALESCE(new.subject, ''), COALESCE(new.text_keyword, new.text));
@@
-    INSERT INTO items_fts(items_fts, rowid, subject, text)
-    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), old.text);
+    INSERT INTO items_fts(items_fts, rowid, subject, text_keyword)
+    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), COALESCE(old.text_keyword, old.text));
@@
-    INSERT INTO items_fts(items_fts, rowid, subject, text)
-    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), old.text);
-    INSERT INTO items_fts(rowid, subject, text)
-    VALUES (new.rowid, COALESCE(new.subject, ''), new.text);
+    INSERT INTO items_fts(items_fts, rowid, subject, text_keyword)
+    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), COALESCE(old.text_keyword, old.text));
+    INSERT INTO items_fts(rowid, subject, text_keyword)
+    VALUES (new.rowid, COALESCE(new.subject, ''), COALESCE(new.text_keyword, new.text));
 END;
```

Also applies to: 30-35, 39-54
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/migrations/0001_init.sql` at line 9, The schema adds a text_keyword column but the FTS virtual table and triggers only index the text column (items_fts and its triggers), so populate or index text_keyword in the FTS pipeline or remove the unused column; update the FTS definition (items_fts) and its associated insert/update/delete triggers to include text_keyword in the FTS content and bm25/FTS projections, or drop text_keyword from the CREATE TABLE if you intend to only index text, making sure to update any migration lines referencing text_keyword (and the triggers that mirror into items_fts) so connectors that write text_keyword actually affect search.
docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md-24-25 (1)
24-25: ⚠️ Potential issue | 🟡 Minor
Refresh the documented RPC surface to match the shipped controller.
The plan still references `embed_and_store`, a bare-array `search` return, and `life_capture.*` call names. This PR ships `openhuman.life_capture_ingest` and wraps search results as `{ "hits": [...] }`, so the plan will mislead follow-up work and smoke tests.
Based on learnings, new/changed behavior must ship with matching rustdoc / code comments and architecture docs when user-visible behavior changes.
Also applies to: 1423-1438, 2093-2097
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md` around lines 24 - 25, Update the plan/docs to match the shipped controller surface: replace references to the removed embed_and_store API (LifeCaptureController::embed_and_store) with the actual ingest RPC name openhuman.life_capture_ingest, change any mention of search returning a bare array to document that search returns an object with "hits": [...] (not a raw array), and ensure RpcOutcome-style handler mentions align with the current controller handlers (e.g., LifeCaptureController get_status, get_stats, search) and openhuman.life_capture_ingest call names so follow-up work and smoke tests reflect the real, shipped behavior.
scripts/install.sh-203-217 (1)
203-217: ⚠️ Potential issue | 🟡 Minor
`verify_asset_reachable` is defined but never called — dead code or missing wiring?
The helper is declared with exponential-backoff HEAD retries but no call site wires it into `resolve_from_latest_json` or the post-resolution flow. Either hook it in after `ASSET_URL` is resolved (e.g., before the `curl -fL ... -o ${DOWNLOAD_PATH}` step) so resolution-time confidence matches download-time behaviour, or drop it to avoid drift.
Want me to wire `verify_asset_reachable "${ASSET_URL}"` into the post-resolution path?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/install.sh` around lines 203 - 217, The verify_asset_reachable helper is never invoked causing dead code or missed pre-download checks; after ASSET_URL is resolved (e.g., in resolve_from_latest_json or immediately after that resolution step) call verify_asset_reachable "${ASSET_URL}" and abort on non-zero return before performing the curl download that writes to DOWNLOAD_PATH (the existing curl -fL ... -o ${DOWNLOAD_PATH} step), so the script performs the HEAD retry/backoff check prior to attempting the actual download.
src/openhuman/eventkit/rpc.rs-35-39 (1)
35-39: ⚠️ Potential issue | 🟡 Minor

Minor: `end <= start` rejects equal timestamps, but error text says "must be greater".

If a caller passes the same value for `start_ts` and `end_ts` (e.g., an instantaneous probe), the condition fails and the error says `end_ts must be greater than start_ts`, which is consistent — but the schema description doesn't advertise that constraint. Either loosen to `end < start` (allow equal) or document the strict-greater-than requirement on the `end_ts` input in `schemas.rs`.
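As a sketch of the two options (function names hypothetical, not the actual `rpc.rs` handlers), the strict check rejects an instantaneous probe while the loosened one accepts it:

```rust
/// Strict variant: mirrors the current `end <= start` rejection.
fn validate_range_strict(start: i64, end: i64) -> Result<(), String> {
    if end <= start {
        return Err(format!("end_ts ({end}) must be greater than start_ts ({start})"));
    }
    Ok(())
}

/// Loosened variant: only rejects an inverted range, so
/// start == end (an instantaneous probe) passes.
fn validate_range_allow_equal(start: i64, end: i64) -> Result<(), String> {
    if end < start {
        return Err(format!("end_ts ({end}) must be >= start_ts ({start})"));
    }
    Ok(())
}
```

Whichever variant ships, the schema description and the error text should name the same comparison.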
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/eventkit/rpc.rs` around lines 35 - 39, The code currently rejects equal timestamps via the check "if end <= start" in src/openhuman/eventkit/rpc.rs; to allow instantaneous probes, change the condition to "if end < start" and update the error message to reflect the new non-strict constraint (e.g., "end_ts ({end}) must be >= start_ts ({start})"); also update the schema/doc in schemas.rs if it currently claims a strict-greater-than requirement so the API docs match the new behavior.

src/openhuman/eventkit/schemas.rs (1)
244-247: ⚠️ Potential issue | 🟡 Minor

Silent clamp masks the `rpc.rs` priority validation.

`rpc::handle_create_reminder` in `src/openhuman/eventkit/rpc.rs` explicitly rejects priorities >9 with `"create_reminder: 'priority' must be 0–9 (got {p})"`, but this adapter pre-clamps via `v.min(9)`, so `priority=42` silently becomes `9` and the rpc validation never fires. Callers get no indication their input was invalid, and the adapter and rpc layer disagree on the contract.

Either drop the clamp and let rpc reject, or validate explicitly in the adapter with the same error text.
🐛 Proposed fix — let rpc validation handle the range
```diff
- let priority = params
-     .get("priority")
-     .and_then(|v| v.as_u64())
-     .map(|v| v.min(9) as u8);
+ let priority = match params.get("priority") {
+     None | Some(Value::Null) => None,
+     Some(Value::Number(n)) => Some(
+         n.as_u64()
+             .ok_or("create_reminder: 'priority' must be an unsigned integer 0–9")?
+             as u8,
+     ),
+     Some(_) => return Err(
+         "create_reminder: 'priority' must be an unsigned integer 0–9".into(),
+     ),
+ };
```
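A minimal illustration of why the clamp masks the range check (free functions standing in for the adapter and rpc layers; not the shipped code):

```rust
/// Adapter-style parsing with the silent clamp: out-of-range input is
/// folded into the valid range and the caller never learns it was bad.
fn clamped_priority(v: u64) -> u8 {
    v.min(9) as u8
}

/// Rpc-style validation: out-of-range input is an explicit error.
fn validated_priority(v: u64) -> Result<u8, String> {
    if v > 9 {
        return Err(format!("create_reminder: 'priority' must be 0-9 (got {v})"));
    }
    Ok(v as u8)
}
```

With the clamp in front, `validated_priority` can never observe a value above 9, so its error branch is dead code.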
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/eventkit/schemas.rs` around lines 244 - 247, The adapter in src/openhuman/eventkit/schemas.rs silently clamps priority via .map(|v| v.min(9) as u8), masking rpc::handle_create_reminder's validation; remove the clamp so the code only parses the numeric value (e.g., keep .and_then(|v| v.as_u64()).map(|v| v as u8)) and let rpc::handle_create_reminder enforce the 0–9 range, or alternatively perform explicit validation in the adapter that returns the identical error string used by rpc::handle_create_reminder; update the priority parsing in the schemas.rs snippet (the code that defines priority) accordingly.

src/openhuman/life_capture/migrations.rs (1)
53-68: ⚠️ Potential issue | 🟡 Minor

Remove unused `run_async` or document its purpose.

`run_async` at line 53 is not called anywhere in the codebase. The actual pattern in `PersonalIndex::open()` and `PersonalIndex::open_in_memory()` uses `spawn_blocking` with the sync `run()` function, then wraps the connection in `Arc<tokio::sync::Mutex<Connection>>` at the index level — not within migrations.

If `run_async` exists for future extensibility or as part of a public API contract, add a comment explaining the use case. Otherwise, remove it to reduce maintenance surface.
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/migrations.rs` around lines 53 - 68, The function run_async is unused—either remove it or document its intended public API/extension use; if removing, delete the run_async function and any tests/docs referencing it so migrations.rs only exposes run, otherwise add a clear doc comment above run_async explaining why it exists (e.g., convenience async wrapper for blocking run used by external callers) and mark it #[allow(dead_code)] or pub(crate)/pub as appropriate, and ensure references in PersonalIndex::open and PersonalIndex::open_in_memory remain using the current spawn_blocking + run pattern to avoid changing behavior.

docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md (1)
13-342: ⚠️ Potential issue | 🟡 Minor

Duplicated "Workstream A" risks confusing the agentic executor.

The file now has the new Workstream A (Task A4 at line 19) and "Workstream A (original) — kept for history" (line 60), which also defines Tasks A1–A4 with the same IDs. Given the doc header instructs workers to execute with `superpowers:executing-plans`, a runner iterating on unchecked `- [ ]` items will hit both copies of A4 and re-try work already completed in the new section. Either move the history block behind a collapsible `<details>`, prefix its tasks with `A-old-*`, or drop it entirely and keep the context in a short paragraph.
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md` around lines 13 - 342, The document contains duplicated "Workstream A" sections (the new Workstream A with Task A4 and the "Workstream A (original) — kept for history" block) which will confuse automated executors; remove or demote the historical copy so only one actionable Workstream A/A4 remains. Fix by either (a) moving the entire "Workstream A (original) — kept for history" block into a non-actionable collapsible section (e.g., wrap in a <details> summary) or (b) renaming its task IDs to non-conflicting identifiers like A-old-*, or (c) delete the historical duplicate entirely and condense its context into a short paragraph; ensure the actionable Task A4 / "Task A4: Wire the resolver test into CI and drop the ubuntu matrix row" and the checklist steps remain unchanged and uniquely numbered.

src/openhuman/life_capture/embedder.rs (1)
21-25: ⚠️ Potential issue | 🟡 Minor

Silent fallback drops the configured timeouts.

`reqwest::Client::builder().…build().unwrap_or_else(|_| reqwest::Client::new())` discards the 30s request / 10s connect timeouts on any builder error and hands back a client with no timeouts, which can hang request threads indefinitely. Builder failures here are TLS/system-level — rare but fatal to diagnose once you hit them. Prefer bubbling the error (`Result<Self, _>`) or `expect` with a clear message; `reqwest::Client::new()` is not a safer fallback (it internally unwraps the same builder).
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/embedder.rs` around lines 21 - 25, The current use of reqwest::Client::builder().…build().unwrap_or_else(|_| reqwest::Client::new()) silently drops the configured timeouts (30s request / 10s connect) on builder errors; replace this by propagating the build error or failing with an explicit message so you don't return a client without timeouts. Update the code that initializes the local variable http (the reqwest::Client::builder().timeout(...).connect_timeout(...).build().unwrap_or_else(...)) to either return a Result from the enclosing constructor/function (e.g., Result<Self, reqwest::Error>) and use .build()? or .map_err(...) to bubble the error, or call .build().expect("failed to build reqwest client with configured timeouts: <context>") so failures are explicit; do not fallback to reqwest::Client::new() which loses the timeouts. Ensure the change is applied where http is created so configured timeouts are always honored or the error surfaces.

src/openhuman/life_capture/index.rs (1)
167-177: ⚠️ Potential issue | 🟡 Minor

Silent fallback to empty `source` string hides serialization bugs.

```rust
let source = serde_json::to_value(item.source)
    .ok()
    .and_then(|v| v.as_str().map(str::to_string))
    .unwrap_or_default();
```

If `Source` is ever extended in a way that doesn't serialize to a bare string (e.g. a variant with data), this collapses silently to `""`. The row is then written with `source = ""`, which won't match anything in `Query.sources` filtering in `apply_query_filters`, nor will anyone easily spot it in the DB. Hybrid search will return zero hits for an otherwise-correctly-ingested item.

Similarly, the unchecked `author_json` and `metadata_json` fallbacks (`unwrap_or_else(|_| "null".into())` / `"{}".into()`) will round-trip through `from_str` on read and mask the underlying error.

🛡️ Proposed fix

```diff
- let source = serde_json::to_value(item.source)
-     .ok()
-     .and_then(|v| v.as_str().map(str::to_string))
-     .unwrap_or_default();
- let author_json = item
-     .author
-     .as_ref()
-     .map(|a| serde_json::to_string(a).unwrap_or_else(|_| "null".into()));
- let metadata_json = serde_json::to_string(&item.metadata)
-     .unwrap_or_else(|_| "{}".into());
+ let source = serde_json::to_value(item.source)
+     .context("serialize item.source")?
+     .as_str()
+     .ok_or_else(|| anyhow::anyhow!("Source must serialize to a JSON string"))?
+     .to_string();
+ let author_json = item
+     .author
+     .as_ref()
+     .map(serde_json::to_string)
+     .transpose()
+     .context("serialize item.author")?;
+ let metadata_json = serde_json::to_string(&item.metadata)
+     .context("serialize item.metadata")?;
```

Given the `upsert` entry point already returns `anyhow::Result`, propagating is the natural shape.
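A std-only analogue of the failure mode (serde elided; the enum variants and tag strings are invented for illustration): a data-carrying variant has no bare-string tag, and the lookup should surface that as an error rather than default to an empty string:

```rust
enum Source {
    Gmail,
    Imessage,
    /// A hypothetical future variant that no longer maps to a bare string.
    Custom(String),
}

/// Returns the canonical tag, or an error for variants that cannot be
/// represented as a plain string, instead of silently yielding "".
fn source_tag(s: &Source) -> Result<&'static str, String> {
    match s {
        Source::Gmail => Ok("gmail"),
        Source::Imessage => Ok("imessage"),
        Source::Custom(_) => Err("Source must serialize to a JSON string".into()),
    }
}
```

The error fires at write time, where it is actionable, rather than surfacing later as mysteriously empty search results.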
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/index.rs` around lines 167 - 177, The current code silently swallows serialization errors for item.source, author, and metadata (the owned.iter_mut block building source, author_json, metadata_json), which hides bugs and writes empty/placeholder values; update the logic to propagate serde_json serialization errors back to the caller (the upsert entry point which returns anyhow::Result) instead of using unwrap_or_default/unwrap_or_else — e.g., replace the silent fallbacks with ?-style propagation or map_err conversions so serde_json::to_value(item.source), serde_json::to_string(&item.metadata), and serde_json::to_string(a) return errors to the caller and abort the upsert on serialization failure.

src/openhuman/people/resolver.rs (1)
103-175: ⚠️ Potential issue | 🟡 Minor

Seeded `Person.display_name` is dropped when the contact also has an email/phone.

When a contact has both an email and a display name (the common case: `AddressBookContact { display_name: Some("Alice Smith"), emails: ["alice@..."], phones: [...] }`), the primary handle is `Handle::Email`, so `resolve_or_create` mints a `Person` with:

```rust
// from resolve_or_create, Handle::Email branch
(display_name, primary_email, primary_phone) = (None, Some(email), None)
```

The contact's display name is then only attached as an alias row in `handle_aliases` — it never makes it into `Person.display_name`. The UI (`handle_list` in `rpc.rs`) reads `p.display_name` directly, so address-book-seeded contacts render without a name until something else enriches them.

The existing test `seed_from_address_book_populates_store` doesn't catch this because it only asserts on `PersonId` equality, not on `display_name`.

♻️ Sketch of a fix

Either:

- After `resolve_or_create`, do a best-effort `store.enrich_person(pid, display_name, primary_phone)` that fills in any NULL columns on the existing row; or
- Introduce a `resolver.upsert_from_contact(&AddressBookContact)` path that builds the full `Person` skeleton up front instead of going through the single-handle `resolve_or_create`.

Option 1 is the smaller change and matches the "seed is idempotent, don't clobber" contract. Also worth extending `seed_from_address_book_populates_store` with:

```diff
+ let alice = s.get(alice_id.unwrap()).await.unwrap().unwrap();
+ assert_eq!(alice.display_name.as_deref(), Some("Alice Smith"));
```
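The "fill only NULL columns" contract from option 1 can be sketched like this (struct and function names hypothetical, standing in for the store row and `enrich_person`):

```rust
#[derive(Debug, PartialEq)]
struct Person {
    display_name: Option<String>,
    primary_phone: Option<String>,
}

/// Best-effort enrichment: fills fields that are still empty and never
/// clobbers data that an earlier seed or the user already set.
fn enrich_person(p: &mut Person, display_name: Option<&str>, phone: Option<&str>) {
    if p.display_name.is_none() {
        p.display_name = display_name.map(str::to_string);
    }
    if p.primary_phone.is_none() {
        p.primary_phone = phone.map(str::to_string);
    }
}
```

Running the seed twice is then harmless: the second pass sees non-NULL fields and leaves them alone, which is exactly the idempotency the seeding contract asks for.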
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/people/resolver.rs` around lines 103 - 175, seed_from_address_book currently uses resolve_or_create(&primary) which creates/returns a Person without the contact's display_name when primary is an email/phone, so the contact name only becomes a handle alias and never populates Person.display_name; after successfully resolving/creating the primary (the Ok(pid) branch in seed_from_address_book) call a best-effort enrichment on the stored person to set display_name and primary_phone if those columns are NULL (e.g., add a store.enrich_person(pid, display_name_opt, phone_opt) or similar), ensuring you pass the contact.display_name and the primary phone and only update NULL fields to preserve existing data; update the test seed_from_address_book_populates_store to assert Person.display_name is populated for a contact with email+display_name.

src/openhuman/people/address_book.rs (1)
126-177: ⚠️ Potential issue | 🟡 Minor

`request_access` can silently hang if invoked on the main thread.

The doc comment says "Must not be called from the main thread on macOS (CNContactStore will deadlock)", but nothing actually enforces this, and `SystemContactsSource::fetch_contacts` → `fetch_via_cn_contact_store` → `request_access` is reachable from any async task. If the tokio runtime happens to schedule this on the main thread (e.g. from an RPC handler in an app bundle started via `NSApplicationMain`), `rx.recv()` will block forever with no timeout. Consider either:

- Adding a `rx.recv_timeout(Duration::from_secs(30))` with a distinct `AddressBookError::Timeout`, or
- Wrapping the call path in `tokio::task::spawn_blocking` at the `handle_refresh_address_book` RPC boundary so it never runs on the async executor thread.

This would prevent a permanently stuck RPC if the invariant is accidentally violated.
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/people/address_book.rs` around lines 126 - 177, request_access can block forever if called on the main thread because rx.recv() waits indefinitely; update the code to avoid permanent hangs by either (A) replacing rx.recv() with rx.recv_timeout(Duration::from_secs(30)) and return a new AddressBookError::Timeout on timeout (update the AddressBookError enum accordingly), or (B) ensure callers never call request_access on an async executor thread by moving the blocking path into a background thread using tokio::task::spawn_blocking at the RPC boundary (e.g. in handle_refresh_address_book / SystemContactsSource::fetch_contacts -> fetch_via_cn_contact_store) so request_access stays on a blocking thread; reference request_access, rx.recv, AddressBookError, fetch_via_cn_contact_store, fetch_contacts, and handle_refresh_address_book when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9bf11610-907f-4555-9e51-47feb8aa8433
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (93)
.github/workflows/installer-smoke.yml
Cargo.toml
app/src-tauri/Info.plist
app/src-tauri/src/imessage_scanner/mod.rs
docs/superpowers/plans/2026-04-21-imessage-live-harness.md
docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md
docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md
docs/superpowers/specs/2026-04-22-life-capture-layer-design.md
scripts/fixtures/latest.json
scripts/install.sh
scripts/test_install.sh
src/core/all.rs
src/core/jsonrpc.rs
src/openhuman/agent/agents/archivist/prompt.rs
src/openhuman/agent/agents/code_executor/prompt.rs
src/openhuman/agent/agents/critic/prompt.rs
src/openhuman/agent/agents/integrations_agent/prompt.rs
src/openhuman/agent/agents/loader.rs
src/openhuman/agent/agents/morning_briefing/prompt.rs
src/openhuman/agent/agents/orchestrator/prompt.rs
src/openhuman/agent/agents/planner/prompt.rs
src/openhuman/agent/agents/researcher/prompt.rs
src/openhuman/agent/agents/summarizer/prompt.rs
src/openhuman/agent/agents/tool_maker/prompt.rs
src/openhuman/agent/agents/tools_agent/prompt.rs
src/openhuman/agent/agents/trigger_reactor/prompt.rs
src/openhuman/agent/agents/trigger_triage/prompt.rs
src/openhuman/agent/agents/welcome/prompt.rs
src/openhuman/agent/harness/fork_context.rs
src/openhuman/agent/harness/session/builder.rs
src/openhuman/agent/harness/session/runtime.rs
src/openhuman/agent/harness/session/turn.rs
src/openhuman/agent/harness/session/types.rs
src/openhuman/agent/harness/subagent_runner/ops.rs
src/openhuman/agent/prompts/mod.rs
src/openhuman/agent/prompts/tests.rs
src/openhuman/agent/prompts/types.rs
src/openhuman/agent/triage/escalation.rs
src/openhuman/agent/triage/evaluator.rs
src/openhuman/channels/providers/telegram/channel.rs
src/openhuman/channels/providers/telegram/channel_tests.rs
src/openhuman/context/debug_dump.rs
src/openhuman/curated_memory/mod.rs
src/openhuman/curated_memory/rpc.rs
src/openhuman/curated_memory/runtime.rs
src/openhuman/curated_memory/schemas.rs
src/openhuman/curated_memory/store.rs
src/openhuman/curated_memory/types.rs
src/openhuman/eventkit/calendar.rs
src/openhuman/eventkit/migrations/0001_init.sql
src/openhuman/eventkit/mod.rs
src/openhuman/eventkit/reminders.rs
src/openhuman/eventkit/rpc.rs
src/openhuman/eventkit/runtime.rs
src/openhuman/eventkit/schemas.rs
src/openhuman/eventkit/store.rs
src/openhuman/eventkit/types.rs
src/openhuman/learning/prompt_sections.rs
src/openhuman/life_capture/embedder.rs
src/openhuman/life_capture/index.rs
src/openhuman/life_capture/ingest/composio.rs
src/openhuman/life_capture/ingest/mod.rs
src/openhuman/life_capture/migrations.rs
src/openhuman/life_capture/migrations/0001_init.sql
src/openhuman/life_capture/migrations/0002_vec.sql
src/openhuman/life_capture/mod.rs
src/openhuman/life_capture/quote_strip.rs
src/openhuman/life_capture/redact.rs
src/openhuman/life_capture/rpc.rs
src/openhuman/life_capture/runtime.rs
src/openhuman/life_capture/schemas.rs
src/openhuman/life_capture/tests/e2e.rs
src/openhuman/life_capture/tests/mod.rs
src/openhuman/life_capture/types.rs
src/openhuman/mod.rs
src/openhuman/people/address_book.rs
src/openhuman/people/migrations.rs
src/openhuman/people/migrations/0001_init.sql
src/openhuman/people/mod.rs
src/openhuman/people/resolver.rs
src/openhuman/people/rpc.rs
src/openhuman/people/schemas.rs
src/openhuman/people/scorer.rs
src/openhuman/people/store.rs
src/openhuman/people/tests.rs
src/openhuman/people/types.rs
src/openhuman/providers/compatible.rs
src/openhuman/providers/compatible_tests.rs
tests/agent_harness_public.rs
tests/composio_ingest_e2e.rs
tests/fixtures/life_capture/corpus.json
tests/imessage_ingest_e2e.rs
tests/life_capture_retrieval_eval.rs
```rust
// --- Curated-memory + life-capture runtime bootstrap ------------------
//
// These MUST initialise before `axum::serve` starts accepting connections
// so the very first RPC can't race the OnceCell — earlier we set them
// inside the tokio::spawn block below and the server began accepting
// before they were ready, surfacing as spurious 'not initialised' errors
// for the first few requests.
//
// Loading the config a second time here (the spawn block also calls
// `load_or_init`) is fine — it caches and is idempotent.
if let Ok(config) = crate::openhuman::config::Config::load_or_init().await {
    bootstrap_curated_memory(&config.workspace_dir).await;
    bootstrap_life_capture(&config.workspace_dir).await;
```
🛠️ Refactor suggestion | 🟠 Major
Move domain runtime bootstrap out of the JSON-RPC transport.
jsonrpc.rs now hard-codes curated-memory and life-capture initialization details, including storage paths, env-driven embedder setup, and bridge registration. Please move this behind a domain/runtime bootstrap API or registry-level startup hook so the transport stays domain-agnostic.
As per coding guidelines, src/core/{cli.rs,jsonrpc.rs} should expose domain functionality through the controller registry and “do not add domain-specific branches or one-off transport logic in src/core/cli.rs or src/core/jsonrpc.rs”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/jsonrpc.rs` around lines 658 - 670, The JSON-RPC transport currently
inits domain runtime directly by calling
crate::openhuman::config::Config::load_or_init().await then
bootstrap_curated_memory(&config.workspace_dir).await and
bootstrap_life_capture(&config.workspace_dir).await; move that logic out of
jsonrpc.rs and behind a domain/runtime bootstrap API or registry startup hook.
Create or extend the controller/registry interface (e.g., add a
run_startup_hooks or register_runtime_bootstrap method) and have the
curated-memory and life-capture initialization (including env-driven embedder
setup and bridge registration) implemented as a domain module that registers
itself with that API; then replace the direct calls in jsonrpc.rs with a single
call to the registry startup hook (or no-op if none registered), keeping
jsonrpc.rs domain-agnostic and relying on controller/registry to perform
load_or_init and to invoke bootstrap_curated_memory and bootstrap_life_capture.
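The registry-level hook the comment asks for could take roughly this shape (all names hypothetical; the real version would be async and live beside the controller registry, with domains registering their own bootstrap):

```rust
/// A startup hook registered by a domain module at wiring time.
type StartupHook = Box<dyn Fn() + Send + Sync>;

/// Sketch of a registry that owns domain startup, so the transport
/// only calls `run_startup_hooks()` and stays domain-agnostic.
#[derive(Default)]
struct Registry {
    startup_hooks: Vec<StartupHook>,
}

impl Registry {
    /// Called from each domain's wiring code (e.g. curated_memory,
    /// life_capture) when it registers its controllers.
    fn register_startup_hook(&mut self, hook: StartupHook) {
        self.startup_hooks.push(hook);
    }

    /// Called once by the transport before it starts accepting
    /// connections; a no-op when nothing is registered.
    fn run_startup_hooks(&self) {
        for hook in &self.startup_hooks {
            hook();
        }
    }
}
```

The ordering guarantee from the original comment is preserved: the transport runs the hooks before `axum::serve`, it just no longer knows what they do.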
```rust
let writer = IndexWriter::new(idx);
writer
    .upsert(&mut items)
    .await
    .map_err(|e| format!("upsert: {e}"))?;
let canonical_id = items[0].id;
let replaced = canonical_id != requested_id;

let mut vecs = embedder
    .embed_batch(&[text.as_str()])
    .await
    .map_err(|e| format!("embed: {e}"))?;
let vector = vecs.pop().ok_or("embedder returned no vectors")?;
writer
    .upsert_vector(&canonical_id, &vector)
    .await
    .map_err(|e| format!("upsert_vector: {e}"))?;
```
Make item upsert and vector replacement failure-atomic.
The item is committed before embedding/vector replacement. If embed_batch or upsert_vector fails, the index can contain a new/updated item with no vector, or updated text with the old vector. Pre-embed and validate dimensions before mutating, then wrap item upsert + vector replace in one IndexWriter transaction/helper.
Minimum mitigation before adding a transaction helper
+ const INDEX_VECTOR_DIM: usize = 1536;
+ if embedder.dim() != INDEX_VECTOR_DIM {
+ return Err(format!(
+ "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
+ embedder.dim()
+ ));
+ }
+
+ let mut vecs = embedder
+ .embed_batch(&[text.as_str()])
+ .await
+ .map_err(|e| format!("embed: {e}"))?;
+ let vector = vecs.pop().ok_or("embedder returned no vectors")?;
+ if vector.len() != INDEX_VECTOR_DIM {
+ return Err(format!(
+ "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
+ vector.len()
+ ));
+ }
+
let writer = IndexWriter::new(idx);
writer
.upsert(&mut items)
.await
.map_err(|e| format!("upsert: {e}"))?;
let canonical_id = items[0].id;
let replaced = canonical_id != requested_id;
- let mut vecs = embedder
- .embed_batch(&[text.as_str()])
- .await
- .map_err(|e| format!("embed: {e}"))?;
- let vector = vecs.pop().ok_or("embedder returned no vectors")?;
writer
.upsert_vector(&canonical_id, &vector)
.await

This still needs a transaction-backed writer API to fully guarantee atomic replacement.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
const INDEX_VECTOR_DIM: usize = 1536;
if embedder.dim() != INDEX_VECTOR_DIM {
    return Err(format!(
        "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
        embedder.dim()
    ));
}

let mut vecs = embedder
    .embed_batch(&[text.as_str()])
    .await
    .map_err(|e| format!("embed: {e}"))?;
let vector = vecs.pop().ok_or("embedder returned no vectors")?;
if vector.len() != INDEX_VECTOR_DIM {
    return Err(format!(
        "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
        vector.len()
    ));
}

let writer = IndexWriter::new(idx);
writer
    .upsert(&mut items)
    .await
    .map_err(|e| format!("upsert: {e}"))?;
let canonical_id = items[0].id;
let replaced = canonical_id != requested_id;
writer
    .upsert_vector(&canonical_id, &vector)
    .await
    .map_err(|e| format!("upsert_vector: {e}"))?;
```
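The embed-first ordering in the suggestion can be sketched against a toy in-memory index (types invented; the real `IndexWriter` is SQLite-backed):

```rust
use std::collections::HashMap;

#[derive(Default)]
struct ToyIndex {
    items: HashMap<String, String>,     // item id -> text
    vectors: HashMap<String, Vec<f32>>, // item id -> embedding
}

/// Embed before any mutation: if the embedder fails, the index is left
/// exactly as it was, with no item row missing its vector and no stale
/// vector paired with updated text.
fn ingest(
    idx: &mut ToyIndex,
    id: &str,
    text: &str,
    embed: impl Fn(&str) -> Result<Vec<f32>, String>,
) -> Result<(), String> {
    let vector = embed(text)?; // no DB side effects yet
    idx.items.insert(id.to_string(), text.to_string());
    idx.vectors.insert(id.to_string(), vector);
    Ok(())
}
```

This ordering makes the common failure (a flaky embedder) harmless; only a crash between the two inserts still needs the transaction-backed writer API.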
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/life_capture/rpc.rs` around lines 154 - 170, You currently
upsert the item (IndexWriter::upsert) before calling embedder.embed_batch and
upsert_vector, which can leave the index inconsistent if embedding/vector
replacement fails; move embedder.embed_batch(&[text.as_str()]) before any
mutation, validate the returned vector dimensions against the index schema, then
perform the item upsert and vector write inside a single atomic operation on
IndexWriter (add or use a transaction/helper such as IndexWriter::transaction or
a new method like IndexWriter::upsert_with_vector that takes (&mut items,
&canonical_id, &vector) and guarantees commit/rollback semantics); ensure you
still derive canonical_id/requested_id as before but only call
writer.upsert/_upsert_vector from inside that transactional helper so failure
does not leave a half-applied item or stale vector.
…bview NotificationSettings
- PersonId defined an inherent to_string which shadowed the ToString blanket impl from Display (clippy correctness error, deny-by-default)
- webview_notifications::NotificationSettings Default returned Self { enabled: false }; replaced manual impl with derive
# Conflicts: # src/openhuman/mod.rs
- Drop event title from the 'missing external identifier' log message; event titles are personal data that must not appear in logs.
- Propagate store::is_known() and store::upsert_event() errors instead of swallowing them. Failing open caused the same event to be re-emitted on every sync because the cache was never updated.

Addresses @coderabbitai on src/openhuman/eventkit/calendar.rs:156 and :217
Gmail message IDs and calendar event IDs are stable external identifiers that create a durable PII trail in logs. Remove them from debug/warn log fields; the source tag retains enough context for correlation. Addresses @coderabbitai on src/openhuman/life_capture/ingest/composio.rs:112
…ardcode HostedEmbedder::dim() was hardcoded to 1536 regardless of which model was configured. Add dim as a constructor parameter so model and vector width stay in sync. At the bootstrap site, resolve dim from OPENHUMAN_EMBEDDINGS_DIM (override), known model names (text-embedding-3-large→3072), or default 1536. Addresses @coderabbitai on src/openhuman/life_capture/embedder.rs:32
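The resolution order described in that commit (env override → known model names → default) might look roughly like this; names are illustrative, not the shipped bootstrap code:

```rust
/// Resolve the embedding width: an explicit OPENHUMAN_EMBEDDINGS_DIM
/// override wins, then known model names, then the 1536 default.
fn resolve_embedding_dim(env_override: Option<&str>, model: &str) -> usize {
    if let Some(dim) = env_override.and_then(|s| s.parse::<usize>().ok()) {
        return dim;
    }
    match model {
        "text-embedding-3-large" => 3072,
        _ => 1536,
    }
}
```

Passing the resolved value into the embedder's constructor keeps `dim()` and the configured model in sync by construction.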
…ure state Previously, the item row was committed before the embedding call. A failed embed_batch would leave a committed row with no vector, making it unfindable by search. Reorder to: embed first (no DB side effects) → upsert item row → upsert vector, so a failed embed aborts without touching the index. Addresses @coderabbitai on src/openhuman/life_capture/rpc.rs:170
The handler accepts any JSON value for metadata but the schema declared it as Option<String>, causing schema-driven clients to reject object payloads. Change to Option<Json> to match the actual contract. Addresses @coderabbitai on src/openhuman/life_capture/schemas.rs:192
The created flag in the RPC response was always true when create_if_missing=true because resolve_or_create returned a bare PersonId with no signal about whether a new row was minted or an existing one was found. Thread the insertion boolean out of the resolver so handle_resolve can report the true creation state. Update resolver tests to assert the was_created flag, and update link() and seed_from_address_book() callers to destructure the tuple. Addresses @coderabbitai on src/openhuman/people/rpc.rs:80
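The shape of that change, returning the insertion flag from get-or-create, in a std-only sketch (a map-backed stand-in for the store; names hypothetical):

```rust
use std::collections::HashMap;

/// Get-or-create that also reports whether a new row was minted, so the
/// RPC layer can surface an honest `created` flag.
fn resolve_or_create(
    store: &mut HashMap<String, u64>,
    handle: &str,
    next_id: &mut u64,
) -> (u64, bool) {
    if let Some(&id) = store.get(handle) {
        return (id, false); // existing person, not created
    }
    let id = *next_id;
    *next_id += 1;
    store.insert(handle.to_string(), id);
    (id, true) // freshly minted
}
```

Callers that only need the id destructure and drop the flag, which is the small churn the commit describes for `link()` and `seed_from_address_book()`.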
Replace manual BEGIN/COMMIT/ROLLBACK with Connection::unchecked_transaction() which auto-rolls back on drop, matching the rusqlite idiomatic pattern and the life_capture migration style. Add entry/per-migration/completion debug log lines as required by coding guidelines (debug logging on new/changed flows). Addresses @coderabbitai on src/openhuman/people/migrations.rs:46
…ath error Both tests mutated OPENHUMAN_WORKSPACE while sharing the CONN OnceLock, causing a race where the second test observed the first test's DB and never exercised its own branch. - Add a test-local Mutex to serialize all tests in this module. - Rewrite get_error_on_bad_path to test create_dir_all directly (rather than get()), because get() fast-paths via OnceLock once initialized by get_is_idempotent. The new version asserts is_err() rather than silently discarding the result. Addresses @coderabbitai on src/openhuman/eventkit/runtime.rs:137
…composio e2e Silently ignoring init_index/init_embedder errors allowed the test to write through a local Arc while JSON-RPC search read a pre-existing global runtime, turning failures into order-dependent timeouts. Propagate errors with .expect() so the test fails immediately with a clear message. Addresses @coderabbitai on tests/composio_ingest_e2e.rs:72
…red_controllers re-exports Domain mod.rs must expose the canonical registry names in addition to the domain-prefixed aliases so the controller registry wiring in all.rs can call them uniformly across domains. Aliases are kept for backward compatibility. Addresses @coderabbitai on src/openhuman/curated_memory/mod.rs:10
…untime Both tests reuse the process-global PersonalIndex via OnceCell, so when tokio runs them in parallel they race on `total_items`. Add a static tokio::sync::Mutex they both acquire for the duration of the test, and switch the absolute count assertion in the idempotency test to a delta against a baseline captured under the lock. Same pattern already used elsewhere (eventkit env-var lock, composio fail-fast guard).
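The serialization pattern from those two commits in std form (the real tests use `tokio::sync::Mutex`; this sketch uses `std::sync` to show the shape):

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

static TEST_LOCK: OnceLock<Mutex<()>> = OnceLock::new();

/// Acquire the module-wide lock for the duration of a test so tests that
/// share process-global state (env vars, a OnceCell-held index) cannot
/// race each other under a parallel test runner.
fn test_guard() -> MutexGuard<'static, ()> {
    TEST_LOCK
        .get_or_init(|| Mutex::new(()))
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Under the guard, a count assertion compares a delta against a baseline captured at the start (`after - before == inserted`) rather than an absolute total, so a sibling test's leftovers cannot fail it.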
Actionable comments posted: 8
🧹 Nitpick comments (3)
src/openhuman/agent/debug/mod.rs (1)
395-395: Preserve prompt-dump parity by threading a real curated snapshot

Hardcoding `curated_snapshot: None` can make dump output diverge from live first-turn prompt bytes when curated memory is active. Given this module's byte-identical contract, use the same best-effort snapshot strategy here and pass `as_deref()` into `PromptContext`.

♻️ Proposed adjustment
```diff
+ let curated_snapshot = match crate::openhuman::curated_memory::runtime::get() {
+     Ok(rt) => match crate::openhuman::curated_memory::snapshot_pair(&rt.memory, &rt.user).await
+     {
+         Ok(snap) => Some(std::sync::Arc::new(snap)),
+         Err(_) => None,
+     },
+     Err(_) => None,
+ };
+
  let ctx = PromptContext {
      workspace_dir: agent.workspace_dir(),
      model_name: &model_name,
@@
      connected_identities_md: crate::openhuman::agent::prompts::render_connected_identities(),
      include_profile: !definition.omit_profile,
      include_memory_md: !definition.omit_memory_md,
-     curated_snapshot: None,
+     curated_snapshot: curated_snapshot.as_deref(),
  };
```
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/debug/mod.rs` at line 395, The code currently hardcodes curated_snapshot: None which breaks byte-identical dumps when curated memory is enabled; replace that None by producing and threading the same best-effort curated snapshot used elsewhere and pass its Option<&[u8]> into PromptContext via curated_snapshot.as_deref(). Locate where curated_snapshot is set and where PromptContext is constructed, compute the snapshot using the existing snapshot strategy (same helper/logic used in the live path), store it in curated_snapshot (Option<Vec<u8>>), and call curated_snapshot.as_deref() when creating the PromptContext.

src/openhuman/people/tests.rs (1)
60-68: Drop the unused `Utc::now()` call.
`let _now = Utc::now();` is unrelated to UUID round-tripping and adds nothing to the test. Either drop it or move it into a separate test that actually exercises timestamp-related behavior.
♻️ Proposed cleanup
 #[test]
 fn person_id_uuid_format() {
     let id = PersonId::new();
     // Round-trips through a string.
     let s = id.to_string();
     let parsed: uuid::Uuid = s.parse().unwrap();
     assert_eq!(parsed, id.0);
-    let _now = Utc::now();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/people/tests.rs` around lines 60 - 68, The test function person_id_uuid_format contains an unused call to Utc::now() that should be removed; update the test by deleting the line let _now = Utc::now(); (or move it into a new timestamp-focused test if you need to exercise time behavior) so the test only verifies PersonId::new(), id.to_string(), and parsing back to uuid::Uuid (referencing PersonId and the person_id_uuid_format test).
src/openhuman/people/rpc.rs (1)
16-19: Add entry/exit logging to the RPC handlers.
Per coding guidelines, Rust code should add `tracing`/`log` at `debug`/`trace` for entry/exit points, branch decisions, and external calls with grep-friendly `[domain]` prefixes. `handle_refresh_address_book` already does this; `handle_list`, `handle_resolve`, and `handle_score` are silent. They each issue DB calls (`store.list`, `interactions_for`, `lookup`, `insert_person`) which are useful to trace by `person_id` / handle kind when diagnosing production issues.
♻️ Suggested logging
 pub async fn handle_list(store: &PeopleStore, limit: usize) -> Result<RpcOutcome<Value>, String> {
     let limit = limit.clamp(1, 500);
+    tracing::debug!("[people::rpc] handle_list entry limit={limit}");
     let people = store.list().await.map_err(|e| format!("list: {e}"))?;
     let now = Utc::now();
@@
+    tracing::debug!(
+        "[people::rpc] handle_list returning {} people",
+        people_json.len()
+    );
     Ok(RpcOutcome::new(json!({ "people": people_json }), vec![]))
 }
@@
 ) -> Result<RpcOutcome<Value>, String> {
+    tracing::debug!(
+        "[people::rpc] handle_resolve entry kind={} create_if_missing={create_if_missing}",
+        handle.as_key().0
+    );
     let resolver = HandleResolver::new(store);
@@
 ) -> Result<RpcOutcome<Value>, String> {
+    tracing::debug!("[people::rpc] handle_score entry person_id={person_id}");
     let interactions = store
As per coding guidelines: "Use `log`/`tracing` at `debug`/`trace` level in Rust; include stable grep-friendly prefixes (`[domain]`, `[rpc]`, `[ui-flow]`) and correlation fields (request IDs, method names, entity IDs)".
Also applies to: 62-67, 121-124
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/people/rpc.rs` around lines 16 - 19, Add entry/exit and key-event tracing to the RPC handlers handle_list, handle_resolve, and handle_score similar to handle_refresh_address_book: emit debug/trace logs with grep-friendly prefixes like "[rpc]" or "[people]" at function entry and exit, log branch decisions and external calls (store.list, interactions_for, lookup, insert_person) including correlation fields (request id or method name) and entity identifiers (person_id, handle/kind) so the DB/lookup calls and outcomes are visible for diagnostics; follow existing tracing style used in handle_refresh_address_book for level and message format.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/jsonrpc.rs`:
- Around line 865-868: The doc comment "Initializes long-lived socket/event-bus
infrastructure." is now incorrectly attached to bootstrap_curated_memory; move
that leading doc comment (and any subsequent related lines) from above async fn
bootstrap_curated_memory to directly above the async fn bootstrap_skill_runtime
so bootstrap_skill_runtime regains its intended documentation and
bootstrap_curated_memory has only its correct docs or none.
In `@src/openhuman/agent/harness/session/types.rs`:
- Around line 126-134: The doc comment for the Session-scoped `curated_snapshot`
is out of sync with implementation: it states the snapshot is captured "when the
agent is built" but the code actually initializes `curated_snapshot` to None in
the builder and captures it on first turn via `ensure_curated_snapshot`; update
the rustdoc to describe this lifecycle (None in builder, captured lazily on
first turn by `ensure_curated_snapshot`, forwarded into
`PromptContext::curated_snapshot` each turn so runtime writes apply to
subsequent sessions) and mention that it is None when the curated-memory runtime
(`crate::openhuman::curated_memory::runtime`) isn't initialized.
In `@src/openhuman/life_capture/ingest/composio.rs`:
- Around line 415-422: The helper fresh_bridge() currently calls
std::mem::forget(dir) which leaks the TempDir; change fresh_bridge() to return
ownership of the TempDir (e.g., return (LifeCaptureComposioBridge,
tempfile::TempDir)) or hold the TempDir inside the LifeCaptureComposioBridge
wrapper so it lives until tests drop it; update all test call sites that do let
bridge = fresh_bridge().await; to destructure the tuple (let (bridge, _dir) =
fresh_bridge().await;) or accept the wrapper that owns the TempDir so the
TempDir is dropped at test end, and ensure PersonalIndex::open(&db).await and
the Arc<dyn Embedder> (TestEmbedder) usage remain unchanged.
- Around line 601-620: The test bridge_has_stable_name_and_domain currently
contains a tautological assertion (assert_eq!(name, name)) and never calls
LifeCaptureComposioBridge::name() or domains(), nor uses the Stub Embedder;
replace the bogus assertion by constructing a real bridge (use the existing
fresh_bridge helper or the constructor that returns a PersonalIndex handle) and
call LifeCaptureComposioBridge::name() and LifeCaptureComposioBridge::domains()
to assert expected values, or if you prefer to keep it lightweight convert the
test to #[tokio::test], call fresh_bridge(...) to get a bridge instance, then
assert_eq!(bridge.name(), "life_capture::composio_bridge") and
assert!(bridge.domains().contains(...)) and remove the unused Stub/Embedder impl
if you choose the fresh_bridge approach.
In `@src/openhuman/life_capture/rpc.rs`:
- Around line 76-94: The code hardcodes INDEX_VECTOR_DIM = 1536 in handle_search
causing mismatches with configurable embedders; expose the real index vector
dimension from the PersonalIndex (or lift INDEX_VECTOR_DIM into a single shared
constant used by bootstrap_life_capture, PersonalIndex, handle_search and
handle_ingest), then update handle_search to compare embedder.dim() against that
shared/exposed dimension and validate query_vec.len() the same way; also add the
same dimension checks to handle_ingest before calling upsert_vector (validate
embedder.dim() and the produced vector length and return a clear RPC error on
mismatch) and ensure bootstrap_life_capture uses the same shared dimension when
constructing/configuring the embedder.
In `@src/openhuman/people/address_book.rs`:
- Around line 129-169: The CNContactStore FFI calls (e.g., request_access,
enumerateContactsWithFetchRequest_error_usingBlock inside
imp::fetch_via_cn_contact_store) block a Tokio worker; wrap these synchronous
calls in tokio::task::spawn_blocking. Either (A) move the spawn_blocking into
the SystemContactsSource implementation so ContactsSource::fetch_contacts calls
imp::fetch_via_cn_contact_store inside spawn_blocking and returns the awaited
result, or (B) keep fetch_contacts synchronous but change callers (e.g.,
HandleResolver::seed_from_address_book / read_with) to call
tokio::task::spawn_blocking(|| source.fetch_contacts()) and propagate errors
(map join errors to AddressBookError::Other). Ensure request_access remains
inside the blocking closure so rx.recv() never runs on a Tokio worker thread.
In `@src/openhuman/people/resolver.rs`:
- Around line 41-78: resolve_or_create has a TOCTOU: it calls
PeopleStore::lookup then creates a new PersonId and calls
PeopleStore::insert_person, which can race and produce orphan Person rows when
two callers concurrently resolve the same Handle; fix by moving the
read-or-create into an atomic transactional operation inside the store
(preferred): add a new PeopleStore method like lookup_or_insert_person(handle,
person_factory) that within a single transaction SELECTs from handle_aliases
and, if missing, INSERTs into people and INSERTs the handle_aliases (or returns
the existing person_id), then update resolve_or_create to call
lookup_or_insert_person instead of separate lookup + insert_person;
alternatively (if you must keep insert_person), after insert_person re-query
lookup(canonical) and if the returned PersonId != the one you inserted, delete
the orphan people row and return the winner's id.
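The atomic read-or-create shape can be illustrated with an in-memory map standing in for the SQLite transaction; every name here (`PeopleStore`, `lookup_or_insert_person`) is a sketch under that assumption, not the real store API:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical in-memory stand-in: the single lock plays the role of the
// SQLite transaction, so two concurrent resolvers cannot both miss the
// SELECT and both INSERT a fresh person row.
struct PeopleStore {
    // (canonical handle -> person_id, next_id)
    aliases: Mutex<(HashMap<String, u64>, u64)>,
}

impl PeopleStore {
    fn new() -> Self {
        Self { aliases: Mutex::new((HashMap::new(), 1)) }
    }

    // Read-or-create as ONE critical section: lookup and insert happen under
    // the same lock, eliminating the TOCTOU window between them.
    fn lookup_or_insert_person(&self, canonical: &str) -> u64 {
        let mut guard = self.aliases.lock().unwrap();
        let (map, next_id) = &mut *guard;
        if let Some(&id) = map.get(canonical) {
            return id; // existing winner, no orphan created
        }
        let fresh = *next_id;
        *next_id += 1;
        map.insert(canonical.to_string(), fresh);
        fresh
    }
}

fn main() {
    let store = PeopleStore::new();
    let a = store.lookup_or_insert_person("mailto:a@example.com");
    let b = store.lookup_or_insert_person("mailto:a@example.com");
    assert_eq!(a, b); // second resolve returns the same person
    println!("ok");
}
```

In the real store the critical section would be a SQLite transaction (`BEGIN IMMEDIATE` … `COMMIT`) rather than a mutex, but the invariant is the same: the lookup and the insert must not be separately observable.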
In `@src/openhuman/people/rpc.rs`:
- Around line 16-58: handle_list currently issues an
interactions_for(p.id).await inside a sequential loop which causes an N+1 query
pattern and serial spawn_blocking calls; fix by either (A) adding an aggregate
method on PeopleStore (suggest names: list_with_scores(limit) or
interaction_summaries()) that runs a single SQL query to return interaction
aggregates needed by score() and then compute score() in-memory for each person
before sorting and truncating by limit, or (B) minimally parallelizing the
per-person fetches by collecting futures for store.interactions_for(p.id) and
awaiting them with futures::future::join_all so the DB calls run concurrently,
and ensure you apply limit before expensive work by sorting/truncating after
computing scores. Use the function name handle_list and the method
interactions_for as anchors when making the change.
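Option (B) can be sketched std-only with scoped threads standing in for `join_all`; `fetch_interaction_count` is a hypothetical stand-in for `store.interactions_for`, and the tuple's second element stands in for `score()`:

```rust
// Hypothetical stand-in for store.interactions_for(person_id).
fn fetch_interaction_count(person_id: u64) -> u64 {
    person_id * 2 // pretend the DB returns twice the id
}

// Fetch all per-person counts concurrently, then score, sort, and truncate —
// the limit is applied only after scoring, as the comment above requires.
fn list_with_scores(people: &[u64], limit: usize) -> Vec<(u64, u64)> {
    let mut scored: Vec<(u64, u64)> = std::thread::scope(|s| {
        let handles: Vec<_> = people
            .iter()
            .map(|&id| s.spawn(move || (id, fetch_interaction_count(id))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    scored.sort_by(|a, b| b.1.cmp(&a.1)); // highest score first
    scored.truncate(limit);
    scored
}

fn main() {
    let top = list_with_scores(&[1, 4, 2, 3], 2);
    println!("{top:?}"); // prints [(4, 8), (3, 6)]
}
```

Option (A), a single aggregate SQL query, is still preferable when the store can express it, since it avoids spawning one task per person entirely.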
---
Nitpick comments:
In `@src/openhuman/agent/debug/mod.rs`:
- Line 395: The code currently hardcodes curated_snapshot: None which breaks
byte-identical dumps when curated memory is enabled; replace that None by
producing and threading the same best-effort curated snapshot used elsewhere and
pass its Option<&[u8]> into PromptContext via curated_snapshot.as_deref().
Locate where curated_snapshot is set and where PromptContext is constructed,
compute the snapshot using the existing snapshot strategy (same helper/logic
used in the live path), store it in curated_snapshot (Option<Vec<u8>>), and call
curated_snapshot.as_deref() when creating the PromptContext.
In `@src/openhuman/people/rpc.rs`:
- Around line 16-19: Add entry/exit and key-event tracing to the RPC handlers
handle_list, handle_resolve, and handle_score similar to
handle_refresh_address_book: emit debug/trace logs with grep-friendly prefixes
like "[rpc]" or "[people]" at function entry and exit, log branch decisions and
external calls (store.list, interactions_for, lookup, insert_person) including
correlation fields (request id or method name) and entity identifiers
(person_id, handle/kind) so the DB/lookup calls and outcomes are visible for
diagnostics; follow existing tracing style used in handle_refresh_address_book
for level and message format.
In `@src/openhuman/people/tests.rs`:
- Around line 60-68: The test function person_id_uuid_format contains an unused
call to Utc::now() that should be removed; update the test by deleting the line
let _now = Utc::now(); (or move it into a new timestamp-focused test if you need
to exercise time behavior) so the test only verifies PersonId::new(),
id.to_string(), and parsing back to uuid::Uuid (referencing PersonId and the
person_id_uuid_format test).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 53db1672-b237-409a-922b-375440e06437
📒 Files selected for processing (29)
- src/core/all.rs
- src/core/jsonrpc.rs
- src/openhuman/agent/debug/mod.rs
- src/openhuman/agent/harness/session/builder.rs
- src/openhuman/agent/harness/session/runtime.rs
- src/openhuman/agent/harness/session/turn.rs
- src/openhuman/agent/harness/session/types.rs
- src/openhuman/agent/harness/subagent_runner/ops.rs
- src/openhuman/curated_memory/mod.rs
- src/openhuman/eventkit/calendar.rs
- src/openhuman/eventkit/runtime.rs
- src/openhuman/learning/prompt_sections.rs
- src/openhuman/life_capture/embedder.rs
- src/openhuman/life_capture/ingest/composio.rs
- src/openhuman/life_capture/rpc.rs
- src/openhuman/life_capture/schemas.rs
- src/openhuman/mod.rs
- src/openhuman/people/address_book.rs
- src/openhuman/people/migrations.rs
- src/openhuman/people/resolver.rs
- src/openhuman/people/rpc.rs
- src/openhuman/people/schemas.rs
- src/openhuman/people/scorer.rs
- src/openhuman/people/tests.rs
- src/openhuman/people/types.rs
- src/openhuman/webview_notifications/types.rs
- tests/agent_harness_public.rs
- tests/composio_ingest_e2e.rs
- tests/imessage_ingest_e2e.rs
✅ Files skipped from review due to trivial changes (2)
- src/openhuman/learning/prompt_sections.rs
- src/openhuman/curated_memory/mod.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- tests/agent_harness_public.rs
- src/openhuman/agent/harness/session/runtime.rs
- src/openhuman/mod.rs
- src/core/all.rs
- src/openhuman/eventkit/calendar.rs
- src/openhuman/eventkit/runtime.rs
/// Initializes long-lived socket/event-bus infrastructure.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
Stale doc comment now attached to bootstrap_curated_memory.
Line 865's /// Initializes long-lived socket/event-bus infrastructure. was originally the rustdoc for bootstrap_skill_runtime, but inserting the two new helpers above it leaves this line as a leading doc comment on bootstrap_curated_memory (the next item Rust sees), so the rendered docs/intellisense for bootstrap_curated_memory will start with an unrelated sentence about socket/event-bus, and bootstrap_skill_runtime is left undocumented.
📝 Proposed fix — move the doc comment back to `bootstrap_skill_runtime`
-/// Initializes long-lived socket/event-bus infrastructure.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
…and add it just above bootstrap_skill_runtime (around line 984):
+/// Initializes long-lived socket/event-bus infrastructure.
+pub async fn bootstrap_skill_runtime() {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/jsonrpc.rs` around lines 865 - 868, The doc comment "Initializes
long-lived socket/event-bus infrastructure." is now incorrectly attached to
bootstrap_curated_memory; move that leading doc comment (and any subsequent
related lines) from above async fn bootstrap_curated_memory to directly above
the async fn bootstrap_skill_runtime so bootstrap_skill_runtime regains its
intended documentation and bootstrap_curated_memory has only its correct docs or
none.
/// Session-scoped snapshot of the curated-memory pair
/// (`MEMORY.md` + `USER.md`) read once from the
/// [`crate::openhuman::curated_memory::runtime`] singleton when the
/// agent is built. Forwarded into [`PromptContext::curated_snapshot`]
/// every turn so the rendered prompt bytes stay frozen for the
/// session's KV-cache prefix, while runtime writes via
/// `curated_memory.add/replace/remove` land on the NEXT session.
/// `None` when the curated-memory runtime isn't initialised (unit
/// tests, older embeds) — the renderer falls back to workspace files.
Doc timing mismatch for curated_snapshot lifecycle
Line 129 says the snapshot is read when the agent is built, but current behavior captures it on first turn (ensure_curated_snapshot), with builder initialization at None. Please update this doc block to reflect the real lifecycle.
📝 Suggested doc correction
- /// [`crate::openhuman::curated_memory::runtime`] singleton when the
- /// agent is built. Forwarded into [`PromptContext::curated_snapshot`]
+ /// [`crate::openhuman::curated_memory::runtime`] singleton on the
+ /// first turn (`ensure_curated_snapshot`). Forwarded into [`PromptContext::curated_snapshot`]As per coding guidelines, "When adding or changing domain behavior, ship matching documentation (rustdoc, code comments, or updates to AGENTS.md / architecture docs) before handoff."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// Session-scoped snapshot of the curated-memory pair
/// (`MEMORY.md` + `USER.md`) read once from the
/// [`crate::openhuman::curated_memory::runtime`] singleton on the
/// first turn (`ensure_curated_snapshot`). Forwarded into [`PromptContext::curated_snapshot`]
/// every turn so the rendered prompt bytes stay frozen for the
/// session's KV-cache prefix, while runtime writes via
/// `curated_memory.add/replace/remove` land on the NEXT session.
/// `None` when the curated-memory runtime isn't initialised (unit
/// tests, older embeds) — the renderer falls back to workspace files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/harness/session/types.rs` around lines 126 - 134, The doc
comment for the Session-scoped `curated_snapshot` is out of sync with
implementation: it states the snapshot is captured "when the agent is built" but
the code actually initializes `curated_snapshot` to None in the builder and
captures it on first turn via `ensure_curated_snapshot`; update the rustdoc to
describe this lifecycle (None in builder, captured lazily on first turn by
`ensure_curated_snapshot`, forwarded into `PromptContext::curated_snapshot` each
turn so runtime writes apply to subsequent sessions) and mention that it is None
when the curated-memory runtime (`crate::openhuman::curated_memory::runtime`)
isn't initialized.
async fn fresh_bridge() -> LifeCaptureComposioBridge {
    let dir = tempdir().expect("tempdir");
    let db = dir.path().join("lc.db");
    std::mem::forget(dir);
    let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
    let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
    LifeCaptureComposioBridge::new(idx, emb)
}
std::mem::forget(dir) leaks the temp directory on every test run.
tempfile::TempDir cleans up on drop; mem::forget permanently leaks the directory at dir.path() to disk. With 4 tests using fresh_bridge, every cargo test run leaves 4 stray directories under the system temp root. Return the TempDir from fresh_bridge (or store it in the bridge wrapper) so it is dropped at end of test instead.
🔧 Proposed fix
- async fn fresh_bridge() -> LifeCaptureComposioBridge {
- let dir = tempdir().expect("tempdir");
- let db = dir.path().join("lc.db");
- std::mem::forget(dir);
- let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
- let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
- LifeCaptureComposioBridge::new(idx, emb)
- }
+ async fn fresh_bridge() -> (LifeCaptureComposioBridge, tempfile::TempDir) {
+ let dir = tempdir().expect("tempdir");
+ let db = dir.path().join("lc.db");
+ let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
+ let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
+ (LifeCaptureComposioBridge::new(idx, emb), dir)
+ }Then update each let bridge = fresh_bridge().await; to let (bridge, _dir) = fresh_bridge().await;.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/life_capture/ingest/composio.rs` around lines 415 - 422, The
helper fresh_bridge() currently calls std::mem::forget(dir) which leaks the
TempDir; change fresh_bridge() to return ownership of the TempDir (e.g., return
(LifeCaptureComposioBridge, tempfile::TempDir)) or hold the TempDir inside the
LifeCaptureComposioBridge wrapper so it lives until tests drop it; update all
test call sites that do let bridge = fresh_bridge().await; to destructure the
tuple (let (bridge, _dir) = fresh_bridge().await;) or accept the wrapper that
owns the TempDir so the TempDir is dropped at test end, and ensure
PersonalIndex::open(&db).await and the Arc<dyn Embedder> (TestEmbedder) usage
remain unchanged.
#[test]
fn bridge_has_stable_name_and_domain() {
    // Construct with stubs just to exercise the trait methods.
    struct Stub;
    #[async_trait]
    impl Embedder for Stub {
        async fn embed_batch(&self, _t: &[&str]) -> anyhow::Result<Vec<Vec<f32>>> {
            Ok(vec![])
        }
        fn dim(&self) -> usize {
            1536
        }
    }
    // Lightweight non-init path — we don't actually need a live index
    // for the trait introspection bits, but we do need a real
    // PersonalIndex handle. Reuse the fresh_bridge pattern via a
    // smoke check in the runtime test.
    let name = "life_capture::composio_bridge";
    assert_eq!(name, name); // shape guard for the name constant
}
Tautological assertion provides no coverage.
assert_eq!(name, name) at line 619 is always true and does not exercise LifeCaptureComposioBridge::name() or domains(). Either construct a real bridge (e.g. via fresh_bridge) and assert against the trait methods, or delete the test — the surrounding Stub Embedder impl is also unused.
🔧 Proposed fix
#[test]
fn bridge_has_stable_name_and_domain() {
- // Construct with stubs just to exercise the trait methods.
- struct Stub;
- #[async_trait]
- impl Embedder for Stub {
- async fn embed_batch(&self, _t: &[&str]) -> anyhow::Result<Vec<Vec<f32>>> {
- Ok(vec![])
- }
- fn dim(&self) -> usize {
- 1536
- }
- }
- // Lightweight non-init path — we don't actually need a live index
- // for the trait introspection bits, but we do need a real
- // PersonalIndex handle. Reuse the fresh_bridge pattern via a
- // smoke check in the runtime test.
- let name = "life_capture::composio_bridge";
- assert_eq!(name, name); // shape guard for the name constant
+ let bridge = fresh_bridge().await;
+ assert_eq!(bridge.name(), "life_capture::composio_bridge");
+ assert_eq!(bridge.domains(), Some(&["composio"][..]));
}(Make the test #[tokio::test] if you switch to fresh_bridge.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/life_capture/ingest/composio.rs` around lines 601 - 620, The
test bridge_has_stable_name_and_domain currently contains a tautological
assertion (assert_eq!(name, name)) and never calls
LifeCaptureComposioBridge::name() or domains(), nor uses the Stub Embedder;
replace the bogus assertion by constructing a real bridge (use the existing
fresh_bridge helper or the constructor that returns a PersonalIndex handle) and
call LifeCaptureComposioBridge::name() and LifeCaptureComposioBridge::domains()
to assert expected values, or if you prefer to keep it lightweight convert the
test to #[tokio::test], call fresh_bridge(...) to get a bridge instance, then
assert_eq!(bridge.name(), "life_capture::composio_bridge") and
assert!(bridge.domains().contains(...)) and remove the unused Stub/Embedder impl
if you choose the fresh_bridge approach.
const INDEX_VECTOR_DIM: usize = 1536;
if embedder.dim() != INDEX_VECTOR_DIM {
    return Err(format!(
        "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
        embedder.dim()
    ));
}

let mut vecs = embedder
    .embed_batch(&[text.as_str()])
    .await
    .map_err(|e| format!("embed: {e}"))?;
let query_vec = vecs.pop().ok_or("embedder returned no vectors")?;
if query_vec.len() != INDEX_VECTOR_DIM {
    return Err(format!(
        "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
        query_vec.len()
    ));
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether PersonalIndex's sqlite-vec column dim is fixed at 1536, and whether
# any path other than rpc.rs hardcodes a vector dim.
rg -nP --type=rust -C3 '\b(1536|3072|vector_dim|VECTOR_DIM|item_vectors)\b' -g '!**/tests/**'
ast-grep --pattern 'CREATE VIRTUAL TABLE $$$'
Repository: tinyhumansai/openhuman
Length of output: 19658
🏁 Script executed:
# Check handle_ingest function in rpc.rs around lines 123-178
sed -n '120,185p' src/openhuman/life_capture/rpc.rs
Repository: tinyhumansai/openhuman
Length of output: 2271
🏁 Script executed:
# Search for CREATE VIRTUAL TABLE and schema definitions
rg -n 'CREATE VIRTUAL TABLE|CREATE TABLE' --type=rust src/openhuman/life_capture/
Repository: tinyhumansai/openhuman
Length of output: 173
🏁 Script executed:
# Check if PersonalIndex has a vector_dim or similar method
rg -n 'fn.*dim|pub fn' src/openhuman/life_capture/index.rs | head -20
Repository: tinyhumansai/openhuman
Length of output: 166
🏁 Script executed:
# Look for the actual table schema/migration that defines item_vectors
find src/openhuman/life_capture -name "*.rs" -exec grep -l "item_vectors\|CREATE" {} \;
Repository: tinyhumansai/openhuman
Length of output: 143
🏁 Script executed:
# Check migrations.rs more thoroughly for table definitions
cat src/openhuman/life_capture/migrations.rs
Repository: tinyhumansai/openhuman
Length of output: 4456
🏁 Script executed:
# Look for PersonalIndex definition and any dimension-related methods
grep -n "impl PersonalIndex\|fn.*dim\|pub fn\|struct PersonalIndex" src/openhuman/life_capture/index.rs | head -30
Repository: tinyhumansai/openhuman
Length of output: 220
🏁 Script executed:
# Check the migration SQL files - specifically 0002_vec.sql for the item_vectors schema
find src/openhuman/life_capture/migrations -name "*.sql" -exec cat {} \;
Repository: tinyhumansai/openhuman
Length of output: 2588
🏁 Script executed:
# Also check if there are any other methods in PersonalIndex that might expose vector_dim
sed -n '61,200p' src/openhuman/life_capture/index.rs | grep -n "pub fn\|fn "
Repository: tinyhumansai/openhuman
Length of output: 269
Hardcoded INDEX_VECTOR_DIM = 1536 is asymmetric and tightly coupled to the index schema.
Two concerns:
1. Index dim is hardcoded in the schema but embedder dim is configurable. Per `src/core/jsonrpc.rs:944-956`, bootstrap reads `OPENHUMAN_EMBEDDINGS_MODEL` and `OPENHUMAN_EMBEDDINGS_DIM` to construct the embedder with per-model dimensions (e.g., 3072 for `text-embedding-3-large`). With that configuration, `handle_search` will hard-fail with "embedder dim 3072 does not match index dim 1536" — a correct rejection, since the `item_vectors` schema (migration `0002_vec.sql`) is permanently fixed at `float[1536]`. Either drop the env-driven dim configurability in `bootstrap_life_capture`, or have `PersonalIndex` expose its actual schema dim and check against that instead of a hardcoded constant.
2. `handle_ingest` lacks the dimension checks that `handle_search` enforces. `handle_search` validates `embedder.dim() == INDEX_VECTOR_DIM` and `vector.len() == INDEX_VECTOR_DIM` (lines 76-87), but `handle_ingest` does not — it embeds and writes the vector unconditionally (lines 149-178). If a misconfigured embedder produces 3072-dim vectors, ingest will pass them to `upsert_vector`, and sqlite-vec will surface a low-level error rather than the clean RPC message search produces.
🔧 Suggested change for `handle_ingest`
) -> Result<RpcOutcome<Value>, String> {
if external_id.trim().is_empty() {
return Err("external_id must not be empty".into());
}
if text.trim().is_empty() {
return Err("text must not be empty".into());
}
let ts = DateTime::<Utc>::from_timestamp(ts, 0)
.ok_or_else(|| format!("invalid ts (out of range): {ts}"))?;
+ const INDEX_VECTOR_DIM: usize = 1536;
+ if embedder.dim() != INDEX_VECTOR_DIM {
+ return Err(format!(
+ "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
+ embedder.dim()
+ ));
+ }
+
let mut vecs = embedder
.embed_batch(&[text.as_str()])
.await
.map_err(|e| format!("embed: {e}"))?;
let vector = vecs.pop().ok_or("embedder returned no vectors")?;
+ if vector.len() != INDEX_VECTOR_DIM {
+ return Err(format!(
+ "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
+ vector.len()
+ ));
+ }Better still, lift INDEX_VECTOR_DIM to a single shared constant (or expose it via PersonalIndex) so search, ingest, and bootstrap cannot drift.
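A minimal sketch of that shared-constant shape — the helper name `check_dim` and its error wording are illustrative, not the project's API:

```rust
// One shared source of truth so search, ingest, and bootstrap cannot drift.
pub const INDEX_VECTOR_DIM: usize = 1536;

// Called by both handlers before touching sqlite-vec, turning a low-level
// storage error into a clean RPC error with a labelled message.
pub fn check_dim(label: &str, len: usize) -> Result<(), String> {
    if len != INDEX_VECTOR_DIM {
        return Err(format!(
            "{label}: dimension {len} does not match index dim {INDEX_VECTOR_DIM}"
        ));
    }
    Ok(())
}

fn main() {
    assert!(check_dim("embedder", 1536).is_ok());
    // A 3072-dim embedder (e.g. text-embedding-3-large) is rejected up front.
    let err = check_dim("embedder", 3072).unwrap_err();
    println!("{err}");
}
```

Both `handle_search` and `handle_ingest` would call `check_dim` twice: once on `embedder.dim()` before embedding, and once on the produced vector's length before writing or querying.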
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/life_capture/rpc.rs` around lines 76 - 94, The code hardcodes
INDEX_VECTOR_DIM = 1536 in handle_search causing mismatches with configurable
embedders; expose the real index vector dimension from the PersonalIndex (or
lift INDEX_VECTOR_DIM into a single shared constant used by
bootstrap_life_capture, PersonalIndex, handle_search and handle_ingest), then
update handle_search to compare embedder.dim() against that shared/exposed
dimension and validate query_vec.len() the same way; also add the same dimension
checks to handle_ingest before calling upsert_vector (validate embedder.dim()
and the produced vector length and return a clear RPC error on mismatch) and
ensure bootstrap_life_capture uses the same shared dimension when
constructing/configuring the embedder.
fn request_access(store: &CNContactStore) -> Result<(), AddressBookError> {
    unsafe {
        let status = CNContactStore::authorizationStatusForEntityType(CNEntityType::Contacts);
        match status {
            CNAuthorizationStatus::Authorized | CNAuthorizationStatus::Limited => {
                tracing::debug!("[people::address_book] contacts access already authorized");
                return Ok(());
            }
            CNAuthorizationStatus::Denied | CNAuthorizationStatus::Restricted => {
                return Err(AddressBookError::PermissionDenied);
            }
            _ => {
                tracing::debug!(
                    "[people::address_book] requesting contacts access (status={status:?})"
                );
            }
        }

        let (tx, rx) = std::sync::mpsc::channel::<Result<(), AddressBookError>>();
        let tx = Arc::new(Mutex::new(Some(tx)));
        let tx_clone = Arc::clone(&tx);

        let block = RcBlock::new(move |granted: Bool, _error: *mut NSError| {
            let mut slot = tx_clone.lock().unwrap();
            if let Some(sender) = slot.take() {
                let result = if granted.as_bool() {
                    Ok(())
                } else {
                    Err(AddressBookError::PermissionDenied)
                };
                let _ = sender.send(result);
            }
        });

        store.requestAccessForEntityType_completionHandler(CNEntityType::Contacts, &*block);

        rx.recv().map_err(|_| {
            AddressBookError::Other("contacts permission callback never fired".into())
        })?
    }
}
Synchronous blocking on the Tokio worker — wrap CNContactStore calls in spawn_blocking.
request_access calls rx.recv() on a std::sync::mpsc channel, which is a blocking, non-async wait. It can sit there for many seconds (or indefinitely) while the macOS TCC dialog is up. enumerateContactsWithFetchRequest_error_usingBlock is also a synchronous blocking call. Both run on whatever thread invokes fetch_via_cn_contact_store.
Because HandleResolver::seed_from_address_book calls address_book::read_with(source)? directly (no spawn_blocking) and read_with calls source.fetch_contacts() synchronously, the entire CNContactStore round‑trip executes on a Tokio worker thread inside the handle_refresh_address_book future. With the default multi-thread runtime that means one worker is parked until the user dismisses the TCC dialog; with current_thread it would deadlock the entire server.
Best fix is to push the FFI work onto a blocking thread:

```rust
impl ContactsSource for SystemContactsSource {
    fn fetch_contacts(&self) -> Result<Vec<AddressBookContact>, AddressBookError> {
        // Block this thread; callers must invoke from spawn_blocking or a sync context.
        imp::fetch_via_cn_contact_store()
    }
}
```

…and update `HandleResolver::seed_from_address_book` (or add an async variant on `ContactsSource`) to wrap the call:

```rust
let source = source; // by ref isn't Send; restructure as needed
let contacts = tokio::task::spawn_blocking(move || source.fetch_contacts())
    .await
    .map_err(|e| AddressBookError::Other(format!("join: {e}")))??;
```

Either move the `spawn_blocking` into `read_with`/`fetch_contacts`, or make `ContactsSource::fetch_contacts` async and have `SystemContactsSource` do the `spawn_blocking` internally.
Also applies to: 171-269
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/people/address_book.rs` around lines 129 - 169, The
CNContactStore FFI calls (e.g., request_access,
enumerateContactsWithFetchRequest_error_usingBlock inside
imp::fetch_via_cn_contact_store) block a Tokio worker; wrap these synchronous
calls in tokio::task::spawn_blocking. Either (A) move the spawn_blocking into
the SystemContactsSource implementation so ContactsSource::fetch_contacts calls
imp::fetch_via_cn_contact_store inside spawn_blocking and returns the awaited
result, or (B) keep fetch_contacts synchronous but change callers (e.g.,
HandleResolver::seed_from_address_book / read_with) to call
tokio::task::spawn_blocking(|| source.fetch_contacts()) and propagate errors
(map join errors to AddressBookError::Other). Ensure request_access remains
inside the blocking closure so rx.recv() never runs on a Tokio worker thread.
```rust
pub async fn resolve_or_create(&self, handle: &Handle) -> Result<(PersonId, bool), String> {
    let canonical = handle.canonicalize();
    if let Some(id) = self
        .store
        .lookup(&canonical)
        .await
        .map_err(|e| format!("lookup: {e}"))?
    {
        return Ok((id, false));
    }
    let id = PersonId::new();
    let (display_name, primary_email, primary_phone) = match &canonical {
        Handle::DisplayName(s) => (Some(s.clone()), None, None),
        Handle::Email(s) => (None, Some(s.clone()), None),
        Handle::IMessage(s) => {
            if s.contains('@') {
                (None, Some(s.clone()), None)
            } else {
                (None, None, Some(s.clone()))
            }
        }
    };
    let now = Utc::now();
    let person = Person {
        id,
        display_name,
        primary_email,
        primary_phone,
        handles: vec![canonical.clone()],
        created_at: now,
        updated_at: now,
    };
    self.store
        .insert_person(&person, &[canonical])
        .await
        .map_err(|e| format!("insert_person: {e}"))?;
    Ok((id, true))
}
```
TOCTOU between lookup and insert_person can produce orphan Person rows.
resolve_or_create does a lookup, then mints a fresh PersonId, then calls insert_person (which inserts into people and INSERT OR IGNORE into handle_aliases). If two callers resolve the same canonical handle concurrently, both pass lookup, both generate distinct UUIDs, and both succeed in inserting their Person row. The first to commit wins the handle_aliases(kind, value) row; the second's alias is silently ignored, leaving its Person row orphaned (no handle pointing to it) — and both calls return was_created = true.
Practically, the seed_from_address_book path is sequential per call so this is unlikely there, but handle_resolve is a public RPC and concurrent inbound requests for the same handle (e.g., the iMessage scanner from PR #818 + a UI lookup) can race.
Two ways to make this atomic without redesigning the API:
- Push the read-or-create into a single transaction inside `PeopleStore`, e.g., `lookup_or_insert_person(handle, person_factory)` that does `SELECT person_id FROM handle_aliases ...` and `INSERT INTO people ...; INSERT INTO handle_aliases ...` inside one `transaction()`.
- Keep `insert_person` as-is, but after it returns, re-query `lookup(canonical)` and, if the returned `PersonId` differs from the one we just minted, delete our orphan `people` row (or never insert it and re-use the winner's id).
Option 1 is cleaner and avoids the cleanup path entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/people/resolver.rs` around lines 41 - 78, resolve_or_create has
a TOCTOU: it calls PeopleStore::lookup then creates a new PersonId and calls
PeopleStore::insert_person, which can race and produce orphan Person rows when
two callers concurrently resolve the same Handle; fix by moving the
read-or-create into an atomic transactional operation inside the store
(preferred): add a new PeopleStore method like lookup_or_insert_person(handle,
person_factory) that within a single transaction SELECTs from handle_aliases
and, if missing, INSERTs into people and INSERTs the handle_aliases (or returns
the existing person_id), then update resolve_or_create to call
lookup_or_insert_person instead of separate lookup + insert_person;
alternatively (if you must keep insert_person), after insert_person re-query
lookup(canonical) and if the returned PersonId != the one you inserted, delete
the orphan people row and return the winner's id.
```rust
pub async fn handle_list(store: &PeopleStore, limit: usize) -> Result<RpcOutcome<Value>, String> {
    let limit = limit.clamp(1, 500);
    let people = store.list().await.map_err(|e| format!("list: {e}"))?;
    let now = Utc::now();

    let mut ranked: Vec<(Value, f32)> = Vec::with_capacity(people.len());
    for p in people {
        let interactions = store
            .interactions_for(p.id)
            .await
            .map_err(|e| format!("interactions_for: {e}"))?;
        let s = score(&interactions, now);
        let handles: Vec<Value> = p
            .handles
            .iter()
            .map(|h| {
                let (kind, value) = h.as_key();
                json!({ "kind": kind, "value": value })
            })
            .collect();
        ranked.push((
            json!({
                "person_id": p.id.to_string(),
                "display_name": p.display_name,
                "primary_email": p.primary_email,
                "primary_phone": p.primary_phone,
                "handles": handles,
                "score": s.score,
                "components": {
                    "recency": s.recency,
                    "frequency": s.frequency,
                    "reciprocity": s.reciprocity,
                    "depth": s.depth,
                },
                "interaction_count": interactions.len(),
            }),
            s.score,
        ));
    }
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    let people_json: Vec<Value> = ranked.into_iter().take(limit).map(|(v, _)| v).collect();
    Ok(RpcOutcome::new(json!({ "people": people_json }), vec![]))
}
```
N+1 query pattern when listing people; no top‑K short‑circuit.
handle_list calls store.interactions_for(p.id).await once per person inside a sequential loop and only applies limit after sorting. Each interactions_for is a spawn_blocking round-trip plus a SQL query against interactions, so cost grows linearly with the number of people regardless of limit. With realistic address-book sizes (hundreds to thousands of contacts) this will dominate response time and serialize behind a single Tokio blocking thread per call.
Two reasonable mitigations:
- Add an aggregate path on `PeopleStore` (e.g., `list_with_scores(limit)` or `interaction_summaries()`) that fetches the rows needed for scoring in a single query, then run `score()` over the in-memory result.
- At minimum, parallelize the per-person fetches with `futures::future::join_all` so they don't serialize end-to-end.
Either way, since score = 0 when any component is 0 and frequency = 0 for people with no interactions in the window, the long tail of dormant contacts is doing a SQL round-trip just to land at the bottom of the list.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/people/rpc.rs` around lines 16 - 58, handle_list currently
issues an interactions_for(p.id).await inside a sequential loop which causes an
N+1 query pattern and serial spawn_blocking calls; fix by either (A) adding an
aggregate method on PeopleStore (suggest names: list_with_scores(limit) or
interaction_summaries()) that runs a single SQL query to return interaction
aggregates needed by score() and then compute score() in-memory for each person
before sorting and truncating by limit, or (B) minimally parallelizing the
per-person fetches by collecting futures for store.interactions_for(p.id) and
awaiting them with futures::future::join_all so the DB calls run concurrently,
and ensure you apply limit before expensive work by sorting/truncating after
computing scores. Use the function name handle_list and the method
interactions_for as anchors when making the change.
Reads last N messages from $CHAT_DB (default ~/Library/Messages/chat.db), ingests into an in-memory PersonalIndex via IndexWriter::upsert, then runs IndexReader::keyword_search over a hardcoded query list. No embeddings — pure FTS5/bm25 over real personal data. Use to benchmark ingest throughput on a real corpus and to sanity-check that keyword search ranking + snippet generation behave on real text. On a 5000-message slice: ingest at ~47k items/s, search returns 3 hits with snippets in sub-millisecond time. Run: cargo run --example real_imessage_demo --release Requires Full Disk Access for the running terminal.
🧹 Nitpick comments (2)
examples/real_imessage_demo.rs (2)
63-73: Nit: collapse the two `None` branches.

`is_from_me || handle.is_empty()` reads the same and drops one branch.

Proposed tweak

```diff
-let author = if is_from_me {
-    None
-} else if handle.is_empty() {
-    None
-} else {
-    Some(Person {
-        display_name: None,
-        email: None,
-        source_id: Some(handle.clone()),
-    })
-};
+let author = if is_from_me || handle.is_empty() {
+    None
+} else {
+    Some(Person {
+        display_name: None,
+        email: None,
+        source_id: Some(handle.clone()),
+    })
+};
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/real_imessage_demo.rs` around lines 63 - 73, Collapse the two None branches when building author: replace the nested if/else that checks is_from_me then handle.is_empty() with a single condition (is_from_me || handle.is_empty()) so author is None in either case, otherwise return Some(Person { display_name: None, email: None, source_id: Some(handle.clone()) }); update the block that assigns to author (the author variable and Person construction) accordingly.
33-44: Add comment documenting that recent iMessage rows on Big Sur+ will be skipped.

On macOS Big Sur and later, the Messages app stores the body of many (most) recent messages in the `attributedBody` BLOB column (a binary plist / `NSAttributedString` archive) and leaves `message.text` NULL. The current query's `WHERE m.text IS NOT NULL AND length(m.text) > 0` filter will silently skip those rows, causing the demo to pull back only older plain-text messages and potentially appear to have sparse results on modern chat databases. Add a comment before line 33 to document this limitation so users understand why the demo may return fewer messages than expected.

Suggested fix

```diff
+// NOTE: on Big Sur+ many rows have NULL `text` and the body lives in
+// `attributedBody` (binary plist). This demo only ingests the plain-text
+// subset; expect a fraction of recent messages to be skipped.
 let mut stmt = conn.prepare(
     "SELECT m.ROWID, \
      COALESCE(m.text, '') as text, \
```

A fuller implementation would `SELECT m.attributedBody` and decode the `NSString` payload from the archived plist when `text` is NULL, but documenting the limitation is sufficient for a demo.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/real_imessage_demo.rs` around lines 33 - 44, Add a brief comment just above the SQL prepare call (the conn.prepare(...) that builds `stmt`) explaining that on macOS Big Sur and later many recent message bodies are stored in the `attributedBody` BLOB (an archived NSAttributedString/plist) and that the WHERE clause `m.text IS NOT NULL AND length(m.text) > 0` will therefore skip those rows, causing the demo to return only older plain-text messages; note that a fuller fix would SELECT and decode `m.attributedBody` when `text` is NULL, but for the demo we document the limitation instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@examples/real_imessage_demo.rs`:
- Around line 63-73: Collapse the two None branches when building author:
replace the nested if/else that checks is_from_me then handle.is_empty() with a
single condition (is_from_me || handle.is_empty()) so author is None in either
case, otherwise return Some(Person { display_name: None, email: None, source_id:
Some(handle.clone()) }); update the block that assigns to author (the author
variable and Person construction) accordingly.
- Around line 33-44: Add a brief comment just above the SQL prepare call (the
conn.prepare(...) that builds `stmt`) explaining that on macOS Big Sur and later
many recent message bodies are stored in the `attributedBody` BLOB (an archived
NSAttributedString/plist) and that the WHERE clause `m.text IS NOT NULL AND
length(m.text) > 0` will therefore skip those rows, causing the demo to return
only older plain-text messages; note that a fuller fix would SELECT and decode
`m.attributedBody` when `text` is NULL, but for the demo we document the
limitation instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 70f624eb-1dd2-46fe-a539-137d2f592428
📒 Files selected for processing (1)
examples/real_imessage_demo.rs
This PR stacks on #817 (life-capture foundation). Commit `be1112f9` introduces the `life_capture.ingest` controller, which requires the `PersonalIndex` + `IndexWriter` types from #817. Do not merge until #817 lands — after that I'll rebase this onto new main and the diff auto-shrinks to just the A1 delta (~1 commit).

Summary

- `life_capture.ingest` RPC controller — idempotent upsert by `(source, external_id)`, embeds text, atomic vector replace.
- `ingest_group()`: from `openhuman.memory_doc_ingest` to `openhuman.life_capture_ingest`, so chat-day transcripts land in the `PersonalIndex` (the same store `life_capture.search` reads from).
- `life_capture.search` envelope shape: wrap hits in `{"hits": [...]}` per its declared schema (was returning a bare array).
- `#[ignore]`d e2e at `tests/imessage_ingest_e2e.rs`: ingest → search → re-ingest, asserts `replaced=true` and `total_items` unchanged.

Test plan

- `cargo test -p openhuman --lib openhuman::life_capture` — 27 passing
- `cargo test --test imessage_ingest_e2e -- --ignored --test-threads=1` — 2 passing
- `cargo check --manifest-path Cargo.toml`
- `cargo check --manifest-path app/src-tauri/Cargo.toml`
- `~/Library/Messages/chat.db` via `yarn tauri dev`, confirm hits appear in `life_capture.search`

Summary by CodeRabbit
New Features
Tests
Documentation
macOS Permissions