
feat(life_capture): A1 iMessage ingest controller #818

Open

jwalin-shah wants to merge 59 commits into tinyhumansai:main from jwalin-shah:test/imessage-live-e2e

Conversation

@jwalin-shah
Contributor

@jwalin-shah jwalin-shah commented Apr 23, 2026

⚠️ Depends on #817

This PR stacks on #817 (life-capture foundation). Commit be1112f9 introduces the life_capture.ingest controller which requires the PersonalIndex + IndexWriter types from #817. Do not merge until #817 lands — after that I'll rebase this onto new main and the diff auto-shrinks to just the A1 delta (~1 commit).


Summary

  • Add life_capture.ingest RPC controller — idempotent upsert by (source, external_id), embeds text, atomic vector replace (illustrative call sketch after this list).
  • Switch iMessage scanner's ingest_group() from openhuman.memory_doc_ingest to openhuman.life_capture_ingest, so chat-day transcripts land in the PersonalIndex (the same store life_capture.search reads from).
  • Fix life_capture.search envelope shape: wrap hits in {"hits": [...]} per its declared schema (was returning a bare array).
  • New #[ignore]d e2e at tests/imessage_ingest_e2e.rs: ingest → search → re-ingest, asserts replaced=true and total_items unchanged.
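For orientation, a minimal sketch of the call shape implied by the bullets above. The field names shown (source, external_id, ts, text, item_id, replaced) come from this description; everything else is illustrative and the authoritative shape is the controller schema in life_capture/schemas.rs.

    // Illustrative only; see life_capture/schemas.rs for the real schema.
    use serde_json::json;

    fn example_ingest_params() -> serde_json::Value {
        json!({
            "source": "imessage",                 // dedupe key, part 1
            "external_id": "chat123:2026-04-22",  // dedupe key, part 2 (chat-day)
            "ts": 1_776_000_000,                  // illustrative unix-seconds timestamp
            "text": "chat-day transcript text"
        })
    }
    // Re-ingesting the same (source, external_id) should report something like
    // { "item_id": "<canonical id>", "replaced": true } with total_items unchanged,
    // matching the e2e assertions above.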

Test plan

  • cargo test -p openhuman --lib openhuman::life_capture — 27 passing
  • cargo test --test imessage_ingest_e2e -- --ignored --test-threads=1 — 2 passing
  • cargo check --manifest-path Cargo.toml
  • cargo check --manifest-path app/src-tauri/Cargo.toml
  • Manual: live tick against real ~/Library/Messages/chat.db via yarn tauri dev, confirm hits appear in life_capture.search

Summary by CodeRabbit

  • New Features

    • Life-capture: local ingest, indexing, embedding and search for Gmail/Calendar/iMessage
    • EventKit bridge: calendar reads and reminders creation on macOS
    • People: contact resolver, scoring, and address-book seeding
    • Curated memory: session snapshots and editable personal notes
  • Tests

    • Installer smoke test added to CI to validate resolver/install flow
  • Documentation

    • Detailed life-capture architecture and ship pipeline plans added
  • macOS Permissions

    • Privacy usage strings for Contacts, Calendars, and Reminders added

@jwalin-shah jwalin-shah requested a review from a team April 23, 2026 01:27
@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

Adds foundational life-capture and curated-memory subsystems plus people/eventkit domains, wiring index/embedder/ingest/storage into server startup and agent prompt contexts; also adds installer resolver test and macOS permissions, many new modules, schemas, RPC handlers, and extensive tests.

Changes

Cohort / File(s) Summary
CI & Installer
.github/workflows/installer-smoke.yml, scripts/install.sh, scripts/test_install.sh, scripts/fixtures/latest.json
Added resolver unit test to smoke CI, introduced --source-only mode, refactored asset resolution (resolve_asset_url), added HTTP reachability helper and test fixture.
Dependencies & macOS config
Cargo.toml, app/src-tauri/Info.plist
Added r2d2/sqlite-vec/httpmock/objc2 and macOS-specific crates; added Contacts/Calendars/Reminders privacy usage strings.
Core startup & registry
src/core/jsonrpc.rs, src/core/all.rs
Synchronous bootstrap of curated-memory and life-capture at server start; registered new controller namespaces (life_capture, people, curated_memory, eventkit) and descriptions.
Life-capture index & runtime
src/openhuman/life_capture/*, src/openhuman/life_capture/index.rs, .../migrations.rs, .../runtime.rs, .../embedder.rs, .../rpc.rs, .../schemas.rs, .../types.rs
New PersonalIndex (SQLite + sqlite-vec) with migrations, reader/writer APIs, embedder trait + hosted impl, RPC handlers (ingest/search/stats), runtime initialization and schemas.
Ingest bridge (Composio)
src/openhuman/life_capture/ingest/composio.rs, src/openhuman/life_capture/ingest/mod.rs
Composio bridge that normalizes Gmail/Calendar webhook triggers into life-capture ingest flow and registers idempotent subscriber.
Quote stripping & redaction utilities
src/openhuman/life_capture/quote_strip.rs, .../redact.rs
Implement quoted-reply stripping and PII redaction utilities with tests.
Curated memory subsystem
src/openhuman/curated_memory/*
New MemoryStore, snapshot type, runtime singleton, RPC handlers (read/add/replace/remove), controller schemas and re-exports.
People domain (resolver, store, scoring)
src/openhuman/people/*, src/openhuman/people/resolver.rs, .../store.rs, .../scorer.rs, .../schemas.rs, .../rpc.rs, .../types.rs
New PeopleStore, HandleResolver, address-book integration, deterministic aliasing, interaction recording, scoring functions, RPC handlers and schemas.
EventKit bridge (macOS)
src/openhuman/eventkit/*, src/openhuman/eventkit/calendar.rs, .../reminders.rs, .../store.rs, .../runtime.rs, .../schemas.rs, .../types.rs
macOS EventKit calendar read + reminders write with authorization bridging, SQLite cache/migrations, runtime singletons, RPC handlers and schemas; non-macOS stubs return NotSupported.
Agent prompt & snapshot plumbing
src/openhuman/agent/harness/*, src/openhuman/agent/prompts/{mod,types}.rs, src/openhuman/agent/prompts/tests.rs, src/openhuman/agent/.../prompt.rs
Added session-scoped curated_snapshot to Agent/PromptContext/ParentExecutionContext; capture snapshot on first turn; inject snapshot-backed user files into prompt rendering; updated many tests to include curated_snapshot: None.
iMessage scanner update
app/src-tauri/src/imessage_scanner/mod.rs
Switches ingest RPC to openhuman.life_capture_ingest and adapts payload to life-capture schema; parses day → unix ts.
Tests reorganization & new suites
src/openhuman/channels/providers/telegram/channel_tests.rs, src/openhuman/providers/compatible_tests.rs, src/openhuman/agent/prompts/tests.rs, tests/composio_ingest_e2e.rs, src/openhuman/life_capture/tests/e2e.rs
Moved inline tests to separate files, added large Telegram & provider test suites, life-capture e2e, and Composio ingest e2e (ignored).
Examples & extras
examples/real_imessage_demo.rs
New macOS iMessage example that builds an in-memory personal index from iMessage DB and runs keyword queries.
Public re-exports & module entries
src/openhuman/{mod.rs, curated_memory/mod.rs, life_capture/mod.rs, eventkit/mod.rs, people/mod.rs}
Added module entrypoints and re-exports for new domains to crate API surface.
Small API/struct adjustments
various files (tests/*, src/openhuman/webview_notifications/types.rs, src/openhuman/channels/providers/telegram/channel.rs)
Minor visibility change (TelegramTypingTask pub(super)), derive Default for NotificationSettings, updated test stubs to set curated_snapshot: None.

Sequence Diagram(s)

sequenceDiagram
    participant Gmail as Gmail Webhook
    participant Composio as ComposioBridge
    participant Embedder as HostedEmbedder
    participant Index as PersonalIndex
    participant Writer as IndexWriter
    participant RPC as life_capture.rpc

    Gmail->>Composio: Trigger event (messageId, payload)
    Composio->>Composio: Normalize → Item{source,external_id,ts,subject,text,metadata}
    Composio->>Embedder: embed_batch([text])
    Embedder-->>Composio: vector
    Composio->>Writer: upsert(Item)
    Writer->>Index: write/transaction (dedupe by source+external_id)
    Index-->>Writer: item_id, replaced
    Writer-->>Composio: upsert result
    Composio->>RPC: return ingest outcome (item_id,replaced)
sequenceDiagram
    participant User as Agent Session
    participant Agent as Agent
    participant Snap as CuratedSnapshot
    participant Prompt as PromptBuilder
    participant Reader as IndexReader
    participant Index as PersonalIndex

    User->>Agent: start conversation (first turn)
    Agent->>Snap: ensure_curated_snapshot()
    Snap-->>Agent: optional snapshot
    Agent->>Prompt: build(PromptContext{curated_snapshot})
    Prompt->>Reader: hybrid_search(query, snapshot-aware)
    Reader->>Index: read FTS + vectors (frozen snapshot used for user files)
    Index-->>Reader: hits
    Reader-->>Agent: inject citations
    Agent->>User: respond with cited items

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • senamakel

"🐰
I nibble logs and stitch the thread,
From inbox crumbs to memories spread;
Contacts, calendars, vectors too,
I tuck them safe for me and you—
A hopping patchwork, fresh and new!"


Drafts the wedge for OpenHuman: continuous ingestion of Gmail/Calendar/Slack/
iMessage into a local personal_index.db, with a Today brief and retrieval-
tuned chat as the two surfaces. Three composable privacy modes (Convenience /
Hybrid / Fully local) sharing one code path. Track 1 unblocks the ship
pipeline; Track 2 builds the spine over 4 weeks. Reuses ingestion logic from
~/projects/inbox.
Two implementation plans drafted from the life-capture spec:

- Track 1 (ship pipeline): fix Ubuntu installer smoke, land in-flight PRs
  tinyhumansai#806/tinyhumansai#786/tinyhumansai#788/tinyhumansai#797, wire Tauri auto-updater + signed Mac/Windows builds.
- Life-Capture #1 (foundation): SQLite + sqlite-vec personal_index.db,
  Embedder trait + HostedEmbedder (OpenAI), PII redaction, quoted-thread
  stripping, hybrid retrieval (vector + keyword + recency), controller
  schema + RPC. End-to-end test with synthetic items. No ingestion or UI
  yet — those are subsequent milestone plans.
Refactors scripts/install.sh to expose resolve_asset_url and
verify_asset_reachable. Adds scripts/test_install.sh that exercises
the resolver against a committed fixture latest.json. Failures now
report the resolved URL and the parse error instead of dying silently.
Adopts the Hermes pattern (NousResearch/hermes-agent, MIT) of an
agent-writable, char-bounded curated memory file pair. F15 builds the
char-bounded store with atomic writes; F16 wires a session-start
snapshot into the system prompt (preserves prefix cache); F17 exposes
memory.{add,replace,remove,read} through the controller dispatch so
both the agent loop and skills can use it.

Sits between TinyHumans synthesis (volatile) and personal_index.db
(raw events) — fills the "deliberately curated facts" gap that
neither covered.
Adds src/openhuman/life_capture/ module tree (stubs for embedder,
index, migrations, quote_strip, redact, rpc, schemas, types, plus
tests/) per Foundation Plan F1. Wires the module in src/openhuman/mod.rs.

Adds sqlite-vec to deps and httpmock as a dev-dep. Reuses existing
rusqlite/regex/once_cell/async-trait/tempfile already in tree
(Option C — avoids sqlx/rusqlite libsqlite3-sys conflict).
…egex

Previously the country-code group required at least one digit (\+?\d{1,3}),
so '(415) 555-0123' (no leading country code) wouldn't match. Wrapped the
whole prefix in a non-capturing optional group so parenthesized area codes
match without a leading country code.
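A hedged illustration of that fix; the pattern below is not the one shipped in redact.rs, just the shape of the change (an optional non-capturing country-code group in front of an otherwise unchanged body):

    // Illustrative only; redact.rs may use a different body.
    use regex::Regex;

    fn phone_regex() -> Regex {
        // Before: `\+?\d{1,3}` forced at least one country-code digit.
        // After: the whole country-code prefix is one optional non-capturing group.
        Regex::new(r"(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}").unwrap()
    }

    #[test]
    fn parenthesized_area_code_matches_without_country_code() {
        let re = phone_regex();
        assert!(re.is_match("(415) 555-0123"));
        assert!(re.is_match("+1 415 555 0123"));
    }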
… table

Loads sqlite_vec via sqlite3_auto_extension exactly once per process so every
new connection picks up the vec0 module. PersonalIndex wraps a single rusqlite
Connection in Arc<Mutex<_>> for async sharing; open() runs migrations and sets
WAL+foreign_keys, open_in_memory() is for tests.

vec_version() round-trip and a 1536-dim INSERT + MATCH query both verified.
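The once-per-process registration follows the sqlite-vec crate's documented pattern; a sketch, assuming the helper name from the later commit (ensure_vec_extension_registered) and a simplified open():

    use std::sync::Once;
    use rusqlite::{ffi::sqlite3_auto_extension, Connection};
    use sqlite_vec::sqlite3_vec_init;

    static REGISTER_VEC: Once = Once::new();

    // Register vec0 exactly once; every Connection opened afterwards sees it.
    fn ensure_vec_extension_registered() {
        REGISTER_VEC.call_once(|| unsafe {
            sqlite3_auto_extension(Some(std::mem::transmute(
                sqlite3_vec_init as *const (),
            )));
        });
    }

    // Simplified open(): migrations and the Arc<Mutex<Connection>> wrapper omitted.
    fn open(path: &std::path::Path) -> rusqlite::Result<Connection> {
        ensure_vec_extension_registered();
        let conn = Connection::open(path)?;
        conn.pragma_update(None, "journal_mode", "WAL")?;
        conn.pragma_update(None, "foreign_keys", "ON")?;
        Ok(conn)
    }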
…_id) dedupe

Adds IndexWriter with upsert (ON CONFLICT(source, external_id) DO UPDATE — FTS
stays in sync via triggers) and upsert_vector (DELETE + INSERT since vec0 has
no ON CONFLICT for virtual table primary keys). Both wrap rusqlite work in
spawn_blocking and share the PersonalIndex's Arc<Mutex<Connection>>.

Exposes ensure_vec_extension_registered() crate-wide so the migrations test
can register vec0 before opening its in-memory connection.
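Roughly the two statement shapes described above (table and column names are assumptions; note a later review-fix commit in this stack replaces the ON CONFLICT path with an explicit SELECT-then-UPDATE-or-INSERT and wraps the vec0 replace in a transaction):

    use rusqlite::{params, Connection};

    fn upsert_item(conn: &Connection, id: &str, source: &str,
                   external_id: &str, text: &str) -> rusqlite::Result<()> {
        // Row upsert; the items triggers keep the FTS shadow table in sync.
        conn.execute(
            "INSERT INTO items (id, source, external_id, text)
             VALUES (?1, ?2, ?3, ?4)
             ON CONFLICT(source, external_id) DO UPDATE SET text = excluded.text",
            params![id, source, external_id, text],
        )?;
        Ok(())
    }

    fn upsert_vector(conn: &Connection, item_id: &str, vec: &[f32]) -> rusqlite::Result<()> {
        // vec0 virtual tables have no ON CONFLICT, so replace = DELETE + INSERT.
        conn.execute("DELETE FROM items_vec WHERE item_id = ?1", params![item_id])?;
        conn.execute(
            "INSERT INTO items_vec (item_id, embedding) VALUES (?1, ?2)",
            params![item_id, vec_to_blob(vec)],
        )?;
        Ok(())
    }

    // Hypothetical helper: sqlite-vec accepts embeddings as little-endian f32 blobs.
    fn vec_to_blob(v: &[f32]) -> Vec<u8> {
        v.iter().flat_map(|f| f.to_le_bytes()).collect()
    }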
- keyword_search: FTS5 ranked by negated bm25 with snippet markers (« » …),
  plus the user:local ACL EXISTS clause so the same query shape works for the
  multi-token team v2 ACL without rewrites.
- vector_search: sqlite-vec MATCH with k = ? clause (vec0 KNN requirement)
  and ORDER BY distance; score = 1/(1 + distance) so callers can blend it
  with keyword bm25 on the same monotonic scale.

Shared ItemRow + into_hit() helper lets both queries reuse the same row
shape; rusqlite query_map closures hand-build it because rusqlite has no
sqlx FromRow equivalent.
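Hedged sketches of the two query shapes (table, column, and key names are assumptions; the user:local ACL EXISTS clause is omitted):

    // Keyword leg: FTS5 ranked by negated bm25, snippet markers « » ….
    const KEYWORD_SQL: &str = "
        SELECT i.id,
               snippet(items_fts, 1, '«', '»', '…', 12) AS snip,
               -bm25(items_fts) AS score
          FROM items_fts
          JOIN items i ON i.rowid = items_fts.rowid
         WHERE items_fts MATCH ?1
         ORDER BY bm25(items_fts)
         LIMIT ?2";

    // Vector leg: vec0 KNN needs the `k = ?` constraint; 1/(1 + distance) keeps the
    // score monotonic so it blends with the keyword bm25 scale.
    const VECTOR_SQL: &str = "
        SELECT i.id,
               1.0 / (1.0 + v.distance) AS score
          FROM items_vec v
          JOIN items i ON i.id = v.item_id
         WHERE v.embedding MATCH ?1 AND k = ?2
         ORDER BY v.distance";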
Pulls oversampled candidates (k*3 min 20) from both keyword and vector legs,
normalises each independently to [0,1], then re-ranks with
0.55 * vec_norm + 0.35 * kw_norm + 0.10 * exp(-age_days/30).

Documents present in only one leg get 0 for the missing signal but still
compete on the others; only items with neither signal are dropped. Same-vector
twin items now break their tie by recency.
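Written out as code, the re-rank step is just (exact names assumed):

    // Blend after each leg is independently min-max normalised to [0, 1].
    fn blended_score(vec_norm: f32, kw_norm: f32, age_days: f32) -> f32 {
        0.55 * vec_norm + 0.35 * kw_norm + 0.10 * (-age_days / 30.0).exp()
    }
    // An item present in only one leg gets 0.0 for the missing signal but still
    // competes on the others; identical vectors tie-break on the recency term.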
…rch)

Adds two controllers exposed via the registry:
- life_capture.get_stats — total items, per-source counts, last-ingest ts.
- life_capture.search    — hybrid (vector + keyword + recency) search with
  embed-then-rank, optional k (default 10, capped at 100).

Runtime state (PersonalIndex + dyn Embedder) lives in life_capture::runtime
behind a tokio OnceCell because handlers are stateless fn(Map<String,Value>)
and have no per-call context object. F14 will call runtime::init() at app
startup; until then handlers return a structured 'not initialised' error so
the failure mode is loud, not silent.

Schemas registered in core/all.rs alongside cron.
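A sketch of that runtime pattern; LifeCaptureRuntime here is a placeholder for the real index + embedder state (which a later commit splits into separate cells so get_stats only needs the index):

    use std::sync::Arc;
    use tokio::sync::OnceCell;

    pub struct LifeCaptureRuntime; // placeholder for PersonalIndex + dyn Embedder

    static RUNTIME: OnceCell<Arc<LifeCaptureRuntime>> = OnceCell::const_new();

    pub fn init(rt: Arc<LifeCaptureRuntime>) -> Result<(), String> {
        RUNTIME
            .set(rt)
            .map_err(|_| "life_capture runtime already initialised".to_string())
    }

    // Handlers are stateless fn(Map<String, Value>), so they reach state through
    // here and fail loudly until startup has called init().
    pub fn get() -> Result<Arc<LifeCaptureRuntime>, String> {
        RUNTIME
            .get()
            .cloned()
            .ok_or_else(|| "life_capture runtime not initialised".to_string())
    }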
… → search)

FakeEmbedder hashes input bytes into a deterministic sparse 1536-dim vector
so the same text round-trips identically through vec0 — keeps the e2e test
hermetic, no network call. Verifies that:
- quote-strip drops the 'On … wrote:' block before indexing,
- redact masks the email address before indexing,
- the upserted item resurfaces via hybrid_search with both transformations
  preserved on the returned text.
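A minimal sketch of a deterministic, network-free embedder in that spirit (the hashing scheme here is illustrative, not the test's exact one):

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Hash each token into a fixed slot so the same text always produces the
    // same sparse 1536-dim vector and round-trips identically through vec0.
    fn fake_embed(text: &str, dim: usize) -> Vec<f32> {
        let mut v = vec![0.0f32; dim];
        for (i, token) in text.split_whitespace().enumerate() {
            let mut h = DefaultHasher::new();
            (token, i).hash(&mut h);
            v[(h.finish() as usize) % dim] = 1.0;
        }
        v
    }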
After Config::load_or_init succeeds, open the personal_index.db at
config.workspace_dir and register the life_capture runtime. Embedder is
env-gated:
  OPENHUMAN_EMBEDDINGS_KEY > OPENAI_API_KEY
  OPENHUMAN_EMBEDDINGS_URL (default: https://api.openai.com/v1)
  OPENHUMAN_EMBEDDINGS_MODEL (default: text-embedding-3-small)

If no key is set we still open the index file (so Plan #2's ingestion
worker can write to it) but skip runtime registration — controllers then
return the 'not initialised' error from runtime::get() instead of silently
calling a misconfigured embedder.

Config-schema integration deferred to a follow-up; env-driven keeps F14
non-invasive while we land the rest of the foundation.
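The precedence and fallback behaviour, as a sketch:

    // Env-gated embedder config; defaults as documented above.
    fn embedder_config_from_env() -> (Option<String>, String, String) {
        let api_key = std::env::var("OPENHUMAN_EMBEDDINGS_KEY")
            .or_else(|_| std::env::var("OPENAI_API_KEY"))
            .ok();
        let base_url = std::env::var("OPENHUMAN_EMBEDDINGS_URL")
            .unwrap_or_else(|_| "https://api.openai.com/v1".to_string());
        let model = std::env::var("OPENHUMAN_EMBEDDINGS_MODEL")
            .unwrap_or_else(|_| "text-embedding-3-small".to_string());
        // No key: still open personal_index.db so the ingestion worker can write,
        // but skip embedder registration so search returns the loud
        // "not initialised" error instead of calling a misconfigured embedder.
        (api_key, base_url, model)
    }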
…ection

snapshot_pair(memory, user) returns a MemorySnapshot containing plain
strings — no reference back to the stores — so taking a snapshot at session
start and reusing it across turns gives a stable system-prompt prefix and
lets the LLM prefix-cache hit on every subsequent turn.

Plan #1 also calls for wiring this into chat.rs' OpenClaw context loader,
but openhuman assembles agent prompts per-agent under
src/openhuman/agent/agents/*/prompt.rs (not the plan's stale
src-tauri/src/commands/chat.rs path); the prompt-builder integration is
deferred to Plan #2 alongside the agent-context refactor.
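The snapshot shape, roughly (field names assumed):

    // Plain owned strings, no handle back to the stores: the bytes injected into
    // the system prompt stay identical for the whole session, so the LLM prefix
    // cache keeps hitting on later turns.
    pub struct MemorySnapshot {
        pub memory_md: String,
        pub user_md: String,
    }

    pub fn snapshot_pair(memory: &str, user: &str) -> MemorySnapshot {
        MemorySnapshot {
            memory_md: memory.to_string(),
            user_md: user.to_string(),
        }
    }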
… controllers

Adds four controllers under the memory_curated namespace (the 'memory'
namespace is already owned by the long-term memory subsystem):
- memory_curated.read    — read MEMORY.md or USER.md
- memory_curated.add     — append entry, char-bounded
- memory_curated.replace — substring replace, char-bounded
- memory_curated.remove  — drop entries containing a needle

'file' input is a typed enum (memory|user) so adapters reject anything
else at validation time. Runtime state lives in curated_memory::runtime
behind a tokio OnceCell, mirroring life_capture's pattern: startup in
jsonrpc.rs constructs both stores at <workspace>/memories/ with Hermes'
char limits (memory: 2200, user: 1375) and registers the runtime.
Handlers return 'not initialised' until init runs, so failure is loud.
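The typed 'file' input, roughly (the serde attribute is an assumption; the accepted values and char limits come from the text above):

    use serde::Deserialize;

    // Adapters reject anything other than "memory" or "user" at validation time.
    #[derive(Debug, Clone, Copy, Deserialize)]
    #[serde(rename_all = "lowercase")]
    pub enum CuratedFile {
        Memory, // MEMORY.md, char-bounded at 2200
        User,   // USER.md, char-bounded at 1375
    }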
…ings

Findings from Codex + Gemini second-opinion review:

1. IndexWriter::upsert orphaned vectors on re-ingest (Codex). On
   (source, external_id) conflict the row's id was kept but the caller's
   fresh UUID was used for upsert_vector — the vector wrote under an id no
   row joined to. Fix: explicit SELECT-then-UPDATE-or-INSERT in the same
   transaction; mutates caller's Item.id to the canonical id (so the next
   upsert_vector lands on the right row) and orphan-deletes any vector
   already written under the wrong id. Signature change: upsert(&[Item])
   -> upsert(&mut [Item]).

2. upsert_vector DELETE+INSERT was outside a transaction (Gemini) — a
   failed INSERT permanently lost the item's vector. Now wrapped in tx.

3. Runtime over-gated get_stats on the embedder (Codex). Split runtime
   into separate INDEX + EMBEDDER OnceCells: get_stats only requires
   index, search requires both. Index initialises whenever the workspace
   dir is reachable; embedder is env-gated as before.

4. Startup race (both reviewers): runtime init lived inside the post-
   serve tokio::spawn block, so axum::serve was already accepting before
   the OnceCells were set. Hoisted both bootstraps (curated_memory and
   life_capture) into helpers called inline before axum::serve.

5. runtime::get error message lied (Codex) — said 'set embeddings.api_key
   in config' but startup actually reads env vars. Fixed text.

Bonus: rename controller namespace memory_curated -> curated_memory
(Codex preference; nothing depends on it yet so renamed before clients
do).

Adds regression test for #1 (reingest_with_fresh_uuid_keeps_vector_findable).
4696 lib tests pass.
…emory namespace rename

Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.
Critical:
- curated_memory/store.rs: propagate read errors instead of `unwrap_or_default`,
  which could rewrite MEMORY.md / USER.md from an empty baseline on transient
  I/O or permission failures. Reject empty needle in replace/remove.
- curated_memory/rpc.rs: belt-and-suspenders empty-needle guard at the RPC
  boundary (remove with "" deletes every entry).
- life_capture/index.rs: upsert_vector rejects orphan ids (items row missing),
  which would have inserted vectors that never join in vector_search.
- life_capture/embedder.rs: validate response length matches input, indices
  are contiguous 0..n, and every vector matches dim() — prevents silent
  misalignment or wrong-dimensional vectors flowing into the 1536-wide
  sqlite-vec table. Added 30s request + 10s connect timeouts.

Retrieval hardening:
- life_capture/index.rs: keyword_search now escapes FTS5 operators by
  tokenizing on whitespace and wrapping each token as a quoted literal
  (see the sketch after this list). Prevents errors or unintended matches
  from stray quotes, AND/OR/NEAR, or column specifiers in user input.
- life_capture/index.rs: hybrid_search applies q.sources / q.since / q.until
  via post-filter (single consistent pass across keyword+vector fusion).
- life_capture/rpc.rs: handle_search validates embedder dim and response
  length against the fixed 1536-wide index column — clear RPC error instead
  of a cryptic sqlite-vec failure when the embeddings model is swapped.
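A sketch of the escaping step referenced above (function name assumed):

    // Treat user input as literal terms, not FTS5 query syntax.
    fn escape_fts5_query(raw: &str) -> String {
        raw.split_whitespace()
            .map(|tok| format!("\"{}\"", tok.replace('"', "\"\"")))
            .collect::<Vec<_>>()
            .join(" ")
    }
    // e.g. `bug" OR title:x` becomes `"bug""" "OR" "title:x"`: every token is a
    // quoted literal, so stray quotes, AND/OR/NEAR, and column specifiers
    // cannot change the query structure.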

Docs:
- docs/event-bus.md: remove erroneous `.await` on register_native_global
  (it is sync; the handler closure is async).
- life-capture design spec: soften PII redaction claim — only regex is
  implemented today; light NER is flagged as a future enhancement.
Rendered prompts now surface runtime curated-memory writes:

- `PromptContext` gains `curated_snapshot: Option<&MemorySnapshot>`.
- `UserFilesSection` prefers the snapshot over the workspace-file
  loader when one is attached, and injects `USER.md` alongside
  `MEMORY.md` using a byte-compatible `inject_snapshot_content` helper.
- `Session` carries `Option<Arc<MemorySnapshot>>`, populated by
  `ensure_curated_snapshot` on the first turn from
  `curated_memory::runtime::get()`. Reused across turns so prompt
  bytes stay frozen (KV-cache prefix contract) while mid-session
  `curated_memory.add/replace/remove` writes land on the NEXT session.
- `ParentExecutionContext` inherits the snapshot so sub-agents render
  identical `MEMORY.md`/`USER.md` blocks as the parent.
- Legacy workspace-file fallback preserved for embeds that don't
  initialise the curated-memory runtime (pure unit tests).
…ctness (A7 review blockers)

Block lifetime (calendar.rs, reminders.rs): replace the UB cast
`&*block as *const _ as *mut _` with `RcBlock::as_ptr(&block).cast()`.
`RcBlock::as_ptr` returns a proper `*mut Block<F>` owned by the RcBlock;
holding `block` alive until after `blocking_recv()` returns guarantees the
block outlives EventKit's retention window with no UB.

EKEventStore caching (runtime.rs, calendar.rs, reminders.rs): add a
process-global `Arc<Mutex<EventStoreHandle>>` singleton via `OnceLock`.
`EventStoreHandle` is a newtype over `Retained<EKEventStore>` with manual
`Send + Sync` — EKEventStore is Apple-documented as thread-safe and is
always guarded by the Mutex at call sites. `fetch_events` and `save_reminder`
now accept the cached store instead of constructing their own, matching
Apple's one-store-per-process recommendation.

runtime::get() is also changed from panic to `Result<_, String>` so RPC
callers receive a clean error instead of a core crash on DB open/migration
failure.

Dedupe correctness (calendar.rs): chose option (a) — skip events that lack
`calendarItemExternalIdentifier` entirely (log at `info`). The fallback to
the local `calendarItemIdentifier` caused re-emission: before iCloud syncs,
the local id yields hash A; after sync the external id yields hash B — same
event appears twice. Skipping once is safer than silently duping; the log
makes the miss rate observable.

Validation (rpc.rs): `handle_create_reminder` now rejects empty/blank title,
priority > 9, and malformed RFC 3339 due_date with structured error strings.
Past due_date logs a warn but does not reject (lenient for clock skew).

Tests added:
- dedup: `no_dupe_on_resync_with_same_external_id` + `different_external_ids_stored_separately`
- runtime: `get_error_on_bad_path` (no panic on bad DB path)
- rpc validation: empty title, whitespace title, priority > 9, accepts priority 9,
  malformed due_date, valid future due_date, past due_date warn-not-reject
- non-macOS stubs: `list_events` and `create_reminder` stub tests compile but
  are only active on non-macOS targets; macOS CI runs the live path

All 20 eventkit tests pass. `cargo check --bin openhuman-core` is clean.
# Conflicts:
#	Cargo.lock
#	Cargo.toml
Moves the inline #[cfg(test)] mod tests block out of prompts/mod.rs into
prompts/tests.rs to bring the production file under the project's
<=500 line guideline and reduce mod.rs weight per the "light mod.rs" rule.
No behavior change.
Moves the inline #[cfg(test)] mod tests block out of channel.rs into
channel_tests.rs to bring the production file under the project's
<=500 line guideline. No behavior change.
Tidies whitespace left over from the parallel extraction commits in
b946aff and 82b739c — rustfmt re-flowed a handful of long string
literals in channel_tests.rs and compatible_tests.rs and dropped one
trailing blank line in channel.rs. No behavior change.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/openhuman/agent/triage/escalation.rs (1)

125-176: ⚠️ Potential issue | 🔴 Critical

Escalated sub-agents will receive None for curated_snapshot and never populate it.

dispatch_target_agent builds a fresh Agent via Agent::from_config(), which initializes curated_snapshot: None (session builder is synchronous and defers population to the first turn via ensure_curated_snapshot()). The snapshot is captured at line 175 before run_subagent() executes. However, run_subagent() dispatches to either run_typed_mode or run_fork_mode, neither of which executes a full session turn—they only run a provider loop. Therefore, ensure_curated_snapshot() is never called, and the escalated sub-agent will never have access to curated memory, unlike normal subagent dispatch where the parent has already populated its snapshot during turn execution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/triage/escalation.rs` around lines 125 - 176,
dispatch_target_agent builds an Agent via Agent::from_config() but captures
curated_snapshot before it can be populated, so escalated sub-agents get None;
call Agent::ensure_curated_snapshot().await (or the synchronous accessor that
forces population) on the newly built Agent before constructing
ParentExecutionContext so curated_snapshot is populated, then use
agent.curated_snapshot() for the ParentExecutionContext; update
dispatch_target_agent to await that population step prior to calling
run_subagent/run_typed_mode/run_fork_mode.
app/src-tauri/src/imessage_scanner/mod.rs (2)

3-6: ⚠️ Potential issue | 🟡 Minor

Update the scanner rustdoc for life-capture ingest.

The module docs still describe openhuman.memory_doc_ingest and the old playbook convention, but the scanner now writes to openhuman.life_capture_ingest.

Based on learnings, new/changed behavior must ship with matching rustdoc / code comments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src-tauri/src/imessage_scanner/mod.rs` around lines 3 - 6, The
module-level rustdoc in imessage_scanner (top of mod.rs) is outdated: it
references `openhuman.memory_doc_ingest` and the old playbook convention; update
the documentation to reflect current behavior by replacing mentions of
`openhuman.memory_doc_ingest` with `openhuman.life_capture_ingest`, remove or
update the reference to `docs/webview-integration-playbook.md`/WhatsApp
convention, and briefly describe that the scanner emits one
`openhuman.life_capture_ingest` JSON-RPC call per `(chat_identifier, day)` group
(read-only from `~/Library/Messages/chat.db`) so the comments match the actual
runtime behavior.

436-443: ⚠️ Potential issue | 🟠 Major

Treat JSON-RPC errors as ingest failures before advancing the cursor.

JSON-RPC controller failures can arrive as HTTP 200 with an "error" envelope. This currently logs success and lets the tick advance the cursor, so failed life-capture writes can be skipped permanently. Also avoid logging key, since it can include a phone number or email chat identifier.

Suggested envelope check
-    let res = http_client().post(&url).json(&body).send().await?;
+    let res = http_client().post(&url).json(&body).send().await?;
+    let status = res.status();
+    let response_body = res.text().await?;
 
-    if !res.status().is_success() {
-        anyhow::bail!("core rpc {}: {}", res.status(), res.text().await?);
+    if !status.is_success() {
+        anyhow::bail!("core rpc {}: {}", status, response_body);
+    }
+
+    let env: serde_json::Value = serde_json::from_str(&response_body)?;
+    if let Some(error) = env.get("error") {
+        anyhow::bail!("life_capture_ingest rpc error: {}", error);
     }
 
-    log::info!("[imessage] life_capture ingest ok key={}", key);
+    log::info!("[imessage] life_capture ingest ok");

As per coding guidelines, never log secrets or full PII — redact.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src-tauri/src/imessage_scanner/mod.rs` around lines 436 - 443, The HTTP
response handling after http_client().post(&url).json(&body).send().await? (res)
must detect JSON-RPC error envelopes even when HTTP 200, so parse the response
body as JSON (e.g., serde_json::Value) and if it contains an "error" field
(non-null) treat it as a failure (anyhow::bail!) instead of logging success and
advancing the cursor; also remove logging of the raw key value in the success
log (redact or omit it) so life_capture ingest success uses a non-PII message
and failures include the error details from the JSON envelope.
src/openhuman/agent/prompts/mod.rs (1)

718-720: ⚠️ Potential issue | 🟠 Major

Thread the curated snapshot into the direct sub-agent renderer too.

UserFilesSection now honors ctx.curated_snapshot, but this renderer still re-reads MEMORY.md from disk. Sub-agent prompts can diverge mid-session after curated-memory writes, violating the session-frozen prompt/cache contract documented above.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/prompts/mod.rs` around lines 718 - 720, The sub-agent
renderer still calls inject_workspace_file_capped(&mut out, workspace_dir,
"MEMORY.md", USER_FILE_MAX_CHARS) when options.include_memory_md, causing a disk
read instead of using the session-curated memory; change this branch to check
ctx.curated_snapshot (or equivalent field) and, if present, inject the curated
snapshot contents into the renderer output (or call an overload of
inject_workspace_file_capped that accepts raw content) instead of reading
MEMORY.md from workspace_dir so UserFilesSection and the direct sub-agent
renderer both use the same in-memory curated snapshot.
🟡 Minor comments (14)
src/openhuman/life_capture/redact.rs-8-17 (1)

8-17: ⚠️ Potential issue | 🟡 Minor

PHONE regex matches false positives (10-digit order/tracking IDs), but proposed \b fix breaks parentheses format.

The concern is valid: plain 10-digit runs like 1234567890 match regardless of context (order#1234567890, tracking 1234567890 shipped), getting redacted before indexing and becoming unsearchable. However, the proposed \b fix breaks the existing test case—(415) 555-0123 fails to match because \b between the opening paren and first digit prevents the match. The parentheses format must remain matchable per the test suite (line 46).

A more nuanced fix is needed—such as requiring at least one separator/paren form, using negative lookahead/lookbehind to exclude bare digit runs, or anchoring to word boundaries only where appropriate (after parens, around separators). Consider what formats should legitimately match when preceded/followed by alphanumerics (e.g., order#1234567890 should not, but tel:(415)555-0123 might).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/redact.rs` around lines 8 - 17, PHONE's regex is
matching bare 10-digit runs (false positives) and adding a simple \b breaks
parenthesized numbers; update the PHONE Regex (the static PHONE Lazy<Regex> /
Regex::new call) to require either explicit separators or parentheses around the
area code or to forbid matching a continuous digit-run with no separators by
using alternation plus lookarounds (e.g., one branch that matches formats with
separators or \(...\) and another branch that uses negative lookbehind/negative
lookahead to reject a preceding/following alphanumeric digit when matching a
plain 10-digit sequence); implement this by modifying the pattern inside
Regex::new so it only matches numbers with separators/parentheses or ensures the
10-digit branch is not adjacent to word characters.
src/core/all.rs-287-296 (1)

287-296: ⚠️ Potential issue | 🟡 Minor

Add descriptions for the other newly registered namespaces.

life_capture and curated_memory are now registered controllers, but namespace_description() only adds people and eventkit, so CLI/help discovery will show no description for two new user-facing namespaces.

Proposed fix
         "learning" => Some(
             "User context enrichment — LinkedIn profile scraping and onboarding intelligence.",
         ),
+        "life_capture" => Some(
+            "Personal index ingestion, search, stats, and retrieval for captured life data.",
+        ),
+        "curated_memory" => Some(
+            "Curated MEMORY.md and USER.md scratchpad reads and writes.",
+        ),
         "people" => Some(
             "Contact resolution and recency × frequency × reciprocity × depth scoring.",
         ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/all.rs` around lines 287 - 296, namespace_description() only returns
descriptions for "people", "notification", and "eventkit", so the newly
registered namespaces "life_capture" and "curated_memory" appear undocumented in
CLI/help; update the match in namespace_description() to include arms for
"life_capture" and "curated_memory" returning concise user-facing descriptions
(similar style to "people" and "eventkit") so those controllers surface
meaningful help text; reference the function namespace_description() and add the
two new string keys "life_capture" and "curated_memory" with appropriate
description strings.
src/openhuman/life_capture/quote_strip.rs-14-33 (1)

14-33: ⚠️ Potential issue | 🟡 Minor

Preserve true no-op behavior when no quote is present.

strip_quoted_reply() currently trims and rewrites line endings even when there is no marker and no quoted > line, so “passthrough” inputs can still be changed.

Proposed fix
 pub fn strip_quoted_reply(body: &str) -> String {
     // Cut at the earliest of the two marker types.
     let mut cut: Option<usize> = None;
     for re in [&*ON_DATE_WROTE, &*OUTLOOK_SEP] {
         if let Some(m) = re.find(body) {
             cut = Some(cut.map_or(m.start(), |c| c.min(m.start())));
         }
     }
     let head = if let Some(idx) = cut {
         &body[..idx]
     } else {
         body
     };
 
+    if cut.is_none() && !head.lines().any(|l| l.trim_start().starts_with('>')) {
+        return body.to_string();
+    }
+
     // Drop any line starting with '>'.
     let kept: Vec<&str> = head
         .lines()
         .filter(|l| !l.trim_start().starts_with('>'))
         .collect();
     kept.join("\n").trim().to_string()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/quote_strip.rs` around lines 14 - 33,
strip_quoted_reply currently always rebuilds the string which can change line
endings and whitespace even when there is nothing to strip; modify
strip_quoted_reply so that if no marker from ON_DATE_WROTE or OUTLOOK_SEP is
found and no line begins with '>' it returns body.to_string() immediately
(preserving exact input). Otherwise keep the existing logic that computes cut,
slices head, filters '>' lines, joins and trims; reference the function
strip_quoted_reply and the regex markers ON_DATE_WROTE and OUTLOOK_SEP to locate
where to add the early-return check.
src/openhuman/life_capture/migrations/0001_init.sql-9-9 (1)

9-9: ⚠️ Potential issue | 🟡 Minor

Index text_keyword in FTS or remove the column.

text_keyword is documented as the raw FTS/BM25 projection, but items_fts and its triggers index text only. Connectors that populate text_keyword will not affect keyword search.

Suggested migration adjustment
 CREATE VIRTUAL TABLE IF NOT EXISTS items_fts USING fts5(
     subject,
-    text,
+    text_keyword,
     content='items',
     content_rowid='rowid',
     tokenize='porter unicode61'
 );
@@
-    INSERT INTO items_fts(rowid, subject, text)
-    VALUES (new.rowid, COALESCE(new.subject, ''), new.text);
+    INSERT INTO items_fts(rowid, subject, text_keyword)
+    VALUES (new.rowid, COALESCE(new.subject, ''), COALESCE(new.text_keyword, new.text));
@@
-    INSERT INTO items_fts(items_fts, rowid, subject, text)
-    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), old.text);
+    INSERT INTO items_fts(items_fts, rowid, subject, text_keyword)
+    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), COALESCE(old.text_keyword, old.text));
@@
-    INSERT INTO items_fts(items_fts, rowid, subject, text)
-    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), old.text);
-    INSERT INTO items_fts(rowid, subject, text)
-    VALUES (new.rowid, COALESCE(new.subject, ''), new.text);
+    INSERT INTO items_fts(items_fts, rowid, subject, text_keyword)
+    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), COALESCE(old.text_keyword, old.text));
+    INSERT INTO items_fts(rowid, subject, text_keyword)
+    VALUES (new.rowid, COALESCE(new.subject, ''), COALESCE(new.text_keyword, new.text));
 END;

Also applies to: 30-35, 39-54

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/migrations/0001_init.sql` at line 9, The schema
adds a text_keyword column but the FTS virtual table and triggers only index the
text column (items_fts and its triggers), so populate or index text_keyword in
the FTS pipeline or remove the unused column; update the FTS definition
(items_fts) and its associated insert/update/delete triggers to include
text_keyword in the FTS content and bm25/FTS projections, or drop text_keyword
from the CREATE TABLE if you intend to only index text, making sure to update
any migration lines referencing text_keyword (and the triggers that mirror into
items_fts) so connectors that write text_keyword actually affect search.
docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md-24-25 (1)

24-25: ⚠️ Potential issue | 🟡 Minor

Refresh the documented RPC surface to match the shipped controller.

The plan still references embed_and_store, a bare-array search return, and life_capture.* call names. This PR ships openhuman.life_capture_ingest and wraps search results as { "hits": [...] }, so the plan will mislead follow-up work and smoke tests.

Based on learnings, new/changed behavior must ship with matching rustdoc / code comments and architecture docs when user-visible behavior changes.

Also applies to: 1423-1438, 2093-2097

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md` around lines
24 - 25, Update the plan/docs to match the shipped controller surface: replace
references to the removed embed_and_store API
(LifeCaptureController::embed_and_store) with the actual ingest RPC name
openhuman.life_capture_ingest, change any mention of search returning a bare
array to document that search returns an object with "hits": [...] (not a raw
array), and ensure RpcOutcome-style handler mentions align with the current
controller handlers (e.g., LifeCaptureController get_status, get_stats, search)
and openhuman.life_capture_ingest call names so follow-up work and smoke tests
reflect the real, shipped behavior.
scripts/install.sh-203-217 (1)

203-217: ⚠️ Potential issue | 🟡 Minor

verify_asset_reachable is defined but never called — dead code or missing wiring?

The helper is declared with exponential-backoff HEAD retries but no call site wires it into resolve_from_latest_json or the post-resolution flow. Either hook it in after ASSET_URL is resolved (e.g., before the curl -fL ... -o ${DOWNLOAD_PATH} step) so resolution-time confidence matches download-time behaviour, or drop it to avoid drift.

Want me to wire verify_asset_reachable "${ASSET_URL}" into the post-resolution path?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/install.sh` around lines 203 - 217, The verify_asset_reachable helper
is never invoked causing dead code or missed pre-download checks; after
ASSET_URL is resolved (e.g., in resolve_from_latest_json or immediately after
that resolution step) call verify_asset_reachable "${ASSET_URL}" and abort on
non-zero return before performing the curl download that writes to DOWNLOAD_PATH
(the existing curl -fL ... -o ${DOWNLOAD_PATH} step), so the script performs the
HEAD retry/backoff check prior to attempting the actual download.
src/openhuman/eventkit/rpc.rs-35-39 (1)

35-39: ⚠️ Potential issue | 🟡 Minor

Minor: end <= start rejects equal timestamps, but error text says "must be greater".

If a caller passes the same value for start_ts and end_ts (e.g., an instantaneous probe), the condition fails and the error says end_ts must be greater than start_ts, which is consistent — but the schema description doesn't advertise that constraint. Either loosen to end < start (allow equal) or document the strict-greater-than requirement on the end_ts input in schemas.rs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/eventkit/rpc.rs` around lines 35 - 39, The code currently
rejects equal timestamps via the check "if end <= start" in
src/openhuman/eventkit/rpc.rs; to allow instantaneous probes, change the
condition to "if end < start" and update the error message to reflect the new
non-strict constraint (e.g., "end_ts ({end}) must be >= start_ts ({start})");
also update the schema/doc in schemas.rs if it currently claims a
strict-greater-than requirement so the API docs match the new behavior.
src/openhuman/eventkit/schemas.rs-244-247 (1)

244-247: ⚠️ Potential issue | 🟡 Minor

Silent clamp masks the rpc.rs priority validation.

rpc::handle_create_reminder in src/openhuman/eventkit/rpc.rs explicitly rejects priorities >9 with "create_reminder: 'priority' must be 0–9 (got {p})", but this adapter pre-clamps via v.min(9), so priority=42 silently becomes 9 and the rpc validation never fires. Callers get no indication their input was invalid, and the adapter and rpc layer disagree on the contract.

Either drop the clamp and let rpc reject, or validate explicitly in the adapter with the same error text.

🐛 Proposed fix — let rpc validation handle the range
-        let priority = params
-            .get("priority")
-            .and_then(|v| v.as_u64())
-            .map(|v| v.min(9) as u8);
+        let priority = match params.get("priority") {
+            None | Some(Value::Null) => None,
+            Some(Value::Number(n)) => Some(
+                n.as_u64()
+                    .ok_or("create_reminder: 'priority' must be an unsigned integer 0–9")?
+                    as u8,
+            ),
+            Some(_) => return Err(
+                "create_reminder: 'priority' must be an unsigned integer 0–9".into(),
+            ),
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/eventkit/schemas.rs` around lines 244 - 247, The adapter in
src/openhuman/eventkit/schemas.rs silently clamps priority via .map(|v| v.min(9)
as u8), masking rpc::handle_create_reminder's validation; remove the clamp so
the code only parses the numeric value (e.g., keep .and_then(|v|
v.as_u64()).map(|v| v as u8)) and let rpc::handle_create_reminder enforce the
0–9 range, or alternatively perform explicit validation in the adapter that
returns the identical error string used by rpc::handle_create_reminder; update
the priority parsing in the schemas.rs snippet (the code that defines priority)
accordingly.
src/openhuman/life_capture/migrations.rs-53-68 (1)

53-68: ⚠️ Potential issue | 🟡 Minor

Remove unused run_async or document its purpose.

run_async at line 53 is not called anywhere in the codebase. The actual pattern in PersonalIndex::open() and PersonalIndex::open_in_memory() uses spawn_blocking with the sync run() function, then wraps the connection in Arc<tokio::sync::Mutex<Connection>> at the index level—not within migrations.

If run_async exists for future extensibility or as part of a public API contract, add a comment explaining the use case. Otherwise, remove it to reduce maintenance surface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/migrations.rs` around lines 53 - 68, The function
run_async is unused—either remove it or document its intended public
API/extension use; if removing, delete the run_async function and any tests/docs
referencing it so migrations.rs only exposes run, otherwise add a clear doc
comment above run_async explaining why it exists (e.g., convenience async
wrapper for blocking run used by external callers) and mark it
#[allow(dead_code)] or pub(crate)/pub as appropriate, and ensure references in
PersonalIndex::open and PersonalIndex::open_in_memory remain using the current
spawn_blocking + run pattern to avoid changing behavior.
docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md-13-342 (1)

13-342: ⚠️ Potential issue | 🟡 Minor

Duplicated "Workstream A" risks confusing the agentic executor.

The file now has the new Workstream A (Task A4 at line 19) and "Workstream A (original) — kept for history" (line 60) which also defines Tasks A1–A4 with the same IDs. Given the doc header instructs workers to execute with superpowers:executing-plans, a runner iterating on unchecked - [ ] items will hit both copies of A4 and re-try work already completed in the new section. Either move the history block behind a collapsible <details>, prefix its tasks with A-old-*, or drop it entirely and keep the context in a short paragraph.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md` around lines 13 -
342, The document contains duplicated "Workstream A" sections (the new
Workstream A with Task A4 and the "Workstream A (original) — kept for history"
block) which will confuse automated executors; remove or demote the historical
copy so only one actionable Workstream A/A4 remains. Fix by either (a) moving
the entire "Workstream A (original) — kept for history" block into a
non-actionable collapsible section (e.g., wrap in a <details> summary) or (b)
renaming its task IDs to non-conflicting identifiers like A-old-*, or (c) delete
the historical duplicate entirely and condense its context into a short
paragraph; ensure the actionable Task A4 / "Task A4: Wire the resolver test into
CI and drop the ubuntu matrix row" and the checklist steps remain unchanged and
uniquely numbered.
src/openhuman/life_capture/embedder.rs-21-25 (1)

21-25: ⚠️ Potential issue | 🟡 Minor

Silent fallback drops the configured timeouts.

reqwest::Client::builder().…build().unwrap_or_else(|_| reqwest::Client::new()) discards the 30s request / 10s connect timeouts on any builder error and hands back a client with no timeouts, which can hang request threads indefinitely. Builder failures here are TLS/system-level — rare but fatal to diagnose once you hit them. Prefer bubbling the error (Result<Self, _>) or expect with a clear message; reqwest::Client::new() is not a safer fallback (it internally unwraps the same builder).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/embedder.rs` around lines 21 - 25, The current use
of reqwest::Client::builder().…build().unwrap_or_else(|_|
reqwest::Client::new()) silently drops the configured timeouts (30s request /
10s connect) on builder errors; replace this by propagating the build error or
failing with an explicit message so you don't return a client without timeouts.
Update the code that initializes the local variable http (the
reqwest::Client::builder().timeout(...).connect_timeout(...).build().unwrap_or_else(...))
to either return a Result from the enclosing constructor/function (e.g.,
Result<Self, reqwest::Error>) and use .build()? or .map_err(...) to bubble the
error, or call .build().expect("failed to build reqwest client with configured
timeouts: <context>") so failures are explicit; do not fallback to
reqwest::Client::new() which loses the timeouts. Ensure the change is applied
where http is created so configured timeouts are always honored or the error
surfaces.
src/openhuman/life_capture/index.rs-167-177 (1)

167-177: ⚠️ Potential issue | 🟡 Minor

Silent fallback to empty source string hides serialization bugs.

let source = serde_json::to_value(item.source)
    .ok()
    .and_then(|v| v.as_str().map(str::to_string))
    .unwrap_or_default();

If Source is ever extended in a way that doesn't serialize to a bare string (e.g. variant with data), this collapses silently to "". The row is then written with source = "", which won't match anything in Query.sources filtering in apply_query_filters, nor will anyone easily spot it in the DB. Hybrid search will return zero hits for an otherwise-correctly-ingested item.

Similarly, the unchecked author_json and metadata_json fallbacks (unwrap_or_else(|_| "null".into()) / "{}".into()) will round-trip through from_str on read and mask the underlying error.

🛡️ Proposed fix
-                    let source = serde_json::to_value(item.source)
-                        .ok()
-                        .and_then(|v| v.as_str().map(str::to_string))
-                        .unwrap_or_default();
-                    let author_json = item
-                        .author
-                        .as_ref()
-                        .map(|a| serde_json::to_string(a).unwrap_or_else(|_| "null".into()));
-                    let metadata_json = serde_json::to_string(&item.metadata)
-                        .unwrap_or_else(|_| "{}".into());
+                    let source = serde_json::to_value(item.source)
+                        .context("serialize item.source")?
+                        .as_str()
+                        .ok_or_else(|| anyhow::anyhow!("Source must serialize to a JSON string"))?
+                        .to_string();
+                    let author_json = item
+                        .author
+                        .as_ref()
+                        .map(serde_json::to_string)
+                        .transpose()
+                        .context("serialize item.author")?;
+                    let metadata_json = serde_json::to_string(&item.metadata)
+                        .context("serialize item.metadata")?;

Given the upsert entry point already returns anyhow::Result, propagating is the natural shape.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/index.rs` around lines 167 - 177, The current code
silently swallows serialization errors for item.source, author, and metadata
(the owned.iter_mut block building source, author_json, metadata_json), which
hides bugs and writes empty/placeholder values; update the logic to propagate
serde_json serialization errors back to the caller (the upsert entry point which
returns anyhow::Result) instead of using unwrap_or_default/unwrap_or_else —
e.g., replace the silent fallbacks with ?-style propagation or map_err
conversions so serde_json::to_value(item.source),
serde_json::to_string(&item.metadata), and serde_json::to_string(a) return
errors to the caller and abort the upsert on serialization failure.
src/openhuman/people/resolver.rs-103-175 (1)

103-175: ⚠️ Potential issue | 🟡 Minor

Seeded Person.display_name is dropped when the contact also has an email/phone.

When a contact has both an email and a display name (the common case: AddressBookContact { display_name: Some("Alice Smith"), emails: ["alice@..."], phones: [...] }), the primary handle is Handle::Email, so resolve_or_create mints a Person with:

// from resolve_or_create, Handle::Email branch
(display_name, primary_email, primary_phone) = (None, Some(email), None)

The contact's display name is then only attached as an alias row in handle_aliases — it never makes it into Person.display_name. The UI (handle_list in rpc.rs) reads p.display_name directly, so address-book-seeded contacts render without a name until something else enriches them.

The existing test seed_from_address_book_populates_store doesn't catch this because it only asserts on PersonId equality, not on display_name.

♻️ Sketch of a fix

Either:

  1. After resolve_or_create, do a best-effort store.enrich_person(pid, display_name, primary_phone) that fills in any NULL columns on the existing row; or
  2. Introduce a resolver.upsert_from_contact(&AddressBookContact) path that builds the full Person skeleton up front instead of going through the single-handle resolve_or_create.

Option 1 is the smaller change and matches the "seed is idempotent, don't clobber" contract. Also worth extending seed_from_address_book_populates_store with:

+        let alice = s.get(alice_id.unwrap()).await.unwrap().unwrap();
+        assert_eq!(alice.display_name.as_deref(), Some("Alice Smith"));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/people/resolver.rs` around lines 103 - 175,
seed_from_address_book currently uses resolve_or_create(&primary) which
creates/returns a Person without the contact's display_name when primary is an
email/phone, so the contact name only becomes a handle alias and never populates
Person.display_name; after Successfully resolving/creating the primary (the
Ok(pid) branch in seed_from_address_book) call a best-effort enrichment on the
stored person to set display_name and primary_phone if those columns are NULL
(e.g., add a store.enrich_person(pid, display_name_opt, phone_opt) or similar),
ensuring you pass the contact.display_name and the primary phone and only update
NULL fields to preserve existing data; update the test
seed_from_address_book_populates_store to assert Person.display_name is
populated for a contact with email+display_name.
src/openhuman/people/address_book.rs-126-177 (1)

126-177: ⚠️ Potential issue | 🟡 Minor

request_access can silently hang if invoked on the main thread.

The doc comment says "Must not be called from the main thread on macOS (CNContactStore will deadlock)", but nothing actually enforces this, and SystemContactsSource::fetch_contacts → fetch_via_cn_contact_store → request_access is reachable from any async task. If the tokio runtime happens to schedule this on the main thread (e.g. from an RPC handler in an app bundle started via NSApplicationMain), rx.recv() will block forever with no timeout. Consider either:

  1. Adding a rx.recv_timeout(Duration::from_secs(30)) with a distinct AddressBookError::Timeout, or
  2. Wrapping the call path in tokio::task::spawn_blocking at the handle_refresh_address_book RPC boundary so it never runs on the async executor thread.

This would prevent a permanently stuck RPC if the invariant is accidentally violated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/people/address_book.rs` around lines 126 - 177, request_access
can block forever if called on the main thread because rx.recv() waits
indefinitely; update the code to avoid permanent hangs by either (A) replacing
rx.recv() with rx.recv_timeout(Duration::from_secs(30)) and return a new
AddressBookError::Timeout on timeout (update the AddressBookError enum
accordingly), or (B) ensure callers never call request_access on an async
executor thread by moving the blocking path into a background thread using
tokio::task::spawn_blocking at the RPC boundary (e.g. in
handle_refresh_address_book / SystemContactsSource::fetch_contacts ->
fetch_via_cn_contact_store) so request_access stays on a blocking thread;
reference request_access, rx.recv, AddressBookError, fetch_via_cn_contact_store,
fetch_contacts, and handle_refresh_address_book when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9bf11610-907f-4555-9e51-47feb8aa8433

📥 Commits

Reviewing files that changed from the base of the PR and between 3dc0d0d and 2452a1b.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (93)
  • .github/workflows/installer-smoke.yml
  • Cargo.toml
  • app/src-tauri/Info.plist
  • app/src-tauri/src/imessage_scanner/mod.rs
  • docs/superpowers/plans/2026-04-21-imessage-live-harness.md
  • docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md
  • docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md
  • docs/superpowers/specs/2026-04-22-life-capture-layer-design.md
  • scripts/fixtures/latest.json
  • scripts/install.sh
  • scripts/test_install.sh
  • src/core/all.rs
  • src/core/jsonrpc.rs
  • src/openhuman/agent/agents/archivist/prompt.rs
  • src/openhuman/agent/agents/code_executor/prompt.rs
  • src/openhuman/agent/agents/critic/prompt.rs
  • src/openhuman/agent/agents/integrations_agent/prompt.rs
  • src/openhuman/agent/agents/loader.rs
  • src/openhuman/agent/agents/morning_briefing/prompt.rs
  • src/openhuman/agent/agents/orchestrator/prompt.rs
  • src/openhuman/agent/agents/planner/prompt.rs
  • src/openhuman/agent/agents/researcher/prompt.rs
  • src/openhuman/agent/agents/summarizer/prompt.rs
  • src/openhuman/agent/agents/tool_maker/prompt.rs
  • src/openhuman/agent/agents/tools_agent/prompt.rs
  • src/openhuman/agent/agents/trigger_reactor/prompt.rs
  • src/openhuman/agent/agents/trigger_triage/prompt.rs
  • src/openhuman/agent/agents/welcome/prompt.rs
  • src/openhuman/agent/harness/fork_context.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/types.rs
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent/prompts/mod.rs
  • src/openhuman/agent/prompts/tests.rs
  • src/openhuman/agent/prompts/types.rs
  • src/openhuman/agent/triage/escalation.rs
  • src/openhuman/agent/triage/evaluator.rs
  • src/openhuman/channels/providers/telegram/channel.rs
  • src/openhuman/channels/providers/telegram/channel_tests.rs
  • src/openhuman/context/debug_dump.rs
  • src/openhuman/curated_memory/mod.rs
  • src/openhuman/curated_memory/rpc.rs
  • src/openhuman/curated_memory/runtime.rs
  • src/openhuman/curated_memory/schemas.rs
  • src/openhuman/curated_memory/store.rs
  • src/openhuman/curated_memory/types.rs
  • src/openhuman/eventkit/calendar.rs
  • src/openhuman/eventkit/migrations/0001_init.sql
  • src/openhuman/eventkit/mod.rs
  • src/openhuman/eventkit/reminders.rs
  • src/openhuman/eventkit/rpc.rs
  • src/openhuman/eventkit/runtime.rs
  • src/openhuman/eventkit/schemas.rs
  • src/openhuman/eventkit/store.rs
  • src/openhuman/eventkit/types.rs
  • src/openhuman/learning/prompt_sections.rs
  • src/openhuman/life_capture/embedder.rs
  • src/openhuman/life_capture/index.rs
  • src/openhuman/life_capture/ingest/composio.rs
  • src/openhuman/life_capture/ingest/mod.rs
  • src/openhuman/life_capture/migrations.rs
  • src/openhuman/life_capture/migrations/0001_init.sql
  • src/openhuman/life_capture/migrations/0002_vec.sql
  • src/openhuman/life_capture/mod.rs
  • src/openhuman/life_capture/quote_strip.rs
  • src/openhuman/life_capture/redact.rs
  • src/openhuman/life_capture/rpc.rs
  • src/openhuman/life_capture/runtime.rs
  • src/openhuman/life_capture/schemas.rs
  • src/openhuman/life_capture/tests/e2e.rs
  • src/openhuman/life_capture/tests/mod.rs
  • src/openhuman/life_capture/types.rs
  • src/openhuman/mod.rs
  • src/openhuman/people/address_book.rs
  • src/openhuman/people/migrations.rs
  • src/openhuman/people/migrations/0001_init.sql
  • src/openhuman/people/mod.rs
  • src/openhuman/people/resolver.rs
  • src/openhuman/people/rpc.rs
  • src/openhuman/people/schemas.rs
  • src/openhuman/people/scorer.rs
  • src/openhuman/people/store.rs
  • src/openhuman/people/tests.rs
  • src/openhuman/people/types.rs
  • src/openhuman/providers/compatible.rs
  • src/openhuman/providers/compatible_tests.rs
  • tests/agent_harness_public.rs
  • tests/composio_ingest_e2e.rs
  • tests/fixtures/life_capture/corpus.json
  • tests/imessage_ingest_e2e.rs
  • tests/life_capture_retrieval_eval.rs

Comment thread src/core/jsonrpc.rs
Comment on lines +658 to +670
// --- Curated-memory + life-capture runtime bootstrap ------------------
//
// These MUST initialise before `axum::serve` starts accepting connections
// so the very first RPC can't race the OnceCell — earlier we set them
// inside the tokio::spawn block below and the server began accepting
// before they were ready, surfacing as spurious 'not initialised' errors
// for the first few requests.
//
// Loading the config a second time here (the spawn block also calls
// `load_or_init`) is fine — it caches and is idempotent.
if let Ok(config) = crate::openhuman::config::Config::load_or_init().await {
bootstrap_curated_memory(&config.workspace_dir).await;
bootstrap_life_capture(&config.workspace_dir).await;

🛠️ Refactor suggestion | 🟠 Major

Move domain runtime bootstrap out of the JSON-RPC transport.

jsonrpc.rs now hard-codes curated-memory and life-capture initialization details, including storage paths, env-driven embedder setup, and bridge registration. Please move this behind a domain/runtime bootstrap API or registry-level startup hook so the transport stays domain-agnostic.

As per coding guidelines, src/core/{cli.rs,jsonrpc.rs} should expose domain functionality through the controller registry and “do not add domain-specific branches or one-off transport logic in src/core/cli.rs or src/core/jsonrpc.rs”.
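
One possible shape for such a hook, sketched with hypothetical names (StartupHook, register_startup_hook, run_startup_hooks) rather than existing APIs: domain modules register an async closure at registration time, and the transport only calls the runner once before axum::serve.

use std::future::Future;
use std::pin::Pin;
use std::sync::Mutex;

type StartupHook = Box<dyn Fn() -> Pin<Box<dyn Future<Output = ()> + Send>> + Send + Sync>;

static STARTUP_HOOKS: Mutex<Vec<StartupHook>> = Mutex::new(Vec::new());

/// Domain modules (curated_memory, life_capture, ...) call this from their
/// own registration code instead of being wired up inside jsonrpc.rs.
pub fn register_startup_hook(hook: StartupHook) {
    STARTUP_HOOKS.lock().unwrap().push(hook);
}

/// The transport calls this once before serving; it has no knowledge of
/// which domains registered hooks.
pub async fn run_startup_hooks() {
    let hooks = std::mem::take(&mut *STARTUP_HOOKS.lock().unwrap());
    for hook in hooks {
        hook().await;
    }
}

The life-capture module would then register a closure that loads the config and calls bootstrap_curated_memory / bootstrap_life_capture, and jsonrpc.rs would shrink to a single run_startup_hooks().await.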

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/jsonrpc.rs` around lines 658 - 670, The JSON-RPC transport currently
inits domain runtime directly by calling
crate::openhuman::config::Config::load_or_init().await then
bootstrap_curated_memory(&config.workspace_dir).await and
bootstrap_life_capture(&config.workspace_dir).await; move that logic out of
jsonrpc.rs and behind a domain/runtime bootstrap API or registry startup hook.
Create or extend the controller/registry interface (e.g., add a
run_startup_hooks or register_runtime_bootstrap method) and have the
curated-memory and life-capture initialization (including env-driven embedder
setup and bridge registration) implemented as a domain module that registers
itself with that API; then replace the direct calls in jsonrpc.rs with a single
call to the registry startup hook (or no-op if none registered), keeping
jsonrpc.rs domain-agnostic and relying on controller/registry to perform
load_or_init and to invoke bootstrap_curated_memory and bootstrap_life_capture.

Comment thread src/openhuman/curated_memory/mod.rs
Comment thread src/openhuman/eventkit/calendar.rs
Comment thread src/openhuman/eventkit/calendar.rs
Comment thread src/openhuman/eventkit/runtime.rs
Comment on lines +154 to +170
let writer = IndexWriter::new(idx);
writer
.upsert(&mut items)
.await
.map_err(|e| format!("upsert: {e}"))?;
let canonical_id = items[0].id;
let replaced = canonical_id != requested_id;

let mut vecs = embedder
.embed_batch(&[text.as_str()])
.await
.map_err(|e| format!("embed: {e}"))?;
let vector = vecs.pop().ok_or("embedder returned no vectors")?;
writer
.upsert_vector(&canonical_id, &vector)
.await
.map_err(|e| format!("upsert_vector: {e}"))?;

⚠️ Potential issue | 🟠 Major

Make item upsert and vector replacement failure-atomic.

The item is committed before embedding/vector replacement. If embed_batch or upsert_vector fails, the index can contain a new/updated item with no vector, or updated text with the old vector. Pre-embed and validate dimensions before mutating, then wrap item upsert + vector replace in one IndexWriter transaction/helper.

Minimum mitigation before adding a transaction helper
+    const INDEX_VECTOR_DIM: usize = 1536;
+    if embedder.dim() != INDEX_VECTOR_DIM {
+        return Err(format!(
+            "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
+            embedder.dim()
+        ));
+    }
+
+    let mut vecs = embedder
+        .embed_batch(&[text.as_str()])
+        .await
+        .map_err(|e| format!("embed: {e}"))?;
+    let vector = vecs.pop().ok_or("embedder returned no vectors")?;
+    if vector.len() != INDEX_VECTOR_DIM {
+        return Err(format!(
+            "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
+            vector.len()
+        ));
+    }
+
     let writer = IndexWriter::new(idx);
     writer
         .upsert(&mut items)
         .await
         .map_err(|e| format!("upsert: {e}"))?;
     let canonical_id = items[0].id;
     let replaced = canonical_id != requested_id;
 
-    let mut vecs = embedder
-        .embed_batch(&[text.as_str()])
-        .await
-        .map_err(|e| format!("embed: {e}"))?;
-    let vector = vecs.pop().ok_or("embedder returned no vectors")?;
     writer
         .upsert_vector(&canonical_id, &vector)
         .await

This still needs a transaction-backed writer API to fully guarantee atomic replacement.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    let writer = IndexWriter::new(idx);
    writer
        .upsert(&mut items)
        .await
        .map_err(|e| format!("upsert: {e}"))?;
    let canonical_id = items[0].id;
    let replaced = canonical_id != requested_id;

    let mut vecs = embedder
        .embed_batch(&[text.as_str()])
        .await
        .map_err(|e| format!("embed: {e}"))?;
    let vector = vecs.pop().ok_or("embedder returned no vectors")?;
    writer
        .upsert_vector(&canonical_id, &vector)
        .await
        .map_err(|e| format!("upsert_vector: {e}"))?;

    const INDEX_VECTOR_DIM: usize = 1536;
    if embedder.dim() != INDEX_VECTOR_DIM {
        return Err(format!(
            "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
            embedder.dim()
        ));
    }

    let mut vecs = embedder
        .embed_batch(&[text.as_str()])
        .await
        .map_err(|e| format!("embed: {e}"))?;
    let vector = vecs.pop().ok_or("embedder returned no vectors")?;
    if vector.len() != INDEX_VECTOR_DIM {
        return Err(format!(
            "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
            vector.len()
        ));
    }

    let writer = IndexWriter::new(idx);
    writer
        .upsert(&mut items)
        .await
        .map_err(|e| format!("upsert: {e}"))?;
    let canonical_id = items[0].id;
    let replaced = canonical_id != requested_id;

    writer
        .upsert_vector(&canonical_id, &vector)
        .await
        .map_err(|e| format!("upsert_vector: {e}"))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/rpc.rs` around lines 154 - 170, You currently
upsert the item (IndexWriter::upsert) before calling embedder.embed_batch and
upsert_vector, which can leave the index inconsistent if embedding/vector
replacement fails; move embedder.embed_batch(&[text.as_str()]) before any
mutation, validate the returned vector dimensions against the index schema, then
perform the item upsert and vector write inside a single atomic operation on
IndexWriter (add or use a transaction/helper such as IndexWriter::transaction or
a new method like IndexWriter::upsert_with_vector that takes (&mut items,
&canonical_id, &vector) and guarantees commit/rollback semantics); ensure you
still derive canonical_id/requested_id as before but only call
writer.upsert/_upsert_vector from inside that transactional helper so failure
does not leave a half-applied item or stale vector.
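
For the transaction-backed writer the comment asks for, a rough sketch of how the helper could look at the SQL level, assuming the index wraps a rusqlite connection and that the embedding was computed by the caller before this runs; item_vectors mirrors migration 0002_vec.sql, while the items table, columns, and method name are illustrative:

// Hypothetical IndexWriter::upsert_with_vector at the rusqlite level: both
// writes commit together or not at all.
fn upsert_item_with_vector(
    conn: &mut rusqlite::Connection,
    item_id: &str,
    text: &str,
    vector_blob: &[u8], // pre-serialized embedding
) -> rusqlite::Result<()> {
    let tx = conn.transaction()?; // rolls back on drop if any step below errors
    tx.execute(
        "INSERT INTO items (id, text) VALUES (?1, ?2)
         ON CONFLICT(id) DO UPDATE SET text = excluded.text",
        rusqlite::params![item_id, text],
    )?;
    tx.execute("DELETE FROM item_vectors WHERE id = ?1", rusqlite::params![item_id])?;
    tx.execute(
        "INSERT INTO item_vectors (id, embedding) VALUES (?1, ?2)",
        rusqlite::params![item_id, vector_blob],
    )?;
    tx.commit()
}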

Comment thread src/openhuman/life_capture/schemas.rs
Comment thread src/openhuman/people/migrations.rs
Comment thread src/openhuman/people/rpc.rs
Comment thread tests/composio_ingest_e2e.rs
@jwalin-shah jwalin-shah marked this pull request as draft April 23, 2026 07:37
@jwalin-shah jwalin-shah changed the title feat(life_capture): ingest controller + iMessage scanner bridge feat(life_capture): A1 iMessage ingest controller Apr 23, 2026
…bview NotificationSettings

- PersonId defined an inherent to_string which shadowed the ToString blanket impl from Display (clippy correctness error, deny-by-default)
- webview_notifications::NotificationSettings Default returned Self { enabled: false }; replaced manual impl with derive
- Drop event title from the 'missing external identifier' log message;
  event titles are personal data that must not appear in logs.
- Propagate store::is_known() and store::upsert_event() errors instead of
  swallowing them. Failing open caused the same event to be re-emitted on
  every sync because the cache was never updated.

Addresses @coderabbitai on src/openhuman/eventkit/calendar.rs:156 and :217
Gmail message IDs and calendar event IDs are stable external identifiers
that create a durable PII trail in logs. Remove them from debug/warn log
fields; the source tag retains enough context for correlation.

Addresses @coderabbitai on src/openhuman/life_capture/ingest/composio.rs:112
…ardcode

HostedEmbedder::dim() was hardcoded to 1536 regardless of which model was
configured. Add dim as a constructor parameter so model and vector width
stay in sync. At the bootstrap site, resolve dim from OPENHUMAN_EMBEDDINGS_DIM
(override), known model names (text-embedding-3-large→3072), or default 1536.

Addresses @coderabbitai on src/openhuman/life_capture/embedder.rs:32
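
A sketch of that resolution order (env override → known model name → 1536 default); the exact wiring at the bootstrap site may differ:

// OPENHUMAN_EMBEDDINGS_DIM wins, then known model names, then the default.
fn resolve_embedding_dim(model: &str) -> usize {
    if let Ok(v) = std::env::var("OPENHUMAN_EMBEDDINGS_DIM") {
        if let Ok(dim) = v.parse::<usize>() {
            return dim;
        }
    }
    match model {
        "text-embedding-3-large" => 3072,
        _ => 1536,
    }
}

The resolved value is what gets passed into the HostedEmbedder constructor, so dim() always tracks the configured model.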
…ure state

Previously, the item row was committed before the embedding call. A failed
embed_batch would leave a committed row with no vector, making it unfindable
by search. Reorder to: embed first (no DB side effects) → upsert item row
→ upsert vector, so a failed embed aborts without touching the index.

Addresses @coderabbitai on src/openhuman/life_capture/rpc.rs:170
The handler accepts any JSON value for metadata but the schema declared it
as Option<String>, causing schema-driven clients to reject object payloads.
Change to Option<Json> to match the actual contract.

Addresses @coderabbitai on src/openhuman/life_capture/schemas.rs:192
The created flag in the RPC response was always true when create_if_missing=true
because resolve_or_create returned a bare PersonId with no signal about whether
a new row was minted or an existing one was found. Thread the insertion boolean
out of the resolver so handle_resolve can report the true creation state.

Update resolver tests to assert the was_created flag, and update link() and
seed_from_address_book() callers to destructure the tuple.

Addresses @coderabbitai on src/openhuman/people/rpc.rs:80
Replace manual BEGIN/COMMIT/ROLLBACK with Connection::unchecked_transaction()
which auto-rolls back on drop, matching the rusqlite idiomatic pattern and
the life_capture migration style.

Add entry/per-migration/completion debug log lines as required by coding
guidelines (debug logging on new/changed flows).

Addresses @coderabbitai on src/openhuman/people/migrations.rs:46
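
A minimal sketch of that pattern, with illustrative SQL paths and log prefixes:

// unchecked_transaction() returns a Transaction that rolls back on drop,
// so an early `?` never leaves a half-applied migration.
fn run_migrations(conn: &rusqlite::Connection) -> rusqlite::Result<()> {
    tracing::debug!("[people::migrations] applying migrations");
    let tx = conn.unchecked_transaction()?;
    tx.execute_batch(include_str!("migrations/0001_init.sql"))?;
    tracing::debug!("[people::migrations] applied 0001_init.sql");
    tx.commit()?;
    tracing::debug!("[people::migrations] migrations complete");
    Ok(())
}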
…ath error

Both tests mutated OPENHUMAN_WORKSPACE while sharing the CONN OnceLock,
causing a race where the second test observed the first test's DB and never
exercised its own branch.

- Add a test-local Mutex to serialize all tests in this module.
- Rewrite get_error_on_bad_path to test create_dir_all directly (rather than
  get()), because get() fast-paths via OnceLock once initialized by
  get_is_idempotent. The new version asserts is_err() rather than silently
  discarding the result.

Addresses @coderabbitai on src/openhuman/eventkit/runtime.rs:137
…composio e2e

Silently ignoring init_index/init_embedder errors allowed the test to write
through a local Arc while JSON-RPC search read a pre-existing global runtime,
turning failures into order-dependent timeouts. Propagate errors with
.expect() so the test fails immediately with a clear message.

Addresses @coderabbitai on tests/composio_ingest_e2e.rs:72
…red_controllers re-exports

Domain mod.rs must expose the canonical registry names in addition to the
domain-prefixed aliases so the controller registry wiring in all.rs can call
them uniformly across domains. Aliases are kept for backward compatibility.

Addresses @coderabbitai on src/openhuman/curated_memory/mod.rs:10
@jwalin-shah jwalin-shah marked this pull request as ready for review April 25, 2026 08:03
…untime

Both tests reuse the process-global PersonalIndex via OnceCell, so when
tokio runs them in parallel they race on `total_items`. Add a static
tokio::sync::Mutex they both acquire for the duration of the test, and
switch the absolute count assertion in the idempotency test to a delta
against a baseline captured under the lock. Same pattern already used
elsewhere (eventkit env-var lock, composio fail-fast guard).
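
The pattern in miniature, with hypothetical helper names standing in for the real test setup:

// Both e2e tests take this lock so they cannot interleave on the shared
// process-global PersonalIndex; the idempotency assertion is a delta
// against a baseline captured while holding the lock.
static TEST_LOCK: tokio::sync::Mutex<()> = tokio::sync::Mutex::const_new(());

#[tokio::test]
async fn reingest_is_idempotent() {
    let _guard = TEST_LOCK.lock().await;
    let baseline = total_items().await;  // hypothetical helper
    ingest_chat_day_fixture().await;     // first ingest adds one item
    ingest_chat_day_fixture().await;     // re-ingest must replace it
    assert_eq!(total_items().await, baseline + 1);
}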
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (3)
src/openhuman/agent/debug/mod.rs (1)

395-395: Preserve prompt-dump parity by threading a real curated snapshot

Hardcoding curated_snapshot: None can make dump output diverge from live first-turn prompt bytes when curated memory is active. Given this module’s byte-identical contract, use the same best-effort snapshot strategy here and pass as_deref() into PromptContext.

♻️ Proposed adjustment
+    let curated_snapshot = match crate::openhuman::curated_memory::runtime::get() {
+        Ok(rt) => match crate::openhuman::curated_memory::snapshot_pair(&rt.memory, &rt.user).await
+        {
+            Ok(snap) => Some(std::sync::Arc::new(snap)),
+            Err(_) => None,
+        },
+        Err(_) => None,
+    };
+
     let ctx = PromptContext {
         workspace_dir: agent.workspace_dir(),
         model_name: &model_name,
@@
         connected_identities_md: crate::openhuman::agent::prompts::render_connected_identities(),
         include_profile: !definition.omit_profile,
         include_memory_md: !definition.omit_memory_md,
-        curated_snapshot: None,
+        curated_snapshot: curated_snapshot.as_deref(),
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/debug/mod.rs` at line 395, The code currently hardcodes
curated_snapshot: None which breaks byte-identical dumps when curated memory is
enabled; replace that None by producing and threading the same best-effort
curated snapshot used elsewhere and pass its Option<&[u8]> into PromptContext
via curated_snapshot.as_deref(). Locate where curated_snapshot is set and where
PromptContext is constructed, compute the snapshot using the existing snapshot
strategy (same helper/logic used in the live path), store it in curated_snapshot
(Option<Vec<u8>>), and call curated_snapshot.as_deref() when creating the
PromptContext.
src/openhuman/people/tests.rs (1)

60-68: Drop the unused Utc::now() call.

let _now = Utc::now(); is unrelated to UUID round-tripping and adds nothing to the test. Either drop it or move it into a separate test that actually exercises timestamp-related behavior.

♻️ Proposed cleanup
 #[test]
 fn person_id_uuid_format() {
     let id = PersonId::new();
     // Round-trips through a string.
     let s = id.to_string();
     let parsed: uuid::Uuid = s.parse().unwrap();
     assert_eq!(parsed, id.0);
-    let _now = Utc::now();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/people/tests.rs` around lines 60 - 68, The test function
person_id_uuid_format contains an unused call to Utc::now() that should be
removed; update the test by deleting the line let _now = Utc::now(); (or move it
into a new timestamp-focused test if you need to exercise time behavior) so the
test only verifies PersonId::new(), id.to_string(), and parsing back to
uuid::Uuid (referencing PersonId and the person_id_uuid_format test).
src/openhuman/people/rpc.rs (1)

16-19: Add entry/exit logging to the RPC handlers.

Per coding guidelines, Rust code should add tracing/log at debug/trace for entry/exit points, branch decisions, and external calls with grep-friendly [domain] prefixes. handle_refresh_address_book already does this; handle_list, handle_resolve, and handle_score are silent. They each issue DB calls (store.list, interactions_for, lookup, insert_person) which are useful to trace by person_id / handle kind when diagnosing production issues.

♻️ Suggested logging
 pub async fn handle_list(store: &PeopleStore, limit: usize) -> Result<RpcOutcome<Value>, String> {
     let limit = limit.clamp(1, 500);
+    tracing::debug!("[people::rpc] handle_list entry limit={limit}");
     let people = store.list().await.map_err(|e| format!("list: {e}"))?;
     let now = Utc::now();
@@
+    tracing::debug!(
+        "[people::rpc] handle_list returning {} people",
+        people_json.len()
+    );
     Ok(RpcOutcome::new(json!({ "people": people_json }), vec![]))
 }
@@
 ) -> Result<RpcOutcome<Value>, String> {
+    tracing::debug!(
+        "[people::rpc] handle_resolve entry kind={} create_if_missing={create_if_missing}",
+        handle.as_key().0
+    );
     let resolver = HandleResolver::new(store);
@@
 ) -> Result<RpcOutcome<Value>, String> {
+    tracing::debug!("[people::rpc] handle_score entry person_id={person_id}");
     let interactions = store

As per coding guidelines: "Use log / tracing at debug / trace level in Rust; include stable grep-friendly prefixes ([domain], [rpc], [ui-flow]) and correlation fields (request IDs, method names, entity IDs)".

Also applies to: 62-67, 121-124

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/people/rpc.rs` around lines 16 - 19, Add entry/exit and
key-event tracing to the RPC handlers handle_list, handle_resolve, and
handle_score similar to handle_refresh_address_book: emit debug/trace logs with
grep-friendly prefixes like "[rpc]" or "[people]" at function entry and exit,
log branch decisions and external calls (store.list, interactions_for, lookup,
insert_person) including correlation fields (request id or method name) and
entity identifiers (person_id, handle/kind) so the DB/lookup calls and outcomes
are visible for diagnostics; follow existing tracing style used in
handle_refresh_address_book for level and message format.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/core/jsonrpc.rs`:
- Around line 865-868: The doc comment "Initializes long-lived socket/event-bus
infrastructure." is now incorrectly attached to bootstrap_curated_memory; move
that leading doc comment (and any subsequent related lines) from above async fn
bootstrap_curated_memory to directly above the async fn bootstrap_skill_runtime
so bootstrap_skill_runtime regains its intended documentation and
bootstrap_curated_memory has only its correct docs or none.

In `@src/openhuman/agent/harness/session/types.rs`:
- Around line 126-134: The doc comment for the Session-scoped `curated_snapshot`
is out of sync with implementation: it states the snapshot is captured "when the
agent is built" but the code actually initializes `curated_snapshot` to None in
the builder and captures it on first turn via `ensure_curated_snapshot`; update
the rustdoc to describe this lifecycle (None in builder, captured lazily on
first turn by `ensure_curated_snapshot`, forwarded into
`PromptContext::curated_snapshot` each turn so runtime writes apply to
subsequent sessions) and mention that it is None when the curated-memory runtime
(`crate::openhuman::curated_memory::runtime`) isn't initialized.

In `@src/openhuman/life_capture/ingest/composio.rs`:
- Around line 415-422: The helper fresh_bridge() currently calls
std::mem::forget(dir) which leaks the TempDir; change fresh_bridge() to return
ownership of the TempDir (e.g., return (LifeCaptureComposioBridge,
tempfile::TempDir)) or hold the TempDir inside the LifeCaptureComposioBridge
wrapper so it lives until tests drop it; update all test call sites that do let
bridge = fresh_bridge().await; to destructure the tuple (let (bridge, _dir) =
fresh_bridge().await;) or accept the wrapper that owns the TempDir so the
TempDir is dropped at test end, and ensure PersonalIndex::open(&db).await and
the Arc<dyn Embedder> (TestEmbedder) usage remain unchanged.
- Around line 601-620: The test bridge_has_stable_name_and_domain currently
contains a tautological assertion (assert_eq!(name, name)) and never calls
LifeCaptureComposioBridge::name() or domains(), nor uses the Stub Embedder;
replace the bogus assertion by constructing a real bridge (use the existing
fresh_bridge helper or the constructor that returns a PersonalIndex handle) and
call LifeCaptureComposioBridge::name() and LifeCaptureComposioBridge::domains()
to assert expected values, or if you prefer to keep it lightweight convert the
test to #[tokio::test], call fresh_bridge(...) to get a bridge instance, then
assert_eq!(bridge.name(), "life_capture::composio_bridge") and
assert!(bridge.domains().contains(...)) and remove the unused Stub/Embedder impl
if you choose the fresh_bridge approach.

In `@src/openhuman/life_capture/rpc.rs`:
- Around line 76-94: The code hardcodes INDEX_VECTOR_DIM = 1536 in handle_search
causing mismatches with configurable embedders; expose the real index vector
dimension from the PersonalIndex (or lift INDEX_VECTOR_DIM into a single shared
constant used by bootstrap_life_capture, PersonalIndex, handle_search and
handle_ingest), then update handle_search to compare embedder.dim() against that
shared/exposed dimension and validate query_vec.len() the same way; also add the
same dimension checks to handle_ingest before calling upsert_vector (validate
embedder.dim() and the produced vector length and return a clear RPC error on
mismatch) and ensure bootstrap_life_capture uses the same shared dimension when
constructing/configuring the embedder.

In `@src/openhuman/people/address_book.rs`:
- Around line 129-169: The CNContactStore FFI calls (e.g., request_access,
enumerateContactsWithFetchRequest_error_usingBlock inside
imp::fetch_via_cn_contact_store) block a Tokio worker; wrap these synchronous
calls in tokio::task::spawn_blocking. Either (A) move the spawn_blocking into
the SystemContactsSource implementation so ContactsSource::fetch_contacts calls
imp::fetch_via_cn_contact_store inside spawn_blocking and returns the awaited
result, or (B) keep fetch_contacts synchronous but change callers (e.g.,
HandleResolver::seed_from_address_book / read_with) to call
tokio::task::spawn_blocking(|| source.fetch_contacts()) and propagate errors
(map join errors to AddressBookError::Other). Ensure request_access remains
inside the blocking closure so rx.recv() never runs on a Tokio worker thread.

In `@src/openhuman/people/resolver.rs`:
- Around line 41-78: resolve_or_create has a TOCTOU: it calls
PeopleStore::lookup then creates a new PersonId and calls
PeopleStore::insert_person, which can race and produce orphan Person rows when
two callers concurrently resolve the same Handle; fix by moving the
read-or-create into an atomic transactional operation inside the store
(preferred): add a new PeopleStore method like lookup_or_insert_person(handle,
person_factory) that within a single transaction SELECTs from handle_aliases
and, if missing, INSERTs into people and INSERTs the handle_aliases (or returns
the existing person_id), then update resolve_or_create to call
lookup_or_insert_person instead of separate lookup + insert_person;
alternatively (if you must keep insert_person), after insert_person re-query
lookup(canonical) and if the returned PersonId != the one you inserted, delete
the orphan people row and return the winner's id.

In `@src/openhuman/people/rpc.rs`:
- Around line 16-58: handle_list currently issues an
interactions_for(p.id).await inside a sequential loop which causes an N+1 query
pattern and serial spawn_blocking calls; fix by either (A) adding an aggregate
method on PeopleStore (suggest names: list_with_scores(limit) or
interaction_summaries()) that runs a single SQL query to return interaction
aggregates needed by score() and then compute score() in-memory for each person
before sorting and truncating by limit, or (B) minimally parallelizing the
per-person fetches by collecting futures for store.interactions_for(p.id) and
awaiting them with futures::future::join_all so the DB calls run concurrently,
and ensure you apply limit before expensive work by sorting/truncating after
computing scores. Use the function name handle_list and the method
interactions_for as anchors when making the change.

---

Nitpick comments:
In `@src/openhuman/agent/debug/mod.rs`:
- Line 395: The code currently hardcodes curated_snapshot: None which breaks
byte-identical dumps when curated memory is enabled; replace that None by
producing and threading the same best-effort curated snapshot used elsewhere and
pass its Option<&[u8]> into PromptContext via curated_snapshot.as_deref().
Locate where curated_snapshot is set and where PromptContext is constructed,
compute the snapshot using the existing snapshot strategy (same helper/logic
used in the live path), store it in curated_snapshot (Option<Vec<u8>>), and call
curated_snapshot.as_deref() when creating the PromptContext.

In `@src/openhuman/people/rpc.rs`:
- Around line 16-19: Add entry/exit and key-event tracing to the RPC handlers
handle_list, handle_resolve, and handle_score similar to
handle_refresh_address_book: emit debug/trace logs with grep-friendly prefixes
like "[rpc]" or "[people]" at function entry and exit, log branch decisions and
external calls (store.list, interactions_for, lookup, insert_person) including
correlation fields (request id or method name) and entity identifiers
(person_id, handle/kind) so the DB/lookup calls and outcomes are visible for
diagnostics; follow existing tracing style used in handle_refresh_address_book
for level and message format.

In `@src/openhuman/people/tests.rs`:
- Around line 60-68: The test function person_id_uuid_format contains an unused
call to Utc::now() that should be removed; update the test by deleting the line
let _now = Utc::now(); (or move it into a new timestamp-focused test if you need
to exercise time behavior) so the test only verifies PersonId::new(),
id.to_string(), and parsing back to uuid::Uuid (referencing PersonId and the
person_id_uuid_format test).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53db1672-b237-409a-922b-375440e06437

📥 Commits

Reviewing files that changed from the base of the PR and between 2452a1b and 26664ca.

📒 Files selected for processing (29)
  • src/core/all.rs
  • src/core/jsonrpc.rs
  • src/openhuman/agent/debug/mod.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/types.rs
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/curated_memory/mod.rs
  • src/openhuman/eventkit/calendar.rs
  • src/openhuman/eventkit/runtime.rs
  • src/openhuman/learning/prompt_sections.rs
  • src/openhuman/life_capture/embedder.rs
  • src/openhuman/life_capture/ingest/composio.rs
  • src/openhuman/life_capture/rpc.rs
  • src/openhuman/life_capture/schemas.rs
  • src/openhuman/mod.rs
  • src/openhuman/people/address_book.rs
  • src/openhuman/people/migrations.rs
  • src/openhuman/people/resolver.rs
  • src/openhuman/people/rpc.rs
  • src/openhuman/people/schemas.rs
  • src/openhuman/people/scorer.rs
  • src/openhuman/people/tests.rs
  • src/openhuman/people/types.rs
  • src/openhuman/webview_notifications/types.rs
  • tests/agent_harness_public.rs
  • tests/composio_ingest_e2e.rs
  • tests/imessage_ingest_e2e.rs
✅ Files skipped from review due to trivial changes (2)
  • src/openhuman/learning/prompt_sections.rs
  • src/openhuman/curated_memory/mod.rs
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/agent_harness_public.rs
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/mod.rs
  • src/core/all.rs
  • src/openhuman/eventkit/calendar.rs
  • src/openhuman/eventkit/runtime.rs

Comment thread src/core/jsonrpc.rs
Comment on lines 865 to +868
/// Initializes long-lived socket/event-bus infrastructure.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {

⚠️ Potential issue | 🟡 Minor

Stale doc comment now attached to bootstrap_curated_memory.

Line 865's /// Initializes long-lived socket/event-bus infrastructure. was originally the rustdoc for bootstrap_skill_runtime, but inserting the two new helpers above it leaves this line as a leading doc comment on bootstrap_curated_memory (the next item Rust sees), so the rendered docs/intellisense for bootstrap_curated_memory will start with an unrelated sentence about socket/event-bus, and bootstrap_skill_runtime is left undocumented.

📝 Proposed fix — move the doc comment back to `bootstrap_skill_runtime`
-/// Initializes long-lived socket/event-bus infrastructure.
 /// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
 /// curated-memory runtime singleton. Idempotent — second-init is a no-op.
 async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {

…and add it just above bootstrap_skill_runtime (around line 984):

+/// Initializes long-lived socket/event-bus infrastructure.
 pub async fn bootstrap_skill_runtime() {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Initializes long-lived socket/event-bus infrastructure.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/jsonrpc.rs` around lines 865 - 868, The doc comment "Initializes
long-lived socket/event-bus infrastructure." is now incorrectly attached to
bootstrap_curated_memory; move that leading doc comment (and any subsequent
related lines) from above async fn bootstrap_curated_memory to directly above
the async fn bootstrap_skill_runtime so bootstrap_skill_runtime regains its
intended documentation and bootstrap_curated_memory has only its correct docs or
none.

Comment on lines +126 to +134
/// Session-scoped snapshot of the curated-memory pair
/// (`MEMORY.md` + `USER.md`) read once from the
/// [`crate::openhuman::curated_memory::runtime`] singleton when the
/// agent is built. Forwarded into [`PromptContext::curated_snapshot`]
/// every turn so the rendered prompt bytes stay frozen for the
/// session's KV-cache prefix, while runtime writes via
/// `curated_memory.add/replace/remove` land on the NEXT session.
/// `None` when the curated-memory runtime isn't initialised (unit
/// tests, older embeds) — the renderer falls back to workspace files.

⚠️ Potential issue | 🟡 Minor

Doc timing mismatch for curated_snapshot lifecycle

Line 129 says the snapshot is read when the agent is built, but current behavior captures it on first turn (ensure_curated_snapshot), with builder initialization at None. Please update this doc block to reflect the real lifecycle.

📝 Suggested doc correction
-    /// [`crate::openhuman::curated_memory::runtime`] singleton when the
-    /// agent is built. Forwarded into [`PromptContext::curated_snapshot`]
+    /// [`crate::openhuman::curated_memory::runtime`] singleton on the
+    /// first turn (`ensure_curated_snapshot`). Forwarded into [`PromptContext::curated_snapshot`]

As per coding guidelines, "When adding or changing domain behavior, ship matching documentation (rustdoc, code comments, or updates to AGENTS.md / architecture docs) before handoff."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Session-scoped snapshot of the curated-memory pair
/// (`MEMORY.md` + `USER.md`) read once from the
/// [`crate::openhuman::curated_memory::runtime`] singleton when the
/// agent is built. Forwarded into [`PromptContext::curated_snapshot`]
/// every turn so the rendered prompt bytes stay frozen for the
/// session's KV-cache prefix, while runtime writes via
/// `curated_memory.add/replace/remove` land on the NEXT session.
/// `None` when the curated-memory runtime isn't initialised (unit
/// tests, older embeds) — the renderer falls back to workspace files.
/// Session-scoped snapshot of the curated-memory pair
/// (`MEMORY.md` + `USER.md`) read once from the
/// [`crate::openhuman::curated_memory::runtime`] singleton on the
/// first turn (`ensure_curated_snapshot`). Forwarded into [`PromptContext::curated_snapshot`]
/// every turn so the rendered prompt bytes stay frozen for the
/// session's KV-cache prefix, while runtime writes via
/// `curated_memory.add/replace/remove` land on the NEXT session.
/// `None` when the curated-memory runtime isn't initialised (unit
/// tests, older embeds) — the renderer falls back to workspace files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/types.rs` around lines 126 - 134, The doc
comment for the Session-scoped `curated_snapshot` is out of sync with
implementation: it states the snapshot is captured "when the agent is built" but
the code actually initializes `curated_snapshot` to None in the builder and
captures it on first turn via `ensure_curated_snapshot`; update the rustdoc to
describe this lifecycle (None in builder, captured lazily on first turn by
`ensure_curated_snapshot`, forwarded into `PromptContext::curated_snapshot` each
turn so runtime writes apply to subsequent sessions) and mention that it is None
when the curated-memory runtime (`crate::openhuman::curated_memory::runtime`)
isn't initialized.

Comment on lines +415 to +422
async fn fresh_bridge() -> LifeCaptureComposioBridge {
let dir = tempdir().expect("tempdir");
let db = dir.path().join("lc.db");
std::mem::forget(dir);
let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
LifeCaptureComposioBridge::new(idx, emb)
}

⚠️ Potential issue | 🟡 Minor

std::mem::forget(dir) leaks the temp directory on every test run.

tempfile::TempDir cleans up on drop; mem::forget permanently leaks the directory at dir.path() to disk. With 4 tests using fresh_bridge, every cargo test run leaves 4 stray directories under the system temp root. Return the TempDir from fresh_bridge (or store it in the bridge wrapper) so it is dropped at end of test instead.

🔧 Proposed fix
-    async fn fresh_bridge() -> LifeCaptureComposioBridge {
-        let dir = tempdir().expect("tempdir");
-        let db = dir.path().join("lc.db");
-        std::mem::forget(dir);
-        let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
-        let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
-        LifeCaptureComposioBridge::new(idx, emb)
-    }
+    async fn fresh_bridge() -> (LifeCaptureComposioBridge, tempfile::TempDir) {
+        let dir = tempdir().expect("tempdir");
+        let db = dir.path().join("lc.db");
+        let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
+        let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
+        (LifeCaptureComposioBridge::new(idx, emb), dir)
+    }

Then update each let bridge = fresh_bridge().await; to let (bridge, _dir) = fresh_bridge().await;.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/ingest/composio.rs` around lines 415 - 422, The
helper fresh_bridge() currently calls std::mem::forget(dir) which leaks the
TempDir; change fresh_bridge() to return ownership of the TempDir (e.g., return
(LifeCaptureComposioBridge, tempfile::TempDir)) or hold the TempDir inside the
LifeCaptureComposioBridge wrapper so it lives until tests drop it; update all
test call sites that do let bridge = fresh_bridge().await; to destructure the
tuple (let (bridge, _dir) = fresh_bridge().await;) or accept the wrapper that
owns the TempDir so the TempDir is dropped at test end, and ensure
PersonalIndex::open(&db).await and the Arc<dyn Embedder> (TestEmbedder) usage
remain unchanged.

Comment on lines +601 to +620
#[test]
fn bridge_has_stable_name_and_domain() {
// Construct with stubs just to exercise the trait methods.
struct Stub;
#[async_trait]
impl Embedder for Stub {
async fn embed_batch(&self, _t: &[&str]) -> anyhow::Result<Vec<Vec<f32>>> {
Ok(vec![])
}
fn dim(&self) -> usize {
1536
}
}
// Lightweight non-init path — we don't actually need a live index
// for the trait introspection bits, but we do need a real
// PersonalIndex handle. Reuse the fresh_bridge pattern via a
// smoke check in the runtime test.
let name = "life_capture::composio_bridge";
assert_eq!(name, name); // shape guard for the name constant
}

⚠️ Potential issue | 🟡 Minor

Tautological assertion provides no coverage.

assert_eq!(name, name) at line 619 is always true and does not exercise LifeCaptureComposioBridge::name() or domains(). Either construct a real bridge (e.g. via fresh_bridge) and assert against the trait methods, or delete the test — the surrounding Stub Embedder impl is also unused.

🔧 Proposed fix
     #[test]
     fn bridge_has_stable_name_and_domain() {
-        // Construct with stubs just to exercise the trait methods.
-        struct Stub;
-        #[async_trait]
-        impl Embedder for Stub {
-            async fn embed_batch(&self, _t: &[&str]) -> anyhow::Result<Vec<Vec<f32>>> {
-                Ok(vec![])
-            }
-            fn dim(&self) -> usize {
-                1536
-            }
-        }
-        // Lightweight non-init path — we don't actually need a live index
-        // for the trait introspection bits, but we do need a real
-        // PersonalIndex handle. Reuse the fresh_bridge pattern via a
-        // smoke check in the runtime test.
-        let name = "life_capture::composio_bridge";
-        assert_eq!(name, name); // shape guard for the name constant
+        let bridge = fresh_bridge().await;
+        assert_eq!(bridge.name(), "life_capture::composio_bridge");
+        assert_eq!(bridge.domains(), Some(&["composio"][..]));
     }

(Make the test #[tokio::test] if you switch to fresh_bridge.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/ingest/composio.rs` around lines 601 - 620, The
test bridge_has_stable_name_and_domain currently contains a tautological
assertion (assert_eq!(name, name)) and never calls
LifeCaptureComposioBridge::name() or domains(), nor uses the Stub Embedder;
replace the bogus assertion by constructing a real bridge (use the existing
fresh_bridge helper or the constructor that returns a PersonalIndex handle) and
call LifeCaptureComposioBridge::name() and LifeCaptureComposioBridge::domains()
to assert expected values, or if you prefer to keep it lightweight convert the
test to #[tokio::test], call fresh_bridge(...) to get a bridge instance, then
assert_eq!(bridge.name(), "life_capture::composio_bridge") and
assert!(bridge.domains().contains(...)) and remove the unused Stub/Embedder impl
if you choose the fresh_bridge approach.

Comment on lines +76 to +94
const INDEX_VECTOR_DIM: usize = 1536;
if embedder.dim() != INDEX_VECTOR_DIM {
return Err(format!(
"embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
embedder.dim()
));
}

let mut vecs = embedder
.embed_batch(&[text.as_str()])
.await
.map_err(|e| format!("embed: {e}"))?;
let query_vec = vecs.pop().ok_or("embedder returned no vectors")?;
if query_vec.len() != INDEX_VECTOR_DIM {
return Err(format!(
"embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
query_vec.len()
));
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether PersonalIndex's sqlite-vec column dim is fixed at 1536, and whether
# any path other than rpc.rs hardcodes a vector dim.
rg -nP --type=rust -C3 '\b(1536|3072|vector_dim|VECTOR_DIM|item_vectors)\b' -g '!**/tests/**'
ast-grep --pattern 'CREATE VIRTUAL TABLE $$$'

Repository: tinyhumansai/openhuman

Length of output: 19658


🏁 Script executed:

# Check handle_ingest function in rpc.rs around lines 123-178
sed -n '120,185p' src/openhuman/life_capture/rpc.rs

Repository: tinyhumansai/openhuman

Length of output: 2271


🏁 Script executed:

# Search for CREATE VIRTUAL TABLE and schema definitions
rg -n 'CREATE VIRTUAL TABLE|CREATE TABLE' --type=rust src/openhuman/life_capture/

Repository: tinyhumansai/openhuman

Length of output: 173


🏁 Script executed:

# Check if PersonalIndex has a vector_dim or similar method
rg -n 'fn.*dim|pub fn' src/openhuman/life_capture/index.rs | head -20

Repository: tinyhumansai/openhuman

Length of output: 166


🏁 Script executed:

# Look for the actual table schema/migration that defines item_vectors
find src/openhuman/life_capture -name "*.rs" -exec grep -l "item_vectors\|CREATE" {} \;

Repository: tinyhumansai/openhuman

Length of output: 143


🏁 Script executed:

# Check migrations.rs more thoroughly for table definitions
cat src/openhuman/life_capture/migrations.rs

Repository: tinyhumansai/openhuman

Length of output: 4456


🏁 Script executed:

# Look for PersonalIndex definition and any dimension-related methods
grep -n "impl PersonalIndex\|fn.*dim\|pub fn\|struct PersonalIndex" src/openhuman/life_capture/index.rs | head -30

Repository: tinyhumansai/openhuman

Length of output: 220


🏁 Script executed:

# Check the migration SQL files - specifically 0002_vec.sql for the item_vectors schema
find src/openhuman/life_capture/migrations -name "*.sql" -exec cat {} \;

Repository: tinyhumansai/openhuman

Length of output: 2588


🏁 Script executed:

# Also check if there are any other methods in PersonalIndex that might expose vector_dim
sed -n '61,200p' src/openhuman/life_capture/index.rs | grep -n "pub fn\|fn "

Repository: tinyhumansai/openhuman

Length of output: 269


Hardcoded INDEX_VECTOR_DIM = 1536 is asymmetric and tightly coupled to the index schema.

Two concerns:

  1. Index dim is hardcoded in the schema but embedder dim is configurable. Per src/core/jsonrpc.rs:944-956, bootstrap reads OPENHUMAN_EMBEDDINGS_MODEL and OPENHUMAN_EMBEDDINGS_DIM to construct the embedder with per-model dimensions (e.g., 3072 for text-embedding-3-large). With that configuration, handle_search will hard-fail with "embedder dim 3072 does not match index dim 1536"—a correct rejection, since the item_vectors schema (migration 0002_vec.sql) is permanently fixed at float[1536]. Either drop the env-driven dim configurability in bootstrap_life_capture, or have PersonalIndex expose its actual schema dim and check against that instead of a hardcoded constant.

  2. handle_ingest lacks the dimension checks that handle_search enforces. handle_search validates embedder.dim() == INDEX_VECTOR_DIM and vector.len() == INDEX_VECTOR_DIM (lines 76–87), but handle_ingest does not—it embeds and writes the vector unconditionally (lines 149–178). If a misconfigured embedder produces 3072-dim vectors, ingest will pass them to upsert_vector, and sqlite-vec will surface a low-level error rather than the clean RPC message search produces.

🔧 Suggested change for `handle_ingest`
 ) -> Result<RpcOutcome<Value>, String> {
     if external_id.trim().is_empty() {
         return Err("external_id must not be empty".into());
     }
     if text.trim().is_empty() {
         return Err("text must not be empty".into());
     }
     let ts = DateTime::<Utc>::from_timestamp(ts, 0)
         .ok_or_else(|| format!("invalid ts (out of range): {ts}"))?;

+    const INDEX_VECTOR_DIM: usize = 1536;
+    if embedder.dim() != INDEX_VECTOR_DIM {
+        return Err(format!(
+            "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
+            embedder.dim()
+        ));
+    }
+
     let mut vecs = embedder
         .embed_batch(&[text.as_str()])
         .await
         .map_err(|e| format!("embed: {e}"))?;
     let vector = vecs.pop().ok_or("embedder returned no vectors")?;
+    if vector.len() != INDEX_VECTOR_DIM {
+        return Err(format!(
+            "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
+            vector.len()
+        ));
+    }

Better still, lift INDEX_VECTOR_DIM to a single shared constant (or expose it via PersonalIndex) so search, ingest, and bootstrap cannot drift.
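
A sketch of the shared-constant option; the module placement and the vector_dim accessor are hypothetical:

// In the life_capture index module: one definition, consumed everywhere.
pub const INDEX_VECTOR_DIM: usize = 1536; // must match float[1536] in 0002_vec.sql

impl PersonalIndex {
    /// Width of the sqlite-vec column this index was created with.
    pub fn vector_dim(&self) -> usize {
        INDEX_VECTOR_DIM
    }
}

handle_search and handle_ingest would then compare embedder.dim() and the produced vector length against idx.vector_dim(), and bootstrap_life_capture would refuse (or at least warn) when the env-configured embedder dimension disagrees with it.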

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/life_capture/rpc.rs` around lines 76 - 94, The code hardcodes
INDEX_VECTOR_DIM = 1536 in handle_search causing mismatches with configurable
embedders; expose the real index vector dimension from the PersonalIndex (or
lift INDEX_VECTOR_DIM into a single shared constant used by
bootstrap_life_capture, PersonalIndex, handle_search and handle_ingest), then
update handle_search to compare embedder.dim() against that shared/exposed
dimension and validate query_vec.len() the same way; also add the same dimension
checks to handle_ingest before calling upsert_vector (validate embedder.dim()
and the produced vector length and return a clear RPC error on mismatch) and
ensure bootstrap_life_capture uses the same shared dimension when
constructing/configuring the embedder.

Comment on lines +129 to +169
fn request_access(store: &CNContactStore) -> Result<(), AddressBookError> {
unsafe {
let status = CNContactStore::authorizationStatusForEntityType(CNEntityType::Contacts);
match status {
CNAuthorizationStatus::Authorized | CNAuthorizationStatus::Limited => {
tracing::debug!("[people::address_book] contacts access already authorized");
return Ok(());
}
CNAuthorizationStatus::Denied | CNAuthorizationStatus::Restricted => {
return Err(AddressBookError::PermissionDenied);
}
_ => {
tracing::debug!(
"[people::address_book] requesting contacts access (status={status:?})"
);
}
}

let (tx, rx) = std::sync::mpsc::channel::<Result<(), AddressBookError>>();
let tx = Arc::new(Mutex::new(Some(tx)));
let tx_clone = Arc::clone(&tx);

let block = RcBlock::new(move |granted: Bool, _error: *mut NSError| {
let mut slot = tx_clone.lock().unwrap();
if let Some(sender) = slot.take() {
let result = if granted.as_bool() {
Ok(())
} else {
Err(AddressBookError::PermissionDenied)
};
let _ = sender.send(result);
}
});

store.requestAccessForEntityType_completionHandler(CNEntityType::Contacts, &*block);

rx.recv().map_err(|_| {
AddressBookError::Other("contacts permission callback never fired".into())
})?
}
}

⚠️ Potential issue | 🟠 Major

Synchronous blocking on the Tokio worker — wrap CNContactStore calls in spawn_blocking.

request_access calls rx.recv() on a std::sync::mpsc channel, which is a blocking, non-async wait. It can sit there for many seconds (or indefinitely) while the macOS TCC dialog is up. enumerateContactsWithFetchRequest_error_usingBlock is also a synchronous blocking call. Both run on whatever thread invokes fetch_via_cn_contact_store.

Because HandleResolver::seed_from_address_book calls address_book::read_with(source)? directly (no spawn_blocking) and read_with calls source.fetch_contacts() synchronously, the entire CNContactStore round‑trip executes on a Tokio worker thread inside the handle_refresh_address_book future. With the default multi-thread runtime that means one worker is parked until the user dismisses the TCC dialog; with current_thread it would deadlock the entire server.

Best fix is to push the FFI work onto a blocking thread:

impl ContactsSource for SystemContactsSource {
    fn fetch_contacts(&self) -> Result<Vec<AddressBookContact>, AddressBookError> {
        // Block this thread; callers must invoke from spawn_blocking or a sync context.
        imp::fetch_via_cn_contact_store()
    }
}

…and update HandleResolver::seed_from_address_book (or add an async variant on ContactsSource) to wrap the call:

let source = source; // by ref isn't Send; restructure as needed
let contacts = tokio::task::spawn_blocking(move || source.fetch_contacts())
    .await
    .map_err(|e| AddressBookError::Other(format!("join: {e}")))??;

Either move the spawn_blocking into read_with/fetch_contacts, or make ContactsSource::fetch_contacts async and have SystemContactsSource do the spawn_blocking internally.

Also applies to: 171-269

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/people/address_book.rs` around lines 129 - 169, The
CNContactStore FFI calls (e.g., request_access,
enumerateContactsWithFetchRequest_error_usingBlock inside
imp::fetch_via_cn_contact_store) block a Tokio worker; wrap these synchronous
calls in tokio::task::spawn_blocking. Either (A) move the spawn_blocking into
the SystemContactsSource implementation so ContactsSource::fetch_contacts calls
imp::fetch_via_cn_contact_store inside spawn_blocking and returns the awaited
result, or (B) keep fetch_contacts synchronous but change callers (e.g.,
HandleResolver::seed_from_address_book / read_with) to call
tokio::task::spawn_blocking(|| source.fetch_contacts()) and propagate errors
(map join errors to AddressBookError::Other). Ensure request_access remains
inside the blocking closure so rx.recv() never runs on a Tokio worker thread.

Comment on lines +41 to +78
pub async fn resolve_or_create(&self, handle: &Handle) -> Result<(PersonId, bool), String> {
let canonical = handle.canonicalize();
if let Some(id) = self
.store
.lookup(&canonical)
.await
.map_err(|e| format!("lookup: {e}"))?
{
return Ok((id, false));
}
let id = PersonId::new();
let (display_name, primary_email, primary_phone) = match &canonical {
Handle::DisplayName(s) => (Some(s.clone()), None, None),
Handle::Email(s) => (None, Some(s.clone()), None),
Handle::IMessage(s) => {
if s.contains('@') {
(None, Some(s.clone()), None)
} else {
(None, None, Some(s.clone()))
}
}
};
let now = Utc::now();
let person = Person {
id,
display_name,
primary_email,
primary_phone,
handles: vec![canonical.clone()],
created_at: now,
updated_at: now,
};
self.store
.insert_person(&person, &[canonical])
.await
.map_err(|e| format!("insert_person: {e}"))?;
Ok((id, true))
}

⚠️ Potential issue | 🟠 Major

TOCTOU between lookup and insert_person can produce orphan Person rows.

resolve_or_create does a lookup, then mints a fresh PersonId, then calls insert_person (which inserts into people and INSERT OR IGNORE into handle_aliases). If two callers resolve the same canonical handle concurrently, both pass lookup, both generate distinct UUIDs, and both succeed in inserting their Person row. The first to commit wins the handle_aliases(kind, value) row; the second's alias is silently ignored, leaving its Person row orphaned (no handle pointing to it) — and both calls return was_created = true.

Practically, the seed_from_address_book path is sequential per call so this is unlikely there, but handle_resolve is a public RPC and concurrent inbound requests for the same handle (e.g., the iMessage scanner from PR #818 + a UI lookup) can race.

Two ways to make this atomic without redesigning the API:

  1. Push the read-or-create into a single transaction inside PeopleStore, e.g., lookup_or_insert_person(handle, person_factory) that does SELECT person_id FROM handle_aliases ... and INSERT INTO people ...; INSERT INTO handle_aliases ... inside one transaction().
  2. Keep insert_person as-is, but after it returns, re-query lookup(canonical) and, if the returned PersonId differs from the one we just minted, delete our orphan people row (or never insert it and re-use the winner's id).

Option 1 is cleaner and avoids the cleanup path entirely.
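
A minimal sketch of option 1 as a single rusqlite transaction; the handle_aliases(kind, value, person_id) and people column names follow the description above, while NewPerson and the exact columns are assumptions for illustration:

use rusqlite::{params, Connection, OptionalExtension, TransactionBehavior};

// Hypothetical row shape; the real Person type carries more fields.
pub struct NewPerson {
    pub id: String,              // uuid as text
    pub display_name: Option<String>,
    pub created_at: String,      // rfc3339
}

/// Look up `(kind, value)` and, if absent, insert the person and alias inside
/// one transaction, so two concurrent callers cannot both "create".
pub fn lookup_or_insert_person(
    conn: &mut Connection,
    kind: &str,
    value: &str,
    make_person: impl FnOnce() -> NewPerson,
) -> rusqlite::Result<(String, bool)> {
    // IMMEDIATE takes the write lock up front, so the later INSERTs cannot
    // race another writer that already passed the same SELECT.
    let tx = conn.transaction_with_behavior(TransactionBehavior::Immediate)?;

    let existing: Option<String> = tx
        .query_row(
            "SELECT person_id FROM handle_aliases WHERE kind = ?1 AND value = ?2",
            params![kind, value],
            |row| row.get(0),
        )
        .optional()?;

    if let Some(id) = existing {
        tx.commit()?;
        return Ok((id, false));
    }

    let p = make_person();
    tx.execute(
        "INSERT INTO people (id, display_name, created_at, updated_at) VALUES (?1, ?2, ?3, ?3)",
        params![p.id, p.display_name, p.created_at],
    )?;
    tx.execute(
        "INSERT INTO handle_aliases (kind, value, person_id) VALUES (?1, ?2, ?3)",
        params![kind, value, p.id],
    )?;
    tx.commit()?;
    Ok((p.id, true))
}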

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/people/resolver.rs` around lines 41 - 78, resolve_or_create has
a TOCTOU: it calls PeopleStore::lookup then creates a new PersonId and calls
PeopleStore::insert_person, which can race and produce orphan Person rows when
two callers concurrently resolve the same Handle; fix by moving the
read-or-create into an atomic transactional operation inside the store
(preferred): add a new PeopleStore method like lookup_or_insert_person(handle,
person_factory) that within a single transaction SELECTs from handle_aliases
and, if missing, INSERTs into people and INSERTs the handle_aliases (or returns
the existing person_id), then update resolve_or_create to call
lookup_or_insert_person instead of separate lookup + insert_person;
alternatively (if you must keep insert_person), after insert_person re-query
lookup(canonical) and if the returned PersonId != the one you inserted, delete
the orphan people row and return the winner's id.

Comment on lines +16 to +58
pub async fn handle_list(store: &PeopleStore, limit: usize) -> Result<RpcOutcome<Value>, String> {
    let limit = limit.clamp(1, 500);
    let people = store.list().await.map_err(|e| format!("list: {e}"))?;
    let now = Utc::now();

    let mut ranked: Vec<(Value, f32)> = Vec::with_capacity(people.len());
    for p in people {
        let interactions = store
            .interactions_for(p.id)
            .await
            .map_err(|e| format!("interactions_for: {e}"))?;
        let s = score(&interactions, now);
        let handles: Vec<Value> = p
            .handles
            .iter()
            .map(|h| {
                let (kind, value) = h.as_key();
                json!({ "kind": kind, "value": value })
            })
            .collect();
        ranked.push((
            json!({
                "person_id": p.id.to_string(),
                "display_name": p.display_name,
                "primary_email": p.primary_email,
                "primary_phone": p.primary_phone,
                "handles": handles,
                "score": s.score,
                "components": {
                    "recency": s.recency,
                    "frequency": s.frequency,
                    "reciprocity": s.reciprocity,
                    "depth": s.depth,
                },
                "interaction_count": interactions.len(),
            }),
            s.score,
        ));
    }
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    let people_json: Vec<Value> = ranked.into_iter().take(limit).map(|(v, _)| v).collect();
    Ok(RpcOutcome::new(json!({ "people": people_json }), vec![]))
}

⚠️ Potential issue | 🟠 Major

N+1 query pattern when listing people; no top‑K short‑circuit.

handle_list calls store.interactions_for(p.id).await once per person inside a sequential loop and only applies limit after sorting. Each interactions_for is a spawn_blocking round-trip plus a SQL query against interactions, so cost grows linearly with the number of people regardless of limit. With realistic address-book sizes (hundreds to thousands of contacts) this will dominate response time and serialize behind a single Tokio blocking thread per call.

Two reasonable mitigations:

  • Add an aggregate path on PeopleStore (e.g., list_with_scores(limit) or interaction_summaries()) that fetches the rows needed for scoring in a single query, then run score() over the in-memory result.
  • At minimum, parallelize the per-person fetches with futures::future::join_all so they don't serialize end-to-end.

Either way, since score = 0 when any component is 0 and frequency = 0 for people with no interactions in the window, the long tail of dormant contacts is doing a SQL round-trip just to land at the bottom of the list.
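
For the join_all route, a small self-contained sketch of the shape; rank_top_k, fetch, and the closure-based score are stand-ins for store.interactions_for and people::score rather than the real signatures:

use futures::future::join_all;

// Start every per-item fetch up front, await them together, then score,
// sort, and truncate. The DB round-trips overlap instead of running
// end-to-end as they do in the sequential loop above.
async fn rank_top_k<T, I, F, Fut>(
    items: Vec<T>,
    limit: usize,
    fetch: F,
    score: impl Fn(&I) -> f32,
) -> Vec<(T, f32)>
where
    F: Fn(&T) -> Fut,
    Fut: std::future::Future<Output = I>,
{
    let fetched = join_all(items.iter().map(|t| fetch(t))).await;
    let mut ranked: Vec<(T, f32)> = items
        .into_iter()
        .zip(fetched.iter().map(|i| score(i)))
        .collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    ranked.truncate(limit);
    ranked
}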

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/people/rpc.rs` around lines 16 - 58, handle_list currently
issues an interactions_for(p.id).await inside a sequential loop which causes an
N+1 query pattern and serial spawn_blocking calls; fix by either (A) adding an
aggregate method on PeopleStore (suggest names: list_with_scores(limit) or
interaction_summaries()) that runs a single SQL query to return interaction
aggregates needed by score() and then compute score() in-memory for each person
before sorting and truncating by limit, or (B) minimally parallelizing the
per-person fetches by collecting futures for store.interactions_for(p.id) and
awaiting them with futures::future::join_all so the DB calls run concurrently,
and ensure you apply limit before expensive work by sorting/truncating after
computing scores. Use the function name handle_list and the method
interactions_for as anchors when making the change.

Reads last N messages from $CHAT_DB (default ~/Library/Messages/chat.db),
ingests into an in-memory PersonalIndex via IndexWriter::upsert, then
runs IndexReader::keyword_search over a hardcoded query list. No
embeddings — pure FTS5/bm25 over real personal data.

Use to benchmark ingest throughput on a real corpus and to sanity-check
that keyword search ranking + snippet generation behave on real text.
On a 5000-message slice: ingest at ~47k items/s, search returns
3 hits with snippets in sub-millisecond time.

Run: cargo run --example real_imessage_demo --release
Requires Full Disk Access for the running terminal.

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
examples/real_imessage_demo.rs (2)

63-73: Nit: collapse the two None branches.

is_from_me || handle.is_empty() reads the same and drops one branch.

Proposed tweak
-        let author = if is_from_me {
-            None
-        } else if handle.is_empty() {
-            None
-        } else {
-            Some(Person {
-                display_name: None,
-                email: None,
-                source_id: Some(handle.clone()),
-            })
-        };
+        let author = if is_from_me || handle.is_empty() {
+            None
+        } else {
+            Some(Person {
+                display_name: None,
+                email: None,
+                source_id: Some(handle.clone()),
+            })
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/real_imessage_demo.rs` around lines 63 - 73, Collapse the two None
branches when building author: replace the nested if/else that checks is_from_me
then handle.is_empty() with a single condition (is_from_me || handle.is_empty())
so author is None in either case, otherwise return Some(Person { display_name:
None, email: None, source_id: Some(handle.clone()) }); update the block that
assigns to author (the author variable and Person construction) accordingly.

33-44: Add comment documenting that recent iMessage rows on Big Sur+ will be skipped.

On macOS Big Sur and later, the Messages app stores the body of many (most) recent messages in the attributedBody BLOB column (a binary plist/NSAttributedString archive) and leaves message.text NULL. The current query's WHERE m.text IS NOT NULL AND length(m.text) > 0 filter will silently skip those rows, causing the demo to pull back only older plain-text messages and potentially appear to have sparse results on modern chat databases. Add a comment before line 33 to document this limitation so users understand why the demo may return fewer messages than expected.

Suggested fix
+    // NOTE: on Big Sur+ many rows have NULL `text` and the body lives in
+    // `attributedBody` (binary plist). This demo only ingests the plain-text
+    // subset; expect a fraction of recent messages to be skipped.
     let mut stmt = conn.prepare(
         "SELECT m.ROWID, \
                 COALESCE(m.text, '') as text, \

A fuller implementation would SELECT m.attributedBody and decode the NSString payload from the archived plist when text is NULL, but documenting the limitation is sufficient for a demo.
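
If someone does want the fallback, a hedged sketch of the shape: select m.attributedBody alongside text (and drop the text IS NOT NULL filter), then keep the longest printable run when text is empty. body_from_row is a hypothetical helper and the extraction is only a heuristic, not a decoder for the archived attributed string:

// Heuristic fallback: prefer `text`, otherwise pull the longest printable
// run out of the attributedBody blob, which is often the message body.
fn body_from_row(text: Option<String>, attributed_body: Option<Vec<u8>>) -> Option<String> {
    if let Some(t) = text.filter(|t| !t.is_empty()) {
        return Some(t);
    }
    let blob = attributed_body?;
    let lossy = String::from_utf8_lossy(&blob);
    lossy
        // Break on control chars and on replacement chars left by lossy decoding.
        .split(|c: char| c.is_control() || c == '\u{FFFD}')
        .map(str::trim)
        .filter(|s| s.chars().count() >= 4)
        .max_by_key(|s| s.len())
        .map(|s| s.to_string())
}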

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/real_imessage_demo.rs` around lines 33 - 44, Add a brief comment
just above the SQL prepare call (the conn.prepare(...) that builds `stmt`)
explaining that on macOS Big Sur and later many recent message bodies are stored
in the `attributedBody` BLOB (an archived NSAttributedString/plist) and that the
WHERE clause `m.text IS NOT NULL AND length(m.text) > 0` will therefore skip
those rows, causing the demo to return only older plain-text messages; note that
a fuller fix would SELECT and decode `m.attributedBody` when `text` is NULL, but
for the demo we document the limitation instead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 70f624eb-1dd2-46fe-a539-137d2f592428

📥 Commits

Reviewing files that changed from the base of the PR and between 26664ca and c03e59a.

📒 Files selected for processing (1)
  • examples/real_imessage_demo.rs

