feat(life_capture): A1 iMessage ingest controller #818
jwalin-shah wants to merge 59 commits into tinyhumansai:main
Conversation
📝 Walkthrough
Adds foundational life-capture and curated-memory subsystems plus people/eventkit domains, wiring index/embedder/ingest/storage into server startup and agent prompt contexts; also adds an installer resolver test and macOS permissions, many new modules, schemas, RPC handlers, and extensive tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Gmail as Gmail Webhook
    participant Composio as ComposioBridge
    participant Embedder as HostedEmbedder
    participant Index as PersonalIndex
    participant Writer as IndexWriter
    participant RPC as life_capture.rpc
    Gmail->>Composio: Trigger event (messageId, payload)
    Composio->>Composio: Normalize → Item{source,external_id,ts,subject,text,metadata}
    Composio->>Embedder: embed_batch([text])
    Embedder-->>Composio: vector
    Composio->>Writer: upsert(Item)
    Writer->>Index: write/transaction (dedupe by source+external_id)
    Index-->>Writer: item_id, replaced
    Writer-->>Composio: upsert result
    Composio->>RPC: return ingest outcome (item_id,replaced)
```
```mermaid
sequenceDiagram
    participant User as Agent Session
    participant Agent as Agent
    participant Snap as CuratedSnapshot
    participant Prompt as PromptBuilder
    participant Reader as IndexReader
    participant Index as PersonalIndex
    User->>Agent: start conversation (first turn)
    Agent->>Snap: ensure_curated_snapshot()
    Snap-->>Agent: optional snapshot
    Agent->>Prompt: build(PromptContext{curated_snapshot})
    Prompt->>Reader: hybrid_search(query, snapshot-aware)
    Reader->>Index: read FTS + vectors (frozen snapshot used for user files)
    Index-->>Reader: hits
    Reader-->>Agent: inject citations
    Agent->>User: respond with cited items
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Drafts the wedge for OpenHuman: continuous ingestion of Gmail/Calendar/Slack/iMessage into a local personal_index.db, with a Today brief and retrieval-tuned chat as the two surfaces. Three composable privacy modes (Convenience / Hybrid / Fully local) share one code path. Track 1 unblocks the ship pipeline; Track 2 builds the spine over 4 weeks. Reuses ingestion logic from ~/projects/inbox.
Two implementation plans drafted from the life-capture spec:
- Track 1 (ship pipeline): fix Ubuntu installer smoke, land in-flight PRs tinyhumansai#806/tinyhumansai#786/tinyhumansai#788/tinyhumansai#797, wire Tauri auto-updater + signed Mac/Windows builds.
- Life-Capture #1 (foundation): SQLite + sqlite-vec personal_index.db, Embedder trait + HostedEmbedder (OpenAI), PII redaction, quoted-thread stripping, hybrid retrieval (vector + keyword + recency), controller schema + RPC. End-to-end test with synthetic items.
No ingestion or UI yet — those are subsequent milestone plans.
Refactors scripts/install.sh to expose resolve_asset_url and verify_asset_reachable. Adds scripts/test_install.sh that exercises the resolver against a committed fixture latest.json. Failures now report the resolved URL and the parse error instead of dying silently.
…le, keyword projection column
Adopts the Hermes pattern (NousResearch/hermes-agent, MIT) of an
agent-writable, char-bounded curated memory file pair. F15 builds the
char-bounded store with atomic writes; F16 wires a session-start
snapshot into the system prompt (preserves prefix cache); F17 exposes
memory.{add,replace,remove,read} through the controller dispatch so
both the agent loop and skills can use it.
Sits between TinyHumans synthesis (volatile) and personal_index.db
(raw events) — fills the "deliberately curated facts" gap that
neither covered.
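The char-bounded store described above can be sketched in a few lines. This is an illustrative simplification, not the real F15 API: the struct and method names are invented here, only the Hermes character budgets (2200 for MEMORY.md, 1375 for USER.md) and the reject-rather-than-truncate and empty-needle-guard behaviors come from the commits.

```rust
/// Hypothetical sketch of a char-bounded curated-memory store (names invented;
/// limits from the commit text: MEMORY.md 2200 chars, USER.md 1375 chars).
pub struct CuratedStore {
    content: String,
    max_chars: usize,
}

impl CuratedStore {
    pub fn new(max_chars: usize) -> Self {
        Self { content: String::new(), max_chars }
    }

    /// Append one entry as its own line; fail loudly instead of truncating.
    pub fn add(&mut self, entry: &str) -> Result<(), String> {
        let added = entry.chars().count() + 1; // +1 for the newline
        if self.content.chars().count() + added > self.max_chars {
            return Err(format!("entry would exceed the {}-char budget", self.max_chars));
        }
        self.content.push_str(entry);
        self.content.push('\n');
        Ok(())
    }

    /// Drop every entry (line) containing the needle. An empty needle matches
    /// everything, so it is rejected up front (the guard later hardened in review).
    pub fn remove(&mut self, needle: &str) -> Result<usize, String> {
        if needle.is_empty() {
            return Err("empty needle".into());
        }
        let before = self.content.lines().count();
        self.content = self
            .content
            .lines()
            .filter(|l| !l.contains(needle))
            .map(|l| format!("{l}\n"))
            .collect();
        Ok(before - self.content.lines().count())
    }
}
```

The real store adds atomic writes and a substring `replace`; the point here is only the budget check and the loud failure mode.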
Adds src/openhuman/life_capture/ module tree (stubs for embedder, index, migrations, quote_strip, redact, rpc, schemas, types, plus tests/) per Foundation Plan F1. Wires the module in src/openhuman/mod.rs. Adds sqlite-vec to deps and httpmock as a dev-dep. Reuses existing rusqlite/regex/once_cell/async-trait/tempfile already in tree (Option C — avoids sqlx/rusqlite libsqlite3-sys conflict).
…egex
Previously the country-code group required at least one digit (\+?\d{1,3}),
so '(415) 555-0123' (no leading country code) wouldn't match. Wrapped the
whole prefix in a non-capturing optional group so parenthesized area codes
match without a leading country code.
…te, char-bounded)
… table Loads sqlite_vec via sqlite3_auto_extension exactly once per process so every new connection picks up the vec0 module. PersonalIndex wraps a single rusqlite Connection in Arc<Mutex<_>> for async sharing; open() runs migrations and sets WAL+foreign_keys, open_in_memory() is for tests. vec_version() round-trip and a 1536-dim INSERT + MATCH query both verified.
…_id) dedupe Adds IndexWriter with upsert (ON CONFLICT(source, external_id) DO UPDATE — FTS stays in sync via triggers) and upsert_vector (DELETE + INSERT since vec0 has no ON CONFLICT for virtual table primary keys). Both wrap rusqlite work in spawn_blocking and share the PersonalIndex's Arc<Mutex<Connection>>. Exposes ensure_vec_extension_registered() crate-wide so the migrations test can register vec0 before opening its in-memory connection.
- keyword_search: FTS5 ranked by negated bm25 with snippet markers (« » …), plus the user:local ACL EXISTS clause so the same query shape works for the multi-token team v2 ACL without rewrites.
- vector_search: sqlite-vec MATCH with k = ? clause (vec0 KNN requirement) and ORDER BY distance; score = 1/(1 + distance) so callers can blend it with keyword bm25 on the same monotonic scale.
Shared ItemRow + into_hit() helper lets both queries reuse the same row shape; rusqlite query_map closures hand-build it because rusqlite has no sqlx FromRow equivalent.
Pulls oversampled candidates (k*3 min 20) from both keyword and vector legs, normalises each independently to [0,1], then re-ranks with 0.55 * vec_norm + 0.35 * kw_norm + 0.10 * exp(-age_days/30). Documents present in only one leg get 0 for the missing signal but still compete on the others; only items with neither signal are dropped. Same-vector twin items now break their tie by recency.
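The fusion rule above can be restated as a small, dependency-free sketch. The weights (0.55/0.35/0.10) and the 30-day recency decay come from the commit text; the `Candidate` struct and function names are invented for illustration and the real re-ranker may normalise differently.

```rust
/// Illustrative sketch of the hybrid re-rank described above (names invented).
struct Candidate {
    id: u64,
    vec_score: Option<f64>, // 1/(1+distance); None if absent from the vector leg
    kw_score: Option<f64>,  // negated bm25; None if absent from the keyword leg
    age_days: f64,
}

/// Min-max normalise one leg's scores to [0, 1]; a missing signal scores 0.
fn normalise(scores: &[Option<f64>]) -> Vec<f64> {
    let present: Vec<f64> = scores.iter().flatten().copied().collect();
    let (min, max) = present
        .iter()
        .fold((f64::MAX, f64::MIN), |(lo, hi), &s| (lo.min(s), hi.max(s)));
    scores
        .iter()
        .map(|s| match s {
            Some(v) if max > min => (v - min) / (max - min),
            Some(_) => 1.0, // all present scores identical in this leg
            None => 0.0,
        })
        .collect()
}

/// 0.55 * vec + 0.35 * kw + 0.10 * exp(-age_days / 30), descending.
/// Items with neither signal are dropped, matching the commit's behavior.
fn fuse(cands: &[Candidate]) -> Vec<(u64, f64)> {
    let vecs = normalise(&cands.iter().map(|c| c.vec_score).collect::<Vec<_>>());
    let kws = normalise(&cands.iter().map(|c| c.kw_score).collect::<Vec<_>>());
    let mut out: Vec<(u64, f64)> = cands
        .iter()
        .enumerate()
        .filter(|(_, c)| c.vec_score.is_some() || c.kw_score.is_some())
        .map(|(i, c)| (c.id, 0.55 * vecs[i] + 0.35 * kws[i] + 0.10 * (-c.age_days / 30.0).exp()))
        .collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}
```

With this formula, two items carrying the same vector score differ only in the recency term, which is exactly the twin-tie-break behavior the commit describes.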
…rch)
Adds two controllers exposed via the registry:
- life_capture.get_stats — total items, per-source counts, last-ingest ts.
- life_capture.search — hybrid (vector + keyword + recency) search with embed-then-rank, optional k (default 10, capped at 100).
Runtime state (PersonalIndex + dyn Embedder) lives in life_capture::runtime behind a tokio OnceCell because handlers are stateless fn(Map<String,Value>) and have no per-call context object. F14 will call runtime::init() at app startup; until then handlers return a structured 'not initialised' error so the failure mode is loud, not silent. Schemas registered in core/all.rs alongside cron.
… → search) FakeEmbedder hashes input bytes into a deterministic sparse 1536-dim vector so the same text round-trips identically through vec0 — keeps the e2e test hermetic, no network call. Verifies that: - quote-strip drops the 'On … wrote:' block before indexing, - redact masks the email address before indexing, - the upserted item resurfaces via hybrid_search with both transformations preserved on the returned text.
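The hermetic-test idea above (hash input bytes into a deterministic sparse 1536-dim vector) can be sketched with only the standard library. The hashing scheme here is invented for illustration; the real FakeEmbedder may derive its vector differently, but the property being tested is the same: identical text always yields an identical embedding, with no network call.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Matches the fixed width of the sqlite-vec column in this design.
const DIM: usize = 1536;

/// Illustrative FakeEmbedder: a few salted hashes of the input pick sparse
/// (index, weight) pairs, so the same text round-trips identically through vec0.
fn fake_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for salt in 0u64..8 {
        let mut h = DefaultHasher::new();
        salt.hash(&mut h);
        text.as_bytes().hash(&mut h);
        let x = h.finish();
        // Low bits choose the dimension, high bits choose a weight in [-0.5, 0.5).
        v[(x as usize) % DIM] = ((x >> 32) as u32 as f32 / u32::MAX as f32) - 0.5;
    }
    v
}
```

Because the vector depends only on the bytes, the end-to-end test can assert that the quote-stripped, redacted text resurfaces through hybrid search without ever calling a hosted embedder.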
After Config::load_or_init succeeds, open the personal_index.db at config.workspace_dir and register the life_capture runtime. Embedder is env-gated:
- OPENHUMAN_EMBEDDINGS_KEY > OPENAI_API_KEY
- OPENHUMAN_EMBEDDINGS_URL (default: https://api.openai.com/v1)
- OPENHUMAN_EMBEDDINGS_MODEL (default: text-embedding-3-small)
If no key is set we still open the index file (so Plan #2's ingestion worker can write to it) but skip runtime registration — controllers then return the 'not initialised' error from runtime::get() instead of silently calling a misconfigured embedder. Config-schema integration deferred to a follow-up; env-driven keeps F14 non-invasive while we land the rest of the foundation.
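The precedence and defaults above reduce to a small pure function. In the real startup these values come from std::env; parameterising them (with an invented struct and function name) makes the resolution order testable without touching the process environment.

```rust
/// Illustrative sketch of the env-gated embedder resolution (names invented;
/// precedence and defaults are from the commit text).
struct EmbedderConfig {
    key: String,
    url: String,
    model: String,
}

fn resolve_embedder(
    openhuman_key: Option<String>, // OPENHUMAN_EMBEDDINGS_KEY
    openai_key: Option<String>,    // OPENAI_API_KEY
    url: Option<String>,           // OPENHUMAN_EMBEDDINGS_URL
    model: Option<String>,         // OPENHUMAN_EMBEDDINGS_MODEL
) -> Option<EmbedderConfig> {
    // OPENHUMAN_EMBEDDINGS_KEY wins over OPENAI_API_KEY; no key at all means
    // no embedder is registered (the index file is still opened by the caller).
    let key = openhuman_key.or(openai_key)?;
    Some(EmbedderConfig {
        key,
        url: url.unwrap_or_else(|| "https://api.openai.com/v1".into()),
        model: model.unwrap_or_else(|| "text-embedding-3-small".into()),
    })
}
```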
…ection snapshot_pair(memory, user) returns a MemorySnapshot containing plain strings — no reference back to the stores — so taking a snapshot at session start and reusing it across turns gives a stable system-prompt prefix and lets the LLM prefix-cache hit on every subsequent turn. Plan #1 also calls for wiring this into chat.rs' OpenClaw context loader, but openhuman assembles agent prompts per-agent under src/openhuman/agent/agents/*/prompt.rs (not the plan's stale src-tauri/src/commands/chat.rs path); the prompt-builder integration is deferred to Plan #2 alongside the agent-context refactor.
… controllers
Adds four controllers under the memory_curated namespace (the 'memory' namespace is already owned by the long-term memory subsystem):
- memory_curated.read — read MEMORY.md or USER.md
- memory_curated.add — append entry, char-bounded
- memory_curated.replace — substring replace, char-bounded
- memory_curated.remove — drop entries containing a needle
'file' input is a typed enum (memory|user) so adapters reject anything else at validation time. Runtime state lives in curated_memory::runtime behind a tokio OnceCell, mirroring life_capture's pattern: startup in jsonrpc.rs constructs both stores at <workspace>/memories/ with Hermes' char limits (memory: 2200, user: 1375) and registers the runtime. Handlers return 'not initialised' until init runs, so failure is loud.
…ings
Findings from Codex + Gemini second-opinion review:
1. IndexWriter::upsert orphaned vectors on re-ingest (Codex). On (source, external_id) conflict the row's id was kept but the caller's fresh UUID was used for upsert_vector — the vector wrote under an id no row joined to. Fix: explicit SELECT-then-UPDATE-or-INSERT in the same transaction; mutates caller's Item.id to the canonical id (so the next upsert_vector lands on the right row) and orphan-deletes any vector already written under the wrong id. Signature change: upsert(&[Item]) -> upsert(&mut [Item]).
2. upsert_vector DELETE+INSERT was outside a transaction (Gemini) — a failed INSERT permanently lost the item's vector. Now wrapped in tx.
3. Runtime over-gated get_stats on the embedder (Codex). Split runtime into separate INDEX + EMBEDDER OnceCells: get_stats only requires index, search requires both. Index initialises whenever the workspace dir is reachable; embedder is env-gated as before.
4. Startup race (both reviewers): runtime init lived inside the post-serve tokio::spawn block, so axum::serve was already accepting before the OnceCells were set. Hoisted both bootstraps (curated_memory and life_capture) into helpers called inline before axum::serve.
5. runtime::get error message lied (Codex) — said 'set embeddings.api_key in config' but startup actually reads env vars. Fixed text.
Bonus: rename controller namespace memory_curated -> curated_memory (Codex preference; nothing depends on it yet so renamed before clients do). Adds regression test for #1 (reingest_with_fresh_uuid_keeps_vector_findable). 4696 lib tests pass.
…emory namespace rename Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.
Critical:
- curated_memory/store.rs: propagate read errors instead of `unwrap_or_default`, which could rewrite MEMORY.md / USER.md from an empty baseline on transient I/O or permission failures. Reject empty needle in replace/remove.
- curated_memory/rpc.rs: belt-and-suspenders empty-needle guard at the RPC boundary (remove with "" deletes every entry).
- life_capture/index.rs: upsert_vector rejects orphan ids (items row missing), which would have inserted vectors that never join in vector_search.
- life_capture/embedder.rs: validate response length matches input, indices are contiguous 0..n, and every vector matches dim() — prevents silent misalignment or wrong-dimensional vectors flowing into the 1536-wide sqlite-vec table. Added 30s request + 10s connect timeouts.
Retrieval hardening:
- life_capture/index.rs: keyword_search now escapes FTS5 operators by tokenizing on whitespace and wrapping each token as a quoted literal. Prevents errors or unintended matches from stray quotes, AND/OR/NEAR, column specifiers in user input.
- life_capture/index.rs: hybrid_search applies q.sources / q.since / q.until via post-filter (single consistent pass across keyword+vector fusion).
- life_capture/rpc.rs: handle_search validates embedder dim and response length against the fixed 1536-wide index column — clear RPC error instead of a cryptic sqlite-vec failure when the embeddings model is swapped.
Docs:
- docs/event-bus.md: remove erroneous `.await` on register_native_global (it is sync; the handler closure is async).
- life-capture design spec: soften PII redaction claim — only regex is implemented today; light NER is flagged as a future enhancement.
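The FTS5 escaping described above (tokenize on whitespace, wrap each token as a quoted literal) is simple enough to sketch. This is an assumed shape, not the shipped function: FTS5 string literals are double-quoted and embedded quotes are escaped by doubling them, which is what the sketch does.

```rust
/// Illustrative sketch of the keyword_search hardening: treat user input as
/// plain literals so AND/OR/NEAR, stray quotes, and column: specifiers cannot
/// be interpreted as FTS5 query syntax. (Function name invented.)
fn escape_fts5_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|tok| format!("\"{}\"", tok.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        .join(" ")
}
```

A query like `meeting AND "budget` becomes `"meeting" "AND" ""budget` quoted per token, so FTS5 matches the words rather than erroring or applying boolean operators.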
Rendered prompts now surface runtime curated-memory writes: - `PromptContext` gains `curated_snapshot: Option<&MemorySnapshot>`. - `UserFilesSection` prefers the snapshot over the workspace-file loader when one is attached, and injects `USER.md` alongside `MEMORY.md` using a byte-compatible `inject_snapshot_content` helper. - `Session` carries `Option<Arc<MemorySnapshot>>`, populated by `ensure_curated_snapshot` on the first turn from `curated_memory::runtime::get()`. Reused across turns so prompt bytes stay frozen (KV-cache prefix contract) while mid-session `curated_memory.add/replace/remove` writes land on the NEXT session. - `ParentExecutionContext` inherits the snapshot so sub-agents render identical `MEMORY.md`/`USER.md` blocks as the parent. - Legacy workspace-file fallback preserved for embeds that don't initialise the curated-memory runtime (pure unit tests).
…ctness (A7 review blockers)
Block lifetime (calendar.rs, reminders.rs): replace the UB cast `&*block as *const _ as *mut _` with `RcBlock::as_ptr(&block).cast()`. `RcBlock::as_ptr` returns a proper `*mut Block<F>` owned by the RcBlock; holding `block` alive until after `blocking_recv()` returns guarantees the block outlives EventKit's retention window with no UB.

EKEventStore caching (runtime.rs, calendar.rs, reminders.rs): add a process-global `Arc<Mutex<EventStoreHandle>>` singleton via `OnceLock`. `EventStoreHandle` is a newtype over `Retained<EKEventStore>` with manual `Send + Sync` — EKEventStore is Apple-documented as thread-safe and is always guarded by the Mutex at call sites. `fetch_events` and `save_reminder` now accept the cached store instead of constructing their own, matching Apple's one-store-per-process recommendation. runtime::get() is also changed from panic to `Result<_, String>` so RPC callers receive a clean error instead of a core crash on DB open/migration failure.

Dedupe correctness (calendar.rs): chose option (a) — skip events that lack `calendarItemExternalIdentifier` entirely (log at `info`). The fallback to the local `calendarItemIdentifier` caused re-emission: before iCloud syncs, the local id yields hash A; after sync the external id yields hash B — same event appears twice. Skipping once is safer than silently duping; the log makes the miss rate observable.

Validation (rpc.rs): `handle_create_reminder` now rejects empty/blank title, priority > 9, and malformed RFC 3339 due_date with structured error strings. Past due_date logs a warn but does not reject (lenient for clock skew).
Tests added:
- dedup: `no_dupe_on_resync_with_same_external_id` + `different_external_ids_stored_separately`
- runtime: `get_error_on_bad_path` (no panic on bad DB path)
- rpc validation: empty title, whitespace title, priority > 9, accepts priority 9, malformed due_date, valid future due_date, past due_date warn-not-reject
- non-macOS stubs: `list_events` and `create_reminder` stub tests compile but are only active on non-macOS targets; macOS CI runs the live path
All 20 eventkit tests pass. `cargo check --bin openhuman-core` is clean.
# Conflicts:
#	Cargo.lock
#	Cargo.toml
Moves the inline #[cfg(test)] mod tests block out of prompts/mod.rs into prompts/tests.rs to bring the production file under the project's <=500 line guideline and reduce mod.rs weight per the "light mod.rs" rule. No behavior change.
Moves the inline #[cfg(test)] mod tests block out of channel.rs into channel_tests.rs to bring the production file under the project's <=500 line guideline. No behavior change.
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/openhuman/agent/triage/escalation.rs (1)
125-176: ⚠️ Potential issue | 🔴 Critical
Escalated sub-agents will receive `None` for curated_snapshot and never populate it.
`dispatch_target_agent` builds a fresh Agent via `Agent::from_config()`, which initializes `curated_snapshot: None` (the session builder is synchronous and defers population to the first turn via `ensure_curated_snapshot()`). The snapshot is captured at line 175 before `run_subagent()` executes. However, `run_subagent()` dispatches to either `run_typed_mode` or `run_fork_mode`, neither of which executes a full session turn — they only run a provider loop. Therefore, `ensure_curated_snapshot()` is never called, and the escalated sub-agent will never have access to curated memory, unlike normal subagent dispatch where the parent has already populated its snapshot during turn execution.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/triage/escalation.rs` around lines 125 - 176, dispatch_target_agent builds an Agent via Agent::from_config() but captures curated_snapshot before it can be populated, so escalated sub-agents get None; call Agent::ensure_curated_snapshot().await (or the synchronous accessor that forces population) on the newly built Agent before constructing ParentExecutionContext so curated_snapshot is populated, then use agent.curated_snapshot() for the ParentExecutionContext; update dispatch_target_agent to await that population step prior to calling run_subagent/run_typed_mode/run_fork_mode.
app/src-tauri/src/imessage_scanner/mod.rs (2)
3-6: ⚠️ Potential issue | 🟡 Minor
Update the scanner rustdoc for life-capture ingest.
The module docs still describe `openhuman.memory_doc_ingest` and the old playbook convention, but the scanner now writes to `openhuman.life_capture_ingest`. Based on learnings, new/changed behavior must ship with matching rustdoc / code comments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src-tauri/src/imessage_scanner/mod.rs` around lines 3 - 6, The module-level rustdoc in imessage_scanner (top of mod.rs) is outdated: it references `openhuman.memory_doc_ingest` and the old playbook convention; update the documentation to reflect current behavior by replacing mentions of `openhuman.memory_doc_ingest` with `openhuman.life_capture_ingest`, remove or update the reference to `docs/webview-integration-playbook.md`/WhatsApp convention, and briefly describe that the scanner emits one `openhuman.life_capture_ingest` JSON-RPC call per `(chat_identifier, day)` group (read-only from `~/Library/Messages/chat.db`) so the comments match the actual runtime behavior.
436-443: ⚠️ Potential issue | 🟠 Major
Treat JSON-RPC errors as ingest failures before advancing the cursor.
JSON-RPC controller failures can arrive as HTTP 200 with an `"error"` envelope. This currently logs success and lets the tick advance the cursor, so failed life-capture writes can be skipped permanently. Also avoid logging `key`, since it can include a phone number or email chat identifier.

Suggested envelope check

```diff
-    let res = http_client().post(&url).json(&body).send().await?;
+    let res = http_client().post(&url).json(&body).send().await?;
+    let status = res.status();
+    let response_body = res.text().await?;

-    if !res.status().is_success() {
-        anyhow::bail!("core rpc {}: {}", res.status(), res.text().await?);
+    if !status.is_success() {
+        anyhow::bail!("core rpc {}: {}", status, response_body);
+    }
+
+    let env: serde_json::Value = serde_json::from_str(&response_body)?;
+    if let Some(error) = env.get("error") {
+        anyhow::bail!("life_capture_ingest rpc error: {}", error);
     }

-    log::info!("[imessage] life_capture ingest ok key={}", key);
+    log::info!("[imessage] life_capture ingest ok");
```

As per coding guidelines, never log secrets or full PII — redact.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src-tauri/src/imessage_scanner/mod.rs` around lines 436 - 443, The HTTP response handling after http_client().post(&url).json(&body).send().await? (res) must detect JSON-RPC error envelopes even when HTTP 200, so parse the response body as JSON (e.g., serde_json::Value) and if it contains an "error" field (non-null) treat it as a failure (anyhow::bail!) instead of logging success and advancing the cursor; also remove logging of the raw key value in the success log (redact or omit it) so life_capture ingest success uses a non-PII message and failures include the error details from the JSON envelope.
src/openhuman/agent/prompts/mod.rs (1)
718-720: ⚠️ Potential issue | 🟠 Major
Thread the curated snapshot into the direct sub-agent renderer too.
`UserFilesSection` now honors `ctx.curated_snapshot`, but this renderer still re-reads `MEMORY.md` from disk. Sub-agent prompts can diverge mid-session after curated-memory writes, violating the session-frozen prompt/cache contract documented above.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/prompts/mod.rs` around lines 718 - 720, The sub-agent renderer still calls inject_workspace_file_capped(&mut out, workspace_dir, "MEMORY.md", USER_FILE_MAX_CHARS) when options.include_memory_md, causing a disk read instead of using the session-curated memory; change this branch to check ctx.curated_snapshot (or equivalent field) and, if present, inject the curated snapshot contents into the renderer output (or call an overload of inject_workspace_file_capped that accepts raw content) instead of reading MEMORY.md from workspace_dir so UserFilesSection and the direct sub-agent renderer both use the same in-memory curated snapshot.
🟡 Minor comments (14)
src/openhuman/life_capture/redact.rs-8-17 (1)
8-17: ⚠️ Potential issue | 🟡 Minor
PHONE regex matches false positives (10-digit order/tracking IDs), but the proposed `\b` fix breaks the parentheses format.
The pattern correctly identifies the false-positive risk: plain 10-digit runs like `1234567890` match regardless of context (`order#1234567890`, `tracking 1234567890 shipped`), getting redacted before indexing and becoming unsearchable. However, the proposed `\b` fix breaks the existing test case — `(415) 555-0123` fails to match because `\b` between the opening paren and first digit prevents the match. The parentheses format must remain matchable per the test suite (line 46).
A more nuanced fix is needed — such as requiring at least one separator/paren form, using negative lookahead/lookbehind to exclude bare digit runs, or anchoring to word boundaries only where appropriate (after parens, around separators). Consider what formats should legitimately match when preceded/followed by alphanumerics (e.g., `order#1234567890` should not, but `tel:(415)555-0123` might).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/redact.rs` around lines 8 - 17, PHONE's regex is matching bare 10-digit runs (false positives) and adding a simple \b breaks parenthesized numbers; update the PHONE Regex (the static PHONE Lazy<Regex> / Regex::new call) to require either explicit separators or parentheses around the area code or to forbid matching a continuous digit-run with no separators by using alternation plus lookarounds (e.g., one branch that matches formats with separators or \(...\) and another branch that uses negative lookbehind/negative lookahead to reject a preceding/following alphanumeric digit when matching a plain 10-digit sequence); implement this by modifying the pattern inside Regex::new so it only matches numbers with separators/parentheses or ensures the 10-digit branch is not adjacent to word characters.
src/core/all.rs-287-296 (1)
287-296: ⚠️ Potential issue | 🟡 Minor
Add descriptions for the other newly registered namespaces.
`life_capture` and `curated_memory` are now registered controllers, but `namespace_description()` only adds `people` and `eventkit`, so CLI/help discovery will show no description for two new user-facing namespaces.

Proposed fix

```diff
     "learning" => Some(
         "User context enrichment — LinkedIn profile scraping and onboarding intelligence.",
     ),
+    "life_capture" => Some(
+        "Personal index ingestion, search, stats, and retrieval for captured life data.",
+    ),
+    "curated_memory" => Some(
+        "Curated MEMORY.md and USER.md scratchpad reads and writes.",
+    ),
     "people" => Some(
         "Contact resolution and recency × frequency × reciprocity × depth scoring.",
     ),
```
Verify each finding against the current code and only fix it if needed. In `@src/core/all.rs` around lines 287 - 296, namespace_description() only returns descriptions for "people", "notification", and "eventkit", so the newly registered namespaces "life_capture" and "curated_memory" appear undocumented in CLI/help; update the match in namespace_description() to include arms for "life_capture" and "curated_memory" returning concise user-facing descriptions (similar style to "people" and "eventkit") so those controllers surface meaningful help text; reference the function namespace_description() and add the two new string keys "life_capture" and "curated_memory" with appropriate description strings.
src/openhuman/life_capture/quote_strip.rs-14-33 (1)
14-33: ⚠️ Potential issue | 🟡 Minor
Preserve true no-op behavior when no quote is present.
`strip_quoted_reply()` currently trims and rewrites line endings even when there is no marker and no quoted `>` line, so "passthrough" inputs can still be changed.

Proposed fix

```diff
 pub fn strip_quoted_reply(body: &str) -> String {
     // Cut at the earliest of the two marker types.
     let mut cut: Option<usize> = None;
     for re in [&*ON_DATE_WROTE, &*OUTLOOK_SEP] {
         if let Some(m) = re.find(body) {
             cut = Some(cut.map_or(m.start(), |c| c.min(m.start())));
         }
     }
     let head = if let Some(idx) = cut { &body[..idx] } else { body };

+    if cut.is_none() && !head.lines().any(|l| l.trim_start().starts_with('>')) {
+        return body.to_string();
+    }
+
     // Drop any line starting with '>'.
     let kept: Vec<&str> = head
         .lines()
         .filter(|l| !l.trim_start().starts_with('>'))
         .collect();

     kept.join("\n").trim().to_string()
 }
```
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/quote_strip.rs` around lines 14 - 33, strip_quoted_reply currently always rebuilds the string which can change line endings and whitespace even when there is nothing to strip; modify strip_quoted_reply so that if no marker from ON_DATE_WROTE or OUTLOOK_SEP is found and no line begins with '>' it returns body.to_string() immediately (preserving exact input). Otherwise keep the existing logic that computes cut, slices head, filters '>' lines, joins and trims; reference the function strip_quoted_reply and the regex markers ON_DATE_WROTE and OUTLOOK_SEP to locate where to add the early-return check.
src/openhuman/life_capture/migrations/0001_init.sql-9-9 (1)
9-9: ⚠️ Potential issue | 🟡 Minor
Index `text_keyword` in FTS or remove the column.
`text_keyword` is documented as the raw FTS/BM25 projection, but `items_fts` and its triggers index `text` only. Connectors that populate `text_keyword` will not affect keyword search.

Suggested migration adjustment

```diff
 CREATE VIRTUAL TABLE IF NOT EXISTS items_fts USING fts5(
     subject,
-    text,
+    text_keyword,
     content='items',
     content_rowid='rowid',
     tokenize='porter unicode61'
 );
@@
-    INSERT INTO items_fts(rowid, subject, text)
-    VALUES (new.rowid, COALESCE(new.subject, ''), new.text);
+    INSERT INTO items_fts(rowid, subject, text_keyword)
+    VALUES (new.rowid, COALESCE(new.subject, ''), COALESCE(new.text_keyword, new.text));
@@
-    INSERT INTO items_fts(items_fts, rowid, subject, text)
-    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), old.text);
+    INSERT INTO items_fts(items_fts, rowid, subject, text_keyword)
+    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), COALESCE(old.text_keyword, old.text));
@@
-    INSERT INTO items_fts(items_fts, rowid, subject, text)
-    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), old.text);
-    INSERT INTO items_fts(rowid, subject, text)
-    VALUES (new.rowid, COALESCE(new.subject, ''), new.text);
+    INSERT INTO items_fts(items_fts, rowid, subject, text_keyword)
+    VALUES ('delete', old.rowid, COALESCE(old.subject, ''), COALESCE(old.text_keyword, old.text));
+    INSERT INTO items_fts(rowid, subject, text_keyword)
+    VALUES (new.rowid, COALESCE(new.subject, ''), COALESCE(new.text_keyword, new.text));
 END;
```

Also applies to: 30-35, 39-54
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/migrations/0001_init.sql` at line 9, The schema adds a text_keyword column but the FTS virtual table and triggers only index the text column (items_fts and its triggers), so populate or index text_keyword in the FTS pipeline or remove the unused column; update the FTS definition (items_fts) and its associated insert/update/delete triggers to include text_keyword in the FTS content and bm25/FTS projections, or drop text_keyword from the CREATE TABLE if you intend to only index text, making sure to update any migration lines referencing text_keyword (and the triggers that mirror into items_fts) so connectors that write text_keyword actually affect search.
docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md-24-25 (1)
24-25: ⚠️ Potential issue | 🟡 Minor
Refresh the documented RPC surface to match the shipped controller.
The plan still references `embed_and_store`, a bare-array `search` return, and `life_capture.*` call names. This PR ships `openhuman.life_capture_ingest` and wraps search results as `{ "hits": [...] }`, so the plan will mislead follow-up work and smoke tests.
Based on learnings, new/changed behavior must ship with matching rustdoc / code comments and architecture docs when user-visible behavior changes.
Also applies to: 1423-1438, 2093-2097
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md` around lines 24 - 25, Update the plan/docs to match the shipped controller surface: replace references to the removed embed_and_store API (LifeCaptureController::embed_and_store) with the actual ingest RPC name openhuman.life_capture_ingest, change any mention of search returning a bare array to document that search returns an object with "hits": [...] (not a raw array), and ensure RpcOutcome-style handler mentions align with the current controller handlers (e.g., LifeCaptureController get_status, get_stats, search) and openhuman.life_capture_ingest call names so follow-up work and smoke tests reflect the real, shipped behavior.
scripts/install.sh-203-217 (1)
203-217: ⚠️ Potential issue | 🟡 Minor
`verify_asset_reachable` is defined but never called — dead code or missing wiring?
The helper is declared with exponential-backoff HEAD retries but no call site wires it into `resolve_from_latest_json` or the post-resolution flow. Either hook it in after `ASSET_URL` is resolved (e.g., before the `curl -fL ... -o ${DOWNLOAD_PATH}` step) so resolution-time confidence matches download-time behaviour, or drop it to avoid drift.
Want me to wire `verify_asset_reachable "${ASSET_URL}"` into the post-resolution path?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/install.sh` around lines 203 - 217, The verify_asset_reachable helper is never invoked causing dead code or missed pre-download checks; after ASSET_URL is resolved (e.g., in resolve_from_latest_json or immediately after that resolution step) call verify_asset_reachable "${ASSET_URL}" and abort on non-zero return before performing the curl download that writes to DOWNLOAD_PATH (the existing curl -fL ... -o ${DOWNLOAD_PATH} step), so the script performs the HEAD retry/backoff check prior to attempting the actual download.
src/openhuman/eventkit/rpc.rs-35-39 (1)
35-39: ⚠️ Potential issue | 🟡 Minor

Minor: `end <= start` rejects equal timestamps, but error text says "must be greater".

If a caller passes the same value for `start_ts` and `end_ts` (e.g., an instantaneous probe), the condition fails and the error says `end_ts must be greater than start_ts`, which is consistent — but the schema description doesn't advertise that constraint. Either loosen to `end < start` (allow equal) or document the strict-greater-than requirement on the `end_ts` input in `schemas.rs`.
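As a sketch of the two options (function names hypothetical, not the actual `rpc.rs` handlers), the strict check rejects an instantaneous probe while the loosened one accepts it:

```rust
/// Strict variant: mirrors the current `end <= start` rejection.
fn validate_range_strict(start: i64, end: i64) -> Result<(), String> {
    if end <= start {
        return Err(format!("end_ts ({end}) must be greater than start_ts ({start})"));
    }
    Ok(())
}

/// Loosened variant: only rejects an inverted range, so
/// start == end (an instantaneous probe) passes.
fn validate_range_allow_equal(start: i64, end: i64) -> Result<(), String> {
    if end < start {
        return Err(format!("end_ts ({end}) must be >= start_ts ({start})"));
    }
    Ok(())
}
```

Whichever variant ships, the schema description and the error text should name the same comparison.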
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/eventkit/rpc.rs` around lines 35 - 39, The code currently rejects equal timestamps via the check "if end <= start" in src/openhuman/eventkit/rpc.rs; to allow instantaneous probes, change the condition to "if end < start" and update the error message to reflect the new non-strict constraint (e.g., "end_ts ({end}) must be >= start_ts ({start})"); also update the schema/doc in schemas.rs if it currently claims a strict-greater-than requirement so the API docs match the new behavior.

src/openhuman/eventkit/schemas.rs (1)
244-247: ⚠️ Potential issue | 🟡 Minor

Silent clamp masks the `rpc.rs` priority validation.

`rpc::handle_create_reminder` in `src/openhuman/eventkit/rpc.rs` explicitly rejects priorities >9 with `"create_reminder: 'priority' must be 0–9 (got {p})"`, but this adapter pre-clamps via `v.min(9)`, so `priority=42` silently becomes `9` and the rpc validation never fires. Callers get no indication their input was invalid, and the adapter and rpc layer disagree on the contract.

Either drop the clamp and let rpc reject, or validate explicitly in the adapter with the same error text.
🐛 Proposed fix — let rpc validation handle the range
```diff
- let priority = params
-     .get("priority")
-     .and_then(|v| v.as_u64())
-     .map(|v| v.min(9) as u8);
+ let priority = match params.get("priority") {
+     None | Some(Value::Null) => None,
+     Some(Value::Number(n)) => Some(
+         n.as_u64()
+             .ok_or("create_reminder: 'priority' must be an unsigned integer 0–9")?
+             as u8,
+     ),
+     Some(_) => return Err(
+         "create_reminder: 'priority' must be an unsigned integer 0–9".into(),
+     ),
+ };
```
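A minimal illustration of why the clamp masks the range check (free functions standing in for the adapter and rpc layers; not the shipped code):

```rust
/// Adapter-style parsing with the silent clamp: out-of-range input is
/// folded into the valid range and the caller never learns it was bad.
fn clamped_priority(v: u64) -> u8 {
    v.min(9) as u8
}

/// Rpc-style validation: out-of-range input is an explicit error.
fn validated_priority(v: u64) -> Result<u8, String> {
    if v > 9 {
        return Err(format!("create_reminder: 'priority' must be 0-9 (got {v})"));
    }
    Ok(v as u8)
}
```

With the clamp in front, `validated_priority` can never observe a value above 9, so its error branch is dead code.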
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/eventkit/schemas.rs` around lines 244 - 247, The adapter in src/openhuman/eventkit/schemas.rs silently clamps priority via .map(|v| v.min(9) as u8), masking rpc::handle_create_reminder's validation; remove the clamp so the code only parses the numeric value (e.g., keep .and_then(|v| v.as_u64()).map(|v| v as u8)) and let rpc::handle_create_reminder enforce the 0–9 range, or alternatively perform explicit validation in the adapter that returns the identical error string used by rpc::handle_create_reminder; update the priority parsing in the schemas.rs snippet (the code that defines priority) accordingly.

src/openhuman/life_capture/migrations.rs (1)
53-68: ⚠️ Potential issue | 🟡 Minor

Remove unused `run_async` or document its purpose.

`run_async` at line 53 is not called anywhere in the codebase. The actual pattern in `PersonalIndex::open()` and `PersonalIndex::open_in_memory()` uses `spawn_blocking` with the sync `run()` function, then wraps the connection in `Arc<tokio::sync::Mutex<Connection>>` at the index level — not within migrations.

If `run_async` exists for future extensibility or as part of a public API contract, add a comment explaining the use case. Otherwise, remove it to reduce maintenance surface.
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/migrations.rs` around lines 53 - 68, The function run_async is unused—either remove it or document its intended public API/extension use; if removing, delete the run_async function and any tests/docs referencing it so migrations.rs only exposes run, otherwise add a clear doc comment above run_async explaining why it exists (e.g., convenience async wrapper for blocking run used by external callers) and mark it #[allow(dead_code)] or pub(crate)/pub as appropriate, and ensure references in PersonalIndex::open and PersonalIndex::open_in_memory remain using the current spawn_blocking + run pattern to avoid changing behavior.

docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md (1)
13-342: ⚠️ Potential issue | 🟡 Minor

Duplicated "Workstream A" risks confusing the agentic executor.

The file now has the new Workstream A (Task A4 at line 19) and "Workstream A (original) — kept for history" (line 60), which also defines Tasks A1–A4 with the same IDs. Given the doc header instructs workers to execute with `superpowers:executing-plans`, a runner iterating on unchecked `- [ ]` items will hit both copies of A4 and re-try work already completed in the new section. Either move the history block behind a collapsible `<details>`, prefix its tasks with `A-old-*`, or drop it entirely and keep the context in a short paragraph.
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md` around lines 13 - 342, The document contains duplicated "Workstream A" sections (the new Workstream A with Task A4 and the "Workstream A (original) — kept for history" block) which will confuse automated executors; remove or demote the historical copy so only one actionable Workstream A/A4 remains. Fix by either (a) moving the entire "Workstream A (original) — kept for history" block into a non-actionable collapsible section (e.g., wrap in a <details> summary) or (b) renaming its task IDs to non-conflicting identifiers like A-old-*, or (c) delete the historical duplicate entirely and condense its context into a short paragraph; ensure the actionable Task A4 / "Task A4: Wire the resolver test into CI and drop the ubuntu matrix row" and the checklist steps remain unchanged and uniquely numbered.

src/openhuman/life_capture/embedder.rs (1)
21-25: ⚠️ Potential issue | 🟡 Minor

Silent fallback drops the configured timeouts.

`reqwest::Client::builder().…build().unwrap_or_else(|_| reqwest::Client::new())` discards the 30s request / 10s connect timeouts on any builder error and hands back a client with no timeouts, which can hang request threads indefinitely. Builder failures here are TLS/system-level — rare but fatal to diagnose once you hit them. Prefer bubbling the error (`Result<Self, _>`) or `expect` with a clear message; `reqwest::Client::new()` is not a safer fallback (it internally unwraps the same builder).
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/embedder.rs` around lines 21 - 25, The current use of reqwest::Client::builder().…build().unwrap_or_else(|_| reqwest::Client::new()) silently drops the configured timeouts (30s request / 10s connect) on builder errors; replace this by propagating the build error or failing with an explicit message so you don't return a client without timeouts. Update the code that initializes the local variable http (the reqwest::Client::builder().timeout(...).connect_timeout(...).build().unwrap_or_else(...)) to either return a Result from the enclosing constructor/function (e.g., Result<Self, reqwest::Error>) and use .build()? or .map_err(...) to bubble the error, or call .build().expect("failed to build reqwest client with configured timeouts: <context>") so failures are explicit; do not fallback to reqwest::Client::new() which loses the timeouts. Ensure the change is applied where http is created so configured timeouts are always honored or the error surfaces.

src/openhuman/life_capture/index.rs (1)
167-177: ⚠️ Potential issue | 🟡 Minor

Silent fallback to empty `source` string hides serialization bugs.

```rust
let source = serde_json::to_value(item.source)
    .ok()
    .and_then(|v| v.as_str().map(str::to_string))
    .unwrap_or_default();
```

If `Source` is ever extended in a way that doesn't serialize to a bare string (e.g. a variant with data), this collapses silently to `""`. The row is then written with `source = ""`, which won't match anything in `Query.sources` filtering in `apply_query_filters`, nor will anyone easily spot it in the DB. Hybrid search will return zero hits for an otherwise-correctly-ingested item.

Similarly, the unchecked `author_json` and `metadata_json` fallbacks (`unwrap_or_else(|_| "null".into())` / `"{}".into()`) will round-trip through `from_str` on read and mask the underlying error.

🛡️ Proposed fix

```diff
- let source = serde_json::to_value(item.source)
-     .ok()
-     .and_then(|v| v.as_str().map(str::to_string))
-     .unwrap_or_default();
- let author_json = item
-     .author
-     .as_ref()
-     .map(|a| serde_json::to_string(a).unwrap_or_else(|_| "null".into()));
- let metadata_json = serde_json::to_string(&item.metadata)
-     .unwrap_or_else(|_| "{}".into());
+ let source = serde_json::to_value(item.source)
+     .context("serialize item.source")?
+     .as_str()
+     .ok_or_else(|| anyhow::anyhow!("Source must serialize to a JSON string"))?
+     .to_string();
+ let author_json = item
+     .author
+     .as_ref()
+     .map(serde_json::to_string)
+     .transpose()
+     .context("serialize item.author")?;
+ let metadata_json = serde_json::to_string(&item.metadata)
+     .context("serialize item.metadata")?;
```

Given the `upsert` entry point already returns `anyhow::Result`, propagating is the natural shape.
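A std-only analogue of the failure mode (serde elided; the enum variants and tag strings are invented for illustration): a data-carrying variant has no bare-string tag, and the lookup should surface that as an error rather than default to an empty string:

```rust
enum Source {
    Gmail,
    Imessage,
    /// A hypothetical future variant that no longer maps to a bare string.
    Custom(String),
}

/// Returns the canonical tag, or an error for variants that cannot be
/// represented as a plain string, instead of silently yielding "".
fn source_tag(s: &Source) -> Result<&'static str, String> {
    match s {
        Source::Gmail => Ok("gmail"),
        Source::Imessage => Ok("imessage"),
        Source::Custom(_) => Err("Source must serialize to a JSON string".into()),
    }
}
```

The error fires at write time, where it is actionable, rather than surfacing later as mysteriously empty search results.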
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/life_capture/index.rs` around lines 167 - 177, The current code silently swallows serialization errors for item.source, author, and metadata (the owned.iter_mut block building source, author_json, metadata_json), which hides bugs and writes empty/placeholder values; update the logic to propagate serde_json serialization errors back to the caller (the upsert entry point which returns anyhow::Result) instead of using unwrap_or_default/unwrap_or_else — e.g., replace the silent fallbacks with ?-style propagation or map_err conversions so serde_json::to_value(item.source), serde_json::to_string(&item.metadata), and serde_json::to_string(a) return errors to the caller and abort the upsert on serialization failure.

src/openhuman/people/resolver.rs (1)
103-175: ⚠️ Potential issue | 🟡 Minor

Seeded `Person.display_name` is dropped when the contact also has an email/phone.

When a contact has both an email and a display name (the common case: `AddressBookContact { display_name: Some("Alice Smith"), emails: ["alice@..."], phones: [...] }`), the primary handle is `Handle::Email`, so `resolve_or_create` mints a `Person` with:

```rust
// from resolve_or_create, Handle::Email branch
(display_name, primary_email, primary_phone) = (None, Some(email), None)
```

The contact's display name is then only attached as an alias row in `handle_aliases` — it never makes it into `Person.display_name`. The UI (`handle_list` in `rpc.rs`) reads `p.display_name` directly, so address-book-seeded contacts render without a name until something else enriches them.

The existing test `seed_from_address_book_populates_store` doesn't catch this because it only asserts on `PersonId` equality, not on `display_name`.

♻️ Sketch of a fix

Either:

- After `resolve_or_create`, do a best-effort `store.enrich_person(pid, display_name, primary_phone)` that fills in any NULL columns on the existing row; or
- Introduce a `resolver.upsert_from_contact(&AddressBookContact)` path that builds the full `Person` skeleton up front instead of going through the single-handle `resolve_or_create`.

Option 1 is the smaller change and matches the "seed is idempotent, don't clobber" contract. Also worth extending `seed_from_address_book_populates_store` with:

```diff
+ let alice = s.get(alice_id.unwrap()).await.unwrap().unwrap();
+ assert_eq!(alice.display_name.as_deref(), Some("Alice Smith"));
```
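The "fill only NULL columns" contract from option 1 can be sketched like this (struct and function names hypothetical, standing in for the store row and `enrich_person`):

```rust
#[derive(Debug, PartialEq)]
struct Person {
    display_name: Option<String>,
    primary_phone: Option<String>,
}

/// Best-effort enrichment: fills fields that are still empty and never
/// clobbers data that an earlier seed or the user already set.
fn enrich_person(p: &mut Person, display_name: Option<&str>, phone: Option<&str>) {
    if p.display_name.is_none() {
        p.display_name = display_name.map(str::to_string);
    }
    if p.primary_phone.is_none() {
        p.primary_phone = phone.map(str::to_string);
    }
}
```

Running the seed twice is then harmless: the second pass sees non-NULL fields and leaves them alone, which is exactly the idempotency the seeding contract asks for.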
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/people/resolver.rs` around lines 103 - 175, seed_from_address_book currently uses resolve_or_create(&primary) which creates/returns a Person without the contact's display_name when primary is an email/phone, so the contact name only becomes a handle alias and never populates Person.display_name; after successfully resolving/creating the primary (the Ok(pid) branch in seed_from_address_book) call a best-effort enrichment on the stored person to set display_name and primary_phone if those columns are NULL (e.g., add a store.enrich_person(pid, display_name_opt, phone_opt) or similar), ensuring you pass the contact.display_name and the primary phone and only update NULL fields to preserve existing data; update the test seed_from_address_book_populates_store to assert Person.display_name is populated for a contact with email+display_name.

src/openhuman/people/address_book.rs (1)
126-177: ⚠️ Potential issue | 🟡 Minor

`request_access` can silently hang if invoked on the main thread.

The doc comment says "Must not be called from the main thread on macOS (CNContactStore will deadlock)", but nothing actually enforces this, and `SystemContactsSource::fetch_contacts` → `fetch_via_cn_contact_store` → `request_access` is reachable from any async task. If the tokio runtime happens to schedule this on the main thread (e.g. from an RPC handler in an app bundle started via `NSApplicationMain`), `rx.recv()` will block forever with no timeout. Consider either:

- Adding a `rx.recv_timeout(Duration::from_secs(30))` with a distinct `AddressBookError::Timeout`, or
- Wrapping the call path in `tokio::task::spawn_blocking` at the `handle_refresh_address_book` RPC boundary so it never runs on the async executor thread.

This would prevent a permanently stuck RPC if the invariant is accidentally violated.
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/people/address_book.rs` around lines 126 - 177, request_access can block forever if called on the main thread because rx.recv() waits indefinitely; update the code to avoid permanent hangs by either (A) replacing rx.recv() with rx.recv_timeout(Duration::from_secs(30)) and return a new AddressBookError::Timeout on timeout (update the AddressBookError enum accordingly), or (B) ensure callers never call request_access on an async executor thread by moving the blocking path into a background thread using tokio::task::spawn_blocking at the RPC boundary (e.g. in handle_refresh_address_book / SystemContactsSource::fetch_contacts -> fetch_via_cn_contact_store) so request_access stays on a blocking thread; reference request_access, rx.recv, AddressBookError, fetch_via_cn_contact_store, fetch_contacts, and handle_refresh_address_book when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9bf11610-907f-4555-9e51-47feb8aa8433
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (93)
.github/workflows/installer-smoke.yml
Cargo.toml
app/src-tauri/Info.plist
app/src-tauri/src/imessage_scanner/mod.rs
docs/superpowers/plans/2026-04-21-imessage-live-harness.md
docs/superpowers/plans/2026-04-22-life-capture-01-foundation.md
docs/superpowers/plans/2026-04-22-track1-ship-pipeline.md
docs/superpowers/specs/2026-04-22-life-capture-layer-design.md
scripts/fixtures/latest.json
scripts/install.sh
scripts/test_install.sh
src/core/all.rs
src/core/jsonrpc.rs
src/openhuman/agent/agents/archivist/prompt.rs
src/openhuman/agent/agents/code_executor/prompt.rs
src/openhuman/agent/agents/critic/prompt.rs
src/openhuman/agent/agents/integrations_agent/prompt.rs
src/openhuman/agent/agents/loader.rs
src/openhuman/agent/agents/morning_briefing/prompt.rs
src/openhuman/agent/agents/orchestrator/prompt.rs
src/openhuman/agent/agents/planner/prompt.rs
src/openhuman/agent/agents/researcher/prompt.rs
src/openhuman/agent/agents/summarizer/prompt.rs
src/openhuman/agent/agents/tool_maker/prompt.rs
src/openhuman/agent/agents/tools_agent/prompt.rs
src/openhuman/agent/agents/trigger_reactor/prompt.rs
src/openhuman/agent/agents/trigger_triage/prompt.rs
src/openhuman/agent/agents/welcome/prompt.rs
src/openhuman/agent/harness/fork_context.rs
src/openhuman/agent/harness/session/builder.rs
src/openhuman/agent/harness/session/runtime.rs
src/openhuman/agent/harness/session/turn.rs
src/openhuman/agent/harness/session/types.rs
src/openhuman/agent/harness/subagent_runner/ops.rs
src/openhuman/agent/prompts/mod.rs
src/openhuman/agent/prompts/tests.rs
src/openhuman/agent/prompts/types.rs
src/openhuman/agent/triage/escalation.rs
src/openhuman/agent/triage/evaluator.rs
src/openhuman/channels/providers/telegram/channel.rs
src/openhuman/channels/providers/telegram/channel_tests.rs
src/openhuman/context/debug_dump.rs
src/openhuman/curated_memory/mod.rs
src/openhuman/curated_memory/rpc.rs
src/openhuman/curated_memory/runtime.rs
src/openhuman/curated_memory/schemas.rs
src/openhuman/curated_memory/store.rs
src/openhuman/curated_memory/types.rs
src/openhuman/eventkit/calendar.rs
src/openhuman/eventkit/migrations/0001_init.sql
src/openhuman/eventkit/mod.rs
src/openhuman/eventkit/reminders.rs
src/openhuman/eventkit/rpc.rs
src/openhuman/eventkit/runtime.rs
src/openhuman/eventkit/schemas.rs
src/openhuman/eventkit/store.rs
src/openhuman/eventkit/types.rs
src/openhuman/learning/prompt_sections.rs
src/openhuman/life_capture/embedder.rs
src/openhuman/life_capture/index.rs
src/openhuman/life_capture/ingest/composio.rs
src/openhuman/life_capture/ingest/mod.rs
src/openhuman/life_capture/migrations.rs
src/openhuman/life_capture/migrations/0001_init.sql
src/openhuman/life_capture/migrations/0002_vec.sql
src/openhuman/life_capture/mod.rs
src/openhuman/life_capture/quote_strip.rs
src/openhuman/life_capture/redact.rs
src/openhuman/life_capture/rpc.rs
src/openhuman/life_capture/runtime.rs
src/openhuman/life_capture/schemas.rs
src/openhuman/life_capture/tests/e2e.rs
src/openhuman/life_capture/tests/mod.rs
src/openhuman/life_capture/types.rs
src/openhuman/mod.rs
src/openhuman/people/address_book.rs
src/openhuman/people/migrations.rs
src/openhuman/people/migrations/0001_init.sql
src/openhuman/people/mod.rs
src/openhuman/people/resolver.rs
src/openhuman/people/rpc.rs
src/openhuman/people/schemas.rs
src/openhuman/people/scorer.rs
src/openhuman/people/store.rs
src/openhuman/people/tests.rs
src/openhuman/people/types.rs
src/openhuman/providers/compatible.rs
src/openhuman/providers/compatible_tests.rs
tests/agent_harness_public.rs
tests/composio_ingest_e2e.rs
tests/fixtures/life_capture/corpus.json
tests/imessage_ingest_e2e.rs
tests/life_capture_retrieval_eval.rs
```rust
// --- Curated-memory + life-capture runtime bootstrap ------------------
//
// These MUST initialise before `axum::serve` starts accepting connections
// so the very first RPC can't race the OnceCell — earlier we set them
// inside the tokio::spawn block below and the server began accepting
// before they were ready, surfacing as spurious 'not initialised' errors
// for the first few requests.
//
// Loading the config a second time here (the spawn block also calls
// `load_or_init`) is fine — it caches and is idempotent.
if let Ok(config) = crate::openhuman::config::Config::load_or_init().await {
    bootstrap_curated_memory(&config.workspace_dir).await;
    bootstrap_life_capture(&config.workspace_dir).await;
```
🛠️ Refactor suggestion | 🟠 Major
Move domain runtime bootstrap out of the JSON-RPC transport.
jsonrpc.rs now hard-codes curated-memory and life-capture initialization details, including storage paths, env-driven embedder setup, and bridge registration. Please move this behind a domain/runtime bootstrap API or registry-level startup hook so the transport stays domain-agnostic.
As per coding guidelines, src/core/{cli.rs,jsonrpc.rs} should expose domain functionality through the controller registry and “do not add domain-specific branches or one-off transport logic in src/core/cli.rs or src/core/jsonrpc.rs”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/jsonrpc.rs` around lines 658 - 670, The JSON-RPC transport currently
inits domain runtime directly by calling
crate::openhuman::config::Config::load_or_init().await then
bootstrap_curated_memory(&config.workspace_dir).await and
bootstrap_life_capture(&config.workspace_dir).await; move that logic out of
jsonrpc.rs and behind a domain/runtime bootstrap API or registry startup hook.
Create or extend the controller/registry interface (e.g., add a
run_startup_hooks or register_runtime_bootstrap method) and have the
curated-memory and life-capture initialization (including env-driven embedder
setup and bridge registration) implemented as a domain module that registers
itself with that API; then replace the direct calls in jsonrpc.rs with a single
call to the registry startup hook (or no-op if none registered), keeping
jsonrpc.rs domain-agnostic and relying on controller/registry to perform
load_or_init and to invoke bootstrap_curated_memory and bootstrap_life_capture.
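The registry-level hook the comment asks for could take roughly this shape (all names hypothetical; the real version would be async and live beside the controller registry, with domains registering their own bootstrap):

```rust
/// A startup hook registered by a domain module at wiring time.
type StartupHook = Box<dyn Fn() + Send + Sync>;

/// Sketch of a registry that owns domain startup, so the transport
/// only calls `run_startup_hooks()` and stays domain-agnostic.
#[derive(Default)]
struct Registry {
    startup_hooks: Vec<StartupHook>,
}

impl Registry {
    /// Called from each domain's wiring code (e.g. curated_memory,
    /// life_capture) when it registers its controllers.
    fn register_startup_hook(&mut self, hook: StartupHook) {
        self.startup_hooks.push(hook);
    }

    /// Called once by the transport before it starts accepting
    /// connections; a no-op when nothing is registered.
    fn run_startup_hooks(&self) {
        for hook in &self.startup_hooks {
            hook();
        }
    }
}
```

The ordering guarantee from the original comment is preserved: the transport runs the hooks before `axum::serve`, it just no longer knows what they do.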
```rust
let writer = IndexWriter::new(idx);
writer
    .upsert(&mut items)
    .await
    .map_err(|e| format!("upsert: {e}"))?;
let canonical_id = items[0].id;
let replaced = canonical_id != requested_id;

let mut vecs = embedder
    .embed_batch(&[text.as_str()])
    .await
    .map_err(|e| format!("embed: {e}"))?;
let vector = vecs.pop().ok_or("embedder returned no vectors")?;
writer
    .upsert_vector(&canonical_id, &vector)
    .await
    .map_err(|e| format!("upsert_vector: {e}"))?;
```
Make item upsert and vector replacement failure-atomic.
The item is committed before embedding/vector replacement. If embed_batch or upsert_vector fails, the index can contain a new/updated item with no vector, or updated text with the old vector. Pre-embed and validate dimensions before mutating, then wrap item upsert + vector replace in one IndexWriter transaction/helper.
Minimum mitigation before adding a transaction helper
+ const INDEX_VECTOR_DIM: usize = 1536;
+ if embedder.dim() != INDEX_VECTOR_DIM {
+ return Err(format!(
+ "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
+ embedder.dim()
+ ));
+ }
+
+ let mut vecs = embedder
+ .embed_batch(&[text.as_str()])
+ .await
+ .map_err(|e| format!("embed: {e}"))?;
+ let vector = vecs.pop().ok_or("embedder returned no vectors")?;
+ if vector.len() != INDEX_VECTOR_DIM {
+ return Err(format!(
+ "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
+ vector.len()
+ ));
+ }
+
let writer = IndexWriter::new(idx);
writer
.upsert(&mut items)
.await
.map_err(|e| format!("upsert: {e}"))?;
let canonical_id = items[0].id;
let replaced = canonical_id != requested_id;
- let mut vecs = embedder
- .embed_batch(&[text.as_str()])
- .await
- .map_err(|e| format!("embed: {e}"))?;
- let vector = vecs.pop().ok_or("embedder returned no vectors")?;
writer
.upsert_vector(&canonical_id, &vector)
.await

This still needs a transaction-backed writer API to fully guarantee atomic replacement.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
const INDEX_VECTOR_DIM: usize = 1536;
if embedder.dim() != INDEX_VECTOR_DIM {
    return Err(format!(
        "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
        embedder.dim()
    ));
}

let mut vecs = embedder
    .embed_batch(&[text.as_str()])
    .await
    .map_err(|e| format!("embed: {e}"))?;
let vector = vecs.pop().ok_or("embedder returned no vectors")?;
if vector.len() != INDEX_VECTOR_DIM {
    return Err(format!(
        "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
        vector.len()
    ));
}

let writer = IndexWriter::new(idx);
writer
    .upsert(&mut items)
    .await
    .map_err(|e| format!("upsert: {e}"))?;
let canonical_id = items[0].id;
let replaced = canonical_id != requested_id;
writer
    .upsert_vector(&canonical_id, &vector)
    .await
    .map_err(|e| format!("upsert_vector: {e}"))?;
```
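The embed-first ordering in the suggestion can be sketched against a toy in-memory index (types invented; the real `IndexWriter` is SQLite-backed):

```rust
use std::collections::HashMap;

#[derive(Default)]
struct ToyIndex {
    items: HashMap<String, String>,     // item id -> text
    vectors: HashMap<String, Vec<f32>>, // item id -> embedding
}

/// Embed before any mutation: if the embedder fails, the index is left
/// exactly as it was, with no item row missing its vector and no stale
/// vector paired with updated text.
fn ingest(
    idx: &mut ToyIndex,
    id: &str,
    text: &str,
    embed: impl Fn(&str) -> Result<Vec<f32>, String>,
) -> Result<(), String> {
    let vector = embed(text)?; // no DB side effects yet
    idx.items.insert(id.to_string(), text.to_string());
    idx.vectors.insert(id.to_string(), vector);
    Ok(())
}
```

This ordering makes the common failure (a flaky embedder) harmless; only a crash between the two inserts still needs the transaction-backed writer API.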
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/life_capture/rpc.rs` around lines 154 - 170, You currently
upsert the item (IndexWriter::upsert) before calling embedder.embed_batch and
upsert_vector, which can leave the index inconsistent if embedding/vector
replacement fails; move embedder.embed_batch(&[text.as_str()]) before any
mutation, validate the returned vector dimensions against the index schema, then
perform the item upsert and vector write inside a single atomic operation on
IndexWriter (add or use a transaction/helper such as IndexWriter::transaction or
a new method like IndexWriter::upsert_with_vector that takes (&mut items,
&canonical_id, &vector) and guarantees commit/rollback semantics); ensure you
still derive canonical_id/requested_id as before but only call
writer.upsert/_upsert_vector from inside that transactional helper so failure
does not leave a half-applied item or stale vector.
…bview NotificationSettings
- PersonId defined an inherent to_string which shadowed the ToString blanket impl from Display (clippy correctness error, deny-by-default)
- webview_notifications::NotificationSettings Default returned Self { enabled: false }; replaced manual impl with derive
# Conflicts: # src/openhuman/mod.rs
- Drop event title from the 'missing external identifier' log message; event titles are personal data that must not appear in logs.
- Propagate store::is_known() and store::upsert_event() errors instead of swallowing them. Failing open caused the same event to be re-emitted on every sync because the cache was never updated.

Addresses @coderabbitai on src/openhuman/eventkit/calendar.rs:156 and :217
Gmail message IDs and calendar event IDs are stable external identifiers that create a durable PII trail in logs. Remove them from debug/warn log fields; the source tag retains enough context for correlation. Addresses @coderabbitai on src/openhuman/life_capture/ingest/composio.rs:112
…ardcode HostedEmbedder::dim() was hardcoded to 1536 regardless of which model was configured. Add dim as a constructor parameter so model and vector width stay in sync. At the bootstrap site, resolve dim from OPENHUMAN_EMBEDDINGS_DIM (override), known model names (text-embedding-3-large→3072), or default 1536. Addresses @coderabbitai on src/openhuman/life_capture/embedder.rs:32
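The resolution order described in that commit (env override → known model names → default) might look roughly like this; names are illustrative, not the shipped bootstrap code:

```rust
/// Resolve the embedding width: an explicit OPENHUMAN_EMBEDDINGS_DIM
/// override wins, then known model names, then the 1536 default.
fn resolve_embedding_dim(env_override: Option<&str>, model: &str) -> usize {
    if let Some(dim) = env_override.and_then(|s| s.parse::<usize>().ok()) {
        return dim;
    }
    match model {
        "text-embedding-3-large" => 3072,
        _ => 1536,
    }
}
```

Passing the resolved value into the embedder's constructor keeps `dim()` and the configured model in sync by construction.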
…ure state Previously, the item row was committed before the embedding call. A failed embed_batch would leave a committed row with no vector, making it unfindable by search. Reorder to: embed first (no DB side effects) → upsert item row → upsert vector, so a failed embed aborts without touching the index. Addresses @coderabbitai on src/openhuman/life_capture/rpc.rs:170
The handler accepts any JSON value for metadata but the schema declared it as Option<String>, causing schema-driven clients to reject object payloads. Change to Option<Json> to match the actual contract. Addresses @coderabbitai on src/openhuman/life_capture/schemas.rs:192
The created flag in the RPC response was always true when create_if_missing=true because resolve_or_create returned a bare PersonId with no signal about whether a new row was minted or an existing one was found. Thread the insertion boolean out of the resolver so handle_resolve can report the true creation state. Update resolver tests to assert the was_created flag, and update link() and seed_from_address_book() callers to destructure the tuple. Addresses @coderabbitai on src/openhuman/people/rpc.rs:80
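The shape of that change, returning the insertion flag from get-or-create, in a std-only sketch (a map-backed stand-in for the store; names hypothetical):

```rust
use std::collections::HashMap;

/// Get-or-create that also reports whether a new row was minted, so the
/// RPC layer can surface an honest `created` flag.
fn resolve_or_create(
    store: &mut HashMap<String, u64>,
    handle: &str,
    next_id: &mut u64,
) -> (u64, bool) {
    if let Some(&id) = store.get(handle) {
        return (id, false); // existing person, not created
    }
    let id = *next_id;
    *next_id += 1;
    store.insert(handle.to_string(), id);
    (id, true) // freshly minted
}
```

Callers that only need the id destructure and drop the flag, which is the small churn the commit describes for `link()` and `seed_from_address_book()`.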
Replace manual BEGIN/COMMIT/ROLLBACK with Connection::unchecked_transaction() which auto-rolls back on drop, matching the rusqlite idiomatic pattern and the life_capture migration style. Add entry/per-migration/completion debug log lines as required by coding guidelines (debug logging on new/changed flows). Addresses @coderabbitai on src/openhuman/people/migrations.rs:46
…ath error Both tests mutated OPENHUMAN_WORKSPACE while sharing the CONN OnceLock, causing a race where the second test observed the first test's DB and never exercised its own branch. - Add a test-local Mutex to serialize all tests in this module. - Rewrite get_error_on_bad_path to test create_dir_all directly (rather than get()), because get() fast-paths via OnceLock once initialized by get_is_idempotent. The new version asserts is_err() rather than silently discarding the result. Addresses @coderabbitai on src/openhuman/eventkit/runtime.rs:137
…composio e2e Silently ignoring init_index/init_embedder errors allowed the test to write through a local Arc while JSON-RPC search read a pre-existing global runtime, turning failures into order-dependent timeouts. Propagate errors with .expect() so the test fails immediately with a clear message. Addresses @coderabbitai on tests/composio_ingest_e2e.rs:72
…red_controllers re-exports Domain mod.rs must expose the canonical registry names in addition to the domain-prefixed aliases so the controller registry wiring in all.rs can call them uniformly across domains. Aliases are kept for backward compatibility. Addresses @coderabbitai on src/openhuman/curated_memory/mod.rs:10
…untime Both tests reuse the process-global PersonalIndex via OnceCell, so when tokio runs them in parallel they race on `total_items`. Add a static tokio::sync::Mutex they both acquire for the duration of the test, and switch the absolute count assertion in the idempotency test to a delta against a baseline captured under the lock. Same pattern already used elsewhere (eventkit env-var lock, composio fail-fast guard).
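The serialization pattern from those two commits in std form (the real tests use `tokio::sync::Mutex`; this sketch uses `std::sync` to show the shape):

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

static TEST_LOCK: OnceLock<Mutex<()>> = OnceLock::new();

/// Acquire the module-wide lock for the duration of a test so tests that
/// share process-global state (env vars, a OnceCell-held index) cannot
/// race each other under a parallel test runner.
fn test_guard() -> MutexGuard<'static, ()> {
    TEST_LOCK
        .get_or_init(|| Mutex::new(()))
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Under the guard, a count assertion compares a delta against a baseline captured at the start (`after - before == inserted`) rather than an absolute total, so a sibling test's leftovers cannot fail it.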
Actionable comments posted: 8
🧹 Nitpick comments (3)
src/openhuman/agent/debug/mod.rs (1)
395-395: Preserve prompt-dump parity by threading a real curated snapshot

Hardcoding `curated_snapshot: None` can make dump output diverge from live first-turn prompt bytes when curated memory is active. Given this module's byte-identical contract, use the same best-effort snapshot strategy here and pass `as_deref()` into `PromptContext`.

♻️ Proposed adjustment
```diff
+ let curated_snapshot = match crate::openhuman::curated_memory::runtime::get() {
+     Ok(rt) => match crate::openhuman::curated_memory::snapshot_pair(&rt.memory, &rt.user).await
+     {
+         Ok(snap) => Some(std::sync::Arc::new(snap)),
+         Err(_) => None,
+     },
+     Err(_) => None,
+ };
+
  let ctx = PromptContext {
      workspace_dir: agent.workspace_dir(),
      model_name: &model_name,
@@
      connected_identities_md: crate::openhuman::agent::prompts::render_connected_identities(),
      include_profile: !definition.omit_profile,
      include_memory_md: !definition.omit_memory_md,
-     curated_snapshot: None,
+     curated_snapshot: curated_snapshot.as_deref(),
  };
```
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/debug/mod.rs` at line 395, The code currently hardcodes curated_snapshot: None which breaks byte-identical dumps when curated memory is enabled; replace that None by producing and threading the same best-effort curated snapshot used elsewhere and pass its Option<&[u8]> into PromptContext via curated_snapshot.as_deref(). Locate where curated_snapshot is set and where PromptContext is constructed, compute the snapshot using the existing snapshot strategy (same helper/logic used in the live path), store it in curated_snapshot (Option<Vec<u8>>), and call curated_snapshot.as_deref() when creating the PromptContext.

src/openhuman/people/tests.rs (1)
60-68: Drop the unused `Utc::now()` call.
`let _now = Utc::now();` is unrelated to UUID round-tripping and adds nothing to the test. Either drop it or move it into a separate test that actually exercises timestamp-related behavior.
♻️ Proposed cleanup
 #[test]
 fn person_id_uuid_format() {
     let id = PersonId::new();
     // Round-trips through a string.
     let s = id.to_string();
     let parsed: uuid::Uuid = s.parse().unwrap();
     assert_eq!(parsed, id.0);
-    let _now = Utc::now();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/people/tests.rs` around lines 60 - 68, The test function person_id_uuid_format contains an unused call to Utc::now() that should be removed; update the test by deleting the line let _now = Utc::now(); (or move it into a new timestamp-focused test if you need to exercise time behavior) so the test only verifies PersonId::new(), id.to_string(), and parsing back to uuid::Uuid (referencing PersonId and the person_id_uuid_format test).
src/openhuman/people/rpc.rs (1)
16-19: Add entry/exit logging to the RPC handlers.
Per coding guidelines, Rust code should add `tracing`/`log` at `debug`/`trace` for entry/exit points, branch decisions, and external calls with grep-friendly `[domain]` prefixes. `handle_refresh_address_book` already does this; `handle_list`, `handle_resolve`, and `handle_score` are silent. They each issue DB calls (`store.list`, `interactions_for`, `lookup`, `insert_person`) which are useful to trace by `person_id` / handle kind when diagnosing production issues.
♻️ Suggested logging
 pub async fn handle_list(store: &PeopleStore, limit: usize) -> Result<RpcOutcome<Value>, String> {
     let limit = limit.clamp(1, 500);
+    tracing::debug!("[people::rpc] handle_list entry limit={limit}");
     let people = store.list().await.map_err(|e| format!("list: {e}"))?;
     let now = Utc::now();
@@
+    tracing::debug!(
+        "[people::rpc] handle_list returning {} people",
+        people_json.len()
+    );
     Ok(RpcOutcome::new(json!({ "people": people_json }), vec![]))
 }
@@
 ) -> Result<RpcOutcome<Value>, String> {
+    tracing::debug!(
+        "[people::rpc] handle_resolve entry kind={} create_if_missing={create_if_missing}",
+        handle.as_key().0
+    );
     let resolver = HandleResolver::new(store);
@@
 ) -> Result<RpcOutcome<Value>, String> {
+    tracing::debug!("[people::rpc] handle_score entry person_id={person_id}");
     let interactions = store
As per coding guidelines: "Use `log`/`tracing` at `debug`/`trace` level in Rust; include stable grep-friendly prefixes (`[domain]`, `[rpc]`, `[ui-flow]`) and correlation fields (request IDs, method names, entity IDs)".
Also applies to: 62-67, 121-124
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/people/rpc.rs` around lines 16 - 19, Add entry/exit and key-event tracing to the RPC handlers handle_list, handle_resolve, and handle_score similar to handle_refresh_address_book: emit debug/trace logs with grep-friendly prefixes like "[rpc]" or "[people]" at function entry and exit, log branch decisions and external calls (store.list, interactions_for, lookup, insert_person) including correlation fields (request id or method name) and entity identifiers (person_id, handle/kind) so the DB/lookup calls and outcomes are visible for diagnostics; follow existing tracing style used in handle_refresh_address_book for level and message format.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/jsonrpc.rs`:
- Around line 865-868: The doc comment "Initializes long-lived socket/event-bus
infrastructure." is now incorrectly attached to bootstrap_curated_memory; move
that leading doc comment (and any subsequent related lines) from above async fn
bootstrap_curated_memory to directly above the async fn bootstrap_skill_runtime
so bootstrap_skill_runtime regains its intended documentation and
bootstrap_curated_memory has only its correct docs or none.
In `@src/openhuman/agent/harness/session/types.rs`:
- Around line 126-134: The doc comment for the Session-scoped `curated_snapshot`
is out of sync with implementation: it states the snapshot is captured "when the
agent is built" but the code actually initializes `curated_snapshot` to None in
the builder and captures it on first turn via `ensure_curated_snapshot`; update
the rustdoc to describe this lifecycle (None in builder, captured lazily on
first turn by `ensure_curated_snapshot`, forwarded into
`PromptContext::curated_snapshot` each turn so runtime writes apply to
subsequent sessions) and mention that it is None when the curated-memory runtime
(`crate::openhuman::curated_memory::runtime`) isn't initialized.
In `@src/openhuman/life_capture/ingest/composio.rs`:
- Around line 415-422: The helper fresh_bridge() currently calls
std::mem::forget(dir) which leaks the TempDir; change fresh_bridge() to return
ownership of the TempDir (e.g., return (LifeCaptureComposioBridge,
tempfile::TempDir)) or hold the TempDir inside the LifeCaptureComposioBridge
wrapper so it lives until tests drop it; update all test call sites that do let
bridge = fresh_bridge().await; to destructure the tuple (let (bridge, _dir) =
fresh_bridge().await;) or accept the wrapper that owns the TempDir so the
TempDir is dropped at test end, and ensure PersonalIndex::open(&db).await and
the Arc<dyn Embedder> (TestEmbedder) usage remain unchanged.
- Around line 601-620: The test bridge_has_stable_name_and_domain currently
contains a tautological assertion (assert_eq!(name, name)) and never calls
LifeCaptureComposioBridge::name() or domains(), nor uses the Stub Embedder;
replace the bogus assertion by constructing a real bridge (use the existing
fresh_bridge helper or the constructor that returns a PersonalIndex handle) and
call LifeCaptureComposioBridge::name() and LifeCaptureComposioBridge::domains()
to assert expected values, or if you prefer to keep it lightweight convert the
test to #[tokio::test], call fresh_bridge(...) to get a bridge instance, then
assert_eq!(bridge.name(), "life_capture::composio_bridge") and
assert!(bridge.domains().contains(...)) and remove the unused Stub/Embedder impl
if you choose the fresh_bridge approach.
In `@src/openhuman/life_capture/rpc.rs`:
- Around line 76-94: The code hardcodes INDEX_VECTOR_DIM = 1536 in handle_search
causing mismatches with configurable embedders; expose the real index vector
dimension from the PersonalIndex (or lift INDEX_VECTOR_DIM into a single shared
constant used by bootstrap_life_capture, PersonalIndex, handle_search and
handle_ingest), then update handle_search to compare embedder.dim() against that
shared/exposed dimension and validate query_vec.len() the same way; also add the
same dimension checks to handle_ingest before calling upsert_vector (validate
embedder.dim() and the produced vector length and return a clear RPC error on
mismatch) and ensure bootstrap_life_capture uses the same shared dimension when
constructing/configuring the embedder.
In `@src/openhuman/people/address_book.rs`:
- Around line 129-169: The CNContactStore FFI calls (e.g., request_access,
enumerateContactsWithFetchRequest_error_usingBlock inside
imp::fetch_via_cn_contact_store) block a Tokio worker; wrap these synchronous
calls in tokio::task::spawn_blocking. Either (A) move the spawn_blocking into
the SystemContactsSource implementation so ContactsSource::fetch_contacts calls
imp::fetch_via_cn_contact_store inside spawn_blocking and returns the awaited
result, or (B) keep fetch_contacts synchronous but change callers (e.g.,
HandleResolver::seed_from_address_book / read_with) to call
tokio::task::spawn_blocking(|| source.fetch_contacts()) and propagate errors
(map join errors to AddressBookError::Other). Ensure request_access remains
inside the blocking closure so rx.recv() never runs on a Tokio worker thread.
In `@src/openhuman/people/resolver.rs`:
- Around line 41-78: resolve_or_create has a TOCTOU: it calls
PeopleStore::lookup then creates a new PersonId and calls
PeopleStore::insert_person, which can race and produce orphan Person rows when
two callers concurrently resolve the same Handle; fix by moving the
read-or-create into an atomic transactional operation inside the store
(preferred): add a new PeopleStore method like lookup_or_insert_person(handle,
person_factory) that within a single transaction SELECTs from handle_aliases
and, if missing, INSERTs into people and INSERTs the handle_aliases (or returns
the existing person_id), then update resolve_or_create to call
lookup_or_insert_person instead of separate lookup + insert_person;
alternatively (if you must keep insert_person), after insert_person re-query
lookup(canonical) and if the returned PersonId != the one you inserted, delete
the orphan people row and return the winner's id.
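The atomic read-or-create shape can be illustrated with an in-memory map standing in for the SQLite transaction; every name here (`PeopleStore`, `lookup_or_insert_person`) is a sketch under that assumption, not the real store API:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical in-memory stand-in: the single lock plays the role of the
// SQLite transaction, so two concurrent resolvers cannot both miss the
// SELECT and both INSERT a fresh person row.
struct PeopleStore {
    // (canonical handle -> person_id, next_id)
    aliases: Mutex<(HashMap<String, u64>, u64)>,
}

impl PeopleStore {
    fn new() -> Self {
        Self { aliases: Mutex::new((HashMap::new(), 1)) }
    }

    // Read-or-create as ONE critical section: lookup and insert happen under
    // the same lock, eliminating the TOCTOU window between them.
    fn lookup_or_insert_person(&self, canonical: &str) -> u64 {
        let mut guard = self.aliases.lock().unwrap();
        let (map, next_id) = &mut *guard;
        if let Some(&id) = map.get(canonical) {
            return id; // existing winner, no orphan created
        }
        let fresh = *next_id;
        *next_id += 1;
        map.insert(canonical.to_string(), fresh);
        fresh
    }
}

fn main() {
    let store = PeopleStore::new();
    let a = store.lookup_or_insert_person("mailto:a@example.com");
    let b = store.lookup_or_insert_person("mailto:a@example.com");
    assert_eq!(a, b); // second resolve returns the same person
    println!("ok");
}
```

In the real store the critical section would be a SQLite transaction (`BEGIN IMMEDIATE` … `COMMIT`) rather than a mutex, but the invariant is the same: the lookup and the insert must not be separately observable.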
In `@src/openhuman/people/rpc.rs`:
- Around line 16-58: handle_list currently issues an
interactions_for(p.id).await inside a sequential loop which causes an N+1 query
pattern and serial spawn_blocking calls; fix by either (A) adding an aggregate
method on PeopleStore (suggest names: list_with_scores(limit) or
interaction_summaries()) that runs a single SQL query to return interaction
aggregates needed by score() and then compute score() in-memory for each person
before sorting and truncating by limit, or (B) minimally parallelizing the
per-person fetches by collecting futures for store.interactions_for(p.id) and
awaiting them with futures::future::join_all so the DB calls run concurrently,
and ensure you apply limit before expensive work by sorting/truncating after
computing scores. Use the function name handle_list and the method
interactions_for as anchors when making the change.
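Option (B) can be sketched std-only with scoped threads standing in for `join_all`; `fetch_interaction_count` is a hypothetical stand-in for `store.interactions_for`, and the tuple's second element stands in for `score()`:

```rust
// Hypothetical stand-in for store.interactions_for(person_id).
fn fetch_interaction_count(person_id: u64) -> u64 {
    person_id * 2 // pretend the DB returns twice the id
}

// Fetch all per-person counts concurrently, then score, sort, and truncate —
// the limit is applied only after scoring, as the comment above requires.
fn list_with_scores(people: &[u64], limit: usize) -> Vec<(u64, u64)> {
    let mut scored: Vec<(u64, u64)> = std::thread::scope(|s| {
        let handles: Vec<_> = people
            .iter()
            .map(|&id| s.spawn(move || (id, fetch_interaction_count(id))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    scored.sort_by(|a, b| b.1.cmp(&a.1)); // highest score first
    scored.truncate(limit);
    scored
}

fn main() {
    let top = list_with_scores(&[1, 4, 2, 3], 2);
    println!("{top:?}"); // prints [(4, 8), (3, 6)]
}
```

Option (A), a single aggregate SQL query, is still preferable when the store can express it, since it avoids spawning one task per person entirely.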
---
Nitpick comments:
In `@src/openhuman/agent/debug/mod.rs`:
- Line 395: The code currently hardcodes curated_snapshot: None which breaks
byte-identical dumps when curated memory is enabled; replace that None by
producing and threading the same best-effort curated snapshot used elsewhere and
pass its Option<&[u8]> into PromptContext via curated_snapshot.as_deref().
Locate where curated_snapshot is set and where PromptContext is constructed,
compute the snapshot using the existing snapshot strategy (same helper/logic
used in the live path), store it in curated_snapshot (Option<Vec<u8>>), and call
curated_snapshot.as_deref() when creating the PromptContext.
In `@src/openhuman/people/rpc.rs`:
- Around line 16-19: Add entry/exit and key-event tracing to the RPC handlers
handle_list, handle_resolve, and handle_score similar to
handle_refresh_address_book: emit debug/trace logs with grep-friendly prefixes
like "[rpc]" or "[people]" at function entry and exit, log branch decisions and
external calls (store.list, interactions_for, lookup, insert_person) including
correlation fields (request id or method name) and entity identifiers
(person_id, handle/kind) so the DB/lookup calls and outcomes are visible for
diagnostics; follow existing tracing style used in handle_refresh_address_book
for level and message format.
In `@src/openhuman/people/tests.rs`:
- Around line 60-68: The test function person_id_uuid_format contains an unused
call to Utc::now() that should be removed; update the test by deleting the line
let _now = Utc::now(); (or move it into a new timestamp-focused test if you need
to exercise time behavior) so the test only verifies PersonId::new(),
id.to_string(), and parsing back to uuid::Uuid (referencing PersonId and the
person_id_uuid_format test).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 53db1672-b237-409a-922b-375440e06437
📒 Files selected for processing (29)
- src/core/all.rs
- src/core/jsonrpc.rs
- src/openhuman/agent/debug/mod.rs
- src/openhuman/agent/harness/session/builder.rs
- src/openhuman/agent/harness/session/runtime.rs
- src/openhuman/agent/harness/session/turn.rs
- src/openhuman/agent/harness/session/types.rs
- src/openhuman/agent/harness/subagent_runner/ops.rs
- src/openhuman/curated_memory/mod.rs
- src/openhuman/eventkit/calendar.rs
- src/openhuman/eventkit/runtime.rs
- src/openhuman/learning/prompt_sections.rs
- src/openhuman/life_capture/embedder.rs
- src/openhuman/life_capture/ingest/composio.rs
- src/openhuman/life_capture/rpc.rs
- src/openhuman/life_capture/schemas.rs
- src/openhuman/mod.rs
- src/openhuman/people/address_book.rs
- src/openhuman/people/migrations.rs
- src/openhuman/people/resolver.rs
- src/openhuman/people/rpc.rs
- src/openhuman/people/schemas.rs
- src/openhuman/people/scorer.rs
- src/openhuman/people/tests.rs
- src/openhuman/people/types.rs
- src/openhuman/webview_notifications/types.rs
- tests/agent_harness_public.rs
- tests/composio_ingest_e2e.rs
- tests/imessage_ingest_e2e.rs
✅ Files skipped from review due to trivial changes (2)
- src/openhuman/learning/prompt_sections.rs
- src/openhuman/curated_memory/mod.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- tests/agent_harness_public.rs
- src/openhuman/agent/harness/session/runtime.rs
- src/openhuman/mod.rs
- src/core/all.rs
- src/openhuman/eventkit/calendar.rs
- src/openhuman/eventkit/runtime.rs
/// Initializes long-lived socket/event-bus infrastructure.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
Stale doc comment now attached to bootstrap_curated_memory.
Line 865's /// Initializes long-lived socket/event-bus infrastructure. was originally the rustdoc for bootstrap_skill_runtime, but inserting the two new helpers above it leaves this line as a leading doc comment on bootstrap_curated_memory (the next item Rust sees), so the rendered docs/intellisense for bootstrap_curated_memory will start with an unrelated sentence about socket/event-bus, and bootstrap_skill_runtime is left undocumented.
📝 Proposed fix — move the doc comment back to `bootstrap_skill_runtime`
-/// Initializes long-lived socket/event-bus infrastructure.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
…and add it just above bootstrap_skill_runtime (around line 984):
+/// Initializes long-lived socket/event-bus infrastructure.
+pub async fn bootstrap_skill_runtime() {
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// Open MEMORY.md / USER.md at `<workspace>/memories/` and register the
/// curated-memory runtime singleton. Idempotent — second-init is a no-op.
async fn bootstrap_curated_memory(workspace_dir: &std::path::Path) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/jsonrpc.rs` around lines 865 - 868, The doc comment "Initializes
long-lived socket/event-bus infrastructure." is now incorrectly attached to
bootstrap_curated_memory; move that leading doc comment (and any subsequent
related lines) from above async fn bootstrap_curated_memory to directly above
the async fn bootstrap_skill_runtime so bootstrap_skill_runtime regains its
intended documentation and bootstrap_curated_memory has only its correct docs or
none.
/// Session-scoped snapshot of the curated-memory pair
/// (`MEMORY.md` + `USER.md`) read once from the
/// [`crate::openhuman::curated_memory::runtime`] singleton when the
/// agent is built. Forwarded into [`PromptContext::curated_snapshot`]
/// every turn so the rendered prompt bytes stay frozen for the
/// session's KV-cache prefix, while runtime writes via
/// `curated_memory.add/replace/remove` land on the NEXT session.
/// `None` when the curated-memory runtime isn't initialised (unit
/// tests, older embeds) — the renderer falls back to workspace files.
Doc timing mismatch for curated_snapshot lifecycle
Line 129 says the snapshot is read when the agent is built, but current behavior captures it on first turn (ensure_curated_snapshot), with builder initialization at None. Please update this doc block to reflect the real lifecycle.
📝 Suggested doc correction
- /// [`crate::openhuman::curated_memory::runtime`] singleton when the
- /// agent is built. Forwarded into [`PromptContext::curated_snapshot`]
+ /// [`crate::openhuman::curated_memory::runtime`] singleton on the
+ /// first turn (`ensure_curated_snapshot`). Forwarded into [`PromptContext::curated_snapshot`]As per coding guidelines, "When adding or changing domain behavior, ship matching documentation (rustdoc, code comments, or updates to AGENTS.md / architecture docs) before handoff."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// Session-scoped snapshot of the curated-memory pair
/// (`MEMORY.md` + `USER.md`) read once from the
/// [`crate::openhuman::curated_memory::runtime`] singleton on the
/// first turn (`ensure_curated_snapshot`). Forwarded into [`PromptContext::curated_snapshot`]
/// every turn so the rendered prompt bytes stay frozen for the
/// session's KV-cache prefix, while runtime writes via
/// `curated_memory.add/replace/remove` land on the NEXT session.
/// `None` when the curated-memory runtime isn't initialised (unit
/// tests, older embeds) — the renderer falls back to workspace files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/harness/session/types.rs` around lines 126 - 134, The doc
comment for the Session-scoped `curated_snapshot` is out of sync with
implementation: it states the snapshot is captured "when the agent is built" but
the code actually initializes `curated_snapshot` to None in the builder and
captures it on first turn via `ensure_curated_snapshot`; update the rustdoc to
describe this lifecycle (None in builder, captured lazily on first turn by
`ensure_curated_snapshot`, forwarded into `PromptContext::curated_snapshot` each
turn so runtime writes apply to subsequent sessions) and mention that it is None
when the curated-memory runtime (`crate::openhuman::curated_memory::runtime`)
isn't initialized.
async fn fresh_bridge() -> LifeCaptureComposioBridge {
    let dir = tempdir().expect("tempdir");
    let db = dir.path().join("lc.db");
    std::mem::forget(dir);
    let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
    let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
    LifeCaptureComposioBridge::new(idx, emb)
}
std::mem::forget(dir) leaks the temp directory on every test run.
tempfile::TempDir cleans up on drop; mem::forget permanently leaks the directory at dir.path() to disk. With 4 tests using fresh_bridge, every cargo test run leaves 4 stray directories under the system temp root. Return the TempDir from fresh_bridge (or store it in the bridge wrapper) so it is dropped at end of test instead.
🔧 Proposed fix
- async fn fresh_bridge() -> LifeCaptureComposioBridge {
- let dir = tempdir().expect("tempdir");
- let db = dir.path().join("lc.db");
- std::mem::forget(dir);
- let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
- let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
- LifeCaptureComposioBridge::new(idx, emb)
- }
+ async fn fresh_bridge() -> (LifeCaptureComposioBridge, tempfile::TempDir) {
+ let dir = tempdir().expect("tempdir");
+ let db = dir.path().join("lc.db");
+ let idx = Arc::new(PersonalIndex::open(&db).await.expect("open"));
+ let emb: Arc<dyn Embedder> = Arc::new(TestEmbedder);
+ (LifeCaptureComposioBridge::new(idx, emb), dir)
+ }Then update each let bridge = fresh_bridge().await; to let (bridge, _dir) = fresh_bridge().await;.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/life_capture/ingest/composio.rs` around lines 415 - 422, The
helper fresh_bridge() currently calls std::mem::forget(dir) which leaks the
TempDir; change fresh_bridge() to return ownership of the TempDir (e.g., return
(LifeCaptureComposioBridge, tempfile::TempDir)) or hold the TempDir inside the
LifeCaptureComposioBridge wrapper so it lives until tests drop it; update all
test call sites that do let bridge = fresh_bridge().await; to destructure the
tuple (let (bridge, _dir) = fresh_bridge().await;) or accept the wrapper that
owns the TempDir so the TempDir is dropped at test end, and ensure
PersonalIndex::open(&db).await and the Arc<dyn Embedder> (TestEmbedder) usage
remain unchanged.
#[test]
fn bridge_has_stable_name_and_domain() {
    // Construct with stubs just to exercise the trait methods.
    struct Stub;
    #[async_trait]
    impl Embedder for Stub {
        async fn embed_batch(&self, _t: &[&str]) -> anyhow::Result<Vec<Vec<f32>>> {
            Ok(vec![])
        }
        fn dim(&self) -> usize {
            1536
        }
    }
    // Lightweight non-init path — we don't actually need a live index
    // for the trait introspection bits, but we do need a real
    // PersonalIndex handle. Reuse the fresh_bridge pattern via a
    // smoke check in the runtime test.
    let name = "life_capture::composio_bridge";
    assert_eq!(name, name); // shape guard for the name constant
}
Tautological assertion provides no coverage.
assert_eq!(name, name) at line 619 is always true and does not exercise LifeCaptureComposioBridge::name() or domains(). Either construct a real bridge (e.g. via fresh_bridge) and assert against the trait methods, or delete the test — the surrounding Stub Embedder impl is also unused.
🔧 Proposed fix
#[test]
fn bridge_has_stable_name_and_domain() {
- // Construct with stubs just to exercise the trait methods.
- struct Stub;
- #[async_trait]
- impl Embedder for Stub {
- async fn embed_batch(&self, _t: &[&str]) -> anyhow::Result<Vec<Vec<f32>>> {
- Ok(vec![])
- }
- fn dim(&self) -> usize {
- 1536
- }
- }
- // Lightweight non-init path — we don't actually need a live index
- // for the trait introspection bits, but we do need a real
- // PersonalIndex handle. Reuse the fresh_bridge pattern via a
- // smoke check in the runtime test.
- let name = "life_capture::composio_bridge";
- assert_eq!(name, name); // shape guard for the name constant
+ let bridge = fresh_bridge().await;
+ assert_eq!(bridge.name(), "life_capture::composio_bridge");
+ assert_eq!(bridge.domains(), Some(&["composio"][..]));
}(Make the test #[tokio::test] if you switch to fresh_bridge.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/life_capture/ingest/composio.rs` around lines 601 - 620, The
test bridge_has_stable_name_and_domain currently contains a tautological
assertion (assert_eq!(name, name)) and never calls
LifeCaptureComposioBridge::name() or domains(), nor uses the Stub Embedder;
replace the bogus assertion by constructing a real bridge (use the existing
fresh_bridge helper or the constructor that returns a PersonalIndex handle) and
call LifeCaptureComposioBridge::name() and LifeCaptureComposioBridge::domains()
to assert expected values, or if you prefer to keep it lightweight convert the
test to #[tokio::test], call fresh_bridge(...) to get a bridge instance, then
assert_eq!(bridge.name(), "life_capture::composio_bridge") and
assert!(bridge.domains().contains(...)) and remove the unused Stub/Embedder impl
if you choose the fresh_bridge approach.
const INDEX_VECTOR_DIM: usize = 1536;
if embedder.dim() != INDEX_VECTOR_DIM {
    return Err(format!(
        "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
        embedder.dim()
    ));
}

let mut vecs = embedder
    .embed_batch(&[text.as_str()])
    .await
    .map_err(|e| format!("embed: {e}"))?;
let query_vec = vecs.pop().ok_or("embedder returned no vectors")?;
if query_vec.len() != INDEX_VECTOR_DIM {
    return Err(format!(
        "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
        query_vec.len()
    ));
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether PersonalIndex's sqlite-vec column dim is fixed at 1536, and whether
# any path other than rpc.rs hardcodes a vector dim.
rg -nP --type=rust -C3 '\b(1536|3072|vector_dim|VECTOR_DIM|item_vectors)\b' -g '!**/tests/**'
ast-grep --pattern 'CREATE VIRTUAL TABLE $$$'
Repository: tinyhumansai/openhuman
Length of output: 19658
🏁 Script executed:
# Check handle_ingest function in rpc.rs around lines 123-178
sed -n '120,185p' src/openhuman/life_capture/rpc.rs
Repository: tinyhumansai/openhuman
Length of output: 2271
🏁 Script executed:
# Search for CREATE VIRTUAL TABLE and schema definitions
rg -n 'CREATE VIRTUAL TABLE|CREATE TABLE' --type=rust src/openhuman/life_capture/
Repository: tinyhumansai/openhuman
Length of output: 173
🏁 Script executed:
# Check if PersonalIndex has a vector_dim or similar method
rg -n 'fn.*dim|pub fn' src/openhuman/life_capture/index.rs | head -20
Repository: tinyhumansai/openhuman
Length of output: 166
🏁 Script executed:
# Look for the actual table schema/migration that defines item_vectors
find src/openhuman/life_capture -name "*.rs" -exec grep -l "item_vectors\|CREATE" {} \;
Repository: tinyhumansai/openhuman
Length of output: 143
🏁 Script executed:
# Check migrations.rs more thoroughly for table definitions
cat src/openhuman/life_capture/migrations.rs
Repository: tinyhumansai/openhuman
Length of output: 4456
🏁 Script executed:
# Look for PersonalIndex definition and any dimension-related methods
grep -n "impl PersonalIndex\|fn.*dim\|pub fn\|struct PersonalIndex" src/openhuman/life_capture/index.rs | head -30
Repository: tinyhumansai/openhuman
Length of output: 220
🏁 Script executed:
# Check the migration SQL files - specifically 0002_vec.sql for the item_vectors schema
find src/openhuman/life_capture/migrations -name "*.sql" -exec cat {} \;
Repository: tinyhumansai/openhuman
Length of output: 2588
🏁 Script executed:
# Also check if there are any other methods in PersonalIndex that might expose vector_dim
sed -n '61,200p' src/openhuman/life_capture/index.rs | grep -n "pub fn\|fn "
Repository: tinyhumansai/openhuman
Length of output: 269
Hardcoded INDEX_VECTOR_DIM = 1536 is asymmetric and tightly coupled to the index schema.
Two concerns:
1. Index dim is hardcoded in the schema but embedder dim is configurable. Per `src/core/jsonrpc.rs:944-956`, bootstrap reads `OPENHUMAN_EMBEDDINGS_MODEL` and `OPENHUMAN_EMBEDDINGS_DIM` to construct the embedder with per-model dimensions (e.g., 3072 for `text-embedding-3-large`). With that configuration, `handle_search` will hard-fail with "embedder dim 3072 does not match index dim 1536" — a correct rejection, since the `item_vectors` schema (migration `0002_vec.sql`) is permanently fixed at `float[1536]`. Either drop the env-driven dim configurability in `bootstrap_life_capture`, or have `PersonalIndex` expose its actual schema dim and check against that instead of a hardcoded constant.
2. `handle_ingest` lacks the dimension checks that `handle_search` enforces. `handle_search` validates `embedder.dim() == INDEX_VECTOR_DIM` and `vector.len() == INDEX_VECTOR_DIM` (lines 76-87), but `handle_ingest` does not — it embeds and writes the vector unconditionally (lines 149-178). If a misconfigured embedder produces 3072-dim vectors, ingest will pass them to `upsert_vector`, and sqlite-vec will surface a low-level error rather than the clean RPC message search produces.
🔧 Suggested change for `handle_ingest`
) -> Result<RpcOutcome<Value>, String> {
if external_id.trim().is_empty() {
return Err("external_id must not be empty".into());
}
if text.trim().is_empty() {
return Err("text must not be empty".into());
}
let ts = DateTime::<Utc>::from_timestamp(ts, 0)
.ok_or_else(|| format!("invalid ts (out of range): {ts}"))?;
+ const INDEX_VECTOR_DIM: usize = 1536;
+ if embedder.dim() != INDEX_VECTOR_DIM {
+ return Err(format!(
+ "embedder dim {} does not match index dim {INDEX_VECTOR_DIM}",
+ embedder.dim()
+ ));
+ }
+
let mut vecs = embedder
.embed_batch(&[text.as_str()])
.await
.map_err(|e| format!("embed: {e}"))?;
let vector = vecs.pop().ok_or("embedder returned no vectors")?;
+ if vector.len() != INDEX_VECTOR_DIM {
+ return Err(format!(
+ "embedding length {} does not match index dim {INDEX_VECTOR_DIM}",
+ vector.len()
+ ));
+ }Better still, lift INDEX_VECTOR_DIM to a single shared constant (or expose it via PersonalIndex) so search, ingest, and bootstrap cannot drift.
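A minimal sketch of that shared-constant shape — the helper name `check_dim` and its error wording are illustrative, not the project's API:

```rust
// One shared source of truth so search, ingest, and bootstrap cannot drift.
pub const INDEX_VECTOR_DIM: usize = 1536;

// Called by both handlers before touching sqlite-vec, turning a low-level
// storage error into a clean RPC error with a labelled message.
pub fn check_dim(label: &str, len: usize) -> Result<(), String> {
    if len != INDEX_VECTOR_DIM {
        return Err(format!(
            "{label}: dimension {len} does not match index dim {INDEX_VECTOR_DIM}"
        ));
    }
    Ok(())
}

fn main() {
    assert!(check_dim("embedder", 1536).is_ok());
    // A 3072-dim embedder (e.g. text-embedding-3-large) is rejected up front.
    let err = check_dim("embedder", 3072).unwrap_err();
    println!("{err}");
}
```

Both `handle_search` and `handle_ingest` would call `check_dim` twice: once on `embedder.dim()` before embedding, and once on the produced vector's length before writing or querying.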
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/life_capture/rpc.rs` around lines 76 - 94, The code hardcodes
INDEX_VECTOR_DIM = 1536 in handle_search causing mismatches with configurable
embedders; expose the real index vector dimension from the PersonalIndex (or
lift INDEX_VECTOR_DIM into a single shared constant used by
bootstrap_life_capture, PersonalIndex, handle_search and handle_ingest), then
update handle_search to compare embedder.dim() against that shared/exposed
dimension and validate query_vec.len() the same way; also add the same dimension
checks to handle_ingest before calling upsert_vector (validate embedder.dim()
and the produced vector length and return a clear RPC error on mismatch) and
ensure bootstrap_life_capture uses the same shared dimension when
constructing/configuring the embedder.
fn request_access(store: &CNContactStore) -> Result<(), AddressBookError> {
    unsafe {
        let status = CNContactStore::authorizationStatusForEntityType(CNEntityType::Contacts);
        match status {
            CNAuthorizationStatus::Authorized | CNAuthorizationStatus::Limited => {
                tracing::debug!("[people::address_book] contacts access already authorized");
                return Ok(());
            }
            CNAuthorizationStatus::Denied | CNAuthorizationStatus::Restricted => {
                return Err(AddressBookError::PermissionDenied);
            }
            _ => {
                tracing::debug!(
                    "[people::address_book] requesting contacts access (status={status:?})"
                );
            }
        }

        let (tx, rx) = std::sync::mpsc::channel::<Result<(), AddressBookError>>();
        let tx = Arc::new(Mutex::new(Some(tx)));
        let tx_clone = Arc::clone(&tx);

        let block = RcBlock::new(move |granted: Bool, _error: *mut NSError| {
            let mut slot = tx_clone.lock().unwrap();
            if let Some(sender) = slot.take() {
                let result = if granted.as_bool() {
                    Ok(())
                } else {
                    Err(AddressBookError::PermissionDenied)
                };
                let _ = sender.send(result);
            }
        });

        store.requestAccessForEntityType_completionHandler(CNEntityType::Contacts, &*block);

        rx.recv().map_err(|_| {
            AddressBookError::Other("contacts permission callback never fired".into())
        })?
    }
}
Synchronous blocking on the Tokio worker — wrap CNContactStore calls in spawn_blocking.
request_access calls rx.recv() on a std::sync::mpsc channel, which is a blocking, non-async wait. It can sit there for many seconds (or indefinitely) while the macOS TCC dialog is up. enumerateContactsWithFetchRequest_error_usingBlock is also a synchronous blocking call. Both run on whatever thread invokes fetch_via_cn_contact_store.
Because HandleResolver::seed_from_address_book calls address_book::read_with(source)? directly (no spawn_blocking) and read_with calls source.fetch_contacts() synchronously, the entire CNContactStore round‑trip executes on a Tokio worker thread inside the handle_refresh_address_book future. With the default multi-thread runtime that means one worker is parked until the user dismisses the TCC dialog; with current_thread it would deadlock the entire server.
Best fix is to push the FFI work onto a blocking thread:

```rust
impl ContactsSource for SystemContactsSource {
    fn fetch_contacts(&self) -> Result<Vec<AddressBookContact>, AddressBookError> {
        // Block this thread; callers must invoke from spawn_blocking or a sync context.
        imp::fetch_via_cn_contact_store()
    }
}
```

…and update `HandleResolver::seed_from_address_book` (or add an async variant on `ContactsSource`) to wrap the call:

```rust
let source = source; // by ref isn't Send; restructure as needed
let contacts = tokio::task::spawn_blocking(move || source.fetch_contacts())
    .await
    .map_err(|e| AddressBookError::Other(format!("join: {e}")))??;
```

Either move the `spawn_blocking` into `read_with`/`fetch_contacts`, or make `ContactsSource::fetch_contacts` async and have `SystemContactsSource` do the `spawn_blocking` internally.
Also applies to: 171-269
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/people/address_book.rs` around lines 129 - 169, The
CNContactStore FFI calls (e.g., request_access,
enumerateContactsWithFetchRequest_error_usingBlock inside
imp::fetch_via_cn_contact_store) block a Tokio worker; wrap these synchronous
calls in tokio::task::spawn_blocking. Either (A) move the spawn_blocking into
the SystemContactsSource implementation so ContactsSource::fetch_contacts calls
imp::fetch_via_cn_contact_store inside spawn_blocking and returns the awaited
result, or (B) keep fetch_contacts synchronous but change callers (e.g.,
HandleResolver::seed_from_address_book / read_with) to call
tokio::task::spawn_blocking(|| source.fetch_contacts()) and propagate errors
(map join errors to AddressBookError::Other). Ensure request_access remains
inside the blocking closure so rx.recv() never runs on a Tokio worker thread.
```rust
pub async fn resolve_or_create(&self, handle: &Handle) -> Result<(PersonId, bool), String> {
    let canonical = handle.canonicalize();
    if let Some(id) = self
        .store
        .lookup(&canonical)
        .await
        .map_err(|e| format!("lookup: {e}"))?
    {
        return Ok((id, false));
    }
    let id = PersonId::new();
    let (display_name, primary_email, primary_phone) = match &canonical {
        Handle::DisplayName(s) => (Some(s.clone()), None, None),
        Handle::Email(s) => (None, Some(s.clone()), None),
        Handle::IMessage(s) => {
            if s.contains('@') {
                (None, Some(s.clone()), None)
            } else {
                (None, None, Some(s.clone()))
            }
        }
    };
    let now = Utc::now();
    let person = Person {
        id,
        display_name,
        primary_email,
        primary_phone,
        handles: vec![canonical.clone()],
        created_at: now,
        updated_at: now,
    };
    self.store
        .insert_person(&person, &[canonical])
        .await
        .map_err(|e| format!("insert_person: {e}"))?;
    Ok((id, true))
}
```
TOCTOU between lookup and insert_person can produce orphan Person rows.
resolve_or_create does a lookup, then mints a fresh PersonId, then calls insert_person (which inserts into people and INSERT OR IGNORE into handle_aliases). If two callers resolve the same canonical handle concurrently, both pass lookup, both generate distinct UUIDs, and both succeed in inserting their Person row. The first to commit wins the handle_aliases(kind, value) row; the second's alias is silently ignored, leaving its Person row orphaned (no handle pointing to it) — and both calls return was_created = true.
Practically, the seed_from_address_book path is sequential per call so this is unlikely there, but handle_resolve is a public RPC and concurrent inbound requests for the same handle (e.g., the iMessage scanner from PR #818 + a UI lookup) can race.
Two ways to make this atomic without redesigning the API:
- Push the read-or-create into a single transaction inside `PeopleStore`, e.g., `lookup_or_insert_person(handle, person_factory)` that does `SELECT person_id FROM handle_aliases ...` and `INSERT INTO people ...; INSERT INTO handle_aliases ...` inside one `transaction()`.
- Keep `insert_person` as-is, but after it returns, re-query `lookup(canonical)` and, if the returned `PersonId` differs from the one we just minted, delete our orphan `people` row (or never insert it and re-use the winner's id).
Option 1 is cleaner and avoids the cleanup path entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/people/resolver.rs` around lines 41 - 78, resolve_or_create has
a TOCTOU: it calls PeopleStore::lookup then creates a new PersonId and calls
PeopleStore::insert_person, which can race and produce orphan Person rows when
two callers concurrently resolve the same Handle; fix by moving the
read-or-create into an atomic transactional operation inside the store
(preferred): add a new PeopleStore method like lookup_or_insert_person(handle,
person_factory) that within a single transaction SELECTs from handle_aliases
and, if missing, INSERTs into people and INSERTs the handle_aliases (or returns
the existing person_id), then update resolve_or_create to call
lookup_or_insert_person instead of separate lookup + insert_person;
alternatively (if you must keep insert_person), after insert_person re-query
lookup(canonical) and if the returned PersonId != the one you inserted, delete
the orphan people row and return the winner's id.
```rust
pub async fn handle_list(store: &PeopleStore, limit: usize) -> Result<RpcOutcome<Value>, String> {
    let limit = limit.clamp(1, 500);
    let people = store.list().await.map_err(|e| format!("list: {e}"))?;
    let now = Utc::now();

    let mut ranked: Vec<(Value, f32)> = Vec::with_capacity(people.len());
    for p in people {
        let interactions = store
            .interactions_for(p.id)
            .await
            .map_err(|e| format!("interactions_for: {e}"))?;
        let s = score(&interactions, now);
        let handles: Vec<Value> = p
            .handles
            .iter()
            .map(|h| {
                let (kind, value) = h.as_key();
                json!({ "kind": kind, "value": value })
            })
            .collect();
        ranked.push((
            json!({
                "person_id": p.id.to_string(),
                "display_name": p.display_name,
                "primary_email": p.primary_email,
                "primary_phone": p.primary_phone,
                "handles": handles,
                "score": s.score,
                "components": {
                    "recency": s.recency,
                    "frequency": s.frequency,
                    "reciprocity": s.reciprocity,
                    "depth": s.depth,
                },
                "interaction_count": interactions.len(),
            }),
            s.score,
        ));
    }
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    let people_json: Vec<Value> = ranked.into_iter().take(limit).map(|(v, _)| v).collect();
    Ok(RpcOutcome::new(json!({ "people": people_json }), vec![]))
}
```
N+1 query pattern when listing people; no top‑K short‑circuit.
handle_list calls store.interactions_for(p.id).await once per person inside a sequential loop and only applies limit after sorting. Each interactions_for is a spawn_blocking round-trip plus a SQL query against interactions, so cost grows linearly with the number of people regardless of limit. With realistic address-book sizes (hundreds to thousands of contacts) this will dominate response time and serialize behind a single Tokio blocking thread per call.
Two reasonable mitigations:
- Add an aggregate path on `PeopleStore` (e.g., `list_with_scores(limit)` or `interaction_summaries()`) that fetches the rows needed for scoring in a single query, then run `score()` over the in-memory result.
- At minimum, parallelize the per-person fetches with `futures::future::join_all` so they don't serialize end-to-end.
Either way, since score = 0 when any component is 0 and frequency = 0 for people with no interactions in the window, the long tail of dormant contacts is doing a SQL round-trip just to land at the bottom of the list.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/people/rpc.rs` around lines 16 - 58, handle_list currently
issues an interactions_for(p.id).await inside a sequential loop which causes an
N+1 query pattern and serial spawn_blocking calls; fix by either (A) adding an
aggregate method on PeopleStore (suggest names: list_with_scores(limit) or
interaction_summaries()) that runs a single SQL query to return interaction
aggregates needed by score() and then compute score() in-memory for each person
before sorting and truncating by limit, or (B) minimally parallelizing the
per-person fetches by collecting futures for store.interactions_for(p.id) and
awaiting them with futures::future::join_all so the DB calls run concurrently,
and ensure you apply limit before expensive work by sorting/truncating after
computing scores. Use the function name handle_list and the method
interactions_for as anchors when making the change.
Reads last N messages from $CHAT_DB (default ~/Library/Messages/chat.db), ingests into an in-memory PersonalIndex via IndexWriter::upsert, then runs IndexReader::keyword_search over a hardcoded query list. No embeddings — pure FTS5/bm25 over real personal data. Use to benchmark ingest throughput on a real corpus and to sanity-check that keyword search ranking + snippet generation behave on real text. On a 5000-message slice: ingest at ~47k items/s, search returns 3 hits with snippets in sub-millisecond time. Run: cargo run --example real_imessage_demo --release Requires Full Disk Access for the running terminal.
🧹 Nitpick comments (2)
examples/real_imessage_demo.rs (2)
63-73: Nit: collapse the two `None` branches.

`is_from_me || handle.is_empty()` reads the same and drops one branch.

Proposed tweak

```diff
-let author = if is_from_me {
-    None
-} else if handle.is_empty() {
-    None
-} else {
-    Some(Person {
-        display_name: None,
-        email: None,
-        source_id: Some(handle.clone()),
-    })
-};
+let author = if is_from_me || handle.is_empty() {
+    None
+} else {
+    Some(Person {
+        display_name: None,
+        email: None,
+        source_id: Some(handle.clone()),
+    })
+};
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/real_imessage_demo.rs` around lines 63 - 73, Collapse the two None branches when building author: replace the nested if/else that checks is_from_me then handle.is_empty() with a single condition (is_from_me || handle.is_empty()) so author is None in either case, otherwise return Some(Person { display_name: None, email: None, source_id: Some(handle.clone()) }); update the block that assigns to author (the author variable and Person construction) accordingly.
33-44: Add comment documenting that recent iMessage rows on Big Sur+ will be skipped.

On macOS Big Sur and later, the Messages app stores the body of many (most) recent messages in the `attributedBody` BLOB column (a binary plist / `NSAttributedString` archive) and leaves `message.text` NULL. The current query's `WHERE m.text IS NOT NULL AND length(m.text) > 0` filter will silently skip those rows, causing the demo to pull back only older plain-text messages and potentially appear to have sparse results on modern chat databases. Add a comment before line 33 to document this limitation so users understand why the demo may return fewer messages than expected.

Suggested fix

```diff
+// NOTE: on Big Sur+ many rows have NULL `text` and the body lives in
+// `attributedBody` (binary plist). This demo only ingests the plain-text
+// subset; expect a fraction of recent messages to be skipped.
 let mut stmt = conn.prepare(
     "SELECT m.ROWID, \
      COALESCE(m.text, '') as text, \
```

A fuller implementation would `SELECT m.attributedBody` and decode the `NSString` payload from the archived plist when `text` is NULL, but documenting the limitation is sufficient for a demo.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/real_imessage_demo.rs` around lines 33 - 44, Add a brief comment just above the SQL prepare call (the conn.prepare(...) that builds `stmt`) explaining that on macOS Big Sur and later many recent message bodies are stored in the `attributedBody` BLOB (an archived NSAttributedString/plist) and that the WHERE clause `m.text IS NOT NULL AND length(m.text) > 0` will therefore skip those rows, causing the demo to return only older plain-text messages; note that a fuller fix would SELECT and decode `m.attributedBody` when `text` is NULL, but for the demo we document the limitation instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@examples/real_imessage_demo.rs`:
- Around line 63-73: Collapse the two None branches when building author:
replace the nested if/else that checks is_from_me then handle.is_empty() with a
single condition (is_from_me || handle.is_empty()) so author is None in either
case, otherwise return Some(Person { display_name: None, email: None, source_id:
Some(handle.clone()) }); update the block that assigns to author (the author
variable and Person construction) accordingly.
- Around line 33-44: Add a brief comment just above the SQL prepare call (the
conn.prepare(...) that builds `stmt`) explaining that on macOS Big Sur and later
many recent message bodies are stored in the `attributedBody` BLOB (an archived
NSAttributedString/plist) and that the WHERE clause `m.text IS NOT NULL AND
length(m.text) > 0` will therefore skip those rows, causing the demo to return
only older plain-text messages; note that a fuller fix would SELECT and decode
`m.attributedBody` when `text` is NULL, but for the demo we document the
limitation instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 70f624eb-1dd2-46fe-a539-137d2f592428
📒 Files selected for processing (1)
examples/real_imessage_demo.rs
This PR stacks on #817 (life-capture foundation). Commit `be1112f9` introduces the `life_capture.ingest` controller, which requires the `PersonalIndex` + `IndexWriter` types from #817. Do not merge until #817 lands — after that I'll rebase this onto new main and the diff auto-shrinks to just the A1 delta (~1 commit).

Summary

- `life_capture.ingest` RPC controller — idempotent upsert by `(source, external_id)`, embeds text, atomic vector replace.
- `ingest_group()`: from `openhuman.memory_doc_ingest` to `openhuman.life_capture_ingest`, so chat-day transcripts land in the `PersonalIndex` (the same store `life_capture.search` reads from).
- `life_capture.search` envelope shape: wrap hits in `{"hits": [...]}` per its declared schema (was returning a bare array).
- `#[ignore]`d e2e at `tests/imessage_ingest_e2e.rs`: ingest → search → re-ingest, asserts `replaced=true` and `total_items` unchanged.

Test plan

- `cargo test -p openhuman --lib openhuman::life_capture` — 27 passing
- `cargo test --test imessage_ingest_e2e -- --ignored --test-threads=1` — 2 passing
- `cargo check --manifest-path Cargo.toml`
- `cargo check --manifest-path app/src-tauri/Cargo.toml`
- `~/Library/Messages/chat.db` via `yarn tauri dev`, confirm hits appear in `life_capture.search`

Summary by CodeRabbit
New Features
Tests
Documentation
macOS Permissions