feat(eventkit): EventKit Calendar read + Reminders write bridge (A7)#822
feat(eventkit): EventKit Calendar read + Reminders write bridge (A7)#822jwalin-shah wants to merge 35 commits intotinyhumansai:mainfrom
Conversation
Drafts the wedge for OpenHuman: continuous ingestion of Gmail/Calendar/Slack/ iMessage into a local personal_index.db, with a Today brief and retrieval- tuned chat as the two surfaces. Three composable privacy modes (Convenience / Hybrid / Fully local) sharing one code path. Track 1 unblocks the ship pipeline; Track 2 builds the spine over 4 weeks. Reuses ingestion logic from ~/projects/inbox.
Two implementation plans drafted from the life-capture spec: - Track 1 (ship pipeline): fix Ubuntu installer smoke, land in-flight PRs tinyhumansai#806/tinyhumansai#786/tinyhumansai#788/tinyhumansai#797, wire Tauri auto-updater + signed Mac/Windows builds. - Life-Capture #1 (foundation): SQLite + sqlite-vec personal_index.db, Embedder trait + HostedEmbedder (OpenAI), PII redaction, quoted-thread stripping, hybrid retrieval (vector + keyword + recency), controller schema + RPC. End-to-end test with synthetic items. No ingestion or UI yet — those are subsequent milestone plans.
Refactors scripts/install.sh to expose resolve_asset_url and verify_asset_reachable. Adds scripts/test_install.sh that exercises the resolver against a committed fixture latest.json. Failures now report the resolved URL and the parse error instead of dying silently.
…le, keyword projection column
Adopts the Hermes pattern (NousResearch/hermes-agent, MIT) of an
agent-writable, char-bounded curated memory file pair. F15 builds the
char-bounded store with atomic writes; F16 wires a session-start
snapshot into the system prompt (preserves prefix cache); F17 exposes
memory.{add,replace,remove,read} through the controller dispatch so
both the agent loop and skills can use it.
Sits between TinyHumans synthesis (volatile) and personal_index.db
(raw events) — fills the "deliberately curated facts" gap that
neither covered.
Adds src/openhuman/life_capture/ module tree (stubs for embedder, index, migrations, quote_strip, redact, rpc, schemas, types, plus tests/) per Foundation Plan F1. Wires the module in src/openhuman/mod.rs. Adds sqlite-vec to deps and httpmock as a dev-dep. Reuses existing rusqlite/regex/once_cell/async-trait/tempfile already in tree (Option C — avoids sqlx/rusqlite libsqlite3-sys conflict).
…egex
Previously the country-code group required at least one digit (\+?\d{1,3}),
so '(415) 555-0123' (no leading country code) wouldn't match. Wrapped the
whole prefix in a non-capturing optional group so parenthesized area codes
match without a leading country code.
…te, char-bounded)
… table Loads sqlite_vec via sqlite3_auto_extension exactly once per process so every new connection picks up the vec0 module. PersonalIndex wraps a single rusqlite Connection in Arc<Mutex<_>> for async sharing; open() runs migrations and sets WAL+foreign_keys, open_in_memory() is for tests. vec_version() round-trip and a 1536-dim INSERT + MATCH query both verified.
…_id) dedupe Adds IndexWriter with upsert (ON CONFLICT(source, external_id) DO UPDATE — FTS stays in sync via triggers) and upsert_vector (DELETE + INSERT since vec0 has no ON CONFLICT for virtual table primary keys). Both wrap rusqlite work in spawn_blocking and share the PersonalIndex's Arc<Mutex<Connection>>. Exposes ensure_vec_extension_registered() crate-wide so the migrations test can register vec0 before opening its in-memory connection.
- keyword_search: FTS5 ranked by negated bm25 with snippet markers (« » …), plus the user:local ACL EXISTS clause so the same query shape works for the multi-token team v2 ACL without rewrites. - vector_search: sqlite-vec MATCH with k = ? clause (vec0 KNN requirement) and ORDER BY distance; score = 1/(1 + distance) so callers can blend it with keyword bm25 on the same monotonic scale. Shared ItemRow + into_hit() helper lets both queries reuse the same row shape; rusqlite query_map closures hand-build it because rusqlite has no sqlx FromRow equivalent.
Pulls oversampled candidates (k*3 min 20) from both keyword and vector legs, normalises each independently to [0,1], then re-ranks with 0.55 * vec_norm + 0.35 * kw_norm + 0.10 * exp(-age_days/30). Documents present in only one leg get 0 for the missing signal but still compete on the others; only items with neither signal are dropped. Same-vector twin items now break their tie by recency.
…rch) Adds two controllers exposed via the registry: - life_capture.get_stats — total items, per-source counts, last-ingest ts. - life_capture.search — hybrid (vector + keyword + recency) search with embed-then-rank, optional k (default 10, capped at 100). Runtime state (PersonalIndex + dyn Embedder) lives in life_capture::runtime behind a tokio OnceCell because handlers are stateless fn(Map<String,Value>) and have no per-call context object. F14 will call runtime::init() at app startup; until then handlers return a structured 'not initialised' error so the failure mode is loud, not silent. Schemas registered in core/all.rs alongside cron.
… → search) FakeEmbedder hashes input bytes into a deterministic sparse 1536-dim vector so the same text round-trips identically through vec0 — keeps the e2e test hermetic, no network call. Verifies that: - quote-strip drops the 'On … wrote:' block before indexing, - redact masks the email address before indexing, - the upserted item resurfaces via hybrid_search with both transformations preserved on the returned text.
After Config::load_or_init succeeds, open the personal_index.db at config.workspace_dir and register the life_capture runtime. Embedder is env-gated: OPENHUMAN_EMBEDDINGS_KEY > OPENAI_API_KEY OPENHUMAN_EMBEDDINGS_URL (default: https://api.openai.com/v1) OPENHUMAN_EMBEDDINGS_MODEL (default: text-embedding-3-small) If no key is set we still open the index file (so Plan #2's ingestion worker can write to it) but skip runtime registration — controllers then return the 'not initialised' error from runtime::get() instead of silently calling a misconfigured embedder. Config-schema integration deferred to a follow-up; env-driven keeps F14 non-invasive while we land the rest of the foundation.
…ection snapshot_pair(memory, user) returns a MemorySnapshot containing plain strings — no reference back to the stores — so taking a snapshot at session start and reusing it across turns gives a stable system-prompt prefix and lets the LLM prefix-cache hit on every subsequent turn. Plan #1 also calls for wiring this into chat.rs' OpenClaw context loader, but openhuman assembles agent prompts per-agent under src/openhuman/agent/agents/*/prompt.rs (not the plan's stale src-tauri/src/commands/chat.rs path); the prompt-builder integration is deferred to Plan #2 alongside the agent-context refactor.
… controllers Adds four controllers under the memory_curated namespace (the 'memory' namespace is already owned by the long-term memory subsystem): - memory_curated.read — read MEMORY.md or USER.md - memory_curated.add — append entry, char-bounded - memory_curated.replace — substring replace, char-bounded - memory_curated.remove — drop entries containing a needle 'file' input is a typed enum (memory|user) so adapters reject anything else at validation time. Runtime state lives in curated_memory::runtime behind a tokio OnceCell, mirroring life_capture's pattern: startup in jsonrpc.rs constructs both stores at <workspace>/memories/ with Hermes' char limits (memory: 2200, user: 1375) and registers the runtime. Handlers return 'not initialised' until init runs, so failure is loud.
…ings Findings from Codex + Gemini second-opinion review: 1. IndexWriter::upsert orphaned vectors on re-ingest (Codex). On (source, external_id) conflict the row's id was kept but the caller's fresh UUID was used for upsert_vector — the vector wrote under an id no row joined to. Fix: explicit SELECT-then-UPDATE-or-INSERT in the same transaction; mutates caller's Item.id to the canonical id (so the next upsert_vector lands on the right row) and orphan-deletes any vector already written under the wrong id. Signature change: upsert(&[Item]) -> upsert(&mut [Item]). 2. upsert_vector DELETE+INSERT was outside a transaction (Gemini) — a failed INSERT permanently lost the item's vector. Now wrapped in tx. 3. Runtime over-gated get_stats on the embedder (Codex). Split runtime into separate INDEX + EMBEDDER OnceCells: get_stats only requires index, search requires both. Index initialises whenever the workspace dir is reachable; embedder is env-gated as before. 4. Startup race (both reviewers): runtime init lived inside the post- serve tokio::spawn block, so axum::serve was already accepting before the OnceCells were set. Hoisted both bootstraps (curated_memory and life_capture) into helpers called inline before axum::serve. 5. runtime::get error message lied (Codex) — said 'set embeddings.api_key in config' but startup actually reads env vars. Fixed text. Bonus: rename controller namespace memory_curated -> curated_memory (Codex preference; nothing depends on it yet so renamed before clients do). Adds regression test for #1 (reingest_with_fresh_uuid_keeps_vector_findable). 4696 lib tests pass.
…emory namespace rename Pre-push hook auto-applied cargo fmt across the foundation files; also fixed the schemas.rs docstring still referencing the old memory_curated namespace.
Critical: - curated_memory/store.rs: propagate read errors instead of `unwrap_or_default`, which could rewrite MEMORY.md / USER.md from an empty baseline on transient I/O or permission failures. Reject empty needle in replace/remove. - curated_memory/rpc.rs: belt-and-suspenders empty-needle guard at the RPC boundary (remove with "" deletes every entry). - life_capture/index.rs: upsert_vector rejects orphan ids (items row missing), which would have inserted vectors that never join in vector_search. - life_capture/embedder.rs: validate response length matches input, indices are contiguous 0..n, and every vector matches dim() — prevents silent misalignment or wrong-dimensional vectors flowing into the 1536-wide sqlite-vec table. Added 30s request + 10s connect timeouts. Retrieval hardening: - life_capture/index.rs: keyword_search now escapes FTS5 operators by tokenizing on whitespace and wrapping each token as a quoted literal. Prevents errors or unintended matches from stray quotes, AND/OR/NEAR, column specifiers in user input. - life_capture/index.rs: hybrid_search applies q.sources / q.since / q.until via post-filter (single consistent pass across keyword+vector fusion). - life_capture/rpc.rs: handle_search validates embedder dim and response length against the fixed 1536-wide index column — clear RPC error instead of a cryptic sqlite-vec failure when the embeddings model is swapped. Docs: - docs/event-bus.md: remove erroneous `.await` on register_native_global (it is sync; the handler closure is async). - life-capture design spec: soften PII redaction claim — only regex is implemented today; light NER is flagged as a future enhancement.
Rendered prompts now surface runtime curated-memory writes: - `PromptContext` gains `curated_snapshot: Option<&MemorySnapshot>`. - `UserFilesSection` prefers the snapshot over the workspace-file loader when one is attached, and injects `USER.md` alongside `MEMORY.md` using a byte-compatible `inject_snapshot_content` helper. - `Session` carries `Option<Arc<MemorySnapshot>>`, populated by `ensure_curated_snapshot` on the first turn from `curated_memory::runtime::get()`. Reused across turns so prompt bytes stay frozen (KV-cache prefix contract) while mid-session `curated_memory.add/replace/remove` writes land on the NEXT session. - `ParentExecutionContext` inherits the snapshot so sub-agents render identical `MEMORY.md`/`USER.md` blocks as the parent. - Legacy workspace-file fallback preserved for embeds that don't initialise the curated-memory runtime (pure unit tests).
Loads a 32-item corpus across gmail/calendar/imessage/slack into an in-memory PersonalIndex and runs 12 queries through keyword / vector / hybrid paths, asserting must_contain / must_not_contain within per-query top-K prefixes. The vector leg uses a deterministic FNV-1a feature-hashing embedder (1536 dims, L2-normalized) so the test is offline and reproducible; real embedder swaps behind one call. Fixture reserves `relevant` and `pending` fields so recall@k / MRR / nDCG bolt onto the same JSON later. q-src-01 is marked pending: hybrid_search currently ignores Query.sources/since/until. Flip to false once filtering lands.
Previously every read grabbed the same `Arc<Mutex<Connection>>` as the writer, so `IndexReader` calls serialised behind in-flight ingests even though WAL would allow parallel readers. Split `PersonalIndex` into: - `writer: Arc<Mutex<Connection>>` — unchanged single-writer model. - `reader_pool: Option<Arc<r2d2::Pool<SqliteConnectionManager>>>` — four-connection read pool on file-backed indexes, built alongside the writer after migrations run. Each pooled connection gets `query_only=ON` as a belt-and-suspenders guard; sqlite-vec is loaded automatically via the process-wide auto-extension. `IndexReader` routes `keyword_search` / `vector_search` through a new `with_read_conn` helper that picks the pool when present and falls back to the writer lock otherwise. In-memory handles keep the single-connection layout — shared-cache URIs buy nothing at test-fixture scale and would force every test to grow a unique name.
- Add `life_capture.ingest` RPC controller (idempotent upsert by
`(source, external_id)`, embeds text + atomic vector replace).
- Switch iMessage scanner's `ingest_group()` to POST
`openhuman.life_capture_ingest` instead of `openhuman.memory_doc_ingest`,
so chat-day transcripts land in the PersonalIndex (the same store
`life_capture.search` reads from).
- Fix `life_capture.search` envelope: wrap hits in `{"hits": [...]}` per
the schema (previously returned a bare array — obs 3165).
- New `#[ignore]`d e2e at `tests/imessage_ingest_e2e.rs` proves the bridge:
ingest → search → re-ingest is idempotent (`total_items` unchanged,
`replaced=true`).
Files added: src/openhuman/eventkit/mod.rs — light export-only module root src/openhuman/eventkit/types.rs — CalendarEvent, Reminder, CachedEvent (serde) src/openhuman/eventkit/store.rs — SQLite dedup cache; upsert/is_known/list_cached src/openhuman/eventkit/migrations/0001_init.sql — eventkit_calendar_events table src/openhuman/eventkit/calendar.rs — EKEventStore read (macOS imp + non-mac stub) src/openhuman/eventkit/reminders.rs — EKEventStore write (macOS imp + non-mac stub) src/openhuman/eventkit/rpc.rs — handle_list_events, handle_create_reminder src/openhuman/eventkit/runtime.rs — OnceLock SQLite singleton src/openhuman/eventkit/schemas.rs — ControllerSchema + handler adapters Files modified: Cargo.toml — objc2, objc2-foundation, objc2-event-kit, block2 (macos target) src/openhuman/mod.rs — pub mod eventkit src/core/all.rs — wired into registered_controllers, declared_schemas, namespace_description Permission requirement: NSCalendarsUsageDescription (Info.plist — Tauri host concern) NSRemindersUsageDescription (Info.plist — Tauri host concern) Dedup key: SHA-256(ical_uid | "|" | calendar_id) stored as store_id in eventkit_calendar_events. list_events returns only events not already cached. Non-macOS targets compile to stubs returning a NotSupported error string.
…ctness (A7 review blockers) Block lifetime (calendar.rs, reminders.rs): replace the UB cast `&*block as *const _ as *mut _` with `RcBlock::as_ptr(&block).cast()`. `RcBlock::as_ptr` returns a proper `*mut Block<F>` owned by the RcBlock; holding `block` alive until after `blocking_recv()` returns guarantees the block outlives EventKit's retention window with no UB. EKEventStore caching (runtime.rs, calendar.rs, reminders.rs): add a process-global `Arc<Mutex<EventStoreHandle>>` singleton via `OnceLock`. `EventStoreHandle` is a newtype over `Retained<EKEventStore>` with manual `Send + Sync` — EKEventStore is Apple-documented as thread-safe and is always guarded by the Mutex at call sites. `fetch_events` and `save_reminder` now accept the cached store instead of constructing their own, matching Apple's one-store-per-process recommendation. runtime::get() is also changed from panic to `Result<_, String>` so RPC callers receive a clean error instead of a core crash on DB open/migration failure. Dedupe correctness (calendar.rs): chose option (a) — skip events that lack `calendarItemExternalIdentifier` entirely (log at `info`). The fallback to the local `calendarItemIdentifier` caused re-emission: before iCloud syncs, the local id yields hash A; after sync the external id yields hash B — same event appears twice. Skipping once is safer than silently duping; the log makes the miss rate observable. Validation (rpc.rs): `handle_create_reminder` now rejects empty/blank title, priority > 9, and malformed RFC 3339 due_date with structured error strings. Past due_date logs a warn but does not reject (lenient for clock skew). Tests added: - dedup: `no_dupe_on_resync_with_same_external_id` + `different_external_ids_stored_separately` - runtime: `get_error_on_bad_path` (no panic on bad DB path) - rpc validation: empty title, whitespace title, priority > 9, accepts priority 9, malformed due_date, valid future due_date, past due_date warn-not-reject - non-macOS stubs: `list_events` and `create_reminder` stub tests compile but are only active on non-macOS targets; macOS CI runs the live path All 20 eventkit tests pass. `cargo check --bin openhuman-core` is clean.
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 24 minutes and 20 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (73)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Summary
Adds the A7 piece of the v1 plan: the
eventkitmodule — Calendar read + Reminders write, gated by one macOS EventKit permission prompt.src/openhuman/eventkit/—mod,types,calendar,reminders,store,runtime,rpc,schemas,migrationsEKEventStore+objc2-event-kit 0.3.2EKEventStore.saveRemindercalendarItemExternalIdentifier(cross-device stable); events without an external id are skipped + logged atinforather than falling back to the local id (prevents dupes when iCloud hydrates later)EKEventStorecached in a process-globalOnceLock<Arc<Mutex<EventStoreHandle>>>(not re-constructed per call)blocking_recv()— no lifetime UB across FFINotSupportedstubscreate_reminder: rejects empty title, priority > 9, malformed due dateeventkit.list_events,eventkit.create_reminderDependency
feat/life-capture-foundation). Diff includes A1 (be1112f9) on top of A7. Rebases off once #817 lands.Shell counterpart:
NSCalendarsUsageDescription+NSRemindersUsageDescriptionstrings added toapp/src-tauri/Info.plistontest/imessage-live-e2e(commitb9c4233a).Test plan
cargo build --testscleancargo test --lib eventkit— 20 passedKnown follow-ups (not blockers)