Merged
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.1.83] - 2026-05-14

### Added
- Packaged the BAML source and generated client under `src/lerim/agents/` so future agents can share the same BAML/LangGraph layout.
- Added the production BAML/LangGraph extract package with deterministic trace windowing, typed BAML scans, record synthesis, context-store persistence, and structured graph events.

### Changed
- Replaced sync extraction with the BAML/LangGraph harness while keeping maintain, ask, and working-memory on PydanticAI.
- Updated extraction evals, integration tests, docs, and run artifacts to use graph events instead of PydanticAI extract messages.
- Tuned extraction prompts to avoid storing incidental personal names unless identity itself is the durable context.

### Fixed
- Hardened session catalog/API status paths so catalog storage issues degrade status responses instead of crashing status or maintain.
- Made extraction persistence idempotent when a rebuilt session catalog replays a session whose episode already exists.
- Improved long-running extraction queue handling so transient SQLite heartbeat write failures and sequential processing do not create false stale-running jobs.

### Removed
- Removed the legacy PydanticAI extract agent, extract-only trace tools, history processors, and the experimental `baml_agents/` sidecar.

## [0.1.81] - 2026-04-29

### Fixed
17 changes: 10 additions & 7 deletions README.md
@@ -222,23 +222,26 @@ Project separation happens inside the database by `project_id`.

There is no per-project durable store on disk.

## Agent Tools
## Agent Runtime

The agent-facing tool contract is intentionally small:
The sync extractor uses a BAML plus LangGraph graph under
`src/lerim/agents/`. The graph reads deterministic trace windows, asks BAML
for typed window scans, synthesizes one final record payload, and persists the
result to SQLite.

The maintain, ask, and working-memory flows still use PydanticAI with a small
semantic DB-era tool surface:

- `read_trace`
- `list_context`
- `search_context`
- `get_context`
- `save_context`
- `revise_context`
- `archive_context`
- `supersede_context`
- `count_context`
- `note_trace_findings`
- `prune_trace_reads`

These are the authoritative runtime tool names. Keeping the surface DB-era and semantic makes the runtime easier to reason about and gives smaller future models a cleaner action space for training.
Keeping the surface DB-era and semantic makes the runtime easier to reason
about and gives smaller future models a cleaner action space for training.
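The window-scan loop described above can be sketched in plain Python. This is a minimal sketch, not the real extractor API: `window_size`, the findings shape, and the payload keys are illustrative assumptions; in Lerim the per-window scan is a typed BAML call.

```python
from typing import Callable

def iter_windows(trace_lines: list[str], window_size: int = 4) -> list[list[str]]:
    # Fixed-size slicing means the same trace always yields the same windows.
    return [trace_lines[i:i + window_size] for i in range(0, len(trace_lines), window_size)]

def extract(trace_lines: list[str], scan: Callable[[list[str]], list[str]]) -> dict:
    # Scan every window, then synthesize a single final payload at the end.
    findings: list[str] = []
    for window in iter_windows(trace_lines):
        findings.extend(scan(window))  # typed per-window scan (BAML in Lerim)
    return {"episode_lines": len(trace_lines), "durable_candidates": findings}
```

The deterministic windowing is the point: replaying the same session trace produces the same windows, so scans are reproducible.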

## Common Commands

40 changes: 17 additions & 23 deletions docs/cli/sync.md
@@ -24,29 +24,23 @@ lerim sync --agent claude,codex
```mermaid
flowchart TD
A["Trigger: lerim sync or daemon"] --> B["Discover and queue changed sessions"]
B --> C["Extract agent receives one session trace"]

C --> D["Prompt goal: turn this session into durable project memory"]
D --> E["Agent reads trace chunks with read_trace"]
E --> F{"Has the agent read enough of the trace?"}
F -- "no" --> E
F -- "yes" --> G["Agent identifies candidate memories: episode, decisions, preferences, constraints, facts, references"]

G --> H{"Could this update or duplicate existing memory?"}
H -- "yes" --> I["Use search_context/get_context to inspect existing records"]
H -- "no" --> J["Prepare new records"]

I --> K{"Existing record should change?"}
K -- "revise" --> L["Use revise_context on fetched record"]
K -- "new memory" --> J
K -- "no durable value" --> M["Do not write"]

J --> N["Use save_context for supported durable records"]
L --> O["SQLite context DB + record_versions"]
N --> O
M --> P["Completion summary"]
O --> P
P --> Q["Sync artifacts: manifest, agent log, trace"]
B --> C["Extractor receives one session trace"]

C --> D["Deterministic graph reads the next trace window"]
D --> E["BAML ScanTraceWindow returns typed findings"]
E --> F{"More trace windows?"}
F -- "yes" --> D
F -- "no" --> G["BAML SynthesizeExtractRecords creates one episode and durable candidates"]

G --> H["Persistence normalizes and validates record drafts"]
H --> I{"Durable records present?"}
I -- "yes" --> J["Write active durable records"]
I -- "no" --> K["Write archived episode only"]

J --> L["SQLite context DB + record_versions"]
K --> L
L --> M["Completion summary"]
M --> N["Sync artifacts: manifest, graph events, trace"]
```
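The persistence branch at the bottom of the diagram reduces to a small decision. A minimal sketch, assuming dict-shaped record drafts and a `status` field (both illustrative, not the real store schema), and assuming the episode is written as active alongside durable records:

```python
def persist(episode: dict, durable: list[dict]) -> list[dict]:
    # One episode record always lands; durable candidates decide its status.
    if durable:
        # Durable value found: write the candidates and the episode as active.
        return [dict(r, status="active") for r in durable + [episode]]
    # Nothing durable: keep only an archived episode for the audit trail.
    return [dict(episode, status="archived")]
```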

## Notes
18 changes: 8 additions & 10 deletions docs/concepts/how-it-works.md
@@ -29,27 +29,25 @@ Canonical storage is global:

Projects are scoped by `project_id` inside the database.

## Agent tool surface
## Agent runtime surface

Lerim does not expose raw SQL or file CRUD to the agent.
Lerim does not expose raw SQL or file CRUD to agents.

The durable context tools are:
The sync extractor is a BAML plus LangGraph graph. It reads deterministic trace
windows, scans each window into typed findings, synthesizes records once, and
persists them through the context store.

The maintain, ask, and working-memory flows use PydanticAI. Their semantic
context tools are:

- `read_trace`
- `list_context`
- `search_context`
- `get_context`
- `save_context`
- `revise_context`
- `archive_context`
- `supersede_context`
- `count_context`

The extract flow also uses:

- `note_trace_findings`
- `prune_trace_reads`

Retrieval is hybrid:

- local ONNX embeddings from `mixedbread-ai/mxbai-embed-xsmall-v1`
5 changes: 3 additions & 2 deletions docs/concepts/sync-maintain.md
@@ -7,7 +7,8 @@ clean:
- **Maintain** (cold path) -- refines existing records offline

Both run automatically in the daemon loop and can also be triggered manually.
Both use the same PydanticAI runtime and the `[roles.agent]` role model.
Sync extraction uses the BAML plus LangGraph runtime and the `[roles.agent]`
role model. Maintain uses the PydanticAI runtime with the same role model.

---

Expand All @@ -20,7 +21,7 @@ records:
2. **Index** -- new sessions are cataloged in `sessions.sqlite3`
3. **Match to project** -- sessions matching a registered project are enqueued; unmatched sessions are indexed but not extracted
4. **Compact** -- traces are compacted (tool outputs stripped) and cached
5. **Extract flow** -- the PydanticAI extraction agent (`[roles.agent]`) reads the trace and uses `read_trace`, `note_trace_findings`, `prune_trace_reads`, `search_context`, `get_context`, `save_context`, and `revise_context` to write one episode record plus a small number of durable records into `~/.lerim/context.sqlite3`
5. **Extract flow** -- the BAML plus LangGraph extractor (`[roles.agent]`) reads deterministic trace windows, scans typed findings, synthesizes the final payload, and writes one episode record plus a small number of durable records into `~/.lerim/context.sqlite3`

### Record quality contract

25 changes: 14 additions & 11 deletions docs/configuration/tracing.md
@@ -1,6 +1,6 @@
# Tracing

Lerim uses [MLflow](https://mlflow.org) for PydanticAI agent observability.
Lerim uses [MLflow](https://mlflow.org) for agent observability.
Tracing is opt-in and controlled by `[observability].mlflow_enabled` in
`~/.lerim/config.toml`. The `LERIM_MLFLOW` environment variable can override it
for one-off runs.
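For reference, enabling tracing looks roughly like this. A sketch: only `[observability].mlflow_enabled` and `LERIM_MLFLOW` are named above; the surrounding file shape is assumed.

```toml
# ~/.lerim/config.toml
[observability]
mlflow_enabled = true
```

For a one-off run, set the `LERIM_MLFLOW` environment variable to override this value for that invocation only.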
@@ -9,10 +9,12 @@ for one-off runs.

When tracing is enabled, MLflow records:

- **PydanticAI model calls** -- via `mlflow.pydantic_ai.autolog()`, every language model invocation
across sync/maintain/ask flows is captured automatically, including
input prompts, outputs, token counts, and latency.
- **Agent/tool executions** -- tool calls and agent steps are traced as nested spans within each run.
- **Sync extraction graph** -- the BAML plus LangGraph extractor emits a
top-level `lerim.agent.extract` span with trace metadata and model label.
- **PydanticAI model calls** -- via `mlflow.pydantic_ai.autolog()`, maintain,
ask, and working-memory model invocations are captured automatically,
including input prompts, outputs, token counts, and latency.
- **Agent/tool executions** -- tool calls and agent steps are traced as nested spans within each run when the runtime exposes them.
- **agent_trace.json** -- each sync/maintain run also writes a local
`agent_trace.json` under the run workspace for a full tool/message history
(not MLflow-specific).
@@ -104,17 +106,18 @@ Lerim continues writing traces as long as the server is running with
In the UI, look for:

- **Experiments** -- select the `lerim` experiment.
- **Traces** -- the primary view for PydanticAI autologging. Expand a trace to
see the model/tool span tree.
- **Traces** -- the primary view for Lerim agent spans. Expand a trace to see
the sync graph span or PydanticAI model/tool span tree.
- **Run id** -- match a local run folder to MLflow by searching for the
`manifest.json` `run_id` value. It is also stored as `client_request_id` and
the `lerim.run_id` tag.
- **Model calls** -- every PydanticAI model request is logged with input prompts,
outputs, token counts, and latency.
- **Model calls** -- PydanticAI model requests are logged with input prompts,
outputs, token counts, and latency. Sync extraction model metadata is attached
to the BAML/LangGraph extract span.
- **Spans** -- nested spans show the call hierarchy from the top-level
orchestration down to individual LM calls and tool invocations.

Classic MLflow **Runs** may be empty for PydanticAI traces. That does not mean
Classic MLflow **Runs** may be empty for agent traces. That does not mean
tracing is broken; check the Traces view or verify the SQLite counts below.

!!! tip "Filtering"
@@ -146,7 +149,7 @@ Important files:
- `manifest.json` -- run id, operation, project, session id, artifact paths, and
status. `mlflow_client_request_id` matches the MLflow trace request id.
- `events.jsonl` -- compact started/succeeded/failed events for that run.
- `agent_trace.json` -- serialized PydanticAI messages when available.
- `agent_trace.json` -- serialized graph events or PydanticAI messages when available.
- `agent.log` -- short human-readable agent summary on success.
- `error.json` -- structured error details on failure.
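Matching a local run folder to its MLflow trace can be scripted by reading `manifest.json`. A minimal sketch: only the `run_id` and `mlflow_client_request_id` keys are taken from the text above; the helper name is illustrative.

```python
import json
from pathlib import Path

def mlflow_ids(run_dir: Path) -> tuple[str, str]:
    # Read the run folder's manifest and return the ids used to match MLflow.
    manifest = json.loads((run_dir / "manifest.json").read_text())
    return manifest["run_id"], manifest.get("mlflow_client_request_id", "")
```

Search the MLflow UI for either value; `run_id` is also stored as the `lerim.run_id` tag.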

15 changes: 13 additions & 2 deletions pyproject.toml
@@ -4,13 +4,15 @@ build-backend = "setuptools.build_meta"

[project]
name = "lerim"
version = "0.1.82"
version = "0.1.83"
description = "Continual learning layer for coding agents and software projects."
readme = "README.md"
requires-python = ">=3.11"
license = "BUSL-1.1"
dependencies = [
"pydantic==2.12.5",
"baml-py==0.222.0",
"langgraph==1.2.0",
"pydantic-ai==1.70.0",
"pydantic-evals==1.70.0",
"eval_type_backport==0.3.1; python_version < '3.13'",
@@ -62,7 +64,12 @@ where = ["src"]
include = ["lerim*"]

[tool.setuptools.package-data]
lerim = ["config/default.toml", "skills/*.md", "server/lerim-seccomp.json"]
lerim = [
"agents/baml_src/*.baml",
"config/default.toml",
"skills/*.md",
"server/lerim-seccomp.json",
]

[tool.pytest.ini_options]
pythonpath = ["."]
@@ -84,8 +91,12 @@ select = ["E4", "E7", "E9", "F"]
extend-select = ["B006"]
ignore = ["E501"]

[tool.ruff]
extend-exclude = ["src/lerim/agents/baml_client"]

[tool.vulture]
paths = ["src/lerim/", "vulture_whitelist.py"]
exclude = ["*/src/lerim/agents/baml_client/*"]
min_confidence = 60

[dependency-groups]
10 changes: 6 additions & 4 deletions src/lerim/README.md
@@ -3,13 +3,14 @@
## Summary

This folder contains the Lerim runtime package.
Current architecture is PydanticAI-only for agent execution.
Current architecture uses BAML plus LangGraph for sync extraction, and
PydanticAI for maintain, ask, and working-memory agent execution.
Durable Lerim context now lives in the global SQLite store at `~/.lerim/context.sqlite3`.
Project identity is used to separate records by repo inside that shared DB.

The package is organized by feature boundary:

- `agents/`: agent flows (`extract.py`, `maintain.py`, `ask.py`, `working_memory.py`), semantic context tools (`tools.py`), typed contracts (`contracts.py`)
- `agents/`: agent flows (`extract/`, `maintain.py`, `ask.py`, `working_memory.py`), BAML source/client files (`baml_src/`, `baml_client/`), semantic context tools (`tools.py`), typed contracts (`contracts.py`)
- `server/`: CLI (`cli.py`), HTTP API (`httpd.py`), daemon (`daemon.py`), runtime orchestrator (`runtime.py`), Docker/runtime API helpers (`api.py`)
- `config/`: config loading (`settings.py`), PydanticAI model builders (`providers.py`), tracing and logging setup
- `context/`: global SQLite context store, ONNX embedding provider, `sqlite-vec` index management, and retrieval/write helpers
@@ -29,8 +30,9 @@ If you are new to the codebase, read in this order:
4. `working_memory.py` and `agents/working_memory.py` for generated Working Memory.
5. `context/store.py` for the canonical SQLite schema and retrieval/write logic.
This is where hybrid search happens: local ONNX embeddings, `sqlite-vec` KNN, SQLite FTS5, and RRF fusion.
6. `agents/tools.py` for the authoritative semantic agent tool surface (`read_trace`, `list_context`, `search_context`, `get_context`, `save_context`, `revise_context`, `archive_context`, `supersede_context`, `count_context`, `note_trace_findings`, `prune_trace_reads`).
7. `agents/extract.py`, `agents/maintain.py`, `agents/ask.py` for PydanticAI agent behavior.
6. `agents/extract/` and `agents/baml_src/` for sync extraction behavior.
7. `agents/tools.py` for the maintain/ask semantic tool surface (`list_context`, `search_context`, `get_context`, `revise_context`, `archive_context`, `supersede_context`, `count_context`).
8. `agents/maintain.py`, `agents/ask.py`, and `agents/working_memory.py` for PydanticAI agent behavior.

## Working Memory flow

2 changes: 1 addition & 1 deletion src/lerim/agents/__init__.py
@@ -1,4 +1,4 @@
"""Agent modules: extract, maintain, ask + shared tools (all PydanticAI)."""
"""Agent modules for extract, maintain, ask, and working-memory flows."""

from __future__ import annotations

60 changes: 60 additions & 0 deletions src/lerim/agents/baml_client/__init__.py
@@ -0,0 +1,60 @@
# ----------------------------------------------------------------------------
#
# Welcome to Baml! To use this generated code, please run the following:
#
# $ pip install baml
#
# ----------------------------------------------------------------------------

# This file was generated by BAML: please do not edit it. Instead, edit the
# BAML files and re-generate this code using: baml-cli generate
# baml-cli is available with the baml package.

__version__ = "0.222.0"

try:
from baml_py.safe_import import EnsureBamlPyImport
except ImportError:
raise ImportError(f"""Update to baml-py required.
Version of baml_client generator (see generators.baml): {__version__}

Please upgrade baml-py to version "{__version__}".

$ pip install baml-py=={__version__}
$ uv add baml-py=={__version__}

If nothing else works, please ask for help:

https://github.com/boundaryml/baml/issues
https://boundaryml.com/discord
""") from None


with EnsureBamlPyImport(__version__) as e:
e.raise_if_incompatible_version(__version__)

from . import types
from . import tracing
from . import stream_types
from . import config
from .config import reset_baml_env_vars

from .sync_client import b

from . import watchers


# FOR LEGACY COMPATIBILITY, expose "partial_types" as an alias for "stream_types"
# WE RECOMMEND USERS TO USE "stream_types" INSTEAD
partial_types = stream_types

__all__ = [
"b",
"stream_types",
"partial_types",
"tracing",
"types",
"reset_baml_env_vars",
"config",
"watchers",
]