Merged
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.1.83] - 2026-05-14

### Added
- Packaged the BAML source and generated client under `src/lerim/agents/` so future agents can share the same BAML/LangGraph layout.
- Added the production BAML/LangGraph extract package with deterministic trace windowing, typed BAML scans, record synthesis, context-store persistence, and structured graph events.

### Changed
- Replaced sync extraction with the BAML/LangGraph harness while keeping maintain, ask, and working-memory on PydanticAI.
- Updated extraction evals, integration tests, docs, and run artifacts to use graph events instead of PydanticAI extract messages.
- Tuned extraction prompts to avoid storing incidental personal names unless identity itself is the durable context.

### Fixed
- Hardened session catalog/API status paths so catalog storage issues degrade status responses instead of crashing status or maintain.
- Made extraction persistence idempotent when a rebuilt session catalog replays a session whose episode already exists.
- Improved long-running extraction queue handling so transient SQLite heartbeat write failures and sequential processing do not create false stale-running jobs.

### Removed
- Removed the legacy PydanticAI extract agent, extract-only trace tools, history processors, and the experimental `baml_agents/` sidecar.

## [0.1.81] - 2026-04-29

### Fixed
17 changes: 10 additions & 7 deletions README.md
@@ -222,23 +222,26 @@ Project separation happens inside the database by `project_id`.

There is no per-project durable store on disk.

## Agent Tools
## Agent Runtime

The agent-facing tool contract is intentionally small:
The sync extractor uses a BAML plus LangGraph graph under
`src/lerim/agents/`. The graph reads deterministic trace windows, asks BAML
for typed window scans, synthesizes one final record payload, and persists the
result to SQLite.

The maintain, ask, and working-memory flows still use PydanticAI with a small
semantic DB-era tool surface:

- `read_trace`
- `list_context`
- `search_context`
- `get_context`
- `save_context`
- `revise_context`
- `archive_context`
- `supersede_context`
- `count_context`
- `note_trace_findings`
- `prune_trace_reads`

These are the authoritative runtime tool names. Keeping the surface DB-era and semantic makes the runtime easier to reason about and gives smaller future models a cleaner action space for training.
Keeping the surface DB-era and semantic makes the runtime easier to reason
about and gives smaller future models a cleaner action space for training.
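The window-scan loop described above can be sketched in plain Python. This is a minimal sketch, not the real extractor API: `window_size`, the findings shape, and the payload keys are illustrative assumptions; in Lerim the per-window scan is a typed BAML call.

```python
from typing import Callable

def iter_windows(trace_lines: list[str], window_size: int = 4) -> list[list[str]]:
    # Fixed-size slicing means the same trace always yields the same windows.
    return [trace_lines[i:i + window_size] for i in range(0, len(trace_lines), window_size)]

def extract(trace_lines: list[str], scan: Callable[[list[str]], list[str]]) -> dict:
    # Scan every window, then synthesize a single final payload at the end.
    findings: list[str] = []
    for window in iter_windows(trace_lines):
        findings.extend(scan(window))  # typed per-window scan (BAML in Lerim)
    return {"episode_lines": len(trace_lines), "durable_candidates": findings}
```

The deterministic windowing is the point: replaying the same session trace produces the same windows, so scans are reproducible.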

## Common Commands

40 changes: 17 additions & 23 deletions docs/cli/sync.md
@@ -24,29 +24,23 @@ lerim sync --agent claude,codex
```mermaid
flowchart TD
A["Trigger: lerim sync or daemon"] --> B["Discover and queue changed sessions"]
B --> C["Extract agent receives one session trace"]

C --> D["Prompt goal: turn this session into durable project memory"]
D --> E["Agent reads trace chunks with read_trace"]
E --> F{"Has the agent read enough of the trace?"}
F -- "no" --> E
F -- "yes" --> G["Agent identifies candidate memories: episode, decisions, preferences, constraints, facts, references"]

G --> H{"Could this update or duplicate existing memory?"}
H -- "yes" --> I["Use search_context/get_context to inspect existing records"]
H -- "no" --> J["Prepare new records"]

I --> K{"Existing record should change?"}
K -- "revise" --> L["Use revise_context on fetched record"]
K -- "new memory" --> J
K -- "no durable value" --> M["Do not write"]

J --> N["Use save_context for supported durable records"]
L --> O["SQLite context DB + record_versions"]
N --> O
M --> P["Completion summary"]
O --> P
P --> Q["Sync artifacts: manifest, agent log, trace"]
B --> C["Extractor receives one session trace"]

C --> D["Deterministic graph reads the next trace window"]
D --> E["BAML ScanTraceWindow returns typed findings"]
E --> F{"More trace windows?"}
F -- "yes" --> D
F -- "no" --> G["BAML SynthesizeExtractRecords creates one episode and durable candidates"]

G --> H["Persistence normalizes and validates record drafts"]
H --> I{"Durable records present?"}
I -- "yes" --> J["Write active durable records"]
I -- "no" --> K["Write archived episode only"]

J --> L["SQLite context DB + record_versions"]
K --> L
L --> M["Completion summary"]
M --> N["Sync artifacts: manifest, graph events, trace"]
```
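The persistence branch at the bottom of the diagram reduces to a small decision. A minimal sketch, assuming dict-shaped record drafts and a `status` field (both illustrative, not the real store schema), and assuming the episode is written as active alongside durable records:

```python
def persist(episode: dict, durable: list[dict]) -> list[dict]:
    # One episode record always lands; durable candidates decide its status.
    if durable:
        # Durable value found: write the candidates and the episode as active.
        return [dict(r, status="active") for r in durable + [episode]]
    # Nothing durable: keep only an archived episode for the audit trail.
    return [dict(episode, status="archived")]
```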

## Notes
18 changes: 8 additions & 10 deletions docs/concepts/how-it-works.md
@@ -29,27 +29,25 @@ Canonical storage is global:

Projects are scoped by `project_id` inside the database.

## Agent tool surface
## Agent runtime surface

Lerim does not expose raw SQL or file CRUD to the agent.
Lerim does not expose raw SQL or file CRUD to agents.

The durable context tools are:
The sync extractor is a BAML plus LangGraph graph. It reads deterministic trace
windows, scans each window into typed findings, synthesizes records once, and
persists them through the context store.

The maintain, ask, and working-memory flows use PydanticAI. Their semantic
context tools are:

- `read_trace`
- `list_context`
- `search_context`
- `get_context`
- `save_context`
- `revise_context`
- `archive_context`
- `supersede_context`
- `count_context`

The extract flow also uses:

- `note_trace_findings`
- `prune_trace_reads`

Retrieval is hybrid:

- local ONNX embeddings from `mixedbread-ai/mxbai-embed-xsmall-v1`
5 changes: 3 additions & 2 deletions docs/concepts/sync-maintain.md
@@ -7,7 +7,8 @@ clean:
- **Maintain** (cold path) -- refines existing records offline

Both run automatically in the daemon loop and can also be triggered manually.
Both use the same PydanticAI runtime and the `[roles.agent]` role model.
Sync extraction uses the BAML plus LangGraph runtime and the `[roles.agent]`
role model. Maintain uses the PydanticAI runtime with the same role model.

---

Expand All @@ -20,7 +21,7 @@ records:
2. **Index** -- new sessions are cataloged in `sessions.sqlite3`
3. **Match to project** -- sessions matching a registered project are enqueued; unmatched sessions are indexed but not extracted
4. **Compact** -- traces are compacted (tool outputs stripped) and cached
5. **Extract flow** -- the PydanticAI extraction agent (`[roles.agent]`) reads the trace and uses `read_trace`, `note_trace_findings`, `prune_trace_reads`, `search_context`, `get_context`, `save_context`, and `revise_context` to write one episode record plus a small number of durable records into `~/.lerim/context.sqlite3`
5. **Extract flow** -- the BAML plus LangGraph extractor (`[roles.agent]`) reads deterministic trace windows, scans typed findings, synthesizes the final payload, and writes one episode record plus a small number of durable records into `~/.lerim/context.sqlite3`

### Record quality contract

25 changes: 14 additions & 11 deletions docs/configuration/tracing.md
@@ -1,6 +1,6 @@
# Tracing

Lerim uses [MLflow](https://mlflow.org) for PydanticAI agent observability.
Lerim uses [MLflow](https://mlflow.org) for agent observability.
Tracing is opt-in and controlled by `[observability].mlflow_enabled` in
`~/.lerim/config.toml`. The `LERIM_MLFLOW` environment variable can override it
for one-off runs.
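For reference, enabling tracing looks roughly like this. A sketch: only `[observability].mlflow_enabled` and `LERIM_MLFLOW` are named above; the surrounding file shape is assumed.

```toml
# ~/.lerim/config.toml
[observability]
mlflow_enabled = true
```

For a one-off run, set the `LERIM_MLFLOW` environment variable to override this value for that invocation only.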
@@ -9,10 +9,12 @@ for one-off runs.

When tracing is enabled, MLflow records:

- **PydanticAI model calls** -- via `mlflow.pydantic_ai.autolog()`, every language model invocation
across sync/maintain/ask flows is captured automatically, including
input prompts, outputs, token counts, and latency.
- **Agent/tool executions** -- tool calls and agent steps are traced as nested spans within each run.
- **Sync extraction graph** -- the BAML plus LangGraph extractor emits a
top-level `lerim.agent.extract` span with trace metadata and model label.
- **PydanticAI model calls** -- via `mlflow.pydantic_ai.autolog()`, maintain,
ask, and working-memory model invocations are captured automatically,
including input prompts, outputs, token counts, and latency.
- **Agent/tool executions** -- tool calls and agent steps are traced as nested spans within each run when the runtime exposes them.
- **agent_trace.json** -- each sync/maintain run also writes a local
`agent_trace.json` under the run workspace for a full tool/message history
(not MLflow-specific).
@@ -104,17 +106,18 @@ Lerim continues writing traces as long as the server is running with
In the UI, look for:

- **Experiments** -- select the `lerim` experiment.
- **Traces** -- the primary view for PydanticAI autologging. Expand a trace to
see the model/tool span tree.
- **Traces** -- the primary view for Lerim agent spans. Expand a trace to see
the sync graph span or PydanticAI model/tool span tree.
- **Run id** -- match a local run folder to MLflow by searching for the
`manifest.json` `run_id` value. It is also stored as `client_request_id` and
the `lerim.run_id` tag.
- **Model calls** -- every PydanticAI model request is logged with input prompts,
outputs, token counts, and latency.
- **Model calls** -- PydanticAI model requests are logged with input prompts,
outputs, token counts, and latency. Sync extraction model metadata is attached
to the BAML/LangGraph extract span.
- **Spans** -- nested spans show the call hierarchy from the top-level
orchestration down to individual LM calls and tool invocations.

Classic MLflow **Runs** may be empty for PydanticAI traces. That does not mean
Classic MLflow **Runs** may be empty for agent traces. That does not mean
tracing is broken; check the Traces view or verify the SQLite counts below.

!!! tip "Filtering"
@@ -146,7 +149,7 @@ Important files:
- `manifest.json` -- run id, operation, project, session id, artifact paths, and
status. `mlflow_client_request_id` matches the MLflow trace request id.
- `events.jsonl` -- compact started/succeeded/failed events for that run.
- `agent_trace.json` -- serialized PydanticAI messages when available.
- `agent_trace.json` -- serialized graph events or PydanticAI messages when available.
- `agent.log` -- short human-readable agent summary on success.
- `error.json` -- structured error details on failure.
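Matching a local run folder to its MLflow trace can be scripted by reading `manifest.json`. A minimal sketch: only the `run_id` and `mlflow_client_request_id` keys are taken from the text above; the helper name is illustrative.

```python
import json
from pathlib import Path

def mlflow_ids(run_dir: Path) -> tuple[str, str]:
    # Read the run folder's manifest and return the ids used to match MLflow.
    manifest = json.loads((run_dir / "manifest.json").read_text())
    return manifest["run_id"], manifest.get("mlflow_client_request_id", "")
```

Search the MLflow UI for either value; `run_id` is also stored as the `lerim.run_id` tag.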

15 changes: 13 additions & 2 deletions pyproject.toml
@@ -4,13 +4,15 @@ build-backend = "setuptools.build_meta"

[project]
name = "lerim"
version = "0.1.82"
version = "0.1.83"
description = "Continual learning layer for coding agents and software projects."
readme = "README.md"
requires-python = ">=3.11"
license = "BUSL-1.1"
dependencies = [
"pydantic==2.12.5",
"baml-py==0.222.0",
"langgraph==1.2.0",
"pydantic-ai==1.70.0",
"pydantic-evals==1.70.0",
"eval_type_backport==0.3.1; python_version < '3.13'",
@@ -62,7 +64,12 @@ where = ["src"]
include = ["lerim*"]

[tool.setuptools.package-data]
lerim = ["config/default.toml", "skills/*.md", "server/lerim-seccomp.json"]
lerim = [
"agents/baml_src/*.baml",
"config/default.toml",
"skills/*.md",
"server/lerim-seccomp.json",
]

[tool.pytest.ini_options]
pythonpath = ["."]
@@ -84,8 +91,12 @@ select = ["E4", "E7", "E9", "F"]
extend-select = ["B006"]
ignore = ["E501"]

[tool.ruff]
extend-exclude = ["src/lerim/agents/baml_client"]

[tool.vulture]
paths = ["src/lerim/", "vulture_whitelist.py"]
exclude = ["*/src/lerim/agents/baml_client/*"]
min_confidence = 60

[dependency-groups]
10 changes: 6 additions & 4 deletions src/lerim/README.md
@@ -3,13 +3,14 @@
## Summary

This folder contains the Lerim runtime package.
Current architecture is PydanticAI-only for agent execution.
Current architecture uses BAML plus LangGraph for sync extraction, and
PydanticAI for maintain, ask, and working-memory agent execution.
Durable Lerim context now lives in the global SQLite store at `~/.lerim/context.sqlite3`.
Project identity is used to separate records by repo inside that shared DB.

The package is organized by feature boundary:

- `agents/`: agent flows (`extract.py`, `maintain.py`, `ask.py`, `working_memory.py`), semantic context tools (`tools.py`), typed contracts (`contracts.py`)
- `agents/`: agent flows (`extract/`, `maintain.py`, `ask.py`, `working_memory.py`), BAML source/client files (`baml_src/`, `baml_client/`), semantic context tools (`tools.py`), typed contracts (`contracts.py`)
- `server/`: CLI (`cli.py`), HTTP API (`httpd.py`), daemon (`daemon.py`), runtime orchestrator (`runtime.py`), Docker/runtime API helpers (`api.py`)
- `config/`: config loading (`settings.py`), PydanticAI model builders (`providers.py`), tracing and logging setup
- `context/`: global SQLite context store, ONNX embedding provider, `sqlite-vec` index management, and retrieval/write helpers
@@ -29,8 +30,9 @@ If you are new to the codebase, read in this order:
4. `working_memory.py` and `agents/working_memory.py` for generated Working Memory.
5. `context/store.py` for the canonical SQLite schema and retrieval/write logic.
This is where hybrid search happens: local ONNX embeddings, `sqlite-vec` KNN, SQLite FTS5, and RRF fusion.
6. `agents/tools.py` for the authoritative semantic agent tool surface (`read_trace`, `list_context`, `search_context`, `get_context`, `save_context`, `revise_context`, `archive_context`, `supersede_context`, `count_context`, `note_trace_findings`, `prune_trace_reads`).
7. `agents/extract.py`, `agents/maintain.py`, `agents/ask.py` for PydanticAI agent behavior.
6. `agents/extract/` and `agents/baml_src/` for sync extraction behavior.
7. `agents/tools.py` for the maintain/ask semantic tool surface (`list_context`, `search_context`, `get_context`, `revise_context`, `archive_context`, `supersede_context`, `count_context`).
8. `agents/maintain.py`, `agents/ask.py`, and `agents/working_memory.py` for PydanticAI agent behavior.

## Working Memory flow

2 changes: 1 addition & 1 deletion src/lerim/agents/__init__.py
@@ -1,4 +1,4 @@
"""Agent modules: extract, maintain, ask + shared tools (all PydanticAI)."""
"""Agent modules for extract, maintain, ask, and working-memory flows."""

from __future__ import annotations

60 changes: 60 additions & 0 deletions src/lerim/agents/baml_client/__init__.py
@@ -0,0 +1,60 @@
# ----------------------------------------------------------------------------
#
# Welcome to Baml! To use this generated code, please run the following:
#
# $ pip install baml
#
# ----------------------------------------------------------------------------

# This file was generated by BAML: please do not edit it. Instead, edit the
# BAML files and re-generate this code using: baml-cli generate
# baml-cli is available with the baml package.

__version__ = "0.222.0"

try:
from baml_py.safe_import import EnsureBamlPyImport
except ImportError:
raise ImportError(f"""Update to baml-py required.
Version of baml_client generator (see generators.baml): {__version__}

Please upgrade baml-py to version "{__version__}".

$ pip install baml-py=={__version__}
$ uv add baml-py=={__version__}

If nothing else works, please ask for help:

https://github.com/boundaryml/baml/issues
https://boundaryml.com/discord
""") from None


with EnsureBamlPyImport(__version__) as e:
e.raise_if_incompatible_version(__version__)

from . import types
from . import tracing
from . import stream_types
from . import config
from .config import reset_baml_env_vars

from .sync_client import b

from . import watchers


# FOR LEGACY COMPATIBILITY, expose "partial_types" as an alias for "stream_types"
# WE RECOMMEND USERS TO USE "stream_types" INSTEAD
partial_types = stream_types

__all__ = [
"b",
"stream_types",
"partial_types",
"tracing",
"types",
"reset_baml_env_vars",
"config",
"watchers",
]