diff --git a/.gitignore b/.gitignore
index cb88ec0c..1c45f705 100644
--- a/.gitignore
+++ b/.gitignore
@@ -57,3 +57,10 @@ claude.scratchpad.md
# MCP config (contains secrets)
.mcp.json
+.mcp.json.bak
+
+# Claude Code project settings
+.claude/
+
+# Progress trackers (ephemeral)
+scripts/.kg_rebuild_progress.json
diff --git a/CLAUDE.md b/CLAUDE.md
index c9b6d6ce..083642d1 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -50,27 +50,53 @@ brainlayer enrich
- Chunking: AST-aware (tree-sitter); never split stack traces; mask large tool output
## Enrichment
-- Primary backend: **MLX** (`Qwen2.5-Coder-14B-Instruct-4bit`) on Apple Silicon (port 8080)
-- Fallback: Ollama (`glm-4.7-flash`) on port 11434, auto-switches after 3 consecutive MLX failures
+- Primary backend: **Groq** (cloud, configured in launchd plist)
+- Fallback: Gemini via `enrichment_controller.py`, Ollama as offline last-resort
- Override with `BRAINLAYER_ENRICH_BACKEND=ollama|mlx|groq`
+- Rate configurable via `BRAINLAYER_ENRICH_RATE` env var (default 0.2 = 12 RPM)
- Adds metadata (summary, tags, importance, intent); session enrichment captures decisions/corrections
## Interfaces
- Daemon API (core): `/health`, `/stats`, `/search`, `/context/{chunk_id}`, `/session/{session_id}`
- Brain graph API: `/brain/graph`, `/brain/node/{node_id}`
- Backlog API: `/backlog/items` (GET/POST/PATCH/DELETE)
-- MCP tools (9): `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_get_person`, `brain_tags` (legacy `brainlayer_*` aliases still work)
+- MCP tools (11): `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_get_person`, `brain_tags`, `brain_supersede`, `brain_archive` (legacy `brainlayer_*` aliases still work)
- MCP server entrypoint: `brainlayer-mcp`
## Exports
- `brainlayer brain-export` -> graph JSON for dashboard
- `brainlayer export-obsidian` -> Markdown vault (backlinks + tags)
+## Real-time JSONL Watcher
+- `brainlayer watch` — persistent watcher for `~/.claude/projects/*.jsonl`
+- LaunchAgent: `com.brainlayer.watch.plist` (KeepAlive, Nice=10)
+- 4-layer content filters: entry type whitelist → classify → chunk min-length → system-reminder strip
+- Offset persistence: `~/.local/share/brainlayer/offsets.json` (survives restarts)
+- Rewind detection: a file shrink is treated as a checkpoint restore, and the reverted chunks are soft-archived
+- Axiom telemetry: startup, flush, error, and heartbeat (every 60s) events sent to the `brainlayer-watcher` dataset
+- Source: `watcher.py` (tailer + indexer), `watcher_bridge.py` (pipeline integration)
+
+## Chunk Lifecycle
+- Columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table
+- Default search excludes lifecycle-managed chunks; `include_archived=True` shows history
+- `brain_supersede`: marks an old chunk as replaced by a new one; a safety gate requires explicit confirmation for personal data (journals, notes, health/finance)
+- `brain_archive`: soft-delete with timestamp
+- `brain_store` gains `supersedes` param for atomic store-and-replace
+
+## Session Dedup Coordination
+- `/tmp/brainlayer_session_{id}.json` — shared between SessionStart and UserPromptSubmit hooks
+- SessionStart writes injected chunk_ids; UserPromptSubmit skips already-injected
+- Handoff detection: prompts containing "handoff" or "session-handoff" skip auto-search
+- Module: `hooks/dedup_coordination.py`
+
## Data & Locks
- DB: `~/.local/share/brainlayer/brainlayer.db`
+- Watcher offsets: `~/.local/share/brainlayer/offsets.json`
- Prompts cache: `~/.local/share/brainlayer/prompts/`
+- Watcher logs: `~/.local/share/brainlayer/logs/watch.{log,err}`
- Socket: `/tmp/brainlayer.sock`
- Enrichment lock: `/tmp/brainlayer-enrichment.lock`
+- Session dedup: `/tmp/brainlayer_session_*.json`
## Bulk DB Operations (SAFETY)
1. **Stop enrichment workers first** — never run bulk ops while enrichment is writing (causes WAL bloat + potential freeze)
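The watcher's offset persistence and rewind detection, described in the CLAUDE.md bullets above, can be sketched as follows. This is a minimal sketch that assumes the offsets file maps each JSONL path to a byte offset. The function names (`load_offsets`, `save_offsets`, `tail_new_lines`) and the choice to resume at the new end of file after a rewind are illustrative assumptions. They are not taken from the actual `watcher.py` code.

```python
import json
import os
from pathlib import Path


def load_offsets(offsets_file: Path) -> dict[str, int]:
    """Load persisted byte offsets (assumed layout: {jsonl_path: offset})."""
    if offsets_file.exists():
        return json.loads(offsets_file.read_text())
    return {}


def save_offsets(offsets_file: Path, offsets: dict[str, int]) -> None:
    """Write to a temp file, then rename, so a crash never leaves half-written JSON."""
    tmp = offsets_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(offsets))
    os.replace(tmp, offsets_file)


def tail_new_lines(path: Path, offsets: dict[str, int]) -> tuple[list[str], bool]:
    """Return (new complete lines, rewound) and advance the stored offset in place."""
    key = str(path)
    offset = offsets.get(key, 0)
    size = path.stat().st_size
    if size < offset:
        # File shrank: treat it as a checkpoint restore. Everything still in the
        # file was indexed before, so resume at the new end (assumption); the
        # caller would soft-archive chunks from the reverted tail.
        offsets[key] = size
        return [], True
    with path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline; a partial trailing record waits
    # for the next poll.
    end = data.rfind(b"\n") + 1
    lines = [ln.decode("utf-8") for ln in data[:end].splitlines() if ln.strip()]
    offsets[key] = offset + end
    return lines, False
```

A poll loop would call `tail_new_lines` on each `~/.claude/projects/*.jsonl` file about once per second. It would pass the lines through the content filters and then call `save_offsets` after each flush. Because the sketch consumes only up to the last newline, a half-written JSON record is never parsed early.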
diff --git a/README.md b/README.md
index 365ebdee..fb72585b 100644
--- a/README.md
+++ b/README.md
@@ -1,22 +1,22 @@
# BrainLayer
-> Persistent memory and knowledge graph for AI agents — 9 MCP tools, real-time indexing hooks, and a native macOS daemon for always-on recall across every conversation.
+> Persistent memory and knowledge graph for AI agents — 11 MCP tools, real-time JSONL watcher, Axiom telemetry, and a native macOS daemon for always-on recall across every conversation.
[](https://pypi.org/project/brainlayer/)
[](https://github.com/EtanHey/brainlayer/actions/workflows/ci.yml)
[](LICENSE)
[](https://www.python.org/downloads/)
-[](https://modelcontextprotocol.io)
-[](#testing)
+[](https://modelcontextprotocol.io)
+[](#testing)
[](https://etanhey.github.io/brainlayer)
---
-**224,000+ chunks indexed** · **1,083 Python + 54 Swift tests** · **Real-time indexing hooks** · **9 MCP tools** · **BrainBar daemon (209KB)** · **Zero cloud dependencies**
+**281,000+ chunks indexed** · **1,204 Python + 54 Swift tests** · **Real-time JSONL watcher** · **11 MCP tools** · **Axiom telemetry** · **BrainBar daemon (209KB)**
**Your AI agent forgets everything between sessions.** Every architecture decision, every debugging session, every preference you've expressed — gone. You repeat yourself constantly.
-BrainLayer fixes this. It's a **local-first memory layer** that gives any MCP-compatible AI agent the ability to remember, think, and recall across conversations. Includes **BrainBar** — a 209KB native macOS daemon that provides always-on memory access.
+BrainLayer fixes this. It's a **local-first memory layer** that gives any MCP-compatible AI agent the ability to remember, think, and recall across conversations. Features a **real-time JSONL watcher** that indexes conversations within seconds, **chunk lifecycle management** (supersede, archive, search filtering), and **BrainBar** — a 209KB native macOS daemon for always-on memory access.
```
"What approach did I use for auth last month?" → brain_search
@@ -93,40 +93,45 @@ That's it. Your agent now has persistent memory across every conversation.
```mermaid
graph LR
- A["Claude Code / Cursor / Zed"] -->|MCP| B["BrainLayer MCP Server<br/>9 tools"]
+ A["Claude Code / Cursor / Zed"] -->|MCP| B["BrainLayer MCP Server<br/>11 tools"]
B --> C["Hybrid Search<br/>semantic + keyword (RRF)"]
C --> D["SQLite + sqlite-vec<br/>single .db file"]
B --> KG["Knowledge Graph<br/>entities + relations"]
KG --> D
- E["Claude Code JSONL<br/>conversations"] --> F["Pipeline"]
- F -->|extract → classify → chunk → embed| D
- G["Local LLM<br/>Ollama / MLX"] -->|enrich| D
+ E["Claude Code JSONL<br/>conversations"] --> W["Real-time Watcher<br/>~1s polling + filters"]
+ W -->|classify → chunk → insert| D
+ F["Batch Pipeline"] -->|extract → classify → chunk → embed| D
+ G["Gemini / Groq"] -->|enrich| D
- H["Real-time Hooks"] -->|live per-message| D
+ H["Session Hooks"] -->|dedup coordination| D
I["BrainBar<br/>macOS daemon"] -->|Unix socket MCP| B
+ J["Axiom"] -.->|telemetry| W
```
-**Everything runs locally.** No cloud accounts, no API keys, no Docker, no database servers.
+**Storage, search, and indexing run locally.** No cloud account is required. Groq/Gemini enrichment and Axiom telemetry are optional, and Ollama serves as an offline enrichment fallback.
| Component | Implementation |
|-----------|---------------|
| Storage | SQLite + [sqlite-vec](https://github.com/asg017/sqlite-vec) (single `.db` file, WAL mode) |
| Embeddings | `bge-large-en-v1.5` via sentence-transformers (1024 dims, runs on CPU/MPS) |
| Search | Hybrid: vector similarity + FTS5 keyword, merged with Reciprocal Rank Fusion |
-| Enrichment | Local LLM via Ollama or MLX — 10-field metadata per chunk |
+| Real-time watcher | Polls `~/.claude/projects/` JSONL files (~1s), 4-layer content filters, offset-persistent |
+| Chunk lifecycle | Supersede, archive, search filtering — stale knowledge managed, not lost |
+| Enrichment | Gemini / Groq cloud or local LLM (Ollama / MLX) — 10-field metadata per chunk |
| MCP Server | stdio-based, MCP SDK v1.26+, compatible with any MCP client |
-| Clustering | Leiden + UMAP for brain graph visualization (optional) |
+| Telemetry | Axiom (`brainlayer-watcher` dataset) — flush metrics, errors, heartbeat, rewind detection |
+| Session dedup | Hook coordination file prevents duplicate chunk injection across session lifecycle |
| BrainBar | Native macOS daemon (209KB Swift binary) — always-on MCP over Unix socket |
-## MCP Tools (9)
+## MCP Tools (11)
### Core (4)
| Tool | Description |
|------|-------------|
-| `brain_search` | Semantic search — unified search across query, file_path, chunk_id, filters. |
-| `brain_store` | Persist memories — ideas, decisions, learnings, mistakes. Auto-type/auto-importance. |
+| `brain_search` | Semantic search — unified search across query, file_path, chunk_id, filters. Lifecycle-aware: excludes superseded/archived by default. |
+| `brain_store` | Persist memories — ideas, decisions, learnings, mistakes. Auto-type/auto-importance. Optional `supersedes` param for atomic store-and-replace. |
| `brain_recall` | Proactive retrieval — current context, sessions, session summaries. |
| `brain_tags` | Browse and filter by tag — discover what's in memory without a search query. |
@@ -140,6 +145,13 @@ graph LR
| `brain_update` | Update, archive, or merge existing memories. |
| `brain_get_person` | Person lookup — entity details, interactions, preferences (~200-500ms). |
+### Lifecycle (2)
+
+| Tool | Description |
+|------|-------------|
+| `brain_supersede` | Mark an old memory as replaced by a new one. Safety gate: personal data requires explicit confirmation. |
+| `brain_archive` | Soft-delete with timestamp. Excluded from default search, accessible via direct lookup. |
+
### Backward Compatibility
All 14 old `brainlayer_*` names still work as aliases.
@@ -161,44 +173,47 @@ BrainLayer enriches each chunk with 10 structured metadata fields using a local
| `debt_impact` | `introduction`, `resolution`, `none` |
| `external_deps` | "grammy, Supabase, Railway" |
-Three enrichment backends (auto-detect: MLX → Ollama → Groq, override via `BRAINLAYER_ENRICH_BACKEND`):
+Enrichment backends (`BRAINLAYER_ENRICH_BACKEND` accepts `groq`, `ollama`, or `mlx`; Gemini runs via `enrichment_controller.py`):
| Backend | Best for | Speed |
|---------|----------|-------|
-| **Groq** (cloud) | When local LLMs are unavailable | ~1-2s/chunk |
-| **MLX** (Apple Silicon) | M1/M2/M3 Macs (preferred) | 21-87% faster than Ollama |
-| **Ollama** | Any platform | ~1s/chunk (short), ~13s (long) |
+| **Groq** (cloud) | Primary — fast, reliable | ~1-2s/chunk |
+| **Gemini** (cloud) | Batch enrichment via `enrichment_controller.py` | ~0.6s/chunk |
+| **Ollama** (local) | Offline fallback | ~1-13s/chunk |
```bash
-brainlayer enrich # Default backend (auto-detects)
-BRAINLAYER_ENRICH_BACKEND=groq brainlayer enrich --batch-size=100
+brainlayer enrich # Default backend
+brainlayer watch # Real-time JSONL watcher (persistent)
```
## Why BrainLayer?
| | BrainLayer | Mem0 | Zep/Graphiti | Letta | LangChain Memory |
|---|:---:|:---:|:---:|:---:|:---:|
-| **MCP native** | 9 tools | 1 server | 1 server | No | No |
+| **MCP native** | 11 tools | 1 server | 1 server | No | No |
| **Think / Recall** | Yes | No | No | No | No |
+| **Chunk lifecycle** | Supersede/archive | Auto-dedup | No | No | No |
+| **Real-time watcher** | ~1s JSONL polling | No | No | No | No |
| **Local-first** | SQLite | Cloud-first | Cloud-only | Docker+PG | Framework |
| **Zero infra** | `pip install` | API key | API key | Docker | Multiple deps |
| **Multi-source** | 7 sources | API only | API only | API only | API only |
| **Enrichment** | 10 fields | Basic | Temporal | Self-write | None |
-| **Session analysis** | Yes | No | No | No | No |
-| **Real-time** | Per-message hooks | No | No | No | No |
+| **Telemetry** | Axiom | No | No | No | No |
| **Open source** | Apache 2.0 | Apache 2.0 | Source-available | Apache 2.0 | MIT |
BrainLayer is the only memory layer that:
-1. **Thinks before answering** — categorizes past knowledge by intent (decisions, bugs, patterns) instead of raw search results
-2. **Runs on a single file** — no database servers, no Docker, no cloud accounts
-3. **Works with every MCP client** — 9 tools, instant integration, zero SDK
-4. **Knowledge graph** — entities, relations, and person lookup across all indexed data
+1. **Indexes in real-time** — JSONL watcher ingests conversations within seconds, not hours
+2. **Manages knowledge lifecycle** — supersede stale facts, archive old decisions, search only current knowledge
+3. **Runs on a single file** — no database servers, no Docker, no mandatory cloud accounts
+4. **Works with every MCP client** — 11 tools, instant integration, zero SDK
+5. **Knowledge graph** — entities, relations, and person lookup across all indexed data
## CLI Reference
```bash
brainlayer init # Interactive setup wizard
-brainlayer index # Index new conversations
+brainlayer index # Batch index conversations
+brainlayer watch # Real-time JSONL watcher (persistent, ~1s latency)
brainlayer search "query" # Semantic + keyword search
brainlayer enrich # Run LLM enrichment on new chunks
brainlayer enrich-sessions # Session-level analysis (decisions, learnings)
@@ -227,6 +242,8 @@ All configuration is via environment variables:
| `GROQ_API_KEY` | (unset) | Groq API key for cloud enrichment backend |
| `BRAINLAYER_GROQ_URL` | `https://api.groq.com/openai/v1/chat/completions` | Groq API endpoint |
| `BRAINLAYER_GROQ_MODEL` | `llama-3.3-70b-versatile` | Groq model for enrichment |
+| `AXIOM_TOKEN` | (unset) | Axiom API token for watcher telemetry (optional) |
+| `BRAINLAYER_ENRICH_RATE` | `0.2` | Enrichment requests per second (0.2 = 12 RPM) |
## Optional Extras
@@ -237,6 +254,7 @@ pip install "brainlayer[youtube]" # YouTube transcript indexing
pip install "brainlayer[ast]" # AST-aware code chunking (tree-sitter)
pip install "brainlayer[kg]" # GliNER entity extraction (209M params, EN+HE)
pip install "brainlayer[style]" # ChromaDB vector store (alternative backend)
+pip install "brainlayer[telemetry]" # Axiom observability (optional — degrades gracefully)
pip install "brainlayer[dev]" # Development: pytest, ruff
```
@@ -253,13 +271,13 @@ BrainLayer can index conversations from multiple sources:
| Codex CLI | JSONL (`~/.codex/sessions`) | `brainlayer ingest-codex` |
| Markdown | Any `.md` files | `brainlayer index --source markdown` |
| Manual | Via MCP tool | `brain_store` |
-| Real-time | Claude Code hooks | Live per-message indexing (zero-lag) |
+| Real-time | `brainlayer watch` LaunchAgent | JSONL watcher (~1s latency, 4-layer filters, checkpoint rewind detection) |
## Testing
```bash
pip install -e ".[dev]"
-pytest tests/ # Full suite (1,083 Python tests)
+pytest tests/ # Full suite (1,204 Python tests)
pytest tests/ -m "not integration" # Unit tests only (fast)
ruff check src/ # Linting
# BrainBar (Swift): 54 tests via Xcode
diff --git a/src/brainlayer/enrichment_controller.py b/src/brainlayer/enrichment_controller.py
index 508eb532..922b77f6 100644
--- a/src/brainlayer/enrichment_controller.py
+++ b/src/brainlayer/enrichment_controller.py
@@ -114,7 +114,9 @@ def enrich_realtime(
     store,
     limit: int = 25,
     since_hours: int = 24,
-    rate_per_second: float = 0.2,  # 12 RPM — safe for Gemini free-tier 15 RPM limit
+    rate_per_second: float = float(
+        os.environ.get("BRAINLAYER_ENRICH_RATE", "0.2")
+    ),  # Read once at import. Default 0.2/s = 12 RPM; Tier 1 allows 2000 RPM (~33/s)
     max_retries: int = 12,
     chunk_ids: list[str] | None = None,
 ) -> EnrichmentResult: