diff --git a/.gitignore b/.gitignore index cb88ec0c..1c45f705 100644 --- a/.gitignore +++ b/.gitignore @@ -57,3 +57,10 @@ claude.scratchpad.md # MCP config (contains secrets) .mcp.json +.mcp.json.bak + +# Claude Code project settings +.claude/ + +# Progress trackers (ephemeral) +scripts/.kg_rebuild_progress.json diff --git a/CLAUDE.md b/CLAUDE.md index c9b6d6ce..083642d1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -50,27 +50,53 @@ brainlayer enrich - Chunking: AST-aware (tree-sitter); never split stack traces; mask large tool output ## Enrichment -- Primary backend: **MLX** (`Qwen2.5-Coder-14B-Instruct-4bit`) on Apple Silicon (port 8080) -- Fallback: Ollama (`glm-4.7-flash`) on port 11434, auto-switches after 3 consecutive MLX failures +- Primary backend: **Groq** (cloud, configured in launchd plist) +- Fallback: Gemini via `enrichment_controller.py`, Ollama as offline last-resort - Override with `BRAINLAYER_ENRICH_BACKEND=ollama|mlx|groq` +- Rate configurable via `BRAINLAYER_ENRICH_RATE` env var (default 0.2 = 12 RPM) - Adds metadata (summary, tags, importance, intent); session enrichment captures decisions/corrections ## Interfaces - Daemon API (core): `/health`, `/stats`, `/search`, `/context/{chunk_id}`, `/session/{session_id}` - Brain graph API: `/brain/graph`, `/brain/node/{node_id}` - Backlog API: `/backlog/items` (GET/POST/PATCH/DELETE) -- MCP tools (9): `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_get_person`, `brain_tags` (legacy `brainlayer_*` aliases still work) +- MCP tools (11): `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_get_person`, `brain_tags`, `brain_supersede`, `brain_archive` (legacy `brainlayer_*` aliases still work) - MCP server entrypoint: `brainlayer-mcp` ## Exports - `brainlayer brain-export` -> graph JSON for dashboard - `brainlayer export-obsidian` -> Markdown vault (backlinks + tags) +## 
Real-time JSONL Watcher +- `brainlayer watch` — persistent watcher for `~/.claude/projects/*.jsonl` +- LaunchAgent: `com.brainlayer.watch.plist` (KeepAlive, Nice=10) +- 4-layer content filters: entry type whitelist → classify → chunk min-length → system-reminder strip +- Offset persistence: `~/.local/share/brainlayer/offsets.json` (survives restarts) +- Rewind detection: file shrink = checkpoint restore → soft-archives reverted chunks +- Axiom telemetry: startup, flush, error, heartbeat (60s) to `brainlayer-watcher` dataset +- Source: `watcher.py` (tailer + indexer), `watcher_bridge.py` (pipeline integration) + +## Chunk Lifecycle +- Columns: `superseded_by`, `aggregated_into`, `archived_at` on chunks table +- Default search excludes lifecycle-managed chunks; `include_archived=True` shows history +- `brain_supersede`: safety gate for personal data (journals, notes, health/finance) +- `brain_archive`: soft-delete with timestamp +- `brain_store` gains `supersedes` param for atomic store-and-replace + +## Session Dedup Coordination +- `/tmp/brainlayer_session_{id}.json` — shared between SessionStart and UserPromptSubmit hooks +- SessionStart writes injected chunk_ids; UserPromptSubmit skips already-injected +- Handoff detection: prompts with "handoff", "session-handoff" skip auto-search +- Module: `hooks/dedup_coordination.py` + ## Data & Locks - DB: `~/.local/share/brainlayer/brainlayer.db` +- Watcher offsets: `~/.local/share/brainlayer/offsets.json` - Prompts cache: `~/.local/share/brainlayer/prompts/` +- Watcher logs: `~/.local/share/brainlayer/logs/watch.{log,err}` - Socket: `/tmp/brainlayer.sock` - Enrichment lock: `/tmp/brainlayer-enrichment.lock` +- Session dedup: `/tmp/brainlayer_session_*.json` ## Bulk DB Operations (SAFETY) 1. 
**Stop enrichment workers first** — never run bulk ops while enrichment is writing (causes WAL bloat + potential freeze) diff --git a/README.md b/README.md index 365ebdee..fb72585b 100644 --- a/README.md +++ b/README.md @@ -1,22 +1,22 @@ # BrainLayer -> Persistent memory and knowledge graph for AI agents — 9 MCP tools, real-time indexing hooks, and a native macOS daemon for always-on recall across every conversation. +> Persistent memory and knowledge graph for AI agents — 11 MCP tools, real-time JSONL watcher, Axiom telemetry, and a native macOS daemon for always-on recall across every conversation. [![PyPI](https://img.shields.io/pypi/v/brainlayer.svg)](https://pypi.org/project/brainlayer/) [![CI](https://github.com/EtanHey/brainlayer/actions/workflows/ci.yml/badge.svg)](https://github.com/EtanHey/brainlayer/actions/workflows/ci.yml) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE) [![Python](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/) -[![MCP](https://img.shields.io/badge/MCP-9%20tools-green.svg)](https://modelcontextprotocol.io) -[![Tests](https://img.shields.io/badge/tests-1%2C083%20Python%20%2B%2054%20Swift-brightgreen.svg)](#testing) +[![MCP](https://img.shields.io/badge/MCP-11%20tools-green.svg)](https://modelcontextprotocol.io) +[![Tests](https://img.shields.io/badge/tests-1%2C204%20Python%20%2B%2054%20Swift-brightgreen.svg)](#testing) [![Docs](https://img.shields.io/badge/docs-etanhey.github.io%2Fbrainlayer-blue.svg)](https://etanhey.github.io/brainlayer) --- -**224,000+ chunks indexed** · **1,083 Python + 54 Swift tests** · **Real-time indexing hooks** · **9 MCP tools** · **BrainBar daemon (209KB)** · **Zero cloud dependencies** +**281,000+ chunks indexed** · **1,204 Python + 54 Swift tests** · **Real-time JSONL watcher** · **11 MCP tools** · **Axiom telemetry** · **BrainBar daemon (209KB)** **Your AI agent forgets everything between sessions.** Every architecture decision, 
every debugging session, every preference you've expressed — gone. You repeat yourself constantly. -BrainLayer fixes this. It's a **local-first memory layer** that gives any MCP-compatible AI agent the ability to remember, think, and recall across conversations. Includes **BrainBar** — a 209KB native macOS daemon that provides always-on memory access. +BrainLayer fixes this. It's a **local-first memory layer** that gives any MCP-compatible AI agent the ability to remember, think, and recall across conversations. Features a **real-time JSONL watcher** that indexes conversations within seconds, **chunk lifecycle management** (supersede, archive, search filtering), and **BrainBar** — a 209KB native macOS daemon for always-on memory access. ``` "What approach did I use for auth last month?" → brain_search @@ -93,40 +93,45 @@ That's it. Your agent now has persistent memory across every conversation. ```mermaid graph LR - A["Claude Code / Cursor / Zed"] -->|MCP| B["BrainLayer MCP Server
9 tools"] + A["Claude Code / Cursor / Zed"] -->|MCP| B["BrainLayer MCP Server
11 tools"] B --> C["Hybrid Search
semantic + keyword (RRF)"] C --> D["SQLite + sqlite-vec
single .db file"] B --> KG["Knowledge Graph
entities + relations"] KG --> D - E["Claude Code JSONL
conversations"] --> F["Pipeline"] - F -->|extract → classify → chunk → embed| D - G["Local LLM
Ollama / MLX"] -->|enrich| D + E["Claude Code JSONL
conversations"] --> W["Real-time Watcher
~1s polling + filters"] + W -->|classify → chunk → insert| D + F["Batch Pipeline"] -->|extract → classify → chunk → embed| D + G["Gemini / Groq"] -->|enrich| D - H["Real-time Hooks"] -->|live per-message| D + H["Session Hooks"] -->|dedup coordination| D I["BrainBar
macOS daemon"] -->|Unix socket MCP| B + J["Axiom"] -.->|telemetry| W ``` -**Everything runs locally.** No cloud accounts, no API keys, no Docker, no database servers. +**Everything runs locally.** No cloud accounts required — Axiom telemetry and cloud enrichment are optional. | Component | Implementation | |-----------|---------------| | Storage | SQLite + [sqlite-vec](https://github.com/asg017/sqlite-vec) (single `.db` file, WAL mode) | | Embeddings | `bge-large-en-v1.5` via sentence-transformers (1024 dims, runs on CPU/MPS) | | Search | Hybrid: vector similarity + FTS5 keyword, merged with Reciprocal Rank Fusion | -| Enrichment | Local LLM via Ollama or MLX — 10-field metadata per chunk | +| Real-time watcher | Polls `~/.claude/projects/` JSONL files (~1s), 4-layer content filters, offset-persistent | +| Chunk lifecycle | Supersede, archive, search filtering — stale knowledge managed, not lost | +| Enrichment | Gemini / Groq cloud or local LLM (Ollama / MLX) — 10-field metadata per chunk | | MCP Server | stdio-based, MCP SDK v1.26+, compatible with any MCP client | -| Clustering | Leiden + UMAP for brain graph visualization (optional) | +| Telemetry | Axiom (`brainlayer-watcher` dataset) — flush metrics, errors, heartbeat, rewind detection | +| Session dedup | Hook coordination file prevents duplicate chunk injection across session lifecycle | | BrainBar | Native macOS daemon (209KB Swift binary) — always-on MCP over Unix socket | -## MCP Tools (9) +## MCP Tools (11) ### Core (4) | Tool | Description | |------|-------------| -| `brain_search` | Semantic search — unified search across query, file_path, chunk_id, filters. | -| `brain_store` | Persist memories — ideas, decisions, learnings, mistakes. Auto-type/auto-importance. | +| `brain_search` | Semantic search — unified search across query, file_path, chunk_id, filters. Lifecycle-aware: excludes superseded/archived by default. | +| `brain_store` | Persist memories — ideas, decisions, learnings, mistakes. 
Auto-type/auto-importance. Optional `supersedes` param for atomic store-and-replace. | | `brain_recall` | Proactive retrieval — current context, sessions, session summaries. | | `brain_tags` | Browse and filter by tag — discover what's in memory without a search query. | @@ -140,6 +145,13 @@ graph LR | `brain_update` | Update, archive, or merge existing memories. | | `brain_get_person` | Person lookup — entity details, interactions, preferences (~200-500ms). | +### Lifecycle (2) + +| Tool | Description | +|------|-------------| +| `brain_supersede` | Mark old memory as replaced by new one. Safety gate: personal data requires explicit confirmation. | +| `brain_archive` | Soft-delete with timestamp. Excluded from default search, accessible via direct lookup. | + ### Backward Compatibility All 14 old `brainlayer_*` names still work as aliases. @@ -161,44 +173,47 @@ BrainLayer enriches each chunk with 10 structured metadata fields using a local | `debt_impact` | `introduction`, `resolution`, `none` | | `external_deps` | "grammy, Supabase, Railway" | -Three enrichment backends (auto-detect: MLX → Ollama → Groq, override via `BRAINLAYER_ENRICH_BACKEND`): +Three enrichment backends (override via `BRAINLAYER_ENRICH_BACKEND`): | Backend | Best for | Speed | |---------|----------|-------| -| **Groq** (cloud) | When local LLMs are unavailable | ~1-2s/chunk | -| **MLX** (Apple Silicon) | M1/M2/M3 Macs (preferred) | 21-87% faster than Ollama | -| **Ollama** | Any platform | ~1s/chunk (short), ~13s (long) | +| **Groq** (cloud) | Primary — fast, reliable | ~1-2s/chunk | +| **Gemini** (cloud) | Batch enrichment via `enrichment_controller.py` | ~0.6s/chunk | +| **Ollama** (local) | Offline fallback | ~1-13s/chunk | ```bash -brainlayer enrich # Default backend (auto-detects) -BRAINLAYER_ENRICH_BACKEND=groq brainlayer enrich --batch-size=100 +brainlayer enrich # Default backend +brainlayer watch # Real-time JSONL watcher (persistent) ``` ## Why BrainLayer? 
| | BrainLayer | Mem0 | Zep/Graphiti | Letta | LangChain Memory | |---|:---:|:---:|:---:|:---:|:---:| -| **MCP native** | 9 tools | 1 server | 1 server | No | No | +| **MCP native** | 11 tools | 1 server | 1 server | No | No | | **Think / Recall** | Yes | No | No | No | No | +| **Chunk lifecycle** | Supersede/archive | Auto-dedup | No | No | No | +| **Real-time watcher** | ~1s JSONL polling | No | No | No | No | | **Local-first** | SQLite | Cloud-first | Cloud-only | Docker+PG | Framework | | **Zero infra** | `pip install` | API key | API key | Docker | Multiple deps | | **Multi-source** | 7 sources | API only | API only | API only | API only | | **Enrichment** | 10 fields | Basic | Temporal | Self-write | None | -| **Session analysis** | Yes | No | No | No | No | -| **Real-time** | Per-message hooks | No | No | No | No | +| **Telemetry** | Axiom | No | No | No | No | | **Open source** | Apache 2.0 | Apache 2.0 | Source-available | Apache 2.0 | MIT | BrainLayer is the only memory layer that: -1. **Thinks before answering** — categorizes past knowledge by intent (decisions, bugs, patterns) instead of raw search results -2. **Runs on a single file** — no database servers, no Docker, no cloud accounts -3. **Works with every MCP client** — 9 tools, instant integration, zero SDK -4. **Knowledge graph** — entities, relations, and person lookup across all indexed data +1. **Indexes in real-time** — JSONL watcher ingests conversations within seconds, not hours +2. **Manages knowledge lifecycle** — supersede stale facts, archive old decisions, search only current knowledge +3. **Runs on a single file** — no database servers, no Docker, no cloud accounts +4. **Works with every MCP client** — 11 tools, instant integration, zero SDK +5. 
**Knowledge graph** — entities, relations, and person lookup across all indexed data ## CLI Reference ```bash brainlayer init # Interactive setup wizard -brainlayer index # Index new conversations +brainlayer index # Batch index conversations +brainlayer watch # Real-time JSONL watcher (persistent, ~1s latency) brainlayer search "query" # Semantic + keyword search brainlayer enrich # Run LLM enrichment on new chunks brainlayer enrich-sessions # Session-level analysis (decisions, learnings) @@ -227,6 +242,8 @@ All configuration is via environment variables: | `GROQ_API_KEY` | (unset) | Groq API key for cloud enrichment backend | | `BRAINLAYER_GROQ_URL` | `https://api.groq.com/openai/v1/chat/completions` | Groq API endpoint | | `BRAINLAYER_GROQ_MODEL` | `llama-3.3-70b-versatile` | Groq model for enrichment | +| `AXIOM_TOKEN` | (unset) | Axiom API token for watcher telemetry (optional) | +| `BRAINLAYER_ENRICH_RATE` | `0.2` | Enrichment requests per second (0.2 = 12 RPM) | ## Optional Extras @@ -237,6 +254,7 @@ pip install "brainlayer[youtube]" # YouTube transcript indexing pip install "brainlayer[ast]" # AST-aware code chunking (tree-sitter) pip install "brainlayer[kg]" # GliNER entity extraction (209M params, EN+HE) pip install "brainlayer[style]" # ChromaDB vector store (alternative backend) +pip install "brainlayer[telemetry]" # Axiom observability (optional — degrades gracefully) pip install "brainlayer[dev]" # Development: pytest, ruff ``` @@ -253,13 +271,13 @@ BrainLayer can index conversations from multiple sources: | Codex CLI | JSONL (`~/.codex/sessions`) | `brainlayer ingest-codex` | | Markdown | Any `.md` files | `brainlayer index --source markdown` | | Manual | Via MCP tool | `brain_store` | -| Real-time | Claude Code hooks | Live per-message indexing (zero-lag) | +| Real-time | `brainlayer watch` LaunchAgent | JSONL watcher (~1s latency, 4-layer filters, checkpoint rewind detection) | ## Testing ```bash pip install -e ".[dev]" -pytest tests/ # Full suite 
(1,083 Python tests) +pytest tests/ # Full suite (1,204 Python tests) pytest tests/ -m "not integration" # Unit tests only (fast) ruff check src/ # Linting # BrainBar (Swift): 54 tests via Xcode diff --git a/src/brainlayer/enrichment_controller.py b/src/brainlayer/enrichment_controller.py index 508eb532..922b77f6 100644 --- a/src/brainlayer/enrichment_controller.py +++ b/src/brainlayer/enrichment_controller.py @@ -114,7 +114,9 @@ def enrich_realtime( store, limit: int = 25, since_hours: int = 24, - rate_per_second: float = 0.2, # 12 RPM — safe for Gemini free-tier 15 RPM limit + rate_per_second: float = float( + os.environ.get("BRAINLAYER_ENRICH_RATE", "0.2") + ), # Default 12 RPM. Tier 1 allows 2000 RPM (~33/s) max_retries: int = 12, chunk_ids: list[str] | None = None, ) -> EnrichmentResult:
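
Review note on the `enrichment_controller.py` hunk: Python evaluates default argument values once, when the `def` statement runs. With the default written as `float(os.environ.get("BRAINLAYER_ENRICH_RATE", "0.2"))`, the environment variable is read at import time, so a later change to it in the same process has no effect, and a non-numeric value raises `ValueError` on import. The sketch below shows one possible way to resolve the rate at call time instead. The helper names `resolve_enrich_rate` and `min_interval_seconds` are hypothetical and do not appear in the diff; only the variable name and the `0.2` default (12 RPM) come from it.

```python
import os
from typing import Optional

# Default taken from the diff: 0.2 requests/second = 12 requests/minute.
DEFAULT_RATE = 0.2


def resolve_enrich_rate(explicit: Optional[float] = None) -> float:
    """Hypothetical helper: choose the enrichment rate at call time.

    An explicit argument wins; otherwise BRAINLAYER_ENRICH_RATE is read
    on every call, so changes to the environment are picked up.
    """
    if explicit is not None:
        rate = explicit
    else:
        raw = os.environ.get("BRAINLAYER_ENRICH_RATE", str(DEFAULT_RATE))
        try:
            rate = float(raw)
        except ValueError:
            # Non-numeric value: fall back instead of failing.
            rate = DEFAULT_RATE
    # Reject zero, negative, and NaN rates (NaN fails the `> 0` comparison).
    if not rate > 0:
        rate = DEFAULT_RATE
    return rate


def min_interval_seconds(rate_per_second: float) -> float:
    """Seconds to wait between requests for a given requests-per-second rate."""
    return 1.0 / rate_per_second
```

With this shape, `enrich_realtime` could declare `rate_per_second: float | None = None` and call `resolve_enrich_rate(rate_per_second)` in its body. That is an assumption about how the function might be restructured, not something the diff does.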