An interactive research lab for exploring how different LLM architectures handle long-term memory, visualizing compression, forgetting curves, and retrieval tradeoffs. Styled as an interactive academic paper.
As LLMs are deployed in longer-running applications (agents, assistants, multi-session tools), the questions of what they remember, what they forget, and why become practically important and surprisingly nuanced.
Most practitioners encounter this as a token budget problem: "How do I fit more context into fewer tokens?" But the real problem is richer. Different memory architectures make fundamentally different tradeoffs:
- A sliding window gives you perfect recall within its range and zero beyond it.
- A summarization system trades fidelity for reach. You can go further back, but details blur.
- A RAG system remembers everything in theory, but retrieval quality degrades with corpus size and query complexity.
- A hierarchical system tries to have it all (recent turns verbatim, the medium term compressed, the distant past as keywords), but the tier transitions create their own distortions.
- A graph memory system treats facts as a knowledge graph, where well-connected entities resist forgetting while isolated ones decay faster.
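To make the first two tradeoffs concrete, here is a minimal sketch of per-architecture retention scoring. The function name, decay constants, and formulas are illustrative stand-ins, not the lab's actual simulation code:

```python
import math

def retention(architecture: str, age: int, window: int = 8) -> float:
    """Illustrative retention score (0..1) for a fact `age` turns old."""
    if architecture == "sliding_window":
        # Perfect recall inside the window, zero beyond it.
        return 1.0 if age <= window else 0.0
    if architecture == "summarization":
        # Fidelity blurs smoothly with distance (Ebbinghaus-style decay).
        return math.exp(-age / 20)
    if architecture == "rag":
        # Everything is stored, but retrieval quality erodes slowly.
        return 0.9 / (1 + 0.01 * age)
    raise ValueError(architecture)

for age in (4, 16, 64):
    print(age,
          retention("sliding_window", age),
          round(retention("summarization", age), 3))
```

Note the qualitative shapes: the sliding window is a step function, while summarization decays continuously and never quite hits zero.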
These tradeoffs aren't often visualized. They show up as subtle regressions: an agent that forgets a user preference from 30 turns ago, a retrieval system that breaks on multi-hop questions, a summarizer that discards the one specific number that mattered.
This lab was built to make those tradeoffs visible and interactive: to turn abstract architectural decisions into something you can see, manipulate, and develop intuition about.
The app is structured as six interactive modules, each exploring a different dimension of LLM memory:
| # | Module | What it shows |
|---|---|---|
| 1 | Memory Decay Playground | How a single fact's retention score changes across conversation turns under each architecture. Ebbinghaus forgetting curves adapted per architecture. |
| 2 | Compression Explorer | Token-level importance heatmap of a conversation + side-by-side diff of what survives at different compression ratios. |
| 3 | Retrieval Accuracy Benchmark | Interactive precision/recall charts driven by parameter sliders: chunk size, top-k, embedding model, corpus size, query complexity. |
| 4 | Architecture Comparator | Synchronized side-by-side playback of 5 architectures processing the same conversation. Animated token flow, SVG diagrams, shared memory utilization timeline. |
| 5 | Context Window Visualizer | D3 stacked bar showing how context fills up. Compare four eviction strategies (FIFO, importance-based, recency-weighted, LRU). Drag-and-drop to manually prioritize segments. |
| 6 | Graph Memory Explorer | D3 force-directed knowledge graph growing turn-by-turn. Graph traversal vs. vector similarity comparison. Temporal timeline showing relationship creation and invalidation (inspired by Graphiti, Mem0, and Microsoft GraphRAG). |
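Module 2's core idea, token-level importance deciding what survives compression, can be sketched as follows. The scoring heuristic (rarity plus a bonus for numbers) is a hypothetical stand-in for the lab's real scorer:

```python
import re
from collections import Counter

def importance_scores(text: str) -> list[tuple[str, float]]:
    """Hypothetical heuristic: rare tokens score high, filler scores low,
    and numbers get a bonus (the "one specific number that mattered")."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    freq = Counter(t.lower() for t in tokens)
    scored = []
    for t in tokens:
        score = 1.0 / freq[t.lower()]            # rarity
        if re.fullmatch(r"\d+(\.\d+)?", t):      # numeric bonus
            score += 1.0
        scored.append((t, score))
    return scored

def compress(text: str, ratio: float) -> str:
    """Keep the top `ratio` fraction of tokens by importance, in order."""
    scored = importance_scores(text)
    k = max(1, int(len(scored) * ratio))
    keep = {i for i, _ in sorted(enumerate(scored), key=lambda p: -p[1][1])[:k]}
    return " ".join(t for i, (t, _) in enumerate(scored) if i in keep)

print(compress("the budget is 4200 dollars and the deadline is friday the", 0.3))
```

At an aggressive ratio the frequent filler ("the", "is") disappears first while the number survives, which is exactly the behavior the side-by-side diff in Module 2 makes visible.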
Frontend: Next.js 14 (App Router) · TypeScript · Tailwind CSS · Zustand · Recharts · D3.js · Framer Motion · @dnd-kit
Backend: FastAPI · Python · NumPy · NetworkX · Anthropic SDK
Infrastructure: Docker · nginx · Docker Compose
Design philosophy: Clean minimal / technical paper. CMU Serif headings, numbered figures with captions, muted academic color palette. Every visualization is a "Figure N" container like a published paper. Six architectures each have a fixed color: sliding window=blue, summarization=purple, RAG=teal, hierarchical=amber, infinite=gray, graph memory=pink.
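The four eviction strategies compared in Module 5 reduce to different sort keys over context segments. A minimal sketch, with a hypothetical `Segment` shape rather than the app's actual types:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    turn: int           # turn when the segment entered the context
    importance: float   # 0..1 importance score
    last_access: int    # turn of last retrieval

def evict(segments: list[Segment], strategy: str, current_turn: int) -> Segment:
    """Return the segment each strategy would drop first (illustrative)."""
    if strategy == "fifo":
        return min(segments, key=lambda s: s.turn)          # oldest arrival
    if strategy == "lru":
        return min(segments, key=lambda s: s.last_access)   # least recently used
    if strategy == "importance":
        return min(segments, key=lambda s: s.importance)    # lowest score
    if strategy == "recency_weighted":
        # Importance discounted by age: old, low-importance segments go first.
        return min(segments, key=lambda s: s.importance / (1 + current_turn - s.turn))
    raise ValueError(strategy)
```

Running all four strategies on the same segments shows why the choice matters: LRU drops the highest-importance segment if it hasn't been touched recently, while importance-based eviction happily keeps stale but high-scoring material.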
```
LLM-Memory-Compression-Lab/
├── frontend/                # Next.js application
│   └── src/
│       ├── app/             # Routes: /, /memory-decay, /compression, /retrieval,
│       │                    #         /architecture, /context-window, /graph-memory
│       ├── components/      # layout/, paper/, shared/, charts/, visualizations/, modules/
│       ├── lib/             # simulation/, data/, llm/, utils/
│       ├── hooks/           # useSimulation, useAnimationFrame, useDebouncedValue, etc.
│       ├── stores/          # Zustand simulation store
│       └── types/           # TypeScript definitions for all simulation types
│
├── backend/                 # FastAPI application
│   └── app/
│       ├── simulation/      # compression.py, architecture.py, context_window.py, graph_memory.py
│       ├── routers/         # simulation.py, llm.py
│       └── main.py
│
├── docker-compose.yml
├── docker-compose.prod.yml
└── nginx.conf
```
```
cp .env.example .env        # optionally add ANTHROPIC_API_KEY for live LLM features
docker compose up --build
```

App: http://localhost · API: http://localhost/api
Frontend:

```
cd frontend
npm install
npm run dev                 # http://localhost:3000
```

Backend:

```
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
```

Set ANTHROPIC_API_KEY in your .env to unlock:
- Module 1: generate realistic filler conversation turns
- Module 2: dynamic token importance scoring for your own text
- Module 3: natural language summaries of benchmark findings
- Module 6: live entity/relationship extraction from any conversation
Every module works fully without the API key. The "Use Live LLM" toggle only appears when a key is detected. The key is server-side only and never sent to the browser.
Module 6 is directly inspired by real production graph memory systems:
| Framework | Key idea |
|---|---|
| Mem0 | Entity-relationship triple store with conflict detection and LLM-based resolution |
| Graphiti (Zep) | Bi-temporal knowledge graph that tracks both when a fact occurred and when it was ingested |
| Microsoft GraphRAG | Hierarchical community detection (Leiden algorithm) for corpus-level reasoning |
| Letta / MemGPT | OS-inspired two-tier memory where the agent manages its own memory via tool calls |
| Cognee | RDF-based ontology extraction across 30+ source types |
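The graph-memory behavior described earlier (well-connected entities resist forgetting while isolated ones decay faster) can be modeled by letting a node's degree stretch its decay half-life. A toy sketch with a plain adjacency map standing in for the backend's NetworkX graph; the formula is illustrative, not the lab's actual decay rule:

```python
import math

def entity_retention(degree: int, age_turns: float, half_life: float = 10.0) -> float:
    """Hypothetical decay rule: each connection stretches the effective
    half-life, so hub entities forget more slowly."""
    effective_half_life = half_life * (1 + degree)
    return math.exp(-math.log(2) * age_turns / effective_half_life)

# Toy knowledge graph as an adjacency map.
graph = {
    "Alice": {"Acme", "Paris", "vegan"},   # well-connected entity
    "one-off fact": set(),                 # isolated mention
}
for node, neighbors in graph.items():
    print(node, round(entity_retention(len(neighbors), 20), 3))
```

After 20 turns the isolated fact has dropped to a quarter of its strength (two half-lives), while the degree-3 entity is still above 70%.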
Architecture color key:

| Architecture | Color |
|---|---|
| Sliding Window | Blue #3B82F6 |
| Summarization | Purple #8B5CF6 |
| RAG | Teal #14B8A6 |
| Hierarchical | Amber #F59E0B |
| Infinite Attention | Gray #6B7280 |
| Graph Memory | Pink #EC4899 |
These colors are consistent across every chart, diagram, and animation in the app.