An interactive research lab for exploring how different LLM architectures handle long-term memory, visualizing compression, forgetting curves, and retrieval tradeoffs. Styled as an interactive academic paper.
As LLMs are deployed in longer-running applications (agents, assistants, multi-session tools), the questions of what they remember, what they forget, and why become practically important and surprisingly nuanced.
Most practitioners encounter this as a token budget problem: "How do I fit more context into fewer tokens?" But the real problem is richer. Different memory architectures make fundamentally different tradeoffs:
- A sliding window gives you perfect recall within its range and zero beyond it.
- A summarization system trades fidelity for reach. You can go further back, but details blur.
- A RAG system remembers everything in theory, but retrieval quality degrades with corpus size and query complexity.
- A hierarchical system tries to have it all (recent turns verbatim, the medium term compressed, the distant past as keywords), but the tier transitions create their own distortions.
- A graph memory system treats facts as a knowledge graph, where well-connected entities resist forgetting while isolated ones decay faster.
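To make the first two tradeoffs concrete, here is a minimal sketch of per-architecture retention scoring. The function name, decay constants, and formulas are illustrative stand-ins, not the lab's actual simulation code:

```python
import math

def retention(architecture: str, age: int, window: int = 8) -> float:
    """Illustrative retention score (0..1) for a fact `age` turns old."""
    if architecture == "sliding_window":
        # Perfect recall inside the window, zero beyond it.
        return 1.0 if age <= window else 0.0
    if architecture == "summarization":
        # Fidelity blurs smoothly with distance (Ebbinghaus-style decay).
        return math.exp(-age / 20)
    if architecture == "rag":
        # Everything is stored, but retrieval quality erodes slowly.
        return 0.9 / (1 + 0.01 * age)
    raise ValueError(architecture)

for age in (4, 16, 64):
    print(age,
          retention("sliding_window", age),
          round(retention("summarization", age), 3))
```

Note the qualitative shapes: the sliding window is a step function, while summarization decays continuously and never quite hits zero.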
These tradeoffs aren't often visualized. They show up as subtle regressions: an agent that forgets a user preference from 30 turns ago, a retrieval system that breaks on multi-hop questions, a summarizer that discards the one specific number that mattered.
This lab was built to make those tradeoffs visible and interactive: to turn abstract architectural decisions into something you can see, manipulate, and develop intuition about.
The app is structured as six interactive modules, each exploring a different dimension of LLM memory:
| # | Module | What it shows |
|---|---|---|
| 1 | Memory Decay Playground | How a single fact's retention score changes across conversation turns under each architecture. Ebbinghaus forgetting curves adapted per architecture. |
| 2 | Compression Explorer | Token-level importance heatmap of a conversation + side-by-side diff of what survives at different compression ratios. |
| 3 | Retrieval Accuracy Benchmark | Interactive precision/recall charts driven by parameter sliders: chunk size, top-k, embedding model, corpus size, query complexity. |
| 4 | Architecture Comparator | Synchronized side-by-side playback of 5 architectures processing the same conversation. Animated token flow, SVG diagrams, shared memory utilization timeline. |
| 5 | Context Window Visualizer | D3 stacked bar showing how context fills up. Compare four eviction strategies (FIFO, importance-based, recency-weighted, LRU). Drag-and-drop to manually prioritize segments. |
| 6 | Graph Memory Explorer | D3 force-directed knowledge graph growing turn-by-turn. Graph traversal vs. vector similarity comparison. Temporal timeline showing relationship creation and invalidation (inspired by Graphiti, Mem0, and Microsoft GraphRAG). |
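Module 2's core idea, token-level importance deciding what survives compression, can be sketched as follows. The scoring heuristic (rarity plus a bonus for numbers) is a hypothetical stand-in for the lab's real scorer:

```python
import re
from collections import Counter

def importance_scores(text: str) -> list[tuple[str, float]]:
    """Hypothetical heuristic: rare tokens score high, filler scores low,
    and numbers get a bonus (the "one specific number that mattered")."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    freq = Counter(t.lower() for t in tokens)
    scored = []
    for t in tokens:
        score = 1.0 / freq[t.lower()]            # rarity
        if re.fullmatch(r"\d+(\.\d+)?", t):      # numeric bonus
            score += 1.0
        scored.append((t, score))
    return scored

def compress(text: str, ratio: float) -> str:
    """Keep the top `ratio` fraction of tokens by importance, in order."""
    scored = importance_scores(text)
    k = max(1, int(len(scored) * ratio))
    keep = {i for i, _ in sorted(enumerate(scored), key=lambda p: -p[1][1])[:k]}
    return " ".join(t for i, (t, _) in enumerate(scored) if i in keep)

print(compress("the budget is 4200 dollars and the deadline is friday the", 0.3))
```

At an aggressive ratio the frequent filler ("the", "is") disappears first while the number survives, which is exactly the behavior the side-by-side diff in Module 2 makes visible.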
Frontend: Next.js 14 (App Router) · TypeScript · Tailwind CSS · Zustand · Recharts · D3.js · Framer Motion · @dnd-kit
Backend: FastAPI · Python · NumPy · NetworkX · Anthropic SDK
Infrastructure: Docker · nginx · Docker Compose
Design philosophy: Clean minimal / technical paper. CMU Serif headings, numbered figures with captions, muted academic color palette. Every visualization is a "Figure N" container like a published paper. Six architectures each have a fixed color: sliding window=blue, summarization=purple, RAG=teal, hierarchical=amber, infinite=gray, graph memory=pink.
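The four eviction strategies compared in Module 5 reduce to different sort keys over context segments. A minimal sketch, with a hypothetical `Segment` shape rather than the app's actual types:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    turn: int           # turn when the segment entered the context
    importance: float   # 0..1 importance score
    last_access: int    # turn of last retrieval

def evict(segments: list[Segment], strategy: str, current_turn: int) -> Segment:
    """Return the segment each strategy would drop first (illustrative)."""
    if strategy == "fifo":
        return min(segments, key=lambda s: s.turn)          # oldest arrival
    if strategy == "lru":
        return min(segments, key=lambda s: s.last_access)   # least recently used
    if strategy == "importance":
        return min(segments, key=lambda s: s.importance)    # lowest score
    if strategy == "recency_weighted":
        # Importance discounted by age: old, low-importance segments go first.
        return min(segments, key=lambda s: s.importance / (1 + current_turn - s.turn))
    raise ValueError(strategy)
```

Running all four strategies on the same segments shows why the choice matters: LRU drops the highest-importance segment if it hasn't been touched recently, while importance-based eviction happily keeps stale but high-scoring material.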
```
LLM-Memory-Compression-Lab/
├── frontend/                # Next.js application
│   └── src/
│       ├── app/             # Routes: /, /memory-decay, /compression, /retrieval,
│       │                    #         /architecture, /context-window, /graph-memory
│       ├── components/      # layout/, paper/, shared/, charts/, visualizations/, modules/
│       ├── lib/             # simulation/, data/, llm/, utils/
│       ├── hooks/           # useSimulation, useAnimationFrame, useDebouncedValue, etc.
│       ├── stores/          # Zustand simulation store
│       └── types/           # TypeScript definitions for all simulation types
│
├── backend/                 # FastAPI application
│   └── app/
│       ├── simulation/      # compression.py, architecture.py, context_window.py, graph_memory.py
│       ├── routers/         # simulation.py, llm.py
│       └── main.py
│
├── docker-compose.yml
├── docker-compose.prod.yml
└── nginx.conf
```
```
cp .env.example .env        # optionally add ANTHROPIC_API_KEY for live LLM features
docker compose up --build
```

App: http://localhost · API: http://localhost/api
Frontend:

```
cd frontend
npm install
npm run dev                 # http://localhost:3000
```

Backend:

```
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
```

Set ANTHROPIC_API_KEY in your .env to unlock:
- Module 1: generate realistic filler conversation turns
- Module 2: dynamic token importance scoring for your own text
- Module 3: natural language summaries of benchmark findings
- Module 6: live entity/relationship extraction from any conversation
Every module works fully without the API key. The "Use Live LLM" toggle only appears when a key is detected. The key is server-side only and never sent to the browser.
Module 6 is directly inspired by real production graph memory systems:
| Framework | Key idea |
|---|---|
| Mem0 | Entity-relationship triple store with conflict detection and LLM-based resolution |
| Graphiti (Zep) | Bi-temporal knowledge graph that tracks both when a fact occurred and when it was ingested |
| Microsoft GraphRAG | Hierarchical community detection (Leiden algorithm) for corpus-level reasoning |
| Letta / MemGPT | OS-inspired two-tier memory where the agent manages its own memory via tool calls |
| Cognee | RDF-based ontology extraction across 30+ source types |
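The graph-memory behavior described earlier (well-connected entities resist forgetting while isolated ones decay faster) can be modeled by letting a node's degree stretch its decay half-life. A toy sketch with a plain adjacency map standing in for the backend's NetworkX graph; the formula is illustrative, not the lab's actual decay rule:

```python
import math

def entity_retention(degree: int, age_turns: float, half_life: float = 10.0) -> float:
    """Hypothetical decay rule: each connection stretches the effective
    half-life, so hub entities forget more slowly."""
    effective_half_life = half_life * (1 + degree)
    return math.exp(-math.log(2) * age_turns / effective_half_life)

# Toy knowledge graph as an adjacency map.
graph = {
    "Alice": {"Acme", "Paris", "vegan"},   # well-connected entity
    "one-off fact": set(),                 # isolated mention
}
for node, neighbors in graph.items():
    print(node, round(entity_retention(len(neighbors), 20), 3))
```

After 20 turns the isolated fact has dropped to a quarter of its strength (two half-lives), while the degree-3 entity is still above 70%.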
Architecture color key:

| Architecture | Color |
|---|---|
| Sliding Window | Blue #3B82F6 |
| Summarization | Purple #8B5CF6 |
| RAG | Teal #14B8A6 |
| Hierarchical | Amber #F59E0B |
| Infinite Attention | Gray #6B7280 |
| Graph Memory | Pink #EC4899 |
These colors are consistent across every chart, diagram, and animation in the app.