VPanjeta/LLM-Memory-Compression-Lab
LLM Memory Compression Lab

An interactive research lab for exploring how different LLM architectures handle long-term memory, visualizing compression, forgetting curves, and retrieval tradeoffs. Styled as an interactive academic paper.


Why This Project Exists

As LLMs are deployed in longer-running applications (agents, assistants, multi-session tools), the question of what they remember, what they forget, and why becomes practically important and surprisingly nuanced.

Most practitioners encounter this as a token budget problem: "How do I fit more context into fewer tokens?" But the real problem is richer. Different memory architectures make fundamentally different tradeoffs:

  • A sliding window gives you perfect recall within its range and zero beyond it.
  • A summarization system trades fidelity for reach. You can go further back, but details blur.
  • A RAG system remembers everything in theory, but retrieval quality degrades with corpus size and query complexity.
  • A hierarchical system tries to have it all (recent turns verbatim, mid-range turns compressed, distant turns as keywords), but the tier transitions create their own distortions.
  • A graph memory system treats facts as a knowledge graph, where well-connected entities resist forgetting while isolated ones decay faster.

These tradeoffs are rarely visualized; instead, they show up as subtle regressions: an agent that forgets a user preference from 30 turns ago, a retrieval system that breaks on multi-hop questions, a summarizer that discards the one specific number that mattered.

This lab was built to make those tradeoffs visible and interactive: to turn abstract architectural decisions into something you can see, manipulate, and develop intuition about.
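The first two tradeoffs above can be made concrete with toy retention functions (illustrative assumptions, not the lab's actual simulation code): a sliding window is a step function, while a summarization pipeline decays smoothly, loosely following an Ebbinghaus-style exponential.

```python
import math

def sliding_window_retention(age: int, window: int = 20) -> float:
    """Perfect recall inside the window, nothing beyond it."""
    return 1.0 if age < window else 0.0

def summarization_retention(age: int, half_life: float = 15.0) -> float:
    """Fidelity blurs gradually as facts recede into older summaries."""
    return math.exp(-age * math.log(2) / half_life)

# A fact mentioned 30 turns ago: gone entirely under a 20-turn window,
# but still partially recoverable (~0.25 retention) under summarization.
age = 30
print(sliding_window_retention(age))
print(round(summarization_retention(age), 3))
```

The window size and half-life here are arbitrary; the point is the shape of the curves, not the constants.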


What's Inside

The app is structured as six interactive modules, each exploring a different dimension of LLM memory:

| # | Module | What it shows |
|---|--------|---------------|
| 1 | Memory Decay Playground | How a single fact's retention score changes across conversation turns under each architecture. Ebbinghaus forgetting curves adapted per architecture. |
| 2 | Compression Explorer | Token-level importance heatmap of a conversation, plus a side-by-side diff of what survives at different compression ratios. |
| 3 | Retrieval Accuracy Benchmark | Interactive precision/recall charts driven by parameter sliders: chunk size, top-k, embedding model, corpus size, query complexity. |
| 4 | Architecture Comparator | Synchronized side-by-side playback of 5 architectures processing the same conversation. Animated token flow, SVG diagrams, shared memory utilization timeline. |
| 5 | Context Window Visualizer | D3 stacked bar showing how context fills up. Compare four eviction strategies (FIFO, importance-based, recency-weighted, LRU). Drag-and-drop to manually prioritize segments. |
| 6 | Graph Memory Explorer | D3 force-directed knowledge graph growing turn-by-turn. Graph traversal vs. vector similarity comparison. Temporal timeline showing relationship creation and invalidation (inspired by Graphiti, Mem0, and Microsoft GraphRAG). |
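The four eviction strategies compared in Module 5 can be sketched as scoring rules that keep the highest-scoring segments under a token budget. This is an illustrative sketch with assumed field names (`tokens`, `turn`, `importance`, `last_access`), not the app's simulation code:

```python
def evict(segments, budget, strategy="fifo"):
    """Keep the highest-scoring segments that fit in `budget` tokens.

    Each segment is a dict with 'tokens', 'turn' (when it was added),
    'importance' (0..1), and 'last_access'. Higher score = kept longer.
    """
    score = {
        "fifo": lambda s: s["turn"],              # oldest evicted first
        "importance": lambda s: s["importance"],  # least important first
        "lru": lambda s: s["last_access"],        # least recently used first
        # Toy blend of recency and importance; assumes comparable scales.
        "recency_weighted": lambda s: 0.5 * s["turn"] + 0.5 * s["importance"],
    }[strategy]
    kept, used = [], 0
    for seg in sorted(segments, key=score, reverse=True):
        if used + seg["tokens"] <= budget:
            kept.append(seg)
            used += seg["tokens"]
    return sorted(kept, key=lambda s: s["turn"])  # restore conversation order
```

Under a tight budget, FIFO drops the oldest turns regardless of content, while importance-based eviction can keep a critical early segment at the cost of a recent filler turn.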

Tech Stack

Frontend: Next.js 14 (App Router) · TypeScript · Tailwind CSS · Zustand · Recharts · D3.js · Framer Motion · @dnd-kit

Backend: FastAPI · Python · NumPy · NetworkX · Anthropic SDK

Infrastructure: Docker · nginx · Docker Compose

Design philosophy: Clean minimal / technical paper. CMU Serif headings, numbered figures with captions, muted academic color palette. Every visualization is a "Figure N" container like a published paper. Six architectures each have a fixed color: sliding window=blue, summarization=purple, RAG=teal, hierarchical=amber, infinite=gray, graph memory=pink.


Project Structure

```
LLM-Memory-Compression-Lab/
├── frontend/               # Next.js application
│   └── src/
│       ├── app/            # Routes: /, /memory-decay, /compression, /retrieval,
│       │                   #         /architecture, /context-window, /graph-memory
│       ├── components/     # layout/, paper/, shared/, charts/, visualizations/, modules/
│       ├── lib/            # simulation/, data/, llm/, utils/
│       ├── hooks/          # useSimulation, useAnimationFrame, useDebouncedValue, etc.
│       ├── stores/         # Zustand simulation store
│       └── types/          # TypeScript definitions for all simulation types
│
├── backend/                # FastAPI application
│   └── app/
│       ├── simulation/     # compression.py, architecture.py, context_window.py, graph_memory.py
│       ├── routers/        # simulation.py, llm.py
│       └── main.py
│
├── docker-compose.yml
├── docker-compose.prod.yml
└── nginx.conf
```

Getting Started

With Docker (recommended)

```bash
cp .env.example .env          # optionally add ANTHROPIC_API_KEY for live LLM features
docker compose up --build
```

App: http://localhost · API: http://localhost/api

Local development

Frontend:

```bash
cd frontend
npm install
npm run dev       # http://localhost:3000
```

Backend:

```bash
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
```

Optional: Live LLM Features

Set ANTHROPIC_API_KEY in your .env to unlock:

  • Module 1: generate realistic filler conversation turns
  • Module 2: dynamic token importance scoring for your own text
  • Module 3: natural language summaries of benchmark findings
  • Module 6: live entity/relationship extraction from any conversation

Every module works fully without the API key. The "Use Live LLM" toggle only appears when a key is detected. The key is server-side only and never sent to the browser.
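One plausible pattern for the key detection described above (a sketch under assumptions; the function name is hypothetical and this is not the repo's actual code): the backend exposes only a boolean, never the key, and the frontend uses that boolean to decide whether to render the toggle.

```python
import os

def llm_status() -> dict:
    """Report whether a live-LLM key is configured, server-side only.

    Only this boolean ever crosses to the browser; the key itself
    stays in the backend environment.
    """
    return {"live_llm_available": bool(os.environ.get("ANTHROPIC_API_KEY"))}
```

In the real app this would sit behind a FastAPI route; the essential point is that the response carries no secret material.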


Graph Memory Frameworks Referenced

Module 6 is directly inspired by real production graph memory systems:

| Framework | Key idea |
|-----------|----------|
| Mem0 | Entity-relationship triple store with conflict detection and LLM-based resolution |
| Graphiti (Zep) | Bi-temporal knowledge graph that tracks both when a fact occurred and when it was ingested |
| Microsoft GraphRAG | Hierarchical community detection (Leiden algorithm) for corpus-level reasoning |
| Letta / MemGPT | OS-inspired two-tier memory where the agent manages its own memory via tool calls |
| Cognee | RDF-based ontology extraction across 30+ source types |
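Graphiti's bi-temporal idea and Mem0's conflict resolution combine naturally, and can be sketched in a few lines of plain Python (field names are assumptions; the real systems are far richer): a superseded fact is invalidated rather than deleted, so both world time and ingestion time remain queryable.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    occurred_at: int   # turn when the fact became true in the world
    ingested_at: int   # turn when the system learned about it
    invalidated_at: Optional[int] = None  # turn when a newer fact superseded it

def add_fact(store: List[Fact], fact: Fact) -> None:
    # Conflict detection: a new value for the same (subject, predicate)
    # invalidates the old edge instead of deleting it, preserving history.
    for old in store:
        if (old.subject, old.predicate) == (fact.subject, fact.predicate) \
                and old.invalidated_at is None:
            old.invalidated_at = fact.ingested_at
    store.append(fact)

store: List[Fact] = []
add_fact(store, Fact("user", "lives_in", "Berlin", occurred_at=1, ingested_at=1))
add_fact(store, Fact("user", "lives_in", "Lisbon", occurred_at=40, ingested_at=42))
# The Berlin fact is now invalidated as of turn 42, but still queryable
# for questions like "where did the user live at turn 10?"
```

This is the distinction Module 6's temporal timeline visualizes: relationship creation and invalidation as separate events.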

Architecture Color Reference

| Architecture | Color |
|--------------|-------|
| Sliding Window | Blue `#3B82F6` |
| Summarization | Purple `#8B5CF6` |
| RAG | Teal `#14B8A6` |
| Hierarchical | Amber `#F59E0B` |
| Infinite Attention | Gray `#6B7280` |
| Graph Memory | Pink `#EC4899` |

These colors are consistent across every chart, diagram, and animation in the app.
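In practice this kind of cross-chart consistency usually comes from a single shared constant. A minimal sketch (the constant name and keys are assumptions, not the repo's actual identifiers; hex values are from the table above):

```python
# Single source of truth for per-architecture colors, imported by every
# chart, diagram, and animation instead of hard-coding hex values locally.
ARCHITECTURE_COLORS = {
    "sliding_window": "#3B82F6",      # blue
    "summarization": "#8B5CF6",       # purple
    "rag": "#14B8A6",                 # teal
    "hierarchical": "#F59E0B",        # amber
    "infinite_attention": "#6B7280",  # gray
    "graph_memory": "#EC4899",        # pink
}
```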
