🦐 ShrimPK

Push-based AI memory where memories find YOU.

3.50ms echo at 100K memories • +38% more accurate than plain LLM • 15.7% token savings • multimodal text + vision + speech

The Problem

AI tools forget everything between sessions. You re-explain your stack, preferences, and project context every single time. Standard RAG requires you to search. You shouldn't have to.

"What framework did I use for the API?"

Without ShrimPK: "I don't have access to your previous conversations."

With ShrimPK: "You chose FastAPI for your REST API, with SQLAlchemy for type-safe ORM queries."

How It Works

ShrimPK inverts the memory paradigm. Instead of searching for memories, memories find you.

┌─────────────────────────────────────────────────────────┐
│  1. You Converse                                        │
│     Just talk normally. ShrimPK stores context           │
│     automatically. No "remember this" needed.           │
├─────────────────────────────────────────────────────────┤
│  2. Memories Self-Activate                              │
│     When you mention something related, stored           │
│     memories activate through Hebbian associations      │
│     and surface relevant context — automatically.       │
├─────────────────────────────────────────────────────────┤
│  3. AI Knows You                                        │
│     Your name persists for a year. Preferences for      │
│     months. Casual chats fade in days. Just like        │
│     human memory.                                       │
└─────────────────────────────────────────────────────────┘

The Reef Ecosystem

In nature, cleaner shrimp maintain entire reef ecosystems — removing parasites, cleaning wounds, keeping everything healthy. ShrimPK does the same for your AI memory.

	Role	What It Does
🦐 ShrimPK	The Shrimp	Maintains your AI memory reef — stores, classifies, decays, consolidates
🪸 Echo Memory	The Reef	The associative memory structure — LSH, Bloom filters, Hebbian learning
🦞 You	The Lobster	You just talk. Memories come to you. No searching, no managing.
🌊 Push Activation	The Current	The autonomous flow that delivers memories without being asked

Performance

Metric	Value
Echo P50 at 100K memories	3.50ms
Echo P95 at 100K memories	6.88ms
Head-to-head accuracy	+38% vs plain LLM
Personalization rate	100%
Token savings	15.7% per request
Follow-up elimination	100%
RAM (1M memories, f32)	~1.8 GB
RAM (1M memories, binary)	~150 MB

Multimodal Memory

ShrimPK v0.5.0 introduces a 3-channel architecture: text, vision, and speech. Each channel has its own embedding model, LSH index, and persistence section -- unified under a single Echo Memory engine.

┌─────────────────────────────────────────────────────────┐
│               ShrimPK Echo Memory                       │
├──────────────┬──────────────┬───────────────────────────┤
│  Text (384d) │ Vision (512d)│ Speech (896d)             │
│  BGE-small   │ CLIP ViT-B-32│ ECAPA-TDNN+Whisper-tiny  │
│  LSH 16×10   │ LSH 16×10   │ LSH 16×10                │
├──────────────┴──────────────┴───────────────────────────┤
│  Cross-Modal Retrieval: text query → image result       │
│  Auto-Mode: searches all channels, deduplicates         │
└─────────────────────────────────────────────────────────┘

Cross-Modal Retrieval

Store an image. Query with text. ShrimPK finds it.

# Store an image
shrimpk store-image photo.jpg --tag "kitchen morning"

# Query with text — finds the image
shrimpk echo --modality vision "where's the cup?"
# → photo.jpg (similarity: 0.82) — CLIP matched "cup" to image content

# Auto mode searches all channels
shrimpk echo "what did I see this morning?"
# → text memories + image memories, deduplicated by score

Enabling Modalities

Vision and speech are compile-time feature flags. Vision is enabled by default; speech is architecture-ready (models wired in a future release).

# ~/.shrimpk-kernel/config.toml
enabled_modalities = ["text", "vision"]
vision_embedding_dim = 512
speech_embedding_dim = 896

# Build with vision support (default)
cargo build --release --features vision

# Build with all modalities
cargo build --release --features "vision,speech"

CLI

shrimpk store-image photo.jpg            # Store image via CLIP
shrimpk store-image screenshot.png --tag "bug report"
shrimpk echo --modality vision "red car"  # Vision-only search
shrimpk echo --modality auto "morning"    # Search all channels
shrimpk stats                             # Shows text_count, vision_count, speech_count

API

# Store image via daemon
curl -X POST localhost:11435/api/store_image \
  -F "file=@photo.jpg" -F "tag=kitchen"

# Echo with modality
curl -X POST localhost:11435/api/echo \
  -H "Content-Type: application/json" \
  -d '{"query":"where is the cup?","modality":"vision"}'

Speech Channel (KS50)

The speech channel combines two permissive-license models into a 640-dimensional embedding:

ECAPA-TDNN (256d, Apache-2.0) — speaker identity
Whisper-tiny encoder (384d, MIT) — prosody / rhythm / pace

Both models auto-download from HuggingFace Hub (~58 MB total) on first use and are cached locally. When enabled, shrimpk store-audio recording.wav works the same as image storage.

Scaling

The multimodal engine scales from Raspberry Pi to data center. Per-channel LSH indices keep retrieval sub-linear regardless of memory count. Vision adds ~512 bytes per stored image embedding; speech adds ~896 bytes. RAM auto-detection adjusts budgets per channel.

The Ollama Model

ShrimPK runs as a background daemon — just like Ollama. Install once, it serves every AI tool on your machine.

shrimpk-daemon                    ← runs on localhost:11435
  Model loads ONCE (~3s)          ← then serves forever
  Auto-consolidation              ← every 5 min
  Any client connects via HTTP    ← CLI, MCP, hooks, your app

No cold starts. No process spawning. Sub-5ms responses.

# Start the daemon
shrimpk-daemon

# Store a memory (via HTTP — instant)
curl -X POST localhost:11435/api/store \
  -H "Content-Type: application/json" \
  -d '{"text":"I prefer Rust for backend services","source":"cli"}'

# Echo memories (via HTTP — 3.5ms)
curl -X POST localhost:11435/api/echo \
  -H "Content-Type: application/json" \
  -d '{"query":"What language for the backend?"}'

Install

One-liner (recommended)

curl -fsSL https://raw.githubusercontent.com/bellkisai/kernel/master/scripts/install-remote.sh | sh

Installs pre-built binaries to ~/.shrimpk/bin/, registers the MCP server, and starts the daemon.

Docker

docker run -d --name shrimpk -p 11435:11435 -v shrimpk-data:/data bellkisai/shrimpk

From source

git clone https://github.com/bellkisai/kernel.git && cd kernel
cargo build --release -p shrimpk-cli -p shrimpk-mcp -p shrimpk-daemon -p shrimpk-tray
bash scripts/install.sh   # or: powershell scripts/install.ps1

GitHub Releases

Download pre-built binaries for your platform from Releases. Available for Linux (x86_64, aarch64), macOS (Apple Silicon, Intel), and Windows.

Verify

curl http://localhost:11435/health          # daemon running?
shrimpk status                              # system overview
shrimpk store "I prefer Rust"               # store a memory
shrimpk echo "What language do I like?"     # recall it

Proxy — Zero-Config Memory for Any LLM

Point your LLM client at ShrimPK instead of your provider. Every request gets transparent memory injection.

# Start with smart defaults (expands "ollama" to localhost:11434)
shrimpk-daemon --proxy-to ollama

# Now use localhost:11435 instead of localhost:11434
# Open WebUI, Chatbox, or any OpenAI-compatible client — just change the port

The daemon auto-detects local providers (Ollama, LM Studio, vLLM, Jan, LocalAI, GPT4All) and routes by model name. You'll see Memories injected: N in the daemon logs for every request.

Provider	Default Port	Flag
Ollama	11434	`--proxy-to ollama`
LM Studio	1234	`--proxy-to lmstudio`
vLLM	8000	`--proxy-to vllm`
Jan	1337	`--proxy-to jan`
LocalAI	8080	`--proxy-to localai`
GPT4All	4891	`--proxy-to gpt4all`
Custom	any	`--proxy-to http://host:port`

Client Request → ShrimPK (11435)
                    │
              ┌─────┴─────┐
              │ 1. Echo    │ ← find relevant memories (3.50ms)
              │ 2. Inject  │ ← prepend to system prompt
              │ 3. Store   │ ← save user message for future
              │ 4. Forward │ ← send to LLM provider
              └─────┬─────┘
                    │
              Provider (Ollama, LM Studio, etc.)
                    │
              Response → Client (streamed transparently)

Proxy Endpoints

Method	Path	Description
POST	`/v1/chat/completions`	Chat with memory injection
GET	`/v1/models`	List available models from backend

Rust Library

use shrimpk_core::EchoConfig;
use shrimpk_memory::EchoEngine;

let config = EchoConfig::auto_detect();
let engine = EchoEngine::load(config)?;

engine.store("I prefer FastAPI for REST APIs", "conversation").await?;

let echoes = engine.echo("What framework for this API?", 5).await?;
// Returns: FastAPI memory (similarity: 0.85) in ~3.5ms

Python

from shrimpk import EchoMemory, EchoConfig

config = EchoConfig.auto_detect()
mem = EchoMemory(config)
mem.store("I prefer FastAPI", source="conversation")
results = mem.echo("What framework?", max_results=5)

MCP Server (Claude Code)

# Register globally — works from any directory
claude mcp add --transport stdio --scope user shrimpk -- shrimpk-mcp

The MCP server auto-detects the daemon and proxies via HTTP. Falls back to in-process if daemon isn't running.

HTTP API

The daemon exposes 10 REST endpoints on localhost:11435:

Method	Route	Purpose
`GET`	`/health`	Health check + memory count + uptime
`POST`	`/api/store`	Store a memory
`POST`	`/api/echo`	Find resonating memories
`GET`	`/api/stats`	Engine statistics
`GET`	`/api/memories`	List all memories
`DELETE`	`/api/memories/:id`	Forget a memory
`GET`	`/api/config`	Show configuration
`PUT`	`/api/config`	Set config value
`POST`	`/api/persist`	Force save to disk
`POST`	`/api/consolidate`	Trigger consolidation

Optional auth: set SHRIMPK_AUTH_TOKEN env var → required as Authorization: Bearer header.

Under the Hood

shrimpk-kernel          (facade — re-exports all)
  shrimpk-core          (types, config, errors, PII types)
  shrimpk-memory        (Echo Memory engine)
  shrimpk-router        (provider routing + cascade) — library, not yet wired to daemon
  shrimpk-context       (context assembly + token budgeting) — in progress
  shrimpk-security      (sandbox, permissions) — planned
  shrimpk-python        (PyO3 bindings)
  shrimpk-mcp           (MCP server — JSON-RPC stdio)
  shrimpk-daemon        (HTTP daemon — localhost:11435)

Echo Memory pipeline:

Bloom filter — O(1) topic rejection (is this query even relevant?)
LSH — sub-linear candidate retrieval (SimHash, 16 tables, 10 bits)
Cosine similarity — SIMD-accelerated exact scoring
Hebbian boost — co-activated memories get promoted
Category decay — Identity (365d) → Conversation (3d)

Configuration

ShrimPK auto-detects from system RAM. Override via config file or env vars:

# Config file (~/.shrimpk-kernel/config.toml)
shrimpk config set max_memories 500000
shrimpk config set quantization binary
shrimpk config show

# Environment variables (highest priority)
SHRIMPK_MAX_MEMORIES=500000
SHRIMPK_DATA_DIR=/custom/path
SHRIMPK_QUANTIZATION=binary
SHRIMPK_PORT=11435
SHRIMPK_AUTH_TOKEN=your-secret-token

Priority: env vars > config.toml > auto-detect

Two Products

	ShrimPK Kernel	Bellkis HUB
What	AI memory engine	AI desktop app
For	Developers embedding memory	Users wanting a local AI hub
License	Apache 2.0	BSL 1.1
Install	`cargo add shrimpk-kernel`	Desktop installer

Two brands, one household. The kernel powers the hub.

License

Apache 2.0 — see LICENSE

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
.github		.github
benchmarks		benchmarks
cli		cli
crates		crates
docs		docs
integrations/openclaw		integrations/openclaw
scripts		scripts
src		src
tests		tests
wix		wix
.dockerignore		.dockerignore
.gitignore		.gitignore
BACKLOG.md		BACKLOG.md
CHANGELOG.md		CHANGELOG.md
CODEX.md		CODEX.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
greptile.json		greptile.json

Folders and files

Latest commit

History

Repository files navigation

🦐 ShrimPK

The Problem

How It Works

The Reef Ecosystem

Performance

Multimodal Memory

Cross-Modal Retrieval

Enabling Modalities

CLI

API

Speech Channel (KS50)

Scaling

The Ollama Model

Install

One-liner (recommended)

Docker

From source

GitHub Releases

Verify

Proxy — Zero-Config Memory for Any LLM

Proxy Endpoints

Rust Library

Python

MCP Server (Claude Code)

HTTP API

Under the Hood

Configuration

Two Products

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages