An LLM semantic caching system aiming to enhance user experience by reducing response time via cached query-result pairs.
Updated Jun 30, 2025 - Python
Redis Vector Library (RedisVL) -- the AI-native Python client for Redis.
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
Reliable and Efficient Semantic Prompt Caching with vCache
Redis integration for Google Agent Development Kit (ADK) - Memory, Sessions, Search Tools, MCP
High-performance LLM query cache with semantic search. Reduce API costs by 80% and latency from 8.5s to 1ms using Redis + Qdrant vector DB. Multi-provider support (OpenAI, Anthropic).
Enhance LLM retrieval performance with Azure Cosmos DB Semantic Cache. Learn how to integrate and optimize caching strategies in real-world web applications.
AI real-estate automation platform: Telegram bot, RAG, apartment search, CRM workflows, voice agent, Langfuse observability, and Dockerized AI runtime.
Redis Vector Similarity Search, Semantic Caching, Recommendation Systems and RAG
🔀 One prompt in. The right model out. Open-source LLM router with 100% routing accuracy, 47+ providers. Budget enforcement, semantic cache, intelligent failover. Zero ML, 19.5KB. MIT.
A chatbot that uses Redis Vector Similarity Search to recommend blogs based on the user's prompt.
Optimized RAG Retrieval with Indexing, Quantization, Hybrid Search and Caching
Multi-agent LangGraph assistant for Montreal urban mobility — RAG, semantic caching, and a predictive ML collision model on FastAPI.
Semantic cache layer for LLM APIs — embed prompts locally, find near-matches, skip redundant LLM calls.
A universal open protocol for LLM semantic caching and cross-platform alignment (v0.1). High-efficiency semantic hashing based on S³ topology.
Semantic memory and caching for LLM agents with classifier-validated equivalence instead of naive cosine thresholds.
A semantic search system using vector embeddings, fuzzy clustering, FAISS indexing, and a custom semantic cache with a FastAPI service.
SQLite-backed LLM response cache. Exact match + fuzzy match. Decorator API. Zero mandatory server dependencies.
Corrective RAG (CRAG) agent orchestrator for IT troubleshooting, with LangGraph, FastAPI, ChromaDB, and OpenTelemetry/Phoenix tracing.
Sub-query level semantic caching for LLM APIs — 3-tier hybrid engine with FAISS vector search. 87.5% cache hit rate, 71.8% cost savings on 100 real API calls.
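Several of the descriptions above share the same loop: embed the prompt, look for a near-match among stored prompts, and skip the LLM call on a hit. The following is a minimal sketch of that idea, not the implementation of any repository listed here. The names `embed`, `SemanticCache`, and `cached_call` are hypothetical, the bag-of-words "embedding" is a toy stand-in for a real embedding model, and the 0.8 cosine threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts.
    A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by prompt similarity."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity counted as a hit
        self.entries = []  # list of (embedding, response) pairs

    def get(self, query):
        """Return the response of the most similar stored prompt, or None."""
        q = embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:  # linear scan; fine for a sketch
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        self.entries.append((embed(query), response))


def cached_call(cache, query, llm):
    """Serve from the cache on a near-match, otherwise call `llm` and store."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    response = llm(query)
    cache.put(query, response)
    return response
```

Passing any callable as `llm` (for example a stub that records its inputs) shows the effect: a rephrased prompt with the same wording is served from the cache, while an unrelated prompt triggers a new call. Production systems typically replace the linear scan with a vector index such as FAISS, Redis, or Qdrant, as the projects above describe.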