A robust, production-grade Python wrapper for the Lemonade C++ Backend.
This SDK provides a clean, Pythonic interface for interacting with local LLMs running on Lemonade. It was built to power Sorana (a visual workspace for AI); the core integration logic was extracted into a standalone, open-source library for the developer community.
- Auto-Discovery: Automatically scans 8 discrete ports (8000, 8020, 8040, 8060, 8080, 9000, 13305, 11434) to find active Lemonade instances. Distinguishes between real Ollama and Lemonade on port 11434.
- Low-Overhead Architecture: Designed as a thin, efficient wrapper to leverage Lemonade's C++ performance with minimal Python latency.
- Health Checks & Server Stats: Lightweight `/api/v1/health` endpoint plus `get_stats()` for token usage, requests served, and performance metrics.
- Type-Safe Client: Full Python type hinting for a better developer experience (IDE autocompletion).
- Model Labels & Capabilities: Detect vision, reasoning, coding, and other model capabilities via the official Lemonade labels system.
- Embeddings API: Generate text embeddings for semantic search, RAG, and clustering (FLM & llamacpp backends).
- Audio API: Whisper speech-to-text and Kokoro text-to-speech.
- Reranking API: Reorder documents by relevance for better RAG results.
- Image Generation: Create images from text prompts using Stable Diffusion.
- WebSocket Streaming: Real-time audio transcription with VAD.
pip install .

Alternatively, you can install it directly from GitHub:

pip install git+https://github.com/Tetramatrix/lemonade-python-sdk.git

The SDK automatically handles port discovery, so you don't need to hardcode localhost:8000.
from lemonade_sdk import LemonadeClient, find_available_lemonade_port
# Auto-discover running instance
port = find_available_lemonade_port()
if port:
    client = LemonadeClient(base_url=f"http://localhost:{port}")
    if client.health_check():
        print(f"Connected to Lemonade on port {port}")
else:
    print("No Lemonade instance found.")

# Check if server is alive (uses /api/v1/health endpoint)
if client.health_check():
    print("Lemonade is running!")
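If the client is created right after the server process is launched, the first health check can race server startup. A small polling helper smooths this over; it is a sketch in plain Python (`wait_for_server` is a name invented here, not part of the SDK):

```python
import time

def wait_for_server(check, timeout=10.0, interval=0.5):
    """Poll a health-check callable until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Usage with the SDK (assumes `client` from the snippet above):
# if wait_for_server(client.health_check, timeout=30):
#     print("Lemonade is up")
```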
# Get server statistics (performance metrics from last request)
stats = client.get_stats()
if stats:
    print(f"Time to first token: {stats.get('time_to_first_token', 0):.2f}s")
    print(f"Tokens/sec: {stats.get('tokens_per_second', 0):.1f}")
    print(f"Input tokens: {stats.get('input_tokens', 0)}")
    print(f"Output tokens: {stats.get('output_tokens', 0)}")
    print(f"Prompt tokens: {stats.get('prompt_tokens', 0)}")

Available stats fields: time_to_first_token, tokens_per_second, input_tokens, output_tokens, decode_token_times, prompt_tokens.
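The `decode_token_times` field is a list of per-token decode durations; a tiny helper can condense it into something readable. This is a sketch assuming the times are reported in seconds (`summarize_decode_times` is an illustrative name, not SDK API):

```python
def summarize_decode_times(decode_token_times):
    """Return (average, worst) per-token decode time; (0.0, 0.0) for an empty list."""
    if not decode_token_times:
        return 0.0, 0.0
    return (sum(decode_token_times) / len(decode_token_times),
            max(decode_token_times))

# stats = client.get_stats()
# avg, worst = summarize_decode_times(stats.get("decode_token_times", []))
# print(f"avg {avg * 1000:.1f} ms/token, worst {worst * 1000:.1f} ms")
```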
response = client.chat_completion(
    model="Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Hello World in C++"}
    ],
    temperature=0.7
)
print(response['choices'][0]['message']['content'])

# List all available models
models = client.list_models()
for m in models:
    print(f"Found model: {m['id']}")

# Load a specific model into memory
client.load_model("Mistral-7B-v0.1")

Lemonade models include a `labels` array that describes their capabilities. The SDK provides `ModelInfo` objects for easy capability checking.
from lemonade_sdk import ModelInfo, LemonadeClient
client = LemonadeClient()
# Get all models with capability info
models = client.list_models_with_info()
for model in models:
    print(f"{model.name}: {model.get_capabilities_summary()}")

# Check if a specific model supports vision
if client.has_vision("Qwen3.5-122B"):
    print("This model can process images!")

# Find all vision models
vision_models = client.list_vision_models()
for m in vision_models:
    print(f"Vision model: {m.name}")

# Check other capabilities (model_id is any model name string)
print(client.has_reasoning("Qwen3.5-122B"))    # Extended thinking
print(client.has_tool_calling("Qwen3.5-122B")) # Function calling
print(client.has_coding("Qwen3.5-122B"))       # Code generation
print(client.has_embeddings(model_id))         # Embedding model
print(client.has_reranking(model_id))          # Reranking model
print(client.has_image_generation(model_id))   # Stable Diffusion

Official Lemonade Labels:
| Label | Meaning |
|---|---|
| `vision` | Model supports image input (VLM) |
| `reasoning` | Model uses extended thinking/chain-of-thought |
| `coding` | Optimized for code generation tasks |
| `tool-calling` | Supports function/tool calling |
| `embeddings` | Text embedding model |
| `reranking` | Reranking model (for RAG pipelines) |
| `image` | Image generation model (Stable Diffusion etc.) |
| `hot` | Featured/recommended by Lemonade |
| `custom` | User-added model |
You can also use labels directly:
from lemonade_sdk import LABEL_VISION, ModelInfo
model = ModelInfo.from_api_response(api_data)
if model.has_label(LABEL_VISION):
    print("This is a vision model")

Generate text embeddings for semantic search, RAG pipelines, and clustering.
# List available embedding models (filtered by 'embeddings' label)
embedding_models = client.list_embedding_models()
for model in embedding_models:
    print(f"Embedding model: {model['id']}")
# Generate embeddings for single text
response = client.embeddings(
    input="Hello, world!",
    model="nomic-embed-text-v1-GGUF"
)
embedding_vector = response["data"][0]["embedding"]
print(f"Vector length: {len(embedding_vector)}")
# Generate embeddings for multiple texts
texts = ["Text 1", "Text 2", "Text 3"]
response = client.embeddings(
    input=texts,
    model="nomic-embed-text-v1-GGUF"
)
for item in response["data"]:
    print(f"Text {item['index']}: {len(item['embedding'])} dimensions")

Supported Backends (Lemonade):
- ✅ FLM (FastFlowLM) - NPU-accelerated on Windows
- ✅ llamacpp (.GGUF models) - CPU/GPU
- ❌ ONNX/OGA - Not supported
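Once you have embedding vectors back, ranking documents against a query reduces to cosine similarity. A dependency-free sketch (for large corpora you would typically reach for NumPy or a vector store instead):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# query_vec = client.embeddings(input=query, model="nomic-embed-text-v1-GGUF")["data"][0]["embedding"]
# best = max(doc_vecs, key=lambda v: cosine_similarity(query_vec, v))
```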
Transcribe audio files to text using Whisper.
# List available audio models (Whisper + Kokoro)
audio_models = client.list_audio_models()
for model in audio_models:
    print(f"Audio model: {model['id']}")
# Transcribe an audio file
result = client.transcribe_audio(
    file_path="meeting.wav",
    model="Whisper-Tiny",
    language="en",          # Optional: None for auto-detection
    response_format="json"  # Options: "json", "text", "verbose_json"
)
if "error" not in result:
    print(f"Transcription: {result['text']}")
# Verbose format also includes: duration, language, segments

Supported Models:
- Whisper-Tiny (~39M parameters)
- Whisper-Base (~74M parameters)
- Whisper-Small (~244M parameters)

Supported Formats: WAV, MP3, FLAC, OGG, WebM
Backend: whisper.cpp (NPU-accelerated on Windows)
Generate speech from text using Kokoro TTS.
# Generate speech and save to file
client.text_to_speech(
    input_text="Hello, Lemonade can now speak!",
    model="kokoro-v1",
    voice="shimmer",          # Options: shimmer, corey, af_bella, am_adam, etc.
    speed=1.0,                # 0.5 - 2.0
    response_format="mp3",    # Options: mp3, wav, opus, pcm, aac, flac
    output_file="speech.mp3"  # Saves directly to file
)
# Or get audio bytes directly
audio_bytes = client.text_to_speech(
    input_text="Short test!",
    model="kokoro-v1",
    voice="corey",
    response_format="mp3"
)
with open("speech.mp3", "wb") as f:
    f.write(audio_bytes)

Supported Models:
- kokoro-v1 (~82M parameters)
Available Voices:
| Voice ID | Language | Gender |
|---|---|---|
| `shimmer` | EN | Female |
| `corey` | EN | Male |
| `af_bella`, `af_nicole` | EN-US | Female |
| `am_adam`, `am_michael` | EN-US | Male |
| `bf_emma`, `bf_isabella` | EN-GB | Female |
| `bm_george`, `bm_lewis` | EN-GB | Male |
Audio Formats: MP3, WAV, OPUS, PCM, AAC, FLAC
Backend: Kokoros (.onnx, CPU)
Rerank documents based on relevance to a query.
result = client.rerank(
    query="What is the capital of France?",
    documents=[
        "Berlin is the capital of Germany.",
        "Paris is the capital of France.",
        "London is the capital of the UK."
    ],
    model="bge-reranker-v2-m3-GGUF"
)
# Results sorted by relevance score
for r in result["results"]:
    print(f"Rank {r['index']}: Score={r['relevance_score']:.2f}")

Supported Models:
- bge-reranker-v2-m3-GGUF
- Other BGE reranker models

Backend: llamacpp (.GGUF only, not available for FLM or OGA)
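Mapping rerank scores back to the original documents is the usual next step in a RAG pipeline. A small sketch that keeps the top-k texts (`top_documents` is a name invented here; the response shape follows the example above, with `results` entries carrying `index` and `relevance_score`):

```python
def top_documents(documents, rerank_result, k=2):
    """Return the k documents with the highest relevance scores."""
    ranked = sorted(rerank_result["results"],
                    key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked[:k]]

# Sample response in the shape shown above
sample = {"results": [
    {"index": 0, "relevance_score": 0.1},
    {"index": 1, "relevance_score": 0.9},
    {"index": 2, "relevance_score": 0.4},
]}
docs = ["Berlin is the capital of Germany.",
        "Paris is the capital of France.",
        "London is the capital of the UK."]
print(top_documents(docs, sample, k=2))
```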
Generate images from text prompts using Stable Diffusion.
# Generate and save to file
client.generate_image(
    prompt="A sunset over mountains with lake reflection",
    model="SD-Turbo",
    size="512x512",
    steps=4,       # SD-Turbo needs only 4 steps
    cfg_scale=1.0,
    output_file="sunset.png"
)
# Or get image bytes
image_bytes = client.generate_image(
    prompt="A cute cat",
    model="SD-Turbo"
)

Supported Models:
- SD-Turbo (fast, 4 steps)
- SDXL-Turbo (fast, 4 steps)
- SD-1.5 (standard, 20 steps)
- SDXL-Base-1.0 (high quality, 20 steps)
Image Sizes: 512x512, 1024x1024, or custom
Backend: stable-diffusion.cpp
Real-time audio transcription with Voice Activity Detection (VAD).
from lemonade_sdk import WhisperWebSocketClient
# Create streaming client
stream = client.create_whisper_stream(model="Whisper-Tiny")
stream.connect()
# Set callback for transcriptions
def on_transcript(text):
    print(f"Heard: '{text}'")

stream.on_transcription(on_transcript)

# Stream audio file (PCM16, 16kHz, mono)
for text in stream.stream("audio.pcm"):
    pass  # Callback handles output

# Or stream from microphone (requires pyaudio)
# for text in stream.stream_microphone():
#     print(f"Heard: {text}")

stream.disconnect()

Audio Format: 16kHz, mono, PCM16 (16-bit)
Features:
- Voice Activity Detection (VAD)
- Real-time streaming
- Microphone support (with pyaudio)
- Configurable sensitivity
Backend: whisper.cpp (NPU-accelerated on Windows)
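If your source audio is a WAV file, the stdlib `wave` module can verify it already matches the expected 16 kHz / mono / 16-bit PCM format before you convert or stream it (`check_wav_format` is an illustrative helper, not part of the SDK):

```python
import io
import wave

def check_wav_format(data):
    """True if raw WAV bytes are 16 kHz, mono, 16-bit PCM."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return (wav.getframerate() == 16000
                and wav.getnchannels() == 1
                and wav.getsampwidth() == 2)

# with open("audio.wav", "rb") as f:
#     assert check_wav_format(f.read()), "resample/convert before streaming"
```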
- Embeddings API - Complete guide for using embeddings
- Audio API - Whisper transcription and Kokoro TTS (documentation)
- Implementation Plan - Audio API implementation roadmap
- Lemonade Server Docs - Official Lemonade documentation
This SDK powers 3 real-world production applications:
Sorana - AI Visual Workspace
- SDK drives semantic AI grouping of files and folders onto a spatial 2D canvas
- SDK handles auto-discovery and connection to local Lemonade instances (zero config)
Aicono - AI Desktop Icon Organizer (Featured in CHIP Magazine 🇩🇪)
- SDK drives AI inference for grouping and categorizing desktop icons
- Reached millions of readers via CHIP, one of Germany's largest IT publications
TabNeuron - AI-Powered Tab Organizer
- SDK enables local AI inference for grouping and categorizing browser tabs
- Desktop companion app + browser extension, demonstrating SDK viability in lightweight client architectures
- client.py: Main entry point for API interactions (chat, embeddings, audio, reranking, images, model management).
- port_scanner.py: Utilities for detecting Lemonade instances across 8 discrete ports (8000, 8020, 8040, 8060, 8080, 9000, 13305, 11434).
- model_discovery.py: Logic for fetching and parsing model metadata.
- request_builder.py: Helper functions to construct compliant payloads (chat, embeddings, audio, reranking, images).
- audio_stream.py: WebSocket client for real-time audio transcription with VAD.
- utils.py: Additional utility functions.
Contributions are welcome! This project is intended to help the AMD Ryzen AI and Lemonade community build downstream applications faster.
This project is licensed under the MIT License - see the LICENSE file for details.