This directory contains Node.js example clients for the AI Toolkit API. Each script demonstrates a specific API capability.
## Prerequisites

- Node.js: v18+ (uses global `fetch`)
- API server: running at `http://localhost:8000` by default (see `server/README.md` for setup)
- Base URL: set `AI_TOOLKIT_BASE_URL` to point at your API if it is not on the default
  - Example: `export AI_TOOLKIT_BASE_URL=http://localhost:8000`
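With the base URL set, a quick connectivity check before running the examples can look like this (a sketch; it only assumes the `GET /healthz` endpoint used by `chat_basic.mjs`):

```javascript
// healthcheck — verify the API server is reachable before running examples.
const BASE_URL = process.env.AI_TOOLKIT_BASE_URL ?? 'http://localhost:8000';

async function healthcheck() {
  const res = await fetch(`${BASE_URL}/healthz`);
  return res.ok;
}

// Only hit the network when explicitly asked, e.g. RUN_CHECK=1 node healthcheck.mjs
if (process.env.RUN_CHECK) {
  healthcheck().then((ok) => console.log(`healthy: ${ok}`));
}
```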
## Running the examples

From the repository root:

- `node examples/chat_basic.mjs`
- `node examples/chat_structured.mjs`
- `node examples/embeddings.mjs`
- `node examples/kokoro_tts.mjs`
- `node examples/piper_tts.mjs`
- `node examples/clip_classify.mjs`
- `node examples/moondream.mjs`
- `node examples/vram.mjs`

### chat_basic.mjs

- What it does:
  - Health check, load a GGUF model from Hugging Face (`unsloth/Qwen3-1.7B-GGUF`),
  - Send a simple chat prompt,
  - Print the assistant response,
  - Unload the model.
- Key endpoints: `GET /healthz`, `POST /load_gguf`, `POST /chat`, `POST /unload_gguf`
- Customization: adjust `hf_repo`, `hf_file`, `n_ctx`, and `chat_format` in the script for your model.
- Run: `node examples/chat_basic.mjs`

Expected output includes the model load status and a short answer (e.g., a haiku) from the model.
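The load → chat → unload sequence can be sketched as follows (the endpoint paths are from this README; the `hf_file` value and the response shape are assumptions, so adjust them for your model):

```javascript
// Sketch of the chat_basic.mjs flow: load a GGUF model, chat, unload.
const BASE_URL = process.env.AI_TOOLKIT_BASE_URL ?? 'http://localhost:8000';

// Payload fields mirror the customization knobs described above.
const loadPayload = {
  hf_repo: 'unsloth/Qwen3-1.7B-GGUF',
  hf_file: 'Qwen3-1.7B-Q4_K_M.gguf', // hypothetical quant file; pick one from the repo
  n_ctx: 4096,
  chat_format: 'chatml',             // assumed format; adjust for your model
};

async function postJSON(path, payload) {
  const res = await fetch(`${BASE_URL}${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return res.json();
}

async function run() {
  await postJSON('/load_gguf', loadPayload);
  const reply = await postJSON('/chat', {
    messages: [{ role: 'user', content: 'Write a haiku about GPUs.' }],
  });
  console.log(JSON.stringify(reply, null, 2));
  await postJSON('/unload_gguf', {});
}

// Only hit the network when a server is running: RUN_EXAMPLE=1 node sketch.mjs
if (process.env.RUN_EXAMPLE) run();
```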
### chat_structured.mjs

- What it does:
  - Loads a GGUF model,
  - Sends a prompt with `response_format: { type: 'json_object', schema }` to enforce JSON output,
  - Prints a parsed JSON object (falls back to manual `JSON.parse` if needed),
  - Unloads the model.
- Key endpoint: `POST /chat` with `response_format`
- Note: the server also returns a `parsed` field when decoding succeeds.
- Run: `node examples/chat_structured.mjs`

Expected output shows a JSON object matching the schema (fields like `affected_attribute`, `amount`, etc.).
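The `response_format` payload looks roughly like this (the schema fields are taken from the expected output above; the fallback's `content` field name is an assumption about the response shape):

```javascript
// Sketch of a /chat request body that enforces JSON output via response_format.
const schema = {
  type: 'object',
  properties: {
    affected_attribute: { type: 'string' },
    amount: { type: 'number' },
  },
  required: ['affected_attribute', 'amount'],
};

const body = {
  messages: [{ role: 'user', content: 'Describe the stat change as JSON.' }],
  response_format: { type: 'json_object', schema },
};

// The server returns a `parsed` field when decoding succeeds; otherwise
// fall back to JSON.parse on the raw text (`content` is an assumed field name).
function extractObject(reply) {
  return reply.parsed ?? JSON.parse(reply.content);
}
```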
### embeddings.mjs

- What it does:
  - Loads a sentence-transformers model,
  - Generates embeddings for a few texts (optionally normalized),
  - Computes similarity between two vectors,
  - Unloads the embedding model.
- Key endpoints: `POST /embeddings/load`, `POST /embeddings/generate`, `POST /embeddings/similarity`, `POST /embeddings/unload`
- Prerequisite: the API server must have `sentence-transformers` available (handled in the server container).
- Run: `node examples/embeddings.mjs`

Expected output includes the embedding dimension and a similarity score (e.g., cosine similarity ~0.5–0.9 depending on inputs).
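For reference, the similarity score can be reproduced locally from two returned vectors; a minimal sketch, assuming `POST /embeddings/similarity` uses cosine similarity as the expected output suggests:

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])); // → 1
```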
### clip_classify.mjs

- What it does:
  - Loads a CLIP model (`openai/clip-vit-base-patch32`),
  - Classifies a local image against candidate labels using zero-shot classification,
  - Performs an NSFW check (label "nsfw content" vs. threshold),
  - Unloads the CLIP model.
- Key endpoints: `POST /clip/load`, `POST /clip/classify`, `POST /clip/nsfw`, `POST /clip/unload`
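The NSFW check described above (score for the label "nsfw content" compared against a threshold) can be sketched like this; the payload field names and the threshold value are assumptions:

```javascript
// Candidate labels for zero-shot classification via POST /clip/classify.
const classifyPayload = {
  image: process.env.CLIP_IMAGE,                   // field name is an assumption
  labels: ['a photo of a cat', 'a photo of a dog'],
};

// Mirrors the "nsfw content" label-vs-threshold check; threshold is assumed.
function isNsfw(scores, threshold = 0.5) {
  return (scores['nsfw content'] ?? 0) >= threshold;
}

console.log(isNsfw({ 'nsfw content': 0.92 })); // → true
```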
- Run: `CLIP_IMAGE=/absolute/path/to/image.jpg node examples/clip_classify.mjs`

### kokoro_tts.mjs

- What it does:
  - Lists available TTS voices,
  - Synthesizes speech for a given text,
  - Prints a URL to the generated WAV under `/audio/...`.
- Key endpoints: `GET /kokoro/voices`, `POST /kokoro/synthesize`
- Prerequisite: Kokoro TTS must be installed and configured in the server (`server/tts`). If unavailable, the API returns an error.
- Playback: open the printed URL in a browser or play it via ffmpeg.
- Run: `node examples/kokoro_tts.mjs`

Expected output shows available voices and a public URL to the generated audio.
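The printed WAV URL is just the base URL plus the `/audio/...` path; a sketch of how the example could assemble it (the request and response field names, and the voice name, are assumptions):

```javascript
const BASE_URL = process.env.AI_TOOLKIT_BASE_URL ?? 'http://localhost:8000';

// Build the public URL for a generated WAV file served under /audio/.
function audioUrl(filename) {
  return `${BASE_URL}/audio/${filename}`;
}

async function synthesize(text, voice) {
  const res = await fetch(`${BASE_URL}/kokoro/synthesize`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, voice }), // field names are assumptions
  });
  const data = await res.json();
  return audioUrl(data.filename);          // `filename` is an assumed field
}

// Only hit the network when a server is running.
if (process.env.RUN_EXAMPLE) {
  synthesize('Hello from Kokoro', 'af_heart').then(console.log); // voice name assumed
}
```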
### piper_tts.mjs

- What it does:
  - Lists available Piper voices,
  - Optionally downloads a requested voice from Hugging Face (`rhasspy/piper-voices`),
  - Synthesizes speech with the selected voice,
  - Prints a URL to the generated WAV under `/audio/...`.
- Key endpoints: `GET /piper/voices`, `POST /piper/download`, `POST /piper/synthesize`
- Prerequisites:
  - The server must have Piper installed (`piper-tts` Python package).
  - Voices must be present under `/models/piper` or a directory listed in `PIPER_VOICES_DIR`.
  - This example can auto-download a voice from the Hugging Face catalog if missing.
- Voice catalog: `rhasspy/piper-voices` on Hugging Face
- Run: `node examples/piper_tts.mjs`

Notes:

- The script requests `en_US-hfc_female-medium` by default. Change `requestedVoice` in `examples/piper_tts.mjs` to pick another, or set it to an empty value to use the default/local voice.
- Playback example: `ffplay -autoexit -nodisp $(node -e "console.log(require('./examples/util.mjs').default.BASE_URL)")/audio/<printed-filename>.wav`
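The download-if-missing behavior can be sketched as a small decision helper (the shape of the installed-voices list is an assumption):

```javascript
// Default voice requested by the example, overridable per the notes above.
const requestedVoice = process.env.PIPER_VOICE ?? 'en_US-hfc_female-medium';

// Decide whether POST /piper/download is needed: an empty request means
// "use the default/local voice", and an already-installed voice needs nothing.
function needsDownload(installedVoices, voice) {
  return voice !== '' && !installedVoices.includes(voice);
}

console.log(needsDownload([], 'en_US-hfc_female-medium')); // → true
```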
### moondream.mjs

- What it does:
  - Loads the Moondream VLM (`moondream/moondream-2b-2025-04-14-4bit`),
  - Generates a short and a normal-length caption for a local image,
  - Asks a visual question (VQA),
  - Runs simple object detection and pointing,
  - Unloads the model.
- Key endpoints: `POST /moondream/load`, `POST /moondream/caption`, `POST /moondream/query`, `POST /moondream/detect`, `POST /moondream/point`, `POST /moondream/unload`
- Run: `MOONDREAM_IMAGE=/absolute/path/to/image.jpg node examples/moondream.mjs`
- Note: the server uses Hugging Face transformers with `trust_remote_code` for Moondream. For GPU, ensure CUDA is available; otherwise it falls back to CPU.

References:

- Moondream 4-bit model card: Hugging Face
- Official docs and recipes: moondream.ai
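A sketch of the per-endpoint requests the script sends (payload field names such as `image_path`, `length`, and `object` are assumptions about the API):

```javascript
const BASE_URL = process.env.AI_TOOLKIT_BASE_URL ?? 'http://localhost:8000';
const image = process.env.MOONDREAM_IMAGE; // path to a local image

// One request per capability demonstrated by the example.
const requests = [
  ['/moondream/caption', { image_path: image, length: 'short' }],
  ['/moondream/caption', { image_path: image, length: 'normal' }],
  ['/moondream/query',   { image_path: image, question: 'What is in this image?' }],
  ['/moondream/detect',  { image_path: image, object: 'person' }],
  ['/moondream/point',   { image_path: image, object: 'person' }],
];

async function run() {
  for (const [path, payload] of requests) {
    const res = await fetch(`${BASE_URL}${path}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
    console.log(path, await res.json());
  }
}

// Only hit the network when a server is running.
if (process.env.RUN_EXAMPLE) run();
```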
### vram.mjs

- What it does:
  - Queries VRAM statistics and prints total/used/free per GPU plus the memory used by the API process.
- Key endpoint: `GET /vram`
- Prerequisite: NVIDIA drivers and NVML must be available in the server container; otherwise the script reports that GPU/NVML is not available.
- Run: `node examples/vram.mjs`

Expected output lists GPUs and memory figures, or a notice if unavailable.
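Per-GPU usage can be summarized from the `GET /vram` figures; a small formatting sketch, assuming a per-GPU shape of `{ index, total, used, free }` in bytes (the real response fields may differ):

```javascript
// Percentage of VRAM in use, guarding against a zero total (no GPU/NVML).
function usedPercent(total, used) {
  return total > 0 ? (100 * used) / total : 0;
}

function summarize(gpu) {
  // gpu: { index, total, used, free } in bytes — assumed shape
  return `GPU ${gpu.index}: ${usedPercent(gpu.total, gpu.used).toFixed(1)}% used`;
}

console.log(summarize({ index: 0, total: 8_000_000_000, used: 2_000_000_000, free: 6_000_000_000 }));
// → GPU 0: 25.0% used
```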
### util.mjs

A small helper module used by the examples:

- `BASE_URL`: reads from `AI_TOOLKIT_BASE_URL` or defaults to `http://localhost:8000`
- `postJSON(path, payload)`: POST JSON and parse the response
- `getJSON(path)`: GET JSON and parse the response
- `prettyBytes(num)`: human-readable byte formatting (used by `vram.mjs`)
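For illustration, `prettyBytes` could be implemented along these lines (a sketch; the actual helper in `examples/util.mjs` may differ in rounding or unit names):

```javascript
// Human-readable byte formatting, as used by vram.mjs for memory figures.
function prettyBytes(num) {
  const units = ['B', 'KB', 'MB', 'GB', 'TB'];
  let i = 0;
  while (num >= 1024 && i < units.length - 1) {
    num /= 1024;
    i += 1;
  }
  return `${num.toFixed(1)} ${units[i]}`;
}

console.log(prettyBytes(1_073_741_824)); // → 1.0 GB
```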
## Troubleshooting

- Connection refused: ensure the API server is running and `AI_TOOLKIT_BASE_URL` points to it.
- Model not loaded / 503: load a model before calling `/chat`, or use the example scripts, which load/unload automatically.
- Embeddings 500: ensure `sentence-transformers` is available in the server environment.
- Kokoro TTS 500: Kokoro TTS is not installed/configured; see `server/tts`.
- Piper TTS 500: ensure `piper-tts` is installed on the server and that voices exist under `/models/piper` or `PIPER_VOICES_DIR`. You can also call `POST /piper/download` from the example to fetch a voice.
- VRAM empty: ensure GPUs are visible in the container (`--gpus all`) and NVML is present.