A self-hosted, local-first AI stack for running and integrating LLMs.
AIXCL is a privacy-focused platform for individuals and teams who want full control over their models. It provides a simple CLI, a web interface, and a containerized stack to run, manage, and integrate Large Language Models directly into your developer workflow.
- Docker & Docker Compose installed.
- 8 GB VRAM (minimum recommended).
- 32 GB RAM (minimum recommended).
- 128 GB Disk Space (for models and images).
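You can sanity-check these requirements from a terminal before installing anything. This is a generic sketch, not part of the AIXCL CLI, and assumes a Linux host; `nvidia-smi` is only present when an NVIDIA driver is installed:

```shell
# RAM and free disk space (any Linux host)
free -h
df -h .
# GPU VRAM, if an NVIDIA driver is installed
command -v nvidia-smi >/dev/null \
  && nvidia-smi --query-gpu=memory.total --format=csv \
  || echo "nvidia-smi not found - no NVIDIA GPU detected"
```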
1. Clone and Verify
```bash
git clone https://github.com/xencon/aixcl.git && cd aixcl
./aixcl utils check-env
```
Note: The check will warn if `hf` is missing. Install it with pip or brew if you plan to use the llama.cpp or vLLM engines.
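For reference, the `hf` command ships with recent releases of the `huggingface_hub` Python package (older releases expose it as `huggingface-cli`). A typical install, assuming Python/pip or Homebrew is available:

```shell
# Via pip (the [cli] extra pulls in optional CLI dependencies)
pip install -U "huggingface_hub[cli]"
# Or via Homebrew:
#   brew install huggingface-cli
hf --help   # or: huggingface-cli --help
```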
2. Start the Stack
```bash
# Choose a profile: usr (minimal), dev (UI+DB), ops (Observability), sys (Full)
./aixcl stack start --profile usr
```
3. Choose Your Engine
```bash
# Auto-detect the best engine for your hardware
./aixcl engine auto
# Or set manually
./aixcl engine set ollama    # Recommended for beginners
./aixcl engine set vllm      # For high-end GPUs
./aixcl engine set llamacpp  # For GGUF models
```
4. Add Your First Model
```bash
# Quick test model (smallest, fastest download)
./aixcl models add qwen2.5-coder:0.5b
```
See Quick Test Models for engine-specific options.
5. Launch OpenCode
```bash
./opencode
```
Models are downloaded on demand when you run `./aixcl models add`, not during installation. Download times vary based on model size and your connection speed.
| Model Size | Approximate File Size | Download Time (100 Mbps) | Download Time (20 Mbps) | Download Time (5 Mbps) |
|---|---|---|---|---|
| 0.5B params | ~350-400 MB | ~30 seconds | ~2 minutes | ~5 minutes |
| 1.5B params | ~1 GB | ~1 minute | ~5 minutes | ~15 minutes |
| 7B params | ~4-5 GB | ~5 minutes | ~20 minutes | ~45 minutes |
Note: Times are estimates. Actual speeds depend on network conditions and HuggingFace/Ollama server load.
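The estimates above follow from simple arithmetic: file size in megabits divided by link speed in megabits per second. A quick back-of-the-envelope check in plain shell, using the 0.5B row as the example:

```shell
# seconds ≈ (file size in MB * 8 bits/byte) / link speed in Mbps
size_mb=400   # ~0.5B-parameter model
mbps=100
echo "$(( size_mb * 8 / mbps )) seconds"   # 32 seconds, i.e. ~30s as in the table
```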
vLLM Users: The vLLM container does not include the `hf` CLI. Models must be pre-downloaded on the host before starting vLLM. See the vLLM Workaround Guide for details.
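A host-side pre-download might look like the following sketch. The cache path and the exact download command are assumptions for illustration; the vLLM Workaround Guide documents the actual volume location your stack mounts:

```shell
MODEL="Qwen/Qwen2.5-Coder-0.5B-Instruct"
CACHE="$HOME/.cache/huggingface"   # assumed mount point - verify against the guide
echo "pre-downloading $MODEL into $CACHE"
# huggingface-cli download "$MODEL" --cache-dir "$CACHE"   # uncomment once verified
```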
llama.cpp Users: When switching to llama.cpp from another engine, the model configuration in opencode.json is cleared. You must re-add a GGUF model for llama.cpp.
These are the smallest viable models for testing your AIXCL setup with OpenCode. All models below have been tested and verified to work with the current version of AIXCL.
Note: Using the exact model names shown below ensures compatibility. Other models may work but have not been tested.
| Model | Size | Command |
|---|---|---|
| Qwen2.5-Coder 0.5B | ~398 MB | ./aixcl models add qwen2.5-coder:0.5b |
Ollama models use the format `model:tag`. The `0.5b` tag indicates the smallest variant.
✅ Tested: Successfully tested with OpenCode integration
| Model | Size | Command |
|---|---|---|
| Qwen2.5-Coder 0.5B | ~1 GB* | ./aixcl models add Qwen/Qwen2.5-Coder-0.5B-Instruct |
*vLLM downloads the full HuggingFace model (safetensors format), which is larger than GGUF.
Note: The vLLM container does not include the `hf` CLI - see the workaround guide.
✅ Tested: Successfully tested with OpenCode integration on RTX 4060
| Model | Size | Command |
|---|---|---|
| Qwen2.5-Coder 0.5B (Q4_K_M) | ~398 MB | ./aixcl models add Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF/qwen2.5-coder-0.5b-instruct-q4_k_m.gguf |
llama.cpp requires GGUF-format models. The format is `username/repo/filename.gguf`.
Note: When switching engines, the model configuration is cleared. Re-add the GGUF model after switching.
✅ Tested: Successfully tested with OpenCode integration
AIXCL supports multiple backends. You can switch them instantly:
```bash
# Auto-detect optimal engine based on your hardware
./aixcl engine auto
# Manually switch to vLLM (great for high-end GPUs - see notes below)
./aixcl engine set vllm
# Manually switch to llama.cpp (great for CPU/Apple Silicon)
./aixcl engine set llamacpp
# Restart to apply changes
./aixcl stack restart engine
```
vLLM GPU Compatibility: vLLM requires specific GPU tuning for different cards. The default configuration includes optimizations for RTX 4060 and similar GPUs; if you encounter CUDA errors on startup with a different GPU, you may need to adjust `--gpu-memory-utilization` and `--max-model-len` in `services/docker-compose.yml`.
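As an illustration only - the service name and values below are assumptions, not the shipped defaults - the relevant flags sit on the vLLM service's command line in the compose file:

```yaml
# services/docker-compose.yml (illustrative fragment)
services:
  vllm:                                  # service name is an assumption
    command:
      - --gpu-memory-utilization=0.85    # lower this if CUDA runs out of memory
      - --max-model-len=4096             # a shorter context fits smaller VRAM
```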
Engine Testing: All engines have been tested and validated. See the Engine Switching Test Plan for comprehensive testing details.
Manage your local library across any active engine:
Ollama Engine:
```bash
# Add from the Ollama registry (tested model)
./aixcl models add qwen2.5-coder:0.5b
# Add multiple models
./aixcl models add qwen2.5-coder:0.5b qwen2.5-coder:1.5b
# List all local models
./aixcl models list
# Remove a model
./aixcl models remove qwen2.5-coder:0.5b
```
vLLM Engine:
```bash
# Add from HuggingFace (full model path) - tested model
./aixcl models add Qwen/Qwen2.5-Coder-0.5B-Instruct
# List downloaded models
./aixcl models list
```
llama.cpp Engine:
```bash
# Add GGUF from HuggingFace (requires full path with filename) - tested model
./aixcl models add Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF/qwen2.5-coder-0.5b-instruct-q4_k_m.gguf
# List GGUF files in the volume
./aixcl models list
```
AIXCL is designed to power local agentic development workflows via the OpenCode CLI. OpenCode connects to your stack for local chat, autocomplete, and agentic coding - all running on-device.
- Endpoint: `http://localhost:11434/v1`
- Start a session: `./opencode`
- Setup: See the OpenCode Setup Guide for full configuration details.
Agent workflow rules and permissions are configured automatically via opencode.json and DEVELOPMENT.md.
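Because the stack exposes an OpenAI-compatible API, any OpenAI-style client can point at the same endpoint. A quick reachability check with `curl` (the port is the default from this guide; `|| true` keeps the sketch from failing when the stack is down):

```shell
ENDPOINT="http://localhost:11434/v1"
echo "querying $ENDPOINT/models"
curl -s "$ENDPOINT/models" || true   # lists served models as JSON when the stack is up
```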
| Command | Description |
|---|---|
| `./aixcl utils check-env` | Validate environment and dependencies |
| `./aixcl stack status` | Check service health and OpenCode connectivity |
| `./aixcl stack logs engine` | View real-time inference logs |
| `./aixcl stack stop` | Stop all services gracefully |
| `./aixcl utils clean` | Wipe unused containers and volumes (fresh start) |
- User Guide - Detailed workflows and tips.
- Architecture - Profiles and service contracts.
- Security - Rootless Podman/Docker operations.
- OpenCode Setup - CLI configuration and agent workflow.
- Contributing - Issue-first workflow, templates, and PR requirements.
Apache License 2.0 - See LICENSE.