Scripts for deploying Qwopus3.5-27B-v3 with vLLM + Anthropic API proxy, enabling Claude Code to use the model as a drop-in replacement.
Create and activate a conda environment:

```shell
conda create -p <path-to-conda-env> python=3.11
conda activate <path-to-conda-env>
```
Install the requirements:

```shell
pip install -r requirements.txt
```

For full details of my configuration, see `my-conda-env-dump-details.md`.
Download the weight files to `<path-to-weight>`, which should then contain:

- `.gitattributes`
- `README.md`
- `chat_template.jinja`
- `config.json`
- `model-00001-of-00012.safetensors`
- ...
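To confirm the download is complete, a quick sanity check can compare the shard index against the files on disk. This is a sketch; it assumes the weight directory includes the standard Hugging Face `model.safetensors.index.json` index file:

```python
import json
import os

def missing_shards(weight_dir):
    """List shard files named in the HF index but absent from weight_dir."""
    with open(os.path.join(weight_dir, "model.safetensors.index.json")) as f:
        index = json.load(f)
    needed = set(index["weight_map"].values())  # shard filenames referenced by tensors
    present = set(os.listdir(weight_dir))
    return sorted(needed - present)
```

Run `missing_shards("<path-to-weight>")`; an empty list means every shard referenced by the index is in place.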
In one terminal tab, start the vLLM server:

```shell
conda activate <path-to-conda-env>
bash start_vllm.sh
```
In a second terminal tab, start the Anthropic API proxy:

```shell
conda activate <path-to-conda-env>
bash start_proxy.sh
```
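Once both processes are up, the proxy can be smoke-tested with a plain Anthropic-style `/v1/messages` call. The helper below is a sketch: it only assembles the request (URL, headers, JSON body) in the shape the standard Anthropic Messages API expects; send it with `curl` or `requests`:

```python
import json

def build_messages_request(base_url, model, prompt, token="sk-placeholder"):
    """Assemble an Anthropic-style /v1/messages request against the proxy."""
    url = f"{base_url}/v1/messages"
    headers = {
        "content-type": "application/json",
        # Assumption: the proxy accepts the same placeholder token
        # configured in settings.json rather than a real API key.
        "x-api-key": token,
        "anthropic-version": "2023-06-01",
    }
    body = json.dumps({
        "model": model,
        "max_tokens": 64,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_messages_request(
    "http://localhost:8801", "Qwopus3.5-27B-v3", "Say hello."
)
```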
Edit `~/.claude/settings.json` (fill in your IP):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://<your-ip>:8801",
    "ANTHROPIC_AUTH_TOKEN": "sk-placeholder",
    "ANTHROPIC_MODEL": "Qwopus3.5-27B-v3"
  }
}
```

Environment variables for `start_vllm.sh`:

| Variable | Default | Description |
|---|---|---|
| CONDA_ENV | — | Path to conda environment |
| MODEL | — | Path to model weight directory |
| CUDA_VISIBLE_DEVICES | 1,2 | GPU device IDs to use |
| PORT | 8767 | vLLM server port |
| TP | 2 | Tensor parallel size |
| MAX_LEN | 200000 | Max model context length |
Environment variables for `start_proxy.sh`:

| Variable | Default | Description |
|---|---|---|
| VLLM_URL | http://localhost:8767 | vLLM backend URL |
| PROXY_PORT | 8801 | Anthropic proxy listen port |
| MODEL_NAME | Qwopus3.5-27B-v3 | Served model name |
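Conceptually, the proxy accepts Anthropic Messages requests and forwards them to vLLM's OpenAI-compatible endpoint at `VLLM_URL`. A minimal sketch of that translation, assuming the common pattern of folding the Anthropic top-level `system` field into an OpenAI-style system message (the actual proxy code may differ):

```python
def anthropic_to_openai(body):
    """Translate an Anthropic /v1/messages body into an OpenAI
    /v1/chat/completions body for the vLLM backend (sketch)."""
    messages = list(body.get("messages", []))
    if body.get("system"):
        # Anthropic carries the system prompt as a top-level field;
        # OpenAI-style APIs expect it as the first chat message.
        messages.insert(0, {"role": "system", "content": body["system"]})
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "temperature": body.get("temperature", 1.0),
    }
```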