Hi @eleaner! I'm Dosu and I’m helping the docling team. Yes, you can configure docling to work with llama.cpp (or other OpenAI-compatible servers) running on a different machine. Docling supports remote LLMs via OpenAI-compatible API endpoints—users have successfully used llama-server and vllm as remote backends, and there’s no vendor lock-in to Ollama or any specific provider. The key is to enable remote services and set the API endpoint URL and model name in your configuration. Here’s a simple step-by-step guide:
```python
from docling.datamodel.pipeline_options import ApiVlmOptions, VlmPipelineOptions, ResponseFormat

API_URL = "http://<remote-ip>:8000/v1/chat/completions"  # Replace <remote-ip> with your server's address
API_MODEL = "your-model-name"  # The model name as served by llama.cpp

def llama_vlm_options(model: str, prompt: str) -> ApiVlmOptions:
    # Options for calling a remote OpenAI-compatible endpoint (e.g. llama-server)
    return ApiVlmOptions(
        url=API_URL,
        params={"model": model},
        prompt=prompt,
        timeout=600,
        scale=1.0,
        response_format=ResponseFormat.MARKDOWN,
    )

pipeline_options = VlmPipelineOptions(
    enable_remote_services=True  # This must be True for remote connections
)
pipeline_options.vlm_options = llama_vlm_options(
    model=API_MODEL,
    prompt="Your prompt here",
)
```
Caveats: `enable_remote_services=True` is required for any remote endpoint, and the URL must be the full OpenAI-compatible chat-completions endpoint (e.g. ending in `/v1/chat/completions`) and reachable from the machine running docling.
For more details and troubleshooting, see the discussion and example code. Let me know if you want a more detailed example or help with a specific part of the setup! To reply, just mention @dosu.
The idea is that I would love to use the power of an LLM, but I don't want to run my private LLM on the same machine as docling.
Is that possible, and if so, how do I do it?
Unfortunately, I'm afraid I'm not experienced enough to work out what I should do (if it is possible at all).
Thanks