Conversation

dhdaines commented Jan 4, 2026

Applies on top of #2108, which contains the necessary MTMD changes.

This adds a chat format to support https://huggingface.co/ggml-org/granite-docling-258M-GGUF and derivatives. It should work with SmolVLM and SmolDocling as well.

To use these models effectively, special tokens must be enabled in the chat completion output, so this adds a special flag to all of the chat completion functions that matches what --special does in llama-cli (it is enabled by default in llama-mtmd-cli).
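A minimal usage sketch of the new flag. Assumptions not confirmed by this PR's diff: the flag is exposed as a special keyword argument on create_chat_completion, and the chat format is registered under the name "granite-docling"; the model path is a placeholder.

```python
from llama_cpp import Llama

# Sketch only: the chat_format name and keyword names are assumptions.
llm = Llama(
    model_path="granite-docling-258M-Q8_0.gguf",  # placeholder path
    chat_format="granite-docling",                # assumed registered name
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Convert this page to docling."}],
    special=True,  # keep special tokens (e.g. DocTags markup) in the output,
                   # mirroring --special in llama-cli
)
print(out["choices"][0]["message"]["content"])
```

Real use of a vision model would also need the mmproj file and image input through the multimodal path; that setup is omitted here.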

dhdaines marked this pull request as ready for review January 4, 2026 18:05
dhdaines (Author) commented Jan 4, 2026

Ready for review!

dhdaines force-pushed the granite-docling branch 2 times, most recently from 8c4d7ac to 3a04e21 on January 5, 2026 18:51
Ralf Waldukat and others added 4 commits January 6, 2026 18:00
- Update vendor/llama.cpp submodule to be47fb92 (2026-01-01)
- Bump version from 0.3.16 to 0.4.0

Breaking changes:
- Migrate flash_attn bool to flash_attn_type enum (backward compatible via None=AUTO; migration sketched below)
- Replace llama_kv_self_* API with llama_memory_* API (see the sketch after the new-features list)
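A migration sketch for the flash_attn change. Only the LLAMA_FLASH_ATTN_TYPE_* enum names are taken from this changelog; assuming the high-level Llama constructor exposes the enum as a flash_attn_type keyword:

```python
import llama_cpp
from llama_cpp import Llama

# 0.3.x style (removed): flash_attn was a plain bool.
# llm = Llama(model_path="model.gguf", flash_attn=True)

# 0.4.0 style: pass an explicit enum value. Omitting the argument (None)
# maps to LLAMA_FLASH_ATTN_TYPE_AUTO, which is why old code that never
# set flash_attn keeps working unchanged.
llm = Llama(
    model_path="model.gguf",  # placeholder path
    flash_attn_type=llama_cpp.LLAMA_FLASH_ATTN_TYPE_ENABLED,
)
```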

New features:
- Add LLAMA_FLASH_ATTN_TYPE_* enum (AUTO/DISABLED/ENABLED)
- Add llama_model_params fields: no_host, no_alloc
- Add mtmd_context_params fields: flash_attn_type, warmup, image_min/max_tokens
- Add LLAMA_ROPE_TYPE_IMROPE, LLAMA_PARAMS_FIT_STATUS_* enums
- Add 15+ new functions: llama_max_tensor_buft_overrides, llama_n_ctx_seq,
  llama_model_n_embd_inp, llama_model_is_hybrid, llama_log_*, llama_memory_*,
  llama_attach/detach_threadpool, llama_adapter_meta_* (4 functions)
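A low-level sketch of the llama_kv_self_* to llama_memory_* migration; the names mirror llama.h, but treat the exact Python-side signatures as assumptions:

```python
import llama_cpp

# Abbreviated setup via the low-level ctypes bindings (placeholder path).
model = llama_cpp.llama_model_load_from_file(
    b"model.gguf", llama_cpp.llama_model_default_params()
)
ctx = llama_cpp.llama_init_from_model(
    model, llama_cpp.llama_context_default_params()
)

# Old (removed): llama_cpp.llama_kv_self_clear(ctx)
mem = llama_cpp.llama_get_memory(ctx)    # memory handle now lives behind the context
llama_cpp.llama_memory_clear(mem, True)  # True also clears the data buffers
```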

Fixes:
- Server settings: flash_attn default None (AUTO) instead of False (DISABLED)
- Enable FIM token functions: token_prefix/middle/suffix (usage sketched below)
- Fix typos: additonal→additional, unnused→unused
- Remove deprecated verbosity field from mtmd_context_params
- Add CMake version workaround documentation
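A sketch of the re-enabled FIM token helpers used to build a fill-in-the-middle prompt. Assumption: they are plain methods on the Llama object returning token ids, as their names in this changelog suggest:

```python
from llama_cpp import Llama

llm = Llama(model_path="fim-capable-model.gguf")  # placeholder path

# Classic PSM layout: <PRE> prefix <SUF> suffix <MID>, after which the
# model generates the missing middle section.
tokens = (
    [llm.token_prefix()]
    + llm.tokenize(b"def add(a, b):\n", add_bos=False)
    + [llm.token_suffix()]
    + llm.tokenize(b"\n    return result\n", add_bos=False)
    + [llm.token_middle()]
)
```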

Code quality:
- Consistent stub style (... instead of pass)
- Struct alignment verified against llama.h and mtmd.h
- Minimal whitespace noise (0.4% of diff)