Conversation

dhdaines commented Jan 4, 2026

Applies on top of #2108, which contains the necessary MTMD changes.

This adds a chat format to support https://huggingface.co/ggml-org/granite-docling-258M-GGUF and derivatives. It should work with SmolVLM and SmolDocling as well.

To use these models effectively, special tokens must be enabled in the chat completion output, so this adds a special flag to all of the chat completion functions that matches what --special does in llama-cli (it is enabled by default in llama-mtmd-cli).
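A minimal usage sketch of the new flag. Assumptions not confirmed by this PR's diff: the flag is exposed as a special keyword argument on create_chat_completion, and the chat format is registered under the name "granite-docling"; the model path is a placeholder.

```python
from llama_cpp import Llama

# Sketch only: the chat_format name and keyword names are assumptions.
llm = Llama(
    model_path="granite-docling-258M-Q8_0.gguf",  # placeholder path
    chat_format="granite-docling",                # assumed registered name
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Convert this page to docling."}],
    special=True,  # keep special tokens (e.g. DocTags markup) in the output,
                   # mirroring --special in llama-cli
)
print(out["choices"][0]["message"]["content"])
```

Real use of a vision model would also need the mmproj file and image input through the multimodal path; that setup is omitted here.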

dhdaines marked this pull request as ready for review January 4, 2026 18:05
dhdaines (Author) commented Jan 4, 2026

Ready for review!

dhdaines force-pushed the granite-docling branch 2 times, most recently from 8c4d7ac to 3a04e21 on January 5, 2026 18:51
Ralf Waldukat and others added 4 commits January 6, 2026 18:00
- Update vendor/llama.cpp submodule to be47fb92 (2026-01-01)
- Bump version from 0.3.16 to 0.4.0

Breaking changes:
- Migrate flash_attn bool to flash_attn_type enum (backward compatible via None=AUTO; migration sketched below)
- Replace llama_kv_self_* API with llama_memory_* API (see the sketch after the new-features list)
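A migration sketch for the flash_attn change. Only the LLAMA_FLASH_ATTN_TYPE_* enum names are taken from this changelog; assuming the high-level Llama constructor exposes the enum as a flash_attn_type keyword:

```python
import llama_cpp
from llama_cpp import Llama

# 0.3.x style (removed): flash_attn was a plain bool.
# llm = Llama(model_path="model.gguf", flash_attn=True)

# 0.4.0 style: pass an explicit enum value. Omitting the argument (None)
# maps to LLAMA_FLASH_ATTN_TYPE_AUTO, which is why old code that never
# set flash_attn keeps working unchanged.
llm = Llama(
    model_path="model.gguf",  # placeholder path
    flash_attn_type=llama_cpp.LLAMA_FLASH_ATTN_TYPE_ENABLED,
)
```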

New features:
- Add LLAMA_FLASH_ATTN_TYPE_* enum (AUTO/DISABLED/ENABLED)
- Add llama_model_params fields: no_host, no_alloc
- Add mtmd_context_params fields: flash_attn_type, warmup, image_min/max_tokens
- Add LLAMA_ROPE_TYPE_IMROPE, LLAMA_PARAMS_FIT_STATUS_* enums
- Add 15+ new functions: llama_max_tensor_buft_overrides, llama_n_ctx_seq,
  llama_model_n_embd_inp, llama_model_is_hybrid, llama_log_*, llama_memory_*,
  llama_attach/detach_threadpool, llama_adapter_meta_* (4 functions)
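A low-level sketch of the llama_kv_self_* to llama_memory_* migration; the names mirror llama.h, but treat the exact Python-side signatures as assumptions:

```python
import llama_cpp

# Abbreviated setup via the low-level ctypes bindings (placeholder path).
model = llama_cpp.llama_model_load_from_file(
    b"model.gguf", llama_cpp.llama_model_default_params()
)
ctx = llama_cpp.llama_init_from_model(
    model, llama_cpp.llama_context_default_params()
)

# Old (removed): llama_cpp.llama_kv_self_clear(ctx)
mem = llama_cpp.llama_get_memory(ctx)    # memory handle now lives behind the context
llama_cpp.llama_memory_clear(mem, True)  # True also clears the data buffers
```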

Fixes:
- Server settings: flash_attn default None (AUTO) instead of False (DISABLED)
- Enable FIM token functions: token_prefix/middle/suffix (usage sketched below)
- Fix typos: additonal→additional, unnused→unused
- Remove deprecated verbosity field from mtmd_context_params
- Add CMake version workaround documentation
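A sketch of the re-enabled FIM token helpers used to build a fill-in-the-middle prompt. Assumption: they are plain methods on the Llama object returning token ids, as their names in this changelog suggest:

```python
from llama_cpp import Llama

llm = Llama(model_path="fim-capable-model.gguf")  # placeholder path

# Classic PSM layout: <PRE> prefix <SUF> suffix <MID>, after which the
# model generates the missing middle section.
tokens = (
    [llm.token_prefix()]
    + llm.tokenize(b"def add(a, b):\n", add_bos=False)
    + [llm.token_suffix()]
    + llm.tokenize(b"\n    return result\n", add_bos=False)
    + [llm.token_middle()]
)
```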

Code quality:
- Consistent stub style (... instead of pass)
- Struct alignment verified against llama.h and mtmd.h
- Minimal whitespace noise (0.4% of diff)