[Do Not Merge] model : LFM2.5-Audio-1.5B #18641
tdakhran wants to merge 49 commits into ggml-org:master
Conversation
Force-pushed from c275436 to e1a8fd1
If the string
Or that. We just have to remember to remove them all from the merge message. :)
Change is decoupled from ggml-org#18641. [LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) needs a streaming ISTFT for generating output audio.

* add a streaming ISTFT class (`mtmd_audio_streaming_istft`) with overlap-add for audio reconstruction
* replace the global audio cache with a per-instance cache; the model requires two independent caches, one for preprocessing (audio input) and one for ISTFT (audio output)
* unify the templated FFT/IFFT implementation to support both forward and inverse transforms
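The overlap-add scheme described in this commit message can be sketched in Python. This is a simplified stand-in, not the actual C++ `mtmd_audio_streaming_istft`; the window choice (periodic Hann), the frame/hop sizes, and the 1.5 normalization constant are illustrative assumptions, not values taken from the PR:

```python
import numpy as np

class StreamingISTFT:
    """Streaming inverse STFT via overlap-add. Each instance keeps its own
    overlap cache, so independent instances can serve, e.g., audio input
    preprocessing and audio output reconstruction at the same time."""

    def __init__(self, n_fft=512, hop=128):
        assert n_fft // hop == 4 and n_fft % hop == 0  # this sketch assumes 75% overlap
        self.n_fft, self.hop = n_fft, hop
        # periodic Hann window; with hop = n_fft/4 the squared-window sum is exactly 1.5
        self.window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
        self.cache = np.zeros(n_fft)  # per-instance overlap-add tail

    def push(self, spectrum):
        """Consume one complex half-spectrum (n_fft//2 + 1 bins), emit hop samples."""
        frame = np.fft.irfft(spectrum, n=self.n_fft) * self.window
        self.cache += frame
        out = self.cache[:self.hop] / 1.5  # fully accumulated, safe to emit
        self.cache = np.concatenate([self.cache[self.hop:], np.zeros(self.hop)])
        return out

# Round trip: analyze with the same window, feed frames one by one.
n_fft, hop = 512, 128
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
x = np.sin(2 * np.pi * np.arange(4096) / 50)
ist = StreamingISTFT(n_fft, hop)
y = np.concatenate([ist.push(np.fft.rfft(x[s:s + n_fft] * w))
                    for s in range(0, len(x) - n_fft + 1, hop)])
# after a 3-hop warm-up, y matches x sample for sample
```

Because a sample is emitted only once every frame overlapping it has arrived, each `push` can safely return `hop` finalized samples, which is what makes the class usable for streaming output.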
… tarek/feat/os-lfm2.5-audio-1.5b-upstream [no ci]
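The unified forward/inverse transform mentioned in the commit message can be illustrated with a minimal radix-2 Cooley-Tukey FFT in Python. This is a sketch of the general technique, not the templated C++ code in the PR; forward and inverse share one code path, differing only in the twiddle-factor sign, with the 1/N scaling applied by the inverse wrapper:

```python
import cmath

def fft(x, inverse=False):
    """Radix-2 Cooley-Tukey; len(x) must be a power of two.
    The `inverse` flag only flips the twiddle-factor sign."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    # the inverse also needs a final 1/N scaling
    return [v / len(x) for v in fft(x, inverse=True)]
```

In C++ the same idea would typically be a template parameter selecting the sign at compile time, so both directions compile to one specialized routine.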
Hello Tarek, I am trying to build your WIP PR. With the last commit, 'Read n_layer from gguf', using LTO, building fails at the very end: llama-server and llama-liquid-audio-server are successfully built, but the CLI fails. If there is anything I can do to help with testing, let me know. Thank you so much.
@elfarolab , the mentioned commit didn't change anything related to compilation or LTO; could it be that there are stale object files somewhere? I tested that a clean build works.
UPD: it's related to the miniaudio CLI defines implementation here https://github.com/ggml-org/llama.cpp/pull/18641/changes#diff-73f13371b37801825dc2cdbfacadf9af40aef9dca4770d9dacbbe3534c7a7dacR13 ; another implementation is defined in mtmd audio. Try commenting out this line.
Before building, I delete the build destination directory every time. I always build llama.cpp the same way with the options above and never get failures.
@elfarolab , it should work now; there were two implementations of miniaudio.
rebuilding |
Force-pushed from 4f1cc0c to 4bee388
Force-pushed from 4bee388 to 39ff210
[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) introduced a lightweight audio tokenizer. The tokenizer is based on the LFM2 architecture and acts as an "embedding" model with different input `n_embd` and output `n_embd_out`. To be used in ggml-org#18641. To convert, use

```shell
python3 convert_hf_to_gguf.py /path/to/LFM2.5-Audio-1.5B/audio_detokenizer
```
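The "different input `n_embd` and output `n_embd_out`" shape change can be pictured with a toy final projection. All names and sizes below are hypothetical, chosen only to illustrate how an embedding-style model's output width can differ from its hidden width:

```python
import numpy as np

rng = np.random.default_rng(0)
n_embd, n_embd_out, n_tokens = 8, 4, 3  # toy sizes, not the real model's

hidden = rng.standard_normal((n_tokens, n_embd))   # backbone states, width n_embd
w_out = rng.standard_normal((n_embd, n_embd_out))  # hypothetical output projection
embeddings = hidden @ w_out                        # output width becomes n_embd_out
print(embeddings.shape)  # (3, 4)
```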
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
…-lfm2.5-audio-1.5b-upstream
* model : Add tokenizer from LFM2.5-Audio-1.5B

  [LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) introduced a lightweight audio tokenizer. The tokenizer is based on the LFM2 architecture and acts as an "embedding" model with different input `n_embd` and output `n_embd_out`. To be used in #18641. To convert, use

  ```shell
  python3 convert_hf_to_gguf.py /path/to/LFM2.5-Audio-1.5B/audio_detokenizer
  ```

* Update convert_hf_to_gguf.py
* Formatting
* Rework check for attention layers
* Add LFM2 SWA model support
* Address PR feedback
* Set vocab to none
* Move helper function definitions to cpp file

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@tdakhran I see all 4 have been merged. Does this mean LFM2.5-Audio works in llama.cpp?
It's yes and no. The ASR part was merged a long time ago; for speech output, changes to the mtmd API and llama-server are still required.
…2.5-audio-1.5b-upstream [no ci]
I tried to build from your branch at the last commit 006639c, but ended up with the following error: [...] I changed [...] Grabbed the tarball from the webpage. Hmm, had to [...]
@zcattacz , this is a draft PR; not all targets are guaranteed to build. This works
Remove PR ggml-org#12794 (OuteTTS 1.0) and PR ggml-org#18039 (Eagle-3 speculative decoding) from the cherry-pick list. Neither is used by any model in the registry. Only PR ggml-org#18641 (LFM2.5 audio) remains.
FYI @tdakhran , I had some discussions recently with the NVIDIA team about bringing their Chatterbox to llama.cpp. I summarized the design choices in #18641. I'll try to take over this PR when I have time (and implement it as the reference for the new audio generation API in mtmd). Feel free to continue the discussion in the mentioned issue. Thanks!
Liquid AI released LFM2.5-Audio-1.5B.
This PR is intended to provide a functional implementation in llama.cpp until the necessary infrastructure is implemented. The plan is to split it up and merge it into upstream in smaller chunks, while keeping and tracking the functional implementation here. It will be rebased from time to time.
GGUFs, precompiled runners, and instructions live in https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF.
Merge plan:
* `n_embd_out`
* model : add LFM2-ColBert-350M #18607

Demo of capabilities (watch with audio on)
demo.mp4
Thank you, @ngxson for the help!