[Do Not Merge] model : LFM2.5-Audio-1.5B #18641
tdakhran wants to merge 49 commits into ggml-org:master
Conversation
Force-pushed from c275436 to e1a8fd1
If the string
Or that. We just have to remember to remove them all from the merge message. :)
Change is decoupled from ggml-org#18641. [LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) needs a streaming ISTFT for generating output audio.

* add a streaming ISTFT class (`mtmd_audio_streaming_istft`) with overlap-add for audio reconstruction
* replace the global audio cache with a per-instance cache; the model requires two independent caches, one for preprocessing (audio input) and one for ISTFT (audio output)
* unify the templated FFT/IFFT implementation to support both forward and inverse transforms
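The overlap-add scheme described in this commit message can be sketched in Python. This is a simplified stand-in, not the actual C++ `mtmd_audio_streaming_istft`; the window choice (periodic Hann), the frame/hop sizes, and the 1.5 normalization constant are illustrative assumptions, not values taken from the PR:

```python
import numpy as np

class StreamingISTFT:
    """Streaming inverse STFT via overlap-add. Each instance keeps its own
    overlap cache, so independent instances can serve, e.g., audio input
    preprocessing and audio output reconstruction at the same time."""

    def __init__(self, n_fft=512, hop=128):
        assert n_fft // hop == 4 and n_fft % hop == 0  # this sketch assumes 75% overlap
        self.n_fft, self.hop = n_fft, hop
        # periodic Hann window; with hop = n_fft/4 the squared-window sum is exactly 1.5
        self.window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
        self.cache = np.zeros(n_fft)  # per-instance overlap-add tail

    def push(self, spectrum):
        """Consume one complex half-spectrum (n_fft//2 + 1 bins), emit hop samples."""
        frame = np.fft.irfft(spectrum, n=self.n_fft) * self.window
        self.cache += frame
        out = self.cache[:self.hop] / 1.5  # fully accumulated, safe to emit
        self.cache = np.concatenate([self.cache[self.hop:], np.zeros(self.hop)])
        return out

# Round trip: analyze with the same window, feed frames one by one.
n_fft, hop = 512, 128
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
x = np.sin(2 * np.pi * np.arange(4096) / 50)
ist = StreamingISTFT(n_fft, hop)
y = np.concatenate([ist.push(np.fft.rfft(x[s:s + n_fft] * w))
                    for s in range(0, len(x) - n_fft + 1, hop)])
# after a 3-hop warm-up, y matches x sample for sample
```

Because a sample is emitted only once every frame overlapping it has arrived, each `push` can safely return `hop` finalized samples, which is what makes the class usable for streaming output.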
… tarek/feat/os-lfm2.5-audio-1.5b-upstream [no ci]
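The unified forward/inverse transform mentioned in the commit message can be illustrated with a minimal radix-2 Cooley-Tukey FFT in Python. This is a sketch of the general technique, not the templated C++ code in the PR; forward and inverse share one code path, differing only in the twiddle-factor sign, with the 1/N scaling applied by the inverse wrapper:

```python
import cmath

def fft(x, inverse=False):
    """Radix-2 Cooley-Tukey; len(x) must be a power of two.
    The `inverse` flag only flips the twiddle-factor sign."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    # the inverse also needs a final 1/N scaling
    return [v / len(x) for v in fft(x, inverse=True)]
```

In C++ the same idea would typically be a template parameter selecting the sign at compile time, so both directions compile to one specialized routine.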
Hello Tarek, I am trying to build your WIP PR. With the last commit, 'Read n_layer from gguf', using LTO, building fails at the very end: llama-server and llama-liquid-audio-server are successfully built, but the CLI fails. If there is anything I can do to help with testing, let me know. Thank you so much.
@elfarolab , the mentioned commit didn't change anything related to compilation or LTO; could it be that there are stale object files somewhere? I tested that a clean build works.
UPD: it's related to the miniaudio CLI defines implementation here https://github.com/ggml-org/llama.cpp/pull/18641/changes#diff-73f13371b37801825dc2cdbfacadf9af40aef9dca4770d9dacbbe3534c7a7dacR13 ; another implementation is defined in mtmd audio. Try commenting out this line.
Before building, I delete the build destination directory every time. I always build llama.cpp the same way with the options above and never get failures.
@elfarolab , it should work now; there were two implementations of miniaudio.
rebuilding |
Force-pushed from 4f1cc0c to 4bee388
Force-pushed from 4bee388 to 39ff210
[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) introduced a lightweight audio tokenizer. The tokenizer is based on the LFM2 architecture and acts as an "embedding" model with different input `n_embd` and output `n_embd_out`. To be used in ggml-org#18641. To convert, use

```shell
python3 convert_hf_to_gguf.py /path/to/LFM2.5-Audio-1.5B/audio_detokenizer
```
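The "different input `n_embd` and output `n_embd_out`" shape change can be pictured with a toy final projection. All names and sizes below are hypothetical, chosen only to illustrate how an embedding-style model's output width can differ from its hidden width:

```python
import numpy as np

rng = np.random.default_rng(0)
n_embd, n_embd_out, n_tokens = 8, 4, 3  # toy sizes, not the real model's

hidden = rng.standard_normal((n_tokens, n_embd))   # backbone states, width n_embd
w_out = rng.standard_normal((n_embd, n_embd_out))  # hypothetical output projection
embeddings = hidden @ w_out                        # output width becomes n_embd_out
print(embeddings.shape)  # (3, 4)
```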
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
…-lfm2.5-audio-1.5b-upstream
* model : Add tokenizer from LFM2.5-Audio-1.5B

  [LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) introduced a lightweight audio tokenizer. The tokenizer is based on the LFM2 architecture and acts as an "embedding" model with different input `n_embd` and output `n_embd_out`. To be used in #18641. To convert, use

  ```shell
  python3 convert_hf_to_gguf.py /path/to/LFM2.5-Audio-1.5B/audio_detokenizer
  ```

* Update convert_hf_to_gguf.py
* Formatting
* Rework check for attention layers
* Add LFM2 SWA model support
* Address PR feedback
* Set vocab to none
* Move helper function definitions to cpp file

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@tdakhran I see all 4 have been merged. Does this mean LFM2.5-Audio works in llama.cpp?
It's yes and no. The ASR part was merged a long time ago; for speech output, changes to the mtmd API and llama-server are still required.
…2.5-audio-1.5b-upstream [no ci]
I tried to build from your branch at the last commit 006639c, but ended up with the following error: [...] I changed [...] Grabbed the tarball from the webpage. Hmm, had to [...]
@zcattacz , this is a draft PR; not all targets are guaranteed to build. This works
Remove PR ggml-org#12794 (OuteTTS 1.0) and PR ggml-org#18039 (Eagle-3 speculative decoding) from the cherry-pick list. Neither is used by any model in the registry. Only PR ggml-org#18641 (LFM2.5 audio) remains.
FYI @tdakhran , I had some discussions recently with the NVIDIA team about bringing their Chatterbox to llama.cpp. I summarized the design choices in #18641. I'll try to take over this PR when I have time (and implement it as the reference for the new audio generation API in mtmd). Feel free to continue the discussion in the mentioned issue. Thanks!
Liquid AI released LFM2.5-Audio-1.5B.
This PR is intended to provide a functional implementation in llama.cpp until the necessary infrastructure is implemented. The plan is to split it up and merge it into upstream in smaller chunks, while keeping and tracking the functional implementation here. It will be rebased from time to time.
GGUFs, precompiled runners, and instructions live in https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-GGUF.
Merge plan:
* `n_embd_out`
* model : add LFM2-ColBert-350M #18607

Demo of capabilities (watch with audio on)
demo.mp4
Thank you, @ngxson for the help!