ggml : do not abort when ggml_aligned_malloc fails#10130
Conversation
ggerganov
left a comment
Should we add temporary GGML_ASSERTs in ggml_threadpool_new_impl where we use ggml_aligned_malloc, until we start handling the failures?
Yes, although if such a small malloc fails, there isn't much you can do at that point anyway, so crashing the application in that case is fine. I am not sure why the threadpool needs an aligned malloc in any case; I will replace it with a standard malloc and add a check.
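A minimal sketch of what the comment above describes: allocating the threadpool with plain `malloc` and returning an error to the caller instead of aborting. The struct and function names here are illustrative stand-ins, not the actual ggml API.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for the threadpool struct allocated in
// ggml_threadpool_new_impl (the real struct has more fields).
struct threadpool_stub {
    int n_threads;
};

// Plain malloc with a NULL check: on failure, log and return NULL
// so the caller can decide how to handle it, rather than aborting.
static struct threadpool_stub * threadpool_stub_new(int n_threads) {
    struct threadpool_stub * tp = malloc(sizeof(*tp));
    if (tp == NULL) {
        fprintf(stderr, "%s: failed to allocate threadpool\n", __func__);
        return NULL;
    }
    tp->n_threads = n_threads;
    return tp;
}
```

For an allocation this small, a failure almost certainly means the process is already out of memory, which is why the discussion above considers aborting acceptable as a fallback.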
Would it be possible for llama.cpp to automatically reduce the ctx size by itself when the kv cache initialization fails (in steps of 2048, for example) until initialization succeeds during the same loading process, crashing only when no kv cache can be allocated at all? Or does such a failure technically demand a crash?
To clarify:
The llama.cpp library, however, should not do that automatically; that's entirely up to the application. So with that out of the way, the question is whether the llama.cpp examples should do that. I expect that would make about as many people angry as it would make happy, so I would say no.
I also think it's better not to do it for the examples.
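If an application did want the fallback behavior discussed above, it would look roughly like the sketch below: retry context creation with a smaller ctx size until it succeeds. `try_create_ctx` is a hypothetical stand-in for the real context-creation call (e.g. the one in llama.cpp that allocates the kv cache); only the retry loop is the point here.

```c
#include <stdbool.h>

// Hypothetical stand-in for context creation: pretend only up to
// 8192 tokens of kv cache fit in memory.
static bool try_create_ctx(int n_ctx) {
    return n_ctx <= 8192;
}

// Application-side fallback: step the ctx size down by 2048 (as
// suggested in the discussion) until creation succeeds.
// Returns the ctx size that worked, or 0 if even the smallest failed.
static int create_ctx_with_fallback(int n_ctx) {
    while (n_ctx > 0 && !try_create_ctx(n_ctx)) {
        n_ctx -= 2048;
    }
    return n_ctx;
}
```

Note this silently changes a user-requested parameter, which is exactly the behavior the maintainers decided the examples should not have.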
The change to use `ggml_aligned_malloc` in ggml-backend also caused it to crash the application when the memory allocation fails, which is not intended. Additionally, added a suggestion to reduce the ctx size when kv cache initialization fails in llama.cpp.
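The intended behavior described in the PR can be sketched as follows: on allocation failure, log and return NULL so that ggml-backend callers can handle the error, rather than aborting. This is a simplified POSIX-only sketch using `posix_memalign`; the real `ggml_aligned_malloc` also covers Windows and other platform-specific paths.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>

// Simplified sketch of an aligned allocator that reports failure
// instead of aborting. 64-byte alignment is assumed here for
// illustration (a typical alignment for tensor data).
static void * aligned_malloc_sketch(size_t size) {
    void * ptr = NULL;
    int result = posix_memalign(&ptr, 64, size);
    if (result != 0) {
        // return NULL and let the caller propagate the error
        fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, size);
        return NULL;
    }
    return ptr;
}
```

The caller-side contract is the important part: ggml-backend checks the result and propagates the failure upward instead of terminating the process.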