ggml : do not abort when ggml_aligned_malloc fails#10130
Conversation
ggerganov
left a comment
Should we add temporary GGML_ASSERTs in ggml_threadpool_new_impl where we use ggml_aligned_malloc, until we start handling the failures?
Yes, although if such a small malloc fails, there isn't much you can do at that point anyway, so crashing the application in that case is fine. I am not sure why the threadpool needs an aligned malloc in any case; I will replace it with a standard malloc and add a check.
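A minimal sketch of what the comment above describes: allocating the threadpool with plain `malloc` and returning an error to the caller instead of aborting. The struct and function names here are illustrative stand-ins, not the actual ggml API.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for the threadpool struct allocated in
// ggml_threadpool_new_impl (the real struct has more fields).
struct threadpool_stub {
    int n_threads;
};

// Plain malloc with a NULL check: on failure, log and return NULL
// so the caller can decide how to handle it, rather than aborting.
static struct threadpool_stub * threadpool_stub_new(int n_threads) {
    struct threadpool_stub * tp = malloc(sizeof(*tp));
    if (tp == NULL) {
        fprintf(stderr, "%s: failed to allocate threadpool\n", __func__);
        return NULL;
    }
    tp->n_threads = n_threads;
    return tp;
}
```

For an allocation this small, a failure almost certainly means the process is already out of memory, which is why the discussion above considers aborting acceptable as a fallback.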
Would it be possible for llama.cpp to automatically reduce the ctx size by itself when the kv cache initialization fails (in steps of 2048, for example) until initialization succeeds during the same loading process, crashing only when no kv cache can be allocated at all? Or does such a failure technically demand a crash?
To clarify:
The llama.cpp library, however, should not do that automatically; that's entirely up to the application. So with that out of the way, the question is whether the llama.cpp examples should do that. I expect that would make about as many people angry as it would make happy, so I would say no.
I also think it's better not to do it for the examples.
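If an application did want the fallback behavior discussed above, it would look roughly like the sketch below: retry context creation with a smaller ctx size until it succeeds. `try_create_ctx` is a hypothetical stand-in for the real context-creation call (e.g. the one in llama.cpp that allocates the kv cache); only the retry loop is the point here.

```c
#include <stdbool.h>

// Hypothetical stand-in for context creation: pretend only up to
// 8192 tokens of kv cache fit in memory.
static bool try_create_ctx(int n_ctx) {
    return n_ctx <= 8192;
}

// Application-side fallback: step the ctx size down by 2048 (as
// suggested in the discussion) until creation succeeds.
// Returns the ctx size that worked, or 0 if even the smallest failed.
static int create_ctx_with_fallback(int n_ctx) {
    while (n_ctx > 0 && !try_create_ctx(n_ctx)) {
        n_ctx -= 2048;
    }
    return n_ctx;
}
```

Note this silently changes a user-requested parameter, which is exactly the behavior the maintainers decided the examples should not have.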
The change to use `ggml_aligned_malloc` in ggml-backend also caused it to crash the application when the memory allocation fails, which is not intended. Additionally, added a suggestion to reduce the ctx size when kv cache initialization fails in llama.cpp.
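The intended behavior described in the PR can be sketched as follows: on allocation failure, log and return NULL so that ggml-backend callers can handle the error, rather than aborting. This is a simplified POSIX-only sketch using `posix_memalign`; the real `ggml_aligned_malloc` also covers Windows and other platform-specific paths.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>

// Simplified sketch of an aligned allocator that reports failure
// instead of aborting. 64-byte alignment is assumed here for
// illustration (a typical alignment for tensor data).
static void * aligned_malloc_sketch(size_t size) {
    void * ptr = NULL;
    int result = posix_memalign(&ptr, 64, size);
    if (result != 0) {
        // return NULL and let the caller propagate the error
        fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, size);
        return NULL;
    }
    return ptr;
}
```

The caller-side contract is the important part: ggml-backend checks the result and propagates the failure upward instead of terminating the process.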