
Bug: loading llava models fails #242

Description

@jakexcosme

Note: This issue was copied from ggml-org#9455

Original Author: @mudler
Original Issue Number: ggml-org#9455
Created: 2024-09-12T16:59:37Z


What happened?

It seems that loading llava models crashes the process entirely. I can reproduce it 100% of the time with moondream models.

This issue has already been discussed in ggml-org#9066 (comment) and ggml-org#9294 (comment); this ticket is just a tracker for the discussion.

Name and Version

Commit that still works: 815b1fb
Commit that does not work: e6b7801 (which includes ggml-org#9082); daa9623, which is older, is also not working.

What operating system are you seeing the problem on?

Linux

Relevant log output

10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stderr /home/mudler/_git/LocalAI/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml.c:13835: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed


10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout [Thread debugging using libthread_db enabled]
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout Using host libthread_db library "/lib64/libthread_db.so.1".
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout 0x00007f989b8e94a3 in ?? () from /lib64/libgomp.so.1
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #0  0x00007f989b8e94a3 in ?? () from /lib64/libgomp.so.1
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #1  0x00000000008222e5 in ggml_graph_compute_thread.isra ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #2  0x00007f989b8dcd16 in GOMP_parallel () from /lib64/libgomp.so.1
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #3  0x0000000000825a2a in ggml_graph_compute ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #4  0x0000000000834010 in ggml_backend_cpu_graph_compute ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #5  0x000000000083784c in ggml_backend_graph_compute ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #6  0x0000000000652b63 in clip_image_batch_encode.constprop ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #7  0x0000000000653553 in clip_image_encode ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #8  0x0000000000657ac8 in llava_image_embed_make_with_clip_img ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #9  0x00000000004e2c09 in llama_server_context::update_slots() [clone .isra.0] ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #10 0x00000000004d7629 in llama_server_queue::start_loop() ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout #11 0x000000000048b040 in main ()
10:25PM DBG GRPC(moondream2-text-model-f16.gguf-127.0.0.1:42747): stdout [Inferior 1 (process 13029) detached]
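
For context, the failing assertion appears to be the row-bounds check in ggml's get_rows path: every index read from the "patches" tensor must fall inside the source tensor's ne01 rows. A minimal sketch of that check (simplified and illustrative, not the actual ggml.c source; the function name is hypothetical):

/* Simplified sketch of the bounds check that trips. Each row index pulled
 * from src1 (here the "patches" tensor) must be < ne01, the number of rows
 * in src0; any index equal to ne01 fails the assert seen in the log above. */
#include <assert.h>
#include <string.h>

static void get_rows_sketch(const float *src0, int ne00, int ne01,
                            const int *rows, int nrows, float *dst) {
    for (int i = 0; i < nrows; i++) {
        const int i01 = rows[i];
        assert(i01 >= 0 && i01 < ne01);   /* the GGML_ASSERT in the log */
        memcpy(dst + (size_t)i * ne00, src0 + (size_t)i01 * ne00,
               (size_t)ne00 * sizeof(float));
    }
}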

Note

The backtrace points at clip_image_batch_encode, i.e. the CLIP vision tower. The local change below, which reverts the patch indices to start at 0, appears to avoid the assert:

diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp
index 342042ff..224db9b5 100644
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -2419,7 +2419,7 @@ bool clip_image_batch_encode(clip_ctx * ctx, const int n_threads, const clip_ima
             struct ggml_tensor * patches = ggml_graph_get_tensor(gf, "patches");
             int* patches_data = (int*)malloc(ggml_nbytes(patches));
             for (int i = 0; i < num_patches; i++) {
-                patches_data[i] = i + 1;
+                patches_data[i] = i;
             }
             ggml_backend_tensor_set(patches, patches_data, 0, ggml_nbytes(patches));
             free(patches_data);
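
To illustrate why the shifted indexing trips the check (a minimal sketch, assuming the tensor being indexed has exactly num_patches rows, i.e. no leading class-token row, which appears to be the moondream case): with i + 1 the last generated index equals num_patches, which fails the i01 < ne01 bound.

/* Illustrative off-by-one demo with hypothetical values; assumes
 * ne01 == num_patches (no extra class-token row in the indexed tensor). */
#include <stdio.h>

int main(void) {
    const int num_patches = 4;      /* small illustrative value */
    const int ne01 = num_patches;   /* rows available in the source tensor */
    for (int i = 0; i < num_patches; i++) {
        const int shifted = i + 1;  /* pre-revert indices: 1..num_patches */
        if (!(shifted >= 0 && shifted < ne01)) {
            printf("patch %d -> row %d trips GGML_ASSERT (ne01 = %d)\n",
                   i, shifted, ne01);
        }
    }
    /* with patches_data[i] = i, indices are 0..num_patches-1 and the
     * assert never fires */
    return 0;
}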
