ggml-backend : add device and backend reg interfaces by slaren · Pull Request #9707 · ggml-org/llama.cpp

slaren · 2024-10-01T15:26:41Z

Adds the backend device and backend registry interfaces. These interfaces represent an entry point to the backend, and aim to replace commonly used custom backend functions and pave the way to support dynamically loadable backends.

The backend registry interface provides a way to enumerate the devices exposed by the backend, obtain function pointers to custom backend functions, and other functionality that is common to the entire backend.

The backend device interface has functions to create backend instances and query information about the devices. Some of the functions of the backend interface have been moved to the device interface.
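To make the relationship between the two interfaces concrete, here is a minimal toy model of it. Every type and function name below (`toy_backend`, `toy_device`, `toy_backend_reg`, `list_all_devices`) is hypothetical and exists only for illustration. None of it is ggml's actual API; the sketch only assumes the roles described above: a registry entry enumerates devices, and a device can be queried and can create backend instances.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types: they mirror the roles described above, not ggml's real structs.
struct toy_backend {                // a backend instance created from a device
    std::string device_name;
};

struct toy_device {
    std::string name;
    size_t      memory_total;       // example of queryable device information (bytes)

    // "device interface": create a backend instance for this device
    toy_backend init_backend() const { return toy_backend{name}; }
};

struct toy_backend_reg {            // "registry interface": one entry per backend
    std::string             name;
    std::vector<toy_device> devices;

    size_t dev_count() const { return devices.size(); }

    const toy_device & dev_get(size_t i) const {
        assert(i < devices.size()); // out-of-range indices are a caller bug
        return devices[i];
    }
};

// Collect "backend/device" names for every device of every registered backend.
std::vector<std::string> list_all_devices(const std::vector<toy_backend_reg> & regs) {
    std::vector<std::string> names;
    for (const auto & reg : regs) {
        for (size_t i = 0; i < reg.dev_count(); ++i) {
            names.push_back(reg.name + "/" + reg.dev_get(i).name);
        }
    }
    return names;
}
```

In this model a caller never needs a backend-specific function to discover devices: it walks the registry entries, inspects each device, and calls `init_backend()` only on the ones it wants to use.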

Currently, only the CUDA and CPU backends implement these interfaces; support in other backends will be added progressively. During the transition period, existing backends that do not implement these interfaces can still be used, but eventually llama.cpp will be refactored to use only the backend registry API. Most backends already implement the functions in these interfaces, so this should only require shuffling some code around. However, test-backend-ops will stop working for backends that do not implement these interfaces.

Other changes:

Removes the GGML_CALL macro: this was added to support llamafile, but it is never used within ggml. As a result, it is very hard to maintain because we don't know which functions need it, and it keeps creeping into new functions in a very inconsistent manner. Once support for loading backends dynamically is added to ggml, other projects can use this implementation rather than rolling their own.

ggml-ci

JohannesGaessler · 2024-10-01T15:34:19Z

-    GGML_API void                   ggml_backend_event_wait       (ggml_backend_t backend, ggml_backend_event_t event);
+    GGML_API ggml_backend_event_t ggml_backend_event_new        (ggml_backend_dev_t device);
+    GGML_API void                 ggml_backend_event_free       (ggml_backend_event_t event);
+    GGML_API void                 ggml_backend_event_record     (ggml_backend_event_t event, ggml_backend_t backend);


Why is it necessary to pass a backend? Is ggml_backend_dev_t not backend-specific?

ggml_backend_t represents a stream or async queue. The events are associated with a device, but not with a stream. ggml_backend_event_record records the event on the stream represented by backend, which should be a backend (stream) of the same device as the event. I know that this is a bit confusing at the moment; ggml_backend_t should be renamed to something like ggml_backend_stream, but I am afraid that it will break a lot of code.
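The device/stream split in this answer can be illustrated with a small toy model. The names `sketch_event`, `sketch_stream`, `event_new`, and `event_record` are hypothetical stand-ins, not ggml functions, and returning `false` on a device mismatch is an assumption of this sketch (the answer only says the stream "should" belong to the same device).

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy types, not ggml's structs.
struct sketch_event {
    int device_id;                  // the device that owns the event
    int id;                         // identifier used to observe the recording
};

struct sketch_stream {              // plays the role of ggml_backend_t (a stream / async queue)
    int              device_id;     // the device this stream runs on
    std::vector<int> recorded;      // ids of events recorded on this stream, in order
};

// An event is created from a device only; no stream is involved at this point.
sketch_event event_new(int device_id, int id) {
    assert(device_id >= 0);         // toy validity check
    return sketch_event{device_id, id};
}

// Recording needs a stream, because the event marks a point in that stream's queue.
// Assumption of this sketch: a stream of a different device is rejected.
bool event_record(const sketch_event & ev, sketch_stream & stream) {
    if (ev.device_id != stream.device_id) {
        return false;
    }
    stream.recorded.push_back(ev.id);
    return true;
}
```

This is why the record call takes both arguments: the event alone identifies a device, while the backend argument selects which of that device's queues the event is placed on.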

JohannesGaessler · 2024-10-01T15:56:16Z

+
+    GGML_API ggml_backend_t ggml_backend_cpu_init(void);
+
+    GGML_API bool ggml_backend_is_cpu                (ggml_backend_t backend);


Is the long-term plan to make this check against a device instead of a backend?

I don't intend to change these functions at the moment. Most of the functions that need these checks, like ggml_backend_cpu_set_n_threads, operate on a ggml_backend_t object, so it is still convenient to have a function to check if a ggml_backend_t belongs to a specific backend. After all the backends have been adapted to the new interface this could be re-evaluated.

JohannesGaessler · 2024-10-01T16:01:47Z

+        // (optional) tensor copy: dst is in the buffer, src may be in any buffer, including buffers from a different backend (return false if not supported)
+        bool         (*cpy_tensor)   (ggml_backend_buffer_t buffer, const struct ggml_tensor * src, struct ggml_tensor * dst);
+        // clear the entire buffer
+        void         (*clear)        (ggml_backend_buffer_t buffer, uint8_t value);


This is in essence the same functionality as memset_tensor except at a different scope, should we be using the same name?

The reason I didn't want to call this function memset when it was added is that it does not allow specifying either the offset or the amount of memory to clear; it always applies to the entire buffer. I believe that the name clear makes it a bit more intuitive that the function applies to the entire buffer and is not as flexible as memset. memset_tensor is fine since it effectively provides the full functionality of a memset function, although limited to tensors. Anyway, I may be overthinking this; it is a rather minor distinction.
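The naming distinction can be sketched with a toy byte buffer. `byte_buffer`, `buffer_clear`, and `buffer_memset` are hypothetical names used only for this illustration; they are not the ggml functions, and the bounds check is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy buffer, not ggml_backend_buffer_t.
struct byte_buffer {
    std::vector<uint8_t> data;
};

// "clear": no offset, no size. It always overwrites the entire buffer.
void buffer_clear(byte_buffer & buf, uint8_t value) {
    std::fill(buf.data.begin(), buf.data.end(), value);
}

// "memset"-style: the caller chooses which byte range to overwrite.
void buffer_memset(byte_buffer & buf, uint8_t value, size_t offset, size_t size) {
    // the range [offset, offset + size) must lie inside the buffer
    assert(offset <= buf.data.size() && size <= buf.data.size() - offset);
    std::fill_n(buf.data.begin() + offset, size, value);
}
```

The signatures carry the difference: the clear-style function cannot express a partial write, so its name does not promise memset-like flexibility.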

JohannesGaessler · 2024-10-01T16:13:53Z

+    ggml_backend_registry() {
+#ifdef GGML_USE_CUDA
+        register_backend(ggml_backend_cuda_reg());
+#endif

-    return ggml_backend_registry_count;
-}
+        register_backend(ggml_backend_cpu_reg());

-size_t ggml_backend_reg_find_by_name(const char * name) {
-    ggml_backend_registry_init();
+        // TODO: sycl, metal, vulkan, kompute, cann
+    }


Is there a meaning behind the order of backends, e.g. the priority with which they are used?

Functions like ggml_backend_dev_by_type choose the first device of the given type, so the order can make a difference.
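A toy registry shows why registration order matters for a "first device of the given type" lookup. The names `dev_type`, `ordered_device`, and `ordered_registry` are hypothetical and only model the behavior described in the answer; they are not ggml's implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types, not ggml's.
enum class dev_type { cpu, gpu };

struct ordered_device {
    std::string name;
    dev_type    type;
};

class ordered_registry {
  public:
    // Devices are stored in registration order.
    void register_device(ordered_device dev) {
        assert(!dev.name.empty());
        devices_.push_back(std::move(dev));
    }

    // Returns the first registered device of the requested type, or nullptr if none exists.
    // The pointer is only valid until the next register_device call.
    const ordered_device * dev_by_type(dev_type type) const {
        for (const auto & dev : devices_) {
            if (dev.type == type) {
                return &dev;
            }
        }
        return nullptr;
    }

  private:
    std::vector<ordered_device> devices_;
};
```

With two GPU devices registered, the lookup returns whichever one was registered first, so swapping two registration calls changes which device a type-based query selects.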

ggerganov

I've started adapting the Metal backend to the new interfaces and everything is working out smoothly. Feel free to merge this PR at any point and in the meantime I will continue the Metal implementation in #9713.

LostRuins · 2024-10-06T09:33:02Z

Hi,
buffer type %s is not the default buffer type for device %s for async uploads

I'm a little confused by this error message, what exactly does it mean?

slaren · 2024-10-06T12:24:33Z

It's mostly to prevent host buffers from being used with the incorrect backend. It is not an error; it is a debug message meant to help developers understand why async uploads are not being used. In llama.cpp it shouldn't be printed unless run with --verbose.

ggml_backend_tensor_copy checked tensor->buffer directly, but GGML views keep buffer NULL and store the backing allocation on view_src. tensor_get already resolved views; tensor_copy did not. The mismatch was latent since tensor_copy was added (ggml-org#9707, 2024-10). It surfaced after LLAMA_STATE_SEQ_FLAGS_ON_DEVICE I/O (ggml-org#22679, 2026-05): read/write device destructors stage copies via ggml_view_1d and tensor_copy, and server context checkpoints adopted ON_DEVICE device IO (41d6949). Parallel MTP workloads then hit GGML_ASSERT(buffer) in get_type during checkpoint save/load. Co-authored-by: Cursor <cursoragent@cursor.com>
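The view mismatch described in this commit message can be sketched with toy types. `view_buffer`, `view_tensor`, `resolve_buffer`, and `buffer_id` are hypothetical names, and the layout only assumes what the message states: a view's own `buffer` is NULL and the backing allocation is reachable through `view_src`.

```cpp
#include <cassert>

// Hypothetical toy types, not ggml's structs.
struct view_buffer {
    int id;
};

struct view_tensor {
    view_buffer * buffer   = nullptr;   // NULL for views in this model
    view_tensor * view_src = nullptr;   // the tensor a view was created from
};

// Resolve the backing buffer: look through the view if there is one.
view_buffer * resolve_buffer(const view_tensor & t) {
    return t.view_src ? t.view_src->buffer : t.buffer;
}

// A consumer that needs a valid buffer. Reading t.buffer directly here would
// trip the assertion for views; resolving first avoids that.
int buffer_id(const view_tensor & t) {
    view_buffer * buf = resolve_buffer(t);
    assert(buf != nullptr && "tensor has no backing buffer");
    return buf->id;
}
```

In this model, a code path that checks `t.buffer` directly sees NULL for every view, which matches the kind of assertion failure the message reports.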

ggml-backend : add device and backend reg interfaces

0cbdf13

ggml-ci

slaren mentioned this pull request Oct 1, 2024

[SYCL] Add SYCL Backend registry, device and Event Interfaces #9705

Merged

4 tasks

JohannesGaessler reviewed Oct 1, 2024

slaren and others added 2 commits October 1, 2024 18:52

Update ggml/src/ggml-backend-impl.h

805fea9

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

add device props/caps, fully support async upload for all compatible …

6ff0e7a

…backends

github-actions Bot added the Kompute https://github.com/KomputeProject/kompute/ label Oct 1, 2024

slaren force-pushed the sl/backend-registry-2 branch 4 times, most recently from e7a6deb to 9ade7ce Compare October 2, 2024 00:45

update other backends

04ef648

ggml-ci

slaren force-pushed the sl/backend-registry-2 branch from 9ade7ce to 04ef648 Compare October 2, 2024 00:45

github-actions Bot added script Script related devops improvements to build systems and github actions labels Oct 2, 2024

fix pipeline parallelism check

db53f8e

slaren marked this pull request as ready for review October 2, 2024 02:01

ggerganov reviewed Oct 2, 2024

Comment thread ggml/src/ggml-backend.cpp Outdated

ggerganov reviewed Oct 2, 2024

Comment thread ggml/src/ggml-backend.cpp Outdated

ggerganov mentioned this pull request Oct 2, 2024

ggml : add metal backend registry / device #9713

Merged

4 tasks

ggerganov approved these changes Oct 2, 2024

slaren added 4 commits October 2, 2024 14:55

removed unused function, add missing statics

f9cab02

Merge remote-tracking branch 'origin/master' into sl/backend-registry-2

2a60833

remove move unused reg_init functions from backends

6ff0d67

fix align [no ci]

b5516aa

slaren added 5 commits October 2, 2024 20:40

fix consistency issues with the usage of main_gpu

dc475c3

move device backend_reg to the struct

d0c4954

fix some inconsistencies in the names of functions

cfef355

fix more naming inconsistencies, make interface structs const

ffeca35

Merge remote-tracking branch 'origin/master' into sl/backend-registry-2

a9d172c

slaren merged commit c83ad6d into master Oct 2, 2024

mtmcp mentioned this pull request Oct 3, 2024

ggml: unify backend logging mechanism #9709

Merged

4 tasks

leo-pony mentioned this pull request Oct 12, 2024

Feature Request: [CANN] backend adapts to llama.cpp dynamic backend loading mechanism #9862

Closed

4 tasks

leo-pony mentioned this pull request Oct 21, 2024

[CANN] Adapt to dynamically loadable backends mechanism #9970

Merged

4 tasks

slaren deleted the sl/backend-registry-2 branch October 29, 2024 11:18

