Skip to content

ggml-backend : add device and backend reg interfaces#9707

Merged
slaren merged 14 commits into
masterfrom
sl/backend-registry-2
Oct 2, 2024
Merged

ggml-backend : add device and backend reg interfaces#9707
slaren merged 14 commits into
masterfrom
sl/backend-registry-2

Conversation

@slaren
Copy link
Copy Markdown
Member

@slaren slaren commented Oct 1, 2024

Adds the backend device and backend registry interfaces. These interfaces represent an entry point to the backend, and aim to replace commonly used custom backend functions and pave the way to support dynamically loadable backends.

The backend registry interface provides a way to enumerate the devices exposed by the backend, obtain function pointers to custom backend functions, and other functionality that is common to the entire backend.

The backend device interface has functions to create backend instances and query information about the devices. Some of the functions of the backend interface have been moved to the device interface.

Currently, only the CUDA and CPU backends implement these interfaces, and support in other backends will be added progressively. During the transition period, currently existing backends that do not implement these interfaces can still be used, but eventually llama.cpp will be refactored to use the backend registry API only. Most backends already implement the functions in these interfaces, so this should only require shuffling some code around. test-backend-ops will stop working for backends that do not implement these interfaces.

Other changes:

  • Removes the GGML_CALL macro: this was added to support llamafile, but is never used within ggml. As a result, it is very hard to maintain because we don't know which functions need it, and it keeps creeping to new functions in a very inconsistent manner. Once support for loading backends dynamically is added to ggml, other projects can use this implementation rather than rolling their own.

@github-actions github-actions Bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels Oct 1, 2024
Comment thread ggml/include/ggml-backend.h Outdated
GGML_API void ggml_backend_event_wait (ggml_backend_t backend, ggml_backend_event_t event);
GGML_API ggml_backend_event_t ggml_backend_event_new (ggml_backend_dev_t device);
GGML_API void ggml_backend_event_free (ggml_backend_event_t event);
GGML_API void ggml_backend_event_record (ggml_backend_event_t event, ggml_backend_t backend);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to pass a backend? Is ggml_backend_dev_t not backend-specific?

Copy link
Copy Markdown
Member Author

@slaren slaren Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ggml_backend_t represents a stream or async queue. The events are associated with a device, but not a stream. ggml_backend_event_record records the event on the stream represented by backend, which should be a backend (stream) of the same device than the event. I know that this is a bit confusing at the moment, ggml_backend_t should be renamed to something like ggml_backend_stream, but I am afraid that it will break a lot of code.

Comment thread ggml/include/ggml-backend.h Outdated

GGML_API ggml_backend_t ggml_backend_cpu_init(void);

GGML_API bool ggml_backend_is_cpu (ggml_backend_t backend);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the long-term plan to make this check against a device instead of a backend?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't intend to change these functions at the moment. Most of the functions that need these checks, like ggml_backend_cpu_set_n_threads, operate on a ggml_backend_t object, so it is still convenient to have a function to check if a ggml_backend_t belongs to a specific backend. After all the backends have been adapted to the new interface this could be re-evaluated.

// (optional) tensor copy: dst is in the buffer, src may be in any buffer, including buffers from a different backend (return false if not supported)
bool (*cpy_tensor) (ggml_backend_buffer_t buffer, const struct ggml_tensor * src, struct ggml_tensor * dst);
// clear the entire buffer
void (*clear) (ggml_backend_buffer_t buffer, uint8_t value);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in essence the same functionality as memset_tensor except at a different scope, should we be using the same name?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I didn't want to call this function memset when it was added is because it does not allow specifying neither the offset or the amount of memory to clear, it always applies to the entire buffer. I believe that the name clear makes it a bit more intuitive that the function applies to the entire buffer and is not as flexible as memset. memset_tensor is fine since it effectively provides the full functionality of a memset function, although limited to tensors. Anyway I may be overthinking this, it is a rather minor distinction.

Comment thread ggml/src/ggml-backend-impl.h Outdated
Comment thread ggml/include/ggml-backend.h
Comment thread ggml/src/ggml-backend.cpp
Comment on lines +523 to +531
ggml_backend_registry() {
#ifdef GGML_USE_CUDA
register_backend(ggml_backend_cuda_reg());
#endif

return ggml_backend_registry_count;
}
register_backend(ggml_backend_cpu_reg());

size_t ggml_backend_reg_find_by_name(const char * name) {
ggml_backend_registry_init();
// TODO: sycl, metal, vulkan, kompute, cann
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a meaning behind the order of backends, e.g. the priority with which they are used?

Copy link
Copy Markdown
Member Author

@slaren slaren Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions like ggml_backend_dev_by_type choose the first device of the given type, so the order can make a difference.

slaren and others added 2 commits October 1, 2024 18:52
@github-actions github-actions Bot added the Kompute https://github.com/KomputeProject/kompute/ label Oct 1, 2024
@slaren slaren force-pushed the sl/backend-registry-2 branch 4 times, most recently from e7a6deb to 9ade7ce Compare October 2, 2024 00:45
@slaren slaren force-pushed the sl/backend-registry-2 branch from 9ade7ce to 04ef648 Compare October 2, 2024 00:45
@github-actions github-actions Bot added script Script related devops improvements to build systems and github actions labels Oct 2, 2024
@slaren slaren marked this pull request as ready for review October 2, 2024 02:01
Comment thread ggml/src/ggml-backend.cpp Outdated
Comment thread ggml/src/ggml-backend.cpp Outdated
Copy link
Copy Markdown
Member

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started adapting the Metal backend to the new interfaces and everything is working out smoothly. Feel free to merge this PR at any point and in the meantime I will continue the Metal implementation in #9713.

@slaren slaren merged commit c83ad6d into master Oct 2, 2024
@mtmcp mtmcp mentioned this pull request Oct 3, 2024
4 tasks
@LostRuins
Copy link
Copy Markdown
Collaborator

Hi,
buffer type %s is not the default buffer type for device %s for async uploads

I'm a little confused by this error message, what exactly does it mean?

@slaren
Copy link
Copy Markdown
Member Author

slaren commented Oct 6, 2024

It's mostly to prevent host buffers from being used with the incorrect backend. It is not an error, it is a debug message meant to help developers understand why async uploads is not being used, in llama.cpp it shouldn't be printed unless run with --verbose.

@slaren slaren deleted the sl/backend-registry-2 branch October 29, 2024 11:18
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Yoshi4470 added a commit to Yoshi4470/llama.cpp that referenced this pull request May 24, 2026
ggml_backend_tensor_copy checked tensor->buffer directly, but GGML views
keep buffer NULL and store the backing allocation on view_src. tensor_get
already resolved views; tensor_copy did not. The mismatch was latent since
tensor_copy was added (ggml-org#9707, 2024-10).

It surfaced after LLAMA_STATE_SEQ_FLAGS_ON_DEVICE I/O (ggml-org#22679, 2026-05):
read/write device destructors stage copies via ggml_view_1d and
tensor_copy, and server context checkpoints adopted ON_DEVICE device IO
(41d6949). Parallel MTP workloads then hit GGML_ASSERT(buffer) in
get_type during checkpoint save/load.

Co-authored-by: Cursor <cursoragent@cursor.com>
phibya pushed a commit to ziee-ai/llama.cpp that referenced this pull request May 29, 2026
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Apple Metal https://en.wikipedia.org/wiki/Metal_(API) devops improvements to build systems and github actions ggml changes relating to the ggml tensor library for machine learning Kompute https://github.com/KomputeProject/kompute/ Nvidia GPU Issues specific to Nvidia GPUs script Script related SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language testing Everything test related Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants