llama : llama_perf + option to disable timings during decode #9355
Conversation
```diff
  bool offload_kqv; // whether to offload the KQV ops (including the KV cache) to GPU
  bool flash_attn;  // whether to use flash attention [EXPERIMENTAL]
- //bool no_perf;   // whether to measure performance timings, TODO: implement
+ bool no_perf;     // whether to measure performance timings

  // Abort callback
```
This is a minor libllama API breaking change, due to the addition of the `no_perf` parameter.
I don't think this will be a breaking change, since `struct llama_context_params` is expected to be created by `llama_context_default_params()`, right?
AFAIK such changes still break external bindings, such as: https://github.com/abetlen/llama-cpp-python/blob/c032fc65b0873337ed39e5d63e15468a5d797646/llama_cpp/llama_cpp.py#L841
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
```c
    LLAMA_PERF_TYPE_SAMPLER_CHAIN = 1,
};

LLAMA_API struct llama_perf_data llama_perf_get(const void * ctx, enum llama_perf_type type);
```
I think it would be preferable to have two separate functions, just to remove the possibility of calling it with the wrong type of pointer.
```c
    return data;
}

const auto * p = (const struct llama_sampler_chain *) chain->ctx;
```
These casts are very error-prone and should always be checked. To do so, I would suggest moving these functions to llama-sampling.cpp and checking the interface pointer. The `llama_sampler_chain` struct could also be moved to llama-sampling.cpp.
Additionally, since this only works with the chain sampler, it should be documented somewhere, either in the function/struct names, or with an explicit comment, otherwise the natural assumption is that it should work with any sampler.
Upon passing a non-chain sampler, should it return empty data or call GGML_ABORT()?
I think an abort would be better here until we can return status codes from functions, since it is most definitely not intended and the important part is that the programmer notices.
…g#9355)

* llama : llama_perf + option to disable timings during decode (ggml-ci)
* common : add llama_arg
* Update src/llama.cpp (Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>)
* perf : separate functions in the API (ggml-ci)
* perf : safer pointer handling + naming update (ggml-ci)
* minor : better local var name
* perf : abort on invalid sampler pointer (ggml-ci)

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
`llama_context_params.no_perf`). Performance measurements are disabled by default for libllama, but for the examples in llama.cpp they are enabled by default.

`llama_perf_get`

TODO:

* `llama_arg` after common : refactor arg parser #9308