Changes from all commits (58 commits)
3f789b8
[Executorch] parallelize op_choose_qparams
kimishpatel Nov 5, 2025
08dd980
[Executorch] Add simd path for op quantize
kimishpatel Nov 5, 2025
27fc8b1
[Executorch] Add multithreading for op_quantize
kimishpatel Nov 5, 2025
ae61ab4
Reduce allocation overhead in quantized sdpa
kimishpatel Nov 5, 2025
ea16e15
[Executorch] Introduce caching cpu memory allocator
kimishpatel Nov 5, 2025
c3ed4b2
Update base for Update on "[Executorch] Introduce caching cpu memory …
kimishpatel Nov 6, 2025
08ab552
Update on "[Executorch] Introduce caching cpu memory allocator"
kimishpatel Nov 6, 2025
dbf63cc
Update base for Update on "[Executorch] Introduce caching cpu memory …
kimishpatel Nov 6, 2025
f9ce984
Update on "[Executorch] Introduce caching cpu memory allocator"
kimishpatel Nov 6, 2025
86c7c4b
Update base for Update on "[Executorch] Introduce caching cpu memory …
kimishpatel Nov 10, 2025
0c23c32
Update on "[Executorch] Introduce caching cpu memory allocator"
kimishpatel Nov 10, 2025
68d76d3
Update base for Update on "[Executorch] Introduce caching cpu memory …
kimishpatel Nov 11, 2025
79bb135
Update on "[Executorch] Introduce caching cpu memory allocator"
kimishpatel Nov 11, 2025
351a400
[Executorch] Use temp allocator for allocating scratch memory
kimishpatel Nov 11, 2025
b4fdc22
[Executorch] Make module constructors uniform across
kimishpatel Nov 11, 2025
daca5e0
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 14, 2025
30c6fba
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 20, 2025
e73b365
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 20, 2025
f12869c
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 20, 2025
7f9288a
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 21, 2025
3efee70
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 22, 2025
75900d0
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 23, 2025
ca1757a
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 23, 2025
a4912c5
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 23, 2025
39cd25d
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 24, 2025
5bce956
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 24, 2025
5df2408
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Nov 25, 2025
6a0d471
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
0bf3b2e
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
d83b4a9
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
a1f687f
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
2d79945
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
365be54
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
ba27007
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
20854fc
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
36cce27
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
834171f
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 4, 2025
bae4829
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 5, 2025
71cc532
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 5, 2025
230cd24
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 5, 2025
997b5e2
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 5, 2025
7590e9c
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 5, 2025
f06f5ba
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 6, 2025
e22cb35
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 7, 2025
251b270
Update base for Update on "[Executorch][LLM] Use caching allocator fo…
kimishpatel Dec 9, 2025
6ebb435
Use caching allocator for runner (#15730)
kimishpatel Apr 6, 2026
d759f09
Update base for Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 15, 2026
11ec89c
Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 15, 2026
e30bae0
Update base for Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 16, 2026
704fb2e
Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 16, 2026
467774d
Update base for Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 17, 2026
7359cf2
Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 17, 2026
d8b32c6
Update base for Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 17, 2026
5b9bf5e
Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 17, 2026
42830ac
Update base for Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 20, 2026
3dd3158
Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 20, 2026
056a2a3
Update base for Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 24, 2026
c75df37
Update on "Use caching allocator for runner (#15730)"
kimishpatel Apr 24, 2026
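
The commit stack above introduces a caching CPU memory allocator and routes temp/scratch allocations (for example, quantized SDPA scratch buffers) through it. As a rough illustration of the idea only — this is a toy sketch, not the ExecuTorch CPUCachingAllocator — a size-keyed cache lets repeated allocations of the same size reuse previously freed blocks instead of paying malloc/free on every call:

// Illustrative only: a toy size-keyed caching allocator, not the ExecuTorch
// CPUCachingAllocator. Freed blocks are parked in a per-size free list and
// handed back on the next allocation of the same size, so steady-state decode
// loops stop hitting malloc/free on every token.
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

class ToyCachingAllocator {
 public:
  explicit ToyCachingAllocator(size_t max_cached_bytes)
      : max_cached_bytes_(max_cached_bytes) {}

  ~ToyCachingAllocator() {
    // Release everything still held in the cache.
    for (auto& entry : cache_) {
      for (void* ptr : entry.second) {
        std::free(ptr);
      }
    }
  }

  void* allocate(size_t size) {
    auto it = cache_.find(size);
    if (it != cache_.end() && !it->second.empty()) {
      // Reuse a previously freed block of the same size.
      void* ptr = it->second.back();
      it->second.pop_back();
      cached_bytes_ -= size;
      return ptr;
    }
    return std::malloc(size);
  }

  void deallocate(void* ptr, size_t size) {
    if (cached_bytes_ + size > max_cached_bytes_) {
      // Cache is full; release the block to the system instead.
      std::free(ptr);
      return;
    }
    cache_[size].push_back(ptr);
    cached_bytes_ += size;
  }

 private:
  size_t max_cached_bytes_;
  size_t cached_bytes_ = 0;
  std::unordered_map<size_t, std::vector<void*>> cache_;
};

The file changes below wire the allocator into the LLM runner build and pass it to the Module as its temp allocator.
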
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -1124,6 +1124,8 @@ if(EXECUTORCH_BUILD_EXTENSION_TRAINING)
 endif()
 
 if(EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER)
+  add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/extension/memory_allocator)
+  list(APPEND _executorch_extensions extension_memory_allocator)
   add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/extension/llm/runner)
   list(APPEND _executorch_extensions extension_llm_runner)
 endif()
5 changes: 3 additions & 2 deletions extension/llm/runner/CMakeLists.txt
@@ -39,8 +39,9 @@ add_subdirectory(
   ${CMAKE_CURRENT_BINARY_DIR}/../sampler
 )
 
-set(runner_deps executorch_core extension_module extension_tensor
-    extension_llm_sampler tokenizers::tokenizers
+set(runner_deps
+    executorch_core extension_module extension_tensor extension_llm_sampler
+    extension_memory_allocator tokenizers::tokenizers
 )
 
 # depend on arange_utils
21 changes: 19 additions & 2 deletions extension/llm/runner/llm_runner_helper.cpp
@@ -17,6 +17,7 @@
 #include <executorch/extension/llm/runner/text_llm_runner.h>
 #include <executorch/extension/llm/runner/text_prefiller.h>
 #include <executorch/extension/llm/runner/text_token_generator.h>
+#include <executorch/extension/memory_allocator/cpu_caching_malloc_allocator.h>
 #include <executorch/runtime/core/result.h>
 #include <executorch/runtime/platform/runtime.h>
 #include <pytorch/tokenizers/hf_tokenizer.h>

@@ -226,12 +227,28 @@ std::unique_ptr<TextLLMRunner> create_text_llm_runner(
 
   // Create the Module
   std::unique_ptr<Module> module;
+  uint32_t max_cached_memory_size_bytes_ = 1024 * 1024 * 10; // 10MB
Review comment (Copilot AI, Nov 17, 2025) on the added line above:
The hardcoded value of 10MB for the caching allocator size should be documented or made configurable. According to the PR description, this improves performance by 6% on iOS for SDPA op temp allocations, but different models or use cases may benefit from different cache sizes. Consider:
  1. Adding a comment explaining why 10MB was chosen
  2. Making this value configurable through a parameter or constant
  3. Documenting the performance implications in code comments
   if (data_files.size() > 0) {
     module = std::make_unique<Module>(
-        model_path, data_files, load_mode, std::move(event_tracer));
+        model_path,
+        data_files,
+        load_mode,
+        std::move(event_tracer),
+        nullptr, // memory allocator
+        std::make_unique<
+            executorch::extension::CPUCachingAllocator>( // temp memory
+                                                         // allocator
+            max_cached_memory_size_bytes_));
   } else {
     module = std::make_unique<Module>(
-        model_path, load_mode, std::move(event_tracer));
+        model_path,
+        load_mode,
+        std::move(event_tracer), // event tracer
+        nullptr, // memory allocator
+        std::make_unique<
+            executorch::extension::CPUCachingAllocator>( // temp memory
+                                                         // allocator
+            max_cached_memory_size_bytes_));
   }
 
   // Get metadata from Module
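
Regarding the review comment above, a minimal sketch of option 2: lifting the 10MB cache size into a parameter. It follows the argument order visible in this diff (path, load mode, event tracer, memory allocator, temp allocator) and the CPUCachingAllocator constructor taking a byte count; the helper name, default constant, and any constructor details beyond what the diff shows are assumptions, not part of the PR.

#include <executorch/extension/memory_allocator/cpu_caching_malloc_allocator.h>
#include <executorch/extension/module/module.h>

#include <cstdint>
#include <memory>
#include <string>
#include <utility>

namespace {

// Default mirrors the 10 MiB value hardcoded in the diff above.
constexpr uint32_t kDefaultMaxCachedBytes = 10 * 1024 * 1024;

// Hypothetical helper: builds a Module whose temp allocations go through a
// CPUCachingAllocator of caller-chosen size.
std::unique_ptr<executorch::extension::Module> make_module_with_caching_temp(
    const std::string& model_path,
    executorch::extension::Module::LoadMode load_mode,
    uint32_t max_cached_bytes = kDefaultMaxCachedBytes) {
  auto temp_allocator =
      std::make_unique<executorch::extension::CPUCachingAllocator>(
          max_cached_bytes);
  return std::make_unique<executorch::extension::Module>(
      model_path,
      load_mode,
      /*event_tracer=*/nullptr,
      /*memory_allocator=*/nullptr,
      std::move(temp_allocator));
}

} // namespace

A caller could then pass a model-specific cache size instead of relying on the 10MB default, which addresses the tuning concern raised in the comment.
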
1 change: 1 addition & 0 deletions extension/llm/runner/targets.bzl
@@ -132,6 +132,7 @@ def define_common_targets():
         ":text_prefiller" + aten_suffix,
         ":text_token_generator" + aten_suffix,
         "//executorch/extension/llm/runner/io_manager:io_manager" + aten_suffix,
+        "//executorch/extension/memory_allocator:cpu_caching_allocator",
         "//pytorch/tokenizers:hf_tokenizer",
         "//pytorch/tokenizers:llama2c_tokenizer",
         "//pytorch/tokenizers:sentencepiece",