Add support for quantized LeakyReLU#1

Closed
digantdesai wants to merge 1 commit into pytorch:main from digantdesai:export-D47043207

Conversation

@digantdesai
Contributor

Summary: Also adds support for backend_config

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 509bd4c02eb7ff5d3d47762522debd827bee7240
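For readers unfamiliar with what a quantized LeakyReLU computes, the sketch below illustrates the usual affine-quantization math (dequantize, apply LeakyReLU, requantize and clamp). This is an illustration only, not the XNNPACK kernel this PR adds; the function name and uint8 quantization parameters are assumptions for the example.

```python
# Illustrative sketch of quantized LeakyReLU over uint8 values:
# dequantize -> leaky_relu -> requantize -> clamp to [0, 255].
def quantized_leaky_relu(q_in, in_scale, in_zp, out_scale, out_zp,
                         negative_slope=0.01):
    out = []
    for q in q_in:
        x = (q - in_zp) * in_scale                 # dequantize to float
        y = x if x >= 0.0 else x * negative_slope  # LeakyReLU
        q_out = round(y / out_scale) + out_zp      # requantize
        out.append(max(0, min(255, q_out)))        # clamp to uint8 range
    return out

print(quantized_leaky_relu([0, 128, 255], 0.1, 128, 0.1, 128))  # → [127, 128, 255]
```

Note how the negative input (q=0, i.e. -12.8 after dequantization) is scaled by `negative_slope` before requantization, while non-negative inputs pass through unchanged.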
@facebook-github-bot added the `CLA Signed` and `fb-exported` labels on Jun 28, 2023
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D47043207

digantdesai added a commit to digantdesai/pytorch that referenced this pull request Jun 30, 2023
Summary:
Pull Request resolved: pytorch#104309

X-link: pytorch/executorch#1

Also adds support for backend_config

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 3e2f7b614713ae5c3fba6ea3056376f15826de17
facebook-github-bot pushed a commit that referenced this pull request Jun 30, 2023
Summary:
X-link: pytorch/pytorch#104309

Pull Request resolved: #1

Also adds support for backend_config

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 51abd266bba7441c28578f6c58686a3d021d9d2a
junpi3 pushed a commit that referenced this pull request Mar 11, 2024
Before: Each node contains a `UniformParamsBuffer`.
After: Each node contains a `std::vector<std::shared_ptr<UniformParamsBuffer>>`.

In follow-up changes, we will break up the parameters to be passed via multiple `UniformParamsBuffer`s, since
1. some are tensor-specific (e.g. image extents) and
2. others are operator-specific (e.g. alpha for binary ops).

Hence, we need **`std::vector`**.

We are adding the methods for #1 in #2340. Since #1 and #2 will be owned by different objects, we need **pointers**. Since #1 is owned by `vTensor`, which is non-copyable, we can't use `unique_ptr`, so we need **`std::shared_ptr`**.

Differential Revision: [D54691831](https://our.internmc.facebook.com/intern/diff/D54691831/)

[ghstack-poisoned]
junpi3 pushed a commit that referenced this pull request Mar 11, 2024
(same commit message as above)

ghstack-source-id: 218195447
Pull Request resolved: #2348
facebook-github-bot pushed a commit that referenced this pull request Mar 11, 2024
Summary:
bypass-github-export-checks

Pull Request resolved: #2348

(same commit message as above)
ghstack-source-id: 218195447
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D54691831

fbshipit-source-id: 84ab9f777e247fd56234290ed7f7343b9701c73f
junpi3 pushed a commit that referenced this pull request Mar 13, 2024
In #2271, we already added
- IntList
- DoubleList
- BoolList
- ValueList

to the schema and the runtime's Value class. Their serialization was incomplete, missing two components:
1. Receiving a list in `torch.fx.Node.args`.
2. Receiving a non-tensor in `torch.fx.Node`.

This change completes #1.

Also, this change fixes a bug where values of type `bool` matched both `bool` and `int` and hence were added twice.

If our type support grows more complex, we can consider using our own types similar to the core ExecuTorch runtime: https://github.com/pytorch/executorch/blob/689796499024fc4a133318d707f4c10db73da967/exir/emit/_emitter.py#L158-L166

Differential Revision: [D54708353](https://our.internmc.facebook.com/intern/diff/D54708353/)

[ghstack-poisoned]
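The double-add bug described above is a classic Python pitfall: `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`, and a dispatcher that appends a value to every matching bucket adds a bool twice. The sketch below shows the standard fix of checking `bool` first; `classify` is a hypothetical helper for illustration, not code from this PR.

```python
# bool is a subclass of int in Python, so the bool check must come first
# (or use type(value) is int for an exact match) to avoid double-matching.
def classify(value):
    if isinstance(value, bool):   # must precede the int check
        return "BoolList"
    if isinstance(value, int):
        return "IntList"
    if isinstance(value, float):
        return "DoubleList"
    raise TypeError(f"unsupported value type: {type(value)!r}")

print(classify(True), classify(3), classify(2.5))  # → BoolList IntList DoubleList
```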
facebook-github-bot pushed a commit that referenced this pull request Mar 13, 2024
Summary:
bypass-github-export-checks

Pull Request resolved: #2404

(same commit message as above)
ghstack-source-id: 218539049
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D54708353

fbshipit-source-id: 8641647b515e201ea63db67115c01c1532ad6566
8Keep added a commit to 8Keep/executorch that referenced this pull request May 29, 2024
Reviewed By: itamaro

Differential Revision: D51566750
8Keep added a commit to 8Keep/executorch that referenced this pull request May 29, 2024
Summary: Pull Request resolved: pytorch#3763

Reviewed By: itamaro

Differential Revision: D51566750
facebook-github-bot pushed a commit that referenced this pull request May 29, 2024
Summary: Pull Request resolved: #3763

Reviewed By: itamaro, tarun292

Differential Revision: D51566750

fbshipit-source-id: 654c426d479833867e93083e9b55786abfc24a32
haowhsu-quic referenced this pull request in CodeLinaro/executorch Jul 16, 2024
haowhsu-quic referenced this pull request in CodeLinaro/executorch Jul 17, 2024
eigen-k added a commit to eigen-k/executorch that referenced this pull request Jun 3, 2025
facebook-github-bot pushed a commit that referenced this pull request Jun 4, 2025
Summary:
The index should always be smaller than `weight.size(0)`. This change adds that bounds check in `op_embedding`.

This avoids the following wild-addr-read error:

```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3544359==ERROR: AddressSanitizer: SEGV on unknown address 0x7fce2364bc00 (pc 0x000002d225a0 bp 0x7ffffc792a40 sp 0x7ffffc792990 T0)
==3544359==The signal is caused by a READ memory access.
SCARINESS: 20 (wild-addr-read)
    #0 0x2d225a0 in void torch::executor::native::(anonymous namespace)::embedding_byte_per_channel<signed char, c10::Half, float>(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor&) xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:175
    #1 0x2d22367 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #2 0x2d2223d in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #3 0x2d21d37 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const::'lambda0'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #4 0x2d21bca in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #5 0x2d20f8f in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #6 0x2d20e13 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #7 0x2d20d06 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&) xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #8 0x2d226b7 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::KernelRuntimeContext&, executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&) xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:329
    #9 0x2d09bef in torch::executor::function::(anonymous namespace)::$_7::operator()(executorch::runtime::KernelRuntimeContext&, executorch::runtime::EValue**) const buck-out/v2/gen/fbsource/ff19a7e6cb17a7b1/xplat/executorch/kernels/quantized/__generated_lib_combined__/out/RegisterCodegenUnboxedKernelsEverything.cpp:322
    #10 0x2d09a70 in torch::executor::function::(anonymous namespace)::$_7::__invoke(executorch::runtime::KernelRuntimeContext&, executorch::runtime::EValue**) buck-out/v2/gen/fbsource/ff19a7e6cb17a7b1/xplat/executorch/kernels/quantized/__generated_lib_combined__/out/RegisterCodegenUnboxedKernelsEverything.cpp:297
    #11 0x27d769b in executorch::runtime::Method::execute_instruction() xplat/executorch/runtime/executor/method.cpp:1306
    #12 0x27d8c55 in executorch::runtime::Method::execute() xplat/executorch/runtime/executor/method.cpp:1550
    #13 0x27b1e25 in executorch::extension::Module::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::vector<executorch::runtime::EValue, std::allocator<executorch::runtime::EValue>> const&) xplat/executorch/extension/module/module.cpp:261
    #14 0x27afe43 in executorch::extension::Module::forward(std::vector<executorch::runtime::EValue, std::allocator<executorch::runtime::EValue>> const&) xplat/executorch/extension/module/module.h:340
    #15 0x27e0519 in executorch::extension::llm::LlmBackboneRunner::run(std::shared_ptr<executorch::runtime::etensor::Tensor> const&, std::shared_ptr<executorch::runtime::etensor::Tensor> const&, std::shared_ptr<executorch::runtime::etensor::Tensor> const&) xplat/executorch/examples/models/fb/llama4/runner/llm_backbone_runner.cpp:58
    #16 0x27a35c9 in executorch::extension::llm::Llama4Runner::prefill_tokens(std::shared_ptr<executorch::runtime::etensor::Tensor> const&, std::shared_ptr<executorch::runtime::etensor::Tensor> const&, std::shared_ptr<executorch::runtime::etensor::Tensor> const&) xplat/executorch/examples/models/fb/llama4/runner/llama4_runner.cpp:133
    #17 0x885774 in main (/data/users/larryliu/fbsource/buck-out/v2/gen/fbsource/ff19a7e6cb17a7b1/xplat/cria/benchmark/llama4/__generation_main__/generation_main+0x885774)
    #18 0x7fce2122c656 in __libc_start_call_main /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #19 0x7fce2122c717 in __libc_start_main@GLIBC_2.2.5 /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
    #20 0x884c20 in _start /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116

AddressSanitizer can not provide additional info.
AddressSanitizer: SEGV xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:175 in void torch::executor::native::(anonymous namespace)::embedding_byte_per_channel<signed char, c10::Half, float>(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor&)
==3544359==ABORTING
```

Differential Revision: D75982682
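The fix above amounts to validating every index against the number of embedding rows before dereferencing. The sketch below illustrates that guard in Python; the actual fix lives in the C++ kernel `op_embedding.cpp`, and `embedding_lookup` is a hypothetical stand-in.

```python
# Sketch of the bounds check: every index must satisfy
# 0 <= index < weight_rows before the row is read, otherwise an
# out-of-range index reads wild memory (the ASan SEGV above).
def embedding_lookup(weight, indices):
    rows = len(weight)
    for i in indices:
        if not 0 <= i < rows:
            raise IndexError(
                f"index {i} out of range for weight with {rows} rows")
    return [weight[i] for i in indices]

w = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
print(embedding_lookup(w, [2, 0]))  # → [[4.0, 5.0], [0.0, 1.0]]
```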
eigen-k added a commit to eigen-k/executorch that referenced this pull request Jun 4, 2025
Summary: Pull Request resolved: pytorch#11344

Reviewed By: hsharma35

Differential Revision: D75911655
facebook-github-bot pushed a commit that referenced this pull request Jun 4, 2025
Summary:
(same description and AddressSanitizer trace as above)

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Rollback Plan:

Reviewed By: Gasoonjia

Differential Revision: D75982682

Pulled By: larryliu0820
facebook-github-bot pushed a commit that referenced this pull request Jun 6, 2025
Differential Revision: D75911655

Pull Request resolved: #11344
larryliu0820 added a commit that referenced this pull request Jul 2, 2025
metascroy added a commit that referenced this pull request Aug 1, 2025
BNNS copy crashes the process when the dtypes differ (#11714).

With the example from #11714, the process crashes on main. Here is the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp 
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```


With this PR, the process succeeds.
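The crash above comes from handing buffers with different element types to a raw byte copier. One way to sketch the guard such a fix implies is to validate element types up front; `safe_copy` below is a hypothetical Python illustration, not the CoreML delegate API.

```python
# Sketch of a dtype guard: refuse (or convert explicitly) before
# raw-copying between buffers whose element types differ.
import array

def safe_copy(src, dst):
    if src.typecode != dst.typecode:
        raise TypeError(
            f"dtype mismatch: {src.typecode!r} vs {dst.typecode!r}; "
            "convert explicitly instead of raw-copying")
    dst[:] = src

a = array.array("f", [1.0, 2.0])
b = array.array("f", [0.0, 0.0])
safe_copy(a, b)
print(list(b))  # → [1.0, 2.0]
```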
cmt0 added a commit to cmt0/executorch that referenced this pull request Aug 15, 2025
Summary:
At runtime this format specifier is not handled correctly. The misformatted string gets passed to `strlen` and eventually causes an assertion.
```
#0  strlen () at /home/xpgcust/tree/RJ-2024.3/tb/p4root/Xtensa/SWConfig/../Target-libs/newlib/newlib/libc/machine/xtensa/strlen.S:59
#1  0x610bd83d in _svfprintf_r (data=<optimized out>, fp=<optimized out>, fmt0=<optimized out>, ap=...)
    at /home/xpgcust/tree/RJ-2024.3/tb/p4root/Xtensa/SWConfig/../Target-libs/newlib/newlib/libc/stdio/vfprintf.c:1380
#2  0x610ffcf4 in _vsnprintf_r (ptr=<optimized out>, size=256, fmt=0x20 <error: Cannot access memory at address 0x20>, str=<optimized out>, ap=...)
    at /home/xpgcust/tree/RJ-2024.3/tb/p4root/Xtensa/SWConfig/../Target-libs/newlib/newlib/libc/stdio/vsnprintf.c:66
#3  vsnprintf (str=0x14012c40 <irt_janus_workq_stack+4960> "Missing operator: [z", size=256, fmt=0x20 <error: Cannot access memory at address 0x20>, ap=...)
    at /home/xpgcust/tree/RJ-2024.3/tb/p4root/Xtensa/SWConfig/../Target-libs/newlib/newlib/libc/stdio/vsnprintf.c:41
#4  0x610d4ddd in executorch::runtime::internal::vlogf (level=<optimized out>, timestamp=<optimized out>, filename=0x14012c40 <irt_janus_workq_stack+4960> "Missing operator: [z",
    function=0x60fd5fd7 "resolve_operator", line=735, format=0x60fd6023 "Missing operator: [%zd] %s", args=...) at xplat/executorch/runtime/platform/log.cpp:88
#5  0x610ce2db in executorch::runtime::internal::logf (level=executorch::runtime::LogLevel::Error, timestamp=3330441403,
    filename=0x14012c40 <irt_janus_workq_stack+4960> "Missing operator: [z", function=0x14012d3e <irt_janus_workq_stack+5214> "\026\262\273\200\202", <incomplete sequence \306>,
    line=735, format=0x60fd6023 "Missing operator: [%zd] %s")
    at /execution-workspace/buck-out/v2/gen/fbsource/e7835b44f7cec64a/xplat/executorch/runtime/platform/__platform__/buck-headers/executorch/runtime/platform/log.h:140
#6  0x610d8b60 in executorch::runtime::Method::resolve_operator (this=<optimized out>, op_index=1, kernels=<optimized out>, kernel_index=<optimized out>, args=..., n_args=7)
    at xplat/executorch/runtime/executor/method.cpp:731
#7  0x60ff2d70 in executorch::runtime::Method::init (this=0x14012fa0 <irt_janus_workq_stack+5824>, s_plan=<optimized out>, external_data_map=<optimized out>)
    at xplat/executorch/runtime/executor/method.cpp:926
#8  0x610d8c33 in executorch::runtime::Method::load (s_plan=0xb21690b4, program=<optimized out>, memory_manager=0xabd400f0, event_tracer=0xad540000, external_data_map=0x0)
    at xplat/executorch/runtime/executor/method.cpp:761
#9  0x610db216 in executorch::runtime::Program::load_method (this=0xabd4000c, method_name=<optimized out>, memory_manager=0xabd400f0, event_tracer=0xad540000,
    named_data_map=<optimized out>) at xplat/executorch/runtime/executor/program.cpp:299
#10 0x60ff1a80 in MethodContainer::init (this=<optimized out>, modelBuffer=..., weightBuffer=..., methodName=<optimized out>, plannedMemoryBuffers=..., methodAllocator=...,
    tempAllocator=..., etDumpBuffer=..., debugBufferDataSink=0x0) at arvr/firmware/silicon/ml/executorch/method_container/src/MethodContainer.cpp:104
#11 0x610cdef9 in InferenceRunnerExecutorch::initializeExecutorchObjects (this=<optimized out>) at arvr/firmware/silicon/turing/tirt/inference/src/InferenceRunnerExecutorch.cpp:255
#12 0x610ce06a in InferenceRunnerExecutorch::evaluate (this=0x14013268 <irt_janus_workq_stack+6536>) at arvr/firmware/silicon/turing/tirt/inference/src/InferenceRunnerExecutorch.cpp:297
#13 0x610cd186 in execute_model (inferenceRuntimeContext=0x24227680) at arvr/firmware/silicon/turing/tirt/inference/src/InferenceRunner.cpp:60
#14 0x610ccf0c in tirt_engine_invoke (inference_request=0x24220000) at arvr/firmware/silicon/turing/tirt/engine/src/Engine.cpp:125
#15 0x610cce25 in tirt::dispatch::tirt_command_process (request=0x24220000) at arvr/firmware/silicon/turing/tirt/command_dispatch/src/tirt_dispatcher.cpp:71
#16 0x610ccae7 in irt_janus_msg_handler (sess=0x60fc5f80 <FDLADSP0::coleman_fdladsp0_cp_tirt_fdladsp0_session_views_cp_iaas_fdlamcu_to_tirt_fdladsp0>, ctx=0x0,
    header=0x140137fd <irt_janus_workq_stack+7965>, payload=0x24220000, status=<optimized out>) at arvr/firmware/silicon/turing/tirt/src/IrtIccJanus.cpp:143
#17 0x610c80ec in _janus_service_handle_message (sess=0x60fc5f80 <FDLADSP0::coleman_fdladsp0_cp_tirt_fdladsp0_session_views_cp_iaas_fdlamcu_to_tirt_fdladsp0>,
    service_info=0x140137f8 <irt_janus_workq_stack+7960>) at arvr/firmware/wearables/libs/janus/session/consumer.c:1099
#18 janus_service (sess=0x60fc5f80 <FDLADSP0::coleman_fdladsp0_cp_tirt_fdladsp0_session_views_cp_iaas_fdlamcu_to_tirt_fdladsp0>, method=JANUS_SERVICE_ONE)
    at arvr/firmware/wearables/libs/janus/session/consumer.c:2236
#19 0x610c9d9a in _work_handler (w=0x14009e08 <s_janus_workq_sessions+8>) at arvr/firmware/wearables/libs/janus/modules/janus_workq/src/workq.c:62
#20 0x61100409 in triggered_work_handler (work=0x14009e08 <s_janus_workq_sessions+8>) at third-party/zephyr/zephyr_rtos/v3.7.0/zephyr/kernel/poll.c:590
#21 0x610d139f in work_queue_main (workq_ptr=0x14000f98 <s_janus_workq+24>, p2=<optimized out>, p3=<optimized out>) at third-party/zephyr/zephyr_rtos/v3.7.0/zephyr/kernel/work.c:688
#22 0x610c1172 in z_thread_entry (entry=0x610d1344 <work_queue_main>, p1=0x14000f98 <s_janus_workq+24>, p2=0x14001038 <s_janus_workq+184>, p3=0xfffffffd)
    at third-party/zephyr/zephyr_rtos/v3.7.0/zephyr/lib/os/thread_entry.c:48
```

Reviewed By: lucylq, JacobSzwejbka

Differential Revision: D79776266
agrima1304 pushed a commit to agrima1304/executorch that referenced this pull request Aug 26, 2025
BNNS copy crashes the process when the source and destination dtypes differ (pytorch#11714).

With the example added in this PR (pytorch#11714), the process crashes on main. Here is the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp 
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```


With this PR, the process succeeds.
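
The failure mode can be illustrated without BNNS: a raw byte copy whose size is derived from the source dtype overruns a destination buffer that was allocated for a narrower dtype. Here is a hypothetical standalone sketch in Python (not the BNNS code; `memoryview` catches the size mismatch instead of corrupting the heap, which is what malloc aborts on in the native trace):

```python
import struct

# Source: 4 float32 values -> 16 bytes.
src = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

# Destination allocated for 4 two-byte elements (e.g. float16) -> 8 bytes.
dst = bytearray(4 * 2)

try:
    # Copy sized by the *source* dtype: 16 bytes into an 8-byte buffer.
    memoryview(dst)[:len(src)] = src
except ValueError as exc:
    # Python rejects the mismatched copy; C code would instead write past
    # the allocation and crash later inside malloc, as in the trace above.
    print("copy rejected:", exc)
```

The dtype check added by this PR plays the role of the `except` branch here: it detects the mismatch before any bytes move.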
pytorch-bot Bot pushed a commit that referenced this pull request Dec 29, 2025
…rd-shader-library

[Vulkan] Fix Ninja build failure by removing wildcard dependencies
kirklandsign pushed a commit that referenced this pull request Jan 14, 2026
meta-codesync Bot pushed a commit that referenced this pull request Apr 19, 2026
Summary:
## Context

PyTorch PR pytorch/pytorch#179754 (fixing pytorch/pytorch#178042) added a dtype validation check to the `aten.embedding` meta registration in `torch/_meta_registrations.py`:

```python
torch._check(
    indices.dtype in (torch.long, torch.int32),
    lambda: (
        "Expected tensor for argument #1 'indices' to have one of the following "
        f"scalar types: Long, Int; but got {indices.dtype} instead"
    ),
)
```

This aligns the meta function with the C++ implementation (`checkScalarTypes` in `Embedding.cpp`), which already enforced integer indices. Previously, no meta registration existed for `aten.embedding`, so FakeTensor tracing during `torch.export`/`torch.compile` silently accepted float indices, and AOTAutograd's DCE could remove the dead node before the C++ check ever fired.
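
The validation pattern can be mimicked in plain Python. This is a hypothetical standalone re-implementation, not the actual torch internals; `_check`, `embedding_meta`, and `ALLOWED_INDEX_DTYPES` are illustrative names:

```python
def _check(cond, msg_fn):
    # Mirrors torch._check semantics: the message callable is only
    # evaluated when the condition fails.
    if not cond:
        raise RuntimeError(msg_fn())

ALLOWED_INDEX_DTYPES = ("long", "int32")

def embedding_meta(indices_dtype):
    _check(
        indices_dtype in ALLOWED_INDEX_DTYPES,
        lambda: (
            "Expected tensor for argument #1 'indices' to have one of the "
            f"following scalar types: Long, Int; but got {indices_dtype} instead"
        ),
    )
    return "meta ok"

print(embedding_meta("long"))    # accepted
# embedding_meta("float32")      # raises RuntimeError with the message above
```

Because the meta function runs during FakeTensor tracing, the check now fires at export time rather than being dead-code-eliminated before the C++ check could run.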

## Problem

`test_batched_export_with_backprop` in `test_static_attention.py` creates example token inputs using `torch.zeros()` without specifying a dtype:

```python
# Before (defaults to torch.float32)
torch.zeros(batch_size, input_len)
torch.zeros(1, input_len)
```

During `torch.export.export()`, these float32 tensors flow into `self.tok_embeddings(tokens)` (an `nn.Embedding` layer in `llama_transformer.py`), which dispatches to `aten.embedding`. The new meta function dtype check rejects float32 indices, causing the export to fail.

Note that the actual backprop loop already uses integer indices correctly via `torch.randint(config.vocab_size, (batch_size, input_len))` — only the export-tracing example inputs were wrong.
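
The dtype-defaulting pitfall can be sketched without torch. The `zeros` and `randint` mocks below only model the dtype behavior of their torch counterparts, nothing else:

```python
def zeros(*shape, dtype="float32"):
    # Mocks torch.zeros: floating-point by default unless dtype is given.
    return {"shape": shape, "dtype": dtype}

def randint(high, shape):
    # Mocks torch.randint: always produces integer ("long") values.
    return {"shape": shape, "dtype": "long"}

tokens_traced = zeros(2, 8)               # dtype silently "float32" -> export fails
tokens_train = randint(32000, (2, 8))     # dtype "long", as the backprop loop used
tokens_fixed = zeros(2, 8, dtype="long")  # the fix: pass dtype explicitly
```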

## Fix

Add explicit `dtype=torch.long` to both `torch.zeros` calls used as token example inputs:

```python
# After
torch.zeros(batch_size, input_len, dtype=torch.long)
torch.zeros(1, input_len, dtype=torch.long)
```

Differential Revision: D101547370