Add support for quantized LeakyReLU#1

Closed
digantdesai wants to merge 1 commit into pytorch:main from digantdesai:export-D47043207

Conversation

@digantdesai
Contributor

Summary: Also adds support for backend_config

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 509bd4c02eb7ff5d3d47762522debd827bee7240
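For readers unfamiliar with what a quantized LeakyReLU computes, the sketch below illustrates the usual affine-quantization math (dequantize, apply LeakyReLU, requantize and clamp). This is an illustration only, not the XNNPACK kernel this PR adds; the function name and uint8 quantization parameters are assumptions for the example.

```python
# Illustrative sketch of quantized LeakyReLU over uint8 values:
# dequantize -> leaky_relu -> requantize -> clamp to [0, 255].
def quantized_leaky_relu(q_in, in_scale, in_zp, out_scale, out_zp,
                         negative_slope=0.01):
    out = []
    for q in q_in:
        x = (q - in_zp) * in_scale                 # dequantize to float
        y = x if x >= 0.0 else x * negative_slope  # LeakyReLU
        q_out = round(y / out_scale) + out_zp      # requantize
        out.append(max(0, min(255, q_out)))        # clamp to uint8 range
    return out

print(quantized_leaky_relu([0, 128, 255], 0.1, 128, 0.1, 128))  # → [127, 128, 255]
```

Note how the negative input (q=0, i.e. -12.8 after dequantization) is scaled by `negative_slope` before requantization, while non-negative inputs pass through unchanged.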
@facebook-github-bot added the `CLA Signed` and `fb-exported` labels on Jun 28, 2023
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D47043207

digantdesai added a commit to digantdesai/pytorch that referenced this pull request Jun 30, 2023
Summary:
Pull Request resolved: pytorch#104309

X-link: pytorch/executorch#1

Also adds support for backend_config

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 3e2f7b614713ae5c3fba6ea3056376f15826de17
facebook-github-bot pushed a commit that referenced this pull request Jun 30, 2023
Summary:
X-link: pytorch/pytorch#104309

Pull Request resolved: #1

Also adds support for backend_config

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 51abd266bba7441c28578f6c58686a3d021d9d2a
junpi3 pushed a commit that referenced this pull request Mar 11, 2024
Before: Each node contains a `UniformParamsBuffer`.
After: Each node contains a `std::vector<std::shared_ptr<UniformParamsBuffer>>`.

In follow-up changes, we will break up the parameters to be passed via multiple `UniformParamsBuffer`s, since
1. some are tensor-specific (e.g. image extents) and
2. others are operator-specific (e.g. alpha for binary ops).

Hence, we need **`std::vector`**.

We are adding the methods for #1 in #2340. Since #1 and #2 will be owned by different objects, we need **pointers**. Since #1 is owned by `vTensor`, which is non-copyable, we can't use `unique_ptr`, so we need **`std::shared_ptr`**.

Differential Revision: [D54691831](https://our.internmc.facebook.com/intern/diff/D54691831/)

[ghstack-poisoned]
junpi3 pushed a commit that referenced this pull request Mar 11, 2024
(same commit message as above)

ghstack-source-id: 218195447
Pull Request resolved: #2348
facebook-github-bot pushed a commit that referenced this pull request Mar 11, 2024
Summary:
bypass-github-export-checks

Pull Request resolved: #2348

(same commit message as above)
ghstack-source-id: 218195447
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D54691831

fbshipit-source-id: 84ab9f777e247fd56234290ed7f7343b9701c73f
junpi3 pushed a commit that referenced this pull request Mar 13, 2024
In #2271, we already added
- IntList
- DoubleList
- BoolList
- ValueList

to the schema and the runtime's Value class. Their serialization was incomplete, missing two components:
1. Receiving a list in `torch.fx.Node.args`.
2. Receiving a non-tensor in `torch.fx.Node`.

This change completes #1.

Also, this change fixes a bug where values of type `bool` matched both `bool` and `int` and hence were added twice.

If our type support grows more complex, we can consider using our own types similar to the core ExecuTorch runtime: https://github.com/pytorch/executorch/blob/689796499024fc4a133318d707f4c10db73da967/exir/emit/_emitter.py#L158-L166

Differential Revision: [D54708353](https://our.internmc.facebook.com/intern/diff/D54708353/)

[ghstack-poisoned]
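The double-add bug described above is a classic Python pitfall: `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`, and a dispatcher that appends a value to every matching bucket adds a bool twice. The sketch below shows the standard fix of checking `bool` first; `classify` is a hypothetical helper for illustration, not code from this PR.

```python
# bool is a subclass of int in Python, so the bool check must come first
# (or use type(value) is int for an exact match) to avoid double-matching.
def classify(value):
    if isinstance(value, bool):   # must precede the int check
        return "BoolList"
    if isinstance(value, int):
        return "IntList"
    if isinstance(value, float):
        return "DoubleList"
    raise TypeError(f"unsupported value type: {type(value)!r}")

print(classify(True), classify(3), classify(2.5))  # → BoolList IntList DoubleList
```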
facebook-github-bot pushed a commit that referenced this pull request Mar 13, 2024
Summary:
bypass-github-export-checks

Pull Request resolved: #2404

(same commit message as above)
ghstack-source-id: 218539049
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D54708353

fbshipit-source-id: 8641647b515e201ea63db67115c01c1532ad6566
8Keep added a commit to 8Keep/executorch that referenced this pull request May 29, 2024
Reviewed By: itamaro

Differential Revision: D51566750
8Keep added a commit to 8Keep/executorch that referenced this pull request May 29, 2024
Summary: Pull Request resolved: pytorch#3763

Reviewed By: itamaro

Differential Revision: D51566750
facebook-github-bot pushed a commit that referenced this pull request May 29, 2024
Summary: Pull Request resolved: #3763

Reviewed By: itamaro, tarun292

Differential Revision: D51566750

fbshipit-source-id: 654c426d479833867e93083e9b55786abfc24a32
haowhsu-quic referenced this pull request in CodeLinaro/executorch Jul 16, 2024
haowhsu-quic referenced this pull request in CodeLinaro/executorch Jul 17, 2024
eigen-k added a commit to eigen-k/executorch that referenced this pull request Jun 3, 2025
facebook-github-bot pushed a commit that referenced this pull request Jun 4, 2025
Summary:
The index should always be smaller than `weight.size(0)`. This change adds that bounds check in `op_embedding`.

This avoids the following wild-addr-read error:

```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3544359==ERROR: AddressSanitizer: SEGV on unknown address 0x7fce2364bc00 (pc 0x000002d225a0 bp 0x7ffffc792a40 sp 0x7ffffc792990 T0)
==3544359==The signal is caused by a READ memory access.
SCARINESS: 20 (wild-addr-read)
    #0 0x2d225a0 in void torch::executor::native::(anonymous namespace)::embedding_byte_per_channel<signed char, c10::Half, float>(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor&) xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:175
    #1 0x2d22367 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #2 0x2d2223d in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #3 0x2d21d37 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const::'lambda0'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #4 0x2d21bca in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const::'lambda'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #5 0x2d20f8f in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const::'lambda0'()::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #6 0x2d20e13 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&)::$_0::operator()() const xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #7 0x2d20d06 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&) xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:303
    #8 0x2d226b7 in torch::executor::native::quantized_embedding_byte_dtype_out(executorch::runtime::KernelRuntimeContext&, executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, long, long, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::ScalarType>, executorch::runtime::etensor::Tensor&) xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:329
    #9 0x2d09bef in torch::executor::function::(anonymous namespace)::$_7::operator()(executorch::runtime::KernelRuntimeContext&, executorch::runtime::EValue**) const buck-out/v2/gen/fbsource/ff19a7e6cb17a7b1/xplat/executorch/kernels/quantized/__generated_lib_combined__/out/RegisterCodegenUnboxedKernelsEverything.cpp:322
    #10 0x2d09a70 in torch::executor::function::(anonymous namespace)::$_7::__invoke(executorch::runtime::KernelRuntimeContext&, executorch::runtime::EValue**) buck-out/v2/gen/fbsource/ff19a7e6cb17a7b1/xplat/executorch/kernels/quantized/__generated_lib_combined__/out/RegisterCodegenUnboxedKernelsEverything.cpp:297
    #11 0x27d769b in executorch::runtime::Method::execute_instruction() xplat/executorch/runtime/executor/method.cpp:1306
    #12 0x27d8c55 in executorch::runtime::Method::execute() xplat/executorch/runtime/executor/method.cpp:1550
    #13 0x27b1e25 in executorch::extension::Module::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, std::vector<executorch::runtime::EValue, std::allocator<executorch::runtime::EValue>> const&) xplat/executorch/extension/module/module.cpp:261
    #14 0x27afe43 in executorch::extension::Module::forward(std::vector<executorch::runtime::EValue, std::allocator<executorch::runtime::EValue>> const&) xplat/executorch/extension/module/module.h:340
    #15 0x27e0519 in executorch::extension::llm::LlmBackboneRunner::run(std::shared_ptr<executorch::runtime::etensor::Tensor> const&, std::shared_ptr<executorch::runtime::etensor::Tensor> const&, std::shared_ptr<executorch::runtime::etensor::Tensor> const&) xplat/executorch/examples/models/fb/llama4/runner/llm_backbone_runner.cpp:58
    #16 0x27a35c9 in executorch::extension::llm::Llama4Runner::prefill_tokens(std::shared_ptr<executorch::runtime::etensor::Tensor> const&, std::shared_ptr<executorch::runtime::etensor::Tensor> const&, std::shared_ptr<executorch::runtime::etensor::Tensor> const&) xplat/executorch/examples/models/fb/llama4/runner/llama4_runner.cpp:133
    #17 0x885774 in main (/data/users/larryliu/fbsource/buck-out/v2/gen/fbsource/ff19a7e6cb17a7b1/xplat/cria/benchmark/llama4/__generation_main__/generation_main+0x885774)
    #18 0x7fce2122c656 in __libc_start_call_main /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #19 0x7fce2122c717 in __libc_start_main@GLIBC_2.2.5 /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
    #20 0x884c20 in _start /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116

AddressSanitizer can not provide additional info.
AddressSanitizer: SEGV xplat/executorch/kernels/quantized/cpu/op_embedding.cpp:175 in void torch::executor::native::(anonymous namespace)::embedding_byte_per_channel<signed char, c10::Half, float>(executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor const&, std::optional<executorch::runtime::etensor::Tensor> const&, executorch::runtime::etensor::Tensor const&, executorch::runtime::etensor::Tensor&)
==3544359==ABORTING
```

Differential Revision: D75982682
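The fix above amounts to validating every index against the number of embedding rows before dereferencing. The sketch below illustrates that guard in Python; the actual fix lives in the C++ kernel `op_embedding.cpp`, and `embedding_lookup` is a hypothetical stand-in.

```python
# Sketch of the bounds check: every index must satisfy
# 0 <= index < weight_rows before the row is read, otherwise an
# out-of-range index reads wild memory (the ASan SEGV above).
def embedding_lookup(weight, indices):
    rows = len(weight)
    for i in indices:
        if not 0 <= i < rows:
            raise IndexError(
                f"index {i} out of range for weight with {rows} rows")
    return [weight[i] for i in indices]

w = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
print(embedding_lookup(w, [2, 0]))  # → [[4.0, 5.0], [0.0, 1.0]]
```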
eigen-k added a commit to eigen-k/executorch that referenced this pull request Jun 4, 2025
Summary: Pull Request resolved: pytorch#11344

Reviewed By: hsharma35

Differential Revision: D75911655
facebook-github-bot pushed a commit that referenced this pull request Jun 4, 2025
Summary:
(same description and AddressSanitizer trace as above)

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Rollback Plan:

Reviewed By: Gasoonjia

Differential Revision: D75982682

Pulled By: larryliu0820
facebook-github-bot pushed a commit that referenced this pull request Jun 6, 2025
Differential Revision: D75911655

Pull Request resolved: #11344
larryliu0820 added a commit that referenced this pull request Jul 2, 2025
metascroy added a commit that referenced this pull request Aug 1, 2025
BNNS copy crashes the process when the dtypes differ (#11714).

With the example from #11714, the process crashes on main. Here is the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp 
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```


With this PR, the process succeeds.
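The crash above comes from handing buffers with different element types to a raw byte copier. One way to sketch the guard such a fix implies is to validate element types up front; `safe_copy` below is a hypothetical Python illustration, not the CoreML delegate API.

```python
# Sketch of a dtype guard: refuse (or convert explicitly) before
# raw-copying between buffers whose element types differ.
import array

def safe_copy(src, dst):
    if src.typecode != dst.typecode:
        raise TypeError(
            f"dtype mismatch: {src.typecode!r} vs {dst.typecode!r}; "
            "convert explicitly instead of raw-copying")
    dst[:] = src

a = array.array("f", [1.0, 2.0])
b = array.array("f", [0.0, 0.0])
safe_copy(a, b)
print(list(b))  # → [1.0, 2.0]
```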
cmt0 added a commit to cmt0/executorch that referenced this pull request Aug 15, 2025
Summary:
At runtime this format specifier is not handled correctly. The misformatted string gets passed to `strlen` and eventually causes an assertion.
```
#0  strlen () at /home/xpgcust/tree/RJ-2024.3/tb/p4root/Xtensa/SWConfig/../Target-libs/newlib/newlib/libc/machine/xtensa/strlen.S:59
#1  0x610bd83d in _svfprintf_r (data=<optimized out>, fp=<optimized out>, fmt0=<optimized out>, ap=...)
    at /home/xpgcust/tree/RJ-2024.3/tb/p4root/Xtensa/SWConfig/../Target-libs/newlib/newlib/libc/stdio/vfprintf.c:1380
#2  0x610ffcf4 in _vsnprintf_r (ptr=<optimized out>, size=256, fmt=0x20 <error: Cannot access memory at address 0x20>, str=<optimized out>, ap=...)
    at /home/xpgcust/tree/RJ-2024.3/tb/p4root/Xtensa/SWConfig/../Target-libs/newlib/newlib/libc/stdio/vsnprintf.c:66
#3  vsnprintf (str=0x14012c40 <irt_janus_workq_stack+4960> "Missing operator: [z", size=256, fmt=0x20 <error: Cannot access memory at address 0x20>, ap=...)
    at /home/xpgcust/tree/RJ-2024.3/tb/p4root/Xtensa/SWConfig/../Target-libs/newlib/newlib/libc/stdio/vsnprintf.c:41
#4  0x610d4ddd in executorch::runtime::internal::vlogf (level=<optimized out>, timestamp=<optimized out>, filename=0x14012c40 <irt_janus_workq_stack+4960> "Missing operator: [z",
    function=0x60fd5fd7 "resolve_operator", line=735, format=0x60fd6023 "Missing operator: [%zd] %s", args=...) at xplat/executorch/runtime/platform/log.cpp:88
#5  0x610ce2db in executorch::runtime::internal::logf (level=executorch::runtime::LogLevel::Error, timestamp=3330441403,
    filename=0x14012c40 <irt_janus_workq_stack+4960> "Missing operator: [z", function=0x14012d3e <irt_janus_workq_stack+5214> "\026\262\273\200\202", <incomplete sequence \306>,
    line=735, format=0x60fd6023 "Missing operator: [%zd] %s")
    at /execution-workspace/buck-out/v2/gen/fbsource/e7835b44f7cec64a/xplat/executorch/runtime/platform/__platform__/buck-headers/executorch/runtime/platform/log.h:140
#6  0x610d8b60 in executorch::runtime::Method::resolve_operator (this=<optimized out>, op_index=1, kernels=<optimized out>, kernel_index=<optimized out>, args=..., n_args=7)
    at xplat/executorch/runtime/executor/method.cpp:731
#7  0x60ff2d70 in executorch::runtime::Method::init (this=0x14012fa0 <irt_janus_workq_stack+5824>, s_plan=<optimized out>, external_data_map=<optimized out>)
    at xplat/executorch/runtime/executor/method.cpp:926
#8  0x610d8c33 in executorch::runtime::Method::load (s_plan=0xb21690b4, program=<optimized out>, memory_manager=0xabd400f0, event_tracer=0xad540000, external_data_map=0x0)
    at xplat/executorch/runtime/executor/method.cpp:761
#9  0x610db216 in executorch::runtime::Program::load_method (this=0xabd4000c, method_name=<optimized out>, memory_manager=0xabd400f0, event_tracer=0xad540000,
    named_data_map=<optimized out>) at xplat/executorch/runtime/executor/program.cpp:299
#10 0x60ff1a80 in MethodContainer::init (this=<optimized out>, modelBuffer=..., weightBuffer=..., methodName=<optimized out>, plannedMemoryBuffers=..., methodAllocator=...,
    tempAllocator=..., etDumpBuffer=..., debugBufferDataSink=0x0) at arvr/firmware/silicon/ml/executorch/method_container/src/MethodContainer.cpp:104
#11 0x610cdef9 in InferenceRunnerExecutorch::initializeExecutorchObjects (this=<optimized out>) at arvr/firmware/silicon/turing/tirt/inference/src/InferenceRunnerExecutorch.cpp:255
#12 0x610ce06a in InferenceRunnerExecutorch::evaluate (this=0x14013268 <irt_janus_workq_stack+6536>) at arvr/firmware/silicon/turing/tirt/inference/src/InferenceRunnerExecutorch.cpp:297
#13 0x610cd186 in execute_model (inferenceRuntimeContext=0x24227680) at arvr/firmware/silicon/turing/tirt/inference/src/InferenceRunner.cpp:60
#14 0x610ccf0c in tirt_engine_invoke (inference_request=0x24220000) at arvr/firmware/silicon/turing/tirt/engine/src/Engine.cpp:125
#15 0x610cce25 in tirt::dispatch::tirt_command_process (request=0x24220000) at arvr/firmware/silicon/turing/tirt/command_dispatch/src/tirt_dispatcher.cpp:71
#16 0x610ccae7 in irt_janus_msg_handler (sess=0x60fc5f80 <FDLADSP0::coleman_fdladsp0_cp_tirt_fdladsp0_session_views_cp_iaas_fdlamcu_to_tirt_fdladsp0>, ctx=0x0,
    header=0x140137fd <irt_janus_workq_stack+7965>, payload=0x24220000, status=<optimized out>) at arvr/firmware/silicon/turing/tirt/src/IrtIccJanus.cpp:143
#17 0x610c80ec in _janus_service_handle_message (sess=0x60fc5f80 <FDLADSP0::coleman_fdladsp0_cp_tirt_fdladsp0_session_views_cp_iaas_fdlamcu_to_tirt_fdladsp0>,
    service_info=0x140137f8 <irt_janus_workq_stack+7960>) at arvr/firmware/wearables/libs/janus/session/consumer.c:1099
#18 janus_service (sess=0x60fc5f80 <FDLADSP0::coleman_fdladsp0_cp_tirt_fdladsp0_session_views_cp_iaas_fdlamcu_to_tirt_fdladsp0>, method=JANUS_SERVICE_ONE)
    at arvr/firmware/wearables/libs/janus/session/consumer.c:2236
#19 0x610c9d9a in _work_handler (w=0x14009e08 <s_janus_workq_sessions+8>) at arvr/firmware/wearables/libs/janus/modules/janus_workq/src/workq.c:62
#20 0x61100409 in triggered_work_handler (work=0x14009e08 <s_janus_workq_sessions+8>) at third-party/zephyr/zephyr_rtos/v3.7.0/zephyr/kernel/poll.c:590
#21 0x610d139f in work_queue_main (workq_ptr=0x14000f98 <s_janus_workq+24>, p2=<optimized out>, p3=<optimized out>) at third-party/zephyr/zephyr_rtos/v3.7.0/zephyr/kernel/work.c:688
#22 0x610c1172 in z_thread_entry (entry=0x610d1344 <work_queue_main>, p1=0x14000f98 <s_janus_workq+24>, p2=0x14001038 <s_janus_workq+184>, p3=0xfffffffd)
    at third-party/zephyr/zephyr_rtos/v3.7.0/zephyr/lib/os/thread_entry.c:48
```

Reviewed By: lucylq, JacobSzwejbka

Differential Revision: D79776266
agrima1304 pushed a commit to agrima1304/executorch that referenced this pull request Aug 26, 2025
BNNS copy crashes the process when the source and destination dtypes differ (pytorch#11714).

With the example added in this PR (pytorch#11714), the process crashes on main. Here is the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp 
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```


With this PR, the process succeeds.
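
The failure mode can be illustrated without BNNS: a raw byte copy whose size is derived from the source dtype overruns a destination buffer that was allocated for a narrower dtype. Here is a hypothetical standalone sketch in Python (not the BNNS code; `memoryview` catches the size mismatch instead of corrupting the heap, which is what malloc aborts on in the native trace):

```python
import struct

# Source: 4 float32 values -> 16 bytes.
src = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

# Destination allocated for 4 two-byte elements (e.g. float16) -> 8 bytes.
dst = bytearray(4 * 2)

try:
    # Copy sized by the *source* dtype: 16 bytes into an 8-byte buffer.
    memoryview(dst)[:len(src)] = src
except ValueError as exc:
    # Python rejects the mismatched copy; C code would instead write past
    # the allocation and crash later inside malloc, as in the trace above.
    print("copy rejected:", exc)
```

The dtype check added by this PR plays the role of the `except` branch here: it detects the mismatch before any bytes move.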
pytorch-bot Bot pushed a commit that referenced this pull request Dec 29, 2025
…rd-shader-library

[Vulkan] Fix Ninja build failure by removing wildcard dependencies
kirklandsign pushed a commit that referenced this pull request Jan 14, 2026
meta-codesync Bot pushed a commit that referenced this pull request Apr 19, 2026
Summary:
## Context

PyTorch PR pytorch/pytorch#179754 (fixing pytorch/pytorch#178042) added a dtype validation check to the `aten.embedding` meta registration in `torch/_meta_registrations.py`:

```python
torch._check(
    indices.dtype in (torch.long, torch.int32),
    lambda: (
        "Expected tensor for argument #1 'indices' to have one of the following "
        f"scalar types: Long, Int; but got {indices.dtype} instead"
    ),
)
```

This aligns the meta function with the C++ implementation (`checkScalarTypes` in `Embedding.cpp`), which already enforced integer indices. Previously, no meta registration existed for `aten.embedding`, so FakeTensor tracing during `torch.export`/`torch.compile` silently accepted float indices, and AOTAutograd's DCE could remove the dead node before the C++ check ever fired.
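
The validation pattern can be mimicked in plain Python. This is a hypothetical standalone re-implementation, not the actual torch internals; `_check`, `embedding_meta`, and `ALLOWED_INDEX_DTYPES` are illustrative names:

```python
def _check(cond, msg_fn):
    # Mirrors torch._check semantics: the message callable is only
    # evaluated when the condition fails.
    if not cond:
        raise RuntimeError(msg_fn())

ALLOWED_INDEX_DTYPES = ("long", "int32")

def embedding_meta(indices_dtype):
    _check(
        indices_dtype in ALLOWED_INDEX_DTYPES,
        lambda: (
            "Expected tensor for argument #1 'indices' to have one of the "
            f"following scalar types: Long, Int; but got {indices_dtype} instead"
        ),
    )
    return "meta ok"

print(embedding_meta("long"))    # accepted
# embedding_meta("float32")      # raises RuntimeError with the message above
```

Because the meta function runs during FakeTensor tracing, the check now fires at export time rather than being dead-code-eliminated before the C++ check could run.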

## Problem

`test_batched_export_with_backprop` in `test_static_attention.py` creates example token inputs using `torch.zeros()` without specifying a dtype:

```python
# Before (defaults to torch.float32)
torch.zeros(batch_size, input_len)
torch.zeros(1, input_len)
```

During `torch.export.export()`, these float32 tensors flow into `self.tok_embeddings(tokens)` (an `nn.Embedding` layer in `llama_transformer.py`), which dispatches to `aten.embedding`. The new meta function dtype check rejects float32 indices, causing the export to fail.

Note that the actual backprop loop already uses integer indices correctly via `torch.randint(config.vocab_size, (batch_size, input_len))` — only the export-tracing example inputs were wrong.
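
The dtype-defaulting pitfall can be sketched without torch. The `zeros` and `randint` mocks below only model the dtype behavior of their torch counterparts, nothing else:

```python
def zeros(*shape, dtype="float32"):
    # Mocks torch.zeros: floating-point by default unless dtype is given.
    return {"shape": shape, "dtype": dtype}

def randint(high, shape):
    # Mocks torch.randint: always produces integer ("long") values.
    return {"shape": shape, "dtype": "long"}

tokens_traced = zeros(2, 8)               # dtype silently "float32" -> export fails
tokens_train = randint(32000, (2, 8))     # dtype "long", as the backprop loop used
tokens_fixed = zeros(2, 8, dtype="long")  # the fix: pass dtype explicitly
```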

## Fix

Add explicit `dtype=torch.long` to both `torch.zeros` calls used as token example inputs:

```python
# After
torch.zeros(batch_size, input_len, dtype=torch.long)
torch.zeros(1, input_len, dtype=torch.long)
```

Differential Revision: D101547370