
How to link custom ops? #4510

@BlackSamorez

Description

Hi!

I'm trying to integrate some quantized MatMul C++ kernels into ExecuTorch and I'm having a hard time: the documentation is very vague about what exactly I need to include and link for ATen to pick up my ops.

I would greatly appreciate any help in trying to make it work.

Overview:

Source code for the dynamic library containing the ops consists of three files: lut_kernel.h, lut_kernel.cpp, and lut_kernel_pytorch.cpp. They contain roughly this code:

// lut_kernel.h
#pragma once

#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {

namespace native {

Tensor& code2x8_lut_matmat_out(
  RuntimeContext& ctx,
  const Tensor& input,
  const Tensor& codes,
  const Tensor& codebooks,
  const Tensor& scales,
  const optional<Tensor>& bias,
  Tensor& out
);
} // namespace native
} // namespace executor
} // namespace torch

// lut_kernel.cpp
#include "lut_kernel.h"

#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>

namespace torch {
  namespace executor {
    namespace native {
      Tensor& code2x8_lut_matmat_out(
        RuntimeContext& ctx,
        const Tensor& input,
        const Tensor& codes,
        const Tensor& codebooks,
        const Tensor& scales,
        const optional<Tensor>& bias,
        Tensor& out
      ) {
        // CALCULATIONS
        return out;
      }
    } // namespace native
  } // namespace executor
} // namespace torch

EXECUTORCH_LIBRARY(aqlm, "code2x8_lut_matmat.out", torch::executor::native::code2x8_lut_matmat_out);

// lut_kernel_pytorch.cpp
#include "lut_kernel.h"

#include <executorch/extension/aten_util/make_aten_functor_from_et_functor.h>
#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>

#include <torch/library.h>

namespace torch {
    namespace executor {
        namespace native {
            Tensor& code2x8_lut_matmat_out_no_context(
                ...
                Tensor& output
            ) {
                // NOTE: 10 MB scratch pool, allocated per call (and leaked in this sketch)
                void* memory_pool = malloc(10000000 * sizeof(uint8_t));
                MemoryAllocator allocator(10000000, (uint8_t*)memory_pool);

                exec_aten::RuntimeContext context{nullptr, &allocator};
                return torch::executor::native::code2x8_lut_matmat_out(
                    context,
                    ...,
                    output
                );
            }

            at::Tensor code2x8_lut_matmat(
                ...
            ) {
                auto sizes = input.sizes().vec();
                sizes[sizes.size() - 1] = codes.size(1) * codebooks.size(2);
                auto out = at::empty(sizes,
                    at::TensorOptions()
                    .dtype(input.dtype())
                    .device(input.device())
                );

                WRAP_TO_ATEN(code2x8_lut_matmat_out_no_context, 5)(
                    ...,
                    out
                );
                return out;
            }
        } // namespace native
    } // namespace executor
} // namespace torch

TORCH_LIBRARY(aqlm, m) {
  m.def(
      "code2x8_lut_matmat(Tensor input, Tensor codes, "
      "Tensor codebooks, Tensor scales, Tensor? bias=None) -> Tensor"
  );
  m.def(
      "code2x8_lut_matmat.out(Tensor input, Tensor codes, "
      "Tensor codebooks, Tensor scales, Tensor? bias=None, *, Tensor(c!) out) -> Tensor(c!)"
  );
}

TORCH_LIBRARY_IMPL(aqlm, CompositeExplicitAutograd, m) {
  m.impl(
      "code2x8_lut_matmat", torch::executor::native::code2x8_lut_matmat
  );
  m.impl(
      "code2x8_lut_matmat.out",
      WRAP_TO_ATEN(torch::executor::native::code2x8_lut_matmat_out_no_context, 5)
    );
}

This closely follows the ExecuTorch custom SDPA code.
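For reference, the functional wrapper above allocates its output by copying the input's sizes and replacing the last dimension with codes.size(1) * codebooks.size(2). That shape arithmetic can be sketched in plain Python (the shapes below are made up for illustration, not real AQLM shapes):

```python
def code2x8_out_shape(input_sizes, codes_sizes, codebooks_sizes):
    """Mirror of the at::empty() size computation in code2x8_lut_matmat."""
    sizes = list(input_sizes)
    # Replace the last (feature) dimension; all leading batch dims are kept.
    sizes[-1] = codes_sizes[1] * codebooks_sizes[2]
    return sizes

# Illustrative shapes only: last dim becomes 4 * 5 = 20.
print(code2x8_out_shape([2, 3, 16], [8, 4, 2], [2, 6, 5, 8]))
```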

I build it as two standalone dynamic libraries: one from lut_kernel.cpp, depending only on ExecuTorch, and one from lut_kernel_pytorch.cpp, with an additional torch dependency. I load the latter into PyTorch with torch.ops.load_library(f"../libaqlm_bindings.dylib").

The problem:

I wrote a small nn.Module that basically just calls the op. In PyTorch it works well. The aten_dialect for it looks like this:

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_codes: "i8[3072, 128, 2]", p_codebooks: "f32[2, 256, 1, 8]", p_scales: "f32[3072, 1, 1, 1]", p_bias: "f32[3072]", input: "f32[s0, s1, 1024]"):
            input_1 = input
            
            # File: /Users/blacksamorez/reps/AQLM/inference_lib/src/aqlm/inference.py:74 in forward, code: return torch.ops.aqlm.code2x8_lut_matmat(
            code2x8_lut_matmat: "f32[s0, s1, 1024]" = torch.ops.aqlm.code2x8_lut_matmat.default(input_1, p_codes, p_codebooks, p_scales, p_bias);  input_1 = p_codes = p_codebooks = p_scales = p_bias = None
            return (code2x8_lut_matmat,)
            
Graph signature: ExportGraphSignature(input_specs=[InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_codes'), target='codes', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_codebooks'), target='codebooks', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_scales'), target='scales', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_bias'), target='bias', persistent=None), InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='input'), target=None, persistent=None)], output_specs=[OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='code2x8_lut_matmat'), target=None)])
Range constraints: {s0: VR[1, 9223372036854775806], s1: VR[1, 9223372036854775806]}

But when calling to_edge I get an error saying that Operator torch._ops.aqlm.code2x8_lut_matmat.default is not Aten Canonical.

I don't conceptually understand how the EXECUTORCH_LIBRARY macro from lut_kernel.cpp is supposed to make the op ATen canonical. Should I somehow recompile ExecuTorch to include my kernel?

Thank you!
