Dev1/winskuo/gh /aihub remove code #2

Closed

winskuo-quic wants to merge 21 commits into main from dev1/winskuo/gh_/aihub_remove_code

Conversation

@winskuo-quic
Collaborator

Summary

[PLEASE REMOVE] See CONTRIBUTING.md's Pull Requests for ExecuTorch PR guidelines.

[PLEASE REMOVE] If this PR closes an issue, please add a Fixes #<issue-id> line.

[PLEASE REMOVE] If this PR introduces a fix or feature that should be in the upcoming release notes, please add a "Release notes: " label. For a list of available release notes labels, check out CONTRIBUTING.md's Pull Requests.

Test plan

[PLEASE REMOVE] How did you test this PR? Please list any manual commands you used and note any tests you have written, if applicable.

JacobSzwejbka and others added 21 commits April 24, 2026 15:25
Summary:

Attempt 3 to check numel and nbytes overflow. This time we defer
checking dynamically sized inputs until their size is realized.

Reviewed By: lucylq

Differential Revision: D98148157
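
The essence of such a check is multiplying dimension sizes with explicit overflow detection instead of plain `*`. A minimal sketch of the idea, assuming a negative size marks a dynamic dim whose size isn't realized yet (the helper names are hypothetical, not ExecuTorch's actual API):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: multiply dimension sizes with overflow detection.
// Returns std::nullopt instead of silently wrapping around.
std::optional<uint64_t> checked_numel(const std::vector<int64_t>& sizes) {
  uint64_t numel = 1;
  for (int64_t dim : sizes) {
    if (dim < 0) {
      return std::nullopt;  // assumed convention: dynamic dim, defer the check
    }
    uint64_t result;
    // GCC/Clang builtin; returns true if the multiply overflowed.
    if (__builtin_mul_overflow(numel, static_cast<uint64_t>(dim), &result)) {
      return std::nullopt;
    }
    numel = result;
  }
  return numel;
}

// nbytes needs the same treatment: numel * element_size can also overflow.
std::optional<uint64_t> checked_nbytes(
    const std::vector<int64_t>& sizes, uint64_t element_size) {
  auto numel = checked_numel(sizes);
  uint64_t nbytes;
  if (!numel || __builtin_mul_overflow(*numel, element_size, &nbytes)) {
    return std::nullopt;
  }
  return nbytes;
}
```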
…inear (pytorch#19117)

## Summary

The bilinear grid_sampler_2d portable kernel computes interpolation
weights via subtractions like `(ix_se - ix)` where both operands are
close integer-valued coordinates in pixel space. In fp16 (10 bits of
mantissa) that's classic catastrophic cancellation — the result has only
a handful of significant bits. The downstream weighted-sum accumulation
then loses further precision.

Measured on a unit test exercising interior grid points with fp16
inputs, the kernel drifts by ~0.1 absolute from an fp32 reference.
That's visible as incorrect depth / flow output near non-integer sample
points, which is most of them.

## Fix

An `AccType<CTYPE>` trait maps `Half` and `BFloat16` to `float` and
leaves every other dtype unchanged. It is used for intermediate
coordinate and weight computation and for `out_val` accumulation. Loads
cast `CTYPE -> ACC`; the final store casts `ACC -> CTYPE` once. Only
internal math is promoted; memory layout, public API, and tensor dtypes
are unchanged.

```cpp
template <typename CTYPE>
using AccType = std::conditional_t<
    std::is_same_v<CTYPE, executorch::aten::Half> ||
        std::is_same_v<CTYPE, executorch::aten::BFloat16>,
    float,
    CTYPE>;
```

## Effects

- **fp32 / Int / any non-half dtype**: `AccType<T>` is `T`, so the
generated code is byte-identical. No behavior change.
- **Half / BFloat16**: `max_abs` vs an fp32 reference drops from **~0.1
to 0** on the shapes I tested (N=1..2, C=7..64, H/W up to 96, both
`align_corners` values).
- **Perf**: a handful of fp16↔fp32 conversions per output element. Not
measurable at op level; well within the portable kernel's scalar cost
envelope.

## Scope

This change only touches the bilinear interpolation path. The
nearest-mode path doesn't do weighted-sum accumulation and doesn't have
the cancellation issue, so it is left alone here.

## Test plan

- [x] Builds clean for Android arm64 and host (Apple Clang 21).
- [x] Verified numerically via a standalone harness that runs the kernel
with matched fp32 / fp16 inputs and compares against an
fp32-then-downcast reference. All shapes pass within a single fp16 ULP
(or are bit-exact). fp32 tests remain bit-identical to the pre-change
kernel.
- [x] Existing `kernels/test/op_grid_sampler_2d_test.cpp` unit tests
continue to pass (both fp32 shapes that were previously tested, and the
fp16 path I'm specifically fixing).

Happy to add an fp16-specific test case to `op_grid_sampler_2d_test.cpp`
if useful for CI coverage here — just let me know the preferred
approach.

cc @larryliu0820 @manuelcandales
Differential Revision: D101728720

Pull Request resolved: pytorch#19052
### Summary
Fixes pytorch#18924
Extends `aten.bitwise_not` support in the MLX delegate to handle integer
tensors, not just boolean tensors.
Previously the handler only dispatched to `LogicalNotNode` for `bool`
and raised `NotImplementedError` for all other dtypes. This adds a
dedicated `BitwiseInvertNode` backed by `mlx::core::bitwise_invert`, and
updates the handler to dispatch based on dtype:
 - `bool` → `LogicalNotNode` (unchanged)
 - `int32`, `int64` → `BitwiseInvertNode`
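A rough sketch of that dtype split, shown in C++ against the MLX array API (`mlx::core::bitwise_invert` and `logical_not` are real MLX ops named above; the wrapper name is illustrative, and the delegate's actual dispatch lives in `ops.py` at export time):
```cpp
#include <mlx/mlx.h>

namespace mx = mlx::core;

// Sketch only: the dtype-based dispatch described above.
mx::array bitwise_not_dispatch(const mx::array& input) {
  if (input.dtype() == mx::bool_) {
    // bool path: logical not, unchanged from before this PR.
    return mx::logical_not(input);
  }
  // int32 / int64 path: true bitwise inversion, new in this PR.
  return mx::bitwise_invert(input);
}
```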
#### Changes:
- `serialization/schema.fbs`: add `BitwiseInvertNode` table and append
to `OpNode` union
- `runtime/MLXInterpreter.h`: add `exec_bitwise_invert()` and dispatch
case
- `ops.py`: update `_bitwise_not_handler` to dispatch to
`BitwiseInvertNode` for integers
- `test/test_ops.py`: add `bitwise_not_int` test for `int32` and `int64`
### Test plan
All tests were run on a machine with an Apple M1 Pro CPU, macOS 26.4.1.
- `python3 -m py_compile backends/mlx/ops.py
backends/mlx/test/test_ops.py`
- `python3 backends/mlx/serialization/generate.py`
- `python3 -m executorch.backends.mlx.test.run_all_tests
bitwise_not_int`
### Test output
```
============================================================
TEST SUMMARY
============================================================
Passed: 6
Failed: 0
============================================================
```
cc @metascroy

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Scott Roy <161522778+metascroy@users.noreply.github.com>
Differential Revision: D102070375

Pull Request resolved: pytorch#19074
Add Closeable interface so Module can be used with try-with-resources.
close() delegates to destroy(). Also make destroy() idempotent by
checking mHybridData.isValid() before calling resetNative(), satisfying
the Closeable contract.

This commit was authored with the help of Claude.
TrainingModule: implement Closeable, replace Log.e + silent empty
returns with IllegalStateException throws. Add checkNotDestroyed() guard
on all public methods.

SGD: throw IllegalStateException instead of bare RuntimeException when
optimizer is destroyed.

AsrModule: throw ExecutorchRuntimeException instead of bare
RuntimeException on transcription failure.

ExecuTorchRuntime.validateFilePath: throw IllegalArgumentException
instead of bare RuntimeException, with descriptive message.

JNI constructors: wrap ExecuTorchJni and ExecuTorchLlmJni constructor
bodies in try-catch so C++ exceptions become ExecutorchRuntimeException
instead of generic RuntimeException.
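
The wrap-and-rethrow pattern being described is the standard JNI one: catch the C++ exception at the boundary and surface it as a specific Java exception. A rough sketch in plain JNI (the actual bindings use different plumbing; the Java class path below is illustrative):

```cpp
#include <jni.h>
#include <exception>

// Sketch of the wrap-and-rethrow pattern at the JNI boundary. This just
// shows C++ exceptions becoming a specific Java exception instead of a
// generic RuntimeException.
void construct_native(JNIEnv* env) {
  try {
    // ... constructor body that may throw a C++ exception ...
  } catch (const std::exception& e) {
    jclass cls = env->FindClass(
        "org/pytorch/executorch/ExecutorchRuntimeException");
    if (cls != nullptr) {
      env->ThrowNew(cls, e.what());  // surfaces as a Java exception
    }
  }
}
```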

This commit was authored with the help of Claude.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Differential Revision: D102425537

Pull Request resolved: pytorch#19128
…_nhwc operator + tests

Differential Revision: D96507563

Pull Request resolved: pytorch#18479
…ch#19028 (pytorch#19133)

## Summary

Reverts the following Android PRs:
- pytorch#19099 — Android: consistent error types across all modules
- pytorch#19124 — Android: Module implements Closeable
- pytorch#19092 — Android: improve error diagnostics for LlmModule and
exceptions
- pytorch#19028 — Ignored Module tests: provide required input tensor

Authored with Claude.
Differential Revision: D102488314

Pull Request resolved: pytorch#19134
Differential Revision: D102385104

Pull Request resolved: pytorch#19122
Differential Revision: D102493794

Pull Request resolved: pytorch#19136
@winskuo-quic winskuo-quic changed the base branch from dev1/winskuo/gh_aihub_remove_readme to main April 27, 2026 02:47
@winskuo-quic winskuo-quic deleted the dev1/winskuo/gh_/aihub_remove_code branch April 27, 2026 02:47
@winskuo-quic winskuo-quic restored the dev1/winskuo/gh_/aihub_remove_code branch April 27, 2026 02:48
@winskuo-quic winskuo-quic deleted the dev1/winskuo/gh_/aihub_remove_code branch April 27, 2026 02:49
@winskuo-quic winskuo-quic restored the dev1/winskuo/gh_/aihub_remove_code branch April 27, 2026 02:49
@winskuo-quic winskuo-quic deleted the dev1/winskuo/gh_/aihub_remove_code branch April 27, 2026 02:50
@winskuo-quic winskuo-quic restored the dev1/winskuo/gh_/aihub_remove_code branch April 27, 2026 02:50
