Zephyr: Add MobileNetV2 image classification sample with Ethos-U NPU #19131
psiddh wants to merge 7 commits into pytorch:main
Conversation
Add a new Zephyr sample that runs a quantized INT8 MobileNetV2 model on the Arm Ethos-U NPU using ExecuTorch. The sample classifies a static 224x224x3 RGB test image into 1000 ImageNet classes and prints the top-5 predictions.

Validated end-to-end on an Alif Ensemble E8 DevKit (Cortex-M55 + Ethos-U55 256 MAC), achieving 19ms inference with 100% NPU delegation (110 ops).

This addresses part of pytorch#17654 (Zephyr: Expand samples and documentation) by adding a second sample app (MV2) beyond the existing hello-executorch.

Authored with assistance from Claude.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19131
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV — there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 2 New Failures, 5 Cancelled Jobs, 2 Unrelated Failures as of commit 8728a45 with merge base b8f04aa.
NEW FAILURES - The following jobs have failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
Adds a new Zephyr sample application that runs a quantized MobileNetV2 image classification model with Arm Ethos‑U delegation via ExecuTorch, including build/test metadata and board-specific configuration.
Changes:
- Introduce `mv2-ethosu` Zephyr sample that loads an embedded `.pte` and runs inference on a static 224×224×3 RGB input, printing top-5 classes.
- Add Zephyr build configuration (Kconfig/prj.conf/CMake) to embed the model, size allocators, and selectively build portable ops when needed.
- Add board configs/overlay for Corstone-300/320 FVP to enable Ethos‑U and place SRAM appropriately for DMA access.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| zephyr/samples/mv2-ethosu/src/main.cpp | Sample runtime: load PTE, allocate arenas, set inputs, execute, print top‑5. |
| zephyr/samples/mv2-ethosu/sample.yaml | Zephyr sample metadata and build-only test entries for Corstone FVPs. |
| zephyr/samples/mv2-ethosu/prj.conf | Enables ExecuTorch/C++17 and increases stack/heap + allocator pool sizes. |
| zephyr/samples/mv2-ethosu/boards/mps4_corstone320_fvp.conf | Enables Zephyr Ethos‑U driver for Corstone-320 FVP. |
| zephyr/samples/mv2-ethosu/boards/mps3_corstone300_fvp.conf | Enables Zephyr Ethos‑U driver for Corstone-300 FVP. |
| zephyr/samples/mv2-ethosu/boards/mps3_corstone300_fvp.overlay | Routes Zephyr SRAM to ISRAM so Ethos‑U DMA can access scratch buffers. |
| zephyr/samples/mv2-ethosu/README.md | Usage docs: model export command, build commands, and expected output. |
| zephyr/samples/mv2-ethosu/Kconfig | Sample-specific Kconfig for allocator pool sizes (MV2 defaults). |
| zephyr/samples/mv2-ethosu/CMakeLists.txt | Embeds PTE into header, optional selective portable ops build, links delegate. |
- Remove deprecated executorch_delegate_EthosUBackend_registered() call; linking executorch_delegate_ethos_u auto-registers the backend
- Add model_pte_runtime ISRAM copy for Corstone FVP DMA accessibility
- Fix ET_LOG formatting to pass linter (one arg per line)
- Add explicit ScalarType::Byte case and error for unsupported types in print_top_k instead of silent uint8 fallback
- Add comment explaining ISRAM copy vs direct MRAM access tradeoff
- Remove unused runner_util/inputs.h include from main.cpp
- Add BSD license header to mv2_input.h for repo consistency
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
- Use Error log level for all failure paths instead of Info
- Replace ET_CHECK with graceful error log + return for get_outputs
- Use Python3_EXECUTABLE instead of hardcoded python in CMake
- Add RESULT_VARIABLE/ERROR_VARIABLE checks for gen_oplist.py
- Fix grammar in CMake status message
- Apply lint formatting suggestions
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
```cpp
      ET_LOG(
          Error, "Failed to prepare input %zu: 0x%" PRIx32, i, (uint32_t)err);
      BufferCleanup cleanup({inputs, num_allocated});
      return err;
    }
```
In the error path, BufferCleanup cleanup({inputs, num_allocated}); return err; will immediately run the destructor and call free() on buffers allocated from MemoryAllocator/ArmMemoryAllocator (static pool), not malloc(), which is undefined behavior. Use a cleanup strategy compatible with MemoryAllocator (e.g., no-op cleanup + allocator reset) or switch input allocations to malloc()/free() (or executorch::extension::prepare_input_tensors() which uses malloc()).
```
CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE=1572864
CONFIG_EXECUTORCH_TEMP_ALLOCATOR_POOL_SIZE=1572864
```
With CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE=1572864 and CONFIG_EXECUTORCH_TEMP_ALLOCATOR_POOL_SIZE=1572864, the sample reserves ~3MB of static pools (plus model/runtime buffers). Corstone-300 FVP is documented as having 2MB SRAM, and this sample also applies the zephyr,sram = &isram overlay for Corstone-300, so the link is likely to overflow that SRAM region. Consider adding a Corstone-300-specific override with smaller pools / moving one pool to a different memory region, or dropping Corstone-300 from sample.yaml if it isn’t expected to build.
```suggestion
# Keep the combined static ExecuTorch pools below the 2 MiB SRAM budget
# used by constrained targets such as Corstone-300, leaving headroom for
# stack, heap, model data, and runtime buffers.
CONFIG_EXECUTORCH_METHOD_ALLOCATOR_POOL_SIZE=786432
CONFIG_EXECUTORCH_TEMP_ALLOCATOR_POOL_SIZE=786432
```
```bash
python -m executorch.backends.arm.scripts.aot_arm_compiler \
    --model_name=mv2_untrained \
    --quantize \
    --delegate \
    --target=ethos-u55-128 \
    --output=mv2_ethosu.pte
```
The model export command uses python -m executorch.backends.arm.scripts.aot_arm_compiler, but the existing Zephyr sample docs use the in-repo module path (python -m modules.lib.executorch.backends.arm.scripts.aot_arm_compiler ...) to avoid requiring a separate pip install. Consider aligning this README with the established Zephyr-sample invocation (or explicitly stating that a pip-installed executorch package is required for this command).
- Remove BufferCleanup UB: replace with plain Error return since buffers are arena-allocated (not malloc), so free() would be undefined behavior
- Add Corstone-300 board-specific pool size overrides (768KB each) to fit within 2 MiB ISRAM budget
- Fix README module path to use in-repo invocation
- Apply lintrunner formatting to mv2_input.h