Summary
The Vulkan backend produces all-zero outputs on a PowerVR D-Series GPU (Google Pixel 10 Pro). The same model files work correctly on macOS via MoltenVK and on Android via XNNPACK.
Environment
Device: Google Pixel 10 Pro, PowerVR D-Series GPU (maxImageDimension3D = 2048)
ExecuTorch: Built from source, branch fix/vulkan-texture-ubo-budget (my UBO budget fix from PR #17294, "Fix Vulkan texture tensor UBO budget overflow", rebased on upstream/main as of Feb 7 2026, commit ba2516cefa)
What I Observe
Output shape [1, 84, 8400] is correct, but all confidence values are exactly 0.0
texture_limits: (2048, 2048, 2048) and storage_type_override: BUFFER produce the same zero results
What I Found by Adding Tracing
I added __android_log_print traces to VulkanBackend.cpp, ComputeGraph.cpp, and StagingBuffer.cpp to narrow things down. Key findings:
1. GPU is PowerVR: no support in ExecuTorch
ExecuTorch's Vulkan backend only handles Adreno, Mali, NVIDIA, and SwiftShader. There is no PowerVR-specific handling.
2. Output texture extents exceed maxImageDimension3D
2100 = 8400 anchors / 4 (texel packing), which is above the 2048 limit. The detection head outputs also exceed the limit. This is only checked in #ifdef VULKAN_DEBUG builds (Tensor.cpp:632-660), so release builds silently hit undefined behavior.
I exported with texture_limits: (2048, 2048, 2048) in VulkanPartitioner, but this only controls which ops get delegated; it doesn't account for texel packing that turns dimension 8400 into texture extent 2100.
3. Even in-limit tensors are zero
Output tensors 0-2 have extents within the 2048 limit (e.g., (80,80,16), (40,40,32), (20,20,64)), but they are also all zeros. This suggests either that intermediate tensors also exceed limits, or that one bad texture corrupts the entire command buffer state on PowerVR.
4. Execution mechanics work fine
332 nodes encode and submit, fence waits successfully
Input data is valid (verified non-zero values in staging buffer)
Memory properties: HOST_VISIBLE | HOST_COHERENT | DEVICE_LOCAL
GPU "completes" work but staging buffers read back all zeros
5. Single command buffer didn't help
I tried forcing all dispatches into one command buffer (setting execute_threshold_node_count to UINT32_MAX). Same result: all zeros.
Related
PR #17294, "Fix Vulkan texture tensor UBO budget overflow": my fix for a separate UBO budget crash (uniform data allocation has exceeded tensor uniform buffer size). That fix prevents a crash but does not affect the zero-output issue.
cases.py:1464-1470: there's an existing TODO noting Android arm64 failures where "writes from the first or second shader dispatch being 'ignored'", which matches my symptoms exactly.
Questions
Is PowerVR expected to work at all with the Vulkan backend? Or is it currently untested/unsupported?
Could the texture_limits partitioner option be made aware of texel packing so it avoids delegating ops whose packed extents exceed maxImageDimension3D?
Should the texture extent check in Tensor.cpp:632-660 be enabled in release builds (not just debug)?
Any other suggestions for debugging this? Happy to add more traces or test patches.
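To make the texel-packing question concrete, here is a minimal sketch of the kind of packed-extent check I have in mind. It is not ExecuTorch code: `packed_extents` and `exceeds_limit` are hypothetical helpers, and the packing model (one dimension divided by 4 elements per texel, rounded up) is an assumption taken from the 8400 / 4 = 2100 figure above. The real mapping from tensor sizes to texture extents in the backend may differ.

```python
# Hypothetical sketch, not part of ExecuTorch or VulkanPartitioner.
MAX_IMAGE_DIMENSION_3D = 2048  # limit reported for this device
TEXEL_SIZE = 4                 # assumed number of elements packed per texel


def packed_extents(sizes, packed_dim=-1, texel_size=TEXEL_SIZE):
    """Return `sizes` with `packed_dim` divided by the texel size, rounded up."""
    extents = list(sizes)
    extents[packed_dim] = (extents[packed_dim] + texel_size - 1) // texel_size
    return tuple(extents)


def exceeds_limit(sizes, limit=MAX_IMAGE_DIMENSION_3D, packed_dim=-1):
    """True if any packed extent is larger than the device limit."""
    return any(e > limit for e in packed_extents(sizes, packed_dim))


print(packed_extents((1, 84, 8400)))  # (1, 84, 2100)
print(exceeds_limit((1, 84, 8400)))   # True: 2100 > 2048
print(exceeds_limit((1, 84, 8192)))   # False: 8192 / 4 = 2048 fits exactly
```

The last case shows why the check has to use packed extents instead of raw dimensions: a raw dimension of 8192 looks over the limit, while its packed extent of 2048 is still valid.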
cc @SS-JIA @manuelcandales @digantdesai @cbilgin