Pre-submission Checklist
Question Category
API / Development
Other Category
No response
Question
English isn't my first language, so I drafted this with AI help — but every result was measured by me on the hardware described. Full writeup +
reproducer: https://github.com/TSUMUGI-XE/b70-dual-tp2
Confound ruled out (control done). The test host normally runs pcie_acs_override=downstream,multifunction (for an unrelated VFIO passthrough),
which can make the kernel refuse peer P2P via pci_p2pdma_distance() and mimic this signature. I re-ran on a control boot with the override
removed: peer copy still returns 0x70000003, host-staged still passes, identical to baseline (control log). The override is not the cause.
On that boot the kernel logged the reason:
xe 0000:07:00.0: cannot be used for peer-to-peer DMA as the client and provider (0000:03:00.0) do not share an upstream bridge or whitelisted host bridge.
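For readers who want to check their own topology against that kernel message, here is a minimal sketch that compares two resolved sysfs device paths and reports whether they meet at a PCI bridge or only at the host bridge. It is a simplified approximation, not the kernel's pci_p2pdma_distance() logic (which also considers ACS and the host-bridge whitelist). The function names are hypothetical, and so are the intermediate bridge addresses in the usage note; only 0000:03:00.0 and 0000:07:00.0 come from the log above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a sysfs path such as "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0"
// into its non-empty components.
static std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> parts;
    std::stringstream ss(path);
    std::string item;
    while (std::getline(ss, item, '/')) {
        if (!item.empty()) parts.push_back(item);
    }
    return parts;
}

// Deepest component that is a strict ancestor of both devices
// (the devices' own final components are never considered).
std::string deepest_common_ancestor(const std::string& a, const std::string& b) {
    const std::vector<std::string> pa = split_path(a);
    const std::vector<std::string> pb = split_path(b);
    std::string last;
    for (std::size_t i = 0;
         i + 1 < pa.size() && i + 1 < pb.size() && pa[i] == pb[i]; ++i) {
        last = pa[i];
    }
    return last;
}

// True when the deepest common ancestor is a PCI bridge (a BDF-named component),
// false when the devices only meet at a host bridge ("pciDDDD:BB") or higher.
// Assumption: this ignores ACS and the whitelist, unlike the kernel check.
bool share_upstream_bridge(const std::string& a, const std::string& b) {
    const std::string anc = deepest_common_ancestor(a, b);
    if (anc.empty() || anc == "sys" || anc == "devices") return false;
    return anc.rfind("pci", 0) != 0;  // not a host-bridge component
}
```

The input paths would come from something like readlink -f /sys/bus/pci/devices/0000:03:00.0. For two cards on separate root ports, for example .../pci0000:00/0000:00:01.0/.../0000:03:00.0 and .../pci0000:00/0000:00:06.0/.../0000:07:00.0, the deepest common ancestor is pci0000:00 and share_upstream_bridge returns false, which is the case the kernel message describes.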
Summary
On a workstation with two Intel Arc Pro B70 (BMG), a Level-Zero direct device-to-device copy between the cards (single process, per-device L0
contexts) returns 0x70000003 (ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY) instead of doing the peer copy. Clean error, not a hang. A host-staged
copy of the same buffers (dev0 → host → dev1) succeeds — so allocations/queues are fine, only the cross-device peer path fails.
This may well be expected behavior. I ruled out my own host config (control above); the kernel then refuses because the two cards are on separate
root ports with no shared PCIe switch and the host bridge isn't on pci_p2pdma_whitelist. So I'm not claiming a clear bug — I'm asking: (1) should
BMG + a consumer Intel host bridge route P2P through the root complex (whitelist gap to fix, or topology limit to document)? and (2) regardless, can
the unsupported case be reported as a queryable capability instead of OUT_OF_DEVICE_MEMORY, so frameworks stop probing-by-failure?
Environment
- GPUs: 2× Intel Arc Pro B70 (BMG), separate PCIe root ports
- Kernel: Linux 7.x, xe driver — reproduced on two builds (stock Ubuntu HWE 7.0.0-22 and a self-built 7.1-rc)
- compute-runtime / Level-Zero: 1.15.x (NEO 26.x, IGC 2.34.x)
- Single process, ONEAPI_DEVICE_SELECTOR=level_zero:*, ZES_ENABLE_SYSMAN=1; IOMMU/VT-d on, GPUs in separate IOMMU groups
Reproducer
~80-line single file: repro/b70_p2p_copy_probe.cpp. Build/run:
source /opt/intel/oneapi/setvars.sh
icpx -fsycl -O2 b70_p2p_copy_probe.cpp -lze_loader -o b70_p2p_copy_probe
ONEAPI_DEVICE_SELECTOR=level_zero:* ZES_ENABLE_SYSMAN=1 ./b70_p2p_copy_probe
Output:
--- TEST 1: L0 direct copy dev0 -> dev1 ---
zeCommandListAppendMemoryCopy -> 0x70000003 (FAILED)
--- TEST 2: host-staged copy dev0 -> host -> dev1 ---
checksum: PASS
RESULT: l0_p2p=BROKEN(ze-error) host_staged=WORKS
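To make the raw code in that output easier to read, here is a small decoder for a handful of ze_result_t values. It is a sketch: the numeric values are written from my reading of ze_api.h and should be checked against the installed header, the table is deliberately incomplete, and the function name is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Map a few ze_result_t values to their enumerator names.
// Assumption: values match the installed ze_api.h; the list is not exhaustive.
std::string ze_result_name(std::uint32_t code) {
    switch (code) {
        case 0x00000000u: return "ZE_RESULT_SUCCESS";
        case 0x70000001u: return "ZE_RESULT_ERROR_DEVICE_LOST";
        case 0x70000002u: return "ZE_RESULT_ERROR_OUT_OF_HOST_MEMORY";
        case 0x70000003u: return "ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY";
        case 0x78000003u: return "ZE_RESULT_ERROR_UNSUPPORTED_FEATURE";
        case 0x78000004u: return "ZE_RESULT_ERROR_INVALID_ARGUMENT";
        default:          return "unlisted ze_result_t value";
    }
}
```

With this table, the 0x70000003 from TEST 1 decodes to ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY, while a code such as ZE_RESULT_ERROR_UNSUPPORTED_FEATURE would sit in a different range (0x78...).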
Negative controls (what is NOT the cause)
- Host platform swap (different CPU/chipset, separate root ports, Gen4 x8/x8, separate IOMMU groups) → identical failure. PCIe bandwidth / root-port
layout / IOMMU grouping ruled out. Reproduces across two kernels.
- pcie_acs_override removed (control boot) → still 0x70000003; the kernel names the P2PDMA refusal (above). Override ruled out. Neither GPU
exposes a p2pdma sysfs node even with ACS off.
Ask
- Is BMG + a consumer Intel host bridge eligible for pci_p2pdma_whitelist (does the silicon route P2P through the root complex), or is
cross-root-port P2P genuinely unsupported here? Whitelist it, or document it?
- Either way, can compute-runtime report the capability (a zeDeviceCanAccessPeer-style query) so frameworks choose host-staging deliberately
instead of probing-by-failure?
- Is 0x70000003 (OUT_OF_DEVICE_MEMORY) the intended code for "peer access refused by the kernel P2PDMA layer", or does it mask the more specific
reason the kernel already names?
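To illustrate what "choose host-staging deliberately" could look like on the framework side, here is a minimal sketch of the selection logic with the Level-Zero calls abstracted behind callables. All names (CopyOps, CopyPath, copy_between_devices) are hypothetical. In a real program can_access_peer would presumably wrap a query such as zeDeviceCanAccessPeer; whether that query reflects the kernel's P2PDMA refusal on this setup is exactly what I am asking, so the sketch assumes nothing about it.

```cpp
#include <cstdint>
#include <functional>

enum class CopyPath { Peer, HostStaged, Failed };

// Hypothetical hooks; each copy returns a ze_result_t-like code (0 = success).
struct CopyOps {
    std::function<bool()> can_access_peer;        // capability query
    std::function<std::uint32_t()> peer_copy;     // direct dev0 -> dev1
    std::function<std::uint32_t()> staged_copy;   // dev0 -> host -> dev1
};

// Use the peer path only when the capability query allows it. If the query
// says yes but the copy still fails, fall back to host staging as a safety net.
CopyPath copy_between_devices(const CopyOps& ops) {
    if (ops.can_access_peer() && ops.peer_copy() == 0) {
        return CopyPath::Peer;
    }
    return ops.staged_copy() == 0 ? CopyPath::HostStaged : CopyPath::Failed;
}
```

With a reliable capability query, the peer copy is never attempted on an unsupported topology. Without one, the only signal is the failed copy itself, which is the probing-by-failure pattern described above.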
Happy to run additional diagnostics or driver builds; I have both B70s set up and can iterate.
Additional Notes
No response
Related (not duplicates; different HW or failure mode)
- OUT_OF_DEVICE_MEMORY class on a 2-GPU Level-Zero context, but A770 (DG2) and urUSMDeviceAlloc, not a peer copy.
- zeInit abort, not a single-process peer copy.