[ROCm/DCU] Remove BF16 workarounds: is_bfloat16_available, delete_pass, _keep_in_fp32_modules #5112
oldzhu wants to merge 1 commit into
## Background
PaddleX carried three workarounds for BF16 limitations in Paddle's HIP/ROCm
backend. These limitations are now fixed in upstream Paddle (see the
PaddlePaddle/Paddle PR "fix: enable BF16 support for layer_norm and conv2d
fuse passes on HIP").
## Changes
### 1. paddlex/inference/utils/misc.py
Add 'dcu' to the device allowlist in is_bfloat16_available(). DCU is the
device_type for ROCm/HIP hardware in PaddleX; BF16 is supported on gfx1100+.
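The allowlist change can be sketched as follows. This is a minimal illustration of the device-type check, not the real PaddleX code: the names `_BF16_DEVICE_TYPES` and `device_type_supports_bf16` are assumptions for this sketch, and the actual `is_bfloat16_available()` also consults Paddle for hardware support.

```python
# Hypothetical sketch of the device allowlist described above.
# The real is_bfloat16_available() in paddlex/inference/utils/misc.py
# does additional hardware checks; this only shows the allowlist step.

_BF16_DEVICE_TYPES = {"gpu", "xpu", "dcu"}  # 'dcu' is the newly added entry


def device_type_supports_bf16(device: str) -> bool:
    """Return True if the device-type prefix (e.g. 'dcu' in 'dcu:0')
    is on the BF16 allowlist."""
    dev_type = device.split(":", 1)[0].lower()
    return dev_type in _BF16_DEVICE_TYPES
```

With this change, a device string such as `'dcu:0'` passes the allowlist check instead of being rejected outright.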
### 2. paddlex/inference/models/common/static_infer.py
Remove four scattered 'if paddle.is_compiled_with_rocm(): delete_pass()'
blocks that existed because fused_conv2d_add_act had no HIP kernel. The root
cause is fixed in Paddle (PADDLE_WITH_HIP guard in InitializePatterns()).
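The removed workaround followed the pattern below. This is a runnable mock for illustration only: `Config` and `apply_rocm_workaround` are stand-ins for `paddle.inference.Config` and the inline guard in `static_infer.py`, and `other_fuse_pass` is a placeholder pass name.

```python
# Mock illustrating the now-removed ROCm workaround pattern.
# Config stands in for paddle.inference.Config (assumption, not the real API).

class Config:
    def __init__(self):
        self.passes = [
            "conv2d_add_act_fuse_pass",
            "conv2d_add_fuse_pass",
            "other_fuse_pass",  # placeholder for unrelated passes
        ]

    def delete_pass(self, name):
        self.passes.remove(name)


def apply_rocm_workaround(config, is_rocm):
    # The removed guard: drop the conv fuse passes on HIP builds because
    # fused_conv2d_add_act had no HIP kernel. No longer needed once the
    # PADDLE_WITH_HIP guard lands in InitializePatterns() upstream.
    if is_rocm:
        config.delete_pass("conv2d_add_act_fuse_pass")
        config.delete_pass("conv2d_add_fuse_pass")
```

Deleting these four copies means ROCm builds now keep the fuse passes and rely on upstream Paddle to disable them only where no HIP kernel exists.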
### 3. paddlex/inference/models/doc_vlm/modeling/paddleocr_vl/_paddleocr_vl.py
a) Remove _keep_in_fp32_modules = ['visual', 'mlp_AR']. MIOpen BF16
convolution was validated correct on gfx1100/ROCm 7.2 (SNR 44 dB vs FP32).
The visual encoder (SigLIP) runs correctly in BF16.
b) Add temporary LayerNorm BF16 compatibility shim for ROCm. Paddle HIP wheel
versions <= 3.4.0.dev20260408 do not register phi::bfloat16 in the
layer_norm HIP kernel. The shim casts BF16->FP32->BF16 around LayerNorm.
Remove this shim once the upstream Paddle PR is merged and a new wheel ships.
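The cast-around pattern of the shim can be sketched without Paddle as follows. `Tensor`, `layer_norm_fp32_only`, and `layer_norm_bf16_shim` are stand-ins invented for this sketch; the real shim wraps `paddle.nn.LayerNorm.forward` with dtype casts.

```python
# Sketch of the BF16 LayerNorm shim, using a stand-in Tensor with a
# dtype field instead of real Paddle tensors (assumption for illustration).

class Tensor:
    def __init__(self, data, dtype):
        self.data, self.dtype = data, dtype

    def astype(self, dtype):
        return Tensor(self.data, dtype)


def layer_norm_fp32_only(x):
    # Models the pre-fix HIP kernel, which registers only float32.
    assert x.dtype == "float32", "layer_norm HIP kernel lacks bfloat16"
    return x


def layer_norm_bf16_shim(x):
    # Cast BF16 -> FP32, run the kernel, cast the result back to BF16.
    if x.dtype == "bfloat16":
        return layer_norm_fp32_only(x.astype("float32")).astype("bfloat16")
    return layer_norm_fp32_only(x)
```

The extra casts cost some bandwidth but keep the rest of the pipeline in BF16 until a fixed wheel makes the shim unnecessary.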
### 4. paddlex/inference/models/common/transformers/utils.py
Add 'dcu' -> 'gpu' device mapping in device_guard(). paddle.set_device()
does not accept 'dcu:N'; it must be 'gpu:N' on ROCm hardware.
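The mapping amounts to a small string rewrite, sketched below. The function name `normalize_device` is an assumption for illustration; the real change lives inside `device_guard()`.

```python
# Sketch of the 'dcu' -> 'gpu' device-string mapping described above.
# normalize_device is a hypothetical name, not the actual PaddleX helper.

def normalize_device(device: str) -> str:
    """Map PaddleX's 'dcu:N' device string to the 'gpu:N' form that
    paddle.set_device() accepts on ROCm builds."""
    dev_type, sep, index = device.partition(":")
    if dev_type == "dcu":
        return "gpu" + sep + index
    return device
```

Other device strings pass through unchanged, so the mapping is safe to apply unconditionally.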
## Validation
Tested on AMD Radeon RX 7900 GRE (gfx1100) + ROCm 7.2.0 + Python 3.12:
- paddle.is_compiled_with_rocm() = True
- is_bfloat16_available('dcu:0') = True
- BF16 conv2d SNR = 44 dB (8/8 tests PASS)
- PaddleOCR-VL-1.5 full BF16 pipeline: load 14.6s, inference 202.8s, EXIT:0
- OCR output correct (5 layout blocks detected, text content verified)
## Summary
Remove ROCm BF16 workarounds in PaddleX now that the root causes are fixed in upstream Paddle (PaddlePaddle/Paddle#78760).
Closes #5111
## Changes
### 1. paddlex/inference/utils/misc.py
Add 'dcu' to the device allowlist in is_bfloat16_available(). DCU is the device_type string for ROCm/HIP hardware in PaddleX.
### 2. paddlex/inference/models/common/static_infer.py
Remove four duplicated `if paddle.is_compiled_with_rocm(): config.delete_pass('conv2d_add_act_fuse_pass'/'conv2d_add_fuse_pass')` blocks. Root cause fixed in Paddle: a PADDLE_WITH_HIP guard was added to InitializePatterns() in both passes (PaddlePaddle/Paddle#78760).
### 3. paddlex/inference/models/doc_vlm/modeling/paddleocr_vl/_paddleocr_vl.py
Two changes:
a) Remove `_keep_in_fp32_modules = ['visual', 'mlp_AR']`. MIOpen BF16 convolution is validated correct on gfx1100/ROCm 7.2 (SNR 44 dB vs FP32 reference, 8/8 tests PASS).
b) Add a temporary LayerNorm.forward BF16 compatibility shim for ROCm. Paddle HIP wheels (<= 3.4.0.dev20260408) do not register phi::bfloat16 for layer_norm; the shim casts BF16→FP32→BF16 around LayerNorm. Remove this shim after Paddle PR #78760 merges and a new wheel ships.
### 4. paddlex/inference/models/common/transformers/utils.py
Add a 'dcu' → 'gpu' device mapping in device_guard(). paddle.set_device('dcu:N') is not accepted; 'gpu:N' must be used on ROCm hardware.
## Validation
Tested on AMD Radeon RX 7900 GRE (gfx1100) + ROCm 7.2.0 + Python 3.12, covering is_bfloat16_available('dcu:0') and the _keep_in_fp32_modules removal.
Evidence log: https://github.com/oldzhu/paddle-amd/blob/main/evidence/bf16_pipeline_validation_gfx1100.log
## Related