
[ROCm/DCU] Remove BF16 workarounds: is_bfloat16_available, delete_pass, _keep_in_fp32_modules#5112

Open
oldzhu wants to merge 1 commit into PaddlePaddle:develop from oldzhu:hip-bf16-remove-rocm-workarounds

Conversation

@oldzhu

@oldzhu oldzhu commented Apr 23, 2026

Summary

Remove ROCm BF16 workarounds in PaddleX now that the root causes are fixed in upstream Paddle (PaddlePaddle/Paddle#78760).

Closes #5111

Changes

1. paddlex/inference/utils/misc.py

Add 'dcu' to the device allowlist in is_bfloat16_available(). DCU is the device_type string for ROCm/HIP hardware in PaddleX.
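A minimal sketch of what the allowlist change amounts to (names and the allowlist contents are illustrative, not the exact PaddleX source):

```python
# Hypothetical reconstruction of the is_bfloat16_available() allowlist check.
# The real function lives in paddlex/inference/utils/misc.py.
BF16_DEVICE_ALLOWLIST = {"gpu", "xpu", "dcu"}  # 'dcu' newly added for ROCm/HIP

def is_bfloat16_available(device: str) -> bool:
    """Return True when the device type of e.g. 'dcu:0' supports BF16."""
    device_type = device.split(":", 1)[0]
    return device_type in BF16_DEVICE_ALLOWLIST
```

With this change, `is_bfloat16_available('dcu:0')` returns True instead of falling through to the unsupported branch.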

2. paddlex/inference/models/common/static_infer.py

Remove four duplicated `if paddle.is_compiled_with_rocm(): config.delete_pass(...)` blocks that dropped `conv2d_add_act_fuse_pass` and `conv2d_add_fuse_pass`. Root cause fixed in Paddle: a PADDLE_WITH_HIP guard was added to InitializePatterns() in both passes (PaddlePaddle/Paddle#78760).
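The shape of the removed workaround, sketched against a stand-in for `paddle.inference.Config` (the FakeConfig class and function name are illustrative; the real blocks live in static_infer.py):

```python
class FakeConfig:
    """Minimal stand-in for paddle.inference.Config, for illustration only."""
    def __init__(self):
        self.passes = ["conv2d_add_act_fuse_pass", "conv2d_add_fuse_pass"]

    def delete_pass(self, name):
        self.passes.remove(name)

def apply_legacy_rocm_workaround(config, compiled_with_rocm):
    # The block this PR deletes: on ROCm builds, drop the two conv2d fuse
    # passes because fused_conv2d_add_act had no HIP kernel. With the
    # PADDLE_WITH_HIP guard upstream, Paddle no longer emits the fused op
    # on HIP, so PaddleX can leave the passes alone.
    if compiled_with_rocm:
        config.delete_pass("conv2d_add_act_fuse_pass")
        config.delete_pass("conv2d_add_fuse_pass")
```

This PR removes all four copies of the `if compiled_with_rocm:` branch; no replacement logic is needed.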

3. paddlex/inference/models/doc_vlm/modeling/paddleocr_vl/_paddleocr_vl.py

Two changes:

  • Remove _keep_in_fp32_modules = ['visual', 'mlp_AR']. MIOpen BF16 convolution is validated correct on gfx1100/ROCm 7.2 (SNR 44 dB vs FP32 reference, 8/8 tests PASS).
  • Add temporary LayerNorm.forward BF16 compatibility shim for ROCm. Paddle HIP wheel (<=3.4.0.dev20260408) does not register phi::bfloat16 for layer_norm. The shim casts BF16→FP32→BF16 around LayerNorm. Remove this shim after Paddle PR #78760 merges and a new wheel ships.
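The shim's cast-around pattern, sketched with a fake tensor type (FakeTensor, `cast`, and `layer_norm_fp32_only` are all illustrative; the real shim wraps `paddle.nn.LayerNorm.forward`):

```python
class FakeTensor:
    """Toy tensor carrying only a value and a dtype string, for illustration."""
    def __init__(self, value, dtype):
        self.value, self.dtype = value, dtype

def cast(t, dtype):
    return FakeTensor(t.value, dtype)

def layer_norm_fp32_only(x):
    # Stands in for the HIP layer_norm kernel, which in wheels
    # <= 3.4.0.dev20260408 is registered for FP32 but not phi::bfloat16.
    assert x.dtype == "float32", "no BF16 kernel registered"
    return FakeTensor(x.value, "float32")

def shimmed_forward(x):
    # The shim: cast BF16 -> FP32, run layer_norm, cast the result back.
    if x.dtype == "bfloat16":
        return cast(layer_norm_fp32_only(cast(x, "float32")), "bfloat16")
    return layer_norm_fp32_only(x)
```

Non-BF16 inputs pass through unchanged, so the shim is a no-op once the upstream kernel registration lands.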

4. paddlex/inference/models/common/transformers/utils.py

Add 'dcu' → 'gpu' device mapping in device_guard(). paddle.set_device() does not accept 'dcu:N'; on ROCm hardware the device string must be 'gpu:N'.
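The mapping reduces to normalizing the device string before it reaches `paddle.set_device()` (function name is illustrative; the real change is inside device_guard()):

```python
def normalize_device(device_type: str, device_id: int) -> str:
    """Build the device string paddle.set_device() expects on ROCm.

    Hypothetical helper: Paddle addresses ROCm devices as 'gpu:N', so the
    PaddleX-level 'dcu' type must be rewritten before the call.
    """
    if device_type == "dcu":
        device_type = "gpu"
    return f"{device_type}:{device_id}"
```

So `device_guard('dcu', 0)` ends up calling the equivalent of `paddle.set_device('gpu:0')` instead of raising on the unrecognized 'dcu:0' string.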

Validation

Tested on AMD Radeon RX 7900 GRE (gfx1100) + ROCm 7.2.0 + Python 3.12:

| Check | Result |
| --- | --- |
| `is_bfloat16_available('dcu:0')` | ✅ True |
| `_keep_in_fp32_modules` | ✅ None (removed) |
| BF16 conv2d SNR vs FP32 | ✅ 44 dB (8/8 tests pass) |
| `device_guard('dcu', 0)` | ✅ No error |
| PaddleOCR-VL-1.5 BF16 pipeline | ✅ Pass (load 14.6 s, inference 202.8 s, EXIT:0) |
| OCR output | ✅ Correct (5 layout blocks detected, text content verified) |

Evidence log: https://github.com/oldzhu/paddle-amd/blob/main/evidence/bf16_pipeline_validation_gfx1100.log
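For reference, a sketch of the SNR metric behind the "44 dB vs FP32 reference" figure, assuming the conventional definition 20·log10(‖ref‖ / ‖ref − out‖); the actual validation script may compute it differently:

```python
import math

def snr_db(reference, output):
    """Signal-to-noise ratio in dB of `output` against an FP32 `reference`."""
    signal = math.sqrt(sum(r * r for r in reference))
    noise = math.sqrt(sum((r - o) ** 2 for r, o in zip(reference, output)))
    # Identical tensors have zero noise: report infinite SNR.
    return float("inf") if noise == 0.0 else 20.0 * math.log10(signal / noise)
```

By this measure, 44 dB means the BF16 output's error energy is roughly 1/25000 of the reference signal energy, consistent with BF16's ~8-bit mantissa.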

Related

PaddlePaddle/Paddle#78760 ("fix: enable BF16 support for layer_norm and conv2d fuse passes on HIP"), the upstream root-cause fix.
@paddle-bot

paddle-bot Bot commented Apr 23, 2026

Thanks for your contribution!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
