
Restrict BF16 backward epilogue reorder to BF16_Optimizer #1

Merged
maxyu1115 merged 1 commit into maxyu1115:fix/bf16-optimizer-grad-accum-boundary-leak from tohtana:tohtana/update-fix-7985 on Apr 29, 2026


@tohtana tohtana commented Apr 28, 2026

Follow-up for deepspeedai#7985.

PR deepspeedai#7985 moved ZeROOptimizer.backward_epilogue() before gradient reduction so BF16_Optimizer includes the boundary microbatch grad in the fp32 reduction buffer. That ordering is only needed for BF16_Optimizer.

This change keeps the pre-allreduce epilogue only for BF16_Optimizer, while preserving the previous post-allreduce epilogue ordering for normal ZeRO optimizer paths.
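The ordering rule described above can be illustrated with a minimal sketch. This is a toy model, not DeepSpeed's engine code: `ToyBF16Optimizer`, `ToyZeroOptimizer`, and `finish_backward` are hypothetical names introduced only for illustration, and the "reduction" is any callable the caller passes in.

```python
class ToyBF16Optimizer:
    """Hypothetical stand-in: the epilogue folds the pending grad into an fp32 buffer."""

    def __init__(self):
        self.fp32_buffer = 0.0
        self.pending_grad = 0.0

    def backward_epilogue(self):
        self.fp32_buffer += self.pending_grad
        self.pending_grad = 0.0


class ToyZeroOptimizer:
    """Hypothetical stand-in for the other ZeRO optimizer paths."""

    def backward_epilogue(self):
        pass


def finish_backward(optimizer, reduce_fn, is_boundary):
    """Run the epilogue and (at a boundary) the reduction; return the order used."""
    events = []
    # Assumption: only the BF16-style optimizer needs its epilogue before reduction.
    epilogue_first = isinstance(optimizer, ToyBF16Optimizer)

    if epilogue_first:
        optimizer.backward_epilogue()
        events.append("epilogue")

    if is_boundary:
        reduce_fn()
        events.append("reduce")

    if not epilogue_first:
        optimizer.backward_epilogue()
        events.append("epilogue")

    return events
```

With this shape, the BF16-style path yields `["epilogue", "reduce"]` at a boundary, while every other optimizer keeps the earlier `["reduce", "epilogue"]` order.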

Also adds a focused BF16 regression test for the original issue: the final accumulation microbatch must be included in the reduced fp32 gradient buffer.
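The property that the regression test checks can be sketched in a single-process toy form. This is not the test added in this PR: `reduced_fp32_grad` is a hypothetical helper, gradients are plain floats, and the "reduction" is modeled as a snapshot of the fp32 buffer at the accumulation boundary.

```python
def reduced_fp32_grad(microbatch_grads, epilogue_before_reduce):
    """Return the fp32 buffer value seen by the reduction at the boundary step."""
    fp32_buffer = 0.0
    reduced = None
    last = len(microbatch_grads) - 1

    for step, grad in enumerate(microbatch_grads):
        is_boundary = step == last
        if epilogue_before_reduce:
            fp32_buffer += grad          # epilogue: accumulate into fp32
            if is_boundary:
                reduced = fp32_buffer    # reduction sees the final microbatch
        else:
            if is_boundary:
                reduced = fp32_buffer    # reduction runs too early
            fp32_buffer += grad          # epilogue: accumulate into fp32

    return reduced


def test_boundary_microbatch_is_reduced():
    grads = [1.0, 2.0, 4.0]
    # Epilogue first: all three microbatches are in the reduced value.
    assert reduced_fp32_grad(grads, epilogue_before_reduce=True) == 7.0
    # Epilogue last: the final microbatch (4.0) is missing from the reduction.
    assert reduced_fp32_grad(grads, epilogue_before_reduce=False) == 3.0
```

The second assertion shows the failure mode the original issue describes: with the epilogue after the reduction, the boundary microbatch never reaches the reduced buffer.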

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
@maxyu1115 (Owner) commented

Ahhh, I didn't test the other optimizers. I had Claude do some code review and concluded it was fine. Thanks for fixing it!

@maxyu1115 maxyu1115 merged commit d5e54e8 into maxyu1115:fix/bf16-optimizer-grad-accum-boundary-leak Apr 29, 2026