Fix backward compatibility in normalization by jlamypoirier · Pull Request #3 · ServiceNow/Fast-LLM

jlamypoirier · 2024-10-16T14:18:03Z

Hopefully this works.

- Add `_sdp_dim`/`_sdp_active` to `LanguageModelLoss.__init__` so GSPO's SDP branch doesn't AttributeError on the first non-test call.
- Replace `document_index.max().item()` (and the SDP MAX all-reduce) with `len(kwargs[BlockKwargs.lengths])`: CPU-side, identical across SDP ranks, removes two GPU→CPU syncs per microbatch.
- Decorate `fused_gspo_loss_forward_backward` with `@torch.compile` for parity with GRPO. The `num_segments == 1` test case skips on CPU since torch._inductor's CPU codegen mishandles `index_add_` into a size-1 buffer (atomic_add scatter).
- Make `divisor` a required arg on `fused_gspo_loss_forward_backward`: the wrapper always overrides it with the global document count, and the previous local-rank default would silently mis-normalize under SDP.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
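The document-count and `divisor` points above can be illustrated with a small sketch. It is plain Python rather than PyTorch, and every name in it (`count_documents`, `document_index`, `normalized_loss`, the toy loss values) is a hypothetical stand-in, not Fast-LLM's API. It only assumes that documents are packed into one sequence and that sequence-data-parallel ranks each see a slice of the tokens.

```python
# Hypothetical, simplified stand-ins; not Fast-LLM's actual functions.

def count_documents(lengths: list[int]) -> int:
    """Number of packed documents, read from host-side length metadata."""
    return len(lengths)


def document_index(lengths: list[int]) -> list[int]:
    """Per-token document ids, e.g. [3, 2] -> [0, 0, 0, 1, 1]."""
    return [doc for doc, n in enumerate(lengths) for _ in range(n)]


def normalized_loss(per_document_loss: list[float], *, divisor: int) -> float:
    """Sum per-document losses and divide by an explicit, required divisor."""
    return sum(per_document_loss) / divisor


lengths = [3, 2, 4]  # three documents packed into a 9-token sequence
index = document_index(lengths)

# Split the tokens across two ranks: 4 tokens on rank 0, 5 on rank 1.
rank_index = [index[:4], index[4:]]

# Counting from the local index differs per rank (hence a MAX reduction),
# while the length metadata gives the same count everywhere.
local_counts = [max(r) + 1 for r in rank_index]
global_count = count_documents(lengths)

# Toy partial per-document losses held by each rank (document 1 is split).
rank_losses = [[3.0, 1.0], [1.0, 4.0]]

# Global divisor: per-rank contributions sum to the true per-document mean.
correct = sum(normalized_loss(l, divisor=global_count) for l in rank_losses)

# Local default (documents visible on each rank): silently over-weights.
wrong = sum(normalized_loss(l, divisor=len(l)) for l in rank_losses)
```

Here `local_counts` is `[2, 3]` while `global_count` is `3` on both ranks, and the locally normalized total (`4.5`) disagrees with the globally normalized one (`3.0`). Making the divisor keyword-only and required, as in this sketch, forces the caller to choose the normalization explicitly.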

Fix backward compatibility in normalization

f98429a

jlamypoirier merged commit 87f23a0 into main Oct 16, 2024

jlamypoirier deleted the flat_backward_compatible branch October 16, 2024 14:37

tscholak added this to the 0.2.0 milestone Oct 25, 2024

jlamypoirier merged 1 commit into main from flat_backward_compatible


2 participants
