add README.md #4

Merged
tscholak merged 49 commits into main from tscholak/readme
Oct 21, 2024
Conversation

@tscholak tscholak commented Oct 16, 2024

  • add README.md
  • add/update examples:
    • Slurm
    • Kubernetes
  • add CI test, docker build, and docker push workflow
  • reworked documentation workflow

@tscholak tscholak marked this pull request as ready for review October 18, 2024 13:45
@tscholak tscholak requested a review from jlamypoirier October 18, 2024 13:46
Comment thread .github/workflows/run-tests.yaml Outdated
Comment thread examples/mistral-4-node-benchmark.yaml Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread tests/test_functional.py Outdated
Comment thread .github/workflows/ci.yaml Outdated
@tscholak tscholak changed the title from "add readme" to "add README.md" Oct 19, 2024
@tscholak tscholak requested a review from jlamypoirier October 21, 2024 16:29
@tscholak
Collaborator Author

Everything works now. Can I ask for a final review, please?

@tscholak tscholak merged commit ac1447d into main Oct 21, 2024
@tscholak tscholak deleted the tscholak/readme branch October 21, 2024 16:55
jlamypoirier added a commit that referenced this pull request May 20, 2026
- Add `sp_group` arg to fused_gspo_loss_forward_backward and all-reduce the
  three segment buffers over it when sequence-parallel shards the sequence
  across the TP group; otherwise per-segment ratios use partial sums and
  produce silent corruption under SP. Wrapper passes `self._parallel_dim.group`
  when `_sequence_parallel` is active.
- Wire `num_labels_in_seq` through the GSPO test and assert
  `new_logprobs_fused` against the reference. Required aligning the reference
  to use scaled logits for new_logprobs (reusing `target_log_probabilities`),
  matching the kernel's behavior of reporting the loss-path log-probs.
- Drop the unreachable `max(num_segments, 1)` guard in the GSPO reference and
  the matching `divisor=max(num_segments, 1)` at the test call site.

SDP all-reduce branch coverage (review item 3) deferred to a follow-up adding
a `gspo_loss` flag to `tests/layers/test_lm_head.py` alongside the existing
GRPO config, with an SDP distributed variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
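
The first bullet of the commit message above says that per-segment ratios are wrong under sequence parallelism unless the segment buffers are all-reduced first. The following is a minimal sketch of why, in plain Python with no GPU or `torch.distributed`. Every name in it (`segment_sums`, `all_reduce_sum`, `segment_means`, the toy data) is hypothetical and not taken from the repository. The commit does not say what the three buffers hold, so the sketch assumes only a per-segment sum and a per-segment token count, and it stands in for the all-reduce with an element-wise sum over per-rank lists.

```python
# Illustrative only: a plain-Python stand-in for a sequence-parallel all-reduce.
# None of these names come from the repository.

def segment_sums(values, segment_ids, num_segments):
    """Accumulate a per-segment sum and token count over one rank's tokens."""
    sums = [0.0] * num_segments
    counts = [0] * num_segments
    for value, seg in zip(values, segment_ids):
        sums[seg] += value
        counts[seg] += 1
    return sums, counts

def all_reduce_sum(per_rank_buffers):
    """Element-wise sum across ranks (stand-in for a real all-reduce)."""
    return [sum(column) for column in zip(*per_rank_buffers)]

def segment_means(sums, counts):
    """Per-segment mean; segments with no tokens on this rank yield 0.0."""
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# Six tokens in two segments; segment 1 straddles the shard boundary.
token_values = [0.5, 1.5, 1.0, 2.0, 4.0, 3.0]
segment_ids = [0, 0, 0, 1, 1, 1]
num_segments = 2

# Assumed sharding: rank 0 holds the first four tokens, rank 1 the last two.
shards = [
    (token_values[:4], segment_ids[:4]),
    (token_values[4:], segment_ids[4:]),
]

local = [segment_sums(v, s, num_segments) for v, s in shards]

# Without the all-reduce, each rank divides its own partial sums.
local_means = [segment_means(sums, counts) for sums, counts in local]

# With the all-reduce, every rank sees the full per-segment totals.
global_sums = all_reduce_sum([sums for sums, _ in local])
global_counts = all_reduce_sum([counts for _, counts in local])
global_means = segment_means(global_sums, global_counts)

print("rank-local means:", local_means)
print("all-reduced means:", global_means)
```

For segment 1, rank 0 computes a mean of 2.0 and rank 1 computes 3.5, while the value over the whole segment is 3.0. Neither rank raises an error, which matches the "silent corruption" the commit message describes.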

2 participants