perf: use deque for FIFO queues in sequence parallel, superoffload, and compile #7880

Merged
tohtana merged 1 commit into deepspeedai:master from giulio-leone:fix/deque-fifo-queues
Mar 1, 2026

Conversation

@giulio-leone
Contributor

Problem

Three files use .pop(0) for FIFO queue processing, which is O(n) per removal:

  1. ulysses_sp.py: Micro-batch queue for sequence parallel data sharding
  2. superoffload_stage3.py: Parameter buffer for IPG gradient bucketing
  3. compile/backend.py: Compile pass schedule queue

Solution

Switch to collections.deque with .popleft() for O(1) front removal.
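
In sketch form, the change in each file looks like this. This is a minimal illustration of the pattern, not the actual DeepSpeed code; `process` is a stand-in for the per-item work done in each drain loop:

```python
from collections import deque

def process(item):
    """Stand-in for the per-item work done in each drain loop."""
    pass

# Before: list-backed FIFO. Every .pop(0) shifts all remaining
# elements left, so draining n items costs O(n^2) overall.
queue = list(range(10))
while queue:
    process(queue.pop(0))

# After: deque-backed FIFO. .popleft() removes from the front in
# O(1), so the same drain is O(n) overall.
queue = deque(range(10))
while queue:
    process(queue.popleft())
```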

Changes

| File | Pattern |
| --- | --- |
| `deepspeed/runtime/sequence_parallel/ulysses_sp.py` | `micro_batches` FIFO queue |
| `deepspeed/runtime/superoffload/superoffload_stage3.py` | `params_in_ipg_bucket_buffer` drain loop |
| `deepspeed/compile/backend.py` | `remaining_schedule` step-by-step consumption |
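
A quick micro-benchmark of the two drain patterns makes the gap concrete. This is not from the PR; absolute times vary by machine, but the list version degrades quadratically with queue length:

```python
import timeit
from collections import deque

N = 50_000  # queue length; large enough to expose the O(n^2) drain

def drain_list():
    q = list(range(N))
    while q:
        q.pop(0)  # O(n) per removal

def drain_deque():
    q = deque(range(N))
    while q:
        q.popleft()  # O(1) per removal

print(f"list.pop(0):     {timeit.timeit(drain_list, number=1):.3f}s")
print(f"deque.popleft(): {timeit.timeit(drain_deque, number=1):.3f}s")
```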

@giulio-leone giulio-leone force-pushed the fix/deque-fifo-queues branch 2 times, most recently from 19b15d0 to 22d7251 on February 28, 2026 at 14:38
perf: use deque for FIFO queues in sequence parallel, superoffload, and compile

Three files drain lists front-to-back via .pop(0):
- ulysses_sp.py: micro_batches queue for sequence parallel data
- superoffload_stage3.py: params_in_ipg_bucket_buffer for gradient bucketing
- compile/backend.py: remaining_schedule for compile pass scheduling

Each .pop(0) is O(n); switching to collections.deque with .popleft()
gives O(1) front removal.

Signed-off-by: g97iulio1609 <giulio97.leone@gmail.com>
@giulio-leone
Contributor Author

Friendly ping — CI is green and this is ready for review. Happy to address any feedback. Thanks!

@tohtana tohtana merged commit a15e557 into deepspeedai:master Mar 1, 2026
9 checks passed