
Refactor: drop slot ring, make DistTaskSlotState storage dynamic#563

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/dynamic-slot-storage
Apr 15, 2026

Conversation

@ChaoWao ChaoWao commented Apr 15, 2026

Summary

Internal cleanup PR, follow-up to #560. Replaces the fixed-size slot pool (DIST_TASK_WINDOW_SIZE = 128 plus a DistTaskSlotState[] inside DistWorker) with dynamic storage owned by DistRing. At L3, slot state lives entirely in the parent process's heap: the Orchestrator and Scheduler read it directly, while child workers only receive payloads through a mailbox and never see the slot state. The shmem-backed ring index that L2 uses is therefore unnecessary at this level.

Only the heap still needs a pre-sized region (MAP_SHARED | MAP_ANONYMOUS inherited across fork). Heap back-pressure behaviour is unchanged.

No user-visible change: heap_ring_size=… still works, OUTPUT auto-alloc / INOUT WaW semantics unchanged, back-pressure timeout still throws std::runtime_error.

What changed

  • DistRing owns three correlated per-task resources now:
    • a monotonic int32_t task id (no window, no modulo wrap),
    • the shared-memory heap slab (unchanged from Refactor: back orch.alloc with merged slot+heap DistRing + fork hygiene #560),
    • the per-slot state as std::deque<std::unique_ptr<DistTaskSlotState>>. std::deque::push_back never invalidates existing pointers, so ring.slot_state(id) hands out a pointer that stays valid for the slot's lifetime without keeping the mutex held past the lookup.
  • init(heap_bytes, timeout_ms) drops the window_size parameter; the slot ring is gone entirely.
  • reset_to_empty(): new method called by DistOrchestrator::drain() right after active_tasks_ hits 0. Drops all slot states and zeroes task-id / heap cursors so each Worker.run() starts from task_id = 0 with bounded memory (per-run peak, not cumulative).
  • DistOrchestrator::init drops slots / num_slots. slot_state(id) delegates to ring.slot_state(id) with a nullptr-check.
  • DistScheduler::Config drops slots / num_slots, takes DistRing *ring instead. Every cfg_.slots[id] access becomes *cfg_.ring->slot_state(id).
  • DistWorker drops the std::unique_ptr<DistTaskSlotState[]> slots_ member; slot state lives inside the allocator now. DistWorker::init() is a straight pass-through.
  • DIST_TASK_WINDOW_SIZE constant removed from dist_types.h.
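The storage scheme described above can be sketched roughly as follows. This is an illustrative reduction, not the actual implementation: the heap slab, timeout handling, and the real `DistTaskSlotState` fields are omitted, and `DistRingSketch` is a hypothetical name.

```cpp
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the real per-task state.
struct DistTaskSlotState { int status = 0; };

class DistRingSketch {
public:
    // alloc() hands out a monotonic task id; the per-task resources
    // grow in lockstep via push_back -- no window, no modulo wrap.
    int32_t alloc() {
        std::lock_guard<std::mutex> lock(mu_);
        int32_t task_id = next_task_id_++;
        released_.push_back(0);
        slot_states_.emplace_back(std::make_unique<DistTaskSlotState>());
        return task_id;
    }

    // std::deque::push_back never invalidates references to existing
    // elements, so this pointer stays valid for the slot's lifetime
    // even though the mutex is dropped at return.
    DistTaskSlotState* slot_state(int32_t id) {
        std::lock_guard<std::mutex> lock(mu_);
        if (id < 0 || id >= next_task_id_) return nullptr;
        return slot_states_[static_cast<size_t>(id)].get();
    }

    // End-of-run reset (called from drain once active_tasks_ hits 0):
    // drop all slot state and restart ids from 0.
    void reset_to_empty() {
        std::lock_guard<std::mutex> lock(mu_);
        slot_states_.clear();
        released_.clear();
        next_task_id_ = 0;
    }

private:
    std::mutex mu_;
    int32_t next_task_id_ = 0;
    std::vector<uint8_t> released_;
    std::deque<std::unique_ptr<DistTaskSlotState>> slot_states_;
};
```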

Tests

  • test_dist_ring rewritten:
    • drops window-size test cases
    • adds SlotAllocGrowsPastLegacyWindow — 2048 allocs past the old 128 cap, verifies no "window full" error
    • adds SlotStateIsPointerStable — grabs a ptr to slot 0, allocs 1000 more slots, verifies ptr identity
    • adds ResetToEmptyRequiresAllReleased + ResetToEmptyResetsCounters
    • BackPressureThenReleaseUnblocks / TimeoutThrowsRuntimeError / ShutdownUnblocksAlloc now exercise heap back-pressure directly (the only source of back-pressure remaining)
  • test_dist_orchestrator / test_dist_scheduler fixtures drop the std::unique_ptr<DistTaskSlotState[]> slots member and access slot state via a short S(id) fixture helper that wraps ring.slot_state(id).
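The pointer-stability property that SlotStateIsPointerStable checks can be demonstrated in isolation. A minimal sketch (the `SlotState` type here is a stand-in, not the real `DistTaskSlotState`):

```cpp
#include <cstddef>
#include <deque>
#include <memory>

struct SlotState { int value = 0; };  // stand-in for DistTaskSlotState

// Grab a pointer to the first slot, grow the deque by `extra` slots,
// then check the pointer still identifies the same live element.
// std::deque::push_back/emplace_back never invalidates references to
// existing elements, which is exactly what the PR relies on.
bool pointer_stable_after_growth(std::size_t extra) {
    std::deque<std::unique_ptr<SlotState>> slots;
    slots.emplace_back(std::make_unique<SlotState>());
    SlotState* first = slots.front().get();
    first->value = 42;
    for (std::size_t i = 0; i < extra; ++i)
        slots.emplace_back(std::make_unique<SlotState>());
    return slots.front().get() == first && first->value == 42;
}
```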

Test plan

  • cpput: 7/7 targets pass locally (cmake --build /tmp/dist_ut_build && ctest)
  • pyut: 100 passed, 3 deselected (torch-only; dev box has no torch). test_alloc_dep_wires_via_tensormap and friends unchanged — same INOUT-driven WaW path.
  • Linux CI

Plan reference

Follow-up of #560 (PR-H). Tracked as PR-I in the hierarchical-runtime plan under "Follow-up PR chain" / Allowed Exception #6 in the L2 Consistency Audit.

Removes the fixed DIST_TASK_WINDOW_SIZE slot pool and the per-slot array
DistWorker used to carry. At L3 the slot state lives entirely in the
parent process's heap -- it never crosses into child workers -- so the
ring index L2 uses to address shmem descriptors buys us nothing here.
Only the heap needs a pre-sized region for MAP_SHARED fork inheritance.

- DistRing:
  - init() drops window_size; takes only (heap_bytes, timeout_ms).
  - alloc() returns a monotonic task id; no back-pressure on slot count,
    only on heap space.
  - Owns the slot state pool as std::deque<std::unique_ptr<SlotState>>.
    push_back never invalidates existing pointers, so slot_state(id)
    returns a pointer that stays valid for the slot's lifetime without
    holding the mutex past the lookup.
  - released_ and slot_heap_end_ become std::vector<>, grown via
    push_back on alloc, indexed directly by task id.
  - advance_last_alive_locked no longer needs to undo the released bit
    (entries aren't recycled within a run; reset_to_empty clears them
    all at drain).
  - New reset_to_empty(): drops all slot state and zeroes counters.
    DistOrchestrator::drain() calls it right after active_tasks_ hits 0
    so each Worker.run() starts from task id 0 with bounded memory.

- DistOrchestrator::init drops slots/num_slots params. slot_state(id)
  delegates to ring.slot_state(id) with a nullptr->throw guard.
- DistScheduler::Config drops slots/num_slots; takes DistRing* and reads
  slot state via ring->slot_state(id) at every access site.
- DistWorker drops the std::unique_ptr<SlotState[]> member; slot state
  is now entirely in allocator_. DistWorker::init() is a straight
  passthrough to allocator_/orchestrator_/scheduler_.
- dist_types.h: remove DIST_TASK_WINDOW_SIZE constant.
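With the slot ring gone, heap space is the only remaining blocking point. A hedged sketch of what heap-only back-pressure with a timeout can look like -- the class name, bump-cursor bookkeeping, and `release` semantics here are illustrative assumptions, not the PR's actual code:

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <stdexcept>

// Hypothetical heap cursor: alloc() blocks only on heap space, never
// on slot count, and throws std::runtime_error on timeout.
class HeapBackPressure {
public:
    HeapBackPressure(std::size_t heap_bytes, int timeout_ms)
        : capacity_(heap_bytes), timeout_ms_(timeout_ms) {}

    // Returns the byte offset of the allocation within the heap slab.
    std::size_t alloc(std::size_t bytes) {
        std::unique_lock<std::mutex> lock(mu_);
        bool ok = cv_.wait_for(lock, std::chrono::milliseconds(timeout_ms_),
                               [&] { return used_ + bytes <= capacity_; });
        if (!ok)
            throw std::runtime_error("dist ring: heap back-pressure timeout");
        std::size_t offset = used_;
        used_ += bytes;
        return offset;
    }

    // Simplified reclamation for the sketch; the real ring advances a
    // last-alive cursor instead of freeing arbitrary blocks.
    void release(std::size_t bytes) {
        { std::lock_guard<std::mutex> lock(mu_); used_ -= bytes; }
        cv_.notify_all();
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::size_t capacity_;
    std::size_t used_ = 0;
    int timeout_ms_;
};
```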

Tests:
- test_dist_ring rewritten: drop window_size tests, add
  SlotAllocGrowsPastLegacyWindow (2048 allocs past the old 128 cap),
  SlotStateIsPointerStable (push_back doesn't invalidate refs),
  ResetToEmptyRequiresAllReleased, ResetToEmptyResetsCounters.
- test_dist_orchestrator / test_dist_scheduler fixtures drop the
  std::unique_ptr<SlotState[]> member and access via a local S(id)
  helper that calls ring.slot_state(id).

Docs:
- orchestrator.md section 5 rewritten to describe the three resources
  DistRing now owns (task id, heap, slot state) and the end-of-run
  reset contract.
- roadmap.md Dispatch internals bullet updated.

Plan (local, gitignored): PR-I moved to "in review"; Allowed Exception
hw-native-sys#6 kept (explains why L3 doesn't need a shmem slot ring).

No user-visible behaviour change: heap_ring_size still configurable via
Worker ctor, OUTPUT auto-alloc / WaW tag semantics unchanged, back-
pressure timeout still throws std::runtime_error on heap exhaustion.

ChaoWao commented Apr 15, 2026

/gemini review

@ChaoWao ChaoWao merged commit c1b979d into hw-native-sys:main Apr 15, 2026
15 checks passed

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the task management system to use monotonic task IDs and dynamic slot state storage within DistRing, removing the fixed-size window limitation and allowing memory reclamation via a new reset_to_empty method. Feedback identifies a critical exception safety issue in DistRing::alloc where partial state updates could leave the allocator inconsistent, and suggests documentation improvements for type clarity in std::deque declarations.

Comment on lines 101 to +105

     int32_t task_id = next_task_id_++;
 -   DistTaskSlot slot = task_id & window_mask_;
 -   released_[static_cast<size_t>(slot)] = 0;
 -   slot_heap_end_[static_cast<size_t>(slot)] = heap_end;
 -   return DistAllocResult{slot, heap_ptr, heap_end};
 +   released_.push_back(0);
 +   slot_heap_end_.push_back(heap_end);
 +   slot_states_.emplace_back(std::make_unique<DistTaskSlotState>());
 +   return DistAllocResult{task_id, heap_ptr, heap_end};

critical

The sequence of state updates to released_, slot_heap_end_, and slot_states_ is not exception-safe. If a push_back or emplace_back call throws an exception (e.g., std::bad_alloc), the DistRing will be left in an inconsistent state. To ensure strong exception safety, the state modifications should be transactional, rolling back partial updates if a subsequent update fails. Additionally, ensure consistent type casting when assigning to int32_t to avoid narrowing conversion warnings.

    int32_t task_id = static_cast<int32_t>(next_task_id_);
    released_.push_back(0);
    try {
        slot_heap_end_.push_back(heap_end);
        try {
            slot_states_.emplace_back(std::make_unique<DistTaskSlotState>());
        } catch (...) {
            slot_heap_end_.pop_back();
            throw;
        }
    } catch (...) {
        released_.pop_back();
        throw;
    }
    next_task_id_++;
    return DistAllocResult{task_id, heap_ptr, heap_end};
References
  1. Always check the return values of resource allocation functions and handle potential failures gracefully, for example, by rolling back partial allocations.
  2. Be consistent in type casting when assigning size_t values to int32_t variables across similar methods to avoid compiler warnings about narrowing conversions.

Comment thread docs/orchestrator.md
so it has to be pre-sized. `heap_ring_size` on the Worker ctor
controls the total bytes (default 1 GiB).
3. The **per-task slot state** (`DistTaskSlotState`) — stored in a
`std::deque<std::unique_ptr<...>>`. `std::deque::push_back` never

medium

For clarity and completeness in the documentation, it's better to specify the full type in the std::deque declaration instead of using ....

Suggested change

 - `std::deque<std::unique_ptr<...>>`. `std::deque::push_back` never
 + `std::deque<std::unique_ptr<DistTaskSlotState>>`. `std::deque::push_back` never

* the heap must be `mmap(MAP_SHARED)` and forked into child workers,
* which forces a pre-sized region.
* 3. The per-task scheduling state (`DistTaskSlotState`). Stored in a
* `std::deque<std::unique_ptr<...>>` so push_back never invalidates

medium

For improved clarity in the documentation, it would be better to use the full type name instead of ... in the std::deque declaration.

Suggested change

 - * `std::deque<std::unique_ptr<...>>` so push_back never invalidates
 + * 3. The per-task scheduling state (`DistTaskSlotState`). Stored in a
 + * `std::deque<std::unique_ptr<DistTaskSlotState>>` so push_back never invalidates


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the distributed task management system by replacing the fixed-size task slot ring with a monotonic task ID system and dynamic state storage in DistRing. This change removes the legacy window size constraint and provides stable pointers to task states. However, a critical race condition was identified in DistOrchestrator::drain(), where reset_to_empty() is called before the scheduler thread has fully finished accessing task state, potentially leading to a use-after-free.

// Every slot is CONSUMED (active_tasks_ == 0 ⇒ allocator last_alive_ ==
// next_task_id_). Drop all per-slot state so the next Worker.run()
// starts from task_id = 0 with no accumulated memory.
allocator_->reset_to_empty();

critical

There appears to be a critical race condition here. The drain() function, running on the user's thread, calls allocator_->reset_to_empty() as soon as active_tasks_ becomes zero. However, the scheduler thread might still be processing tasks and accessing slot state data.

Specifically, active_tasks_ is decremented in on_consumed, which is called from the scheduler thread. The scheduler thread might continue executing code within on_task_complete (e.g., accessing consumers of a completed task) after active_tasks_ has reached zero and unblocked drain().

When reset_to_empty() is called, it deallocates all DistTaskSlotState objects. If the scheduler thread is concurrently trying to access one of these objects via a pointer obtained from ring->slot_state(), it will result in a use-after-free.

To fix this, you need to ensure the scheduler is quiescent (i.e., not in the middle of processing any task logic) before reset_to_empty() is called. This might involve more sophisticated synchronization between the orchestrator's drain() and the scheduler thread.

References
  1. If a component is accessed by multiple threads (e.g., Orchestrator and Scheduler), it requires a mutex to protect its operations to prevent data races.
  2. A producer should only be skipped or resources released if it is in the CONSUMED state, not just COMPLETED, to prevent use-after-free issues.
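One possible shape of the synchronization this review asks for, purely as a sketch of the suggestion -- every name here is hypothetical and none of this is code from the PR: have the scheduler thread bracket each dispatch pass, and make drain() wait for both "no active tasks" and "scheduler not mid-pass" before calling reset_to_empty().

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical handshake: drain() must observe both "no active tasks"
// and "scheduler not mid-pass" before tearing down slot state.
class DrainBarrier {
public:
    // Scheduler thread brackets each dispatch pass with these two
    // calls, publishing the remaining active count only at pass end.
    void begin_pass() {
        std::lock_guard<std::mutex> lock(mu_);
        in_pass_ = true;
    }
    void end_pass(int remaining_active) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            in_pass_ = false;
            active_ = remaining_active;
        }
        cv_.notify_all();
    }

    // drain() blocks here; once this returns, no scheduler code can
    // still be holding a slot_state() pointer, so reset_to_empty()
    // cannot race a use-after-free.
    void wait_quiescent() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [&] { return !in_pass_ && active_ == 0; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool in_pass_ = false;
    int active_ = 0;
};
```

The key design point is that the active count is published at pass end rather than mid-pass, so drain() can never wake up while the scheduler is still dereferencing slot state.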

@ChaoWao ChaoWao deleted the refactor/dynamic-slot-storage branch April 16, 2026 03:26