Add: L4+ recursive Worker composition via _l3_worker_loop #583

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:feat/l4-recursive-worker
Apr 16, 2026

@ChaoWao ChaoWao commented Apr 16, 2026

Summary

  • Worker(level=4) registers Worker(level=3) children via add_worker(). On the first run(), the parent forks one child process per L3 child; each child initializes its inner Worker, then enters _l3_worker_loop, which polls the mailbox for (cid, config, args_blob), looks up the orch fn in the COW-inherited Python registry, and calls inner_worker.run(orch_fn, args, cfg).
  • DistWorker::run() callback mechanism for THREAD mode: set_run_callback stores a Python callable; the binding layer acquires the GIL and reconstructs TaskArgs from TaskArgsView.
  • _init_distributed / _start_distributed generalize the former _init_level3 / _start_level3 for all levels >= 3. DistWorker(self.level) replaces hardcoded DistWorker(3).
  • read_args_from_blob(blob_ptr): nanobind binding to reconstruct TaskArgs from a mailbox blob.
  • 13 new tests covering lifecycle, validation, single/multiple dispatches, L4 with own subs, multiple runs, and L3-owns-its-own-Orchestrator.
  • Docs updated: task-flow.md §9 (L4→L3→L2 walkthrough), distributed_level_runtime.md §4, worker-manager.md §5.1 (nested fork ordering), roadmap.md (PR-F landed, PR-G includes ChipCallConfig → CallConfig rename).
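
The PROCESS-mode dispatch path in the first bullet can be pictured roughly as follows. This is a minimal single-process sketch, not the code from this PR: the mailbox is a stand-in `queue.Queue` that blocks instead of a polled shared-memory region, no fork takes place, and `REGISTRY`, `register`, `InnerWorker`, `SHUTDOWN`, and `l3_worker_loop_sketch` are hypothetical names chosen for illustration.

```python
import queue

# Hypothetical registry mapping a callable id (cid) to an orchestration function.
# In the PR the forked child sees the parent's registry through copy-on-write.
REGISTRY = {}


def register(cid, fn):
    """Record an orchestration function under its callable id."""
    REGISTRY[cid] = fn


class InnerWorker:
    """Stand-in for the inner Worker(level=3); only run() is modelled."""

    def __init__(self):
        self.results = []

    def run(self, orch_fn, args, cfg):
        self.results.append(orch_fn(args, cfg))


SHUTDOWN = object()  # hypothetical sentinel that ends the loop


def l3_worker_loop_sketch(mailbox, inner_worker):
    """Handle (cid, config, args) messages until the shutdown sentinel arrives."""
    while True:
        msg = mailbox.get()  # blocks; the real loop polls shared memory
        if msg is SHUTDOWN:
            return
        cid, cfg, args = msg
        orch_fn = REGISTRY[cid]  # look up the orch fn by callable id
        inner_worker.run(orch_fn, args, cfg)


register(0, lambda args, cfg: sum(args) * cfg["scale"])

mailbox = queue.Queue()
mailbox.put((0, {"scale": 2}, [1, 2, 3]))
mailbox.put((0, {"scale": 1}, [4]))
mailbox.put(SHUTDOWN)

worker = InnerWorker()
l3_worker_loop_sketch(mailbox, worker)
print(worker.results)  # [12, 4]
```

Only the callable id crosses the mailbox in this sketch; the function object itself is resolved on the child side, which is why the registry has to be populated before the fork.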

Testing

  • pip install . builds cleanly
  • All 24 existing tests pass (no regressions)
  • 13 new tests pass: test_l4_recursive.py
  • grep -rn "DistWorker(3)" python/simpler/ returns 0 matches
  • Linux CI

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request implements L4+ recursive composition for the distributed runtime, allowing Worker instances to manage lower-level Worker children. Key changes include the introduction of a mailbox-polling loop for process-based recursion, support for thread-based recursion via GIL-acquired callbacks, and generalized initialization logic for all levels above L3. Feedback identifies a critical memory layout mismatch in mailbox offsets for ChipCallConfig and a performance issue where the worker polling loop busy-waits, consuming excessive CPU. Additionally, while the infrastructure for THREAD-mode recursion has been added, it is not yet fully integrated into the Worker class initialization logic.
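
One common way to address the busy-wait concern is to back off between polls. The sketch below is illustrative only and is not taken from the PR or from the review thread: `wait_for_task`, its parameters, and the `is_ready` predicate are hypothetical, and the fix actually adopted (if any) may differ.

```python
import time


def wait_for_task(is_ready, min_sleep=1e-5, max_sleep=1e-3, timeout=None,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() with exponential backoff instead of spinning.

    Returns True once is_ready() is true, or False if the timeout expires.
    The sleep and clock arguments are injectable so the loop can be tested.
    """
    delay = min_sleep
    deadline = None if timeout is None else clock() + timeout
    while not is_ready():
        if deadline is not None and clock() >= deadline:
            return False
        sleep(delay)                       # yield the CPU between polls
        delay = min(delay * 2, max_sleep)  # back off, capped at max_sleep
    return True


# Demo: the "mailbox" becomes ready on the fourth check.
calls = {"n": 0}


def ready_after_three():
    calls["n"] += 1
    return calls["n"] > 3


slept = []  # record requested sleeps instead of actually sleeping
ok = wait_for_task(ready_after_three, min_sleep=0.001, max_sleep=0.002,
                   sleep=slept.append)
print(ok, slept)  # True [0.001, 0.002, 0.002]
```

The cap on `max_sleep` bounds the added dispatch latency; a short initial delay keeps the common case, where work arrives quickly, close to the latency of a spin loop.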

Comment thread python/simpler/worker.py
…r::run

Worker(level=4) can now register Worker(level=3) children via
add_worker(). On first run, the parent forks a child process per L3
child; the child inits the inner Worker, then enters _l3_worker_loop
which polls the shared-memory mailbox for (cid, config, args_blob),
looks up the orch function in the COW-inherited Python registry, and
calls inner_worker.run(orch_fn, args, cfg).

Key changes:
- _l3_worker_loop: Python child loop for L3-as-L4-child (PROCESS mode)
- _read_config_from_mailbox: reconstruct ChipCallConfig from mailbox
- Worker.add_worker(): register un-init'd child Workers (level >= 4)
- _init_distributed / _start_distributed: generalize former _init_level3
  / _start_level3 for all levels >= 3
- DistWorker(self.level) replaces hardcoded DistWorker(3)
- DistWorker::run() invokes a Python callback for THREAD mode (approach
  b: set_run_callback with GIL acquisition in the binding layer)
- read_args_from_blob: nanobind binding to reconstruct TaskArgs from
  mailbox blob pointer
- 13 new tests: lifecycle, validation, single/multiple dispatches, L4
  with own subs, multiple runs, L3 owns its Orchestrator
- Docs updated: task-flow §9, distributed_level_runtime §4,
  worker-manager §5.1, roadmap PR-F landed + PR-G includes
  ChipCallConfig → CallConfig rename
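
Reading a config back out of a mailbox depends on the reader and writer agreeing on byte offsets. As a rough illustration of deriving those offsets from one shared layout definition rather than hard-coding them, here is a sketch using Python's `struct` module. The layout is entirely hypothetical: the 8-byte callable id, the two int32 config fields, and the names `HEADER`, `CONFIG`, `write_mailbox`, and `read_config_sketch` are assumptions, not the actual ChipCallConfig layout or the PR's _read_config_from_mailbox.

```python
import struct

# Hypothetical layout; the real fields and offsets are defined by the runtime.
HEADER = struct.Struct("<q")   # assumed: 8-byte little-endian callable id
CONFIG = struct.Struct("<ii")  # assumed: two int32 config fields

# Offsets are derived from the struct sizes so both sides stay in sync.
CONFIG_OFFSET = HEADER.size
ARGS_OFFSET = CONFIG_OFFSET + CONFIG.size


def write_mailbox(buf, cid, cfg):
    """Writer side: pack the callable id, then the config fields."""
    HEADER.pack_into(buf, 0, cid)
    CONFIG.pack_into(buf, CONFIG_OFFSET, *cfg)


def read_config_sketch(buf):
    """Reader side: unpack the config from the same derived offset."""
    return CONFIG.unpack_from(buf, CONFIG_OFFSET)


buf = bytearray(64)  # stand-in for the shared-memory mailbox
write_mailbox(buf, 7, (4, 1))
print(HEADER.unpack_from(buf, 0)[0], read_config_sketch(buf), ARGS_OFFSET)
# 7 (4, 1) 16
```

If the config gains a field, only the `CONFIG` format string changes and `ARGS_OFFSET` follows automatically, which is the property a hand-maintained offset table lacks.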

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChaoWao ChaoWao force-pushed the feat/l4-recursive-worker branch from c5d9a9e to 8ee3cfd April 16, 2026 09:09
@ChaoWao ChaoWao merged commit 0c58819 into hw-native-sys:main Apr 16, 2026
15 checks passed
@ChaoWao ChaoWao deleted the feat/l4-recursive-worker branch April 16, 2026 10:18