Add: L4+ recursive Worker composition via _l3_worker_loop #583
Merged
ChaoWao merged 1 commit into hw-native-sys:main on Apr 16, 2026
Code Review
This pull request implements L4+ recursive composition for the distributed runtime, allowing Worker instances to manage lower-level Worker children. Key changes include the introduction of a mailbox-polling loop for process-based recursion, support for thread-based recursion via GIL-acquired callbacks, and generalized initialization logic for all levels above L3. Feedback identifies a critical memory layout mismatch in mailbox offsets for ChipCallConfig and a performance issue where the worker polling loop busy-waits, consuming excessive CPU. Additionally, while the infrastructure for THREAD-mode recursion has been added, it is not yet fully integrated into the Worker class initialization logic.
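One common way to address the busy-wait concern raised above is to sleep with exponential backoff while the mailbox is idle. The following is only a minimal sketch of that general technique, not the fix adopted in this PR; `poll_with_backoff`, `try_take`, `should_stop`, and the sleep bounds are hypothetical names and values chosen for illustration.

```python
import time


def poll_with_backoff(try_take, should_stop, min_sleep=1e-5, max_sleep=1e-3):
    """Yield items returned by try_take(), backing off while nothing is ready.

    try_take    -- hypothetical callable returning a task, or None when idle
    should_stop -- hypothetical callable returning True to end the loop
    """
    sleep = min_sleep
    while not should_stop():
        item = try_take()
        if item is not None:
            sleep = min_sleep  # work arrived: reset the backoff
            yield item
            continue
        # Idle: give the CPU back instead of spinning, then wait longer next time.
        time.sleep(sleep)
        sleep = min(sleep * 2, max_sleep)
```

The trade-off is latency versus CPU use: a larger `max_sleep` lowers idle load but delays pickup of the next task by up to that amount.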
…r::run

Worker(level=4) can now register Worker(level=3) children via add_worker(). On first run, the parent forks a child process per L3 child; the child inits the inner Worker, then enters _l3_worker_loop, which polls the shared-memory mailbox for (cid, config, args_blob), looks up the orch function in the COW-inherited Python registry, and calls inner_worker.run(orch_fn, args, cfg).

Key changes:
- _l3_worker_loop: Python child loop for L3-as-L4-child (PROCESS mode)
- _read_config_from_mailbox: reconstruct ChipCallConfig from mailbox
- Worker.add_worker(): register un-init'd child Workers (level >= 4)
- _init_distributed / _start_distributed: generalize former _init_level3 / _start_level3 for all levels >= 3
- DistWorker(self.level) replaces hardcoded DistWorker(3)
- DistWorker::run() invokes a Python callback for THREAD mode (approach b: set_run_callback with GIL acquisition in the binding layer)
- read_args_from_blob: nanobind binding to reconstruct TaskArgs from mailbox blob pointer
- 13 new tests: lifecycle, validation, single/multiple dispatches, L4 with own subs, multiple runs, L3 owns its Orchestrator
- Docs updated: task-flow §9, distributed_level_runtime §4, worker-manager §5.1, roadmap PR-F landed + PR-G includes ChipCallConfig → CallConfig rename

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
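The mailbox handoff described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the byte layout, the two-integer `config`, and the names `post_task`, `take_task`, and `dispatch_once` are all hypothetical, and a plain `bytearray` stands in for shared memory. Defining the layout once and using it on both the writer and reader side is one way to avoid the kind of offset mismatch the review flagged.

```python
import struct

# Hypothetical layout, shared by writer and reader so offsets cannot drift:
#   offset 0: state (u32)
#   offset 4: cid (u32), two config ints (i32, i32), blob length (u32)
#   then:     args blob bytes
STATE = struct.Struct("<I")
BODY = struct.Struct("<IiiI")
IDLE, READY, DONE = 0, 1, 2


def post_task(mailbox, cid, config, blob):
    """Parent side: write the payload first, then flip the state to READY."""
    start = STATE.size + BODY.size
    if start + len(blob) > len(mailbox):
        raise ValueError("args blob does not fit in mailbox")
    BODY.pack_into(mailbox, STATE.size, cid, config[0], config[1], len(blob))
    mailbox[start:start + len(blob)] = blob
    STATE.pack_into(mailbox, 0, READY)  # real shared memory also needs a fence here


def take_task(mailbox):
    """Child side: return (cid, config, blob) if a task is ready, else None."""
    (state,) = STATE.unpack_from(mailbox, 0)
    if state != READY:
        return None
    cid, cfg_a, cfg_b, n = BODY.unpack_from(mailbox, STATE.size)
    start = STATE.size + BODY.size
    return cid, (cfg_a, cfg_b), bytes(mailbox[start:start + n])


def dispatch_once(mailbox, registry, run):
    """One loop iteration: look up the function by cid, run it, mark DONE."""
    task = take_task(mailbox)
    if task is None:
        return False
    cid, config, blob = task
    run(registry[cid], blob, config)
    STATE.pack_into(mailbox, 0, DONE)
    return True
```

Here `run` plays the role that `inner_worker.run(orch_fn, args, cfg)` plays in the description above; decoding the blob into real task arguments is left out.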
c5d9a9e to 8ee3cfd
Summary
- `Worker(level=4)` registers `Worker(level=3)` children via `add_worker()`. On first `run()`, the parent forks a child process per L3 child; the child inits the inner Worker, then enters `_l3_worker_loop`, which polls the mailbox for `(cid, config, args_blob)`, looks up the orch fn in the COW-inherited Python registry, and calls `inner_worker.run(orch_fn, args, cfg)`.
- `DistWorker::run()` callback mechanism for THREAD mode: `set_run_callback` stores a Python callable; the binding layer acquires the GIL and reconstructs `TaskArgs` from `TaskArgsView`.
- `_init_distributed` / `_start_distributed` generalize the former `_init_level3` / `_start_level3` for all levels >= 3. `DistWorker(self.level)` replaces hardcoded `DistWorker(3)`.
- `read_args_from_blob(blob_ptr)`: nanobind binding to reconstruct `TaskArgs` from a mailbox blob.
- `task-flow.md` §9 (L4→L3→L2 walkthrough), `distributed_level_runtime.md` §4, `worker-manager.md` §5.1 (nested fork ordering), `roadmap.md` (PR-F landed, PR-G includes `ChipCallConfig` → `CallConfig`).

Testing
- `pip install .` builds cleanly
- `test_l4_recursive.py`
- `grep -rn "DistWorker(3)" python/simpler/` returns 0
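The reason a forked child can resolve an orch function from a bare `cid`, as the Summary describes, is that `fork()` gives the child a copy-on-write view of the parent's Python registry. A minimal POSIX-only sketch of that idea follows; `REGISTRY`, `register`, and `run_in_forked_child` are hypothetical names, and a pipe replaces the shared-memory mailbox purely to keep the example short.

```python
import os

# Hypothetical registry; entries must be added BEFORE fork() so the child inherits them.
REGISTRY = {}


def register(fn):
    """Store fn and return the integer id a parent would send to the child."""
    cid = len(REGISTRY)
    REGISTRY[cid] = fn
    return cid


def run_in_forked_child(cid, arg):
    """Fork, run REGISTRY[cid](arg) in the child, and return its result as text."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the registry is visible here without any pickling or IPC.
        os.close(r)
        try:
            os.write(w, str(REGISTRY[cid](arg)).encode())
        finally:
            os._exit(0)  # skip the parent's atexit handlers and buffers
    # Parent: read until the child closes its end of the pipe, then reap it.
    os.close(w)
    with os.fdopen(r, "rb") as f:
        data = f.read()
    os.waitpid(pid, 0)
    return data.decode()
```

Only the small integer id crosses the process boundary; functions registered after the fork would not be visible to an already-running child.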