Add: parallel for iteration isolation in tensormap and orchestrator #551
zhusy54 wants to merge 1 commit into hw-native-sys:main
Code Review
This pull request introduces parallel-for iteration isolation for the PTO2 runtime. It adds lifecycle hooks to the runtime operations, implements RAII guards and macros for parallel scopes, and updates the PTO2TensorMap lookup logic to filter out tensor entries from previous iterations using per-ring local task IDs. These changes are applied consistently across the a2a3 and a5 runtime paths. I have no feedback to provide.
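The lookup filter described in this review can be pictured with a minimal sketch. Everything here is an assumption made for illustration: `TensorMapSketch`, `Entry`, `kNumRings`, and the `lookup` signature are hypothetical stand-ins, not the PR's actual `PTO2TensorMap` code. Only the idea of a per-ring `iter_start_local_ids` array initialised to -1 comes from the PR description. The sketch also treats every entry older than the snapshot as stale, which may be coarser than what the PR does.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; not the PR's actual types.
constexpr int kNumRings = 2;

struct Entry {
    int ring;             // ring that produced this entry
    int32_t local_id;     // per-ring local task ID of the producer
    uint64_t tensor_key;  // identifies the tensor the entry refers to
};

struct TensorMapSketch {
    // -1 means no iteration filter is active on that ring.
    std::array<int32_t, kNumRings> iter_start_local_ids;
    std::vector<Entry> entries;

    TensorMapSketch() { iter_start_local_ids.fill(-1); }

    // Returns the local IDs of producers still visible for `key`.
    std::vector<int32_t> lookup(uint64_t key) const {
        std::vector<int32_t> visible;
        for (const Entry& e : entries) {
            assert(e.ring >= 0 && e.ring < kNumRings);
            if (e.tensor_key != key) continue;
            const int32_t start = iter_start_local_ids[e.ring];
            // Skip entries submitted before the current iteration began
            // on the same ring (assumed filtering rule).
            if (start >= 0 && e.local_id < start) continue;
            visible.push_back(e.local_id);
        }
        return visible;
    }
};
```

With the filter left at -1 every matching entry is returned; once a ring's start ID is set, older entries on that ring are skipped while entries on other rings are unaffected.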
…a2a3/a5) Introduces PTO2_PARALLEL_FOR macro and supporting orchestrator APIs (pto2_parallel_for_begin/end, pto2_parallel_scope_begin/end) to isolate tensormap lookups per loop iteration. Initializes iter_start_local_ids to -1 in PTO2TensorMap::init. Updates alternating_matmul_add, batch_paged_attention, benchmark_bgemm, and paged_attention_unroll scene tests to use PTO2_PARALLEL_FOR.
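As a rough illustration of how a loop macro can bracket each iteration with begin/end calls, here is a sketch under stated assumptions. `OrchestratorSketch`, `ParallelScopeGuard`, and `SKETCH_PARALLEL_FOR` are hypothetical names. The real `PTO2_PARALLEL_FOR` macro and the `pto2_parallel_scope_begin/end` signatures are not shown in this PR description and may differ. The sketch assumes a single ring and C++17 (for the `if` statement with an initializer).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the orchestrator state (single ring only).
struct OrchestratorSketch {
    int next_local_id = 0;         // next local task ID to hand out
    int iter_start_local_id = -1;  // -1: no iteration filter active
    std::vector<int> saved;        // enclosing filter values, for nested scopes

    void parallel_scope_begin() {
        saved.push_back(iter_start_local_id);
        iter_start_local_id = next_local_id;  // snapshot at scope entry
    }
    void parallel_scope_end() {
        assert(!saved.empty());
        iter_start_local_id = saved.back();   // restore the enclosing filter
        saved.pop_back();
    }
    int submit_task() { return next_local_id++; }
};

// RAII guard: begin on construction, end on destruction, so the scope is
// closed even if the loop body exits early.
class ParallelScopeGuard {
public:
    explicit ParallelScopeGuard(OrchestratorSketch& o) : orch_(o) {
        orch_.parallel_scope_begin();
    }
    ~ParallelScopeGuard() { orch_.parallel_scope_end(); }
    ParallelScopeGuard(const ParallelScopeGuard&) = delete;
    ParallelScopeGuard& operator=(const ParallelScopeGuard&) = delete;

private:
    OrchestratorSketch& orch_;
};

// Hypothetical macro: each iteration runs inside its own guarded scope.
#define SKETCH_PARALLEL_FOR(orch, var, begin, end)  \
    for (int var = (begin); var < (end); ++var)     \
        if (ParallelScopeGuard sketch_guard_{(orch)}; true)
```

A loop written as `SKETCH_PARALLEL_FOR(orch, i, 0, 4) { orch.submit_task(); }` would then take a fresh snapshot at the start of every iteration and restore the previous filter value when the iteration ends.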
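Hardware performance test results
Benchmarks were run on an Ascend NPU (device-8), comparing this PR (parallel) against the main branch (0745dee1).
Test environment
Comparison results
Per-round details (Trimmed Avg, µs)
Analysis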
Summary
- Add PTO2_PARALLEL_FOR/PTO2_PARALLEL_SCOPE macros and RAII guards that bracket each loop iteration with a scope-level dependency filter
- Add iter_start_local_ids per ring in PTO2TensorMap so that tensor-map lookups skip entries produced in prior iterations on the same ring, preventing false cross-iteration dependencies when independent loop iterations submit tasks concurrently
- Wire parallel_for_begin/end and parallel_scope_begin/end ops into the PTO2RuntimeOps vtable
Changes
- pto_orchestration_api.h: new parallel_for_begin/end and parallel_scope_begin/end ops, inline wrappers, RAII guards, PTO2_PARALLEL_FOR/PTO2_PARALLEL_SCOPE macros (a2a3 + a5)
- pto_orchestrator.h/.cpp: implement pto2_parallel_for/scope_begin/end using existing scope stack + iter_start filter bookkeeping
- pto_tensormap.h/.cpp: add iter_start_local_ids[ring] array, initialise to -1, filter stale entries during lookup
- pto_ring_buffer.h: expose next_local_id() for snapshot at scope entry
- pto_runtime2.h/.cpp: wire new ops into PTO2RuntimeOps vtable
Testing