feat(runtime): Orchestration PTOParam error handling #306
Summary of Changes
This pull request improves the robustness of parameter validation within the orchestration runtime by moving from debug-only assertions to a runtime error-flagging system.
Code Review
The pull request refactors PTOParam validation by replacing assert statements with explicit error handling. Methods such as add_input and add_scalar now set a has_error flag and an error_msg within the PTOParam struct when they detect invalid parameters (for example, a null buffer address or exceeded capacity). The pto2_submit_mixed_task function in the Orchestrator checks this flag, logs a fatal error, sets a new PTO2_ERROR_INVALID_PARAM in shared memory, and prevents further task submissions. The assert.h include was removed. A new documentation file, scheduler-orchestration-error-handling.md, was added to describe the error handling mechanisms for the Scheduler and Orchestrator. A review comment suggests refactoring the duplicated tensor capacity checks in the PTOParam methods into a shared helper function to improve maintainability.
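The submit-side check described in this review can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `SharedMemoryHeader`, `Orchestrator`, `PTO2_ERROR_NONE`, the `fatal` and `submitted` members, and the function name `submit_mixed_task_sketch` are hypothetical stand-ins, and `std::fprintf` stands in for the runtime's LOG_ERROR. Only `has_error`, `error_msg`, and `PTO2_ERROR_INVALID_PARAM` (value 5) come from the PR text.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

// Hypothetical, simplified stand-ins for the runtime's types.
constexpr int32_t PTO2_ERROR_NONE = 0;           // assumed name for "no error"
constexpr int32_t PTO2_ERROR_INVALID_PARAM = 5;  // value stated in the PR

struct PTOParam {
    bool has_error = false;          // set by the add_* methods on invalid input
    const char* error_msg = nullptr; // description of the recorded failure
};

struct SharedMemoryHeader {
    // Error code visible to the other side of the shared memory region.
    std::atomic<int32_t> orch_error_code{PTO2_ERROR_NONE};
};

struct Orchestrator {
    SharedMemoryHeader* shm = nullptr;
    bool fatal = false;  // once set, no further tasks are accepted
    int submitted = 0;   // number of tasks accepted so far
};

// Returns true if the task was accepted, false if it was rejected.
bool submit_mixed_task_sketch(Orchestrator& orch, const PTOParam& params) {
    if (orch.fatal) {
        return false;  // an earlier fatal error blocks all later submissions
    }
    // Validate before any resources are allocated for the task.
    if (params.has_error) {
        std::fprintf(stderr, "[ERROR] invalid PTOParam: %s\n",
                     params.error_msg ? params.error_msg : "(no message)");
        orch.shm->orch_error_code.store(PTO2_ERROR_INVALID_PARAM,
                                        std::memory_order_release);
        orch.fatal = true;
        return false;
    }
    ++orch.submitted;  // placeholder for the real submission work
    return true;
}
```

Checking the flag at the entry point means a bad parameter set is rejected before the task touches any shared state, and the fatal flag keeps later submissions from running on top of a known-bad state.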
18b76b8 to ef517ed
PTOParam validation previously used assert(), which is stripped in release builds. Replace all assert() calls with an error-flag mechanism that integrates with the orchestration error path (LOG_ERROR + orch_error_code + fatal + emergency_shutdown), ensuring validation works in all builds.

- Add has_error/error_msg fields and set_error() helper to PTOParam
- Merge ordering and capacity checks into check_add_tensor_valid()
- Convert all assert() in add_input/add_output/add_inout/add_scalar to set_error() with descriptive messages
- Add PTO2_ERROR_INVALID_PARAM (5) error code
- Validate params.has_error at pto2_submit_mixed_task entry
- Update error handling documentation
    if (!check_add_tensor_valid()) {
        return;
    }
    assert(tensor_count < MAX_TENSORS && "Too many tensor params");
…HEAD)

Synchronize A5 tensormap_and_ringbuffer runtime and platform with a2a3 improvements introduced after 56a2c61. Follows the sync pattern established in hw-native-sys#250 and hw-native-sys#300.

Platform (src/a5/platform/):
- 2f58a2f (hw-native-sys#267): add AICPU thread affinity (platform_aicpu_affinity.h/cpp), PLATFORM_MAX_AICPU_THREADS_JUST_FOR_LAUNCH, device_runner, kernel.cpp, CMakeLists.txt
- b903e7b: sync perf_profiling.h for multi-ring support
- 334d355 (hw-native-sys#254): sync performance_collector_aicore.h for slim dispatch

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 334d355 (hw-native-sys#254): slim dispatch payload in aicore_executor.cpp
- dd7ada4: standardize register init and exit handshake in aicore_executor.cpp
- 2f58a2f (hw-native-sys#267): AICPU affinity gate in aicpu_executor.cpp

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- e2e38b9 (hw-native-sys#249): cluster-based mixed-task dispatch; add pto_submit_types.h and SUBMIT_BY_CLUSTER.md
- a842263 (hw-native-sys#255): separate local ready queue by CoreType in pto_scheduler.h
- cf6462c (hw-native-sys#268): consolidate per-task state into PTO2TaskSlotState (pto_runtime2_types.h, pto_scheduler.cpp, pto_orchestrator.cpp)
- b903e7b: multi-ring buffer architecture (pto_shared_memory, MULTI_RING.md, aicpu_executor.cpp, perf_profiling.h)
- 5d92137 (hw-native-sys#264): DepListPool ring buffer reclamation (pto_ring_buffer.h/cpp)
- 54d082c (hw-native-sys#281): replace task_id with slot-state pointer across scheduler, orchestrator, ring buffer, executor, RUNTIME_LOGIC.md
- d305376 (hw-native-sys#277): add scope deadlock detection in pto_orchestrator
- 1e41a3a (hw-native-sys#274): per-thread orchestrator phase profiling
- f5da078 (hw-native-sys#275): progress-aware ring buffer spin detection (pto_ring_buffer.h, pto_orchestrator.cpp, runtime_maker.cpp)
- 10f6415 (hw-native-sys#284): tighten PTO2_PROFILING macro guards; sync profiling_levels.md
- 9c158e0 (hw-native-sys#291): emergency shutdown on fatal error (aicpu_executor, pto_orchestration_api.h, pto_orchestrator, pto_shared_memory)
- 94f39ff (hw-native-sys#301): refactor PTOParam to aggregated container with parallel arrays (pto_types.h, pto_runtime2_types.h, pto_scheduler, pto_shared_memory, pto_tensormap, pto_orchestrator, runtime2)
- 15e6034 (hw-native-sys#308): refactor Tensor fields and pto_tensormap for cache locality
- 77a81aa (hw-native-sys#306): replace PTOParam assert with orchestration error handling

Examples & tests (examples/a5/, tests/device_tests/a5/):
- 8cf8981 (hw-native-sys#293): replace PipeSyncFunc with FULL_MEMORY_BARRIER in kernels
- b88eed3 (hw-native-sys#302): optimize paged attention pipeline, eliminate GM round-trips
- 94f39ff (hw-native-sys#301) + 15e6034 (hw-native-sys#308): update orchestration to new PTOParam API
…ing (hw-native-sys#306)

PTOParam validation previously used assert(), which is stripped in release builds. Replace all assert() calls with an error-flag mechanism that integrates with the orchestration error path (LOG_ERROR + orch_error_code + fatal + emergency_shutdown), ensuring validation works in all builds.

- Add has_error/error_msg fields and set_error() helper to PTOParam
- Merge ordering and capacity checks into check_add_tensor_valid()
- Convert all assert() in add_input/add_output/add_inout/add_scalar to set_error() with descriptive messages
- Add PTO2_ERROR_INVALID_PARAM (5) error code
- Validate params.has_error at pto2_submit_mixed_task entry
- Update error handling documentation

Co-authored-by: zhusy54 <zhusiyu1@hisilicon.com>
…e-sys#314)

Synchronize A5 tensormap_and_ringbuffer runtime and platform with a2a3 improvements introduced after 56a2c61. Follows the sync pattern established in hw-native-sys#250 and hw-native-sys#300.

Platform (src/a5/platform/):
- 2f58a2f (hw-native-sys#267): add AICPU thread affinity (platform_aicpu_affinity.h/cpp), PLATFORM_MAX_AICPU_THREADS_JUST_FOR_LAUNCH, device_runner, kernel.cpp, CMakeLists.txt
- b903e7b: sync perf_profiling.h for multi-ring support
- 334d355 (hw-native-sys#254): sync performance_collector_aicore.h for slim dispatch

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 334d355 (hw-native-sys#254): slim dispatch payload in aicore_executor.cpp
- dd7ada4: standardize register init and exit handshake in aicore_executor.cpp
- 2f58a2f (hw-native-sys#267): AICPU affinity gate in aicpu_executor.cpp
- 83473ba (hw-native-sys#323): replace block core assignment with round-robin in AICPU executor

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- e2e38b9 (hw-native-sys#249): cluster-based mixed-task dispatch; add pto_submit_types.h and SUBMIT_BY_CLUSTER.md
- a842263 (hw-native-sys#255): separate local ready queue by CoreType in pto_scheduler.h
- cf6462c (hw-native-sys#268): consolidate per-task state into PTO2TaskSlotState (pto_runtime2_types.h, pto_scheduler.cpp, pto_orchestrator.cpp)
- b903e7b: multi-ring buffer architecture (pto_shared_memory, MULTI_RING.md, aicpu_executor.cpp, perf_profiling.h)
- 5d92137 (hw-native-sys#264): DepListPool ring buffer reclamation (pto_ring_buffer.h/cpp)
- 54d082c (hw-native-sys#281): replace task_id with slot-state pointer across scheduler, orchestrator, ring buffer, executor, RUNTIME_LOGIC.md
- d305376 (hw-native-sys#277): add scope deadlock detection in pto_orchestrator
- 1e41a3a (hw-native-sys#274): per-thread orchestrator phase profiling
- f5da078 (hw-native-sys#275): progress-aware ring buffer spin detection (pto_ring_buffer.h, pto_orchestrator.cpp, runtime_maker.cpp)
- 10f6415 (hw-native-sys#284): tighten PTO2_PROFILING macro guards; sync profiling_levels.md
- 9c158e0 (hw-native-sys#291): emergency shutdown on fatal error (aicpu_executor, pto_orchestration_api.h, pto_orchestrator, pto_shared_memory)
- 94f39ff (hw-native-sys#301): refactor PTOParam to aggregated container with parallel arrays (pto_types.h, pto_runtime2_types.h, pto_scheduler, pto_shared_memory, pto_tensormap, pto_orchestrator, runtime2)
- 15e6034 (hw-native-sys#308): refactor Tensor fields and pto_tensormap for cache locality
- 77a81aa (hw-native-sys#306): replace PTOParam assert with orchestration error handling
- e4348eb (hw-native-sys#315): move reclamation state into owning data structures
- c17770e (hw-native-sys#320): encapsulate dep pool operations and reorder orchestrator pipeline
- 83473ba (hw-native-sys#323): replace block core assignment with round-robin in AICPU executor

Tests (tests/device_tests/a5/):
- 439ccd4 (hw-native-sys#322): unify paged attention golden cases across test variants

Examples & tests (examples/a5/, tests/device_tests/a5/):
- 8cf8981 (hw-native-sys#293): replace PipeSyncFunc with FULL_MEMORY_BARRIER in kernels
- b88eed3 (hw-native-sys#302): optimize paged attention pipeline, eliminate GM round-trips
- 94f39ff (hw-native-sys#301) + 15e6034 (hw-native-sys#308): update orchestration to new PTOParam API
Summary
- Replace assert() calls in PTOParam with a deferred error-flag mechanism (has_error/error_msg) that works in both debug and release builds
- Errors go through the orchestration error path: LOG_ERROR → orch_error_code → fatal → emergency_shutdown
- assert() was stripped in release builds (NDEBUG), allowing invalid parameters to silently pass through

Key Changes
- pto_types.h: Add has_error/error_msg fields and set_error() helper to PTOParam; convert all assert() in add_input/add_output/add_inout/add_scalar/add_scalars/add_scalars_i32/copy_scalars_from to set_error() with descriptive messages
- pto_orchestrator.cpp: Add params.has_error validation at pto2_submit_mixed_task entry, before any resource allocation, following the same pattern as existing scope deadlock detection
- pto_runtime2_types.h: Add PTO2_ERROR_INVALID_PARAM (5) error code in orchestrator error range (1-99)
- scheduler-orchestration-error-handling.md: Update error code table and error setting locations with the new error type

Error Scenarios Covered

| Scenario | Error message |
| --- | --- |
| params.add_scalar(v); params.add_input(t); | add_input/add_output/add_inout called after add_scalar |
| add_input/add_output/add_inout calls beyond MAX_TENSORS | Too many tensor params (exceeds MAX_TENSORS=32) |
| add_input(t) where t.buffer.addr == 0 | INPUT tensor must have a non-NULL buffer address |
| add_inout(t) where t.buffer.addr == 0 | INOUT tensor must have a non-NULL buffer address |
| Scalars added beyond MAX_SCALARS | Too many scalar params (exceeds MAX_SCALARS=128) |
| copy_scalars_from with invalid range | Source scalar range out of bounds |

Expected output includes: