feat(runtime): Orchestration PTOParam error handling #306

Merged: poursoul merged 1 commit into hw-native-sys:main from zhusy54:orch-err on Mar 17, 2026

@zhusy54 zhusy54 commented Mar 17, 2026

Contributor

Summary

  • Replace all assert() calls in PTOParam with a deferred error-flag mechanism (has_error / error_msg) that works in both debug and release builds
  • Integrate PTOParam validation into the orchestration error path: LOG_ERROR → orch_error_code → fatal → emergency_shutdown
  • Previously, assert() was stripped in release builds (NDEBUG), allowing invalid parameters to silently pass through
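
To illustrate why the assert()-based checks were unreliable, here is a minimal sketch of how NDEBUG removes an assertion. The function names and `kMaxTensors` are hypothetical and chosen only for this example; they are not taken from the PR's code.

```cpp
constexpr int kMaxTensors = 32;  // mirrors MAX_TENSORS=32 from this PR

#define NDEBUG       // simulate a release build for the next include
#include <cassert>   // assert(x) now expands to ((void)0)

// Release-like path: the assert is compiled out, so every count is accepted.
bool accept_release_assert(int tensor_count) {
    assert(tensor_count < kMaxTensors && "Too many tensor params");
    (void)tensor_count;  // otherwise unused once the assert vanishes
    return true;
}

#undef NDEBUG
#include <cassert>   // re-including <cassert> re-evaluates NDEBUG and restores assert

// Explicit check: behaves identically in debug and release builds.
bool accept_checked(int tensor_count) {
    return tensor_count < kMaxTensors;
}
```

With these definitions, `accept_release_assert(40)` returns true even though 40 exceeds the limit, while `accept_checked(40)` returns false in every build type.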

Key Changes

  • pto_types.h: Add has_error/error_msg fields and set_error() helper to PTOParam; convert all assert() in add_input/add_output/add_inout/add_scalar/add_scalars/add_scalars_i32/copy_scalars_from to set_error() with descriptive messages
  • pto_orchestrator.cpp: Add params.has_error validation at pto2_submit_mixed_task entry, before any resource allocation, following the same pattern as existing scope deadlock detection
  • pto_runtime2_types.h: Add PTO2_ERROR_INVALID_PARAM (5) error code in orchestrator error range (1-99)
  • scheduler-orchestration-error-handling.md: Update error code table and error setting locations with the new error type

Error Scenarios Covered

| Scenario | Trigger | Error Message |
| --- | --- | --- |
| Scalar before tensor | params.add_scalar(v); params.add_input(t); | add_input/add_output/add_inout called after add_scalar |
| Too many tensors | More than 32 add_input/add_output/add_inout calls | Too many tensor params (exceeds MAX_TENSORS=32) |
| NULL input address | add_input(t) where t.buffer.addr == 0 | INPUT tensor must have a non-NULL buffer address |
| NULL inout address | add_inout(t) where t.buffer.addr == 0 | INOUT tensor must have a non-NULL buffer address |
| Too many scalars | More than 128 scalar values | Too many scalar params (exceeds MAX_SCALARS=128) |
| Scalar copy out of bounds | copy_scalars_from with invalid range | Source scalar range out of bounds |

Expected output includes:

[ERROR] FATAL: Invalid PTOParam Detected!
[ERROR] Error: add_input/add_output/add_inout called after add_scalar: all tensors must be added before any scalars
[ERROR]   tensor_count: 0, scalar_count: 1
[ERROR] This is a bug in the orchestration code.
...
[ERROR] Thread 0: Fatal error (code=5), sending EXIT_SIGNAL to all cores
[WARN]  Emergency shutdown: sending exit signal to all initialized cores
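
The log above corresponds to a check at the task submission entry point. A rough sketch of that flow is shown below, under several assumptions: `ParamView`, `OrchState`, and `submit_task_sketch` are hypothetical names, `std::fprintf` stands in for LOG_ERROR, and a counter stands in for resource allocation. The real `pto2_submit_mixed_task` signature and shared-memory layout are not shown on this page.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

constexpr int32_t PTO2_ERROR_INVALID_PARAM = 5;  // value given in the PR description

// Hypothetical minimal view of the fields the check needs.
struct ParamView {
    bool has_error = false;
    const char* error_msg = nullptr;
    int tensor_count = 0;
    int scalar_count = 0;
};

// Hypothetical stand-in for the orchestrator state.
struct OrchState {
    std::atomic<int32_t> orch_error_code{0};
    bool fatal = false;
    int tasks_allocated = 0;  // stand-in for real resource allocation
};

// Returns true if the task was accepted.
bool submit_task_sketch(OrchState& orch, const ParamView& params) {
    if (orch.fatal) {
        return false;  // an earlier fatal error blocks further submissions
    }
    if (params.has_error) {
        // Validate before any resource allocation.
        std::fprintf(stderr, "FATAL: Invalid PTOParam Detected!\n");
        std::fprintf(stderr, "Error: %s\n",
                     params.error_msg ? params.error_msg : "(no message)");
        std::fprintf(stderr, "  tensor_count: %d, scalar_count: %d\n",
                     params.tensor_count, params.scalar_count);
        orch.orch_error_code.store(PTO2_ERROR_INVALID_PARAM);
        orch.fatal = true;  // a scheduler thread would then trigger the shutdown
        return false;
    }
    ++orch.tasks_allocated;
    return true;
}
```

Checking the flag first means a malformed parameter set never consumes a task slot, and the fatal flag keeps later submissions from proceeding.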

@gemini-code-assist

Summary of Changes

This pull request improves the robustness of parameter validation within the orchestration runtime. By replacing assert() checks, which are compiled out of release builds, with a runtime error-flagging system for PTOParam objects, it ensures that malformed parameters are reliably detected and handled, preventing silent failures in production environments. The changes establish a clear error propagation path, allowing the system to report invalid task parameters and shut down in a controlled way.

Highlights

  • Enhanced PTOParam Validation: Replaced assert() calls in PTOParam methods with a deferred error-flagging mechanism (has_error and error_msg) to ensure parameter validation occurs consistently in both debug and release builds.
  • Integrated Error Handling: Integrated the new PTOParam validation into the orchestration error path, causing LOG_ERROR, orch_error_code setting, and fatal flag activation upon detection of invalid parameters.
  • New Error Code: Introduced PTO2_ERROR_INVALID_PARAM (5) to specifically identify PTOParam construction errors within the orchestrator's error range.
  • Documentation Update: Updated the scheduler-orchestration-error-handling.md document to reflect the new error code and its handling.

Changelog
  • docs/dev/scheduler-orchestration-error-handling.md
    • Added a new documentation file detailing error handling for Scheduler and Orchestration, including the new PTO2_ERROR_INVALID_PARAM.
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
    • Implemented PTOParam validation at the task submission entry point, logging errors and setting a fatal flag if parameters are invalid.
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
    • Defined a new error code, PTO2_ERROR_INVALID_PARAM, to signify issues with PTOParam construction.
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
    • Refactored PTOParam to replace assert() calls with a deferred error-flagging mechanism (has_error, error_msg, set_error), ensuring runtime validation.
Activity
  • The pull request was created by zhusy54, outlining the summary and key changes in the description.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request refactors PTOParam validation by replacing assert statements with explicit error handling. Methods like add_input and add_scalar now set a has_error flag and error_msg within the PTOParam struct if invalid parameters (e.g., null buffer addresses, exceeding capacity) are detected. The pto2_submit_mixed_task function in the Orchestrator checks this flag, logs a fatal error, sets a new PTO2_ERROR_INVALID_PARAM in shared memory, and prevents further task submissions. The assert.h include was removed. A new documentation file scheduler-orchestration-error-handling.md was added to detail the error handling mechanisms for Scheduler and Orchestrator. A review comment suggests refactoring duplicated tensor capacity checks in PTOParam methods into a shared helper function to improve maintainability.

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
@zhusy54 zhusy54 force-pushed the orch-err branch 3 times, most recently from 18b76b8 to ef517ed on March 17, 2026 11:26
PTOParam validation previously used assert() which is stripped in release
builds. Replace all assert() calls with an error-flag mechanism that
integrates with the orchestration error path (LOG_ERROR + orch_error_code
+ fatal + emergency_shutdown), ensuring validation works in all builds.

- Add has_error/error_msg fields and set_error() helper to PTOParam
- Merge ordering and capacity checks into check_add_tensor_valid()
- Convert all assert() in add_input/add_output/add_inout/add_scalar to
  set_error() with descriptive messages
- Add PTO2_ERROR_INVALID_PARAM (5) error code
- Validate params.has_error at pto2_submit_mixed_task entry
- Update error handling documentation
if (!check_add_tensor_valid()) {
    return;
}
assert(tensor_count < MAX_TENSORS && "Too many tensor params");

Collaborator

The tensor count constraint has been lost.

@poursoul poursoul merged commit 77a81aa into hw-native-sys:main Mar 17, 2026
5 checks passed
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Mar 18, 2026
…HEAD)

Synchronize A5 tensormap_and_ringbuffer runtime and platform with
a2a3 improvements introduced after 56a2c61. Follows the sync pattern
established in hw-native-sys#250 and hw-native-sys#300.

Platform (src/a5/platform/):
- 2f58a2f (hw-native-sys#267): add AICPU thread affinity (platform_aicpu_affinity.h/cpp),
  PLATFORM_MAX_AICPU_THREADS_JUST_FOR_LAUNCH, device_runner, kernel.cpp,
  CMakeLists.txt
- b903e7b: sync perf_profiling.h for multi-ring support
- 334d355 (hw-native-sys#254): sync performance_collector_aicore.h for slim dispatch

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 334d355 (hw-native-sys#254): slim dispatch payload in aicore_executor.cpp
- dd7ada4: standardize register init and exit handshake in aicore_executor.cpp
- 2f58a2f (hw-native-sys#267): AICPU affinity gate in aicpu_executor.cpp

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- e2e38b9 (hw-native-sys#249): cluster-based mixed-task dispatch; add pto_submit_types.h
  and SUBMIT_BY_CLUSTER.md
- a842263 (hw-native-sys#255): separate local ready queue by CoreType in pto_scheduler.h
- cf6462c (hw-native-sys#268): consolidate per-task state into PTO2TaskSlotState
  (pto_runtime2_types.h, pto_scheduler.cpp, pto_orchestrator.cpp)
- b903e7b: multi-ring buffer architecture (pto_shared_memory, MULTI_RING.md,
  aicpu_executor.cpp, perf_profiling.h)
- 5d92137 (hw-native-sys#264): DepListPool ring buffer reclamation (pto_ring_buffer.h/cpp)
- 54d082c (hw-native-sys#281): replace task_id with slot-state pointer across scheduler,
  orchestrator, ring buffer, executor, RUNTIME_LOGIC.md
- d305376 (hw-native-sys#277): add scope deadlock detection in pto_orchestrator
- 1e41a3a (hw-native-sys#274): per-thread orchestrator phase profiling
- f5da078 (hw-native-sys#275): progress-aware ring buffer spin detection
  (pto_ring_buffer.h, pto_orchestrator.cpp, runtime_maker.cpp)
- 10f6415 (hw-native-sys#284): tighten PTO2_PROFILING macro guards; sync profiling_levels.md
- 9c158e0 (hw-native-sys#291): emergency shutdown on fatal error
  (aicpu_executor, pto_orchestration_api.h, pto_orchestrator, pto_shared_memory)
- 94f39ff (hw-native-sys#301): refactor PTOParam to aggregated container with parallel arrays
  (pto_types.h, pto_runtime2_types.h, pto_scheduler, pto_shared_memory,
  pto_tensormap, pto_orchestrator, runtime2)
- 15e6034 (hw-native-sys#308): refactor Tensor fields and pto_tensormap for cache locality
- 77a81aa (hw-native-sys#306): replace PTOParam assert with orchestration error handling

Examples & tests (examples/a5/, tests/device_tests/a5/):
- 8cf8981 (hw-native-sys#293): replace PipeSyncFunc with FULL_MEMORY_BARRIER in kernels
- b88eed3 (hw-native-sys#302): optimize paged attention pipeline, eliminate GM round-trips
- 94f39ff (hw-native-sys#301) + 15e6034 (hw-native-sys#308): update orchestration to new PTOParam API
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Mar 19, 2026
…HEAD)

ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request Mar 19, 2026
Synchronize A5 tensormap_and_ringbuffer runtime and platform with
a2a3 improvements introduced after 56a2c61. Follows the sync pattern
established in hw-native-sys#250 and hw-native-sys#300.

Platform (src/a5/platform/):
- 2f58a2f (hw-native-sys#267): add AICPU thread affinity (platform_aicpu_affinity.h/cpp),
  PLATFORM_MAX_AICPU_THREADS_JUST_FOR_LAUNCH, device_runner, kernel.cpp,
  CMakeLists.txt
- b903e7b: sync perf_profiling.h for multi-ring support
- 334d355 (hw-native-sys#254): sync performance_collector_aicore.h for slim dispatch

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 334d355 (hw-native-sys#254): slim dispatch payload in aicore_executor.cpp
- dd7ada4: standardize register init and exit handshake in aicore_executor.cpp
- 2f58a2f (hw-native-sys#267): AICPU affinity gate in aicpu_executor.cpp
- 83473ba (hw-native-sys#323): replace block core assignment with round-robin in AICPU executor

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- e2e38b9 (hw-native-sys#249): cluster-based mixed-task dispatch; add pto_submit_types.h
  and SUBMIT_BY_CLUSTER.md
- a842263 (hw-native-sys#255): separate local ready queue by CoreType in pto_scheduler.h
- cf6462c (hw-native-sys#268): consolidate per-task state into PTO2TaskSlotState
  (pto_runtime2_types.h, pto_scheduler.cpp, pto_orchestrator.cpp)
- b903e7b: multi-ring buffer architecture (pto_shared_memory, MULTI_RING.md,
  aicpu_executor.cpp, perf_profiling.h)
- 5d92137 (hw-native-sys#264): DepListPool ring buffer reclamation (pto_ring_buffer.h/cpp)
- 54d082c (hw-native-sys#281): replace task_id with slot-state pointer across scheduler,
  orchestrator, ring buffer, executor, RUNTIME_LOGIC.md
- d305376 (hw-native-sys#277): add scope deadlock detection in pto_orchestrator
- 1e41a3a (hw-native-sys#274): per-thread orchestrator phase profiling
- f5da078 (hw-native-sys#275): progress-aware ring buffer spin detection
  (pto_ring_buffer.h, pto_orchestrator.cpp, runtime_maker.cpp)
- 10f6415 (hw-native-sys#284): tighten PTO2_PROFILING macro guards; sync profiling_levels.md
- 9c158e0 (hw-native-sys#291): emergency shutdown on fatal error
  (aicpu_executor, pto_orchestration_api.h, pto_orchestrator, pto_shared_memory)
- 94f39ff (hw-native-sys#301): refactor PTOParam to aggregated container with parallel arrays
  (pto_types.h, pto_runtime2_types.h, pto_scheduler, pto_shared_memory,
  pto_tensormap, pto_orchestrator, runtime2)
- 15e6034 (hw-native-sys#308): refactor Tensor fields and pto_tensormap for cache locality
- 77a81aa (hw-native-sys#306): replace PTOParam assert with orchestration error handling
- e4348eb (hw-native-sys#315): move reclamation state into owning data structures
- c17770e (hw-native-sys#320): encapsulate dep pool operations and reorder orchestrator pipeline
- 83473ba (hw-native-sys#323): replace block core assignment with round-robin in AICPU executor

Tests (tests/device_tests/a5/):
- 439ccd4 (hw-native-sys#322): unify paged attention golden cases across test variants

Examples & tests (examples/a5/, tests/device_tests/a5/):
- 8cf8981 (hw-native-sys#293): replace PipeSyncFunc with FULL_MEMORY_BARRIER in kernels
- b88eed3 (hw-native-sys#302): optimize paged attention pipeline, eliminate GM round-trips
- 94f39ff (hw-native-sys#301) + 15e6034 (hw-native-sys#308): update orchestration to new PTOParam API
ChaoZheng109 added a commit that referenced this pull request Mar 19, 2026
PKUZHOU pushed a commit to PKUZHOU/simpler that referenced this pull request Mar 31, 2026
…ing (hw-native-sys#306)

Co-authored-by: zhusy54 <zhusiyu1@hisilicon.com>
PKUZHOU pushed a commit to PKUZHOU/simpler that referenced this pull request Mar 31, 2026
…e-sys#314)
