Refactor: migrate A5 examples and tests to SceneTestCase format #577

Merged: ChaoWao merged 4 commits into hw-native-sys:main from doraemonmj:pytest on Apr 17, 2026
@doraemonmj
Contributor

  • Replace golden.py + kernel_config.py with unified test_*.py files
    using @scene_test decorator and SceneTestCase base class
  • Covers examples/a5/{host_build_graph,tensormap_and_ringbuffer} (14 examples)
    and tests/st/a5/{host_build_graph,tensormap_and_ringbuffer} (3 tests)
  • Add a5sim to platforms for all cases that support simulation
  • Cross-directory kernel references use relative paths (../)
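
As a rough illustration of the first bullet, the sketch below shows how a decorator plus a base class can fold what used to live in golden.py and kernel_config.py into one test file. Everything here is a hypothetical stand-in written for this illustration: the real `scene_test` and `SceneTestCase` in the repository may take different arguments and expose different fields, and the `kernels` field, the `a5` platform name, and the kernel path are assumptions.

```python
# Hypothetical stand-ins for illustration only; the repository's real
# scene_test / SceneTestCase may have different signatures and fields.
REGISTRY = {}


def scene_test(platforms):
    """Register a scene class and record which platforms it supports."""
    def decorate(cls):
        cls.platforms = tuple(platforms)
        REGISTRY[cls.__name__] = cls
        return cls
    return decorate


class SceneTestCase:
    """Base class: subclasses supply kernel config and a golden function."""
    kernels = ()  # assumed field; plays the role of the old kernel_config.py

    def golden(self, inputs):
        # Plays the role of the old golden.py reference computation.
        raise NotImplementedError

    def check(self, inputs, actual):
        # Exact comparison keeps the sketch short; real tests use tolerances.
        return list(actual) == list(self.golden(inputs))


@scene_test(platforms=["a5sim", "a5"])  # "a5" is an assumed platform name
class VectorAddScene(SceneTestCase):
    # Cross-directory kernel reference via a relative path (made-up path).
    kernels = ("../kernels/vector_add.cpp",)

    def golden(self, inputs):
        a, b = inputs
        return [x + y for x, y in zip(a, b)]


case = REGISTRY["VectorAddScene"]()
ok = case.check(([1.0, 2.0], [3.0, 4.0]), [4.0, 6.0])
```

Keeping the platform list on the decorator means a runner could filter the registry by platform (for example, only classes listing `a5sim`) without importing any per-example config module.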

Full case table

| # | Runtime | Test name | Case | Location | sim / onboard | dtype | Precision (R/A) | block_dim | thread_num | Migration status |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | host | dump_tensor | default | tests/st/a5/host_build_graph/dump_tensor/ | Y / Y | fp32 | N/A | 3 | 3 | No changes needed |
| 2 | host | paged_attention (st) | Case1 | tests/st/a5/host_build_graph/paged_attention/ | Y | bf16 | 1e-3/1e-3 | 24 | 3 | Modified; no migration needed |
| 3 | host | paged_attention (st) | Case2 | tests/st/a5/host_build_graph/paged_attention/ | Y | bf16 | 1e-3/1e-3 | 24 | 3 | Modified; no migration needed |
| 4 | host | paged_attention (example) | Case1 | examples/a5/host_build_graph/paged_attention/ | Y | fp16 | 1e-2/1e-2 | 3 | 3 | Merged |
| 5 | host | paged_attention (example) | Case2 | examples/a5/host_build_graph/paged_attention/ | Y | fp16 | 1e-2/1e-2 | 3 | 3 | Merged |
| 6 | tmrb | explicit_fatal (st) | default | tests/st/a5/tensormap_and_ringbuffer/explicit_fatal/ | Y | N/A | N/A | 24 | 4 | No changes needed |
| 7 | tmrb | paged_attention (st) | Case1 | tests/st/a5/tensormap_and_ringbuffer/paged_attention/ | Y | bf16 | 1e-3/1e-3 | 24 | 4 | Modified and migrated |
| 8 | tmrb | paged_attention (st) | Case2 | tests/st/a5/tensormap_and_ringbuffer/paged_attention/ | Y | bf16 | 1e-3/1e-3 | 24 | 4 | Modified and migrated |
| 9 | tmrb | paged_attention (st) | Case3 | tests/st/a5/tensormap_and_ringbuffer/paged_attention/ | Y | bf16 | 1e-3/1e-3 | 24 | 4 | Modified and migrated |
| 10 | tmrb | paged_attention_unroll (st) | Case1 | tests/st/a5/tensormap_and_ringbuffer/paged_attention_unroll/ | Y | bf16 | 1e-3/1e-3 | 36 | 4 | Modified; no migration needed |
| 11 | tmrb | paged_attention_unroll (st) | Case2 | tests/st/a5/tensormap_and_ringbuffer/paged_attention_unroll/ | Y | bf16 | 1e-3/1e-3 | 36 | 4 | Modified; no migration needed |
| 12 | tmrb | paged_attention_unroll (st) | Case3 | tests/st/a5/tensormap_and_ringbuffer/paged_attention_unroll/ | Y | bf16 | 1e-3/1e-3 | 36 | 4 | Modified; no migration needed |
| 13 | tmrb | bgemm (example) | default | examples/a5/tensormap_and_ringbuffer/bgemm/ | Y / Y | fp32 | 1e-3/1e-3 | 3 | 4 | Modified; no migration needed |
| 14 | tmrb | mixed_example (example) | case1 | examples/a5/tensormap_and_ringbuffer/mixed_example/ | Y / Y | fp32 | 1e-3/1e-3 | 3 | 4 | Merged; changed to bf16 and 1e-3 |
| 15 | tmrb | mixed_example (example) | case2 | examples/a5/tensormap_and_ringbuffer/mixed_example/ | Y / Y | fp32 | 1e-3/1e-3 | 3 | 4 | Merged; changed to bf16 and 1e-3 |
| 16 | tmrb | paged_attention (example) | Case1 | examples/a5/tensormap_and_ringbuffer/paged_attention/ | Y | fp16 | 1e-2/1e-2 | 24 | 4 | Merged; changed to bf16 and 1e-3 |
| 17 | tmrb | paged_attention (example) | Case2 | examples/a5/tensormap_and_ringbuffer/paged_attention/ | Y | fp16 | 1e-2/1e-2 | 24 | 4 | Merged; changed to bf16 and 1e-3 |
| 18 | tmrb | paged_attention (example) | CaseVarSeq2 | examples/a5/tensormap_and_ringbuffer/paged_attention/ | Y | fp16 | 1e-2/1e-2 | 24 | 4 | Merged; changed to bf16 and 1e-3 |
| 19 | tmrb | paged_attention (example) | CaseVarSeq4 | examples/a5/tensormap_and_ringbuffer/paged_attention/ | Y | fp16 | 1e-2/1e-2 | 24 | 4 | Merged; changed to bf16 and 1e-3 |
| 20 | tmrb | spmd_basic (example) | default | examples/a5/tensormap_and_ringbuffer/spmd_basic/ | Y | N/A | 0/0 | 24 | 4 | Modified and migrated |
| 21 | tmrb | spmd_multiblock_aiv (example) | default | examples/a5/tensormap_and_ringbuffer/spmd_multiblock_aiv/ | Y | N/A | 0/0 | 24 | 4 | Modified and migrated |
| 22 | tmrb | spmd_multiblock_mix (example) | default | examples/a5/tensormap_and_ringbuffer/spmd_multiblock_mix/ | Y | N/A | 0/0 | 24 | 4 | Modified and migrated |
| 23 | tmrb | spmd_starvation (example) | default | examples/a5/tensormap_and_ringbuffer/spmd_starvation/ | Y | N/A | 0/0 | 24 | 4 | Modified and migrated |
| 24 | tmrb | spmd_sync_start (example) | default | examples/a5/tensormap_and_ringbuffer/spmd_sync_start/ | Y | N/A | 0/0 | 24 | 4 | Modified and migrated |
| 25 | tmrb | spmd_sync_start_aiv (example) | default | examples/a5/tensormap_and_ringbuffer/spmd_sync_start_aiv/ | Y | N/A | 0/0 | 24 | 4 | Modified and migrated |
| 26 | tmrb | spmd_sync_start_edge (example) | default | examples/a5/tensormap_and_ringbuffer/spmd_sync_start_edge/ | Y | N/A | 0/0 | 24 | 4 | Modified and migrated |
| 27 | tmrb | spmd_sync_start_stress (example) | default | examples/a5/tensormap_and_ringbuffer/spmd_sync_start_stress/ | Y | N/A | 0/0 | 24 | 4 | Modified and migrated |
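
To make the R/A column of the table concrete, here is a small sketch of a tolerance check. It assumes that R/A means relative/absolute tolerance and that the two are combined as in `numpy.allclose`, that is `|actual - expected| <= atol + rtol * |expected|`; the PR does not state which formula the test framework uses, and `within_tolerance` is a name made up for this example.

```python
def within_tolerance(actual, expected, rtol, atol):
    """Return True if every element satisfies |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        return False
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


expected = [1.0, 2.0, 4.0]
actual = [1.005, 2.0, 4.0]  # first element is off by about 5e-3

# 1e-2/1e-2: the setting listed for the fp16 example cases.
loose = within_tolerance(actual, expected, rtol=1e-2, atol=1e-2)
# 1e-3/1e-3: the setting listed for the bf16 cases.
tight = within_tolerance(actual, expected, rtol=1e-3, atol=1e-3)
# 0/0: under this formula, only bit-identical values pass.
exact = within_tolerance(expected, expected, rtol=0, atol=0)
```

Under these assumptions the same output passes at 1e-2/1e-2 and fails at 1e-3/1e-3, which is why moving a case to the tighter setting is a meaningful change and not just a relabeling.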


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces production-scale paged attention support for the A5 platform, refactoring kernels to use bfloat16 and implementing runtime dispatch for various tile configurations. It also adds a comprehensive suite of SPMD and mixed-core execution tests. Feedback highlights a critical data race in the orchestration logic due to improper scope guard usage, potential compilation failures on x86 simulation environments from ARM-specific assembly, and a regression in Grouped Query Attention (GQA) support. Additionally, improvements were suggested regarding test reproducibility through manual seeding and more accurate profiling by reading system counter frequency at runtime.
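
The reproducibility point in the review can be sketched as follows. This is a generic illustration of seeded input generation, not code from the PR: `make_inputs` is a hypothetical helper, and the real tests may generate tensors with a different library.

```python
import random


def make_inputs(seed, n):
    """Generate n pseudo-random floats from an explicitly seeded generator."""
    rng = random.Random(seed)  # local generator; leaves global state untouched
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


first = make_inputs(seed=0, n=4)
second = make_inputs(seed=0, n=4)   # same seed, same data
other = make_inputs(seed=1, n=4)    # different seed, different data
```

Using a local generator with a fixed seed means a failing case can be rerun with identical inputs regardless of what other tests did to the global random state.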

Comment thread examples/a5/tensormap_and_ringbuffer/bgemm/test_bgemm.py
@doraemonmj doraemonmj force-pushed the pytest branch 4 times, most recently from 7263b0b to 8613843 Compare April 17, 2026 03:26
@doraemonmj doraemonmj changed the title [WIP] Refactor: migrate A5 examples and tests to SceneTestCase format Refactor: migrate A5 examples and tests to SceneTestCase format Apr 17, 2026
majin0824 added 3 commits April 17, 2026 14:33
- Replace golden.py + kernel_config.py with unified test_*.py files
  using @scene_test decorator and SceneTestCase base class
- Covers examples/a5/{host_build_graph,tensormap_and_ringbuffer} (14 examples)
  and tests/st/a5/{host_build_graph,tensormap_and_ringbuffer} (3 tests)
- Add a5sim to platforms for all cases that support simulation
- Cross-directory kernel references use relative paths (../)
…d attention

- Move spmd_*, mixed_example from examples/tmr/ to tests/st/tmr/
- Remove duplicate HBG paged_attention from examples/ (already in tests/st/)
- Remove old TMR paged_attention from tests/st/ (kept in examples/ as evolving reference)
- Upgrade TMR paged_attention: fp16 -> bfloat16, multi-tile dispatch (16x128, 64x64),
  production-scale cases (batch=256, head_dim=128/256), tighter tolerances (1e-3)
- Add small-tile (16,16,16) dispatch path to HBG paged_attention kernels
  with SmallCase1/SmallCase2 sim-compatible test cases
… migration process

- During the earlier test-case migration, some kernels were left without function-name definitions.
- This commit fills in the missing names in the aic and aiv modules of test_*.py to keep the code complete and consistent.
@doraemonmj doraemonmj force-pushed the pytest branch 3 times, most recently from 68d1e36 to e56a0e3 Compare April 17, 2026 07:38
- Delete examples/a2a3/bgemm (fixed-config), move benchmark_bgemm
  from tests/st to examples/a2a3 with a Bgemm64 case covering the
  old example config (tile=64, grid_k=4, block_dim=3)
- Add platform guards for aarch64 timer asm in a5 paged_attention
  orchestration files (mrs cntvct_el0 → rdtsc on x86_64)
@ChaoWao ChaoWao merged commit 8b8ea90 into hw-native-sys:main Apr 17, 2026
15 checks passed

2 participants