docs(skills): device-log triage, DFX-tools skill, rebuild footgun#1163

Merged: ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:docs/onboard-dfx-skills on Jun 26, 2026

Conversation

@ChaoWao ChaoWao commented Jun 26, 2026

Summary

Doc/skill-only changes capturing onboard-debugging lessons (no code paths touched):

  • .claude/rules/running-onboard.md — redirect the AICPU/CCECPU device log out
    of the shared ~/ascend/log/debug/ via ASCEND_PROCESS_LOG_PATH (export ==
    --env); add a 507018 triage table so the generic host error code is classified
    from the device log: deadlock-detect (Task Allocator Deadlock) vs SPIN-timeout
    (Timeout (N cycles)) vs OS op-timeout (HandleTaskTimeout, 3s) vs
    forward-progress stall (log_stall_diagnostics). "3s kill ≠ deadlock".
  • .claude/skills/dfx-analyze/SKILL.md (new) — reach for simpler's built-in DFX
    tools (device_log_timing, swimlane_converter, sched_overhead_analysis,
    deps_viewer, dump_viewer) instead of hand-rolling timing/instrumentation in the
    runtime; points at simpler_setup/tools/README.md.
  • .claude/rules/codestyle.md — rule #7: never log on AICPU hot paths (floods
    device_log → trips the op-timeout, masking the behavior under study).
  • .claude/skills/multi-repo-setup/SKILL.md — footgun: a non-editable
    pip install . / --force-reinstall can silently skip the runtime .so rebuild;
    verify the built .so changed or cmake --build the cache + sync both load
    locations.
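
As a rough illustration of the triage idea in the first bullet, the sketch below reports which of the four device-log signatures named above appear in a per-run log directory. Only the signature strings and ASCEND_PROCESS_LOG_PATH come from this PR. The helper name classify_507018, the example directory, and the pattern used for "Timeout (N cycles)" are assumptions made for illustration, and the real triage table may apply more context than a plain text match.

```shell
#!/usr/bin/env bash
# Sketch only: list every 507018 class whose signature appears in the device log.
# classify_507018 is a hypothetical helper, not part of simpler.
classify_507018() {
    local log_dir="$1" matched=0
    if grep -rqs "Task Allocator Deadlock" "$log_dir"; then
        echo "deadlock-detect"; matched=1
    fi
    # Assumption: N is whatever sits between the parentheses before " cycles".
    if grep -rqsE 'Timeout \([^)]* cycles\)' "$log_dir"; then
        echo "spin-timeout"; matched=1
    fi
    if grep -rqs "HandleTaskTimeout" "$log_dir"; then
        echo "os-op-timeout"; matched=1
    fi
    if grep -rqs "log_stall_diagnostics" "$log_dir"; then
        echo "forward-progress-stall"; matched=1
    fi
    if [ "$matched" -eq 0 ]; then
        echo "unclassified"
    fi
}

# Per-run log directory (example path) so signatures from other runs cannot leak in.
export ASCEND_PROCESS_LOG_PATH="${TMPDIR:-/tmp}/device_log_$$"
mkdir -p "$ASCEND_PROCESS_LOG_PATH"
# ... run the onboard case here, then classify what it left behind:
classify_507018 "$ASCEND_PROCESS_LOG_PATH"
```

Printing every match instead of picking one winner keeps the sketch from implying a priority order that the PR does not state.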

Testing

  • Documentation/skill files only — no build, sim, or hardware impact.

coderabbitai Bot commented Jun 26, 2026


Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c34776f1-70e5-43c0-95f4-c65a6942ccab

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The PR updates internal guidance for AICPU logging, per-run device-log isolation, 507018 triage, DFX analysis tooling, and rebuild verification in multi-repo setups.

Changes

Runtime diagnostics and rebuild guidance

| Layer | File(s) | Summary |
| --- | --- | --- |
| AICPU logging rule | .claude/rules/codestyle.md | Adds rule #7 stating that AICPU hot paths must avoid unconditional logging and must gate diagnostics. |
| Device logs and 507018 triage | .claude/rules/running-onboard.md | Adds ASCEND_PROCESS_LOG_PATH guidance, a 507018 signature-based triage table, a new anti-pattern, and a quick reference for isolated device logs. |
| DFX analysis guide | .claude/skills/dfx-analyze/SKILL.md | Adds the dfx-analyze skill page with tool-selection guidance, prerequisite flags, and device-log/DFX artifact location notes. |
| Multi-repo rebuild verification | .claude/skills/multi-repo-setup/SKILL.md | Adds a rebuild-and-copy workflow for re-editing Simple C++ and verifying the runtime .so used by multi-repo installs. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • hw-native-sys/simpler#987 — Rebuild/verification workflow for re-editing Simple C++ overlaps with the multi-repo setup guidance added here.
  • hw-native-sys/simpler#990 — Also updates 507018 triage and task-submit isolation guidance, matching the new device-log-based troubleshooting flow.
  • hw-native-sys/simpler#1055 — Revises the same multi-repo setup and binary-skew handling, including rebuild and load-location details.

Poem

🐰 I hop where the device logs glow,
and keep the hot-path chatter low.
One run, one path, one tidy trace,
with DFX tools in their proper place.
Fresh .so now hops along just so.

🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise and accurately summarizes the documentation-only changes across device-log triage, DFX tools, and rebuild guidance. |
| Description check | ✅ Passed | The description is clearly related to the documented onboarding and debugging changes in the pull request. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the Claude rules and skills documentation. It adds guidelines against logging on AICPU hot paths, explains how to redirect device logs and triage 507018 errors, introduces a new skill for analyzing DFX data using built-in tools, and documents how to handle runtime compilation issues during multi-repo setups. The review feedback suggests avoiding the use of angle brackets for placeholders in Markdown bash code blocks to prevent potential shell syntax errors or input redirection parsing, recommending safe, quoted, or standard shell variable placeholders instead.
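
To illustrate the placeholder concern, here is a minimal sketch. In a shell, an unquoted `<name>` is parsed as an input redirection followed by an output redirection, so a command copied verbatim either fails or touches stray files. The variable name LOG_DIR and the helper list_logs are hypothetical and not taken from the reviewed files.

```shell
#!/usr/bin/env bash
# Pasting `ls <log_dir>` verbatim makes the shell try to read from a file named
# "log_dir" and then report a syntax error at the dangling ">".
# A variable placeholder that fails loudly when unset avoids that class of mistake.
list_logs() {
    # Abort with a message if the reader forgot to fill in the placeholder.
    : "${LOG_DIR:?set LOG_DIR to the per-run device-log directory}"
    ls -- "$LOG_DIR"
}
```

With this form a reader who forgets to substitute a value gets an explicit error instead of an accidental redirection.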

Comment thread .claude/rules/running-onboard.md
Comment thread .claude/rules/running-onboard.md
Comment thread .claude/skills/dfx-analyze/SKILL.md
Comment thread .claude/skills/multi-repo-setup/SKILL.md

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
.claude/skills/multi-repo-setup/SKILL.md (2)

144-144: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Quote the find output to handle paths with spaces.

The for d in $(find ...) construct word-splits its output on whitespace, so a path containing a space is broken into several loop items. Typical venv paths don't contain spaces, but iterating with while IFS= read -r -d '' d or mapfile is more robust:

find .venv build/lib -path "*onboard*tensormap_and_ringbuffer*$(basename "$SO")" -print0 | \
  while IFS= read -r -d '' d; do cp "$SO" "$d"; done
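
The difference can be seen on a throwaway directory tree. The sketch below counts loop iterations for both forms; the names demo_root and libdemo.so are made up for the demonstration, and it assumes bash (for `read -d ''` and process substitution) and a temporary directory whose own path has no spaces.

```shell
#!/usr/bin/env bash
# Build a scratch tree whose only match lives under a directory with a space.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/my venv"
touch "$demo_root/my venv/libdemo.so"

# Unsafe form: the command substitution is word-split, so one path becomes two items.
unsafe_count=0
for d in $(find "$demo_root" -name 'libdemo.so'); do
    unsafe_count=$((unsafe_count + 1))
done

# Safe form: NUL-delimited names, one iteration per real path.
# Process substitution keeps the counter in the current shell; a pipe would not.
safe_count=0
while IFS= read -r -d '' d; do
    safe_count=$((safe_count + 1))
done < <(find "$demo_root" -name 'libdemo.so' -print0)

echo "unsafe=$unsafe_count safe=$safe_count"
rm -rf "$demo_root"
```

In the copy loop from the review comment a pipe is fine, because cp runs inside the loop and no variable has to survive it.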
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/multi-repo-setup/SKILL.md at line 144, The copy loop in the
multi-repo setup script uses command substitution with find output, which can
word-split paths that contain spaces. Update the shell logic around the find/cp
loop to use a safe iterator such as a null-delimited find pipeline with while
IFS= read -r -d '' or mapfile, and keep the cp step inside that loop so each
path from the find results is handled intact.

141-146: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Clarify the {arch} placeholder in the build directory path.

The BD variable uses {arch} which isn't defined in this snippet. Readers need to know this is a placeholder (e.g., aarch64 or x86_64) to construct a valid path. Consider adding a brief note or example.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/multi-repo-setup/SKILL.md around lines 141 - 146, Clarify the
undefined {arch} placeholder in the BD build path example by adding a brief note
or inline example in the same section, using the surrounding build steps (BD,
SO, and the cmake/cp commands) to show that {arch} should be replaced with a
real target architecture such as aarch64 or x86_64. Keep the instruction close
to the existing shell snippet so readers can construct the correct build
directory path without guessing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dfa20061-91b4-4993-b9d7-1070fd47a1cb

📥 Commits

Reviewing files that changed from the base of the PR and between abc62d8 and 02b2854.

📒 Files selected for processing (4)
  • .claude/rules/codestyle.md
  • .claude/rules/running-onboard.md
  • .claude/skills/dfx-analyze/SKILL.md
  • .claude/skills/multi-repo-setup/SKILL.md

…tgun

- running-onboard: redirect device logs via ASCEND_PROCESS_LOG_PATH (export == --env);
  add a 507018 triage table (deadlock-detect vs SPIN-timeout vs OS op-timeout vs
  forward-progress stall) so a generic host 507018 is classified from the device log.
- new dfx-analyze skill: reach for simpler's built-in DFX tools (device_log_timing,
  swimlane_converter, sched_overhead_analysis, deps_viewer, dump_viewer) instead of
  hand-rolling timing/instrumentation in the runtime.
- codestyle rule #7: never log on AICPU hot paths (floods device_log -> op-timeout,
  masking the behavior under study).
- multi-repo-setup: non-editable reinstall can silently skip the runtime .so rebuild;
  verify the built .so changed or cmake --build the cache + sync to both load locations.
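
One way to check that the built .so actually changed is to compare a checksum taken before and after the reinstall. This is a minimal sketch under stated assumptions: checksum_of and so_changed are hypothetical helpers, SO stands for whatever runtime .so path is in use, and the skill file may verify the rebuild differently.

```shell
#!/usr/bin/env bash
# Print a checksum:size pair for a file using POSIX cksum.
checksum_of() {
    cksum < "$1" | awk '{print $1 ":" $2}'
}

# usage: so_changed OLD_CHECKSUM PATH
# Succeeds only if the file's current checksum differs from the saved one.
so_changed() {
    local before="$1" so_path="$2"
    [ "$(checksum_of "$so_path")" != "$before" ]
}

# Typical flow (SO is a placeholder for the runtime .so path):
#   before="$(checksum_of "$SO")"
#   ... reinstall, or cmake --build on the cached build directory ...
#   so_changed "$before" "$SO" || echo "runtime .so was NOT rebuilt" >&2
```

Comparing content rather than modification time avoids a false positive when the file is merely re-copied.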
@ChaoWao ChaoWao force-pushed the docs/onboard-dfx-skills branch from 02b2854 to e393623 Compare June 26, 2026 01:56
@ChaoWao ChaoWao merged commit 096fa4d into hw-native-sys:main Jun 26, 2026
14 checks passed
@ChaoWao ChaoWao deleted the docs/onboard-dfx-skills branch June 26, 2026 02:11