docs(skills): device-log triage, DFX-tools skill, rebuild footgun#1163
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR updates internal guidance for AICPU logging, per-run device-log isolation, 507018 triage, DFX analysis tooling, and rebuild verification in multi-repo setups. ChangesRuntime diagnostics and rebuild guidance
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request updates the Claude rules and skills documentation. It adds guidelines against logging on AICPU hot paths, explains how to redirect device logs and triage 507018 errors, introduces a new skill for analyzing DFX data using built-in tools, and documents how to handle runtime compilation issues during multi-repo setups. The review feedback suggests avoiding the use of angle brackets for placeholders in Markdown bash code blocks to prevent potential shell syntax errors or input redirection parsing, recommending safe, quoted, or standard shell variable placeholders instead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
.claude/skills/multi-repo-setup/SKILL.md (2)
144-144: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueQuote the
findoutput to handle paths with spaces.The
for d in $(find ...)construct will word-split on spaces. While typical venv paths don't contain spaces, quoting withwhile IFS= read -r dormapfileis more robust:find .venv build/lib -path "*onboard*tensormap_and_ringbuffer*$(basename "$SO")" -print0 | \ while IFS= read -r -d '' d; do cp "$SO" "$d"; done🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/multi-repo-setup/SKILL.md at line 144, The copy loop in the multi-repo setup script uses command substitution with find output, which can word-split paths that contain spaces. Update the shell logic around the find/cp loop to use a safe iterator such as a null-delimited find pipeline with while IFS= read -r -d '' or mapfile, and keep the cp step inside that loop so each path from the find results is handled intact.
141-146: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueClarify the
{arch}placeholder in the build directory path.The
BDvariable uses{arch}which isn't defined in this snippet. Readers need to know this is a placeholder (e.g.,aarch64orx86_64) to construct a valid path. Consider adding a brief note or example.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/multi-repo-setup/SKILL.md around lines 141 - 146, Clarify the undefined {arch} placeholder in the BD build path example by adding a brief note or inline example in the same section, using the surrounding build steps (BD, SO, and the cmake/cp commands) to show that {arch} should be replaced with a real target architecture such as aarch64 or x86_64. Keep the instruction close to the existing shell snippet so readers can construct the correct build directory path without guessing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.claude/skills/multi-repo-setup/SKILL.md:
- Line 144: The copy loop in the multi-repo setup script uses command
substitution with find output, which can word-split paths that contain spaces.
Update the shell logic around the find/cp loop to use a safe iterator such as a
null-delimited find pipeline with while IFS= read -r -d '' or mapfile, and keep
the cp step inside that loop so each path from the find results is handled
intact.
- Around line 141-146: Clarify the undefined {arch} placeholder in the BD build
path example by adding a brief note or inline example in the same section, using
the surrounding build steps (BD, SO, and the cmake/cp commands) to show that
{arch} should be replaced with a real target architecture such as aarch64 or
x86_64. Keep the instruction close to the existing shell snippet so readers can
construct the correct build directory path without guessing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dfa20061-91b4-4993-b9d7-1070fd47a1cb
📒 Files selected for processing (4)
.claude/rules/codestyle.md.claude/rules/running-onboard.md.claude/skills/dfx-analyze/SKILL.md.claude/skills/multi-repo-setup/SKILL.md
…tgun - running-onboard: redirect device logs via ASCEND_PROCESS_LOG_PATH (export == --env); add a 507018 triage table (deadlock-detect vs SPIN-timeout vs OS op-timeout vs forward-progress stall) so a generic host 507018 is classified from the device log. - new dfx-analyze skill: reach for simpler's built-in DFX tools (device_log_timing, swimlane_converter, sched_overhead_analysis, deps_viewer, dump_viewer) instead of hand-rolling timing/instrumentation in the runtime. - codestyle rule hw-native-sys#7: never log on AICPU hot paths (floods device_log -> op-timeout, masking the behavior under study). - multi-repo-setup: non-editable reinstall can silently skip the runtime .so rebuild; verify the built .so changed or cmake --build the cache + sync to both load locations.
02b2854 to
e393623
Compare
Summary
Doc/skill-only changes capturing onboard-debugging lessons (no code paths touched):
.claude/rules/running-onboard.md— redirect the AICPU/CCECPU device log outof the shared
~/ascend/log/debug/viaASCEND_PROCESS_LOG_PATH(export≡--env); add a507018triage table so the generic host code is classifiedfrom the device log: deadlock-detect (
Task Allocator Deadlock) vs SPIN-timeout(
Timeout (N cycles)) vs OS op-timeout (HandleTaskTimeout, 3s) vsforward-progress stall (
log_stall_diagnostics). "3s kill ≠ deadlock"..claude/skills/dfx-analyze/SKILL.md(new) — reach for simpler's built-in DFXtools (
device_log_timing,swimlane_converter,sched_overhead_analysis,deps_viewer,dump_viewer) instead of hand-rolling timing/instrumentation in theruntime; points at
simpler_setup/tools/README.md..claude/rules/codestyle.md— rule Simplify AicpuExecutor API and unify naming conventions #7: never log on AICPU hot paths (floodsdevice_log→ trips the op-timeout, masking the behavior under study)..claude/skills/multi-repo-setup/SKILL.md— footgun: a non-editablepip install ./--force-reinstallcan silently skip the runtime.sorebuild;verify the built
.sochanged orcmake --buildthe cache + sync both loadlocations.
Testing