
Disable CMD_REBOOT: prevents agent death on serial boot mode #15

Merged: widgetii merged 7 commits into master from fix/disable-reboot on Mar 31, 2026

@widgetii (Member)

Summary

A watchdog reset re-enters the bootrom when the boot pin is set to serial download mode. This kills the agent, and the only way to recover is a physical power cycle.

  • Agent: CMD_REBOOT now returns ACK_FLASH_ERROR instead of triggering watchdog
  • Python: client.reboot() raises RuntimeError with explanation
  • Use selfupdate() to reload agent code without power cycle
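
A minimal sketch of the host-side guard described in the bullets above. The class name `AgentClient` and the message wording are assumptions for illustration. Only `reboot()`, `selfupdate()`, `CMD_REBOOT`, and the `RuntimeError` come from this PR, and the PR does not say whether the real client refuses before or after talking to the agent.

```python
class AgentClient:
    """Hypothetical stand-in for the project's Python client."""

    def reboot(self):
        # Assumption: refuse on the host side, so no CMD_REBOOT frame is sent.
        raise RuntimeError(
            "CMD_REBOOT is disabled: a watchdog reset re-enters the bootrom "
            "when the boot pin selects serial download mode, and the agent "
            "can then only be recovered by a physical power cycle. "
            "Use selfupdate() to reload agent code instead."
        )
```

Raising on the host gives the caller an explanation immediately, while the agent-side `ACK_FLASH_ERROR` still covers clients that send the command anyway.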

Context

Discovered during flash dump verification: CMD_REBOOT was sent by accident, the agent was lost, and a power cycle was required to get it back.

Test plan

  • All CI checks pass locally
  • CI on PR

🤖 Generated with Claude Code

widgetii and others added 7 commits March 31, 2026 15:42
Watchdog reset re-enters bootrom when boot pin is set to serial,
requiring physical power cycle. Agent now rejects CMD_REBOOT with
ACK_FLASH_ERROR. Python client.reboot() raises RuntimeError.

Use selfupdate() to reload agent code without power cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After sf read, reads first 16 bytes via md.b and checks for garbage
patterns (all-zeros, all-0xFF, all-0x55). Raises RuntimeError early
instead of dumping megabytes of bad data.

This catches cases where sf probe wasn't run or sf read failed
silently, which produces uninitialized RAM content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
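
The sanity check in the commit message above can be sketched as follows. The function names are hypothetical, and the sketch assumes the 16 bytes have already been parsed out of the `md.b` output into a `bytes` object; the three fill patterns are the ones named in the commit message.

```python
# Fill bytes the commit message treats as evidence of a failed sf read.
GARBAGE_BYTES = (0x00, 0xFF, 0x55)


def looks_like_garbage(head: bytes) -> bool:
    """Return True if head is non-empty and is one garbage byte repeated."""
    if not head:
        return False
    return any(head == bytes([fill]) * len(head) for fill in GARBAGE_BYTES)


def check_head(head: bytes) -> None:
    """Raise early instead of dumping megabytes of bad data."""
    if looks_like_garbage(head):
        raise RuntimeError(
            f"RAM starts with {head[0]:#04x} repeated {len(head)} times; "
            "sf probe may not have run, or sf read failed silently"
        )
```

A real flash image could in principle begin with 16 identical bytes, so this is a heuristic that trades a rare false positive for catching the common failure early.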
The boot log may contain flash size info (e.g. "16MB") which lets
detect_flash() skip sf probe. But without sf probe in the current
U-Boot session, the SPI subsystem isn't initialized and sf read
silently writes nothing to RAM — producing a dump full of 0x55
(uninitialized DDR).

Fix: always run sf probe 0 before sf read, regardless of how
flash size was detected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
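
A rough sketch of the ordering fix above: probe unconditionally, then read. `send` is a hypothetical callable that sends one U-Boot command and returns the response text, the RAM address in the usage note is a placeholder, and matching on "fail" or "error" in the probe response is an assumption, not the project's actual detection logic.

```python
# Assumption: substrings that indicate a failed probe in U-Boot's response.
PROBE_FAILURE_MARKERS = ("fail", "error")


def load_flash_to_ram(send, ram_addr, offset, length):
    """Run sf probe 0, then sf read, regardless of how flash size was found."""
    probe = send("sf probe 0")
    if any(marker in probe.lower() for marker in PROBE_FAILURE_MARKERS):
        raise RuntimeError(f"sf probe failed: {probe.strip()!r}")
    # U-Boot syntax: sf read <ram address> <flash offset> <length>
    return send(f"sf read {ram_addr:#x} {offset:#x} {length:#x}")
```

For example, `load_flash_to_ram(send, 0x82000000, 0, 0x1000)` issues `sf probe 0` followed by `sf read 0x82000000 0x0 0x1000`.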
Transport errors (serial disconnect, timeout) now caught and retried
up to 5 times per block instead of killing the entire dump. After
each error, sends empty line to recover U-Boot prompt.

Resume support: if output file already exists with partial data,
continues from the last complete 4KB block boundary instead of
starting over from scratch.

Fixes: 1.5-hour dump lost at 87% due to transient serial error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
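
The resume and retry behavior above might look roughly like this. The helper names are hypothetical, `read_block` and `recover` stand in for the project's transport calls, and treating "up to 5 times" as five attempts in total is an assumption.

```python
import os

BLOCK_SIZE = 4096   # 4KB block boundary named in the commit message
MAX_ATTEMPTS = 5    # assumption: five attempts per block in total


def resume_offset(path):
    """Return the offset of the last complete block in a partial dump."""
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return 0
    return (size // BLOCK_SIZE) * BLOCK_SIZE


def read_block_with_retry(read_block, recover, offset):
    """Call read_block(offset), retrying after transport errors."""
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return read_block(offset)
        except OSError as exc:  # TimeoutError is a subclass of OSError
            last_error = exc
            recover()  # e.g. send an empty line to get the prompt back
    raise RuntimeError(
        f"block at {offset:#x} failed after {MAX_ATTEMPTS} attempts"
    ) from last_error
```

A caller resuming a dump would also need to truncate the output file to `resume_offset(path)` before appending, so that a trailing partial block is not kept.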
sf read timeout increased from 30s to 120s to handle slow SPI clocks.
Strict check for "Read: OK" in response — raises error instead of
proceeding with potentially incomplete RAM data.

Root cause of md.b dump corruption: sf read may have timed out before
completing, leaving RAM with partial flash data + uninitialized 0xFF
regions that md.b then dutifully dumped as "valid" data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
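
A sketch of the strict check above, under the assumption that the transport exposes a hypothetical `send(cmd, timeout=...)` callable returning the response text. The 120-second timeout and the "Read: OK" marker come from the commit message; everything else is illustrative.

```python
SF_READ_TIMEOUT_S = 120  # raised from 30s to tolerate slow SPI clocks


def checked_sf_read(send, ram_addr, offset, length):
    """Run sf read and refuse to continue unless U-Boot reports success."""
    cmd = f"sf read {ram_addr:#x} {offset:#x} {length:#x}"
    response = send(cmd, timeout=SF_READ_TIMEOUT_S)
    if "Read: OK" not in response:
        # A missing marker means RAM may hold only partial flash data.
        raise RuntimeError(f"{cmd!r} did not report 'Read: OK': {response!r}")
    return response
```

Failing here keeps `md.b` from dumping a RAM region that mixes real flash data with uninitialized bytes.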
7 new tests with MockTransport covering:
- Sanity check catches all-0x55 and all-0xFF garbage
- Sanity check passes valid ARM vector header
- sf probe error raises RuntimeError
- sf read without "Read: OK" raises RuntimeError
- Resume from partial file skips completed blocks

34 total flashdump tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
widgetii merged commit e7fbb85 into master on Mar 31, 2026
13 checks passed
widgetii deleted the fix/disable-reboot branch on March 31, 2026 15:32