contrib: openipc-bisect - host-side firmware bisect driver #2117
Merged
POSIX sh + jq + ssh driver that performs a binary search across dated nightly builds via #2114's `sysupgrade --build=<id>`.

State lives on the workstation in `$XDG_STATE_HOME/openipc/bisect/<host>.json`, so a brick mid-bisect (UART recovery required, per kaeru `uart-recovery-via-uboot-tftp-recipe`) cannot lose progress: recover the camera by any means and `openipc-bisect bad` (or `skip`) resumes the loop.

Subcommands:
* `start <host> [--good=<id|sha>] [--bad=<id|sha>] [--platform=<id>]`
* `good | bad | skip`: mark the current candidate; flash the next median
* `status`: window size, verdicts, rounds remaining
* `reset`: flash back to `channels.nightly`, clear state
* `resume`: re-attach after host restart/disconnect

Defaults:
* `--bad` → `channels.nightly` (current rolling tip)
* `--good` → oldest build in the manifest window for this platform
* `--platform` → autodetected from the camera's `fw_printenv soc` + `/etc/os-release` `BUILD_OPTION`

Ref normalisation accepts: exact build_id, short sha (matches the trailing `-<short>` on build_id), full sha (matches manifest `.sha`), and `channels.{nightly,latest}` keywords.

Verified locally: the live manifest at https://openipc.github.io/firmware/manifest.json parses; `resolve_channel`, `builds_for_platform`, and `normalize_ref` all return expected results against the real schema published by #2112. The end-to-end loop will become exercisable once at least 2 dated nightlies exist (currently only nightly-20260520-887328c on master).

PR-D of six.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
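The ref-normalisation rules in the commit message above map fairly directly onto a lookup. The following is a minimal POSIX sh sketch. It assumes (hypothetically) that the platform's builds have already been flattened into `<build_id> <full_sha>` lines and that channel pointers arrive in `CHANNEL_NIGHTLY` / `CHANNEL_LATEST`. The function name `normalize_ref_sketch` is made up; this is not the script's `normalize_ref`, which works against `manifest.json` with jq.

```sh
#!/bin/sh
# Hypothetical sketch of the ref-normalisation rules, NOT the script's
# normalize_ref. Assumptions: the platform's builds arrive on stdin as
# "<build_id> <full_sha>" lines (the real script reads manifest.json with jq),
# and channel pointers are supplied in CHANNEL_NIGHTLY / CHANNEL_LATEST.

nl='
'

normalize_ref_sketch() {
    ref=$1
    case $ref in
        channels.nightly) matches=$CHANNEL_NIGHTLY ;;
        channels.latest)  matches=$CHANNEL_LATEST ;;
        *)
            matches=$(awk -v ref="$ref" '
                $1 == ref { print $1; next }   # exact build_id
                $2 == ref { print $1; next }   # full sha
                {
                    n = split($1, part, "-")   # short sha = trailing -<short>
                    if (part[n] == ref) print $1
                }')
            ;;
    esac
    # Exactly one build must match; zero or several is an error.
    case $matches in
        "")      echo "normalize_ref_sketch: no build matches '$ref'" >&2; return 1 ;;
        *"$nl"*) echo "normalize_ref_sketch: '$ref' is ambiguous" >&2; return 1 ;;
    esac
    printf '%s\n' "$matches"
}
```

When fed `nightly-20260520-887328c <sha>` and `nightly-20260522-7d32f00 <sha>` lines, `normalize_ref_sketch 7d32f00` resolves to `nightly-20260522-7d32f00`. The sketch deliberately refuses an ambiguous short sha instead of taking the first hit, because flashing the wrong build would silently corrupt the bisect window.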
This was referenced May 21, 2026
widgetii added a commit that referenced this pull request May 23, 2026
… run

Four bugs surfaced when running the first real convergence loop on openipc-hi3520dv200.dlab.torturelabs.com (4-build window) the morning after PR #2117 landed. None of them would have been caught by the jq-against-static-manifest dry-runs done at PR time; they only emerge under real flash+reboot cycles.

Fixes:

1. **`status` had a jq syntax error.** `(log/log(2)) | floor + 1` does not compile: jq's `log` is a zero-argument math filter (`2 | log`), so `log(2)` is a call to an undefined `log/1`. Status crashed at the JSON-construction step. Fix: compute ceil(log2(window_size)) in awk before invoking jq and pass it in via `--argjson`.
2. **`pick_next` returned "" when 1 unverified candidate remained.** The threshold was `<= 1` instead of `== 0`, so a real bisect whose verdicts left exactly one untested build would terminate early and never test it. Threshold corrected to `== 0`; with 1 unverified build the index math `length / 2 | floor` returns 0, selecting the lone unverified build.
3. **SSH lacked ServerAliveInterval / ServerAliveCountMax.** When sysupgrade reboots the camera, dropbear is killed without a graceful TCP close. The host's `ssh root@$host "sysupgrade ..."` in `remote_flash()` then sat on a dead, half-open TCP connection until kernel keepalive gave up (~2 hours), so `iterate()` never reached `wait_for_camera`. Added `-o ServerAliveInterval=15 -o ServerAliveCountMax=3` to the default `SSH_OPTS` so the host detects the dead session in ~45 s (15 s × 3 missed probes) and the iteration progresses normally.
4. **`start <host>` rejected `root@host`.** The contract was a bare hostname (the script always SSHes as root), but the form everyone reaches for in OpenIPC docs, including the wiki article shipped alongside the original PR, is `root@host`. `cmd_start` now strips a leading `user@` prefix before anything downstream sees the host.
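Fixes 1, 3 and 4 are small enough to show in isolation. The following is a minimal POSIX sh illustration under stated assumptions. The helper names `rounds_remaining` and `strip_user` are hypothetical, and the real `SSH_OPTS` default may carry more options than the two added by fix 3.

```sh
#!/bin/sh
# Hypothetical sketch of fixes 1, 3 and 4; helper names are made up and need
# not match contrib/openipc-bisect.

# Fix 1: compute ceil(log2(window_size)) outside jq, then pass it in with
# --argjson. An integer doubling loop sidesteps floating-point rounding in
# log(n)/log(2).
rounds_remaining() {
    awk -v n="$1" 'BEGIN {
        r = 0
        for (c = 1; c < n; c *= 2) r++   # smallest r with 2^r >= n
        print r
    }'
}

# Fix 3: give up on a dead session after 15 s x 3 missed probes (~45 s)
# instead of waiting for kernel TCP keepalive. Real default may hold more.
SSH_OPTS=${SSH_OPTS:-"-o ServerAliveInterval=15 -o ServerAliveCountMax=3"}

# Fix 4: accept "root@host" as well as a bare host. The script always
# connects as root, so any user@ prefix is simply dropped.
strip_user() {
    printf '%s\n' "${1#*@}"
}

# Example wiring (not executed here):
#   host=$(strip_user "$1")
#   rounds=$(rounds_remaining "$window_size")
#   jq -n --argjson rounds "$rounds" '{rounds_remaining: $rounds}'
#   ssh $SSH_OPTS "root@$host" "sysupgrade --build=$build_id"  # SSH_OPTS unquoted on purpose
```

`${1#*@}` removes everything up to and including the first `@`, so a bare hostname passes through unchanged. A window of 4 builds yields 2 rounds, and a window of 5 yields 3.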
End-to-end test that found these (2026-05-23 on openipc-hi3520dv200.dlab.torturelabs.com, 4-build window):

* `start` picked nightly-20260522-7d32f00 (median) → camera reboot → UART noise interrupted u-boot autoboot → camera stuck at the u-boot prompt → host process killed manually → user recovered the camera via UART. The state file on the host stayed intact across the brick. After recovery, `openipc-bisect resume` correctly re-attached and prompted for a verdict, exactly the brick-survivability promise.
* A `good` verdict narrowed the window to a single element and printed "Bisect complete. First bad build: nightly-20260523-7a2c1b3".

After these fixes, the next end-to-end run (5+ builds in the manifest) should be hands-off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
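The walkthrough above is consistent with a window that treats the oldest build as good and the tip as bad, and always flashes the `length / 2 | floor` element of the unverified slice. The following is a minimal sketch of that narrowing. It uses hypothetical function names and a scripted verdict in place of a human, and it ignores `skip`. It is not the script's `pick_next`.

```sh
#!/bin/sh
# Hypothetical sketch of the window narrowing, NOT the script's pick_next.
# Assumptions: builds are ordered oldest-first, index LO is known good,
# index HI is known bad, everything strictly between them is unverified,
# and "skip" verdicts are not modelled.

# Print the index to flash next, or fail once no unverified build remains.
next_pick() {
    unverified=$(( $2 - $1 - 1 ))
    if [ "$unverified" -eq 0 ]; then
        return 1    # window closed: index HI is the first bad build
    fi
    # The floor(length / 2) element of the unverified slice.
    echo $(( $1 + 1 + unverified / 2 ))
}

# Drive the loop with a scripted oracle instead of a human: every build at
# index >= FIRST_BAD counts as "bad". Prints the first bad build's name.
simulate() {
    first_bad=$1; shift
    lo=0
    hi=$(( $# - 1 ))
    while pick=$(next_pick "$lo" "$hi"); do
        if [ "$pick" -ge "$first_bad" ]; then
            hi=$pick    # "bad": the regression is at or before pick
        else
            lo=$pick    # "good": the regression is after pick
        fi
    done
    i=0
    for build in "$@"; do
        if [ "$i" -eq "$hi" ]; then
            printf '%s\n' "$build"
            return 0
        fi
        i=$(( i + 1 ))
    done
}
```

`simulate 3 A B C D` first flashes index 2 (C), receives "good", and then prints `D`. That is the same shape as the run above, where the median nightly-20260522-7d32f00 was marked good and nightly-20260523-7a2c1b3 came out as the first bad build.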
Summary
PR-D of six in the nightly-build redesign. Adds `contrib/openipc-bisect`, a host-side POSIX shell driver that performs a binary search across dated nightly builds using #2114's `sysupgrade --build=<id>`.

Key design property: state lives on the workstation, never on the camera. A brick mid-bisect (a kernel/driver regression that prevents boot, so UART/TFTP recovery is required, per kaeru `uart-recovery-via-uboot-tftp-recipe`) cannot lose progress: recover the camera by any means and `openipc-bisect bad` (or `skip`) resumes the loop.

Subcommands
Defaults

* `--bad` → `channels.nightly` (current rolling tip from `manifest.json`)
* `--good` → oldest build in the manifest window for this platform
* `--platform` → autodetected from the camera's `fw_printenv soc` + `/etc/os-release` `BUILD_OPTION`

Ref normalization
Accepts: exact `build_id`, short sha (matches the trailing `-<short>` on `build_id`), full sha (matches `manifest.builds[].sha`), and `channels.{nightly,latest}` keywords.

Verified locally
* `sh -n` clean.
* `https://openipc.github.io/firmware/manifest.json` parses.
* `resolve_channel`, `builds_for_platform`, and `normalize_ref` all return expected results against the real schema published by #2112 (ci/nightly: manifest aggregator + 90-build retention sweep).

Caveats
`--build=<id>` strictly through the firmware manifest. fpv-variant cameras (whose firmware comes from OpenIPC/builder, per kaeru `openipc-firmware-vs-builder-variant-split`) won't be bisectable through this tool until a parallel manifest exists for that ecosystem.

Dependencies
`sh`, `jq`, `curl`, `ssh`. No new rootfs dependency on the camera.

Test plan
* `OPENIPC_BISECT_WAIT=180 contrib/openipc-bisect start openipc-hi3520dv200.dlab.torturelabs.com --good=<id-2-back>` and convergence over 1-2 rounds with the expected window narrowing.
* `openipc-bisect resume` re-attaches from `~/.local/state/openipc/bisect/<host>.json`.
* `openipc-bisect bad` continues.
* `openipc-bisect reset` returns the camera to the rolling nightly channel.

🤖 Generated with Claude Code