
DPUs in NIC mode shouldn't go into/get stuck in BFB provisioning states #2115

@chet

Description


As titled. I had fired off site-explorer copy-bfb-to-dpu-rshim at a DPU as part of trying to do some firmware work on it. The problem is that in NIC mode there's no Arm OS. The copy to the rshim worked (start_bfb_copy), BfbCopyInProgress worked, and BfbPlatformPowercycle worked, but then BfbInstallationWait hangs forever in check_dpu_console_install_complete, because it can't microcom into the Arm side to grep for the markers it expects ("login:", "Running bfb_post_install from bf.cfg", "total 100% complete", etc.), so none of them ever show up.

We need to make sure:

  • If a DPU is in NIC mode, don't allow this to be fired off at all, so we never enter the BFB provisioning state(s) to begin with.
  • If someone happens to fire this off on a machine that is in DPU mode and, while it's installing, someone flips it to NIC mode and reboots (also guilty of this), BfbInstallationWait would again never complete and we'd be stuck again, so the states themselves should be mode-aware.
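To make the two points above concrete, here is a minimal sketch of what a mode-aware guard could look like. It is not the actual state machine: `DpuMode`, `Step`, `request_bfb_copy`, and `step` are hypothetical names made up for illustration, and aborting when NIC mode is detected mid-install is only one possible policy. Only the state names come from this issue.

```rust
/// Operating mode of the card. Hypothetical type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DpuMode {
    Dpu,
    Nic,
}

/// BFB provisioning states named in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BfbState {
    BfbCopyInProgress,
    BfbPlatformPowercycle,
    BfbInstallationWait,
    Completed,
}

/// Result of evaluating one state. Hypothetical type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Step {
    /// Nothing to do yet; evaluate again later.
    Stay,
    /// Move on to the given state.
    Goto(BfbState),
    /// Leave the BFB flow instead of waiting forever (assumed policy).
    Abort(&'static str),
}

/// Entry guard: refuse to start the flow at all when the card is in NIC mode.
fn request_bfb_copy(mode: DpuMode) -> Result<BfbState, &'static str> {
    match mode {
        DpuMode::Nic => Err("DPU is in NIC mode; there is no Arm OS to install a BFB on"),
        DpuMode::Dpu => Ok(BfbState::BfbCopyInProgress),
    }
}

/// Per-state handler: re-check the mode on every evaluation, so a card that
/// was flipped to NIC mode mid-install does not sit in BfbInstallationWait.
/// `step_done` stands in for the real completion check of the current state.
fn step(state: BfbState, mode: DpuMode, step_done: bool) -> Step {
    if state == BfbState::Completed {
        return Step::Stay;
    }
    if mode == DpuMode::Nic {
        return Step::Abort("DPU switched to NIC mode during BFB provisioning");
    }
    if !step_done {
        return Step::Stay;
    }
    match state {
        BfbState::BfbCopyInProgress => Step::Goto(BfbState::BfbPlatformPowercycle),
        BfbState::BfbPlatformPowercycle => Step::Goto(BfbState::BfbInstallationWait),
        BfbState::BfbInstallationWait => Step::Goto(BfbState::Completed),
        BfbState::Completed => Step::Stay,
    }
}
```

With a guard like this, `request_bfb_copy(DpuMode::Nic)` is rejected up front, and `step(BfbState::BfbInstallationWait, DpuMode::Nic, false)` returns an `Abort` rather than waiting on console output that can never appear.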

The workaround in this case was that I manually pushed the state in the DB to Completed to bump it along, after which I ran into another bump in the road, but this is something we should address properly.

Metadata


Assignees

Labels

bluefield: Related to Bluefield provisioning / lifecycle
host ingestion: Issue relating to ingestion or lifecycle of a host.

Type

No fields configured for Bug.

Projects

Status
Code Complete
