
Try to gracefully handle FLASH operations queue becoming full, instead of silently failing and allow OTA DFU process to be recoverable, in case of a communication error in the DUAL_BANK configuration #377

Merged
hathach merged 11 commits into adafruit:master from ejtagle:master
Feb 2, 2026


@ejtagle
Contributor

@ejtagle ejtagle commented Jan 26, 2026

This pull request addresses 6 issues that were found while testing:

  1. When queuing a new FLASH erase or write operation, the operation queue was sometimes full, depending on timing. On 'release' builds the operation was then silently discarded, and if the discarded operation was an update of the bootloader configuration, the bootloader could crash or misbehave.
    With this patch, if that condition is detected (the queue of write/erase commands is completely full), the bootloader waits until space becomes available. This won't fix the case where BLE comms send firmware packets faster than the bootloader can flash them (so it is still required to use PRN=8), but it makes the bootloader behave properly when uploading at high speeds.

  2. The second fix corrects an error introduced by an incorrect refactoring suggested by GitHub Copilot. The function dfu_bl_image_validate() contained a fix for switching from a dual_bank to a single_bank bootloader. Basically (https://devzone.nordicsemi.com/f/nordic-q-a/16590/dfu-change-bootloader-to-single-bank), the problem with updating from a dual-bank to a single-bank bootloader is that the new bootloader (single_bank) expects the firmware image to be at the start of BANK0, while the old bootloader (dual_bank) actually loaded it into BANK1. After the update, the new single-bank bootloader verifies that the image was written to flash correctly by comparing itself with the firmware image it expects to find in BANK0. Since the actual firmware image is in BANK1, this check fails and the new bootloader updates itself, using the MBR, with whatever data is located in BANK0. This happens in a loop, which prevents the user application from being launched, and the only possible recovery is with a JTAG programmer.
    The fix is to have dfu_bl_image_validate() run the verification on both addresses (BANK0 and BANK1) and, if EITHER bank contains the bootloader, skip the self-copy.
    Originally, the code read:

return sd_mbr_command(&sd_mbr_cmd_1) && sd_mbr_command(&sd_mbr_cmd_2);

sd_mbr_command() returns 0 (=NRF_SUCCESS) if the comparison matches, and a nonzero value if it does not. Because this is a logical AND, either comparison returning SUCCESS makes dfu_bl_image_validate() return SUCCESS (=0). Copilot wrongly refactored it so that both comparisons had to be SUCCESSful for the function to return success.
This patch also fixes that.
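The AND-of-error-codes behaviour described above is easy to misread, so here is a minimal host-side sketch of it. It is not the bootloader code: `mock_compare()`, `validate_any_bank()` and `validate_both_banks()` are hypothetical names, and `mock_compare()` merely stands in for `sd_mbr_command()` under the assumption stated in the text (0 on a match, nonzero otherwise).

```c
#include <assert.h>
#include <stdint.h>

#define NRF_SUCCESS 0u /* success value quoted in the PR text */

/* Hypothetical stand-in for sd_mbr_command(): 0 when the compared regions
 * match, a nonzero error code when they do not. */
uint32_t mock_compare(int regions_match)
{
    return regions_match ? NRF_SUCCESS : 1u;
}

/* Original form: logical AND of two error codes. The result is 0 (success)
 * as soon as either comparison returns 0. */
uint32_t validate_any_bank(int bank0_matches, int bank1_matches)
{
    return mock_compare(bank0_matches) && mock_compare(bank1_matches);
}

/* The incorrect reading: success only when both comparisons succeed. */
uint32_t validate_both_banks(int bank0_matches, int bank1_matches)
{
    uint32_t err = mock_compare(bank0_matches);
    if (err != NRF_SUCCESS) {
        return err;
    }
    return mock_compare(bank1_matches);
}

/* Usage check for the dual-to-single case: the image is only in BANK1. */
void example_dual_to_single(void)
{
    assert(validate_any_bank(0, 1) == NRF_SUCCESS);   /* accepted */
    assert(validate_both_banks(0, 1) != NRF_SUCCESS); /* rejected, would loop */
}
```

With the image present only in BANK1, the first form reports success and the second does not, which matches the update loop described above.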

  3. When a DUAL_BANK bootloader is in use and an OTA DFU transfer is interrupted or fails (for any reason), the bootloader is left in a "wait for more firmware packets" state. The only way to recover from that state and attempt a new firmware transfer is to send the reset command (which is exactly what the Nordic DFU app does in this situation), and the bootloader is expected to re-enter bootloader mode after the system reset. This is exactly what happens with a SINGLE_BANK bootloader, because the main application firmware has been erased and there is no user application to jump to.
    In the DUAL_BANK case, however, there is a user application (the last working one), so without this patch the bootloader reset the system and launched the user application, preventing the Nordic DFU app from completing the firmware transfer. This case is now handled: the bootloader resets and restarts in OTA DFU mode, allowing the OTA DFU to complete successfully.

  4. Allow the bootloader to default to OTA DFU instead of Serial/UF2 DFU (for field-deployed devices with no user-accessible USB port), so devices remain recoverable.

  5. Clear pending interrupts and exceptions before launching the application. Occasionally (it is rare, but it happens) with SoftDevice 6.1.1, the bootloader jumps to the application and the device resets. This seems to be caused by a pending interrupt/exception ... The SDK12+ bootloader clears them before jumping, so this bootloader now does the same.

  6. Add the ability to use an external PA/LNA to increase BLE range (disabled by default, configurable from the board.h of the targeted board).
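The DUAL_BANK recovery behaviour described above can be pictured as a small decision function. This is only a simplified model under stated assumptions, not the code in this PR: `boot_target_t` and `boot_target_after_reset()` are hypothetical, and the three inputs are an assumed reduction of the state the bootloader tracks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the reset that follows a DFU "reset" command. */
typedef enum {
    BOOT_APPLICATION, /* jump to the user application */
    BOOT_DFU          /* stay in (or re-enter) bootloader DFU mode */
} boot_target_t;

/* Assumed inputs:
 *   app_valid          - a working user application is present in flash
 *   ota_was_connected  - an OTA DFU session was opened before the reset
 *   transfer_completed - the new image was fully received and accepted */
boot_target_t boot_target_after_reset(bool app_valid,
                                      bool ota_was_connected,
                                      bool transfer_completed)
{
    /* Interrupted OTA session: go back to DFU so the transfer can be
     * retried, even though a working application still exists. */
    if (ota_was_connected && !transfer_completed) {
        return BOOT_DFU;
    }
    /* Nothing to launch (the single-bank case after the erase). */
    if (!app_valid) {
        return BOOT_DFU;
    }
    return BOOT_APPLICATION;
}

/* Usage check: dual-bank device, old application intact, transfer broken. */
void example_interrupted_dual_bank(void)
{
    assert(boot_target_after_reset(true, true, false) == BOOT_DFU);
}
```

In this model, the first branch is the behaviour the patch adds. Without it, a dual-bank device with an intact application always falls through to `BOOT_APPLICATION`.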

…t to bootloader to allow the DFU app to recover previously interrupted transfers, instead of reverting to the last working firmware and launching it
@ejtagle changed the title from "Try to gracefully handle FLASH operations queue becoming full, instead of silently failing" to "Try to gracefully handle FLASH operations queue becoming full, instead of silently failing and allow OTA DFU process to be recoverable, in case of error in the DUAL_BANK configuration" on Jan 28, 2026
@ejtagle changed the title from "Try to gracefully handle FLASH operations queue becoming full, instead of silently failing and allow OTA DFU process to be recoverable, in case of error in the DUAL_BANK configuration" to "Try to gracefully handle FLASH operations queue becoming full, instead of silently failing and allow OTA DFU process to be recoverable, in case of a communication error in the DUAL_BANK configuration" on Jan 28, 2026
Contributor

Copilot AI left a comment


Pull request overview

This pull request addresses four critical issues in the bootloader's OTA DFU functionality for nRF52 devices, focusing on improving reliability and recoverability during firmware updates.

Changes:

  • Fixed a spelling error (RECIEVED → RECEIVED) in variable names
  • Implemented graceful handling of full flash operation queues by adding retry logic with SOC event processing
  • Corrected the dual-to-single bank bootloader validation logic that was preventing proper bootloader updates
  • Added recovery mechanism for interrupted OTA DFU sessions in dual-bank configurations by tracking connection state and forcing re-entry into bootloader mode on reset
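The queue-full retry idea in the second bullet can be sketched on a host with a simulated queue. Everything here is an assumption made for illustration: `flash_queue_t`, the three functions and `QUEUE_CAPACITY` are hypothetical, and `flash_queue_process_event()` only stands in for the SOC event processing that lets a queued flash operation complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4 /* hypothetical; kept small so "full" is easy to hit */

typedef struct {
    size_t pending;   /* operations queued but not yet finished */
    size_t completed; /* operations the simulated flash has finished */
} flash_queue_t;

/* Non-blocking enqueue: refuses the operation when the queue is full. */
bool flash_queue_try_push(flash_queue_t *q)
{
    if (q->pending >= QUEUE_CAPACITY) {
        return false;
    }
    q->pending++;
    return true;
}

/* Stand-in for event processing: finishes one pending operation, if any. */
void flash_queue_process_event(flash_queue_t *q)
{
    if (q->pending > 0) {
        q->pending--;
        q->completed++;
    }
}

/* Blocking enqueue: instead of dropping the operation when the queue is
 * full, keep processing events until a slot frees up, then queue it.
 * Returns the number of retries that were needed. */
size_t flash_queue_push_blocking(flash_queue_t *q)
{
    size_t retries = 0;
    while (!flash_queue_try_push(q)) {
        flash_queue_process_event(q);
        retries++;
    }
    assert(q->pending <= QUEUE_CAPACITY);
    return retries;
}
```

In this simulation every processed event frees a slot, so the loop always terminates. On real hardware the wait depends on flash completion events actually arriving, which is an assumption of the sketch.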

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/usb/uf2/uf2cfg.h | Corrected spelling of BOOTLOADER_ADDR_NEW_RECEIVED macro |
| src/usb/msc_uf2.c | Fixed spelling and initialized restart_into_bootloader flag |
| src/pstorage_platform.h | Increased flash command queue size from 10 to 18 to reduce queue full conditions |
| src/main.c | Added OTA connection tracking and bootloader re-entry logic for failed OTA DFU recovery |
| lib/sdk11/components/libraries/bootloader_dfu/dfu_types.h | Added restart_into_bootloader field to dfu_update_status_t structure |
| lib/sdk11/components/libraries/bootloader_dfu/dfu_single_bank.c | Added queue-full retry logic and fixed bootloader validation to check both banks |
| lib/sdk11/components/libraries/bootloader_dfu/dfu_dual_bank.c | Added queue-full retry logic, fixed bootloader validation, and whitespace cleanup |
| lib/sdk11/components/libraries/bootloader_dfu/bootloader.h | Added bootloader_must_reset_to_self() function declaration and proc_soc extern |
| lib/sdk11/components/libraries/bootloader_dfu/bootloader.c | Implemented bootloader re-entry tracking and queue-full retry logic for settings save |
| Makefile | Added DEFAULT_TO_OTA_DFU configuration option (with incorrect flag syntax) |
| CMakeLists.txt | Added DEFAULT_TO_OTA_DFU CMake option with correct syntax |


ejtagle and others added 5 commits January 31, 2026 16:17
…o the user application. Was rarely causing locks
Yes, the '-D' was missing

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Member

@hathach hathach left a comment


Perfect, thank you. I added braces to the single-statement ifs (I changed the style recently) and renamed ota_was_active to ota_was_connected, which is closer to its meaning. Everything else is great. Thank you again for a great follow-up.

@hathach hathach merged commit 583d67b into adafruit:master Feb 2, 2026
100 of 102 checks passed