Huge improvement in stability,reliability and speed, restored dual_bank support and signed firmware verification#375
Conversation
…ompatibility with Nordic DFU apps) and WriteWithResponse (for improved reliability)
…completed operations to the DFU app are lost
… that was causing the DFU firmware update to sometimes crash
…ncrease DFU write speed
… (non OTA), use them instead of repeating code
…UT does nothing, in about 3 minutes, the bootloader will exit and start executing the user application
… firmware, so if a firmware update fails, the bootloader can revert to the previously working firmware)
…ader was installed. Without this patch, the SINGLE BANK bootloader will end in a continuous loop and no recovery without a JTAG programmer is possible
…e to restart in DFU mode
…ing DUAL_BANK=1 to the Makefile in the make command line
…lt), the bootloader will reject and not flash any firmware that is not digitally signed with the proper key
dhalbert
left a comment
There was a problem hiding this comment.
Please don't substitute tabs for spaces in the C code. A lot of the new code is using tabs for indentation.
I saw that, I didn't know those tabs were "on purpose". Again, if you want to keep those tabs, i am fine with that also (In fact, i myself prefer tabs, but as most of the codebase was using spaces... ;) ) |
… but https://stackoverflow.com/questions/32588325/ios-bluetooth-le-code-6-the-connection-has-timed-out-unexpectedly suggests this is the best to solve the ConnectionTimedOutUnexpectedly on IOS
|
this is big patch handling multiple unrelated issues, it would be better to submit each one separately so they are smaller and could be possibly merged separately(?)
BTW I am not maintainer of this project so this is just a suggestion, feel free to ignore it |
I thought about that, but there are dependencies between patches, and reviewing them one by one, means opening a pull request after the previous one was merged. That means probably more than a year in time of reviews? ... Are you willing to spend/wait that time ? .. ;) Besides, my repo is also available, in "open source" projects, the risk is that a folk could become the main repository, if it fixes problems the previous main repository does not fix, and does not introduce new bugs. I truly want to contribute to the main repo, don't want to be the "new unofficial" arduino bootloader repo for the nrf52! |
There was a problem hiding this comment.
Pull request overview
This PR significantly improves Bluetooth DFU reliability and adds support for dual-bank firmware updates and signed firmware verification. The changes fix a critical race condition in the Nordic bootloader that caused random flash update failures, allowing higher MTU values and improving transfer speeds.
Changes:
- Fixed race condition in Nordic bootloader causing flash update failures
- Increased MTU from 20 to 247 and HCI RX buffer queue size from 8 to 16
- Added dual-bank and signed firmware support (both disabled by default)
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| src/usb/usb_desc.c | Conditionally compiles MSC/UF2 descriptors based on SIGNED_FW flag |
| src/usb/usb.c | Conditionally initializes UF2 module when not using signed firmware |
| src/usb/tusb_config.h | Conditionally enables MSC device class |
| src/sdk_config.h | Increases HCI RX buffer queue size from 8 to 16 |
| src/main.c | Increases BLE parameters (MTU, event length, queue sizes) and enables 2M PHY |
| src/flash_nrf5x.h | Updates function signatures for erase and write operations |
| src/flash_nrf5x.c | Refactors flash write to handle page boundaries correctly |
| src/dfu_init.c | Adds signed firmware verification using tinycrypt library |
| lib/tinycrypt | Adds tinycrypt submodule for cryptographic operations |
| lib/sdk11/components/libraries/bootloader_dfu/dfu_types.h | Adds dual-bank region definitions |
| lib/sdk11/components/libraries/bootloader_dfu/dfu_transport_ble.c | Implements latency management to prioritize flash writes |
| lib/sdk11/components/libraries/bootloader_dfu/dfu_single_bank.c | Adds DFU timeout handling and refactors erase operations |
| lib/sdk11/components/libraries/bootloader_dfu/dfu_dual_bank.c | New file implementing dual-bank DFU support |
| lib/sdk11/components/libraries/bootloader_dfu/dfu_bank_internal.h | Adds DFU timeout interval definition |
| lib/sdk11/components/drivers_nrf/pstorage/pstorage_raw.c | Fixes race condition in flash operation handling |
| lib/sdk11/components/ble/ble_services/ble_dfu/ble_dfu.c | Retries notification sends on resource errors |
| README.md | Documents dual-bank and signed firmware features |
| Makefile | Adds build support for signed firmware and dual-bank options |
| .gitmodules | Adds tinycrypt submodule |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
lib/sdk11/components/libraries/bootloader_dfu/dfu_single_bank.c
Outdated
Show resolved
Hide resolved
lib/sdk11/components/libraries/bootloader_dfu/dfu_transport_ble.c
Outdated
Show resolved
Hide resolved
|
Hm, it has been a while and Nordic may update their nrf dfu app. I couldn't dfu with latest ios and android app both with this pr and current tip of master. I may miss/forgot something. @ejtagle could you provide a bit more on your test setup especially os and app version PS: I see we need to change the DFU app's Number of Packet to something small, seems to be |
Yes, PRN must be set to 8 or less. The Main reason is that writing to flash is handled asynchronously, so most newer phones are able to send data faster than the bootloader is able to write them: eventually the bootloader runs out of buffers, and flashing fails. Enabling prn=8 forces the Mobile app to wait until data is actually written to flash |
well, not exactly, when nrf52 is writing to flash the cpu is completely halted including interrupts The CPU is halted if the CPU executes code from the flash while the NVMC is writing to the flash
softdevice is scheduling flash writes only when it would not collide with BLE timing, so increasing MTU, making connection interval shorter,allowing more packets per interval and otherwise tuning BLE for speed is counterproductive in this case. So to allow more stable experience even without enabling notifications it would be probably better to make BLE timings more conservative and find good balance betwen flash and BLE speed EDIT: notifications add even more BLE traffic so there is even less spare time for flash writes, so it would be best if PRN could be turned off, then the upload speed could be better |
You are absolutely right, and at the same time, wrong... Trust me, i've tried what you say, and was unable to increasre speed: Let me try to explain ... -Execution is halted while writing/erasing flash: TRUE Why: because what softdevice schedules is the call to write a block of flash, not each byte, each block. If the block is bigger, then there are less blocks to write, so transferring the whole firmware image takes less blocks and writing is much faster. Regarding PRN... Yes, notificaciones take bandwidth, so they slowdown firmware Transfer. BUT writes are asynchronous to the reception of packets, there is no way to notify the Mobile app that the write has completed (besides the PRN notification), so we end with the same problem... We need to límit the write speed and the PRN is the only mechanism allowed by the nordic DFU protocol... Each packet write ends calling dfu_data_pkt_handle , and that function calls pstorage_store... That schedules a flash write. So, the bigger the packets, the less writes to flash and the faster transfers we got |
that is not how the hardware works, writing is done by CPU in 32bit values one by one (and checking READY(NEXT) register before or after each write), there is no DMA that would do it for you with CPU being free to do something else = it cannot write 'blocks' that would be somehow faster like you say. If you send bigger block softdevice splits it internally into smaller parts according to timings so it can be sure it will not miss any important interrupt or event when CPU is halted by writing. So the speed of flash writing has some upper limit (see tWRITE timing in docs linked above) and is slowed down only by softdevice deciding that it is not safe to write next 32bit value now. The scheduling of flash writes internally is similar to how softdevice Radio Timeslot API works. In fact it would be possible to do the flash writing ourselves by our custom code inside timeslot api callback since the code there has full access to all hardware and can poke NVMC registers directly. In ideal state when softdevice is not bothered by BLE restrictions, the CPU writing speed into the flash is constant no matter how big or small 'blocks' are and so it does not make sense to accept data much faster. Even if you have some buffers in RAM, they will fill up and you need to wait for writing them in the end anyway so total time before the device flashing is done and it can safely reboot into new firmware will be the same.
You can also limit writing speed by configuring MTU and connection interval so the phone is simply not able to send the data much faster than the device is able to write into flash. |
|
There is nice summary of the flash writing issue here https://docs.nordicsemi.com/bundle/sds_s140/page/SDS/s1xx/flash_mem_api/flash_mem_api.html |
So, basically, the only thing that the softdevice must do is to calculate how much time will be needed to write / erase data to flash, and if halting the cpu for that time would interfere with the scheduling of reception/transmission of ble packets, and if that happens, split writes into several parts... Erases can't be split, but writes can be (and the bootloader does not do erase while receiving firmware packets, it erases the full flash before that. |
| */ | ||
| void pstorage_sys_event_handler(uint32_t sys_evt) | ||
| { | ||
| // Do not bother processing events we are not interested in |
There was a problem hiding this comment.
out of curiosity, is this the main race condition that causes the crash sometime while DFUing ?
hathach
left a comment
There was a problem hiding this comment.
perfect !! Thank you very much, I don't mind adding dual bank + signed firmware support as long as it is disabled by default since keep the current behavior intact. The rest changes look great, It is still not clear to me which part is the race condition (please elaborate for the reference), but it surely got resolved. And I am happy to merge this when ci passed.
PS: I made a typo fix along with other clean up, update readme to state PRN <= 8 and other minor clean up. Also fix the dfu_bl_image_validate() return value as well.
actually for 52840 they can, they cared to add it as well as READYNEXT for faster writing, 52832 did not have it, so let's hope the implemented both in S140 to ease the pain of flash writing :-) EDIT: oh, the READYNEXT is broken when running from flash and won't help anyway so it is not used https://docs.nordicsemi.com/bundle/errata_nRF52840_Rev2/page/ERR/nRF52840/Rev2/latest/anomaly_840_233.html |
This pull request fixes a race condition that was present in the original Nordic bootloader code that caused FLASH update failures randomly, and was the main culprit of the forced MTU=20 in the code.
Also, readded the DUAL BANK support (disabled by default) and SIGNED FIRMWARE verification (also disabled by default)
Without those fixes, it was very easy to crash the DFU process (tested on both android and ios)... 1 of 10 times it failed.
With this patch, it was impossible to crash (tested for about 1 month of daily/hourly updates, more than 300 updates, not a single crash!)
To be precise, when i say update, i mean Bluetooth DFU update
Hope this will be accepted! - Any doubts, just ask!