Fix rapid binary logs (IDFGH-17270) by jthoward64-lexcelon · Pull Request #36 · espressif/esp-idf-monitor

jthoward64-lexcelon · 2026-02-20T14:05:53Z

Description

I've been using binary logging for the massive reduction in stack usage, but I've found that when a lot of logs come in at once, the monitor gets out of sync and spams the console with binary garbage until it possibly recovers once it syncs up again. This PR allows the binary log `find_frames` method to recover gracefully from a miss (or at least not lose track altogether).

Before:

I (303) [REDACTED]
I (303) [REDACTED]
I (303) [REDACTED]
I (303) [REDACTED]
I (304) [REDACTED]
I (304) [REDACTED]
.�<��0�
        D<�0<�2<�s<�jU
                      <�}0<��
                             7<�1<�Uz
                                     A�<�n1p.�<���.�<��2        d
                                                                 �<�2<�U�
                                                                         <�}2<��?
                                                                                 7<�2<��a+P<�U2<�0(�$ <��2<��<
                                                                                                              ,�#�<��3<�xL<
                                                                                                                           ,
                                                                                                                            Ep<��3<����<�53<
                                                                                                                                            a�
                                                                                                                                              ��<�53�.�<��3
�.�<��4
       �
         D<�4<��<�h<�jv
                       �<�4<��<�j�
                                  B<�n4�.�<��4

After:

I (302) [REDACTED]
I (303) [REDACTED]
I (303) [REDACTED]
I (303) [REDACTED]
I (303) [REDACTED]
I (304) [REDACTED]
I (304) [REDACTED]
E (304) [REDACTED]
I (304) [REDACTED]
I (304) [REDACTED]
I (305) [REDACTED]
I (305) [REDACTED]
E (305) [REDACTED]
E (306) [REDACTED]
I (306) [REDACTED]
I (306) [REDACTED]
I (306) [REDACTED]
E (306) [REDACTED]
E (306) [REDACTED]
E (307) [REDACTED]
I (307) [REDACTED]
W (307) [REDACTED]
I (307) [REDACTED]

Both logs come from the same firmware. Without this change, the logger is useless after 304; with it, logging continues without issue.

Testing

Tested with my current project in an ESP-IDF 6 Docker container.

Checklist

Before submitting a Pull Request, please ensure the following:

🚨 This PR does not introduce breaking changes.
All CI checks (GH Actions) pass.
Documentation is updated as needed.
Tests are updated or added as necessary.
Code is well-commented, especially in complex areas.
Git history is clean — commits are squashed to the minimum necessary.

github-actions · 2026-02-20T14:06:30Z

	Messages
📖	🎉 Good Job! All checks are passing!

👋 Hello jthoward64-lexcelon, we appreciate your contribution to this project!


This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modifies the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect ...

We do welcome contributions in the form of bug reports, feature requests and pull requests.

1. An internal issue has been created for the PR, we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
- At this point we may do some adjustments to the proposed change, or extend it by adding tests or documentation.
4. If the change is approved and passes the tests it is merged into the default branch.

Generated by 🚫 dangerJS against ac5f499

peterdragun · 2026-02-20T14:15:36Z

Hi @jthoward64-lexcelon, thank you for contributing. By any chance, would you have an app that I could use to reproduce this issue?

Copilot

Pull request overview

Improves robustness of ESP-IDF Monitor’s binary log decoding during high-throughput bursts by making frame detection re-sync instead of bailing out on the first mismatch, reducing the chance of the console getting “stuck” printing binary garbage.

Changes:

Update binary log frame scanning to skip bytes on CRC/control-structure mismatches and keep searching for the next valid frame.
Simplify binary-log handling in the serial input path (removes fallback behavior on decoding errors).
Adjust binary log message encoding to use default string encoding when producing output bytes.
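To make "skip bytes on CRC/control-structure mismatches" concrete, here is a minimal sketch under an assumed, simplified frame layout (a hypothetical `MARKER` byte, a one-byte total length, the payload, and a trailing CRC-8). This is not the real ESP-IDF v2 binary log format, and the function names are illustrative rather than `binlog.py`'s API. The naive scanner gives up at the first frame that fails validation; the re-syncing scanner advances one byte and keeps looking, so later valid frames are still recovered.

```python
from typing import List

MARKER = 0x1F  # hypothetical start-of-frame byte, not the real binlog marker
MIN_LEN = 4    # marker + length + at least one payload byte + CRC


def crc8(data: bytes) -> int:
    """CRC-8, poly 0x07, init 0, no reflection, no final XOR.

    With these parameters a frame that ends with its own CRC checks to 0,
    which is the same kind of test as `self.crc8(frame) == 0` in binlog.py.
    """
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def make_frame(payload: bytes) -> bytes:
    """Build a toy frame: MARKER, total length, payload, CRC."""
    body = bytes([MARKER, len(payload) + 3]) + payload
    return body + bytes([crc8(body)])


def find_frames_naive(data: bytes) -> List[bytes]:
    """Stop at the first frame that fails validation."""
    frames, i = [], 0
    while i < len(data):
        if data[i] != MARKER or i + 1 >= len(data):
            i += 1
            continue
        frame = data[i:i + data[i + 1]]
        if len(frame) < MIN_LEN or crc8(frame) != 0:
            break  # lost sync: nothing after this point is decoded
        frames.append(frame)
        i += len(frame)
    return frames


def find_frames_resync(data: bytes) -> List[bytes]:
    """On a bad frame, advance one byte and look for the next marker."""
    frames, i = [], 0
    while i < len(data):
        if data[i] != MARKER or i + 1 >= len(data):
            i += 1
            continue
        length = data[i + 1]
        frame = data[i:i + length]
        if length >= MIN_LEN and len(frame) == length and crc8(frame) == 0:
            frames.append(frame)
            i += length
        else:
            i += 1  # CRC/length mismatch: re-sync one byte later
    return frames


good, world = make_frame(b"hello"), make_frame(b"world")
corrupt = bytearray(make_frame(b"lost"))
corrupt[-1] ^= 0xFF  # flip the CRC so this frame fails validation
stream = good + bytes(corrupt) + world

print(find_frames_naive(stream) == [good])          # True
print(find_frames_resync(stream) == [good, world])  # True
```

The trade-off of re-syncing is that the scanner may, in principle, lock onto a false frame inside payload or text bytes; the CRC (plus, in the real code, the control-structure plausibility checks) is what keeps that rare.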

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
esp_idf_monitor/base/serial_handler.py	Removes the `ValueError` fallback around binary log decoding and always returns after attempting binlog conversion.
esp_idf_monitor/base/binlog.py	Changes `find_frames` to re-sync by scanning forward on invalid CRC/control parsing, and tweaks message encoding.


esp_idf_monitor/base/serial_handler.py

esp_idf_monitor/base/binlog.py

jthoward64-lexcelon · 2026-02-20T15:26:16Z

Hi @jthoward64-lexcelon, thank you for contributing. By any chance, would you have an app that I could use to reproduce this issue?

No, unfortunately all my ESP-IDF repos are proprietary, but I can reliably reproduce the issue by enabling V2 binary logs and logging either very long lines or very rapidly.

peterdragun · 2026-02-23T16:19:31Z

@jthoward64-lexcelon I am sorry, but I am struggling to reproduce the issue. Without that, it is hard to evaluate if the code is correct and covers all the corner cases - it would also be nice to include some tests to avoid having such issues in the future.

Would you be able to create a minimal app that would be able to reproduce this issue without using any of your proprietary code? It would be helpful to also mention what your OS is, which chip you are using, or any other details which might be relevant (like using VM/WSL, etc.)

Here is the example I tried it on (using the latest IDF on ESP32-C5 v1.0, using the UART port, with binlog turned on for the app only):

#include <stdio.h>
#include "esp_log.h"
#include "esp_system.h"

static const char *TAG = "example";

void app_main(void)
{
    for (int i = 0; i < 1000; i++) {
        ESP_LOGI(TAG, "Lorem ipsum dolor sit amet consectetur adipiscing elit. Quisque faucibus ex sapien vitae pellentesque sem placerat. In id cursus mi pretium tellus duis convallis. Tempus leo eu aenean sed diam urna tempor. Pulvinar vivamus fringilla lacus nec metus bibendum egestas. Iaculis massa nisl malesuada lacinia integer nunc posuere. Ut hendrerit semper vel class aptent taciti sociosqu. Ad litora torquent per conubia nostra inceptos himenaeos.Lorem ipsum dolor sit amet consectetur adipiscing elit. Quisque faucibus ex sapien vitae pellentesque sem placerat. In id cursus mi pretium tellus duis convallis. Tempus leo eu aenean sed diam urna tempor. Pulvinar vivamus fringilla lacus nec metus bibendum egestas. Iaculis massa nisl malesuada lacinia integer nunc posuere. Ut hendrerit semper vel class aptent taciti sociosqu. Ad litora torquent per conubia nostra inceptos himenaeos.Lorem ipsum dolor sit amet consectetur adipiscing elit. Quisque faucibus ex sapien vitae pellentesque sem placerat. In id cursus mi pretium tellus duis convallis. Tempus leo eu aenean sed diam urna tempor. Pulvinar vivamus fringilla lacus nec metus bibendum egestas. Iaculis massa nisl malesuada lacinia integer nunc posuere. Ut hendrerit semper vel class aptent taciti sociosqu. Ad litora torquent per conubia nostra inceptos himenaeos.Lorem ipsum dolor sit amet consectetur adipiscing elit. Quisque faucibus ex sapien vitae pellentesque sem placerat. In id cursus mi pretium tellus duis convallis. Tempus leo eu aenean sed diam urna tempor. Pulvinar vivamus fringilla lacus nec metus bibendum egestas. Iaculis massa nisl malesuada lacinia integer nunc posuere. Ut hendrerit semper vel class aptent taciti sociosqu. Ad litora torquent per conubia nostra inceptos himenaeos.Lorem ipsum dolor sit amet consectetur adipiscing elit. Quisque faucibus ex sapien vitae pellentesque sem placerat. In id cursus mi pretium tellus duis convallis. Tempus leo eu aenean sed diam urna tempor. Pulvinar vivamus fringilla lacus nec metus bibendum egestas. Iaculis massa nisl malesuada lacinia integer nunc posuere. Ut hendrerit semper vel class aptent taciti sociosqu. Ad litora torquent per conubia nostra inceptos himenaeos.");
    }
    ESP_LOGI(TAG, "Restarting now.");
    fflush(stdout);
    esp_restart();
}

jthoward64-lexcelon · 2026-02-23T16:32:10Z

Yeah, I'll throw something together. For reference, I'm on IDF 6, ESP32-C3, USB-JTAG, in a Linux dev container.

jthoward64-lexcelon · 2026-02-23T21:26:36Z

OK, it's not quite the same presentation, and I don't have the time to replicate the exact cause from my codebase, but this repo has two examples that share the same underlying cause and are fixed by this PR. The first one reboots the ESP32 mid-log, which causes a message to get caught in the buffer:

I (225) repro: msg 294 val=2058 ratio=0.980000 addr=0x498 flag=even extra=43755
I (225) repro: msg 295 val=2065 ratio=0.983333 addr=0x49c flag=odd extra=43754
I (226) repro: msg 296 val=2072 ratio=0.986667 addr=0x4a0 flag=even extra=43749
I (226) repro: msg 297 val=2079 ratio=0.990000 addr=0x4a4 flag=odd extra=43748
I (227) repro: msg 298 val=2086 ratio=0.993333 addr=0x4a8 flag=even extra=43751

,R<?
�-?��~K��<?���ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0xc (RTC_SW_CPU_RST),boot:0xd (SPI_FAST_FLASH_BOOT)
Saved PC:0x4038425e
--- 0x4038425e: esp_restart_noos at /opt/esp/idf/components/esp_system/port/soc/esp32c3/system_internal.c:115
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fcd5830,len:0x1400
load:0x403cbf10,len:0x1240
--- 0x403cbf10: esp_bootloader_get_description at /opt/esp/idf/components/esp_bootloader_format/esp_bootloader_desc.c:40
load:0x403ce710,len:0x2f64
--- 0x403ce710: esp_flash_encryption_enabled at /opt/esp/idf/components/bootloader_support/src/flash_encrypt.c:93
entry 0x403cbf1a
--- 0x403cbf1a: call_start_cpu0 at /opt/esp/idf/components/bootloader/subproject/main/bootloader_start.c:27
I (24) boot: ESP-IDF v6.0-beta1-1452-g48e7e0618d 2nd stage bootloader
I (24) boot: compile time Feb 20 2026 01:26:26
I (25) boot: chip revision: v0.4
I (25) boot: efuse block revision: v1.3
I (26) qio_mode: Enabling default flash chip QIO
I (26) boot.esp32c3: SPI Speed      : 80MHz
I (27) boot.esp32c3: SPI Mode       : QIO
I (27) boot.esp32c3: SPI Flash Size : 4MB

The second one (almost always around message 8000) just spams logs from multiple tasks all at once. It takes longer to trigger than the first repro, but is much closer to how I see the issue in my codebase:

                                          $6<? _�:�?�`M
                                                       $6<? _�:�?�
=�
  $6<? _�:�?�Q�p
                $6<? _�:�?ٙ��8
                             $6<? _�:�?��G�A
                                            $6<? _�:�?�(���
                                                           $6<? _�:�?�p��+
                                                                          $6<? _�:�?޸Q�
                                                                                       $6<? _�:�?��
                                                                                                   $6<? _�:�?��
                                                                                                               $6<? _�:��G� 
                                                                                                                            $6<? _�:�?�� Q
                                                                                                                                          $6<? _�:�?�\ <
                                                                                                                                                        $6<? _�:�?�33@
                                                                                                                                                                      $6<? _�:�?��
@�
  $6<? _�:� ?�z�@ 
                  $6<? _�:�!?��`!
                                 6<? _�:�"?�`"�
                                               $6<? _�:�#?�ff`#�
                                                                $6<? _�:�$?�
=�$
   $6<? _�:�%?��%&
                  $6<? _�:�&?�Q�&
                                 $6<? _�:�'?��'�
                                                $6<? _�:�(?陙�(J
                                                                $6<? _�:�)?�=p�)e
                                                                                 $6<? _�:�*?��G�*�
I (24574) repro: burst message 15044 value=44 ratio=0.8800 ptr=0x2c                               $6<? _�:�+?��+
I (24574) repro: burst message 15045 value=45 ratio=0.9000 ptr=0x2d
I (24575) repro: burst message 15046 value=46 ratio=0.9200 ptr=0x2e
I (24575) repro: burst message 15047 value=47 ratio=0.9400 ptr=0x2f
I (24575) repro: burst message 15048 value=48 ratio=0.9600 ptr=0x30
I (24575) repro: burst message 15049 value=49 ratio=0.9800 ptr=0x31

This one also crashes a bit (I don't particularly care why, to be honest), but it does show the issue eventually.

And, while I can't share it, I have been using this patch in my own development for the past couple of days and haven't seen a single desync, whereas before they were very frequent.

jthoward64-lexcelon · 2026-03-01T21:14:53Z

Anything else I need to do to have this looked at? Or is that repro not working?

peterdragun · 2026-03-02T07:33:06Z

Yes, I managed to reproduce the issue (with a small hack, see below). Thank you for providing it.

Regarding the first log that you posted: I was able to reproduce it, but with your fix, I am missing the start of the boot log:

I (100) repro: msg 299 val=2093 ratio=0.996667 addr=0x4ac flag=odd extra=43750
-api1-20210207
Build:Feb  7 2021
rst:0xc (RTC_SW_CPU_RST),boot:0xd (SPI_FAST_FLASH_BOOT)
Saved PC:0x40384418

Here, the second line should be: ESP-ROM:esp32c3-api1-20210207. IMO, missing data is a bigger issue than the fact that there is some undecoded/garbage data.

Regarding the second issue, I struggled a bit to reproduce it. It looks like it depends heavily on the system used. I tried it on macOS and on two colleagues' Linux machines, and all messages were always decoded correctly, even without this fix. In the end, I was able to reproduce it by adding a small delay (time.sleep(0.01)) right after the read in serial_reader.py. (This should simulate your issue, where the read buffer overflows with data coming from the chip.)
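Presumably the delay makes each read return a larger chunk, with chunk boundaries landing at arbitrary offsets inside frames. The sketch below shows the property a decoder needs in that situation: if a plausible but incomplete frame is carried over to the next call, the decoded frames do not depend on where the chunks were split. It assumes a hypothetical toy frame layout (marker byte, one-byte total length, payload, trailing CRC-8), not the real binlog format, and `toy_find_frames` / `decode_in_chunks` are illustrative names, not the monitor's API.

```python
from typing import List, Tuple

MARKER = 0x1F  # hypothetical start-of-frame byte
MIN_LEN = 4    # marker + length + at least one payload byte + CRC


def crc8(data: bytes) -> int:
    """CRC-8, poly 0x07, init 0, no final XOR (frame + its CRC checks to 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def make_frame(payload: bytes) -> bytes:
    body = bytes([MARKER, len(payload) + 3]) + payload
    return body + bytes([crc8(body)])


def toy_find_frames(data: bytes) -> Tuple[List[bytes], bytes]:
    """Return complete frames plus any trailing partial frame to carry over."""
    frames, i = [], 0
    while i < len(data):
        if data[i] != MARKER:
            i += 1
            continue
        if i + 1 >= len(data):
            return frames, data[i:]  # lone marker at the end: wait for more
        length = data[i + 1]
        if length >= MIN_LEN and i + length > len(data):
            return frames, data[i:]  # plausible but incomplete frame
        frame = data[i:i + length]
        if length >= MIN_LEN and crc8(frame) == 0:
            frames.append(frame)
            i += length
        else:
            i += 1  # not a valid frame: re-sync one byte later
    return frames, b""


def decode_in_chunks(stream: bytes, chunk_size: int) -> List[bytes]:
    """Feed the stream in fixed-size chunks, like successive serial reads."""
    carry, out = b"", []
    for start in range(0, len(stream), chunk_size):
        frames, carry = toy_find_frames(carry + stream[start:start + chunk_size])
        out.extend(frames)
    return out


frames = [make_frame(bytes([65 + n]) * (n % 7 + 1)) for n in range(20)]
stream = b"".join(frames)
for size in (1, 2, 3, 5, 8, 13, 64, len(stream)):
    assert decode_in_chunks(stream, size) == frames
print("decoded identically for every chunk size")
```

Returning `b""` instead of the partial tail would lose any frame that straddles a chunk boundary, which is why the carry-over matters once reads start arriving in large, arbitrarily split bursts.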

I think if we solve issues like the one that I mentioned before, we would be able to accept this. So I would suggest printing the "garbage" data anyway, as it has been until now. The new logic regarding the resync command LGTM.
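Printing the non-frame data anyway can be sketched by remembering where the last valid frame ended and returning everything between valid frames as "leaked" bytes, alongside the frames and any trailing partial frame. This is a simplified toy under an assumed frame layout (marker byte, one-byte total length, payload, trailing CRC-8), not the merged `binlog.py` code; `toy_find_frames` is a hypothetical name. It follows the same idea as the `idx_after_last_frame` / `leaked_chunks` bookkeeping in the patch posted in this thread.

```python
from typing import List, Optional, Tuple

MARKER = 0x1F  # hypothetical start-of-frame byte
MIN_LEN = 4    # marker + length + at least one payload byte + CRC


def crc8(data: bytes) -> int:
    """CRC-8, poly 0x07, init 0, no final XOR (frame + its CRC checks to 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def make_frame(payload: bytes) -> bytes:
    body = bytes([MARKER, len(payload) + 3]) + payload
    return body + bytes([crc8(body)])


def toy_find_frames(data: bytes) -> Tuple[List[bytes], bytes, bytes]:
    """Return (frames, partial frame to carry over, non-frame bytes to print)."""
    frames: List[bytes] = []
    leaked: List[bytes] = []
    after_last = 0                 # index just past the last valid frame
    partial: Optional[int] = None  # start of a plausible but incomplete frame
    i = 0
    while i < len(data):
        if data[i] != MARKER:
            i += 1
            continue
        length = data[i + 1] if i + 1 < len(data) else None
        if length is None or (length >= MIN_LEN and i + length > len(data)):
            partial = i  # wait for the rest of this frame
            break
        frame = data[i:i + length]
        if length >= MIN_LEN and crc8(frame) == 0:
            if i > after_last:
                leaked.append(data[after_last:i])  # text/garbage before this frame
            frames.append(frame)
            i += length
            after_last = i
        else:
            i += 1  # invalid frame: its bytes stay in the leaked region
    end = len(data) if partial is None else partial
    if end > after_last:
        leaked.append(data[after_last:end])
    remaining = b"" if partial is None else data[partial:]
    return frames, remaining, b"".join(leaked)


a, b = make_frame(b"one"), make_frame(b"two")
bad = bytes([MARKER, 5]) + b"xx" + b"\x00"  # wrong CRC byte on purpose
tail = make_frame(b"three")[:4]             # frame cut off mid-way
data = a + b"ESP-ROM:esp32c3\n" + bad + b + b"boot: ok\n" + tail

frames, remaining, leaked = toy_find_frames(data)
print(frames == [a, b])                                      # True
print(remaining == tail)                                     # True
print(leaked == b"ESP-ROM:esp32c3\n" + bad + b"boot: ok\n")  # True
```

Note that the failed frame's bytes end up in the leaked output: that is the "some garbage bytes" cost of never dropping text such as the `ESP-ROM:` prefix.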

peterdragun · 2026-03-02T09:15:55Z

I have tried to solve the issue with missing text in the log. Now all data that is not part of a successfully decoded binary frame is printed (including some garbage bytes). Please let me know if the following works for you as well:

diff --git a/esp_idf_monitor/base/binlog.py b/esp_idf_monitor/base/binlog.py
index efd7d5c413..9ec11465eb 100644
--- a/esp_idf_monitor/base/binlog.py
+++ b/esp_idf_monitor/base/binlog.py
@@ -239,9 +239,11 @@ class BinaryLog:
         which we reject so that scan-ahead continues instead of prematurely breaking."""
         return 1 <= control.level <= 5 and control.version == 0 and control.pkg_len >= 15
 
-    def find_frames(self, data: bytes) -> Tuple[List[bytes], bytes]:
+    def find_frames(self, data: bytes) -> Tuple[List[bytes], bytes, bytes]:
         frames: List[bytes] = []
+        leaked_chunks: List[bytes] = []
         i = 0
+        idx_after_last_frame = 0
         idx_partial_frame = None  # start index of a plausible but incomplete frame
 
         while i < len(data):
@@ -264,7 +266,11 @@ class BinaryLog:
                         continue
                     frame = data[start_idx : start_idx + control.pkg_len]
                     if control.pkg_len != 0 and self.crc8(frame) == 0:
+                        # Collect non-frame bytes before this frame
+                        if start_idx > idx_after_last_frame:
+                            leaked_chunks.append(data[idx_after_last_frame:start_idx])
                         frames.append(frame)
+                        idx_after_last_frame = start_idx + control.pkg_len
                         i += control.pkg_len - 1
                     else:
                         # CRC mismatch – skip this byte and try to re-sync
@@ -278,24 +284,29 @@ class BinaryLog:
 
             i += 1
 
-        # Decide what to carry forward for the next call:
-        # 1. Stopped early on a plausible partial frame → stash from that point so
-        #    the next read can complete it.
-        # 2. Broke at the < 15-byte minimum-length check and the remaining bytes
-        #    start with a binary frame marker → stash them for the same reason.
-        #    If they don't start with a marker they are noise/text and must not be
-        #    carried forward as binary data.
-        # 3. Everything else → return b'' so stale data doesn't keep the caller
-        #    locked in binary mode across a device reset.
+        # Trailing non-frame data (e.g. boot log text after last valid frame)
         if idx_partial_frame is not None:
-            return frames, data[idx_partial_frame:]
-        if i < len(data) and self.detected(data[i]):
-            return frames, data[i:]
-        return frames, b''
+            # Stopped on partial frame: leaked text is from last frame to partial start
+            if idx_partial_frame > idx_after_last_frame:
+                leaked_chunks.append(data[idx_after_last_frame:idx_partial_frame])
+            remaining = data[idx_partial_frame:]
+        elif i < len(data) and self.detected(data[i]):
+            # Remaining bytes start with binary marker; include up to there as leaked
+            if i > idx_after_last_frame:
+                leaked_chunks.append(data[idx_after_last_frame:i])
+            remaining = data[i:]
+        else:
+            # No partial frame; everything after last frame is non-binary
+            if len(data) > idx_after_last_frame:
+                leaked_chunks.append(data[idx_after_last_frame:])
+            remaining = b''
+
+        leaked_text = b''.join(leaked_chunks)
+        return frames, remaining, leaked_text
 
-    def convert_to_text(self, data: bytes) -> Tuple[List[bytes], bytes]:
+    def convert_to_text(self, data: bytes) -> Tuple[List[bytes], bytes, bytes]:
         messages: List[bytes] = []
-        frames, incomplete_fragment = self.find_frames(data)
+        frames, incomplete_fragment, leaked_text = self.find_frames(data)
         for pkg_msg in frames:
             elf_path = self.source_of_message(pkg_msg[0])
             msg = Message(self.debug, elf_path, pkg_msg)
@@ -303,7 +314,7 @@ class BinaryLog:
                 messages += self.format_buffer_message(msg)
             else:
                 messages.append(self.format_message(msg))
-        return messages, incomplete_fragment
+        return messages, incomplete_fragment, leaked_text
 
     def format_message(self, message: Message) -> bytes:
         try:
diff --git a/esp_idf_monitor/base/serial_handler.py b/esp_idf_monitor/base/serial_handler.py
index 606d39ef7d..79c01fdd16 100644
--- a/esp_idf_monitor/base/serial_handler.py
+++ b/esp_idf_monitor/base/serial_handler.py
@@ -204,11 +204,23 @@ class SerialHandler:
 
         sp = self.splitdata(data)
         if self.binary_log_detected:
-            text_lines, self._last_line_part = self.binlog.convert_to_text(sp[0])
+            text_lines, self._last_line_part, leaked_text = self.binlog.convert_to_text(sp[0])
             for line in text_lines:
                 self.print_colored(line)
                 self.logger.handle_possible_pc_address_in_line(line)
                 self.monitor_cmd_executor.execute_from_log_line(line)
+            if leaked_text:
+                leaked_lines = leaked_text.splitlines(keepends=True) or [b'']
+                if leaked_lines and not (
+                    leaked_lines[-1].endswith(b'\n') or leaked_lines[-1].endswith(b'\r')
+                ):
+                    incomplete = leaked_lines.pop()
+                    if not self._last_line_part:
+                        self._last_line_part = incomplete
+                for line in leaked_lines:
+                    self.print_colored(line)
+                    self.logger.handle_possible_pc_address_in_line(line)
+                    self.monitor_cmd_executor.execute_from_log_line(line)
             return
 
         for line in sp:

This essentially combines the old approach with the new logic that you added.
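The serial_handler.py half of the patch prints leaked bytes line by line and holds back an unterminated tail. Below is a standalone sketch of just that splitting step; `split_leaked` and its `carry` argument are hypothetical. Unlike the diff, which stores the tail in `self._last_line_part` only when that attribute is empty (a pending partial frame takes priority), this version simply prepends the previous tail to the next leaked chunk.

```python
from typing import List, Tuple


def split_leaked(leaked: bytes, carry: bytes = b"") -> Tuple[List[bytes], bytes]:
    """Split leaked bytes into complete lines and an unterminated tail.

    `carry` is the tail returned by the previous call; it is prepended so a
    line split across two reads is printed once, in one piece.
    """
    lines = (carry + leaked).splitlines(keepends=True)
    if lines and not lines[-1].endswith((b"\n", b"\r")):
        return lines[:-1], lines[-1]  # last piece has no line ending yet
    return lines, b""


lines, tail = split_leaked(b"ESP-ROM:esp32c3-api1-20210207\nBuild:Feb")
print(lines)  # [b'ESP-ROM:esp32c3-api1-20210207\n']
print(tail)   # b'Build:Feb'
lines, tail = split_leaked(b"  7 2021\n", tail)
print(lines)  # [b'Build:Feb  7 2021\n']
print(tail)   # b''
```

Treating a bare `\r` as a line ending matches the check in the diff; one side effect is that a `\r\n` pair split across two reads prints an extra empty line.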

jthoward64-lexcelon · 2026-03-02T13:47:40Z

Sorry, I'm on US time. Thanks for the patch! And yes, it does seem to work for me.

Closes espressif#36

peterdragun · 2026-03-04T07:59:38Z

@jthoward64-lexcelon Thank you again for contributing and helping out with the reproducer. I had to change a couple of things because of our internal review process and release notes generation from commit messages, so the PR was marked as "closed", but the changes have been merged. Thanks for understanding.

jthoward64-lexcelon · 2026-03-04T14:36:12Z

No worries, thank you! Any idea if this will get a release in the next couple weeks?

peterdragun · 2026-03-05T08:53:46Z

Not sure yet, I would like to include a couple of things in the next release that are not done yet, but I will try to do it at the end of March, maybe early April.

github-actions bot changed the title ~~Fix rapid binary logs~~ Fix rapid binary logs (IDFGH-17270) Feb 20, 2026

espressif-bot added the Status: Opened label Feb 20, 2026

espressif-bot assigned peterdragun Feb 20, 2026

peterdragun requested a review from Copilot February 20, 2026 14:23

Copilot started reviewing on behalf of peterdragun February 20, 2026 14:23

Copilot AI reviewed Feb 20, 2026

esp_idf_monitor/base/serial_handler.py (outdated, resolved)

esp_idf_monitor/base/binlog.py (resolved)

jthoward64-lexcelon force-pushed the fix-fast-binlog branch from 612c7bb to 0f29b80 February 20, 2026 14:30

jthoward64-lexcelon marked this pull request as draft February 20, 2026 14:34

jthoward64-lexcelon force-pushed the fix-fast-binlog branch from 0f29b80 to 29f99bb February 20, 2026 15:24

jthoward64-lexcelon marked this pull request as ready for review February 20, 2026 16:07

espressif-bot added Status: Selected for Development Status: In Progress and removed Status: Opened Status: Selected for Development labels Feb 23, 2026

espressif-bot added Status: Reviewing and removed Status: In Progress labels Mar 2, 2026

Joshua Tag Howard and others added 2 commits March 3, 2026 16:12

fix(binlog): Improve error handling and streamline binary log processing

143cfe2

Closes espressif#36

test(binlog): Add unit tests for BinaryLog class

ac5f499

peterdragun force-pushed the fix-fast-binlog branch from eba4761 to ac5f499 March 3, 2026 15:13

espressif-bot closed this in 143cfe2 Mar 4, 2026

espressif-bot added Status: Done Resolution: NA Resolution: Done and removed Status: Reviewing Resolution: NA labels Mar 4, 2026


4 participants
