in_node_exporter_metrics: Increase buffer size to read /proc/stat correctly #11253

Merged: edsiper merged 1 commit into fluent:master from piwai:master on Dec 5, 2025

piwai (Contributor) commented Dec 4, 2025

Hello,

For weeks we have been experiencing an issue with fluent-bit where the CPU metrics provided by the node exporter, such as node_cpu_seconds_total, would stall for no apparent reason. After a few days the metrics would "unfreeze" and resume, and we did not understand why.

After recompiling fluent-bit with added debug output, I traced the problem to the ne_utils_file_read_lines() function:

int ne_utils_file_read_lines(const char *mount, const char *path, struct mk_list *list)

This function uses a 512-byte buffer to read lines, which is insufficient to read /proc/stat entries correctly, because the "intr" line can be longer than that.

Most of the time this is not an issue. It fails only when the length of the line being read is a multiple of (buffer size - 2): fgets() is called with sizeof(line) - 1 = 511 and therefore stores at most 510 characters per call, so a line of exactly 510 or 1020 characters leaves only the newline for the final call. The fgets() loop then produces an empty line, which causes an error further up the call stack and prevents the CPU metrics from being updated. As soon as some interrupt counter gains a digit (e.g. 99 -> 100), the problem disappears, since the line becomes one character longer and the final read is no longer empty.
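
The boundary condition described above can be checked with a little arithmetic. The sketch below is illustrative only: `chunk_payload` and `leaves_empty_chunk` are hypothetical helper names, not Fluent Bit functions, and the model assumes the read loop calls fgets(line, sizeof(line) - 1, f) as the reproducer in this description does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Characters stored per call when the loop uses
 * fgets(line, buf_size - 1, f): fgets() stores at most (n - 1)
 * characters plus the terminating NUL, so the payload is buf_size - 2.
 */
static size_t chunk_payload(size_t buf_size)
{
    assert(buf_size > 2);
    return buf_size - 2;
}

/*
 * Returns 1 when a line of line_len characters (newline excluded) is
 * consumed exactly by full chunks, leaving a lone '\n' for the final
 * fgets() call. After newline trimming that final chunk is an empty string.
 */
static int leaves_empty_chunk(size_t line_len, size_t buf_size)
{
    return line_len > 0 && line_len % chunk_payload(buf_size) == 0;
}
```

Under these assumptions, leaves_empty_chunk(1020, 512) returns 1, while leaves_empty_chunk(1021, 512) and leaves_empty_chunk(1020, 2048) both return 0, which matches the "stalls, then unfreezes once a counter gains a digit" behaviour.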

Sample /proc/stat file showing the issue, with an intr line of 1020 characters (obtained from an Ubuntu 20.04 LTS host):

$ cat proc_stat.txt
cpu  644441261 250278 193710764 6839564199 150758 0 21261544 4590423 0 0
cpu0 160903415 60997 48404023 1709556132 38581 0 7668248 1152730 0 0
cpu1 160732678 64155 48446004 1710342339 38188 0 5804330 1145684 0 0
cpu2 161798128 63702 48389798 1709541165 35120 0 4468764 1145420 0 0
cpu3 161007038 61422 48470938 1710124561 38867 0 3320201 1146587 0 0
intr 32324448099 29 9 0 0 0 0 3 0 1 0 9724876 32 15 0 0 19295503 0 0 0 0 0 0 0 0 0 2367 0 0 0 9209232 9791754 9997988 8338376 0 1001160064 916738961 0 410030264 436692148 0 652365018 480804600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 150886032456
btime 1745401938
processes 111586863
procs_running 1
procs_blocked 0
softirq 19224384590 0 1384271522 571488 1539519326 9648544 0 137410393 4010803922 21329 3552203474

Sample reproducer program, using a copy of ne_utils_file_read_lines() as main:

$ cat test.c
#include <stdio.h>
#include <string.h>


// copy of ne_utils_file_read_lines() in plugins/in_node_exporter_metrics/ne_utils.c
int main (int argc, char **argv) {
    
    int len;
    int ret;
    FILE *f;
    char line[512];
    // char real_path[2048];

    // mk_list_init(list);

    // /* Check the path starts with the mount point to prevent duplication. */
    // if (strncasecmp(path, mount, strlen(mount)) == 0 &&
    //     path[strlen(mount)] == '/') {
    //     mount = "";
    // }


    f = fopen(argv[1], "r");
    if (f == NULL) {
        printf("Cannot open %s\n", argv[1]);
        //flb_errno();
        return -1;
    }

    /* Read the content */
    while (fgets(line, sizeof(line) - 1, f)) {
        len = strlen(line);
        if (line[len - 1] == '\n') {
            line[--len] = 0;
            if (len && line[len - 1] == '\r') {
                line[--len] = 0;
            }
        }
        printf("line of %d bytes: %s \n", len, line);
        if (len == 0) {
            printf("!!!!!!!!! ERROR, line has no len, flb_slist_add will fail!!!!!!!!!!!!\n");
        }

        //ret = flb_slist_add(list, line);
        //if (ret == -1) {
        //    fclose(f);
        //    flb_slist_destroy(list);
        //    return -1;
        //}

    }

    fclose(f);
    return 0;
}

Sample output:

$ gcc test.c
$ ./a.out proc_stat.txt
line of 72 bytes: cpu  644441261 250278 193710764 6839564199 150758 0 21261544 4590423 0 0
line of 68 bytes: cpu0 160903415 60997 48404023 1709556132 38581 0 7668248 1152730 0 0
line of 68 bytes: cpu1 160732678 64155 48446004 1710342339 38188 0 5804330 1145684 0 0
line of 68 bytes: cpu2 161798128 63702 48389798 1709541165 35120 0 4468764 1145420 0 0
line of 68 bytes: cpu3 161007038 61422 48470938 1710124561 38867 0 3320201 1146587 0 0
line of 510 bytes: intr 32324448099 29 9 0 0 0 0 3 0 1 0 9724876 32 15 0 0 19295503 0 0 0 0 0 0 0 0 0 2367 0 0 0 9209232 9791754 9997988 8338376 0 1001160064 916738961 0 410030264 436692148 0 652365018 480804600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
line of 510 bytes:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
line of 0 bytes:
!!!!!!!!! ERROR, line has no len, flb_slist_add will fail!!!!!!!!!!!!
line of 17 bytes: ctxt 150886032456
line of 16 bytes: btime 1745401938
line of 19 bytes: processes 111586863
line of 15 bytes: procs_running 1
line of 15 bytes: procs_blocked 0
line of 98 bytes: softirq 19224384590 0 1384271522 571488 1539519326 9648544 0 137410393 4010803922 21329 3552203474

The proposed fix is to increase the line buffer size from 512 to 2048 bytes (1024 seems a bit low given that the intr line is already 1020 characters long).
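
To compare the two buffer sizes without needing a real /proc/stat, here is a minimal sketch that writes a synthetic line to a temporary file and reads it back with the same fgets(buf, size - 1, f) pattern and newline trimming as the reproducer. `count_chunks` is a hypothetical helper written for this illustration, not a Fluent Bit function, and the buffer is heap-allocated only so that its size can be a parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Writes one line of line_len '0' characters plus '\n' to a temporary
 * file, then reads it back in chunks of at most buf_size - 2 characters.
 * Returns the number of chunks read (or -1 on setup failure) and sets
 * *empty_seen to 1 if any chunk is empty after newline trimming.
 */
static int count_chunks(size_t line_len, size_t buf_size, int *empty_seen)
{
    FILE *f;
    char *buf;
    size_t i;
    size_t len;
    int chunks = 0;

    assert(buf_size > 2);
    *empty_seen = 0;

    f = tmpfile();
    if (f == NULL) {
        return -1;
    }
    buf = malloc(buf_size);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }

    for (i = 0; i < line_len; i++) {
        fputc('0', f);
    }
    fputc('\n', f);
    rewind(f);

    while (fgets(buf, (int) buf_size - 1, f)) {
        len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[--len] = '\0';
        }
        if (len == 0) {
            *empty_seen = 1;   /* the entry that would make flb_slist_add fail */
        }
        chunks++;
    }

    free(buf);
    fclose(f);
    return chunks;
}
```

For a 1020-character line, count_chunks(1020, 512, &e) yields three chunks with an empty one, while count_chunks(1020, 2048, &e) yields a single non-empty chunk. The same model suggests the larger buffer moves the boundary rather than removing it: a line of exactly 2046 characters would still leave an empty final chunk.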


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found (static buffer)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Increased the maximum supported line length for file reading, allowing the system to correctly handle longer lines from diverse data sources.
    • Improves compatibility and reduces failures/truncation when processing larger or unexpectedly long input lines.


coderabbitai bot commented Dec 4, 2025

Walkthrough

In ne_utils_file_read_lines, the line-read buffer size was increased from 512 to 2048 bytes; file open, line reading, newline trimming, and appending behavior remain unchanged. No public API or function signatures were modified.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Buffer expansion in file reader: plugins/in_node_exporter_metrics/ne_utils.c | Increased line-read buffer from 512 to 2048 bytes in ne_utils_file_read_lines; logic for opening, reading, trimming, and appending lines unchanged. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10–15 minutes

  • Verify buffer allocation location (stack vs heap) and lifetime
  • Search for any code that assumes the previous 512-byte limit
  • Confirm 2048 bytes fits expected metric line sizes

Possibly related PRs

Poem

My nose twitches at the code,
I hop where buffers stretch and grow,
From tiny steps of five-one-two,
To roomy bounds of two-oh-four-oh,
Metrics munching, happy rabbit, go! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title accurately describes the main change: increasing buffer size in the in_node_exporter_metrics plugin from 512 to 2048 bytes to fix /proc/stat reading issues. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
plugins/in_node_exporter_metrics/ne_utils.c (1)

175-190: Consider explicit handling of partial reads for future robustness.

The current logic adds each fgets result as a separate list entry. While the buffer increase makes partial reads unlikely, lines longer than 2046 characters (counting the trailing newline) would still be split across multiple entries. For maximum robustness, consider:

  1. Accumulating chunks until a newline is found, or
  2. Explicitly skipping empty lines (len == 0) before calling flb_slist_add

Given that /proc/stat lines are unlikely to exceed 2KB in practice, this is purely a future-proofing suggestion.
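
The first option above (accumulating chunks until a newline is found) could look roughly like the sketch below. It is not what this PR merged, which only changes the buffer size; `read_full_line` is a hypothetical helper name, and it uses plain malloc/realloc rather than any Fluent Bit allocator or list API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Reads one logical line from f, joining as many fgets() chunks as
 * needed. Returns a heap-allocated string without the trailing "\n" or
 * "\r\n" (the caller frees it), or NULL on EOF or allocation failure.
 */
static char *read_full_line(FILE *f)
{
    char chunk[512];
    char *line = NULL;
    size_t len = 0;

    assert(f != NULL);

    while (fgets(chunk, sizeof(chunk), f)) {
        size_t n = strlen(chunk);
        char *tmp = realloc(line, len + n + 1);

        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);   /* copy the chunk and its NUL */
        len += n;

        /* A newline marks the end of the logical line: trim it and stop. */
        if (len > 0 && line[len - 1] == '\n') {
            line[--len] = '\0';
            if (len > 0 && line[len - 1] == '\r') {
                line[--len] = '\0';
            }
            break;
        }
    }

    return line;
}
```

With this approach the chunk size no longer determines how a line is split, so a 1020-character intr line comes back as a single string whatever the size of `chunk`. The trade-off is one heap allocation per line instead of a fixed stack buffer.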

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c88c545 and 7fc057f.

📒 Files selected for processing (1)
  • plugins/in_node_exporter_metrics/ne_utils.c (1 hunks)
🔇 Additional comments (1)
plugins/in_node_exporter_metrics/ne_utils.c (1)

156-156: LGTM! Buffer increase addresses the reported issue.

The buffer size increase from 512 to 2048 bytes directly fixes the bug where /proc/stat's "intr" line (~1020 characters) was causing intermittent stalls. With a 2048-byte buffer, lines of up to 2046 characters (including the trailing newline) are read in a single fgets call, providing comfortable headroom for the reported use case.

edsiper (Member) commented Dec 4, 2025

Thanks for your contribution, please fix your commit subject:

Run python .github/scripts/commit_prefix_check.py
❌ Commit 7fc057f3fc failed:
Missing prefix in commit subject: 'fix(ne_utils): Increase buffer size to read /proc/stat correctly'
Commit prefix validation failed.
Error: Process completed with exit code 1.

it must be prefixed with in_node_exporter_metrics: ...

…rectly

The "intr" entry of proc stat can be larger than 512 chars, and generate errors
leading to stalled CPU metrics if it's the wrong length.

Signed-off-by: Pierre-Yves Rofes <3604235+piwai@users.noreply.github.com>

piwai (Contributor, Author) commented Dec 5, 2025

@edsiper thanks for the review, I updated the commit title as requested.

cosmo0920 changed the title from "fix(ne_utils): Increase buffer size to read /proc/stat correctly" to "in_node_exporter_metrics: Increase buffer size to read /proc/stat correctly" on Dec 5, 2025
edsiper merged commit 7ded9ae into fluent:master on Dec 5, 2025
59 of 61 checks passed

edsiper (Member) commented Dec 5, 2025

thanks!
