in_node_exporter_metrics: Increase buffer size to read /proc/stat correctly#11253
in_node_exporter_metrics: Increase buffer size to read /proc/stat correctly#11253edsiper merged 1 commit intofluent:masterfrom
Conversation
WalkthroughIn Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10–15 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
plugins/in_node_exporter_metrics/ne_utils.c (1)
175-190: Consider explicit handling of partial reads for future robustness.The current logic adds each fgets result as a separate list entry. While the buffer increase makes partial reads unlikely, lines exceeding 2046 characters would still be split across multiple entries. For maximum robustness, consider:
- Accumulating chunks until a newline is found, or
- Explicitly skipping empty lines (len == 0) before calling
flb_slist_addGiven that /proc/stat lines are unlikely to exceed 2KB in practice, this is purely a future-proofing suggestion.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/in_node_exporter_metrics/ne_utils.c(1 hunks)
🔇 Additional comments (1)
plugins/in_node_exporter_metrics/ne_utils.c (1)
156-156: LGTM! Buffer increase addresses the reported issue.The buffer size increase from 512 to 2048 bytes directly fixes the bug where /proc/stat's "intr" line (~1020 characters) was causing intermittent stalls. With a 2048-byte buffer, lines up to 2046 characters will be read atomically, providing comfortable headroom for the reported use case.
|
Thanks for your contribution, please fix your commit subject: it must be prefixed with |
…rectly The "intr" entry of proc stat can be larger than 512 chars, and generate errors leading to stalled CPU metrics if it's the wrong length. Signed-off-by: Pierre-Yves Rofes <3604235+piwai@users.noreply.github.com>
|
@edsiper thanks for the review, I updated the commit title as requested. |
|
thanks! |
Hello,
for weeks we've been experiencing an issue with fluent-bit where the CPU metrics provided by node exporter like
node_cpu_seconds_totalwould stall, for an unknown reason. Also, after a few days, the metrics would "unfreeze" and resume, and we really didn't understand why.After recompiling and adding debug into fluent-bit, I managed to trace it down to an issue in the "ne_utils_file_read_lines()" function:
fluent-bit/plugins/in_node_exporter_metrics/ne_utils.c
Line 151 in 712b48d
This function uses a 512 bytes buffer to read lines, which is insufficient to read
/proc/statentries correctly. Indeed, the "intr" line can be larger.Most of the time, this isn't an issue, except when the line being read has a length which is multiple of (buffer size -2), which cause the fgets() loop to produce an empty line, causing an error up in the call stack, and preventing the CPU metrics to be updated. But as soon as some interrupt counter reaches an additional digit (e.g 99 -> 100), the problem disappears since the line length will increase, and there will be a single character to read.
Sample /proc/stat file showing the issue, with intr line of 1020 chars (obtained from an Ubuntu 20.04 LTS):
Sample reproducer script, using a copy of ne_utils_file_read_lines() as main:
Sample output:
Proposed fix is to increase the readline buffer size from 512 to 2048 bytes (1024 seems a bit low given the size of the intr line which is 1020 chars)
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.