node_exporter_metrics: Add file path to error messages#11839
Conversation
Improve error logs to indicate which file path failed when reading metrics, making it easier to debug missing hardware sensors or permissions issues. Signed-off-by: Ross Golder <ross@golder.org>
📝 WalkthroughWalkthroughThis PR refactors three core filesystem-reading utility functions to accept a context pointer for improved error reporting. All 15 collector modules in the node exporter plugin are updated to pass the context at call sites, enabling better error diagnostics through plugin logging. ChangesUtility functions context propagation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44bf2d2665
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (ctx) { | ||
| flb_plg_error(ctx->ins, "could not read from '%s'", p); | ||
| } |
There was a problem hiding this comment.
Log path before freeing it in uint64 read errors
In ne_utils_file_read_uint64, the p buffer is destroyed immediately after a successful open, but the new read-error log path still formats %s with p when read() returns -1. That makes the error path dereference freed memory (undefined behavior), which can produce corrupted logs or crash under allocator checks when I/O errors occur (e.g., transient EIO on sysfs/procfs reads).
Useful? React with 👍 / 👎.
| if (ctx) { | ||
| flb_plg_error(ctx->ins, "could not read from '%s'", p); | ||
| } |
There was a problem hiding this comment.
Avoid use-after-free in SDS read error logging
The same pattern exists in ne_utils_file_read_sds: p is freed right after open() succeeds, then reused in flb_plg_error(..., "%s", p) if read() fails. This introduces a use-after-free in an error path, so a read failure can trigger undefined behavior instead of safe diagnostic logging.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/in_node_exporter_metrics/ne_utils.c`:
- Around line 135-141: The code frees the SDS string p via flb_sds_destroy(p)
before calling read(fd, ...) and then uses p in the error log
flb_plg_error(ctx->ins, "could not read from '%s'", p), leading to a
use-after-free; to fix, delay calling flb_sds_destroy(p) until after the read()
result is handled (i.e., after the if (bytes == -1) block and any flb_plg_error
that references p), and apply the same change to the second identical block
around the other read() (the one at the 280-286 region) so both read/error paths
log p while it remains valid.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 780dcfd7-8ada-4a79-b714-4c89712daf30
📒 Files selected for processing (17)
plugins/in_node_exporter_metrics/ne_cpu_linux.cplugins/in_node_exporter_metrics/ne_cpufreq_linux.cplugins/in_node_exporter_metrics/ne_diskstats_linux.cplugins/in_node_exporter_metrics/ne_filefd_linux.cplugins/in_node_exporter_metrics/ne_hwmon_linux.cplugins/in_node_exporter_metrics/ne_loadavg_linux.cplugins/in_node_exporter_metrics/ne_meminfo_linux.cplugins/in_node_exporter_metrics/ne_netdev_linux.cplugins/in_node_exporter_metrics/ne_netstat_linux.cplugins/in_node_exporter_metrics/ne_nvme_linux.cplugins/in_node_exporter_metrics/ne_processes_linux.cplugins/in_node_exporter_metrics/ne_sockstat_linux.cplugins/in_node_exporter_metrics/ne_stat_linux.cplugins/in_node_exporter_metrics/ne_thermalzone_linux.cplugins/in_node_exporter_metrics/ne_utils.cplugins/in_node_exporter_metrics/ne_utils.hplugins/in_node_exporter_metrics/ne_vmstat_linux.c
| flb_sds_destroy(p); | ||
|
|
||
| bytes = read(fd, &tmp, sizeof(tmp)); | ||
| if (bytes == -1) { | ||
| flb_errno(); | ||
| if (ctx) { | ||
| flb_plg_error(ctx->ins, "could not read from '%s'", p); | ||
| } |
There was a problem hiding this comment.
Use-after-free in read error logging path
Line 135 and Line 280 free p before read(), but Line 140 and Line 285 still log %s using p on read failure. Keep p alive until after read/error handling.
Proposed fix
@@
- flb_sds_destroy(p);
-
bytes = read(fd, &tmp, sizeof(tmp));
if (bytes == -1) {
if (ctx) {
flb_plg_error(ctx->ins, "could not read from '%s'", p);
}
@@
}
close(fd);
+ flb_sds_destroy(p);
return -1;
}
close(fd);
+ flb_sds_destroy(p);
@@
- flb_sds_destroy(p);
-
bytes = read(fd, &tmp, sizeof(tmp));
if (bytes == -1) {
if (ctx) {
flb_plg_error(ctx->ins, "could not read from '%s'", p);
}
@@
}
close(fd);
+ flb_sds_destroy(p);
return -1;
}
close(fd);
+ flb_sds_destroy(p);Also applies to: 280-286
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/in_node_exporter_metrics/ne_utils.c` around lines 135 - 141, The code
frees the SDS string p via flb_sds_destroy(p) before calling read(fd, ...) and
then uses p in the error log flb_plg_error(ctx->ins, "could not read from '%s'",
p), leading to a use-after-free; to fix, delay calling flb_sds_destroy(p) until
after the read() result is handled (i.e., after the if (bytes == -1) block and
any flb_plg_error that references p), and apply the same change to the second
identical block around the other read() (the one at the 280-286 region) so both
read/error paths log p while it remains valid.
Problem: When the node_exporter_metrics input plugin fails to read hardware sensors (hwmon, thermal zones, cpufreq), the error logs only show:
This is useless for debugging because users have no way to know which file/path is missing.
Solution: Add the file path to error messages in ne_utils_file_read_uint64, ne_utils_file_read_sds, and ne_utils_file_read_lines functions.
Example after fix:
Closes #11838
Summary by CodeRabbit