
groups: rework log records counter #11518

Open
edsiper wants to merge 37 commits into master from grouped-log-counter-fix

Conversation

@edsiper
Member

@edsiper edsiper commented Mar 3, 2026

This PR fixes metrics/counter correctness when grouped logs are present.

  • Group markers are no longer counted as real log records in core log-count paths.
  • Output/router/task counters now use effective per-route record/byte totals, including retry/drop paths.
  • Multiple plugins were aligned so their record metrics reflect logical emitted log records (not raw serialized marker count).
  • out_counter now reports both serialized event count and logical log-record count.
  • Added runtime/internal tests to validate counter parity across grouped logs, filters/processors, and retry/drop scenarios.

Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Log-record-aware counting for more accurate record/byte metrics.
    • Per-route tracking of records and bytes for precise routing metrics.
    • Better grouped-log handling for correct single-record semantics.
  • Bug Fixes

    • More accurate counting across outputs, filters, and processors; improved dropped/retry metric reporting.
    • Consistent byte/record accounting during routing and retries.
  • Tests

    • Extensive unit and runtime tests covering grouped logs, counter parity, and routing behaviors.

edsiper added 30 commits March 3, 2026 16:30
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
edsiper added 3 commits March 3, 2026 16:52
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
@edsiper edsiper requested review from a team, braydonk and cosmo0920 as code owners March 3, 2026 22:55
@edsiper edsiper added this to the Fluent Bit v5.0 milestone Mar 3, 2026
@coderabbitai

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Adds a log-record-aware counter flb_mp_count_log_records() and replaces generic record counts with it across core, inputs, filters, outputs and processors; introduces per-route records/bytes tracking in task routes and propagates those values into engine metric updates and routing logic.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Log Record Counting Core: include/fluent-bit/flb_mp.h, src/flb_mp.c | Adds flb_mp_count_log_records(const void *data, size_t bytes) that decodes log events and counts actual log records (excludes group markers); updates timestamp validation for log chunks. |
| Task / Route Data: include/fluent-bit/flb_task.h, src/flb_task.c | Adds records and bytes fields to struct flb_task_route; initializes them from event chunks during task/route construction and exposes inline helpers to set/get route data. |
| Engine (use per-route effective values): src/flb_engine.c, include/fluent-bit/flb_output.h | Engine reads per-route records/bytes (locked) as effective values for metrics and router updates; flb_output paths choose counted chunk and propagate route-data via flb_task_set_route_data. |
| Input chunk and log handling: src/flb_input_chunk.c, src/flb_input_log.c | Input chunk APIs derive total_records/record counts using log-aware counting for FLB_INPUT_LOGS; release/drop paths now track dropped bytes alongside records. |
| Processor / Filter changes: src/flb_processor.c, src/flb_filter.c, plugins/filter_alter_size/alter_size.c | Switches counting calls to flb_mp_count_log_records() for processor/filter logic and alter_size removal flow. |
| Output plugins (count replacements and format changes): plugins/out_*.c, namely plugins/out_azure/azure.c, plugins/out_azure_kusto/azure_kusto.c, plugins/out_azure_logs_ingestion/azure_logs_ingestion.c, plugins/out_bigquery/bigquery.c, plugins/out_counter/counter.c, plugins/out_datadog/datadog.c, plugins/out_forward/..., plugins/out_kafka_rest/kafka.c, plugins/out_logdna/logdna.c, plugins/out_loki/loki.c, plugins/out_nats/nats.c, plugins/out_nrlogs/newrelic.c, plugins/out_oracle_log_analytics/oci_logan.c, plugins/out_s3/s3.c, plugins/out_skywalking/skywalking.c, plugins/out_stackdriver/stackdriver.c | Replaces many flb_mp_count() usages with flb_mp_count_log_records(); forward plugin adds helpers classifying non-log types and adjusts handling for transcoded/compressed payloads; Stackdriver signature changes to return formatted_records and updates retry/metrics code; counter plugin removes instance context and emits JSON with serialized/log counts. |
| Tests (unit and runtime additions): tests/internal/*, tests/runtime/*, tests/runtime/CMakeLists.txt | Adds extensive unit/runtime tests for grouped log semantics, grouped payload builders, task/route retention across retries, and counter parity e2e tests; new tests validate that group markers are excluded from log counts and that metrics align with grouped payload behavior. |

Sequence Diagrams

sequenceDiagram
    participant Engine as flb_engine
    participant Task as flb_task / Route
    participant Output as Output Plugin
    participant Metrics as cmetrics

    Engine->>Task: flb_task_get_route_data(task, output)
    Task-->>Engine: effective_records, effective_bytes
    Engine->>Metrics: update proc/route counters with effective_records/effective_bytes
    Engine->>Output: signal route completion (uses effective_* for metrics/logs)
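
The per-route bookkeeping in the first diagram can be pictured with a small standalone model. This is only a sketch under stated assumptions: the struct layout, MAX_ROUTES, and the helper names task_set_route_data and task_get_route_data are hypothetical stand-ins, not the real flb_task API, and the task lock that the walkthrough mentions is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROUTES 4   /* hypothetical fixed capacity for this sketch */

/* Hypothetical model of one route with its own effective totals. */
struct route {
    int    output_id;
    size_t records;
    size_t bytes;
};

struct task {
    struct route routes[MAX_ROUTES];
    size_t       n_routes;
};

/* Store the effective totals for one output; returns -1 if the route is unknown. */
static int task_set_route_data(struct task *t, int output_id,
                               size_t records, size_t bytes)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < t->n_routes; i++) {
        if (t->routes[i].output_id == output_id) {
            t->routes[i].records = records;
            t->routes[i].bytes = bytes;
            return 0;
        }
    }
    return -1;
}

/* Read the effective totals back; outputs are left untouched on failure. */
static int task_get_route_data(const struct task *t, int output_id,
                               size_t *records, size_t *bytes)
{
    size_t i;

    assert(t != NULL && records != NULL && bytes != NULL);
    for (i = 0; i < t->n_routes; i++) {
        if (t->routes[i].output_id == output_id) {
            *records = t->routes[i].records;
            *bytes = t->routes[i].bytes;
            return 0;
        }
    }
    return -1;
}
```

In this model each output reads only its own totals, which is the property the engine relies on when it updates metrics for one route after a retry or drop.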
sequenceDiagram
    participant Output as Output Plugin
    participant Counter as flb_mp_count_log_records
    participant Decoder as flb_log_event_decoder

    Output->>Counter: flb_mp_count_log_records(data, bytes)
    Counter->>Decoder: init decoder for data
    Decoder-->>Counter: iterate events
    Counter->>Counter: skip GROUP_START/GROUP_END, count actual records
    Counter-->>Output: return count
    Output->>Metrics: use count for payload sizing and stats
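
The counting step in the diagram above can be illustrated with a standalone model. It is a sketch only: it does not call the Fluent Bit decoder, and the enum event_kind and the function count_log_records are hypothetical names standing in for whatever the decoder reports for group start and group end markers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; the real decoder exposes its own marker values. */
enum event_kind {
    EVENT_NORMAL,
    EVENT_GROUP_START,
    EVENT_GROUP_END
};

/* Count only the events that represent real log records. */
static size_t count_log_records(const enum event_kind *events, size_t n)
{
    size_t i;
    size_t count = 0;

    assert(events != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (events[i] == EVENT_GROUP_START || events[i] == EVENT_GROUP_END) {
            continue;   /* group markers carry metadata, not a log record */
        }
        count++;
    }
    return count;
}
```

A grouped chunk with one start marker, two records, and one end marker holds four serialized events but two log records in this model.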

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • #11108 — Related changes to routing metrics and per-route tracking used by engine and outputs.
  • #11055 — Overlaps on task/route structure changes and routing-context adjustments.
  • #11091 — Touches flb_input_log.c and grouped-log handling, related to the updated log-counting semantics.

Suggested labels

docs-required

Suggested reviewers

  • cosmo0920
  • braydonk

Poem

🐰 I nibble markers, skip the fluff,
Counting real logs — that's enough.
Routes now hold their bytes and counts,
Metrics sing from many mounts.
Hoppity-hop, the numbers chuff!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.60% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'groups: rework log records counter' clearly and specifically describes the main focus of the PR: refactoring how log record counts are computed in grouped log scenarios. |


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0826c46dc1


Comment on lines +1216 to +1220
records = counted_event_chunk->total_events;
bytes = counted_event_chunk->size;

flb_task_acquire_lock(task);
flb_task_set_route_data(task, o_ins, records, bytes);


P1: Preserve non-log record counts when storing route data

This now writes per-route records from counted_event_chunk->total_events. For processed metrics, traces, and profiles chunks, however, flb_output_flush_create creates processed_event_chunk with total_events = 0, so flb_output_return persists zero. As a result, handle_output_event later reports the cmt_proc_records, retry, and drop record counters as 0 whenever an output processor is active for those signal types. Before this change, engine counters used task->event_chunk->total_events, so this is a regression in production metrics accuracy for non-log pipelines with processors.
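
One way to picture the concern is a selection helper that falls back to the original chunk when a processed non-log chunk carries no count. This is a hedged sketch, not the fix adopted in the PR: struct chunk, enum signal_type, and route_records are hypothetical names, and the fallback rule itself is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signal types and chunk shape for this sketch. */
enum signal_type { SIGNAL_LOGS, SIGNAL_METRICS, SIGNAL_TRACES, SIGNAL_PROFILES };

struct chunk {
    enum signal_type type;
    size_t           total_events;
    size_t           size;
};

/*
 * Pick the record count to store on the route.
 * Assumption: a processed non-log chunk with total_events == 0 means
 * "count not populated", so the original chunk's count is preserved.
 * For logs, zero is kept because a processor may drop every record.
 */
static size_t route_records(const struct chunk *original,
                            const struct chunk *processed)
{
    assert(original != NULL);

    if (processed == NULL) {
        return original->total_events;
    }
    if (processed->type != SIGNAL_LOGS && processed->total_events == 0) {
        return original->total_events;
    }
    return processed->total_events;
}
```

With this rule a metrics chunk of seven events keeps its count of seven after passing through a processor, while a fully filtered log chunk still reports zero.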


log_records = flb_mp_count_log_records(event_chunk->data,
event_chunk->size);
}
total = serialized_events + log_records;


P2: Stop double-counting records in counter plugin total

The new total value adds serialized_events and log_records, which are overlapping measures for logs; for a normal non-grouped log chunk (serialized_events == log_records) this reports total = 2N, and for grouped chunks it mixes two different units into one field. That makes the emitted total incorrect/misleading compared to the plugin’s record-count intent.
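
The arithmetic behind this comment can be checked with a tiny model. It is a sketch under assumptions: struct counts and both helper functions are hypothetical, and it assumes that group markers are included in the serialized event count, as the PR description suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of measurements for one chunk. */
struct counts {
    size_t serialized_events;   /* every serialized entry, markers included */
    size_t log_records;         /* logical log records only */
};

/* The flagged computation: adds two overlapping measures. */
static size_t total_summed(struct counts c)
{
    return c.serialized_events + c.log_records;
}

/* One non-overlapping alternative: pick a single measure per chunk type. */
static size_t total_effective(struct counts c, int is_logs)
{
    return is_logs ? c.log_records : c.serialized_events;
}
```

For a plain chunk of 4 log records the summed total is 8. For a grouped chunk with one start marker, 3 records, and one end marker (5 serialized events, 3 log records) the summed total is also 8, which matches neither measure.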


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (4)
tests/internal/processor.c (1)

283-289: Avoid asserting out_buf == NULL on full-drop path; assert semantics and free defensively instead.

Line 287 assumes a specific buffer contract that may vary while still being correct (out_size == 0). This can make the test fragile and leak out_buf if allocated.

Suggested fix
@@
-        TEST_CHECK(out_buf == NULL);
         TEST_CHECK(out_size == 0);
+        if (out_buf != NULL && out_buf != mp_buf) {
+            flb_free(out_buf);
+        }
         flb_free(mp_buf);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/internal/processor.c` around lines 283 - 289, The test currently
asserts out_buf == NULL after calling flb_processor_run, which is fragile;
instead verify that out_size == 0 and defensively free any non-NULL out_buf to
avoid leaks. Update the assertions around flb_processor_run (the call using
TEST_CHECK(ret == 0); TEST_CHECK(out_size == 0);) remove the TEST_CHECK(out_buf
== NULL) check, and add logic that if out_buf != NULL then call
flb_free(out_buf) and set out_buf = NULL; continue to flb_free(mp_buf) as
before. Ensure references to flb_processor_run, out_buf, out_size, mp_buf, and
flb_free are used so the change is localized to this test block.
tests/runtime/CMakeLists.txt (1)

197-199: Simplify redundant FLB_IN_LIB gating.

These checks are functionally correct, but FLB_IN_LIB is repeated in both surrounding conditions and macro arguments.

♻️ Suggested simplification
-  if(FLB_IN_LIB)
-    FLB_RT_TEST(FLB_FILTER_GREP "filter_counter_semantics.c")
-  endif()
+  FLB_RT_TEST(FLB_FILTER_GREP "filter_counter_semantics.c")
 endif()

 if(FLB_IN_LIB AND FLB_FILTER_GREP AND FLB_OUT_NULL AND FLB_OUT_HTTP)
-  FLB_RT_TEST(FLB_IN_LIB "counter_parity_e2e.c")
+  FLB_RT_TEST(FLB_FILTER_GREP "counter_parity_e2e.c")
 endif()

Also applies to: 202-203

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/runtime/CMakeLists.txt` around lines 197 - 199, The outer
if(FLB_IN_LIB) ... endif() gating around FLB_RT_TEST calls is redundant because
FLB_RT_TEST already encodes the FLB_IN_LIB requirement; remove the surrounding
if/endif blocks and keep the FLB_RT_TEST(FLB_FILTER_GREP
"filter_counter_semantics.c") invocation (and the other similar FLB_RT_TEST
instances referenced around the same area) so the macro alone controls
inclusion.
plugins/out_forward/forward.c (1)

1260-1272: Consider consolidating event-type helpers into shared forward internals.

The helper logic at Lines 1260-1272 is duplicated in plugins/out_forward/forward_format.c; centralizing this in plugins/out_forward/forward.h would reduce drift risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/out_forward/forward.c` around lines 1260 - 1272, The two helper
functions forward_event_type_is_non_log and
forward_event_type_supports_fluentd_compat are duplicated in forward.c and
forward_format.c; move their declarations to a shared header (e.g., add
prototypes to plugins/out_forward/forward.h) and implement a single copy (or
mark inline) in one C file or the header so both forward.c and forward_format.c
include forward.h and use the common helpers; update includes in both source
files to include forward.h and remove the duplicate definitions to avoid drift.
tests/runtime/out_forward.c (1)

32-62: Extract common instance-lookup helpers into a shared runtime test utility.

get_input_instance_by_name / get_output_instance_by_name duplicate logic already present in other runtime tests; a shared helper would reduce maintenance overhead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/runtime/out_forward.c` around lines 32 - 62, Duplicate lookup logic in
get_input_instance_by_name and get_output_instance_by_name should be moved to a
shared runtime test utility: create a new test helper (e.g.,
runtime_test_utils.c/.h) that exposes these functions
(get_input_instance_by_name, get_output_instance_by_name) operating on flb_ctx_t
and returning flb_input_instance / flb_output_instance, move the implementations
there, update tests that currently define the same helpers to include the new
header instead, and remove the duplicate functions from tests; keep the function
signatures and behavior identical and ensure the build/test harness includes the
new source file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/out_counter/counter.c`:
- Around line 63-71: The total is double-counting because serialized_events
already includes log_records; change the total calculation in counter.c so it
does not add log_records to serialized_events (e.g., set total =
serialized_events or sum only non-overlapping counters), and ensure the printf
that prints total still uses the updated total variable; update references to
total, serialized_events, and log_records accordingly.

In `@src/flb_input_chunk.c`:
- Around line 385-390: The metric uses chunk_size when calling cmt_counter_add
on router->logs_drop_bytes_total, which overcounts for fan-out routes; call
flb_task_get_route_data for the specific task/route to obtain the
route-effective byte count (use that value in the cmt_counter_add call) and only
fall back to chunk_size if flb_task_get_route_data returns NULL or zero; update
the cmt_counter_add calls that reference router->logs_drop_bytes_total (the one
using old_input_chunk and chunk_size and the other similar occurrence later) to
use the route-effective bytes variable instead of chunk_size while preserving
the same tag values (flb_input_name(old_input_chunk->in),
flb_output_name(output_plugin)).
- Around line 360-362: The current check treating dropped_record_count <= 0 as
"unknown" causes explicit zeros to be overwritten; change the condition in the
block using dropped_record_count and
get_input_chunk_record_count(old_input_chunk) to only fallback when the count is
truly unknown (e.g., dropped_record_count < 0 or equal to a sentinel like -1) so
that a valid zero is preserved; apply the same fix to the other occurrences
referenced around the blocks that include dropped_record_count (the occurrences
near lines 524-526 and 547-552).

In `@tests/internal/processor.c`:
- Around line 277-300: The test dereferences pu->ctx (used as f_ins) and calls
cmt_counter_get_val even when flb_processor_init or flb_processor_run fail;
guard these counter reads by checking the return values from flb_processor_init
and flb_processor_run (ret) before accessing pu->ctx or calling
cmt_counter_get_val, e.g., only perform f_ins = pu->ctx and the subsequent
cmt_counter_get_val calls when ret == 0 (and ensure mp_buf is freed on all
paths), so replace the unconditional reads of f_ins/counters with a conditional
block that runs only on successful init/run.

In `@tests/internal/task_map.c`:
- Around line 49-50: The scheduler failure path currently calls
mk_event_loop_destroy(ret_ctx->evl) and then flb_config_exit(ret_ctx->config),
but flb_config_exit also destroys config->evl, causing a double-free; to fix,
remove the explicit mk_event_loop_destroy call in the failure path (or ensure
ret_ctx->evl is nulled before calling flb_config_exit) so that only
flb_config_exit handles event loop teardown; update the code around
mk_event_loop_destroy(ret_ctx->evl) / flb_config_exit(ret_ctx->config) to avoid
destroying the same event loop twice.
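
The two src/flb_input_chunk.c findings above both come down to telling "unknown" apart from a valid zero. A minimal sketch, assuming a sentinel of -1 for an unknown record count and a flag for whether route data was found; RECORD_COUNT_UNKNOWN, resolve_dropped_records, and resolve_dropped_bytes are hypothetical names, not identifiers from the Fluent Bit source.

```c
#include <assert.h>
#include <stddef.h>

#define RECORD_COUNT_UNKNOWN (-1L)   /* hypothetical sentinel for this sketch */

/*
 * Fall back to the chunk's own count only when the caller did not
 * supply one. An explicit zero is a valid answer and is preserved.
 */
static long resolve_dropped_records(long reported, long chunk_count)
{
    assert(chunk_count >= 0);

    if (reported < 0) {
        return chunk_count;
    }
    return reported;
}

/*
 * Prefer the route-effective byte count; use the whole chunk size only
 * when no route data is available or the stored value is zero.
 */
static size_t resolve_dropped_bytes(int have_route_data, size_t route_bytes,
                                    size_t chunk_size)
{
    if (have_route_data && route_bytes > 0) {
        return route_bytes;
    }
    return chunk_size;
}
```

Checking `reported < 0` instead of `reported <= 0` is the whole difference: a caller that reports zero dropped records no longer has that value replaced by the chunk total.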

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7db0c7 and 0826c46.

📒 Files selected for processing (38)
  • include/fluent-bit/flb_mp.h
  • include/fluent-bit/flb_output.h
  • include/fluent-bit/flb_task.h
  • plugins/filter_alter_size/alter_size.c
  • plugins/out_azure/azure.c
  • plugins/out_azure_kusto/azure_kusto.c
  • plugins/out_azure_logs_ingestion/azure_logs_ingestion.c
  • plugins/out_bigquery/bigquery.c
  • plugins/out_counter/counter.c
  • plugins/out_datadog/datadog.c
  • plugins/out_forward/forward.c
  • plugins/out_forward/forward_format.c
  • plugins/out_kafka_rest/kafka.c
  • plugins/out_logdna/logdna.c
  • plugins/out_loki/loki.c
  • plugins/out_nats/nats.c
  • plugins/out_nrlogs/newrelic.c
  • plugins/out_oracle_log_analytics/oci_logan.c
  • plugins/out_s3/s3.c
  • plugins/out_skywalking/skywalking.c
  • plugins/out_stackdriver/stackdriver.c
  • src/flb_engine.c
  • src/flb_filter.c
  • src/flb_input_chunk.c
  • src/flb_input_log.c
  • src/flb_mp.c
  • src/flb_processor.c
  • src/flb_task.c
  • tests/internal/input_chunk.c
  • tests/internal/input_chunk_routes.c
  • tests/internal/log_event_decoder.c
  • tests/internal/processor.c
  • tests/internal/task_map.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/counter_parity_e2e.c
  • tests/runtime/filter_counter_semantics.c
  • tests/runtime/group_counter_semantics.c
  • tests/runtime/out_forward.c

Comment on lines +49 to +50
mk_event_loop_destroy(ret_ctx->evl);
flb_config_exit(ret_ctx->config);

⚠️ Potential issue | 🔴 Critical

Avoid double-destroying the event loop in the scheduler failure path.

Line 49 destroys ret_ctx->evl, and Line 50 immediately calls flb_config_exit(ret_ctx->config), which also destroys config->evl. This can trigger a double free.

🩹 Proposed fix
     ret_ctx->config->evl = ret_ctx->evl;
     ret_ctx->config->sched = flb_sched_create(ret_ctx->config, ret_ctx->evl);
     if(!TEST_CHECK(ret_ctx->config->sched != NULL)) {
-        mk_event_loop_destroy(ret_ctx->evl);
         flb_config_exit(ret_ctx->config);
         flb_free(ret_ctx);
         return NULL;
     }
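
The ownership rule behind this fix can be modeled without any Fluent Bit or Monkey code. In the sketch below every name (struct event_loop, struct config, event_loop_destroy, config_exit, setup_failure_path, destroy_calls) is hypothetical; the only assumption carried over from the comment is that the config teardown also destroys the event loop it points to.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the event loop and config objects. */
struct event_loop { int unused; };
struct config     { struct event_loop *evl; };

static int destroy_calls = 0;   /* counts real event loop teardowns */

static void event_loop_destroy(struct event_loop *evl)
{
    if (evl != NULL) {
        destroy_calls++;
        free(evl);
    }
}

/* The config owns its event loop and is the single place that frees it. */
static void config_exit(struct config *cfg)
{
    assert(cfg != NULL);
    event_loop_destroy(cfg->evl);
    cfg->evl = NULL;
    free(cfg);
}

/* Failure path: hand the whole teardown to config_exit and nothing else. */
static void *setup_failure_path(void)
{
    struct config *cfg = calloc(1, sizeof(*cfg));

    if (cfg == NULL) {
        return NULL;
    }
    cfg->evl = calloc(1, sizeof(*cfg->evl));

    /* Scheduler creation is assumed to have failed at this point. */
    config_exit(cfg);   /* no separate event_loop_destroy(cfg->evl) before this */
    return NULL;
}
```

Calling event_loop_destroy before config_exit in the failure path would free the same pointer twice in this model, which is the defect the review comment describes.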

edsiper added 4 commits March 4, 2026 08:37
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
plugins/out_counter/counter.c (1)

63-63: Consider clarifying or removing the redundant total variable.

Currently total is always equal to serialized_events, providing no additional information. If the intent is to report the "effective record count" (i.e., logical log records for logs, serialized events for other types), consider:

-    total = serialized_events;
+    if (event_chunk->type == FLB_EVENT_TYPE_LOGS) {
+        total = log_records;
+    }
+    else {
+        total = serialized_events;
+    }

Alternatively, if total is intentionally kept equal to serialized_events for backward compatibility or as a placeholder, a brief comment would help clarify the intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/out_counter/counter.c` at line 63, The variable total in
plugins/out_counter/counter.c is redundant because it’s always assigned
serialized_events; either remove total and use serialized_events directly where
referenced, or make total represent the intended “effective record count” by
assigning it conditionally (e.g., total = (record_type == RECORD_TYPE_LOGS ?
records : serialized_events) using the function/enum names present) and update
usages accordingly; if you must keep total for backward compatibility, add a
one-line comment next to the assignment explaining it’s intentionally equal to
serialized_events as a placeholder for future effective-count logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/out_counter/counter.c`:
- Around line 66-71: The change in counter.c replaced the plain numeric output
with a JSON object in the printf call (see the printf that prints
ts/serialized_events/log_records/total), which is a breaking behavior; either
revert that printf to output the original plain numeric value (previously just
printing total) or make the JSON format opt-in via a config flag/behavior
toggle, and in either case update tests in tests/runtime/out_counter.c to assert
the intended output: if keeping JSON, add assertions that the output parses as
JSON and contains the keys "ts","serialized_events","log_records","total" with
correct counts; if reverting, update tests to assert the plain numeric total.
Ensure the change is applied to the printf site and tests reference the same
expected format.

In `@tests/internal/processor.c`:
- Around line 285-299: The test leaks mp_buf when create_grouped_msgpack_records
allocates it but init_ok is false and the current cleanup is inside the init_ok
guard; move or add a flb_free(mp_buf) unconditional cleanup after the init_ok
block so mp_buf is always freed when non-NULL. Also defensively avoid
double-freeing when cleaning out_buf by checking out_buf != NULL && out_buf !=
mp_buf before calling flb_free(out_buf) and nulling pointers after free (same
pattern as used in processor()). Update references around
create_grouped_msgpack_records, mp_buf, out_buf, init_ok, and flb_free
accordingly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1f6d69ee-813a-4341-8aa8-0b198310948e

📥 Commits

Reviewing files that changed from the base of the PR and between 0826c46 and eb1214f.

📒 Files selected for processing (4)
  • include/fluent-bit/flb_output.h
  • plugins/out_counter/counter.c
  • src/flb_input_chunk.c
  • tests/internal/processor.c

Comment on lines +66 to +71
    printf("{\"ts\":%.6f,\"serialized_events\":%zu,\"log_records\":%zu,"
           "\"total\":%zu}\n",
           flb_time_to_double(&tm),
           serialized_events,
           log_records,
           total);

⚠️ Potential issue | 🟡 Minor

Breaking output format change – consider documentation and test updates.

The output changed from a plain numeric value to a JSON object. This is a breaking change for any downstream consumers (scripts, log parsers, dashboards) expecting the previous format.

Additionally, the existing tests in tests/runtime/out_counter.c do not validate output content—they only verify the plugin initializes and handles data without crashing. Consider adding assertions to validate the new JSON structure and counter values.
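
A test along those lines could check one emitted line with sscanf. The following is a sketch, not a JSON parser and not code from tests/runtime/out_counter.c: struct counter_line and parse_counter_line are hypothetical names, and the format string assumes the exact key order produced by the printf call quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for the four fields of one output line. */
struct counter_line {
    double ts;
    size_t serialized_events;
    size_t log_records;
    size_t total;
};

/*
 * Parse one line in the exact layout printed by the plugin.
 * Returns 0 when all four fields were read, -1 otherwise.
 */
static int parse_counter_line(const char *line, struct counter_line *out)
{
    int matched;

    assert(line != NULL && out != NULL);

    matched = sscanf(line,
                     "{\"ts\":%lf,\"serialized_events\":%zu,"
                     "\"log_records\":%zu,\"total\":%zu}",
                     &out->ts,
                     &out->serialized_events,
                     &out->log_records,
                     &out->total);

    return matched == 4 ? 0 : -1;
}
```

A line in the old plain numeric format fails to match the leading brace, so the same helper also detects output that has not switched to the JSON layout.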


Comment on lines +285 to +299
    ret = create_grouped_msgpack_records(&mp_buf, &mp_size);
    TEST_CHECK(ret == 0);
    if (ret == 0 && init_ok == FLB_TRUE) {
        ret = flb_processor_run(proc, 0, FLB_PROCESSOR_LOGS,
                                "TEST", 4, mp_buf, mp_size,
                                &out_buf, &out_size);
        TEST_CHECK(ret == 0);
        TEST_CHECK(out_size == 0);
        if (out_buf != NULL) {
            flb_free(out_buf);
            out_buf = NULL;
        }
        flb_free(mp_buf);
    }


⚠️ Potential issue | 🟡 Minor

Avoid leak (and potential alias double-free) in grouped test cleanup.

On Line 285, mp_buf can be allocated even when processor init failed; but on Line 287 the cleanup path is gated by init_ok, so mp_buf may leak. Also, Line 294 should defensively avoid freeing out_buf when it aliases mp_buf (same defensive pattern used in processor()).

Proposed cleanup fix
-    ret = create_grouped_msgpack_records(&mp_buf, &mp_size);
-    TEST_CHECK(ret == 0);
-    if (ret == 0 && init_ok == FLB_TRUE) {
-        ret = flb_processor_run(proc, 0, FLB_PROCESSOR_LOGS,
-                                "TEST", 4, mp_buf, mp_size,
-                                &out_buf, &out_size);
-        TEST_CHECK(ret == 0);
-        TEST_CHECK(out_size == 0);
-        if (out_buf != NULL) {
-            flb_free(out_buf);
-            out_buf = NULL;
-        }
-        flb_free(mp_buf);
-    }
+    ret = create_grouped_msgpack_records(&mp_buf, &mp_size);
+    TEST_CHECK(ret == 0);
+    if (ret == 0) {
+        if (init_ok == FLB_TRUE) {
+            ret = flb_processor_run(proc, 0, FLB_PROCESSOR_LOGS,
+                                    "TEST", 4, mp_buf, mp_size,
+                                    &out_buf, &out_size);
+            TEST_CHECK(ret == 0);
+            TEST_CHECK(out_size == 0);
+            if (out_buf != NULL && out_buf != mp_buf) {
+                flb_free(out_buf);
+                out_buf = NULL;
+            }
+        }
+        flb_free(mp_buf);
+    }
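
The cleanup pattern in the proposed fix can be exercised in isolation. This sketch uses plain malloc and free instead of flb_free, and release_buffers is a hypothetical helper written for illustration; it returns the number of frees performed so the alias case can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Free an input buffer and an output buffer that may alias it.
 * The output buffer is freed only when it is a distinct allocation,
 * and the input buffer is always freed when non-NULL.
 * Both pointers are cleared so later cleanup cannot free them again.
 */
static int release_buffers(void **in_buf, void **out_buf)
{
    int frees = 0;

    assert(in_buf != NULL && out_buf != NULL);

    if (*out_buf != NULL && *out_buf != *in_buf) {
        free(*out_buf);
        frees++;
    }
    *out_buf = NULL;

    if (*in_buf != NULL) {
        free(*in_buf);
        frees++;
    }
    *in_buf = NULL;

    return frees;
}
```

Because the input buffer is released outside any success check, it cannot leak when the processor was never run, and the alias guard keeps a pass-through output from being freed twice.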
