Skip to content

in_tail: fix data loss with buffered data on shutdown and gzip files#11269

Open
jinyongchoi wants to merge 3 commits intofluent:masterfrom
jinyongchoi:fix/11265-in-tail-data-loss
Open

in_tail: fix data loss with buffered data on shutdown and gzip files#11269
jinyongchoi wants to merge 3 commits intofluent:masterfrom
jinyongchoi:fix/11265-in-tail-data-loss

Conversation

@jinyongchoi
Copy link
Contributor

@jinyongchoi jinyongchoi commented Dec 9, 2025

Unprocessed data in the internal buffer is discarded when Fluent Bit stops, causing data loss because the DB offset is already advanced.

This patch fixes the issue by rewinding the file offset by the remaining buffer length on exit, ensuring data is re-read on restart.

For compressed gzip files, a separate issue caused data duplication after restart because skip_bytes was incorrectly decremented during runtime. A new field 'exclude_bytes' is introduced as a runtime-only counter, preserving skip_bytes for correct DB persistence.

Additionally, this patch prevents resurrecting deleted file entries in the DB by resetting db_id to 0 upon deletion and checking it before updating the offset.

The SQLite schema is updated to include 'anchor_offset' and 'skip_bytes' columns. On upgrade from older versions, these columns are automatically added via ALTER TABLE if they do not exist.

Closes #11265


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
[SERVICE]
    flush 2
    grace 60
    log_level debug
    log_file /tmp/testing/logs/testing.log
    parsers_file /tmp/testing/parsers.conf
    plugins_file /tmp/testing/plugins.conf
    http_server on
    http_listen 0.0.0.0
    http_port 22002

    storage.path /tmp/testing/storage
    storage.metrics on
    storage.max_chunks_up 512
    storage.sync full
    storage.checksum off
    storage.backlog.mem_limit 100M

[INPUT]
    Name tail
    Path /tmp/testing.input
    Tag testing
    Key message
    Offset_Key   log_offset

    Read_from_Head true
    Refresh_Interval 3
    Rotate_Wait 31557600

    Buffer_Chunk_Size 1MB
    Buffer_Max_Size 16MB
    Inotify_Watcher false

    storage.type filesystem
    storage.pause_on_chunks_overlimit true

    DB /tmp/testing/storage/testing.db
    DB.sync normal
    DB.locking false

    Alias input_log

[OUTPUT]
    Name file
    Match *
    File /tmp/testing.out
[SERVICE]
    flush 2
    grace 60
    log_level debug
    log_file /tmp/testing/logs/testing.log
    parsers_file /tmp/testing/parsers.conf
    plugins_file /tmp/testing/plugins.conf
    http_server on
    http_listen 0.0.0.0
    http_port 22002

    storage.path /tmp/testing/storage
    storage.metrics on
    storage.max_chunks_up 512
    storage.sync full
    storage.checksum off
    storage.backlog.mem_limit 100M

[INPUT]
    Name tail
    Path /tmp/testing.input.gz
    Tag testing
    Key message
    Offset_Key   log_offset

    Read_from_Head true
    Refresh_Interval 3
    Rotate_Wait 31557600

    Buffer_Chunk_Size 1MB
    Buffer_Max_Size 16MB
    Inotify_Watcher false

    storage.type filesystem
    storage.pause_on_chunks_overlimit true

    DB /tmp/testing/storage/testing.db
    DB.sync normal
    DB.locking false

    Alias input_log

[OUTPUT]
    Name file
    Match *
    File /tmp/testing.out
  • Debug log output from testing the change
normal file
[2025/12/15 20:40:56.47094045] [debug] [input:tail:input_log] inode=50643270 rewind offset for /tmp/testing.input: old=185883589 new=185883490 (buf_len=99)

compressed file
[2025/12/15 20:45:12.615579997] [debug] [input:tail:input_log] Skipping: anchor=0 offset=0 exclude=1119419529 decompressed=999999
[2025/12/15 20:45:12.617241577] [debug] [input:tail:input_log] Skipping: anchor=0 offset=999999 exclude=1118419530 decompressed=15809
...
[2025/12/15 20:45:15.408197399] [debug] [input:tail:input_log] Skipping: anchor=0 offset=10923918 exclude=1014921 decompressed=999999
[2025/12/15 20:45:15.408206153] [debug] [input:tail:input_log] Skipping: anchor=0 offset=10923918 exclude=14922 decompressed=15809
[2025/12/15 20:45:20.13095551] [debug] [input:tail:input_log] Gzip member completed: updating anchor from 0 to 10923918, resetting skip from 2147483784 to 0
  • Attached Valgrind output that shows no leaks or memory corruption was found
valgrind --leak-check=full ./bin/fluent-bit -v -c ./fluentbit.conf
...
==546544== 
==546544== HEAP SUMMARY:
==546544==     in use at exit: 0 bytes in 0 blocks
==546544==   total heap usage: 1,973,893 allocs, 1,973,893 frees, 2,123,730,947 bytes allocated
==546544== 
==546544== All heap blocks were freed -- no leaks are possible
==546544== 
==546544== For lists of detected and suppressed errors, rerun with: -s
==546544== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Persistent resume for gzip-compressed logs, database schema migration to store resume state, and a new sentinel for "no DB id".
  • Bug Fixes

    • Improved seek/offset handling for compressed and plain files; correct behavior across truncation, rotation, removal and shutdown; reliable skipping of already-processed decompressed bytes.
  • Tests

    • Added DB + gzip tests covering resume loss, append/inotify append, rotation, and multi-resume.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds gzip-aware resume state and DB-backed bookkeeping: new per-file fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode), DB schema migration and SQL updates for skip/anchor, seek/processing changes to respect gzip decompression and member boundaries, rewind DB offsets when buffered data remains, and new gzip/DB tests.

Changes

Cohort / File(s) Summary
Core file & decompression logic
plugins/in_tail/tail_file.c
Add gzip-aware resume logic and includes; choose seek position from DB/anchor/read_from_head; initialize anchor/exclude/skipping after seek; apply skip/exclude logic during chunk processing; increment/reset skip_bytes across gzip members; rewind DB offset on removal when buffered data exists (unless decompression active).
DB layer & SQL
plugins/in_tail/tail_db.c, plugins/in_tail/tail_sql.h
Add pragma-based migration to add skip/anchor columns; replace in-file query_status with cb_column_exists; extend db_file_exists to return skip and anchor; bind/read skip/anchor in insert/update/offset/rotate/delete paths; reset db_id to FLB_TAIL_DB_ID_NONE after delete.
File struct & internal headers
plugins/in_tail/tail_file_internal.h, plugins/in_tail/tail.h
Add per-file fields anchor_offset (int64_t), skip_bytes (uint64_t), exclude_bytes (uint64_t), skipping_mode (int); add macro FLB_TAIL_DB_ID_NONE.
FS event / stat handling
plugins/in_tail/tail_fs_inotify.c, plugins/in_tail/tail_fs_stat.c
On truncation (size_delta < 0) initialize anchor_offset, skip_bytes, exclude_bytes, and skipping_mode alongside offset and buf_len.
File removal & counters
plugins/in_tail/tail_file.c (remove/adjust_counters)
On removal, if buffered data exists and no decompression, rewind file->offset by buf_len (floor 0) and persist to DB; on truncation reset anchor/skip/exclude/skipping and update DB when enabled.
SQL definitions
plugins/in_tail/tail_sql.h
Add skip and anchor columns (DEFAULT 0) to in_tail_files; update SQL_INSERT_FILE and SQL_UPDATE_OFFSET to include/handle skip and anchor.
Tests & helpers
tests/runtime/in_tail.c
Add raw write helper, gzip create/append utilities, gzip-resume inspection and wait utility; add DB/gzip resume, append and rotation tests and register them in TEST_LIST.

Sequence Diagram(s)

sequenceDiagram
    participant Disk as Disk File
    participant Reader as in_tail Reader
    participant Decompress as Gzip Decompressor
    participant Buffer as In-memory Buffer
    participant DB as SQLite DB

    Disk->>Reader: read compressed/raw bytes (advance raw offset)
    Reader->>Buffer: append raw bytes
    alt decompression_context (gzip)
        Buffer->>Decompress: feed compressed bytes
        Decompress-->>Buffer: decompressed data (may span members)
        Buffer->>Buffer: apply exclude_bytes / skipping_mode (drop initial decompressed bytes)
        alt gzip member boundary reached
            Decompress->>Reader: notify member end
            Reader->>Reader: set anchor_offset (member start), reset skip_bytes
        end
        Reader->>DB: persist raw offset, skip, anchor
    else no decompression
        Buffer->>DB: persist raw offset
    end
    alt Shutdown with buffered unprocessed data and no decompression
        Reader->>DB: rewind offset (offset -= buf_len) and persist
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Pay attention to:
    • DB migration and SQL binding/column ordering (plugins/in_tail/tail_db.c, plugins/in_tail/tail_sql.h).
    • Seek selection and post-seek initialization for compressed streams (plugins/in_tail/tail_file.c).
    • Lifecycle and transitions of skip_bytes / exclude_bytes / skipping_mode across reads and gzip-member boundaries.
    • Correctness of offset rewind logic in flb_tail_file_remove and DB synchronization on shutdown.
    • New test helpers and gzip test correctness in tests/runtime/in_tail.c.

Possibly related PRs

Suggested labels

backport to v4.0.x, backport to v4.1.x

Suggested reviewers

  • edsiper
  • cosmo0920
  • leonardo-albertovich
  • koleini
  • fujimotos

Poem

🐇
I nibble compressed bytes at dawn,
I mark anchors where members yawn.
When partial lines hide from sight,
I hop back offsets through the night.
Hops, skips, and crumbs — I make logs right.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.38% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main objectives: fixing data loss with buffered data on shutdown and adding gzip file handling.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding requirements from issue #11265: implements offset rewinding for uncompressed files, handles gzip with schema changes and resume logic, resets db_id to prevent resurrection, and adds DB migration for new columns.
Out of Scope Changes check ✅ Passed All changes are scoped to addressing the data loss issue: buffer handling, gzip resume logic, DB schema migration, and test coverage. No unrelated modifications were detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8640940 and fdd8174.

📒 Files selected for processing (6)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (11 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • plugins/in_tail/tail_fs_inotify.c
  • plugins/in_tail/tail_file_internal.h
  • plugins/in_tail/tail_sql.h
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_db.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (366-399)
plugins/in_tail/tail_db.c (2)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
plugins/in_tail/tail_file.h (1)
  • flb_tail_target_file_name_cmp (64-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (16)
plugins/in_tail/tail_fs_stat.c (1)

139-142: LGTM! Consistent gzip state reset on truncation.

The initialization of anchor_offset, skip_bytes, exclude_bytes, and skipping_mode on file truncation is correct and consistent with the corresponding logic in tail_fs_inotify.c and adjust_counters() in tail_file.c.

plugins/in_tail/tail_file.c (8)

47-47: LGTM!

Required include for the decompression state constants used in the gzip member boundary detection logic.


1031-1096: Well-structured gzip-aware DB position handling.

The logic correctly handles three scenarios:

  1. Gzip files with persisted anchor/skip: seeks to anchor_offset and enables skipping mode
  2. Normal files or legacy gzip entries: uses the stored offset
  3. No DB position with read_from_head=off: seeks to EOF

The comment at lines 1089-1094 appropriately documents the multi-member gzip limitation where skip_bytes only tracks bytes within the current member. This is a known limitation acknowledged in the PR discussion.


1130-1138: LGTM!

Correctly initializes gzip resume fields for compressed files without DB persistence. Setting stream_offset = 0 is intentional as there's no prior decompressed byte count to track.


1329-1338: LGTM!

Good initialization of the gzip resume fields. Using FLB_TAIL_DB_ID_NONE instead of a magic 0 improves code clarity and consistency with the deletion path.


1509-1548: Correct offset rewind implementation for uncompressed files.

The logic properly:

  1. Rewinds offset by buf_len for uncompressed files to prevent data loss on restart
  2. Logs a debug message documenting the acknowledged gzip limitation
  3. Uses db_id > FLB_TAIL_DB_ID_NONE check to avoid resurrecting deleted DB entries (addressing the past review concern)

The gzip limitation (lines 1537-1541) is intentional and documented in the PR objectives—mapping decompressed buffer positions back to compressed offsets is infeasible with streaming decompression.


1668-1671: LGTM!

Correctly resets all gzip resume state fields on file truncation, consistent with the truncation handling in tail_fs_stat.c and tail_fs_inotify.c.


1872-1892: Correct implementation of decompressed data skipping.

The skip logic properly handles:

  1. Full skip: when exclude_bytes >= decompressed_data_length, decrement and discard all data
  2. Partial skip: calculate remaining bytes, use memmove for the overlapping buffer shift, clear skipping_mode

Using memmove is correct here since source and destination overlap.


1933-1954: Solid gzip member boundary detection and anchor update.

The logic correctly:

  1. Tracks decompressed bytes within the current member via skip_bytes
  2. Detects member completion when the decompressor expects a new header and all buffers are empty
  3. Updates anchor_offset to the current raw file position for safe resume

As noted in the PR discussion (by cosmo0920), there's a known corner case with multi-member gzip + multiline where a shutdown between the in-memory skip_bytes increment and the DB persist can cause small duplication on restart. This is technically difficult to eliminate and is appropriately documented as a limitation rather than a bug.

plugins/in_tail/tail_db.c (7)

28-34: LGTM!

Clean callback implementation for detecting query results. This addresses the previous review feedback about using pragma_table_info for reliable column existence detection.


61-107: Robust schema migration using pragma_table_info.

This addresses the previous review feedback:

  • Uses pragma_table_info to reliably detect column existence
  • Properly distinguishes between query failures (returns NULL with error log) and missing columns (triggers migration)
  • Using flb_plg_debug for migration messages is appropriate per the PR discussion

182-232: LGTM! Proper type handling for skip/anchor columns.

The extended signature correctly uses:

  • int64_t for offset and anchor (matching sqlite3_column_int64 return type)
  • uint64_t for skip (matching file->skip_bytes type)

This addresses the previous review concern about potential truncation on platforms where off_t is 32-bit. The added cleanup at lines 204-205 before returning on error is also good practice.


262-263: LGTM!

Correctly binds skip_bytes and anchor_offset to the INSERT statement parameters.


323-359: LGTM! Correct restoration of gzip resume state from DB.

The logic properly:

  1. Retrieves skip_bytes and anchor_offset from the database
  2. Initializes skipping_mode and exclude_bytes when skip_bytes > 0, enabling the skip logic in flb_tail_file_chunk()
  3. Uses correct types matching the db_file_exists signature

373-375: LGTM!

Binding order correctly matches the SQL_UPDATE_OFFSET statement: offset, skip, anchor for SET clause, then db_id for WHERE clause.


444-444: Critical fix: Reset db_id to prevent DB entry resurrection.

Setting db_id = FLB_TAIL_DB_ID_NONE after deletion works in conjunction with the db_id > FLB_TAIL_DB_ID_NONE check in flb_tail_file_remove() to prevent the bug where a deleted file's DB entry could be recreated if offset rewinding occurs after deletion.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ded9ae and 71208f6.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c

[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.

(normalCheckLevelMaxBranches)


[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.

(toomanyconfigs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (32)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: Agent
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-centos-7

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
plugins/in_tail/tail_file.c (1)

1471-1471: Remove trailing whitespace.

There are trailing spaces after #endif on this line.

Apply this diff:

-#endif        
+#endif
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)

1451-1472: Consider adding a warning for compressed files with buffered data.

While the fix correctly excludes compressed files (where rewinding isn't possible due to gzip decompression state requirements), users monitoring .gz files may be unaware that buffered data can still be lost on shutdown. Consider adding a debug or warning message similar to:

if (file->buf_len > 0) {
    if (file->decompression_context != NULL) {
        flb_plg_debug(ctx->ins, 
                      "inode=%"PRIu64" compressed file %s has %lu bytes buffered; "
                      "rewind not supported for compressed files",
                      file->inode, file->name, (unsigned long)file->buf_len);
    }
    else {
        /* existing rewind logic */
        ...
    }
}

This would help users understand the limitation without affecting correctness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71208f6 and f63f1f4.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c

[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.

(normalCheckLevelMaxBranches)


[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.

(toomanyconfigs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
🔇 Additional comments (1)
plugins/in_tail/tail_file.c (1)

1451-1472: LGTM! Rewind logic correctly prevents data loss for regular files.

The implementation properly addresses the data loss issue when Fluent Bit shuts down with unprocessed buffered data:

  • The condition file->buf_len > 0 && file->decompression_context == NULL correctly identifies regular files with unprocessed data while excluding compressed files (where offset and buf_len have incompatible units).
  • The rewind arithmetic offset -= buf_len (clamped to 0) correctly repositions the file offset to the start of unprocessed data.
  • The DB persistence call ensures the rewound offset survives restarts.

This addresses the core issue described in #11265 where partial lines and buffered data were lost on restart.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
plugins/in_tail/tail_file.c (2)

1451-1472: Core rewind logic looks correct; consider adding warning for compressed files.

The check for file->decompression_context == NULL properly prevents the compressed-file arithmetic bug flagged in earlier reviews. For regular files, the offset and buffer length are in the same units, so the subtraction and DB update are valid.

However, when compressed files have buffered decompressed data at shutdown, that data will be silently lost (since rewinding is skipped). Consider logging a warning in that case so users are aware:

 if (file->buf_len > 0 && file->decompression_context == NULL) {
     /*
      * If there is data in the buffer, it means it was not processed.
      * We must rewind the offset to ensure this data is re-read on restart.
      */
     off_t old_offset = file->offset;

     if (file->offset > file->buf_len) {
         file->offset -= file->buf_len;
     } else {
         file->offset = 0;
     }

     flb_plg_debug(ctx->ins, "inode=%"PRIu64" rewind offset for %s: old=%"PRId64" new=%"PRId64" (buf_len=%lu)",
                   file->inode, file->name, old_offset, file->offset, (unsigned long)file->buf_len);

 #ifdef FLB_HAVE_SQLDB
     if (ctx->db) {
         flb_tail_db_file_offset(file, ctx);
     }
 #endif
+}
+else if (file->buf_len > 0 && file->decompression_context != NULL) {
+    flb_plg_warn(ctx->ins, "inode=%"PRIu64" cannot rewind compressed file %s; "
+                 "%lu decompressed bytes in buffer may be lost on restart",
+                 file->inode, file->name, (unsigned long)file->buf_len);
 }

1471-1471: Remove trailing whitespace.

Minor formatting issue: extra spaces after #endif.

-#endif        
+#endif
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f63f1f4 and 76f3c05.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c

[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.

(normalCheckLevelMaxBranches)


[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.

(toomanyconfigs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/in_tail/tail_db.c (1)

335-368: Add db_id guards to all flb_tail_db_file_offset() call sites to prevent database entry duplication.

The function's restoration logic at line 362 (when sqlite3_changes()==0) will call db_file_insert() and create duplicate database entries if invoked after setting file->db_id = FLB_TAIL_DB_ID_NONE (lines 414, 1303). Six of eight call sites only check if (ctx->db) or if (file->config->db) without verifying db_id != FLB_TAIL_DB_ID_NONE. Only tail_file.c lines 1498 and 1505 have the proper guard. Add the db_id check to call sites at:

  • tail_fs_stat.c:148
  • tail_fs_inotify.c:278
  • tail_file.c:1075, 1637, 1697, 1911
♻️ Duplicate comments (3)
plugins/in_tail/tail_db.c (1)

58-81: Use pragma_table_info for migration checks (current logic can misfire on locked/corrupt DB).

-    ret = flb_sqldb_query(db, "SELECT skip FROM in_tail_files LIMIT 1;", NULL, NULL);
-    if (ret != FLB_OK) {
+    /* Check column existence explicitly (avoid treating unrelated SQL errors as "missing column") */
+    ret = flb_sqldb_query(db,
+                          "SELECT 1 FROM pragma_table_info('in_tail_files') WHERE name='skip' LIMIT 1;",
+                          NULL, NULL);
+    if (ret != FLB_OK) {
+        flb_plg_error(ctx->ins, "db: could not query table info for migration");
+        flb_sqldb_close(db);
+        return NULL;
+    }
+    /* TODO: detect 0-row result and only then ALTER TABLE (needs a callback or a helper that returns row count). */

Same for anchor.

plugins/in_tail/tail_file.c (2)

1477-1509: Fix debug format specifiers and add the promised gzip “buffered data on exit” warning.

  1. The log uses PRId64 for off_t and %lu for size_t, which is not portable and can break -Wformat / LLP64.

  2. For decompression_context != NULL, the PR objectives say “log a warning when buffered data remain on exit”, but no warning is emitted.

     if (file->buf_len > 0) {
         if (file->decompression_context == NULL) {
@@
-            flb_plg_debug(ctx->ins, "inode=%"PRIu64" rewind offset for %s: "
-                          "old=%"PRId64" new=%"PRId64" (buf_len=%lu)",
-                          file->inode, file->name, old_offset, file->offset,
-                          (unsigned long)file->buf_len);
+            flb_plg_debug(ctx->ins, "inode=%"PRIu64" rewind offset for %s: "
+                          "old=%jd new=%jd (buf_len=%zu)",
+                          file->inode, file->name,
+                          (intmax_t) old_offset, (intmax_t) file->offset,
+                          file->buf_len);
@@
         }
         else {
+            flb_plg_warn(ctx->ins,
+                         "inode=%"PRIu64" file=%s: buffered data (%zu bytes) remains on exit for gzip input; "
+                         "cannot rewind raw offset safely with streaming decompression",
+                         file->inode, file->name, file->buf_len);
 #ifdef FLB_HAVE_SQLDB
             if (ctx->db && file->db_id > FLB_TAIL_DB_ID_NONE) {
                 flb_tail_db_file_offset(file, ctx);
             }
 #endif
         }
     }

1832-1849: Fix gzip debug log format specifiers (%lu truncation risk).

These should be %jd for offsets (cast to intmax_t) and %zu for size-like values. Same issue in the “Gzip member completed” log.

-    flb_plg_debug(ctx->ins, "Skipping: anchor=%lu offset=%lu exclude=%lu decompressed=%lu",
-                 (unsigned long)file->anchor_offset, (unsigned long)file->offset,
-                 (unsigned long)file->exclude_bytes, (unsigned long)decompressed_data_length);
+    flb_plg_debug(ctx->ins, "Skipping: anchor=%jd offset=%jd exclude=%zu decompressed=%zu",
+                 (intmax_t) file->anchor_offset, (intmax_t) file->offset,
+                 (size_t) file->exclude_bytes, (size_t) decompressed_data_length);
@@
-    flb_plg_debug(file->config->ins,
-                 "Gzip member completed: updating anchor from %lu to %lu, resetting skip from %lu to 0",
-                 (unsigned long)file->anchor_offset, (unsigned long)file->offset,
-                 (unsigned long)file->skip_bytes);
+    flb_plg_debug(file->config->ins,
+                 "Gzip member completed: updating anchor from %jd to %jd, resetting skip from %" PRIu64 " to 0",
+                 (intmax_t) file->anchor_offset, (intmax_t) file->offset,
+                 file->skip_bytes);

Also applies to: 1890-1906

🧹 Nitpick comments (2)
plugins/in_tail/tail_fs_stat.c (1)

122-151: Truncation reset includes anchor/skip/exclude state (consistent with gzip resume model).

Optional: consider making local offset an off_t (matches lseek() / file->offset).

tests/runtime/in_tail.c (1)

2840-2875: Prefer parameter binding over SQL string interpolation (even in tests).

Today get_db_offset() builds a LIKE '%%%s%%' query; consider SELECT offset FROM in_tail_files WHERE name LIKE ? + sqlite3_bind_text() to avoid edge cases with filenames containing quotes/%.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 075366a and d7b865d.

📒 Files selected for processing (8)
  • plugins/in_tail/tail.h (1 hunks)
  • plugins/in_tail/tail_db.c (9 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
  • tests/runtime/in_tail.c (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/in_tail/tail_fs_inotify.c
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file.c
  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file_internal.h
  • plugins/in_tail/tail_fs_stat.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • plugins/in_tail/tail_file.c
  • tests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • plugins/in_tail/tail_file.c
  • tests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • plugins/in_tail/tail_file.c
  • tests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • plugins/in_tail/tail_file.c
  • tests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • plugins/in_tail/tail_file.c
  • tests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • plugins/in_tail/tail_file.c
  • tests/runtime/in_tail.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • plugins/in_tail/tail_file.c
  • tests/runtime/in_tail.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (336-369)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-without-cxx (3.31.6)
🔇 Additional comments (6)
plugins/in_tail/tail.h (1)

36-44: Nice cleanup: introduce FLB_TAIL_DB_ID_NONE sentinel for db_id + config defaults.

plugins/in_tail/tail_file_internal.h (1)

43-49: Types look right: offset/anchor_offset as off_t, skip/exclude as uint64_t.

plugins/in_tail/tail_sql.h (1)

30-54: Schema + insert/update statements are coherent for skip/anchor persistence.

tests/runtime/in_tail.c (2)

272-301: write_raw() helper is a good fit for “incomplete line” shutdown tests.


3069-3389: DB+gzip tests are correctly registered under FLB_HAVE_SQLDB.

Also applies to: 3417-3426

plugins/in_tail/tail_file.c (1)

1040-1113: set_file_position(): compressed resume via anchor_offset looks coherent; please verify stream_offset semantics for gzip + offset_key.

On gzip resume you seek to anchor_offset and initialize exclude_bytes = skip_bytes, but file->stream_offset stays at 0 for compressed files. If offset_key is enabled, this may reset offsets across restarts for gzip inputs. Please confirm that’s acceptable, or initialize stream_offset from persisted state (if intended).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
plugins/in_tail/tail_db.c (1)

176-212: Avoid off_t for DB skip field to prevent truncation on 32‑bit builds

db_file_exists() and flb_tail_db_file_set() currently funnel the DB skip column through off_t:

  • Signature: off_t *skip, off_t *anchor.
  • Locals: off_t skip = 0; off_t anchor = 0;
  • Assignment: file->skip_bytes = skip; where file->skip_bytes is uint64_t.
  • DB read: *skip = sqlite3_column_int64(..., 6);

On 32‑bit platforms where off_t is 32‑bit, large gzip streams can produce skip values >4 GiB. Reading these via off_t will truncate the high bits, causing incorrect resume positions (duplicate or skipped data) even though SQLite stores full 64‑bit values.

To keep the DB path consistent with the 64‑bit skip_bytes field and SQLite’s INTEGER (int64) storage, use uint64_t for skip (and optionally for anchor if you want full 64‑bit offsets):

-static int db_file_exists(struct flb_tail_file *file,
-                          struct flb_tail_config *ctx,
-                          uint64_t *id, uint64_t *inode,
-                          off_t *offset, off_t *skip, off_t *anchor)
+static int db_file_exists(struct flb_tail_file *file,
+                          struct flb_tail_config *ctx,
+                          uint64_t *id, uint64_t *inode,
+                          off_t *offset, uint64_t *skip, off_t *anchor)
@@
-    off_t offset = 0;
-    off_t skip = 0;
-    off_t anchor = 0;
+    off_t    offset = 0;
+    uint64_t skip   = 0;
+    off_t    anchor = 0;
@@
-        *skip = sqlite3_column_int64(ctx->stmt_get_file, 6);
+        *skip = (uint64_t) sqlite3_column_int64(ctx->stmt_get_file, 6);
@@
-        file->skip_bytes = skip;
+        file->skip_bytes = skip;

(Apply the corresponding signature/type updates to any other callers and related locals.)

Please double-check the intended width of skip/anchor against the table definition in tail_sql.h and your target platforms’ off_t size to ensure no truncation can occur.

Also applies to: 309-352

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b40844 and bd12958.

📒 Files selected for processing (8)
  • plugins/in_tail/tail.h (1 hunks)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
  • tests/runtime/in_tail.c (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • plugins/in_tail/tail_sql.h
  • plugins/in_tail/tail.h
  • plugins/in_tail/tail_fs_inotify.c
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_db.c
  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail_file_internal.h
  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
tests/runtime/in_tail.c (3)
src/flb_gzip.c (1)
  • flb_gzip_compress (157-252)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_lib.c (9)
  • flb_service_set (652-678)
  • flb_input_set (305-335)
  • flb_output_set (520-551)
  • flb_start (983-994)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
  • flb_create (143-225)
  • flb_input (266-276)
  • flb_output (279-289)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-without-cxx (3.31.6)
🔇 Additional comments (5)
plugins/in_tail/tail_fs_stat.c (1)

127-151: Truncation state reset for skip/anchor fields looks correct

Resetting anchor_offset, skip_bytes, exclude_bytes, and skipping_mode together with offset/buf_len on truncation keeps runtime and DB skip state consistent for both compressed and uncompressed files.

tests/runtime/in_tail.c (1)

274-303: New rewind + gzip DB tests and helpers align well with the PR objectives

The additions around write_raw(), gzip file helpers, cb_check_gzip_resume(), wait_for_count_with_timeout(), and the new DB-focused tests (flb_test_db_offset_rewind_on_shutdown, flb_test_db_gzip_*) accurately exercise:

  • Rewind-on-shutdown for buffered incomplete lines on plain files.
  • Gzip resume behavior across restart, inotify append, rotation, and multiple restarts.
  • The intended “no rewind but warn” semantics for gzip on shutdown.

Resource handling (file/DB unlinks, ctx teardown) and assertions look reasonable for runtime tests.

Also applies to: 309-368, 374-415, 500-529, 2870-3397, 3427-3434

plugins/in_tail/tail_db.c (1)

28-35: PRAGMA-based column existence checks are a solid migration strategy

Using pragma_table_info('in_tail_files') with cb_column_exists to detect and ALTER TABLE in missing skip/anchor columns avoids misinterpreting other SQL errors as “missing column” and cleanly migrates older DBs.

Also applies to: 61-102

plugins/in_tail/tail_file.c (2)

1037-1076: Seek/anchor/skip handling for compressed vs normal files is coherent

  • For DB-backed files, set_file_position() now:

    • Uses anchor_offset (plus skip_bytes/exclude_bytes) to choose a seek position for compressed files.
    • Uses offset for normal files.
    • Falls back to EOF only when there is no stored DB position and read_from_head is false.
  • In flb_tail_file_chunk():

    • On resume with skipping_mode == FLB_TRUE, the exclude_bytes countdown drops previously processed decompressed bytes before feeding the parser, matching the anchor+skip design.
    • After processing, skip_bytes is incremented by processed_bytes, and at gzip member boundaries anchor and skip are atomically updated (anchor_offset = offset; skip_bytes = 0), giving a clean restart point.

Combined with the DB updates (flb_tail_db_file_offset binding offset/skip/anchor), this provides a consistent and replay-safe position model for gzip streams without over-seeking or reprocessing more than intended.

Also applies to: 1106-1112, 1899-1919


1303-1304: Offset rewind on shutdown and truncation state reset are correctly guarded

  • Initializing file->db_id to FLB_TAIL_DB_ID_NONE in flb_tail_file_append() ensures we don’t accidentally resurrect deleted DB rows when rewinding offsets.

  • In flb_tail_file_remove():

    • For non-compressed files with buffered data, rewinding offset by buf_len (bounded at 0) before calling flb_tail_db_file_offset() ensures any unprocessed bytes are re-read on restart, closing the data-loss gap described in #11265.
    • For compressed files, the code intentionally avoids raw-offset rewind and instead logs a clear debug message and persists the current state, which matches the documented limitation for gzip.
  • adjust_counters() now resets anchor_offset, skip_bytes, exclude_bytes, and skipping_mode when a file is truncated and also persists the reset state to the DB, avoiding stale skip state after truncation.

These changes together make the DB position handling and shutdown behavior much safer without regressing rotation/deletion flows.

Also applies to: 1477-1515, 1606-1646

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
plugins/in_tail/tail_file_internal.h (1)

44-49: New offset/skip fields align with gzip/DB logic

Using off_t for offset/anchor_offset and uint64_t for skip_bytes/exclude_bytes matches how lseek and DB INTEGER are used in the implementation. skipping_mode being an int is fine, though it could be tightened to bool for clarity later.

tests/runtime/in_tail.c (1)

417-460: DB offset rewind test is correct but could handle missing initial offset more defensively

The flb_test_db_offset_rewind_on_shutdown() flow correctly:

  • Writes an initial line and waits for it to be ingested.
  • Captures the DB offset, writes an incomplete line without \n, and waits until the DB offset advances.
  • Stops Fluent Bit so flb_tail_file_remove() can rewind the stored offset by buf_len.
  • Restarts, writes a bare newline, and asserts that exactly one record (the previously buffered line) is emitted.

One small robustness improvement: if get_db_offset() still returns -1 after the second attempt, you currently proceed and current_offset > start_offset will trivially pass. Consider looping on the initial offset until it becomes non‑negative (with a timeout), or explicitly TEST_CHECK(start_offset >= 0) and abort early to avoid false positives if DB writes fail.

Also applies to: 2872-3074

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd12958 and a5f84e8.

📒 Files selected for processing (8)
  • plugins/in_tail/tail.h (1 hunks)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
  • tests/runtime/in_tail.c (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • plugins/in_tail/tail_fs_inotify.c
  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail.h
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_db.c
  • plugins/in_tail/tail_file_internal.h
  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (3)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
tests/runtime/in_tail.c (3)
src/flb_gzip.c (1)
  • flb_gzip_compress (157-252)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_lib.c (4)
  • flb_input_set (305-335)
  • flb_output_set (520-551)
  • flb_input (266-276)
  • flb_output (279-289)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (357-390)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (9)
plugins/in_tail/tail_sql.h (1)

37-53: Schema and CRUD macros are consistent with new skip/anchor fields

SQL_CREATE_FILES, SQL_INSERT_FILE, and SQL_UPDATE_OFFSET now include skip and anchor in a way that matches the column order consumed in db_file_exists() and the insert/offset statements in tail_db.c. This keeps new and migrated DBs in sync with struct flb_tail_file without backward‑compat issues.

tests/runtime/in_tail.c (3)

37-40: Test helpers for raw writes, gzip files, and msgpack inspection look solid

  • write_raw() correctly supports “data only”, “newline only”, and “data + newline” and checks write() results, matching the existing write_msg() style.
  • create_gzip_file() / append_gzip_file() now validate flb_gzip_compress() and fwrite() results and clean up on error, which avoids silently truncated gzip test artifacts.
  • cb_check_gzip_resume() now guards msgpack string lengths and uses memcmp instead of strncmp on non‑NUL‑terminated buffers, fixing the prior OOB risk.

No issues here; good test‑scaffolding improvements.

Also applies to: 274-368, 374-415, 502-531


3076-3400: Gzip+DB resume/rotation tests comprehensively exercise the new skip/anchor behavior

The four gzip tests:

  • flb_test_db_gzip_resume_loss
  • flb_test_db_gzip_inotify_append
  • flb_test_db_gzip_rotation
  • flb_test_db_gzip_multi_resume

correctly validate that:

  • Incomplete final lines in a gzip member aren’t lost across restart (resume_loss).
  • Inotify/refresh‑driven appends are picked up without duplicating earlier lines.
  • Rotation plus writes to both new and old gzip files within rotate_wait are handled and all lines are seen exactly once.
  • Multiple restarts only process newly appended gzip members (multi_resume), using the persisted skip/anchor state.

State (t_ctx.count, t_ctx.found_line2) is reset between runs where needed, DB and gzip files are unlinked at start and end, and all four functions are correctly guarded by FLB_HAVE_SQLDB. This is good coverage of the new resume semantics.


3429-3436: New DB/gzip tests are properly registered and guarded

The added entries in TEST_LIST for db_offset_rewind_on_shutdown and the four gzip+DB scenarios are all enclosed in the #ifdef FLB_HAVE_SQLDB block and match their function names. Test wiring looks correct.

plugins/in_tail/tail_db.c (2)

28-34: Column‑existence migration via pragma_table_info is a solid improvement

The new cb_column_exists + SELECT 1 FROM pragma_table_info('in_tail_files') WHERE name='skip'/'anchor' checks cleanly distinguish “column missing” from general SQL errors and only run ALTER TABLE when needed. Errors in the PRAGMA query now abort DB init instead of silently attempting migration, which is safer than the previous SELECT skip FROM ... approach.

Also applies to: 43-43, 61-102


176-211: DB ↔ in‑memory wiring for skip/anchor and db_id reset is consistent

  • db_file_exists() now returns skip (col 6) and anchor (col 7), matching the schema and SELECT * order.
  • db_file_insert() binds file->skip_bytes and file->anchor_offset, so new rows persist the current gzip resume state.
  • flb_tail_db_file_set() restores offset, skip_bytes, and anchor_offset into struct flb_tail_file and initializes exclude_bytes / skipping_mode when a non‑zero skip is present, which is exactly what the gzip resume path expects.
  • flb_tail_db_file_offset() updates offset/skip/anchor together; sqlite3_changes()==0 still triggers a re‑insert, preserving the legacy “recreate row if someone deleted it” behavior.
  • flb_tail_db_file_delete() now sets file->db_id = FLB_TAIL_DB_ID_NONE, which works in tandem with the db_id > FLB_TAIL_DB_ID_NONE checks in tail_file.c to avoid resurrecting deliberately deleted entries.

The DB layer looks internally consistent with the new gzip/anchor semantics.

Also applies to: 248-255, 309-354, 362-367, 415-436

plugins/in_tail/tail_file.c (3)

1032-1076: DB‑backed seek logic cleanly separates normal vs gzip files

set_file_position() now:

  • Uses flb_tail_db_file_set() to populate offset, anchor_offset, and skip_bytes when a DB is configured.
  • For compressed files (decompression_context != NULL), seeks by anchor_offset when a DB position exists, ignoring the raw offset column and using skip_bytes only as a “have DB state” flag.
  • For non‑compressed files, continues to seek by offset, falling back to tail‑of‑file when no DB position and read_from_head is false.
  • Initializes anchor_offset and exclude_bytes appropriately for compressed files after seeking.

This matches the “anchor + skip” model and keeps legacy read‑from‑head/tail semantics intact.

Also applies to: 1106-1113


1303-1304: Offset rewind on removal for buffered data is correct and guarded against DB resurrection

  • New files start with file->db_id = FLB_TAIL_DB_ID_NONE, and DB insertion later sets a positive ID.
  • In flb_tail_file_remove():
    • For uncompressed files with buf_len > 0, the code rewinds file->offset by buf_len (clamped at 0) so unprocessed bytes will be re‑read on restart.
    • The DB is updated only when ctx->db is set and file->db_id > FLB_TAIL_DB_ID_NONE, which prevents resurrecting rows that were explicitly deleted earlier (e.g., in stale/deleted‑file paths).
    • For compressed files, it intentionally does not rewind the raw offset and instead logs a detailed debug message plus persists the current DB state, matching the documented limitation for gzip.

This directly addresses the buffered‑data loss on shutdown for plain files while keeping compressed behavior conservative.

Also applies to: 1477-1515


1606-1652: Gzip skip/anchor state handling and truncation resets are coherent

  • On truncation in adjust_counters(), the code now resets:
    • anchor_offset to the new offset,
    • skip_bytes and exclude_bytes to 0,
    • skipping_mode to FLB_FALSE,
      and writes the updated state back to the DB when enabled. This avoids carrying stale gzip resume state across truncations.
  • In flb_tail_file_chunk() for compressed files:
    • skipping_mode + exclude_bytes transiently drop the first skip_bytes worth of decompressed data after a restart, without mutating the persisted skip_bytes.
    • Afterwards, skip_bytes is incremented by processed_bytes so the DB always reflects how many decompressed bytes have been consumed since the last anchor.
    • When a gzip member is fully consumed (EXPECTING_HEADER, no input buffer, empty tail buffer), anchor_offset is advanced to the current raw offset and skip_bytes is reset to 0, so future restarts can begin from the next member boundary.

Together, this gives a consistent compressed‑file resume story and keeps DB‑stored skip/anchor in sync with the new in‑memory fields.

Also applies to: 1838-1858, 1899-1919

@jinyongchoi
Copy link
Contributor Author

@codex review

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/runtime/in_tail.c (1)

439-442: SQL injection risk in test code.

The snprintf constructs a query using string interpolation with file_name. While this is test code and the filename is controlled, using LIKE '%%%s%%' with unescaped input could be problematic if filenames contain special characters like %, _, or '.

For test robustness, consider using parameter binding or at minimum escaping the filename.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a5f84e8 and c08e644.

📒 Files selected for processing (8)
  • plugins/in_tail/tail.h (1 hunks)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
  • tests/runtime/in_tail.c (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • plugins/in_tail/tail_fs_inotify.c
  • plugins/in_tail/tail.h
  • plugins/in_tail/tail_file_internal.h
  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail_sql.h
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_db.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (357-390)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (21)
plugins/in_tail/tail_db.c (6)

28-34: LGTM: Clean callback for migration detection.

The callback approach properly addresses the previous review feedback about using PRAGMA table_info to detect column existence. The callback sets a flag when a row is returned, which reliably indicates column presence.


61-101: Migration logic looks correct.

The PRAGMA-based migration detection properly handles the case where the table exists but columns are missing. This addresses the previous review concern about misinterpreting non-schema errors as missing columns.


197-201: Good defensive cleanup on error path.

Adding sqlite3_clear_bindings and sqlite3_reset before returning on the error path ensures proper statement cleanup, preventing potential resource leaks or state corruption.


434-436: Good: Reset db_id prevents resurrecting deleted entries.

Setting db_id to FLB_TAIL_DB_ID_NONE after deletion ensures that subsequent offset updates won't inadvertently re-insert a row for a deleted file. This directly addresses the PR objective about preventing DB entry resurrection.


363-366: SQL UPDATE binding order is correct.

The bindings match the SQL_UPDATE_OFFSET statement parameters: position 1 (offset), 2 (skip), 3 (anchor), and 4 (db_id) align with the SQL statement UPDATE in_tail_files set offset=@offset, skip=@skip, anchor=@anchor WHERE id=@id;


339-350: Struct is guaranteed zero-initialized via calloc(), making this initialization safe.

The flb_tail_file struct is allocated using flb_calloc(1, sizeof(struct flb_tail_file)), which is a wrapper around standard C calloc(). The calloc() function guarantees zero-initialization, so exclude_bytes and skipping_mode are always initialized to 0 (and FLB_FALSE) at allocation time. The explicit initialization in the else branch is therefore redundant but not problematic.

tests/runtime/in_tail.c (7)

37-40: LGTM: SQLite header properly guarded.

The #include <sqlite3.h> is now correctly wrapped with #ifdef FLB_HAVE_SQLDB, addressing the previous review concern about builds without SQLite support.


274-303: Good test helper for incomplete line scenarios.

The write_raw function enables testing the exact scenario described in the PR: writing data without a trailing newline to simulate incomplete lines remaining in the buffer at shutdown.


328-331: LGTM: fwrite error checking added.

Both gzip helpers now properly check fwrite return values and handle short writes by cleaning up and returning an error. This addresses the previous review concern.

Also applies to: 358-361


397-404: LGTM: Safe msgpack string comparison.

The callback now uses memcmp with proper size validation (key.via.str.size == 3 and v.via.str.size >= 5) to avoid out-of-bounds reads, addressing the previous review feedback.


417-460: LGTM: get_db_offset properly guarded.

The function is now wrapped with #ifdef FLB_HAVE_SQLDB, preventing compilation errors when SQLite is not available.


3140-3141: Good: found_line2 reset before second run.

The test properly resets both t_ctx.count and t_ctx.found_line2 before the second Fluent Bit run, addressing the previous review concern about false positives from stale state.


3188-3190: Good test hygiene: DB file unlinked at start.

The test properly cleans up both the log file and DB file before starting, preventing cross-run leakage.

plugins/in_tail/tail_file.c (8)

47-47: LGTM: Compression header included for decompression state checking.

The inclusion of flb_compression.h is needed to access FLB_DECOMPRESSOR_STATE_EXPECTING_HEADER for gzip member boundary detection.


1040-1065: Good gzip-aware seek position logic.

The code correctly differentiates between compressed and uncompressed files:

  • Compressed: seek to anchor_offset (gzip member boundary)
  • Uncompressed: seek to stored offset

The has_db_position check ensures we only seek when there's a valid stored position.


1109-1112: LGTM: Initialize exclude_bytes for non-DB gzip files.

When seeking without DB state for a gzip file, anchor_offset and exclude_bytes are properly initialized, ensuring consistent state.


1477-1515: Core offset rewind logic looks correct.

The implementation properly:

  1. Only rewinds for uncompressed files (decompression_context == NULL)
  2. Checks db_id > FLB_TAIL_DB_ID_NONE before updating DB
  3. Logs a warning for gzip files where rewind is infeasible
  4. Uses correct format specifiers (%jd for intmax_t, %zu for size_t)

This addresses the main PR objective of preventing data loss on shutdown.


1635-1638: Good: Reset all gzip-related state on truncation.

The truncation handler now resets anchor_offset, skip_bytes, exclude_bytes, and skipping_mode, addressing the previous review concern about inconsistent state after truncation.


1838-1858: Skipping logic correctly handles partial and complete skips.

The implementation properly handles:

  • Complete skip: decrement exclude_bytes, set decompressed_data_length = 0
  • Partial skip: memmove remaining data, clear skipping_mode

Format specifiers are correct (%jd with intmax_t cast, %zu for size_t).


1899-1919: Gzip member boundary detection for anchor updates.

The condition checks for:

  1. Decompressor awaiting next header (FLB_DECOMPRESSOR_STATE_EXPECTING_HEADER)
  2. No remaining compressed input (input_buffer_length == 0)
  3. No remaining decompressed buffer (buf_len == 0)

This ensures anchor updates occur only at clean gzip member boundaries, enabling accurate resume.


1906-1908: No issues. FLB_DECOMPRESSOR_STATE_EXPECTING_HEADER is properly defined in include/fluent-bit/flb_compression.h at line 32.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/runtime/in_tail.c (1)

2886-3086: Tighten DB rewind test to fail fast if DB offset never advances

The overall flow of flb_test_db_offset_rewind_on_shutdown() matches the bug you’re targeting (DB offset advanced while an incomplete line stays buffered, then verifying replay after restart). One corner case: if get_db_offset() never observes an offset > 0 or the second poll never sees current_offset > start_offset, the test logs a warning via TEST_CHECK/TEST_MSG but continues. That can let the test pass without actually exercising the “advanced DB offset + buffered data” scenario.

Consider treating a non‑advancing DB offset as a hard failure so the test can’t silently turn into a no‑op if timing changes. For example:

-    if (!TEST_CHECK(current_offset > start_offset)) {
-         TEST_MSG("DB offset did not advance. start=%ld current=%ld", start_offset, current_offset);
-    }
+    if (!TEST_CHECK(current_offset > start_offset)) {
+        TEST_MSG("DB offset did not advance. start=%ld current=%ld",
+                 start_offset, current_offset);
+        test_tail_ctx_destroy(ctx);
+        unlink(file[0]);
+        unlink(db);
+        exit(EXIT_FAILURE);
+    }

This keeps the test robust against future timing changes in the tailer/DB path.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c08e644 and 857403f.

📒 Files selected for processing (8)
  • plugins/in_tail/tail.h (1 hunks)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
  • tests/runtime/in_tail.c (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • plugins/in_tail/tail_sql.h
  • plugins/in_tail/tail_fs_inotify.c
  • plugins/in_tail/tail.h
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file_internal.h
  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_db.c
  • plugins/in_tail/tail_fs_stat.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (3)
tests/runtime/in_tail.c (3)
src/flb_gzip.c (1)
  • flb_gzip_compress (157-252)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_lib.c (5)
  • flb_service_set (652-678)
  • flb_input_set (305-335)
  • flb_output_set (520-551)
  • flb_input (266-276)
  • flb_output (279-289)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (366-399)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (9)
plugins/in_tail/tail_fs_stat.c (1)

128-143: Truncation path correctly resets new skip/anchor state

Resetting anchor_offset, skip_bytes, exclude_bytes, and skipping_mode alongside offset/buf_len on truncation keeps gzip skip state consistent with adjust_counters() and avoids stale skip on truncated files. Looks good.

plugins/in_tail/tail_file_internal.h (1)

44-50: Struct extensions and type choices align with DB and gzip logic

Switching offset to off_t and introducing anchor_offset, skip_bytes/exclude_bytes (uint64_t), and skipping_mode is consistent with how offsets and skip counters are handled in tail_db.c and tail_file.c, including 64‑bit DB storage. No issues from a layout or semantics standpoint.

plugins/in_tail/tail_db.c (3)

28-35: Column migration via pragma_table_info is robust and fail‑fast

Using PRAGMA table_info('in_tail_files') with cb_column_exists cleanly distinguishes “column missing” from real SQL errors, and aborts if the pragma itself fails. Adding skip/anchor only when column_found == 0 avoids repeated ALTER TABLE attempts and the earlier misdetection problem.

Also applies to: 42-44, 61-108


182-221: Skip/anchor plumbing between DB and in‑memory state is consistent

  • db_file_exists() now returns skip as uint64_t and anchor as off_t, reading them from columns 6 and 7; this matches the SQL layout and avoids 32‑bit truncation on large gzip members.
  • db_file_insert() binds file->skip_bytes and file->anchor_offset, ensuring new rows are created with correct initial values.
  • flb_tail_db_file_set() applies DB state back into the file: offset, skip_bytes, anchor_offset, and initializes exclude_bytes/skipping_mode only when skip_bytes > 0, which lines up with the transient skip semantics used in tail_file.c.
  • flb_tail_db_file_offset() updates offset, skip, and anchor together, keeping DB and in‑memory state aligned.

This all looks coherent with the gzip resume design.

Also applies to: 248-280, 318-360, 365-399


424-445: Resetting db_id on delete correctly prevents row resurrection

After a successful delete, setting file->db_id = FLB_TAIL_DB_ID_NONE ensures later calls that gate on db_id > FLB_TAIL_DB_ID_NONE (e.g., in flb_tail_file_remove()) will not re‑insert a row for a file that was deliberately removed. This addresses the stale‑entry resurrection concern without changing existing update paths.

tests/runtime/in_tail.c (2)

37-40: Helper and guard additions for gzip/DB tests look correct

  • write_raw() nicely models incomplete vs completed lines without duplicating logic from write_msg().
  • create_gzip_file() / append_gzip_file() now check fwrite() return values and clean up on short write, avoiding silently corrupted gzip test artifacts.
  • struct test_ctx + cb_check_gzip_resume() correctly guard msgpack string lengths and use memcmp, eliminating the prior OOB risk.
  • get_db_offset() and the sqlite3 include are cleanly wrapped in #ifdef FLB_HAVE_SQLDB, so non‑SQLite builds won’t see sqlite symbols.
  • wait_for_count_with_timeout() follows the same time‑diff pattern as existing wait helpers.

No functional concerns here.

Also applies to: 274-303, 309-338, 340-367, 369-415, 417-459, 501-530


3088-3411: Gzip+DB runtime tests comprehensively cover the new resume semantics

The four gzip DB tests (db_gzip_resume_loss, db_gzip_inotify_append, db_gzip_rotation, db_gzip_multi_resume) exercise:

  • replaying an incomplete last line across restarts,
  • inotify/append behavior while running,
  • rotation with old/new gzip files both receiving appends within rotate_wait,
  • multiple resume cycles using skip_bytes/anchor_offset.

All are correctly guarded by #ifdef FLB_HAVE_SQLDB, and the new TEST_LIST entries are under the same guard, so non‑SQLite builds won’t see DB‑dependent tests. This is solid coverage for the new anchor/skip design.

Also applies to: 3441-3448

plugins/in_tail/tail_file.c (2)

47-48: Gzip resume: seek/anchor/skip behavior matches the intended design

  • Including <fluent-bit/flb_compression.h> here is appropriate given the direct use of decompressor state.
  • In set_file_position():
    • DB‑enabled paths now treat file->offset > 0 as the primary persisted position for both plain and gzip files, so migrated DBs that only have offset are still respected.
    • For gzip files with established state (anchor_offset > 0 || skip_bytes > 0), you override seek_pos to anchor_offset, ensuring we always start decompressing from the beginning of the tracked member and then apply skipping via skip_bytes/exclude_bytes.
    • When there is no DB position and read_from_head is off, seeking to EOF and setting anchor_offset accordingly preserves the “tail from end” semantics while initializing anchor for future DB updates.
  • Post‑seek init for gzip without DB (anchor_offset = file->offset; exclude_bytes = file->skip_bytes) cleanly seeds the transient skip countdown for the next run if skip_bytes is ever non‑zero.
  • In flb_tail_file_chunk():
    • The skipping_mode + exclude_bytes block correctly discards already‑processed decompressed bytes before handing any remainder to process_content(), using memmove safely over overlapping buffer regions.
    • skip_bytes += processed_bytes tracks logical decompressed progress, and the “gzip member completed” check updates anchor_offset to the current raw offset and resets skip_bytes to 0 only when we’re at a member boundary with no buffered decompressed data. This matches the intended “anchor + relative skip” model per member.

This end‑to‑end behavior looks coherent for both upgraded and newly created gzip DB entries.

Also applies to: 1027-1115, 1839-1859, 1901-1921


1304-1305: Offset rewind and DB‑ID handling correctly avoid stale or resurrected DB rows

  • Initializing file->db_id to FLB_TAIL_DB_ID_NONE in flb_tail_file_append() ensures we never try to update offsets for files that haven’t been inserted into the DB yet.
  • In flb_tail_file_remove():
    • For non‑compressed files, rewinding file->offset by file->buf_len (capped at 0) when the buffer is non‑empty ensures any unprocessed bytes are re‑read on restart, directly addressing the original data‑loss bug.
    • The DB update is gated on ctx->db && file->db_id > FLB_TAIL_DB_ID_NONE, so entries that were explicitly deleted via flb_tail_db_file_delete() (which now resets db_id to the sentinel) are not silently re‑created.
    • For gzip files, you intentionally avoid rewinding the raw offset, emit a clear debug message about buffered data, and still persist the current DB state when applicable; this matches the stated “no rewind for compressed” constraint.
  • adjust_counters() now resets anchor_offset, skip_bytes, exclude_bytes, and skipping_mode when a truncation is detected, aligning the truncation path with the fs backends and preventing stale skip state after a compressed file is truncated.

These pieces together give the right rewind semantics for plain files and safe DB behavior around deletion/truncation.

Also applies to: 1478-1516, 1632-1647

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/in_tail/tail_file.c (1)

1054-1078: Missing exclude_bytes and skipping_mode initialization for gzip DB resume.

When resuming a gzip file from DB with skip_bytes > 0:

  1. skip_bytes is loaded from DB
  2. Code seeks to anchor_offset and returns at line 1078
  3. exclude_bytes and skipping_mode are never initialized

Without skipping_mode = FLB_TRUE, the skip logic at line 1839 won't trigger, causing already-processed decompressed data to be reprocessed—resulting in duplicate logs.

Add initialization after seeking for gzip files in the DB path:

             file->offset = ret;
+
+            /* Initialize skip state for gzip resume */
+            if (file->decompression_context != NULL && file->skip_bytes > 0) {
+                file->exclude_bytes = file->skip_bytes;
+                file->skipping_mode = FLB_TRUE;
+            }
         }
         else if (ctx->read_from_head == FLB_FALSE) {
🧹 Nitpick comments (2)
plugins/in_tail/tail_db.c (1)

216-220: Column indices are correct; consider explicit SELECT for robustness.

The column access indices at lines 216-220 correctly reference the schema:

  • Column 6 → skip
  • Column 7 → anchor

However, since SQL_GET_FILE uses SELECT *, the hardcoded indices are brittle against future schema changes. Consider refactoring to explicitly name columns: SELECT id, name, offset, inode, created, rotated, skip, anchor FROM in_tail_files ... to ensure column positions remain correct.

plugins/in_tail/tail_file.c (1)

1110-1113: skipping_mode not set in non-DB decompression path.

If skip_bytes > 0 here, skipping_mode should also be set to FLB_TRUE for the skip logic to trigger. While skip_bytes is likely 0 in this non-DB path, the initialization should be consistent.

 else {
     file->anchor_offset = file->offset;
     file->exclude_bytes = file->skip_bytes; /* Init transient skip countdown */
+    if (file->skip_bytes > 0) {
+        file->skipping_mode = FLB_TRUE;
+    }
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 857403f and cbf558d.

📒 Files selected for processing (8)
  • plugins/in_tail/tail.h (1 hunks)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
  • tests/runtime/in_tail.c (5 hunks)
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail_file_internal.h
  • plugins/in_tail/tail_db.c
  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_fs_inotify.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • tests/runtime/in_tail.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (3)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
tests/runtime/in_tail.c (3)
src/flb_gzip.c (1)
  • flb_gzip_compress (157-252)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_lib.c (6)
  • flb_input_set (305-335)
  • flb_output_set (520-551)
  • flb_start (983-994)
  • flb_stop (1011-1055)
  • flb_input (266-276)
  • flb_output (279-289)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (366-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (26)
plugins/in_tail/tail.h (1)

36-38: LGTM!

The FLB_TAIL_DB_ID_NONE macro is a clean addition for representing the "not in database" state. Using 0 is appropriate since SQLite rowids start at 1, so this serves as a safe sentinel value to prevent resurrecting deleted DB entries.

plugins/in_tail/tail_file_internal.h (1)

44-49: LGTM!

The new per-file state fields are well-structured:

  • skip_bytes and exclude_bytes correctly use uint64_t to avoid 32-bit truncation (addressing the past review concern)
  • anchor_offset uses off_t consistent with the file offset type
  • Clear comments documenting each field's purpose
plugins/in_tail/tail_fs_inotify.c (1)

270-273: LGTM!

Proper initialization of the new per-file state fields during file truncation. Resetting anchor_offset, skip_bytes, exclude_bytes, and skipping_mode when a file is truncated ensures the skip/anchor tracking starts fresh, which is correct behavior for handling truncated files.

plugins/in_tail/tail_fs_stat.c (1)

139-142: LGTM!

Consistent with the inotify backend changes. Both filesystem backends now properly reset the skip/anchor state fields during file truncation, ensuring uniform behavior regardless of which backend is in use.

plugins/in_tail/tail_sql.h (1)

30-53: LGTM!

The SQL schema and statement updates are well-structured:

  • New skip and anchor columns with DEFAULT 0 ensure backward compatibility
  • SQL_INSERT_FILE and SQL_UPDATE_OFFSET correctly include the new fields
  • The migration logic in tail_db.c handles upgrades from older databases
plugins/in_tail/tail_db.c (4)

28-34: LGTM!

Clean implementation of the cb_column_exists callback for detecting column presence during migration. This addresses the past review concern about using SELECT queries which could misfire on non-schema errors.


61-107: LGTM!

The migration logic correctly uses PRAGMA table_info to detect missing columns, addressing the past review feedback. The pattern properly handles both the skip and anchor columns with appropriate error handling and cleanup.


348-359: LGTM!

Correct initialization of skipping_mode and exclude_bytes based on the persisted skip_bytes value. When resuming from a DB with a non-zero skip value, the file will enter skipping mode to skip already-processed decompressed bytes.


444-444: LGTM!

Resetting db_id to FLB_TAIL_DB_ID_NONE upon deletion prevents accidentally resurrecting deleted DB entries when flb_tail_db_file_offset is later called. This directly addresses the PR objective.

tests/runtime/in_tail.c (11)

37-40: LGTM!

The sqlite3.h include is now properly guarded with FLB_HAVE_SQLDB, addressing the past review concern about builds without SQLite headers.


274-303: LGTM!

The write_raw helper is well-designed for testing incomplete line scenarios. It correctly handles:

  • Optional data write with length validation
  • Optional newline append controlled by add_newline flag
  • Proper fsync to ensure data is flushed

This enables testing the exact scenario where buffered data without a trailing newline remains in the tail buffer.


309-367: LGTM!

Both gzip helpers now properly check fwrite() return values and handle errors correctly, addressing the past review concern about silent truncation. The error handling cleans up resources (closes file, frees memory) before returning -1.


397-406: LGTM!

The msgpack string comparisons now correctly use memcmp with explicit size checks, addressing the past review concern about OOB reads with non-NUL-terminated msgpack strings:

  • Key comparison: key.via.str.size == 3 && memcmp(..., "log", 3) == 0
  • Value comparison: v.via.str.size >= 5 && memcmp(..., "line2", 5) == 0

417-459: LGTM!

The get_db_offset helper is now properly guarded with #ifdef FLB_HAVE_SQLDB, addressing the past review concern about compilation failures on builds without SQLite support.


2870-3091: Comprehensive test for offset rewind on shutdown.

This test properly validates the core fix by:

  1. Writing data with a complete line, verifying it's processed
  2. Writing data without newline (stays in buffer as incomplete line)
  3. Polling DB until offset advances (data read into buffer)
  4. Stopping Fluent Bit abruptly
  5. Restarting and verifying buffered data is re-read

The test methodology is sound for verifying the data loss prevention mechanism.


3093-3175: LGTM!

The gzip resume loss test correctly validates:

  1. First run processes only complete lines (line1)
  2. Incomplete line (line2 without newline) stays in buffer
  3. After restart with appended data, line2 and line3 are both processed
  4. found_line2 flag confirms line2 wasn't lost

The t_ctx.found_line2 = 0 reset before restart addresses the past review concern about false positives.


3178-3239: LGTM!

Test properly validates gzip file append detection via inotify/refresh while Fluent Bit is running. The test confirms that appended gzip members are correctly detected and processed.


3242-3320: LGTM!

Comprehensive rotation test that validates:

  1. Initial file processing
  2. File rotation (rename to .gz.1)
  3. New file with same name created
  4. Appends to both old (rotated) and new files within rotate_wait window
  5. Both files continue to be monitored and processed correctly

3323-3416: LGTM!

Multi-resume test validates that gzip skip/anchor tracking works correctly across multiple restart cycles:

  1. Run 1: Process line1, stop
  2. Run 2: Skip line1, process only line2, stop
  3. Run 3: Skip line1+line2, process only line3

This confirms the skip_bytes persistence correctly prevents duplicate processing.


3449-3453: LGTM!

All new DB/gzip tests are correctly registered within the #ifdef FLB_HAVE_SQLDB guard, ensuring they're only compiled when SQLite support is available.

plugins/in_tail/tail_file.c (6)

47-47: LGTM!

Required header for the decompression context types and functions used in the gzip handling logic.


1304-1304: LGTM!

Using the explicit FLB_TAIL_DB_ID_NONE constant instead of an implicit 0 improves clarity and aligns with the resurrection prevention logic at line 1499.


1478-1516: LGTM!

The rewind logic correctly:

  • Rewinds uncompressed files by buf_len to prevent data loss
  • Avoids rewinding gzip files (infeasible with streaming decompression) and logs a warning
  • Checks db_id > FLB_TAIL_DB_ID_NONE to prevent resurrecting deleted DB entries
  • Uses proper format specifiers (%jd with intmax_t cast, %zu for size_t)

1636-1639: LGTM!

All gzip-related state (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) is properly reset on file truncation. This addresses the previous review concern about inconsistent exclude_bytes reset.


1839-1859: LGTM!

The skip logic correctly handles gzip resume by discarding already-processed decompressed bytes:

  • Tracks remaining bytes to skip with exclude_bytes
  • Uses memmove to shift usable data when skip completes
  • Clears skipping_mode when done
  • Uses proper format specifiers

1900-1921: LGTM!

The gzip member boundary detection is well-designed:

  • Updates anchor_offset only when a member is fully complete (state awaiting next header, no buffered compressed or decompressed data)
  • Resets skip_bytes to 0 at member boundaries since no skipping is needed
  • Enables safe resume from the next gzip member on restart

This addresses the fundamental challenge of gzip offset tracking with streaming decompression.

@jinyongchoi
Copy link
Contributor Author

@codex review

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
plugins/in_tail/tail_file_internal.h (1)

44-49: Clarify skipping_mode’s semantics

The new offset/anchor/skip fields look appropriate, but the skipping_mode comment (/* skipping mode ? */) is vague. Suggest clarifying what the flag actually represents (e.g., “actively discarding decompressed bytes up to exclude_bytes for gzip resume”) to aid future maintainers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbf558d and 383865f.

📒 Files selected for processing (6)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail_fs_inotify.c
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_file_internal.h
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_db.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (366-399)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (8)
plugins/in_tail/tail_sql.h (1)

31-40: Schema & SQL constants line up with new skip/anchor handling

The extended in_tail_files schema (skip, anchor) and the updated INSERT/UPDATE statements are consistent with the column indices used in db_file_exists() and flb_tail_db_file_offset() (columns 6/7 for skip/anchor). This keeps DB reads/writes coherent across migrations and new installs.

Also applies to: 45-47, 52-53

plugins/in_tail/tail_db.c (3)

28-35: Migration via pragma_table_info is robust and fails fast on real DB errors

Using SELECT 1 FROM pragma_table_info('in_tail_files') WHERE name='skip'/'anchor' with cb_column_exists cleanly distinguishes “no such column” (no rows) from generic SQL failures. On error you now log, close the DB, and abort instead of mis-treating lock/corruption/permission issues as a missing column, which is safer.

Also applies to: 61-83, 85-107


182-221: DB ↔ runtime skip/anchor wiring looks correct for gzip resume

  • db_file_exists() now pulls skip (col 6) and anchor (col 7), matching the schema.
  • flb_tail_db_file_set() copies those into file->skip_bytes / file->anchor_offset and initializes exclude_bytes/skipping_mode based on skip_bytes.

This ensures a restarted gzip tailer can seek to the stored anchor, discard skip_bytes worth of decompressed data, and continue without duplicates, while keeping raw offsets in file->offset.

Also applies to: 318-360


365-376: Avoiding resurrection of deleted DB rows via db_id sentinel is sound

Binding skip_bytes and anchor_offset into stmt_offset keeps the DB in sync with the new gzip state, and resetting file->db_id to FLB_TAIL_DB_ID_NONE in flb_tail_db_file_delete() ensures later calls to flb_tail_db_file_offset() from flb_tail_file_remove() are gated and don’t recreate rows that were intentionally deleted.

Also applies to: 425-445

plugins/in_tail/tail_file.c (4)

1310-1311: Using FLB_TAIL_DB_ID_NONE sentinel on initialization is appropriate

Initializing file->db_id to FLB_TAIL_DB_ID_NONE in flb_tail_file_append() matches the new semantics in tail_db.c (where this sentinel guards offset updates and is reset on delete), and avoids accidental DB updates before the file is actually registered.


1484-1522: Rewind-on-buffered-data for plain files and warning‑only path for gzip are well‑scoped

  • For uncompressed files, rewinding file->offset by buf_len (clamped at 0) before persisting avoids losing unprocessed buffered bytes across restarts.
  • The DB update is correctly guarded with file->db_id > FLB_TAIL_DB_ID_NONE, so rows explicitly deleted earlier won’t be recreated.
  • For gzip inputs, you only log a warning and persist the current DB state without raw rewind, which is consistent with the “anchor + skip” design and avoids incorrect seeks when decompressed vs compressed lengths diverge.

This addresses the original data‑loss issue while avoiding the previous “resurrect deleted DB entry” problem.


1631-1652: Truncation now clears gzip skip state consistently

In adjust_counters(), when size_delta < 0 you now reset:

  • anchor_offset to the new offset,
  • skip_bytes and exclude_bytes to 0,
  • skipping_mode to FLB_FALSE,

and persist the new DB offset if enabled. This prevents stale gzip skipping state from being reused after a truncate, aligning with the resets already done in the inotify/stat truncation paths.


1845-1865: Gzip exclude_bytes and member‑boundary anchor updates look logically sound

  • The skipping_mode block drops up to exclude_bytes worth of freshly decompressed data, either by consuming whole chunks or by memmove‑ing the remaining tail down, and only increases buf_len by the post‑skip length. Buffer invariants stay intact.
  • After processing, skip_bytes += processed_bytes tracks the total decompressed bytes consumed since the current anchor.
  • When a gzip member is fully consumed (EXPECTING_HEADER, input_buffer_length == 0, buf_len == 0), updating anchor_offset to the current raw offset and resetting skip_bytes to 0 correctly advances the anchor to the next member boundary and keeps DB state minimal.

Overall this matches the intended “anchor + skip per member” model and should allow correct resume without duplicates for multi‑member gzip files.

Also applies to: 1906-1927

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Added test for offset rewind on shutdown (prevent DB resurrection)
- Added gzip tests: resume loss, inotify append, rotation, multi resume

Signed-off-by: jinyong.choi <inimax801@gmail.com>
Unprocessed buffered data is lost on shutdown because the file offset is saved
ahead of processing. This patch rewinds the offset by the remaining buffer
length on exit, ensuring data is re-read on restart.

Also, to prevent resurrecting deleted files in the DB, db_id is reset to 0
upon deletion. The offset update logic now checks db_id > 0 to ensure only
active files are updated.

Signed-off-by: jinyong.choi <inimax801@gmail.com>
@jinyongchoi
Copy link
Contributor Author

@codex review

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. Another round soon, please!

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@jinyongchoi
Copy link
Contributor Author

@cosmo0920

In addition to the existing implementation, the following changes were added:

  • Fixed data loss in the in_tail plugin when tailing gzip files across restarts
  • Added DB columns (skip_bytes, anchor_offset) to persist the uncompressed position
  • Implemented a skip mechanism to resume from the correct position in the decompressed stream
  • Reset tracking fields on file rotation

Please let me know if you have any feedback or see any issues with this approach.

Thanks!

Copy link
Contributor

@cosmo0920 cosmo0920 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a quite minor style issue.
Otherwise, the patch looks pretty great.
We're mostly all set. 👍

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/in_tail/tail_db.c (1)

182-220: Minor type inconsistency: skip mixes uint64_t and int64_t.

The skip field is declared as uint64_t in C (lines 184, 324) but is read/written using sqlite3_column_int64 and sqlite3_bind_int64 which operate on int64_t. While unlikely to cause issues in practice (skip_bytes won't exceed INT64_MAX), this inconsistency could theoretically cause sign-extension or overflow issues with very large values.

For consistency and correctness, consider using int64_t for skip throughout, or explicitly cast to/from uint64_t with overflow checks.

Also applies to: 262-263, 323-324, 372-375

plugins/in_tail/tail_file.c (1)

1040-1089: Critical: stream_offset not initialized for compressed files in DB path.

Lines 1085-1087 only initialize stream_offset for non-decompression contexts. For compressed files restored from DB with skip_bytes > 0, stream_offset remains at 0 instead of being set to skip_bytes. This breaks offset_key monotonicity:

  • On restart, stream_offset stays 0
  • stream_offset accumulates from 0 at line 1912
  • offset_key (calculated as stream_offset + processed_bytes) restarts from 0 instead of continuing from the saved position

After the lseek() succeeds (line 1061), add:

if (file->decompression_context == NULL) {
    file->stream_offset = file->offset;
}
else {
    file->stream_offset = file->skip_bytes;
}

This ensures stream_offset reflects the decompressed stream position for gzip files.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 623d455 and a6afba7.

📒 Files selected for processing (6)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • plugins/in_tail/tail_fs_stat.c
  • plugins/in_tail/tail_fs_inotify.c
  • plugins/in_tail/tail_file_internal.h
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_db.c
  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (366-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
🔇 Additional comments (11)
plugins/in_tail/tail_sql.h (2)

37-40: LGTM: Schema extension for gzip resume state.

The addition of skip and anchor columns with appropriate defaults enables the anchor+skip mechanism for gzip file resume. The trailing comma after rotated and the column definitions are syntactically correct.


45-47: LGTM: SQL statements updated for new columns.

The INSERT and UPDATE statements correctly include the new skip and anchor fields with proper parameter binding.

Also applies to: 52-53

plugins/in_tail/tail_db.c (4)

28-34: LGTM: Column existence detection callback.

The callback correctly sets the found flag when any row is returned by the PRAGMA query, enabling reliable migration detection.


61-108: LGTM: Migration logic addresses past concerns.

The migration now uses PRAGMA table_info to reliably detect column existence instead of attempting a SELECT that could misfire on DB errors. The error handling properly aborts if the PRAGMA query itself fails, and column addition proceeds only when the column is confirmed absent.


348-359: LGTM: Skipping mode initialization from DB state.

The initialization logic correctly sets exclude_bytes and skipping_mode when resuming from a saved gzip position with skip_bytes > 0. This properly restores the runtime state needed to skip already-processed decompressed data on restart.


444-444: LGTM: Prevent resurrecting deleted DB entries.

Resetting db_id to FLB_TAIL_DB_ID_NONE after deletion ensures that subsequent offset updates won't inadvertently recreate the deleted entry. This addresses the concern raised in past review comments.

plugins/in_tail/tail_file.c (5)

1315-1324: LGTM: New gzip resume fields initialized.

All new fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) are properly initialized for newly tracked files. The use of FLB_TAIL_DB_ID_NONE is consistent with the deletion handling.


1495-1534: LGTM: Rewind logic correctly handles buffered data.

The rewind implementation properly addresses the stated objectives:

  • Uncompressed files: Rewinds offset by buf_len (clamped to 0) so unprocessed bytes are re-read on restart
  • Compressed files: Logs a warning that buffered decompressed data cannot be safely rewound due to streaming decompression, accepting the known limitation
  • DB protection: Checks db_id > FLB_TAIL_DB_ID_NONE to prevent updating deleted entries
  • Format specifiers: Uses proper %jd with (intmax_t) casts for off_t values

1654-1657: LGTM: Truncation resets gzip resume state.

All gzip-related fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) are properly reset when the file is truncated, ensuring stale skip state doesn't persist after truncation. This addresses the concern raised in past review comments.


1857-1877: LGTM: Skip mechanism correctly excludes already-processed data.

The skip logic properly decrements exclude_bytes as decompressed data is produced, using memmove to discard the skipped portion when a partial skip occurs, and clearing skipping_mode when the skip completes. This correctly implements the runtime skip countdown for gzip resume.


1918-1939: LGTM: Gzip member boundary detection updates anchor points.

Line 1919 increments skip_bytes by processed_bytes after each chunk is processed, tracking decompressed progress within the current gzip member. When a member completes (decompressor state is EXPECTING_HEADER, buffers are empty), the code updates anchor_offset to the current compressed position and resets skip_bytes to 0, establishing a new resume point at the member boundary. This correctly implements the anchor+skip strategy for gzip files.

@jinyongchoi
Copy link
Contributor Author

@cosmo0920
Thanks for the review! I've addressed the feedback in two commits:

  • a6afba7 - Code style fix
    Fixed else formatting to match project style

  • 1263901 - Initialize stream_offset for compressed files
    DB path: Added missing stream_offset = skip_bytes initialization when restoring compressed files from DB
    Non-DB path: Explicitly initialize exclude_bytes and stream_offset to 0 for clarity and consistency
    Includes comment noting the limitation for multi-member gzip files

Please let me know if there are any other concerns.

@cosmo0920
Copy link
Contributor

I found a potential issue but we could be skipped in this PR:

Interaction with multiline + gzip (MEDIUM risk, but acceptable)

There’s a subtle corner:

Scenario:

  1. Multi-member gzip
  2. Mid-member processing
  3. Shutdown happens after decompression but before anchor update
  4. DB has:
    1. anchor_offset → previous member
    2. skip_bytes → partially updated
  5. Restart resumes correctly most of the time, but…

Risk:

If shutdown occurs between

  • skip_bytes += processed_bytes;

and

  • flb_tail_db_file_offset()

then DB may lag by one chunk.

This results in small duplication, not loss.

This scenario could be still a main risk to behind the right offset of tailing gzipped files but this just causes duplicated contents of gzipped files. This could be technically difficult to solve this. So, we need to describe such corner case if we accept this behavior.

Other risks are still existing in this patch but others are relatively too small than this.

Plus, it's just a logging issue but should we need to show altering database when starting to do this operation? This is just my curiosity.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/in_tail/tail_db.c (1)

258-263: Verify SQL_INSERT_FILE parameter order and address skip_bytes type mismatch.

The bindings at lines 258-263 correctly match SQL_INSERT_FILE's parameter order (name, offset, inode, created, skip, anchor). However, skip_bytes is defined as uint64_t but bound via sqlite3_bind_int64, which expects a signed sqlite3_int64. While values exceeding INT64_MAX are unlikely for a byte offset, consider either casting to int64_t explicitly or using sqlite3_bind_int64 with a comment documenting the assumption that skip_bytes remains within signed int64 range.

🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)

1027-1141: Document multi-member gzip stream_offset limitation more prominently.

The comment at lines 1090-1095 acknowledges that stream_offset = skip_bytes is only correct for single-member gzip files. For multi-member gzip, skip_bytes resets at member boundaries, so stream_offset won't reflect total decompressed bytes from all prior members. This could affect offset_key accuracy when users concatenate multiple gzip streams.

Consider either:

  1. Adding a more prominent warning in the code or documentation that offset_key for multi-member gzip reflects position within the current member, not total decompressed bytes across all members.
  2. Tracking total decompressed bytes separately if multi-member gzip is a common use case.

Otherwise, the seek logic correctly handles DB-backed resume for both gzip and plain files, with proper initialization of stream_offset, exclude_bytes, and skipping_mode.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1263901 and 8640940.

📒 Files selected for processing (6)
  • plugins/in_tail/tail_db.c (11 hunks)
  • plugins/in_tail/tail_file.c (9 hunks)
  • plugins/in_tail/tail_file_internal.h (1 hunks)
  • plugins/in_tail/tail_fs_inotify.c (1 hunks)
  • plugins/in_tail/tail_fs_stat.c (1 hunks)
  • plugins/in_tail/tail_sql.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • plugins/in_tail/tail_file_internal.h
  • plugins/in_tail/tail_fs_stat.c
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/in_tail/tail_db.c
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_fs_inotify.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (2)
src/flb_sqldb.c (2)
  • flb_sqldb_query (151-166)
  • flb_sqldb_close (129-149)
plugins/in_tail/tail_file.h (1)
  • flb_tail_target_file_name_cmp (64-116)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
  • flb_tail_db_file_offset (366-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-without-cxx (3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
🔇 Additional comments (16)
plugins/in_tail/tail_db.c (6)

28-34: LGTM!

The column-existence callback is correctly implemented. It properly sets the flag when a row is returned from the pragma query, enabling reliable column detection for schema migration.


61-108: LGTM!

The migration logic correctly uses pragma_table_info to detect column existence, distinguishing real errors from missing columns. The pattern is consistent for both skip and anchor columns, with proper error handling and logging.


318-363: LGTM!

The function correctly restores skip_bytes and anchor_offset from the database and initializes the runtime-only exclude_bytes and skipping_mode fields based on the persisted skip state. The logic properly handles both cases (skip > 0 and skip == 0).


424-446: LGTM!

Resetting db_id to FLB_TAIL_DB_ID_NONE after deletion prevents accidental resurrection of deleted DB entries when flb_tail_db_file_offset is called later. This is an important safeguard for data integrity.


365-398: The parameter binding order in flb_tail_db_file_offset correctly matches the SQL_UPDATE_OFFSET statement: offset (parameter 1), skip (parameter 2), anchor (parameter 3), and id (parameter 4).


182-221: Column indices for skip and anchor are correct.

The code correctly reads skip from column index 6 and anchor from column index 7, matching the table schema defined in SQL_CREATE_FILES. When SELECT * is executed, SQLite returns columns in the order they appear in the table definition: id (0), name (1), offset (2), inode (3), created (4), rotated (5), skip (6), and anchor (7).

plugins/in_tail/tail_fs_inotify.c (1)

259-281: LGTM!

The truncation handler correctly resets all gzip resume state fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) alongside the file offset and buffer, ensuring a clean state after truncation. This initialization is consistent with similar handling in tail_fs_stat.c and adjust_counters in tail_file.c.

plugins/in_tail/tail_sql.h (3)

30-40: LGTM!

The table schema correctly adds skip and anchor columns with INTEGER DEFAULT 0, ensuring backward compatibility with existing databases. Column indices (skip=6, anchor=7) align with the reads in db_file_exists.


45-47: LGTM!

The SQL_INSERT_FILE statement correctly includes skip and anchor in both the column list and VALUES clause. The parameter order matches the binding sequence in db_file_insert (tail_db.c lines 258-263).


52-53: LGTM!

The SQL_UPDATE_OFFSET statement correctly updates all three position-tracking fields (offset, skip, anchor) atomically. The parameter order matches the binding sequence in flb_tail_db_file_offset (tail_db.c lines 372-375).

plugins/in_tail/tail_file.c (6)

47-47: LGTM!

Including flb_compression.h is appropriate for the gzip decompression functionality added in this PR.


1328-1338: LGTM!

All gzip resume fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) are correctly initialized to zero/false when a new file is appended, ensuring clean initial state.


1509-1548: LGTM!

The offset rewind logic correctly handles buffered data on shutdown:

  • For non-compressed files: rewinds the offset by buf_len (with bounds checking) so unprocessed data is re-read on restart.
  • For compressed files: logs a warning explaining that accurate rewinding is infeasible with streaming decompression.
  • DB updates are properly guarded by db_id > FLB_TAIL_DB_ID_NONE to prevent resurrecting deleted entries.

This addresses the data loss issue described in the PR objectives.


1657-1679: LGTM!

The truncation handler in adjust_counters correctly resets all gzip resume state fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode) when a file is truncated, consistent with the inotify and stat-based truncation handlers.


1871-1891: LGTM!

The skip logic during decompression correctly handles resuming from a mid-stream position:

  • When exclude_bytes >= decompressed_data_length, all newly decompressed data is skipped and exclude_bytes is decremented.
  • When exclude_bytes < decompressed_data_length, the remaining bytes are shifted to the buffer start using memmove, and skipping_mode is cleared.

This enables accurate gzip resume from the persisted skip_bytes position.


1932-1953: LGTM!

The gzip member boundary handling correctly tracks position using the anchor/skip pattern:

  • skip_bytes is incremented by processed_bytes to track position within the current member.
  • When a member completes (decompressor transitions to EXPECTING_HEADER state and all buffers are empty), anchor_offset advances to the current raw file position and skip_bytes resets to 0.

This enables resume at member boundaries for multi-member gzip files, addressing the data loss issue for compressed inputs mentioned in the PR objectives.

@jinyongchoi
Copy link
Contributor Author

Thanks for the detailed analysis. I fully agree with your opinion. Although there is a risk of duplication in case of abrupt shutdown, it is technically difficult to solve completely, and duplication is definitely better than data loss.

Also, regarding your question about the logs, I have changed the log level of the database migration messages to debug. This ensures they are not too noisy during normal operation while still being available for troubleshooting if needed.

Finally, should I add a note about the limitation (potential duplication on crash) to the documentation? I think adding a warning/note to the 'Database file' section would be helpful for users.

Let me know what you think!
Thanks!

Copy link
Contributor

@cosmo0920 cosmo0920 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a small nitpick issue but your PR is not following our coding style.
So, we need to follow the style of defining variables.

@cosmo0920
Copy link
Contributor

cosmo0920 commented Dec 19, 2025

Finally, should I add a note about the limitation (potential duplication on crash) to the documentation? I think adding a warning/note to the 'Database file' section would be helpful for users.

Let me know what you think! Thanks!

I suppose that we need to Note annotations to depicts the possibility for database corruptions in the official documentation which should be corresponding PR for documentation. This could be corner cases but it's technically hard to solve cleanly.

Previously, when tailing gzip files, there was no mechanism to persistently
store the uncompressed position ('skip_bytes'). This meant that upon restart,
the plugin could not correctly locate the reading position, identifying it as
a rotation or new file case, potentially leading to data loss.

To fix this, 'skip_bytes' is now stored in the database to persist the
uncompressed offset. Additionally, 'exclude_bytes' is introduced to track
runtime skipping without interfering with the persistent value.

The SQLite schema has been updated to include 'anchor_offset' and 'skip_bytes'
columns to support these features.

Signed-off-by: jinyong.choi <inimax801@gmail.com>
@jinyongchoi
Copy link
Contributor Author

Finally, should I add a note about the limitation (potential duplication on crash) to the documentation? I think adding a warning/note to the 'Database file' section would be helpful for users.
Let me know what you think! Thanks!

I suppose that we need to Note annotations to depicts the possibility for database corruptions in the official documentation which should be corresponding PR for documentation. This could be corner cases but it's technically hard to solve cleanly.

Got it! I'll create a separate PR for the documentation.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

in_tail: Data loss on exit/restart due to unhandled buffer (partial lines)

3 participants