in_tail: fix data loss with buffered data on shutdown and gzip files#11269
in_tail: fix data loss with buffered data on shutdown and gzip files#11269jinyongchoi wants to merge 3 commits intofluent:masterfrom
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds gzip-aware resume state and DB-backed bookkeeping: new per-file fields (anchor_offset, skip_bytes, exclude_bytes, skipping_mode), DB schema migration and SQL updates for Changes
Sequence Diagram(s)sequenceDiagram
participant Disk as Disk File
participant Reader as in_tail Reader
participant Decompress as Gzip Decompressor
participant Buffer as In-memory Buffer
participant DB as SQLite DB
Disk->>Reader: read compressed/raw bytes (advance raw offset)
Reader->>Buffer: append raw bytes
alt decompression_context (gzip)
Buffer->>Decompress: feed compressed bytes
Decompress-->>Buffer: decompressed data (may span members)
Buffer->>Buffer: apply exclude_bytes / skipping_mode (drop initial decompressed bytes)
alt gzip member boundary reached
Decompress->>Reader: notify member end
Reader->>Reader: set anchor_offset (member start), reset skip_bytes
end
Reader->>DB: persist raw offset, skip, anchor
else no decompression
Buffer->>DB: persist raw offset
end
alt Shutdown with buffered unprocessed data and no decompression
Reader->>DB: rewind offset (offset -= buf_len) and persist
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (3)
🧰 Additional context used🧠 Learnings (12)📓 Common learnings📚 Learning: 2025-10-23T07:43:16.216ZApplied to files:
📚 Learning: 2025-09-22T15:59:55.794ZApplied to files:
📚 Learning: 2025-08-29T06:25:27.250ZApplied to files:
📚 Learning: 2025-08-29T06:24:26.170ZApplied to files:
📚 Learning: 2025-08-29T06:25:27.250ZApplied to files:
📚 Learning: 2025-08-29T06:25:02.561ZApplied to files:
📚 Learning: 2025-08-29T06:24:55.855ZApplied to files:
📚 Learning: 2025-08-29T06:24:44.797ZApplied to files:
📚 Learning: 2025-09-08T11:21:33.975ZApplied to files:
📚 Learning: 2025-08-31T12:46:11.940ZApplied to files:
📚 Learning: 2025-11-21T06:23:29.770ZApplied to files:
🧬 Code graph analysis (2)plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
🔇 Additional comments (16)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/in_tail/tail_file.c(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.
(toomanyconfigs)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (32)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: Agent
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-centos-7
There was a problem hiding this comment.
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
71208f6 to
f63f1f4
Compare
f63f1f4 to
76f3c05
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
plugins/in_tail/tail_file.c (1)
1471-1471: Remove trailing whitespace.There are trailing spaces after
#endifon this line.Apply this diff:
-#endif +#endif
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)
1451-1472: Consider adding a warning for compressed files with buffered data.While the fix correctly excludes compressed files (where rewinding isn't possible due to gzip decompression state requirements), users monitoring
.gzfiles may be unaware that buffered data can still be lost on shutdown. Consider adding a debug or warning message similar to:if (file->buf_len > 0) { if (file->decompression_context != NULL) { flb_plg_debug(ctx->ins, "inode=%"PRIu64" compressed file %s has %lu bytes buffered; " "rewind not supported for compressed files", file->inode, file->name, (unsigned long)file->buf_len); } else { /* existing rewind logic */ ... } }This would help users understand the limitation without affecting correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/in_tail/tail_file.c(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.
(toomanyconfigs)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-without-cxx (3.31.6)
🔇 Additional comments (1)
plugins/in_tail/tail_file.c (1)
1451-1472: LGTM! Rewind logic correctly prevents data loss for regular files.The implementation properly addresses the data loss issue when Fluent Bit shuts down with unprocessed buffered data:
- The condition
file->buf_len > 0 && file->decompression_context == NULLcorrectly identifies regular files with unprocessed data while excluding compressed files (where offset and buf_len have incompatible units).- The rewind arithmetic
offset -= buf_len(clamped to 0) correctly repositions the file offset to the start of unprocessed data.- The DB persistence call ensures the rewound offset survives restarts.
This addresses the core issue described in #11265 where partial lines and buffered data were lost on restart.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
plugins/in_tail/tail_file.c (2)
1451-1472: Core rewind logic looks correct; consider adding warning for compressed files.The check for
file->decompression_context == NULLproperly prevents the compressed-file arithmetic bug flagged in earlier reviews. For regular files, the offset and buffer length are in the same units, so the subtraction and DB update are valid.However, when compressed files have buffered decompressed data at shutdown, that data will be silently lost (since rewinding is skipped). Consider logging a warning in that case so users are aware:
if (file->buf_len > 0 && file->decompression_context == NULL) { /* * If there is data in the buffer, it means it was not processed. * We must rewind the offset to ensure this data is re-read on restart. */ off_t old_offset = file->offset; if (file->offset > file->buf_len) { file->offset -= file->buf_len; } else { file->offset = 0; } flb_plg_debug(ctx->ins, "inode=%"PRIu64" rewind offset for %s: old=%"PRId64" new=%"PRId64" (buf_len=%lu)", file->inode, file->name, old_offset, file->offset, (unsigned long)file->buf_len); #ifdef FLB_HAVE_SQLDB if (ctx->db) { flb_tail_db_file_offset(file, ctx); } #endif +} +else if (file->buf_len > 0 && file->decompression_context != NULL) { + flb_plg_warn(ctx->ins, "inode=%"PRIu64" cannot rewind compressed file %s; " + "%lu decompressed bytes in buffer may be lost on restart", + file->inode, file->name, (unsigned long)file->buf_len); }
1471-1471: Remove trailing whitespace.Minor formatting issue: extra spaces after
#endif.-#endif +#endif
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/in_tail/tail_file.c(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (1)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(290-321)
🪛 Cppcheck (2.18.0)
plugins/in_tail/tail_file.c
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Too many #ifdef configurations - cppcheck only checks 12 configurations. Use --force to check all configurations. For more details, use --enable=information.
(toomanyconfigs)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/in_tail/tail_db.c (1)
335-368: Adddb_idguards to allflb_tail_db_file_offset()call sites to prevent database entry duplication.The function's restoration logic at line 362 (when
sqlite3_changes()==0) will calldb_file_insert()and create duplicate database entries if invoked after settingfile->db_id = FLB_TAIL_DB_ID_NONE(lines 414, 1303). Six of eight call sites only checkif (ctx->db)orif (file->config->db)without verifyingdb_id != FLB_TAIL_DB_ID_NONE. Only tail_file.c lines 1498 and 1505 have the proper guard. Add thedb_idcheck to call sites at:
- tail_fs_stat.c:148
- tail_fs_inotify.c:278
- tail_file.c:1075, 1637, 1697, 1911
♻️ Duplicate comments (3)
plugins/in_tail/tail_db.c (1)
58-81: Usepragma_table_infofor migration checks (current logic can misfire on locked/corrupt DB).- ret = flb_sqldb_query(db, "SELECT skip FROM in_tail_files LIMIT 1;", NULL, NULL); - if (ret != FLB_OK) { + /* Check column existence explicitly (avoid treating unrelated SQL errors as "missing column") */ + ret = flb_sqldb_query(db, + "SELECT 1 FROM pragma_table_info('in_tail_files') WHERE name='skip' LIMIT 1;", + NULL, NULL); + if (ret != FLB_OK) { + flb_plg_error(ctx->ins, "db: could not query table info for migration"); + flb_sqldb_close(db); + return NULL; + } + /* TODO: detect 0-row result and only then ALTER TABLE (needs a callback or a helper that returns row count). */Same for
anchor.plugins/in_tail/tail_file.c (2)
1477-1509: Fix debug format specifiers and add the promised gzip “buffered data on exit” warning.
The log uses
PRId64foroff_tand%luforsize_t, which is not portable and can break-Wformat/ LLP64.For
decompression_context != NULL, the PR objectives say “log a warning when buffered data remain on exit”, but no warning is emitted.if (file->buf_len > 0) { if (file->decompression_context == NULL) { @@ - flb_plg_debug(ctx->ins, "inode=%"PRIu64" rewind offset for %s: " - "old=%"PRId64" new=%"PRId64" (buf_len=%lu)", - file->inode, file->name, old_offset, file->offset, - (unsigned long)file->buf_len); + flb_plg_debug(ctx->ins, "inode=%"PRIu64" rewind offset for %s: " + "old=%jd new=%jd (buf_len=%zu)", + file->inode, file->name, + (intmax_t) old_offset, (intmax_t) file->offset, + file->buf_len); @@ } else { + flb_plg_warn(ctx->ins, + "inode=%"PRIu64" file=%s: buffered data (%zu bytes) remains on exit for gzip input; " + "cannot rewind raw offset safely with streaming decompression", + file->inode, file->name, file->buf_len); #ifdef FLB_HAVE_SQLDB if (ctx->db && file->db_id > FLB_TAIL_DB_ID_NONE) { flb_tail_db_file_offset(file, ctx); } #endif } }
1832-1849: Fix gzip debug log format specifiers (%lutruncation risk).These should be
%jdfor offsets (cast tointmax_t) and%zufor size-like values. Same issue in the “Gzip member completed” log.- flb_plg_debug(ctx->ins, "Skipping: anchor=%lu offset=%lu exclude=%lu decompressed=%lu", - (unsigned long)file->anchor_offset, (unsigned long)file->offset, - (unsigned long)file->exclude_bytes, (unsigned long)decompressed_data_length); + flb_plg_debug(ctx->ins, "Skipping: anchor=%jd offset=%jd exclude=%zu decompressed=%zu", + (intmax_t) file->anchor_offset, (intmax_t) file->offset, + (size_t) file->exclude_bytes, (size_t) decompressed_data_length); @@ - flb_plg_debug(file->config->ins, - "Gzip member completed: updating anchor from %lu to %lu, resetting skip from %lu to 0", - (unsigned long)file->anchor_offset, (unsigned long)file->offset, - (unsigned long)file->skip_bytes); + flb_plg_debug(file->config->ins, + "Gzip member completed: updating anchor from %jd to %jd, resetting skip from %" PRIu64 " to 0", + (intmax_t) file->anchor_offset, (intmax_t) file->offset, + file->skip_bytes);Also applies to: 1890-1906
🧹 Nitpick comments (2)
plugins/in_tail/tail_fs_stat.c (1)
122-151: Truncation reset includes anchor/skip/exclude state (consistent with gzip resume model).Optional: consider making local
offsetanoff_t(matcheslseek()/file->offset).tests/runtime/in_tail.c (1)
2840-2875: Prefer parameter binding over SQL string interpolation (even in tests).Today
get_db_offset()builds aLIKE '%%%s%%'query; considerSELECT offset FROM in_tail_files WHERE name LIKE ?+sqlite3_bind_text()to avoid edge cases with filenames containing quotes/%.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
plugins/in_tail/tail.h(1 hunks)plugins/in_tail/tail_db.c(9 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)tests/runtime/in_tail.c(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/in_tail/tail_fs_inotify.c
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_file.ctests/runtime/in_tail.cplugins/in_tail/tail_file_internal.hplugins/in_tail/tail_fs_stat.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
plugins/in_tail/tail_file.ctests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
plugins/in_tail/tail_file.ctests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
plugins/in_tail/tail_file.ctests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
plugins/in_tail/tail_file.ctests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
plugins/in_tail/tail_file.ctests/runtime/in_tail.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
plugins/in_tail/tail_file.ctests/runtime/in_tail.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
plugins/in_tail/tail_file.ctests/runtime/in_tail.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(336-369)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-without-cxx (3.31.6)
🔇 Additional comments (6)
plugins/in_tail/tail.h (1)
36-44: Nice cleanup: introduceFLB_TAIL_DB_ID_NONEsentinel for db_id + config defaults.plugins/in_tail/tail_file_internal.h (1)
43-49: Types look right:offset/anchor_offsetasoff_t, skip/exclude asuint64_t.plugins/in_tail/tail_sql.h (1)
30-54: Schema + insert/update statements are coherent forskip/anchorpersistence.tests/runtime/in_tail.c (2)
272-301:write_raw()helper is a good fit for “incomplete line” shutdown tests.
3069-3389: DB+gzip tests are correctly registered underFLB_HAVE_SQLDB.Also applies to: 3417-3426
plugins/in_tail/tail_file.c (1)
1040-1113:set_file_position(): compressed resume viaanchor_offsetlooks coherent; please verifystream_offsetsemantics for gzip + offset_key.On gzip resume you seek to
anchor_offsetand initializeexclude_bytes = skip_bytes, butfile->stream_offsetstays at 0 for compressed files. Ifoffset_keyis enabled, this may reset offsets across restarts for gzip inputs. Please confirm that’s acceptable, or initializestream_offsetfrom persisted state (if intended).
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
plugins/in_tail/tail_db.c (1)
176-212: Avoidoff_tfor DBskipfield to prevent truncation on 32‑bit builds
db_file_exists()andflb_tail_db_file_set()currently funnel the DBskipcolumn throughoff_t:
- Signature:
off_t *skip, off_t *anchor.- Locals:
off_t skip = 0; off_t anchor = 0;- Assignment:
file->skip_bytes = skip;wherefile->skip_bytesisuint64_t.- DB read:
*skip = sqlite3_column_int64(..., 6);On 32‑bit platforms where
off_tis 32‑bit, large gzip streams can produceskipvalues >4 GiB. Reading these viaoff_twill truncate the high bits, causing incorrect resume positions (duplicate or skipped data) even though SQLite stores full 64‑bit values.To keep the DB path consistent with the 64‑bit
skip_bytesfield and SQLite’sINTEGER(int64) storage, useuint64_tforskip(and optionally foranchorif you want full 64‑bit offsets):-static int db_file_exists(struct flb_tail_file *file, - struct flb_tail_config *ctx, - uint64_t *id, uint64_t *inode, - off_t *offset, off_t *skip, off_t *anchor) +static int db_file_exists(struct flb_tail_file *file, + struct flb_tail_config *ctx, + uint64_t *id, uint64_t *inode, + off_t *offset, uint64_t *skip, off_t *anchor) @@ - off_t offset = 0; - off_t skip = 0; - off_t anchor = 0; + off_t offset = 0; + uint64_t skip = 0; + off_t anchor = 0; @@ - *skip = sqlite3_column_int64(ctx->stmt_get_file, 6); + *skip = (uint64_t) sqlite3_column_int64(ctx->stmt_get_file, 6); @@ - file->skip_bytes = skip; + file->skip_bytes = skip;(Apply the corresponding signature/type updates to any other callers and related locals.)
Please double-check the intended width of
skip/anchoragainst the table definition intail_sql.hand your target platforms’off_tsize to ensure no truncation can occur.Also applies to: 309-352
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
plugins/in_tail/tail.h(1 hunks)plugins/in_tail/tail_db.c(11 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)tests/runtime/in_tail.c(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- plugins/in_tail/tail_sql.h
- plugins/in_tail/tail.h
- plugins/in_tail/tail_fs_inotify.c
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_db.cplugins/in_tail/tail_fs_stat.cplugins/in_tail/tail_file_internal.htests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)
tests/runtime/in_tail.c (3)
src/flb_gzip.c (1)
flb_gzip_compress(157-252)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)src/flb_lib.c (9)
flb_service_set(652-678)flb_input_set(305-335)flb_output_set(520-551)flb_start(983-994)flb_stop(1011-1055)flb_destroy(228-263)flb_create(143-225)flb_input(266-276)flb_output(279-289)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-without-cxx (3.31.6)
🔇 Additional comments (5)
plugins/in_tail/tail_fs_stat.c (1)
127-151: Truncation state reset for skip/anchor fields looks correctResetting
anchor_offset,skip_bytes,exclude_bytes, andskipping_modetogether withoffset/buf_lenon truncation keeps runtime and DB skip state consistent for both compressed and uncompressed files.tests/runtime/in_tail.c (1)
274-303: New rewind + gzip DB tests and helpers align well with the PR objectivesThe additions around
write_raw(), gzip file helpers,cb_check_gzip_resume(),wait_for_count_with_timeout(), and the new DB-focused tests (flb_test_db_offset_rewind_on_shutdown,flb_test_db_gzip_*) accurately exercise:
- Rewind-on-shutdown for buffered incomplete lines on plain files.
- Gzip resume behavior across restart, inotify append, rotation, and multiple restarts.
- The intended “no rewind but warn” semantics for gzip on shutdown.
Resource handling (file/DB unlinks, ctx teardown) and assertions look reasonable for runtime tests.
Also applies to: 309-368, 374-415, 500-529, 2870-3397, 3427-3434
plugins/in_tail/tail_db.c (1)
28-35: PRAGMA-based column existence checks are a solid migration strategyUsing
pragma_table_info('in_tail_files')withcb_column_existsto detect andALTER TABLEin missingskip/anchorcolumns avoids misinterpreting other SQL errors as “missing column” and cleanly migrates older DBs.Also applies to: 61-102
plugins/in_tail/tail_file.c (2)
1037-1076: Seek/anchor/skip handling for compressed vs normal files is coherent
For DB-backed files,
set_file_position()now:
- Uses
anchor_offset(plusskip_bytes/exclude_bytes) to choose a seek position for compressed files.- Uses
offsetfor normal files.- Falls back to EOF only when there is no stored DB position and
read_from_headis false.In
flb_tail_file_chunk():
- On resume with
skipping_mode == FLB_TRUE, theexclude_bytescountdown drops previously processed decompressed bytes before feeding the parser, matching the anchor+skip design.- After processing,
skip_bytesis incremented byprocessed_bytes, and at gzip member boundaries anchor and skip are atomically updated (anchor_offset = offset; skip_bytes = 0), giving a clean restart point.Combined with the DB updates (
flb_tail_db_file_offsetbinding offset/skip/anchor), this provides a consistent and replay-safe position model for gzip streams without over-seeking or reprocessing more than intended.Also applies to: 1106-1112, 1899-1919
1303-1304: Offset rewind on shutdown and truncation state reset are correctly guarded
Initializing
file->db_idtoFLB_TAIL_DB_ID_NONEinflb_tail_file_append()ensures we don’t accidentally resurrect deleted DB rows when rewinding offsets.In
flb_tail_file_remove():
- For non-compressed files with buffered data, rewinding
offsetbybuf_len(bounded at 0) before callingflb_tail_db_file_offset()ensures any unprocessed bytes are re-read on restart, closing the data-loss gap described in #11265.- For compressed files, the code intentionally avoids raw-offset rewind and instead logs a clear debug message and persists the current state, which matches the documented limitation for gzip.
adjust_counters()now resetsanchor_offset,skip_bytes,exclude_bytes, andskipping_modewhen a file is truncated and also persists the reset state to the DB, avoiding stale skip state after truncation.These changes together make the DB position handling and shutdown behavior much safer without regressing rotation/deletion flows.
Also applies to: 1477-1515, 1606-1646
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
plugins/in_tail/tail_file_internal.h (1)
44-49: New offset/skip fields align with gzip/DB logicUsing
off_tforoffset/anchor_offsetanduint64_tforskip_bytes/exclude_bytesmatches how lseek and DBINTEGERare used in the implementation.skipping_modebeing anintis fine, though it could be tightened toboolfor clarity later.tests/runtime/in_tail.c (1)
417-460: DB offset rewind test is correct but could handle missing initial offset more defensivelyThe
flb_test_db_offset_rewind_on_shutdown()flow correctly:
- Writes an initial line and waits for it to be ingested.
- Captures the DB offset, writes an incomplete line without
\n, and waits until the DB offset advances.- Stops Fluent Bit so
flb_tail_file_remove()can rewind the stored offset bybuf_len.- Restarts, writes a bare newline, and asserts that exactly one record (the previously buffered line) is emitted.
One small robustness improvement: if
get_db_offset()still returns-1after the second attempt, you currently proceed andcurrent_offset > start_offsetwill trivially pass. Consider looping on the initial offset until it becomes non‑negative (with a timeout), or explicitlyTEST_CHECK(start_offset >= 0)and abort early to avoid false positives if DB writes fail.Also applies to: 2872-3074
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
plugins/in_tail/tail.h(1 hunks)plugins/in_tail/tail_db.c(11 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)tests/runtime/in_tail.c(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- plugins/in_tail/tail_fs_inotify.c
- plugins/in_tail/tail_fs_stat.c
- plugins/in_tail/tail.h
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_db.cplugins/in_tail/tail_file_internal.htests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (3)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)
tests/runtime/in_tail.c (3)
src/flb_gzip.c (1)
flb_gzip_compress(157-252)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)src/flb_lib.c (4)
flb_input_set(305-335)flb_output_set(520-551)flb_input(266-276)flb_output(279-289)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(357-390)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-centos-7
- GitHub Check: PR - fuzzing test
🔇 Additional comments (9)
plugins/in_tail/tail_sql.h (1)
37-53: Schema and CRUD macros are consistent with newskip/anchorfields
SQL_CREATE_FILES,SQL_INSERT_FILE, andSQL_UPDATE_OFFSETnow includeskipandanchorin a way that matches the column order consumed indb_file_exists()and the insert/offset statements intail_db.c. This keeps new and migrated DBs in sync withstruct flb_tail_filewithout backward‑compat issues.tests/runtime/in_tail.c (3)
37-40: Test helpers for raw writes, gzip files, and msgpack inspection look solid
write_raw()correctly supports “data only”, “newline only”, and “data + newline” and checkswrite()results, matching the existingwrite_msg()style.create_gzip_file()/append_gzip_file()now validateflb_gzip_compress()andfwrite()results and clean up on error, which avoids silently truncated gzip test artifacts.cb_check_gzip_resume()now guards msgpack string lengths and usesmemcmpinstead ofstrncmpon non‑NUL‑terminated buffers, fixing the prior OOB risk.No issues here; good test‑scaffolding improvements.
Also applies to: 274-368, 374-415, 502-531
3076-3400: Gzip+DB resume/rotation tests comprehensively exercise the new skip/anchor behaviorThe four gzip tests:
flb_test_db_gzip_resume_lossflb_test_db_gzip_inotify_appendflb_test_db_gzip_rotationflb_test_db_gzip_multi_resumecorrectly validate that:
- Incomplete final lines in a gzip member aren’t lost across restart (resume_loss).
- Inotify/refresh‑driven appends are picked up without duplicating earlier lines.
- Rotation plus writes to both new and old gzip files within
rotate_waitare handled and all lines are seen exactly once.- Multiple restarts only process newly appended gzip members (
multi_resume), using the persisted skip/anchor state.State (
t_ctx.count,t_ctx.found_line2) is reset between runs where needed, DB and gzip files are unlinked at start and end, and all four functions are correctly guarded byFLB_HAVE_SQLDB. This is good coverage of the new resume semantics.
3429-3436: New DB/gzip tests are properly registered and guardedThe added entries in
TEST_LISTfordb_offset_rewind_on_shutdownand the four gzip+DB scenarios are all enclosed in the#ifdef FLB_HAVE_SQLDBblock and match their function names. Test wiring looks correct.plugins/in_tail/tail_db.c (2)
28-34: Column‑existence migration viapragma_table_infois a solid improvementThe new
cb_column_exists+SELECT 1 FROM pragma_table_info('in_tail_files') WHERE name='skip'/'anchor'checks cleanly distinguish “column missing” from general SQL errors and only runALTER TABLEwhen needed. Errors in the PRAGMA query now abort DB init instead of silently attempting migration, which is safer than the previousSELECT skip FROM ...approach.Also applies to: 43-43, 61-102
176-211: DB ↔ in‑memory wiring forskip/anchoranddb_idreset is consistent
db_file_exists()now returnsskip(col 6) andanchor(col 7), matching the schema andSELECT *order.db_file_insert()bindsfile->skip_bytesandfile->anchor_offset, so new rows persist the current gzip resume state.flb_tail_db_file_set()restoresoffset,skip_bytes, andanchor_offsetintostruct flb_tail_fileand initializesexclude_bytes/skipping_modewhen a non‑zero skip is present, which is exactly what the gzip resume path expects.flb_tail_db_file_offset()updates offset/skip/anchor together;sqlite3_changes()==0still triggers a re‑insert, preserving the legacy “recreate row if someone deleted it” behavior.flb_tail_db_file_delete()now setsfile->db_id = FLB_TAIL_DB_ID_NONE, which works in tandem with thedb_id > FLB_TAIL_DB_ID_NONEchecks intail_file.cto avoid resurrecting deliberately deleted entries.The DB layer looks internally consistent with the new gzip/anchor semantics.
Also applies to: 248-255, 309-354, 362-367, 415-436
plugins/in_tail/tail_file.c (3)
1032-1076: DB‑backed seek logic cleanly separates normal vs gzip files
set_file_position()now:
- Uses
flb_tail_db_file_set()to populateoffset,anchor_offset, andskip_byteswhen a DB is configured.- For compressed files (
decompression_context != NULL), seeks byanchor_offsetwhen a DB position exists, ignoring the rawoffsetcolumn and usingskip_bytesonly as a “have DB state” flag.- For non‑compressed files, continues to seek by
offset, falling back to tail‑of‑file when no DB position andread_from_headis false.- Initializes
anchor_offsetandexclude_bytesappropriately for compressed files after seeking.This matches the “anchor + skip” model and keeps legacy read‑from‑head/tail semantics intact.
Also applies to: 1106-1113
1303-1304: Offset rewind on removal for buffered data is correct and guarded against DB resurrection
- New files start with
file->db_id = FLB_TAIL_DB_ID_NONE, and DB insertion later sets a positive ID.- In
flb_tail_file_remove():
- For uncompressed files with
buf_len > 0, the code rewindsfile->offsetbybuf_len(clamped at 0) so unprocessed bytes will be re‑read on restart.- The DB is updated only when
ctx->dbis set andfile->db_id > FLB_TAIL_DB_ID_NONE, which prevents resurrecting rows that were explicitly deleted earlier (e.g., in stale/deleted‑file paths).- For compressed files, it intentionally does not rewind the raw offset and instead logs a detailed debug message plus persists the current DB state, matching the documented limitation for gzip.
This directly addresses the buffered‑data loss on shutdown for plain files while keeping compressed behavior conservative.
Also applies to: 1477-1515
1606-1652: Gzip skip/anchor state handling and truncation resets are coherent
- On truncation in
adjust_counters(), the code now resets:
anchor_offsetto the new offset,skip_bytesandexclude_bytesto 0,skipping_modetoFLB_FALSE,
and writes the updated state back to the DB when enabled. This avoids carrying stale gzip resume state across truncations.- In
flb_tail_file_chunk()for compressed files:
skipping_mode+exclude_bytestransiently drop the firstskip_bytesworth of decompressed data after a restart, without mutating the persistedskip_bytes.- Afterwards,
skip_bytesis incremented byprocessed_bytesso the DB always reflects how many decompressed bytes have been consumed since the last anchor.- When a gzip member is fully consumed (EXPECTING_HEADER, no input buffer, empty tail buffer),
anchor_offsetis advanced to the current rawoffsetandskip_bytesis reset to 0, so future restarts can begin from the next member boundary.Together, this gives a consistent compressed‑file resume story and keeps DB‑stored
skip/anchorin sync with the new in‑memory fields.Also applies to: 1838-1858, 1899-1919
|
@codex review |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/runtime/in_tail.c (1)
439-442: SQL injection risk in test code.The
snprintfconstructs a query using string interpolation withfile_name. While this is test code and the filename is controlled, usingLIKE '%%%s%%'with unescaped input could be problematic if filenames contain special characters like%,_, or'.For test robustness, consider using parameter binding or at minimum escaping the filename.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
plugins/in_tail/tail.h(1 hunks)plugins/in_tail/tail_db.c(11 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)tests/runtime/in_tail.c(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- plugins/in_tail/tail_fs_inotify.c
- plugins/in_tail/tail.h
- plugins/in_tail/tail_file_internal.h
- plugins/in_tail/tail_fs_stat.c
- plugins/in_tail/tail_sql.h
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_db.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(357-390)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: PR - fuzzing test
🔇 Additional comments (21)
plugins/in_tail/tail_db.c (6)
28-34: LGTM: Clean callback for migration detection.The callback approach properly addresses the previous review feedback about using
PRAGMA table_infoto detect column existence. The callback sets a flag when a row is returned, which reliably indicates column presence.
61-101: Migration logic looks correct.The PRAGMA-based migration detection properly handles the case where the table exists but columns are missing. This addresses the previous review concern about misinterpreting non-schema errors as missing columns.
197-201: Good defensive cleanup on error path.Adding
sqlite3_clear_bindingsandsqlite3_resetbefore returning on the error path ensures proper statement cleanup, preventing potential resource leaks or state corruption.
434-436: Good: Reset db_id prevents resurrecting deleted entries.Setting
db_idtoFLB_TAIL_DB_ID_NONEafter deletion ensures that subsequent offset updates won't inadvertently re-insert a row for a deleted file. This directly addresses the PR objective about preventing DB entry resurrection.
363-366: SQL UPDATE binding order is correct.The bindings match the SQL_UPDATE_OFFSET statement parameters: position 1 (offset), 2 (skip), 3 (anchor), and 4 (db_id) align with the SQL statement
UPDATE in_tail_files set offset=@offset, skip=@skip, anchor=@anchor WHERE id=@id;
339-350: Struct is guaranteed zero-initialized viacalloc(), making this initialization safe.The
flb_tail_filestruct is allocated usingflb_calloc(1, sizeof(struct flb_tail_file)), which is a wrapper around standard Ccalloc(). Thecalloc()function guarantees zero-initialization, soexclude_bytesandskipping_modeare always initialized to 0 (and FLB_FALSE) at allocation time. The explicit initialization in the else branch is therefore redundant but not problematic.tests/runtime/in_tail.c (7)
37-40: LGTM: SQLite header properly guarded.The
#include <sqlite3.h>is now correctly wrapped with#ifdef FLB_HAVE_SQLDB, addressing the previous review concern about builds without SQLite support.
274-303: Good test helper for incomplete line scenarios.The
write_rawfunction enables testing the exact scenario described in the PR: writing data without a trailing newline to simulate incomplete lines remaining in the buffer at shutdown.
328-331: LGTM: fwrite error checking added.Both gzip helpers now properly check
fwritereturn values and handle short writes by cleaning up and returning an error. This addresses the previous review concern.Also applies to: 358-361
397-404: LGTM: Safe msgpack string comparison.The callback now uses
memcmpwith proper size validation (key.via.str.size == 3andv.via.str.size >= 5) to avoid out-of-bounds reads, addressing the previous review feedback.
417-460: LGTM: get_db_offset properly guarded.The function is now wrapped with
#ifdef FLB_HAVE_SQLDB, preventing compilation errors when SQLite is not available.
3140-3141: Good: found_line2 reset before second run.The test properly resets both
t_ctx.countandt_ctx.found_line2before the second Fluent Bit run, addressing the previous review concern about false positives from stale state.
3188-3190: Good test hygiene: DB file unlinked at start.The test properly cleans up both the log file and DB file before starting, preventing cross-run leakage.
plugins/in_tail/tail_file.c (8)
47-47: LGTM: Compression header included for decompression state checking.The inclusion of
flb_compression.his needed to accessFLB_DECOMPRESSOR_STATE_EXPECTING_HEADERfor gzip member boundary detection.
1040-1065: Good gzip-aware seek position logic.The code correctly differentiates between compressed and uncompressed files:
- Compressed: seek to
anchor_offset(gzip member boundary)- Uncompressed: seek to stored
offsetThe
has_db_positioncheck ensures we only seek when there's a valid stored position.
1109-1112: LGTM: Initialize exclude_bytes for non-DB gzip files.When seeking without DB state for a gzip file,
anchor_offsetandexclude_bytesare properly initialized, ensuring consistent state.
1477-1515: Core offset rewind logic looks correct.The implementation properly:
- Only rewinds for uncompressed files (
decompression_context == NULL)- Checks
db_id > FLB_TAIL_DB_ID_NONEbefore updating DB- Logs a warning for gzip files where rewind is infeasible
- Uses correct format specifiers (
%jdforintmax_t,%zuforsize_t)This addresses the main PR objective of preventing data loss on shutdown.
1635-1638: Good: Reset all gzip-related state on truncation.The truncation handler now resets
anchor_offset,skip_bytes,exclude_bytes, andskipping_mode, addressing the previous review concern about inconsistent state after truncation.
1838-1858: Skipping logic correctly handles partial and complete skips.The implementation properly handles:
- Complete skip: decrement
exclude_bytes, setdecompressed_data_length = 0- Partial skip:
memmoveremaining data, clearskipping_modeFormat specifiers are correct (
%jdwithintmax_tcast,%zuforsize_t).
1899-1919: Gzip member boundary detection for anchor updates.The condition checks for:
- Decompressor awaiting next header (
FLB_DECOMPRESSOR_STATE_EXPECTING_HEADER)- No remaining compressed input (
input_buffer_length == 0)- No remaining decompressed buffer (
buf_len == 0)This ensures anchor updates occur only at clean gzip member boundaries, enabling accurate resume.
1906-1908: No issues.FLB_DECOMPRESSOR_STATE_EXPECTING_HEADERis properly defined ininclude/fluent-bit/flb_compression.hat line 32.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/runtime/in_tail.c (1)
2886-3086: Tighten DB rewind test to fail fast if DB offset never advancesThe overall flow of
flb_test_db_offset_rewind_on_shutdown()matches the bug you’re targeting (DB offset advanced while an incomplete line stays buffered, then verifying replay after restart). One corner case: ifget_db_offset()never observes an offset > 0 or the second poll never seescurrent_offset > start_offset, the test logs a warning viaTEST_CHECK/TEST_MSGbut continues. That can let the test pass without actually exercising the “advanced DB offset + buffered data” scenario.Consider treating a non‑advancing DB offset as a hard failure so the test can’t silently turn into a no‑op if timing changes. For example:
- if (!TEST_CHECK(current_offset > start_offset)) { - TEST_MSG("DB offset did not advance. start=%ld current=%ld", start_offset, current_offset); - } + if (!TEST_CHECK(current_offset > start_offset)) { + TEST_MSG("DB offset did not advance. start=%ld current=%ld", + start_offset, current_offset); + test_tail_ctx_destroy(ctx); + unlink(file[0]); + unlink(db); + exit(EXIT_FAILURE); + }This keeps the test robust against future timing changes in the tailer/DB path.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
plugins/in_tail/tail.h(1 hunks)plugins/in_tail/tail_db.c(11 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)tests/runtime/in_tail.c(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- plugins/in_tail/tail_sql.h
- plugins/in_tail/tail_fs_inotify.c
- plugins/in_tail/tail.h
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_file_internal.htests/runtime/in_tail.cplugins/in_tail/tail_file.cplugins/in_tail/tail_db.cplugins/in_tail/tail_fs_stat.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (3)
tests/runtime/in_tail.c (3)
src/flb_gzip.c (1)
flb_gzip_compress(157-252)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)src/flb_lib.c (5)
flb_service_set(652-678)flb_input_set(305-335)flb_output_set(520-551)flb_input(266-276)flb_output(279-289)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(366-399)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-centos-7
🔇 Additional comments (9)
plugins/in_tail/tail_fs_stat.c (1)
128-143: Truncation path correctly resets new skip/anchor stateResetting
anchor_offset,skip_bytes,exclude_bytes, andskipping_modealongsideoffset/buf_lenon truncation keeps gzip skip state consistent withadjust_counters()and avoids stale skip on truncated files. Looks good.plugins/in_tail/tail_file_internal.h (1)
44-50: Struct extensions and type choices align with DB and gzip logicSwitching
offsettooff_tand introducinganchor_offset,skip_bytes/exclude_bytes(uint64_t), andskipping_modeis consistent with how offsets and skip counters are handled intail_db.candtail_file.c, including 64‑bit DB storage. No issues from a layout or semantics standpoint.plugins/in_tail/tail_db.c (3)
28-35: Column migration viapragma_table_infois robust and fail‑fastUsing
PRAGMA table_info('in_tail_files')withcb_column_existscleanly distinguishes “column missing” from real SQL errors, and aborts if the pragma itself fails. Addingskip/anchoronly whencolumn_found == 0avoids repeatedALTER TABLEattempts and the earlier misdetection problem.Also applies to: 42-44, 61-108
182-221: Skip/anchor plumbing between DB and in‑memory state is consistent
db_file_exists()now returnsskipasuint64_tandanchorasoff_t, reading them from columns 6 and 7; this matches the SQL layout and avoids 32‑bit truncation on large gzip members.db_file_insert()bindsfile->skip_bytesandfile->anchor_offset, ensuring new rows are created with correct initial values.flb_tail_db_file_set()applies DB state back into the file:offset,skip_bytes,anchor_offset, and initializesexclude_bytes/skipping_modeonly whenskip_bytes > 0, which lines up with the transient skip semantics used intail_file.c.flb_tail_db_file_offset()updates offset, skip, and anchor together, keeping DB and in‑memory state aligned.This all looks coherent with the gzip resume design.
Also applies to: 248-280, 318-360, 365-399
424-445: Resettingdb_idon delete correctly prevents row resurrectionAfter a successful delete, setting
file->db_id = FLB_TAIL_DB_ID_NONEensures later calls that gate ondb_id > FLB_TAIL_DB_ID_NONE(e.g., inflb_tail_file_remove()) will not re‑insert a row for a file that was deliberately removed. This addresses the stale‑entry resurrection concern without changing existing update paths.tests/runtime/in_tail.c (2)
37-40: Helper and guard additions for gzip/DB tests look correct
write_raw()nicely models incomplete vs completed lines without duplicating logic fromwrite_msg().create_gzip_file()/append_gzip_file()now checkfwrite()return values and clean up on short write, avoiding silently corrupted gzip test artifacts.struct test_ctx+cb_check_gzip_resume()correctly guard msgpack string lengths and usememcmp, eliminating the prior OOB risk.get_db_offset()and the sqlite3 include are cleanly wrapped in#ifdef FLB_HAVE_SQLDB, so non‑SQLite builds won’t see sqlite symbols.wait_for_count_with_timeout()follows the same time‑diff pattern as existing wait helpers.No functional concerns here.
Also applies to: 274-303, 309-338, 340-367, 369-415, 417-459, 501-530
3088-3411: Gzip+DB runtime tests comprehensively cover the new resume semanticsThe four gzip DB tests (
db_gzip_resume_loss,db_gzip_inotify_append,db_gzip_rotation,db_gzip_multi_resume) exercise:
- replaying an incomplete last line across restarts,
- inotify/append behavior while running,
- rotation with old/new gzip files both receiving appends within
rotate_wait,- multiple resume cycles using
skip_bytes/anchor_offset.All are correctly guarded by
#ifdef FLB_HAVE_SQLDB, and the new TEST_LIST entries are under the same guard, so non‑SQLite builds won’t see DB‑dependent tests. This is solid coverage for the new anchor/skip design.Also applies to: 3441-3448
plugins/in_tail/tail_file.c (2)
47-48: Gzip resume: seek/anchor/skip behavior matches the intended design
- Including
<fluent-bit/flb_compression.h>here is appropriate given the direct use of decompressor state.- In
set_file_position():
- DB‑enabled paths now treat
file->offset > 0as the primary persisted position for both plain and gzip files, so migrated DBs that only haveoffsetare still respected.- For gzip files with established state (
anchor_offset > 0 || skip_bytes > 0), you overrideseek_postoanchor_offset, ensuring we always start decompressing from the beginning of the tracked member and then apply skipping viaskip_bytes/exclude_bytes.- When there is no DB position and
read_from_headis off, seeking to EOF and settinganchor_offsetaccordingly preserves the “tail from end” semantics while initializing anchor for future DB updates.- Post‑seek init for gzip without DB (
anchor_offset = file->offset; exclude_bytes = file->skip_bytes) cleanly seeds the transient skip countdown for the next run ifskip_bytesis ever non‑zero.- In
flb_tail_file_chunk():
- The
skipping_mode+exclude_bytesblock correctly discards already‑processed decompressed bytes before handing any remainder toprocess_content(), usingmemmovesafely over overlapping buffer regions.skip_bytes += processed_bytestracks logical decompressed progress, and the “gzip member completed” check updatesanchor_offsetto the current rawoffsetand resetsskip_bytesto 0 only when we’re at a member boundary with no buffered decompressed data. This matches the intended “anchor + relative skip” model per member.This end‑to‑end behavior looks coherent for both upgraded and newly created gzip DB entries.
Also applies to: 1027-1115, 1839-1859, 1901-1921
1304-1305: Offset rewind and DB‑ID handling correctly avoid stale or resurrected DB rows
- Initializing
file->db_idtoFLB_TAIL_DB_ID_NONEinflb_tail_file_append()ensures we never try to update offsets for files that haven’t been inserted into the DB yet.- In
flb_tail_file_remove():
- For non‑compressed files, rewinding
file->offsetbyfile->buf_len(capped at 0) when the buffer is non‑empty ensures any unprocessed bytes are re‑read on restart, directly addressing the original data‑loss bug.- The DB update is gated on
ctx->db && file->db_id > FLB_TAIL_DB_ID_NONE, so entries that were explicitly deleted viaflb_tail_db_file_delete()(which now resetsdb_idto the sentinel) are not silently re‑created.- For gzip files, you intentionally avoid rewinding the raw offset, emit a clear debug message about buffered data, and still persist the current DB state when applicable; this matches the stated “no rewind for compressed” constraint.
adjust_counters()now resetsanchor_offset,skip_bytes,exclude_bytes, andskipping_modewhen a truncation is detected, aligning the truncation path with the fs backends and preventing stale skip state after a compressed file is truncated.These pieces together give the right rewind semantics for plain files and safe DB behavior around deletion/truncation.
Also applies to: 1478-1516, 1632-1647
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/in_tail/tail_file.c (1)
1054-1078: Missingexclude_bytesandskipping_modeinitialization for gzip DB resume.When resuming a gzip file from DB with
skip_bytes > 0:
skip_bytesis loaded from DB- Code seeks to
anchor_offsetand returns at line 1078exclude_bytesandskipping_modeare never initializedWithout
skipping_mode = FLB_TRUE, the skip logic at line 1839 won't trigger, causing already-processed decompressed data to be reprocessed—resulting in duplicate logs.Add initialization after seeking for gzip files in the DB path:
file->offset = ret; + + /* Initialize skip state for gzip resume */ + if (file->decompression_context != NULL && file->skip_bytes > 0) { + file->exclude_bytes = file->skip_bytes; + file->skipping_mode = FLB_TRUE; + } } else if (ctx->read_from_head == FLB_FALSE) {
🧹 Nitpick comments (2)
plugins/in_tail/tail_db.c (1)
216-220: Column indices are correct; consider explicit SELECT for robustness.The column access indices at lines 216-220 correctly reference the schema:
- Column 6 →
skip✓- Column 7 →
anchor✓However, since
SQL_GET_FILEusesSELECT *, the hardcoded indices are brittle against future schema changes. Consider refactoring to explicitly name columns:SELECT id, name, offset, inode, created, rotated, skip, anchor FROM in_tail_files ...to ensure column positions remain correct.plugins/in_tail/tail_file.c (1)
1110-1113:skipping_modenot set in non-DB decompression path.If
skip_bytes > 0here,skipping_modeshould also be set toFLB_TRUEfor the skip logic to trigger. Whileskip_bytesis likely 0 in this non-DB path, the initialization should be consistent.else { file->anchor_offset = file->offset; file->exclude_bytes = file->skip_bytes; /* Init transient skip countdown */ + if (file->skip_bytes > 0) { + file->skipping_mode = FLB_TRUE; + } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
plugins/in_tail/tail.h(1 hunks)plugins/in_tail/tail_db.c(11 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)tests/runtime/in_tail.c(5 hunks)
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_fs_stat.cplugins/in_tail/tail_file_internal.hplugins/in_tail/tail_db.ctests/runtime/in_tail.cplugins/in_tail/tail_file.cplugins/in_tail/tail_fs_inotify.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
tests/runtime/in_tail.cplugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (3)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)
tests/runtime/in_tail.c (3)
src/flb_gzip.c (1)
flb_gzip_compress(157-252)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)src/flb_lib.c (6)
flb_input_set(305-335)flb_output_set(520-551)flb_start(983-994)flb_stop(1011-1055)flb_input(266-276)flb_output(279-289)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(366-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-centos-7
🔇 Additional comments (26)
plugins/in_tail/tail.h (1)
36-38: LGTM!The
FLB_TAIL_DB_ID_NONEmacro is a clean addition for representing the "not in database" state. Using 0 is appropriate since SQLite rowids start at 1, so this serves as a safe sentinel value to prevent resurrecting deleted DB entries.plugins/in_tail/tail_file_internal.h (1)
44-49: LGTM!The new per-file state fields are well-structured:
skip_bytesandexclude_bytescorrectly useuint64_tto avoid 32-bit truncation (addressing the past review concern)anchor_offsetusesoff_tconsistent with the file offset type- Clear comments documenting each field's purpose
plugins/in_tail/tail_fs_inotify.c (1)
270-273: LGTM!Proper initialization of the new per-file state fields during file truncation. Resetting
anchor_offset,skip_bytes,exclude_bytes, andskipping_modewhen a file is truncated ensures the skip/anchor tracking starts fresh, which is correct behavior for handling truncated files.plugins/in_tail/tail_fs_stat.c (1)
139-142: LGTM!Consistent with the inotify backend changes. Both filesystem backends now properly reset the skip/anchor state fields during file truncation, ensuring uniform behavior regardless of which backend is in use.
plugins/in_tail/tail_sql.h (1)
30-53: LGTM!The SQL schema and statement updates are well-structured:
- New
skipandanchorcolumns withDEFAULT 0ensure backward compatibilitySQL_INSERT_FILEandSQL_UPDATE_OFFSETcorrectly include the new fields- The migration logic in
tail_db.chandles upgrades from older databasesplugins/in_tail/tail_db.c (4)
28-34: LGTM!Clean implementation of the
cb_column_existscallback for detecting column presence during migration. This addresses the past review concern about usingSELECTqueries which could misfire on non-schema errors.
61-107: LGTM!The migration logic correctly uses
PRAGMA table_infoto detect missing columns, addressing the past review feedback. The pattern properly handles both theskipandanchorcolumns with appropriate error handling and cleanup.
348-359: LGTM!Correct initialization of
skipping_modeandexclude_bytesbased on the persistedskip_bytesvalue. When resuming from a DB with a non-zero skip value, the file will enter skipping mode to skip already-processed decompressed bytes.
444-444: LGTM!Resetting
db_idtoFLB_TAIL_DB_ID_NONEupon deletion prevents accidentally resurrecting deleted DB entries whenflb_tail_db_file_offsetis later called. This directly addresses the PR objective.tests/runtime/in_tail.c (11)
37-40: LGTM!The
sqlite3.hinclude is now properly guarded withFLB_HAVE_SQLDB, addressing the past review concern about builds without SQLite headers.
274-303: LGTM!The
write_rawhelper is well-designed for testing incomplete line scenarios. It correctly handles:
- Optional data write with length validation
- Optional newline append controlled by
add_newlineflag- Proper fsync to ensure data is flushed
This enables testing the exact scenario where buffered data without a trailing newline remains in the tail buffer.
309-367: LGTM!Both gzip helpers now properly check
fwrite()return values and handle errors correctly, addressing the past review concern about silent truncation. The error handling cleans up resources (closes file, frees memory) before returning -1.
397-406: LGTM!The msgpack string comparisons now correctly use
memcmpwith explicit size checks, addressing the past review concern about OOB reads with non-NUL-terminated msgpack strings:
- Key comparison:
key.via.str.size == 3 && memcmp(..., "log", 3) == 0- Value comparison:
v.via.str.size >= 5 && memcmp(..., "line2", 5) == 0
417-459: LGTM!The
get_db_offsethelper is now properly guarded with#ifdef FLB_HAVE_SQLDB, addressing the past review concern about compilation failures on builds without SQLite support.
2870-3091: Comprehensive test for offset rewind on shutdown.This test properly validates the core fix by:
- Writing data with a complete line, verifying it's processed
- Writing data without newline (stays in buffer as incomplete line)
- Polling DB until offset advances (data read into buffer)
- Stopping Fluent Bit abruptly
- Restarting and verifying buffered data is re-read
The test methodology is sound for verifying the data loss prevention mechanism.
3093-3175: LGTM!The gzip resume loss test correctly validates:
- First run processes only complete lines (
line1)- Incomplete line (
line2without newline) stays in buffer- After restart with appended data,
line2andline3are both processedfound_line2flag confirmsline2wasn't lostThe
t_ctx.found_line2 = 0reset before restart addresses the past review concern about false positives.
3178-3239: LGTM!Test properly validates gzip file append detection via inotify/refresh while Fluent Bit is running. The test confirms that appended gzip members are correctly detected and processed.
3242-3320: LGTM!Comprehensive rotation test that validates:
- Initial file processing
- File rotation (rename to .gz.1)
- New file with same name created
- Appends to both old (rotated) and new files within rotate_wait window
- Both files continue to be monitored and processed correctly
3323-3416: LGTM!Multi-resume test validates that gzip skip/anchor tracking works correctly across multiple restart cycles:
- Run 1: Process line1, stop
- Run 2: Skip line1, process only line2, stop
- Run 3: Skip line1+line2, process only line3
This confirms the skip_bytes persistence correctly prevents duplicate processing.
3449-3453: LGTM!All new DB/gzip tests are correctly registered within the
#ifdef FLB_HAVE_SQLDBguard, ensuring they're only compiled when SQLite support is available.plugins/in_tail/tail_file.c (6)
47-47: LGTM!Required header for the decompression context types and functions used in the gzip handling logic.
1304-1304: LGTM!Using the explicit
FLB_TAIL_DB_ID_NONEconstant instead of an implicit 0 improves clarity and aligns with the resurrection prevention logic at line 1499.
1478-1516: LGTM!The rewind logic correctly:
- Rewinds uncompressed files by
buf_lento prevent data loss- Avoids rewinding gzip files (infeasible with streaming decompression) and logs a warning
- Checks
db_id > FLB_TAIL_DB_ID_NONEto prevent resurrecting deleted DB entries- Uses proper format specifiers (
%jdwithintmax_tcast,%zuforsize_t)
1636-1639: LGTM!All gzip-related state (
anchor_offset,skip_bytes,exclude_bytes,skipping_mode) is properly reset on file truncation. This addresses the previous review concern about inconsistentexclude_bytesreset.
1839-1859: LGTM!The skip logic correctly handles gzip resume by discarding already-processed decompressed bytes:
- Tracks remaining bytes to skip with
exclude_bytes- Uses
memmoveto shift usable data when skip completes- Clears
skipping_modewhen done- Uses proper format specifiers
1900-1921: LGTM!The gzip member boundary detection is well-designed:
- Updates
anchor_offsetonly when a member is fully complete (state awaiting next header, no buffered compressed or decompressed data)- Resets
skip_bytesto 0 at member boundaries since no skipping is needed- Enables safe resume from the next gzip member on restart
This addresses the fundamental challenge of gzip offset tracking with streaming decompression.
|
@codex review |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
plugins/in_tail/tail_file_internal.h (1)
44-49: Clarifyskipping_mode’s semanticsThe new offset/anchor/skip fields look appropriate, but the
skipping_modecomment (/* skipping mode ? */) is vague. Suggest clarifying what the flag actually represents (e.g., “actively discarding decompressed bytes up to exclude_bytes for gzip resume”) to aid future maintainers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
plugins/in_tail/tail_db.c(11 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/in_tail/tail_fs_stat.c
- plugins/in_tail/tail_fs_inotify.c
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_file_internal.hplugins/in_tail/tail_file.cplugins/in_tail/tail_db.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(366-399)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-centos-7
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: PR - fuzzing test
🔇 Additional comments (8)
plugins/in_tail/tail_sql.h (1)
31-40: Schema & SQL constants line up with new skip/anchor handlingThe extended
in_tail_filesschema (skip,anchor) and the updatedINSERT/UPDATEstatements are consistent with the column indices used indb_file_exists()andflb_tail_db_file_offset()(columns 6/7 for skip/anchor). This keeps DB reads/writes coherent across migrations and new installs.Also applies to: 45-47, 52-53
plugins/in_tail/tail_db.c (3)
28-35: Migration viapragma_table_infois robust and fails fast on real DB errorsUsing
SELECT 1 FROM pragma_table_info('in_tail_files') WHERE name='skip'/'anchor'withcb_column_existscleanly distinguishes “no such column” (no rows) from generic SQL failures. On error you now log, close the DB, and abort instead of mis-treating lock/corruption/permission issues as a missing column, which is safer.Also applies to: 61-83, 85-107
182-221: DB ↔ runtime skip/anchor wiring looks correct for gzip resume
db_file_exists()now pullsskip(col 6) andanchor(col 7), matching the schema.flb_tail_db_file_set()copies those intofile->skip_bytes/file->anchor_offsetand initializesexclude_bytes/skipping_modebased onskip_bytes.This ensures a restarted gzip tailer can seek to the stored anchor, discard
skip_bytesworth of decompressed data, and continue without duplicates, while keeping raw offsets infile->offset.Also applies to: 318-360
365-376: Avoiding resurrection of deleted DB rows viadb_idsentinel is soundBinding
skip_bytesandanchor_offsetintostmt_offsetkeeps the DB in sync with the new gzip state, and resettingfile->db_idtoFLB_TAIL_DB_ID_NONEinflb_tail_db_file_delete()ensures later calls toflb_tail_db_file_offset()fromflb_tail_file_remove()are gated and don’t recreate rows that were intentionally deleted.Also applies to: 425-445
plugins/in_tail/tail_file.c (4)
1310-1311: UsingFLB_TAIL_DB_ID_NONEsentinel on initialization is appropriateInitializing
file->db_idtoFLB_TAIL_DB_ID_NONEinflb_tail_file_append()matches the new semantics intail_db.c(where this sentinel guards offset updates and is reset on delete), and avoids accidental DB updates before the file is actually registered.
1484-1522: Rewind-on-buffered-data for plain files and warning‑only path for gzip are well‑scoped
- For uncompressed files, rewinding
file->offsetbybuf_len(clamped at 0) before persisting avoids losing unprocessed buffered bytes across restarts.- The DB update is correctly guarded with
file->db_id > FLB_TAIL_DB_ID_NONE, so rows explicitly deleted earlier won’t be recreated.- For gzip inputs, you only log a warning and persist the current DB state without raw rewind, which is consistent with the “anchor + skip” design and avoids incorrect seeks when decompressed vs compressed lengths diverge.
This addresses the original data‑loss issue while avoiding the previous “resurrect deleted DB entry” problem.
1631-1652: Truncation now clears gzip skip state consistentlyIn
adjust_counters(), whensize_delta < 0you now reset:
anchor_offsetto the new offset,skip_bytesandexclude_bytesto 0,skipping_modetoFLB_FALSE,and persist the new DB offset if enabled. This prevents stale gzip skipping state from being reused after a truncate, aligning with the resets already done in the inotify/stat truncation paths.
1845-1865: Gzipexclude_bytesand member‑boundary anchor updates look logically sound
- The
skipping_modeblock drops up toexclude_bytesworth of freshly decompressed data, either by consuming whole chunks or bymemmove‑ing the remaining tail down, and only increasesbuf_lenby the post‑skip length. Buffer invariants stay intact.- After processing,
skip_bytes += processed_bytestracks the total decompressed bytes consumed since the current anchor.- When a gzip member is fully consumed (
EXPECTING_HEADER,input_buffer_length == 0,buf_len == 0), updatinganchor_offsetto the current rawoffsetand resettingskip_bytesto 0 correctly advances the anchor to the next member boundary and keeps DB state minimal.Overall this matches the intended “anchor + skip per member” model and should allow correct resume without duplicates for multi‑member gzip files.
Also applies to: 1906-1927
|
Codex Review: Didn't find any major issues. You're on a roll. ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Added test for offset rewind on shutdown (prevent DB resurrection) - Added gzip tests: resume loss, inotify append, rotation, multi resume Signed-off-by: jinyong.choi <inimax801@gmail.com>
Unprocessed buffered data is lost on shutdown because the file offset is saved ahead of processing. This patch rewinds the offset by the remaining buffer length on exit, ensuring data is re-read on restart. Also, to prevent resurrecting deleted files in the DB, db_id is reset to 0 upon deletion. The offset update logic now checks db_id > 0 to ensure only active files are updated. Signed-off-by: jinyong.choi <inimax801@gmail.com>
|
@codex review |
|
Codex Review: Didn't find any major issues. Another round soon, please! ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
|
In addition to the existing implementation, the following changes were added:
Please let me know if you have any feedback or see any issues with this approach. Thanks! |
cosmo0920
left a comment
There was a problem hiding this comment.
I found a quite minor style issue.
Otherwise, the patch looks pretty great.
We're mostly all set. 👍
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/in_tail/tail_db.c (1)
182-220: Minor type inconsistency:skipmixes uint64_t and int64_t.The
skipfield is declared asuint64_tin C (lines 184, 324) but is read/written usingsqlite3_column_int64andsqlite3_bind_int64which operate onint64_t. While unlikely to cause issues in practice (skip_bytes won't exceedINT64_MAX), this inconsistency could theoretically cause sign-extension or overflow issues with very large values.For consistency and correctness, consider using
int64_tforskipthroughout, or explicitly cast to/fromuint64_twith overflow checks.Also applies to: 262-263, 323-324, 372-375
plugins/in_tail/tail_file.c (1)
1040-1089: Critical: stream_offset not initialized for compressed files in DB path.Lines 1085-1087 only initialize
stream_offsetfor non-decompression contexts. For compressed files restored from DB withskip_bytes > 0,stream_offsetremains at 0 instead of being set toskip_bytes. This breaksoffset_keymonotonicity:
- On restart,
stream_offsetstays 0stream_offsetaccumulates from 0 at line 1912offset_key(calculated asstream_offset + processed_bytes) restarts from 0 instead of continuing from the saved positionAfter the
lseek()succeeds (line 1061), add:if (file->decompression_context == NULL) { file->stream_offset = file->offset; } else { file->stream_offset = file->skip_bytes; }This ensures
stream_offsetreflects the decompressed stream position for gzip files.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
plugins/in_tail/tail_db.c(11 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- plugins/in_tail/tail_fs_stat.c
- plugins/in_tail/tail_fs_inotify.c
- plugins/in_tail/tail_file_internal.h
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_db.cplugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (1)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(366-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: PR - fuzzing test
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
🔇 Additional comments (11)
plugins/in_tail/tail_sql.h (2)
37-40: LGTM: Schema extension for gzip resume state.The addition of
skipandanchorcolumns with appropriate defaults enables the anchor+skip mechanism for gzip file resume. The trailing comma afterrotatedand the column definitions are syntactically correct.
45-47: LGTM: SQL statements updated for new columns.The INSERT and UPDATE statements correctly include the new
skipandanchorfields with proper parameter binding.Also applies to: 52-53
plugins/in_tail/tail_db.c (4)
28-34: LGTM: Column existence detection callback.The callback correctly sets the found flag when any row is returned by the PRAGMA query, enabling reliable migration detection.
61-108: LGTM: Migration logic addresses past concerns.The migration now uses
PRAGMA table_infoto reliably detect column existence instead of attempting a SELECT that could misfire on DB errors. The error handling properly aborts if the PRAGMA query itself fails, and column addition proceeds only when the column is confirmed absent.
348-359: LGTM: Skipping mode initialization from DB state.The initialization logic correctly sets
exclude_bytesandskipping_modewhen resuming from a saved gzip position withskip_bytes > 0. This properly restores the runtime state needed to skip already-processed decompressed data on restart.
444-444: LGTM: Prevent resurrecting deleted DB entries.Resetting
db_idtoFLB_TAIL_DB_ID_NONEafter deletion ensures that subsequent offset updates won't inadvertently recreate the deleted entry. This addresses the concern raised in past review comments.plugins/in_tail/tail_file.c (5)
1315-1324: LGTM: New gzip resume fields initialized.All new fields (
anchor_offset,skip_bytes,exclude_bytes,skipping_mode) are properly initialized for newly tracked files. The use ofFLB_TAIL_DB_ID_NONEis consistent with the deletion handling.
1495-1534: LGTM: Rewind logic correctly handles buffered data.The rewind implementation properly addresses the stated objectives:
- Uncompressed files: Rewinds
offsetbybuf_len(clamped to 0) so unprocessed bytes are re-read on restart- Compressed files: Logs a warning that buffered decompressed data cannot be safely rewound due to streaming decompression, accepting the known limitation
- DB protection: Checks
db_id > FLB_TAIL_DB_ID_NONEto prevent updating deleted entries- Format specifiers: Uses proper
%jdwith(intmax_t)casts foroff_tvalues
1654-1657: LGTM: Truncation resets gzip resume state.All gzip-related fields (
anchor_offset,skip_bytes,exclude_bytes,skipping_mode) are properly reset when the file is truncated, ensuring stale skip state doesn't persist after truncation. This addresses the concern raised in past review comments.
1857-1877: LGTM: Skip mechanism correctly excludes already-processed data.The skip logic properly decrements
exclude_bytesas decompressed data is produced, usingmemmoveto discard the skipped portion when a partial skip occurs, and clearingskipping_modewhen the skip completes. This correctly implements the runtime skip countdown for gzip resume.
1918-1939: LGTM: Gzip member boundary detection updates anchor points.Line 1919 increments
skip_bytesbyprocessed_bytesafter each chunk is processed, tracking decompressed progress within the current gzip member. When a member completes (decompressor state isEXPECTING_HEADER, buffers are empty), the code updatesanchor_offsetto the current compressed position and resetsskip_bytesto 0, establishing a new resume point at the member boundary. This correctly implements the anchor+skip strategy for gzip files.
|
@cosmo0920
Please let me know if there are any other concerns. |
|
I found a potential issue but we could be skipped in this PR: Interaction with multiline + gzip (MEDIUM risk, but acceptable)There’s a subtle corner: Scenario:
Risk: If shutdown occurs between
and
then DB may lag by one chunk. This results in small duplication, not loss. This scenario could be still a main risk to behind the right offset of tailing gzipped files but this just causes duplicated contents of gzipped files. This could be technically difficult to solve this. So, we need to describe such corner case if we accept this behavior. Other risks are still existing in this patch but others are relatively too small than this. Plus, it's just a logging issue but should we need to show altering database when starting to do this operation? This is just my curiosity. |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/in_tail/tail_db.c (1)
258-263: Verify SQL_INSERT_FILE parameter order and address skip_bytes type mismatch.The bindings at lines 258-263 correctly match SQL_INSERT_FILE's parameter order (name, offset, inode, created, skip, anchor). However,
skip_bytesis defined asuint64_tbut bound viasqlite3_bind_int64, which expects a signedsqlite3_int64. While values exceedingINT64_MAXare unlikely for a byte offset, consider either casting toint64_texplicitly or usingsqlite3_bind_int64with a comment documenting the assumption thatskip_bytesremains within signed int64 range.
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)
1027-1141: Document multi-member gzip stream_offset limitation more prominently.The comment at lines 1090-1095 acknowledges that
stream_offset = skip_bytesis only correct for single-member gzip files. For multi-member gzip,skip_bytesresets at member boundaries, sostream_offsetwon't reflect total decompressed bytes from all prior members. This could affectoffset_keyaccuracy when users concatenate multiple gzip streams.Consider either:
- Adding a more prominent warning in the code or documentation that
offset_keyfor multi-member gzip reflects position within the current member, not total decompressed bytes across all members.- Tracking total decompressed bytes separately if multi-member gzip is a common use case.
Otherwise, the seek logic correctly handles DB-backed resume for both gzip and plain files, with proper initialization of
stream_offset,exclude_bytes, andskipping_mode.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
plugins/in_tail/tail_db.c(11 hunks)plugins/in_tail/tail_file.c(9 hunks)plugins/in_tail/tail_file_internal.h(1 hunks)plugins/in_tail/tail_fs_inotify.c(1 hunks)plugins/in_tail/tail_fs_stat.c(1 hunks)plugins/in_tail/tail_sql.h(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/in_tail/tail_file_internal.h
- plugins/in_tail/tail_fs_stat.c
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/in_tail/tail_db.cplugins/in_tail/tail_file.cplugins/in_tail/tail_fs_inotify.c
📚 Learning: 2025-09-22T15:59:55.794Z
Learnt from: nicknezis
Repo: fluent/fluent-bit PR: 10882
File: plugins/out_http/http.c:112-116
Timestamp: 2025-09-22T15:59:55.794Z
Learning: When users consider bug fixes out of scope for their focused PRs, it's appropriate to create separate GitHub issues to track those concerns rather than expanding the current PR scope.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
plugins/in_tail/tail_file.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
plugins/in_tail/tail_file.c
🧬 Code graph analysis (2)
plugins/in_tail/tail_db.c (2)
src/flb_sqldb.c (2)
flb_sqldb_query(151-166)flb_sqldb_close(129-149)plugins/in_tail/tail_file.h (1)
flb_tail_target_file_name_cmp(64-116)
plugins/in_tail/tail_file.c (1)
plugins/in_tail/tail_db.c (1)
flb_tail_db_file_offset(366-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
🔇 Additional comments (16)
plugins/in_tail/tail_db.c (6)
28-34: LGTM!The column-existence callback is correctly implemented. It properly sets the flag when a row is returned from the pragma query, enabling reliable column detection for schema migration.
61-108: LGTM!The migration logic correctly uses
pragma_table_infoto detect column existence, distinguishing real errors from missing columns. The pattern is consistent for bothskipandanchorcolumns, with proper error handling and logging.
318-363: LGTM!The function correctly restores
skip_bytesandanchor_offsetfrom the database and initializes the runtime-onlyexclude_bytesandskipping_modefields based on the persisted skip state. The logic properly handles both cases (skip > 0 and skip == 0).
424-446: LGTM!Resetting
db_idtoFLB_TAIL_DB_ID_NONEafter deletion prevents accidental resurrection of deleted DB entries whenflb_tail_db_file_offsetis called later. This is an important safeguard for data integrity.
365-398: The parameter binding order inflb_tail_db_file_offsetcorrectly matches theSQL_UPDATE_OFFSETstatement: offset (parameter 1), skip (parameter 2), anchor (parameter 3), and id (parameter 4).
182-221: Column indices for skip and anchor are correct.The code correctly reads
skipfrom column index 6 andanchorfrom column index 7, matching the table schema defined in SQL_CREATE_FILES. WhenSELECT *is executed, SQLite returns columns in the order they appear in the table definition: id (0), name (1), offset (2), inode (3), created (4), rotated (5), skip (6), and anchor (7).plugins/in_tail/tail_fs_inotify.c (1)
259-281: LGTM!The truncation handler correctly resets all gzip resume state fields (
anchor_offset,skip_bytes,exclude_bytes,skipping_mode) alongside the file offset and buffer, ensuring a clean state after truncation. This initialization is consistent with similar handling intail_fs_stat.candadjust_countersintail_file.c.plugins/in_tail/tail_sql.h (3)
30-40: LGTM!The table schema correctly adds
skipandanchorcolumns withINTEGER DEFAULT 0, ensuring backward compatibility with existing databases. Column indices (skip=6, anchor=7) align with the reads indb_file_exists.
45-47: LGTM!The
SQL_INSERT_FILEstatement correctly includesskipandanchorin both the column list and VALUES clause. The parameter order matches the binding sequence indb_file_insert(tail_db.c lines 258-263).
52-53: LGTM!The
SQL_UPDATE_OFFSETstatement correctly updates all three position-tracking fields (offset,skip,anchor) atomically. The parameter order matches the binding sequence inflb_tail_db_file_offset(tail_db.c lines 372-375).plugins/in_tail/tail_file.c (6)
47-47: LGTM!Including
flb_compression.his appropriate for the gzip decompression functionality added in this PR.
1328-1338: LGTM!All gzip resume fields (
anchor_offset,skip_bytes,exclude_bytes,skipping_mode) are correctly initialized to zero/false when a new file is appended, ensuring clean initial state.
1509-1548: LGTM!The offset rewind logic correctly handles buffered data on shutdown:
- For non-compressed files: rewinds the offset by
buf_len(with bounds checking) so unprocessed data is re-read on restart.- For compressed files: logs a warning explaining that accurate rewinding is infeasible with streaming decompression.
- DB updates are properly guarded by
db_id > FLB_TAIL_DB_ID_NONEto prevent resurrecting deleted entries.This addresses the data loss issue described in the PR objectives.
1657-1679: LGTM!The truncation handler in
adjust_counterscorrectly resets all gzip resume state fields (anchor_offset,skip_bytes,exclude_bytes,skipping_mode) when a file is truncated, consistent with the inotify and stat-based truncation handlers.
1871-1891: LGTM!The skip logic during decompression correctly handles resuming from a mid-stream position:
- When
exclude_bytes >= decompressed_data_length, all newly decompressed data is skipped andexclude_bytesis decremented.- When
exclude_bytes < decompressed_data_length, the remaining bytes are shifted to the buffer start usingmemmove, andskipping_modeis cleared.This enables accurate gzip resume from the persisted
skip_bytesposition.
1932-1953: LGTM!The gzip member boundary handling correctly tracks position using the anchor/skip pattern:
skip_bytesis incremented byprocessed_bytesto track position within the current member.- When a member completes (decompressor transitions to
EXPECTING_HEADERstate and all buffers are empty),anchor_offsetadvances to the current raw file position andskip_bytesresets to 0.This enables resume at member boundaries for multi-member gzip files, addressing the data loss issue for compressed inputs mentioned in the PR objectives.
|
Thanks for the detailed analysis. I fully agree with your opinion. Although there is a risk of duplication in case of abrupt shutdown, it is technically difficult to solve completely, and duplication is definitely better than data loss. Also, regarding your question about the logs, I have changed the log level of the database migration messages to debug. This ensures they are not too noisy during normal operation while still being available for troubleshooting if needed. Finally, should I add a note about the limitation (potential duplication on crash) to the documentation? I think adding a warning/note to the 'Database file' section would be helpful for users. Let me know what you think! |
cosmo0920
left a comment
There was a problem hiding this comment.
I found a small nitpick issue but your PR is not following our coding style.
So, we need to follow the style of defining variables.
I suppose that we need to Note annotations to depicts the possibility for database corruptions in the official documentation which should be corresponding PR for documentation. This could be corner cases but it's technically hard to solve cleanly. |
Previously, when tailing gzip files, there was no mechanism to persistently
store the uncompressed position ('skip_bytes'). This meant that upon restart,
the plugin could not correctly locate the reading position, identifying it as
a rotation or new file case, potentially leading to data loss.
To fix this, 'skip_bytes' is now stored in the database to persist the
uncompressed offset. Additionally, 'exclude_bytes' is introduced to track
runtime skipping without interfering with the persistent value.
The SQLite schema has been updated to include 'anchor_offset' and 'skip_bytes'
columns to support these features.
Signed-off-by: jinyong.choi <inimax801@gmail.com>
Got it! I'll create a separate PR for the documentation. |
Unprocessed data in the internal buffer is discarded when Fluent Bit stops, causing data loss because the DB offset is already advanced.
This patch fixes the issue by rewinding the file offset by the remaining buffer length on exit, ensuring data is re-read on restart.
For compressed gzip files, a separate issue caused data duplication after restart because skip_bytes was incorrectly decremented during runtime. A new field 'exclude_bytes' is introduced as a runtime-only counter, preserving skip_bytes for correct DB persistence.
Additionally, this patch prevents resurrecting deleted file entries in the DB by resetting db_id to 0 upon deletion and checking it before updating the offset.
The SQLite schema is updated to include 'anchor_offset' and 'skip_bytes' columns. On upgrade from older versions, these columns are automatically added via ALTER TABLE if they do not exist.
Closes #11265
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.