out_s3: Provide options for inactive chunks behavior#11816
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds configurable terminal retry-exhaustion handling to the S3 output: when retries are exhausted a configured action ( ChangesS3 Retry-Exhausted Terminal Action
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d712df473
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/out_s3/s3.c`:
- Around line 1987-1992: When s3_store_file_inactive(ctx, chunk_file) fails we
must enforce terminal semantics by falling back to deleting the chunk instead of
returning early; after the flb_plg_error call, invoke the existing chunk-delete
helper (e.g., s3_unlink_chunk(ctx, chunk_file) or the plugin's equivalent delete
function) and log its success/failure, and if no helper exists perform a safe
file unlink of chunk_file and log any error—ensure the function still returns
afterwards so the terminal cleanup path is always applied.
In `@tests/integration/scenarios/out_s3/tests/test_out_s3_001.py`:
- Around line 237-241: Replace the fixed time.sleep(6) with a polling loop that
repeatedly checks for files under store_dir (using the same glob pattern) until
either at least one file is found or a timeout elapses, then call
service.stop(); keep the final assertion that files exist. Concretely, remove
the time.sleep(6) call, implement a short-interval loop that re-evaluates the
glob (the same expression used to build files) up to a configurable timeout
(e.g., several seconds) and breaks when len(files) > 0, and then call
service.stop() and assert len(files) > 0 as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 58eccc69-6fde-42e1-856a-5bcc210112c2
📒 Files selected for processing (5)
plugins/out_s3/s3.cplugins/out_s3/s3.htests/integration/scenarios/out_s3/config/out_s3_retry_exhausted_default_quarantine.yamltests/integration/scenarios/out_s3/tests/test_out_s3_001.pytests/runtime/out_s3.c
d602373 to
dee6b4f
Compare
dee6b4f to
1311268
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/integration/scenarios/out_s3/tests/test_out_s3_001.py (2)
227-233: 💤 Low valueConsider more robust cleanup with
shutil.rmtree.The current cleanup only removes files and silently swallows all
OSErrorexceptions. Usingshutil.rmtree(store_dir, ignore_errors=True)followed byos.makedirs(store_dir, exist_ok=True)would be more concise and handle both files and directories.♻️ Proposed refactor
+import shutil + def test_out_s3_default_retry_exhausted_action_quarantines_file(): store_dir = "/tmp/fluent-bit-test-suite-s3-retry-exhausted" - if os.path.exists(store_dir): - for path in glob.glob(f"{store_dir}/**", recursive=True): - if os.path.isfile(path): - try: - os.remove(path) - except OSError: - pass + shutil.rmtree(store_dir, ignore_errors=True) + os.makedirs(store_dir, exist_ok=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/scenarios/out_s3/tests/test_out_s3_001.py` around lines 227 - 233, Replace the manual file-by-file cleanup loop that checks os.path.exists(store_dir) and iterates glob.glob(...) with a robust directory remove and recreate: call shutil.rmtree(store_dir, ignore_errors=True) to remove the whole tree and then os.makedirs(store_dir, exist_ok=True) to recreate the empty directory; update imports to include shutil and remove the try/except OSError block and the glob usage around store_dir.
237-246: 💤 Low valuePolling implementation properly addresses previous review feedback.
The timeout-based polling with 10-second deadline and 0.2-second intervals correctly replaces the previous fixed sleep approach, reducing CI flakiness. Consider adding a descriptive message to the assertion for easier debugging if the test fails.
💡 Optional: Add assertion message
- assert len(files) > 0 + assert len(files) > 0, f"Expected quarantined files in {store_dir}, but none were found after {timeout}s"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/scenarios/out_s3/tests/test_out_s3_001.py` around lines 237 - 246, Add a descriptive assertion message to the final check so test failures are easier to debug: update the assertion "assert len(files) > 0" (in the polling block that uses timeout, files, store_dir and service.stop()) to include a message that shows the expected condition and relevant context (e.g., store_dir path and sampled files list or count) so test output clearly indicates why no files were found.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/out_s3/s3.c`:
- Around line 1976-2005: The quarantine branch currently calls
s3_store_file_inactive which makes quarantined chunks recoverable; change this
to create a true terminal quarantine state by either (A) storing the file via a
new s3_store_file_quarantine(file, ctx) helper that moves the file into a
dedicated quarantine directory/name (and does not mark it as "inactive"), or (B)
setting a persistent quarantine flag/metadata on the chunk so recovery code can
skip it; also update the recovery path (put_all_chunks or its scan) to
explicitly ignore files marked as quarantined. Ensure you adjust bookkeeping
(ctx->quarantine_buffer_size, ctx->current_buffer_size) exactly as done now and
replace the s3_store_file_inactive call with the new terminal-quarantine
operation and its error handling, keeping existing calls to s3_store_file_delete
for failures.
- Around line 1987-1991: The current error path calls
s3_store_file_inactive(ctx, chunk_file) which frees the struct s3_file, and then
calls s3_store_file_delete(ctx, chunk_file), causing a use-after-free; to fix
this, avoid dereferencing chunk_file after s3_store_file_inactive by either
caching any fields needed for deletion (e.g., the fstore handle/size or other
identifiers) into local variables before calling s3_store_file_inactive, and
pass those cached values into s3_store_file_delete (or a new helper that deletes
by handle), or move the fallback-delete logic into
s3_store_file_inactive/s3_store routines so deletion happens before/free-safe
after inactivation; update calls around s3_store_file_inactive and
s3_store_file_delete accordingly and ensure no access to chunk_file after it may
have been freed.
---
Nitpick comments:
In `@tests/integration/scenarios/out_s3/tests/test_out_s3_001.py`:
- Around line 227-233: Replace the manual file-by-file cleanup loop that checks
os.path.exists(store_dir) and iterates glob.glob(...) with a robust directory
remove and recreate: call shutil.rmtree(store_dir, ignore_errors=True) to remove
the whole tree and then os.makedirs(store_dir, exist_ok=True) to recreate the
empty directory; update imports to include shutil and remove the try/except
OSError block and the glob usage around store_dir.
- Around line 237-246: Add a descriptive assertion message to the final check so
test failures are easier to debug: update the assertion "assert len(files) > 0"
(in the polling block that uses timeout, files, store_dir and service.stop()) to
include a message that shows the expected condition and relevant context (e.g.,
store_dir path and sampled files list or count) so test output clearly indicates
why no files were found.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7e0b9912-ffdc-4309-9be3-1b1da43ad418
📒 Files selected for processing (2)
plugins/out_s3/s3.ctests/integration/scenarios/out_s3/tests/test_out_s3_001.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/scenarios/out_s3/tests/test_out_s3_001.py`:
- Around line 227-233: The cleanup loop that swallows OSError can leave stale
files; replace it by fully removing and recreating the store_dir to hard-reset
state: call shutil.rmtree(store_dir) (guarded by os.path.exists(store_dir)),
then os.makedirs(store_dir, exist_ok=True) to recreate an empty directory, and
remove the try/except that silences os.remove failures so any cleanup error
surfaces and fails the test; update imports to include shutil and keep
references to store_dir, glob.glob, and os.path.exists as needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4cd97470-539a-4234-a326-9718d6fcb378
📒 Files selected for processing (3)
plugins/out_s3/s3.cplugins/out_s3/s3_store.ctests/integration/scenarios/out_s3/tests/test_out_s3_001.py
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/out_s3/s3.c
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/out_s3/s3_store.c`:
- Around line 330-338: s3_store_has_data() is counting the newly created
quarantine stream (ctx->stream_quarantine) as regular startup data; update the
detection logic in s3_store_has_data() to skip the quarantine stream — either by
comparing the stream name to "quarantine" (from flb_fstore_stream_create) or by
checking pointer equality to ctx->stream_quarantine — so quarantined files are
not treated as backlog/regular data during startup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 54f56865-1b17-4174-a437-3249739e22e7
📒 Files selected for processing (4)
plugins/out_s3/s3.cplugins/out_s3/s3.hplugins/out_s3/s3_store.cplugins/out_s3/s3_store.h
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/out_s3/s3.h
- plugins/out_s3/s3.c
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/scenarios/out_s3/tests/test_out_s3_001.py`:
- Around line 227-231: Replace the fixed /tmp path assigned to store_dir with a
unique temp directory (use
tempfile.mkdtemp(prefix="fluent-bit-test-suite-s3-retry-exhausted-")) and remove
the manual os.makedirs call; then ensure the directory is removed in a finally
block (shutil.rmtree(store_dir)) so cleanup always runs; update the test that
sets store_dir to use tempfile.mkdtemp and wrap the test body in try/finally to
guarantee removal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 00e75e0d-40e6-4305-8833-a79ac6240035
📒 Files selected for processing (1)
tests/integration/scenarios/out_s3/tests/test_out_s3_001.py
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
ddf8235 to
980cc4d
Compare
In the current behavior of inactive files, out_s3 just persists to leave as-is and nop for the quaratined files.
In this PR, we provide options to change behavior of these quarantined files.
The default behavior of inactive files should be set up as quarantine and the limit of quarantine size of files are also provided.
Closes #11759.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes
Tests