http_server: fix libevent crash on connection drop#11843

Open
cosmo0920 wants to merge 1 commit into master from cosmo0920-plug-http-server-segv-on-windows

@cosmo0920
Contributor

@cosmo0920 cosmo0920 commented May 25, 2026

Removes dangerous connection->event.data assignments (which crash the libevent backend on Windows)
and instead preserves connection->user_data
so the event handler can safely find and clean up the session.

Closes #11842.
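
As a rough illustration of the approach described above, the following is a minimal sketch and not the actual Fluent Bit code. The struct layouts and the names `model_event`, `model_session`, `model_connection`, `model_attach`, and `model_activity_handler` are hypothetical stand-ins; only the field names `user_data` and `event.data` come from the description. It assumes the event backend treats `event.data` as its own, so the session pointer is stored and looked up through `user_data` only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real Fluent Bit structures. */
struct model_event {
    void *data;              /* assumed to belong to the event backend */
};

struct model_session {
    int id;
    int requests_seen;
};

struct model_connection {
    struct model_event event;
    void *user_data;         /* application-owned slot for the session */
};

/* Associate a session with a connection without touching event.data. */
static void model_attach(struct model_connection *connection,
                         struct model_session *session)
{
    assert(connection != NULL);
    connection->user_data = session;
}

/* Activity handler: finds the session through user_data only.
 * Returns 0 when a session was found, -1 otherwise. */
static int model_activity_handler(struct model_connection *connection)
{
    struct model_session *session;

    if (connection == NULL) {
        return -1;
    }

    session = (struct model_session *) connection->user_data;
    if (session == NULL) {
        return -1;
    }

    session->requests_seen++;
    return 0;
}
```

In this model the handler never reads or writes `event.data`, so whatever the backend keeps there is left untouched.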


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
PS> in/fluent-bit -i dummy -o stdout -H -P 2021 -v
  • Debug log output from testing the change

After applying this patch, Fluent Bit's internal HTTP server starts and processes HTTP requests again on Windows:

Fluent Bit v5.0.7
* Copyright (C) 2015-2026 The Fluent Bit Authors
* Fluent Bit is a CNCF graduated project under the Fluent organization
* https://fluentbit.io

______ _                  _    ______ _ _           _____  _____
|  ___| |                | |   | ___ (_) |         |  ___||  _  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   _|___ \ | |/' |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \|  /| |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V //\__/ /\ |_/ /
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)\___/


[2026/05/26 00:09:32.219] [ info] Configuration:
[2026/05/26 00:09:32.220] [ info]  flush time     | 1.000000 seconds
[2026/05/26 00:09:32.220] [ info]  grace          | 5 seconds
[2026/05/26 00:09:32.220] [ info]  daemon         | 0
[2026/05/26 00:09:32.220] [ info] ___________
[2026/05/26 00:09:32.220] [ info]  inputs:
[2026/05/26 00:09:32.220] [ info]      dummy
[2026/05/26 00:09:32.220] [ info] ___________
[2026/05/26 00:09:32.220] [ info]  filters:
[2026/05/26 00:09:32.220] [ info] ___________
[2026/05/26 00:09:32.220] [ info]  outputs:
[2026/05/26 00:09:32.220] [ info]      stdout.0
[2026/05/26 00:09:32.220] [ info] ___________
[2026/05/26 00:09:32.220] [ info]  collectors:
[2026/05/26 00:09:32.221] [ info] [fluent bit] version=5.0.7, commit=f6126ebc3a, pid=33484
[2026/05/26 00:09:32.222] [debug] [engine] maxstdio set: 512
[2026/05/26 00:09:32.222] [debug] [engine] coroutine stack size: 98302 bytes (96.0K)
[2026/05/26 00:09:32.222] [ info] [storage] ver=1.5.4, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2026/05/26 00:09:32.222] [ info] [simd    ] disabled
[2026/05/26 00:09:32.222] [ info] [cmetrics] version=2.1.4
[2026/05/26 00:09:32.222] [ info] [ctraces ] version=0.7.1
[2026/05/26 00:09:32.222] [ info] [input:dummy:dummy.0] initializing
[2026/05/26 00:09:32.222] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2026/05/26 00:09:32.222] [debug] [dummy:dummy.0] created event channels: read=804 write=808
[2026/05/26 00:09:32.222] [debug] [stdout:stdout.0] created event channels: read=812 write=816
[2026/05/26 00:09:32.224] [error] [C:\Users\cosmo\Documents\GitHub\fluent-bit\src\flb_network.c:241 errno=0] No error
[2026/05/26 00:09:32.224] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2021
[2026/05/26 00:09:32.224] [ info] [output:stdout:stdout.0] worker #0 started
[2026/05/26 00:09:32.224] [ info] [sp] stream processor started
[2026/05/26 00:09:32.224] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
[2026/05/26 00:09:34.228] [debug] [task] created task=000002533CE4EB60 id=0 OK
[2026/05/26 00:09:34.228] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [[1779721773.234064600, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:34.229] [debug] [out flush] cb_destroy coro_id=0
[2026/05/26 00:09:34.229] [debug] [task] destroy task=000002533CE4EB60 (task_id=0)
[2026/05/26 00:09:35.230] [debug] [task] created task=000002533CE4F060 id=0 OK
[2026/05/26 00:09:35.230] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [[1779721774.229241000, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:35.231] [debug] [out flush] cb_destroy coro_id=1
[2026/05/26 00:09:35.231] [debug] [task] destroy task=000002533CE4F060 (task_id=0)
[2026/05/26 00:09:36.241] [debug] [task] created task=000002533CE4E520 id=0 OK
[2026/05/26 00:09:36.241] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [[1779721775.231066400, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:36.241] [debug] [out flush] cb_destroy coro_id=2
[2026/05/26 00:09:36.241] [debug] [task] destroy task=000002533CE4E520 (task_id=0)
[0] dummy.0: [[[2026/05/26 00:09:37.239] [debug] [task] created task=000002533CE4EDE0 id=0 OK
[2026/05/26 00:09:37.239] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
1779721776.241631100, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:37.240] [debug] [out flush] cb_destroy coro_id=3
[2026/05/26 00:09:37.240] [debug] [task] destroy task=000002533CE4EDE0 (task_id=0)
[2026/05/26 00:09:38.220] [debug] [task] created task=000002533CE4EFC0 id=0 OK
[0] dummy.0: [[1779721777.239806700, {}], [2026/05/26 00:09:38.220] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
{"message"=>"dummy"}]
[2026/05/26 00:09:38.221] [debug] [out flush] cb_destroy coro_id=4
[2026/05/26 00:09:38.221] [debug] [task] destroy task=000002533CE4EFC0 (task_id=0)
[2026/05/26 00:09:39.246] [debug] [task] created task=000002533CE4EB60 id=0 OK
[2026/05/26 00:09:39.246] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [[1779721778.221294500, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:39.246] [debug] [out flush] cb_destroy coro_id=5
[2026/05/26 00:09:39.247] [debug] [task] destroy task=000002533CE4EB60 (task_id=0)
[2026/05/26 00:09:40.241] [debug] [task] created task=000002533CE4F060 id=0 OK
[2026/05/26 00:09:40.241] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [[1779721779.247253200, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:40.241] [debug] [out flush] cb_destroy coro_id=6
[2026/05/26 00:09:40.241] [debug] [task] destroy task=000002533CE4F060 (task_id=0)
[2026/05/26 00:09:41.225] [debug] [task] created task=000002533CE4F1A0 id=0 OK
[2026/05/26 00:09:41.225] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [[1779721780.241239900, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:41.226] [debug] [out flush] cb_destroy coro_id=7
[2026/05/26 00:09:41.226] [debug] [task] destroy task=000002533CE4F1A0 (task_id=0)
[2026/05/26 00:09:42.238] [debug] [task] created task=000002533CE4F100 id=0 OK
[0] dummy.0: [[[2026/05/26 00:09:42.238] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
1779721781.225863200, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:42.239] [debug] [out flush] cb_destroy coro_id=8
[2026/05/26 00:09:42.239] [debug] [task] destroy task=000002533CE4F100 (task_id=0)
[2026/05/26 00:09:43.220] [debug] [task] created task=000002533CE4E520 id=0 OK
[2026/05/26 00:09:43.220] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [[1779721782.239292200, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:43.221] [debug] [out flush] cb_destroy coro_id=9
[2026/05/26 00:09:43.221] [debug] [task] destroy task=000002533CE4E520 (task_id=0)
[2026/05/26 00:09:44.222] [debug] [task] created task=000002533CE4EE80 id=0 OK
[2026/05/26 00:09:44.222] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [[1779721783.221002400, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:44.223] [debug] [out flush] cb_destroy coro_id=10
[2026/05/26 00:09:44.223] [debug] [task] destroy task=000002533CE4EE80 (task_id=0)
[2026/05/26 00:09:44] [engine] caught signal (SIGINT)
[2026/05/26 00:09:45.231] [debug] [task] created task=000002533CE4EB60 id=0 OK
[0] dummy.0: [[[2026/05/26 00:09:45.231] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
1779721784.222948800, {}], {"message"=>"dummy"}]
[2026/05/26 00:09:45.231] [debug] [out flush] cb_destroy coro_id=11
[2026/05/26 00:09:45.231] [debug] [task] destroy task=000002533CE4EB60 (task_id=0)
[2026/05/26 00:09:45.341] [debug] [task] created task=000002533CE4EB60 id=0 OK
[2026/05/26 00:09:45.341] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2026/05/26 00:09:45.341] [ warn] [engine] service will shutdown in max 5 seconds
[2026/05/26 00:09:45.341] [debug] [engine] task 0 already scheduled to run, not re-scheduling it.
[2026/05/26 00:09:45.341] [ info] [engine] pausing all inputs..
[0] dummy.0: [[1779721785.231836900, {[2026/05/26 00:09:45.341] [ info] [input] pausing dummy.0
}], {"message"=>"dummy"}]
[2026/05/26 00:09:45.342] [debug] [out flush] cb_destroy coro_id=12
[2026/05/26 00:09:45.342] [debug] [task] destroy task=000002533CE4EB60 (task_id=0)
[2026/05/26 00:09:46.356] [ info] [engine] service has stopped (0 pending tasks)
[2026/05/26 00:09:46.356] [ info] [input] pausing dummy.0
[2026/05/26 00:09:46.356] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2026/05/26 00:09:46.357] [ info] [output:stdout:stdout.0] thread worker #0 stopped
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Refactor
    • Improved HTTP server connection and session lifecycle management for enhanced reliability and maintainability.

Removes dangerous connection->event.data assignments
(which crash the libevent backend on Windows)
and instead preserves connection->user_data
so the event handler can safely find and clean up the session.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
@coderabbitai

coderabbitai Bot commented May 25, 2026

📝 Walkthrough

The HTTP server connection/session association mechanism was refactored to consolidate pointer tracking. Sessions are now associated with connections exclusively via connection->user_data, while all previous manipulations of connection->event.data for session linkage were removed from connection event handlers and session teardown paths.

Changes

Connection-Session Association Consolidation

Layer / File(s) | Summary
Unified user_data-based session tracking, src/http_server/flb_http_server.c | Connection/session association and teardown refactored to use only connection->user_data for session pointer storage; connection->event.data no longer manipulated during session lookup, association, or destruction across connection handlers and cleanup functions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A tangled web of pointers two,
Now streamlined clean to one that's true,
Where user_data holds the key,
And event.data sets threads free. 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'http_server: fix libevent crash on connection drop' is clear, specific, and directly describes the main change: fixing a libevent crash by addressing connection drop handling.
Linked Issues check | ✅ Passed | The PR addresses issue #11842 by removing problematic connection->event.data assignments that caused Windows crashes; changes align with fixing the access violation (0xc0000005) reported on Windows.
Out of Scope Changes check | ✅ Passed | All changes are narrowly scoped to http_server connection and session lifecycle management; modifications to event data handling and session cleanup directly address the crash issue without introducing unrelated changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9dc9923f15

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 385 to +386
    if (session == NULL) {
        session = (struct flb_http_server_session *) event->data;
        if (session != NULL &&
            (session->connection == NULL ||
             session->connection->fd == FLB_INVALID_SOCKET)) {
            event->data = NULL;
            session->drop_pending = FLB_FALSE;
            flb_http_server_session_destroy(session);
        }
        return -1;

P1 Badge Destroy dropped sessions when user_data is already cleared

Returning immediately when connection->user_data is NULL skips the only cleanup path for sessions marked in flb_http_server_connection_drop(). That callback sets session->drop_pending = FLB_TRUE and then clears connection->user_data, so after this change a dropped connection can leave its session stranded in server->clients indefinitely because flb_http_server_reap_stale_sessions() only reaps sessions with drop_pending == FLB_FALSE. Under repeated client disconnects, this leaks session slots and can eventually cause max_connections rejections.
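
The leak described in this comment can be modelled with a small sketch. This is a simplified model under stated assumptions, not the real implementation: every `model_*` name, the fixed-size slot array, and `MODEL_MAX_SESSIONS` are hypothetical, and the reaper condition (a detached session with `drop_pending` cleared) is inferred from the comment rather than copied from the source file.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SESSIONS 4   /* hypothetical stand-in for max_connections */

struct model_connection;

struct model_session {
    int in_use;
    int drop_pending;
    struct model_connection *connection;
};

struct model_connection {
    void *user_data;
};

struct model_server {
    struct model_session clients[MODEL_MAX_SESSIONS];
};

/* Accept: take a free slot, or return NULL when the limit is reached. */
static struct model_session *model_accept(struct model_server *server,
                                          struct model_connection *connection)
{
    int i;

    assert(server != NULL && connection != NULL);
    for (i = 0; i < MODEL_MAX_SESSIONS; i++) {
        if (!server->clients[i].in_use) {
            server->clients[i].in_use = 1;
            server->clients[i].drop_pending = 0;
            server->clients[i].connection = connection;
            connection->user_data = &server->clients[i];
            return &server->clients[i];
        }
    }
    return NULL;
}

/* Drop callback as described: mark the session, then clear user_data. */
static void model_connection_drop(struct model_connection *connection)
{
    struct model_session *session = connection->user_data;

    if (session != NULL && session->connection == connection) {
        session->connection = NULL;
        session->drop_pending = 1;
    }
    connection->user_data = NULL;
}

/* Activity handler without the event.data fallback: an empty
 * user_data means an early return and no session cleanup. */
static int model_activity_handler(struct model_connection *connection)
{
    if (connection->user_data == NULL) {
        return -1;
    }
    return 0;
}

/* Reaper (assumed rule): frees detached sessions only when
 * drop_pending is cleared. Returns the number of slots freed. */
static int model_reap_stale_sessions(struct model_server *server)
{
    int i;
    int reaped = 0;

    for (i = 0; i < MODEL_MAX_SESSIONS; i++) {
        struct model_session *s = &server->clients[i];
        if (s->in_use && s->connection == NULL && !s->drop_pending) {
            s->in_use = 0;
            reaped++;
        }
    }
    return reaped;
}
```

Accepting and then dropping `MODEL_MAX_SESSIONS` connections in this model leaves every slot marked `drop_pending`, so the reaper frees nothing and the next `model_accept()` call returns `NULL`.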

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/http_server/flb_http_server.c (1)

123-132: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep connection->user_data until the stale-session cleanup branch consumes it.

flb_http_server_client_activity_event_handler() now uses connection->user_data as the only session lookup path, but flb_http_server_connection_drop() clears that field before the queued activity event can run. In that case the handler returns on session == NULL, drop_pending stays true, and flb_http_server_reap_stale_sessions() will never reap the leaked session.

Suggested fix
 static void flb_http_server_connection_drop(struct flb_connection *connection)
 {
     struct flb_http_server_session *session;

     if (connection == NULL) {
         return;
     }

     session = connection->user_data;

     if (session != NULL &&
         session->connection == connection) {
         session->connection = NULL;
         session->drop_pending = FLB_TRUE;
     }

-    connection->user_data = NULL;
     connection->drop_notification_callback = NULL;
 }

     if (session->connection == NULL ||
         session->connection->fd == FLB_INVALID_SOCKET) {
         session->drop_pending = FLB_FALSE;
+        connection->user_data = NULL;
         flb_http_server_session_destroy(session);
         return -1;
     }

Also applies to: 389-392

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/http_server/flb_http_server.c` around lines 123 - 132,
flb_http_server_connection_drop currently clears connection->user_data (and
drop_notification_callback) immediately, which breaks
flb_http_server_client_activity_event_handler because it looks up sessions via
connection->user_data and leaves sessions stuck with session->drop_pending =
FLB_TRUE so flb_http_server_reap_stale_sessions never reaps them; fix by not
clearing connection->user_data in flb_http_server_connection_drop (leave
connection->user_data set until the stale-session cleanup consumes it), only
null out session->connection and set session->drop_pending = FLB_TRUE, and
ensure you do not remove connection->drop_notification_callback prematurely;
apply the same change to the other similar branch (lines referenced as 389-392)
so both code paths preserve connection->user_data for the reap logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ade7770d-0998-45f7-8d70-0d1ea4d6c2a5

📥 Commits

Reviewing files that changed from the base of the PR and between f6126eb and 9dc9923.

📒 Files selected for processing (1)
  • src/http_server/flb_http_server.c

@edsiper
Member

edsiper commented May 25, 2026

The fix direction looks correct: removing connection->event.data avoids keeping the HTTP server session pointer in libevent-owned event data, which matches the Windows crash concern.

However, the PR as-is introduces a cleanup regression. flb_http_server_connection_drop() clears connection->user_data but still leaves the detached session with session->drop_pending = FLB_TRUE. Since the activity handler no longer falls back to event.data, that session can remain unreaped and keep http_server.max_connections slots stuck.

I reproduced this on PR head with:

tests/integration/.venv/bin/python -m pytest tests/integration/scenarios/in_http_max_connections/tests/test_in_http_max_connections_001.py -q

Result on PR head: 2 failed, 1 passed.

Changing the drop path to leave the detached session reapable fixes it:

diff --git a/src/http_server/flb_http_server.c b/src/http_server/flb_http_server.c
index 64045e404..7a717a7e6 100644
--- a/src/http_server/flb_http_server.c
+++ b/src/http_server/flb_http_server.c
@@ -125,7 +125,7 @@ static void flb_http_server_connection_drop(struct flb_connection *connection)
     if (session != NULL &&
         session->connection == connection) {
         session->connection = NULL;
-        session->drop_pending = FLB_TRUE;
+        session->drop_pending = FLB_FALSE;
     }

     connection->user_data = NULL;

After that change:

  • in_http_max_connections: 3 passed
  • VALGRIND=1 VALGRIND_STRICT=1 in_http_max_connections: 3 passed, 0 valgrind errors
  • out_prometheus_exporter: 2 passed
  • VALGRIND=1 VALGRIND_STRICT=1 out_prometheus_exporter: 2 passed, 0 valgrind errors
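
To show why the one-line change in the diff above makes the detached session reapable, here is a minimal sketch under the same kind of assumptions: the `model_*` types and functions are hypothetical, and the reaper rule (free sessions that are detached and not `drop_pending`) is an assumption based on the review discussion, not a copy of the real reaper.

```c
#include <assert.h>
#include <stddef.h>

struct model_connection;

struct model_session {
    int alive;                           /* 1 while the slot is held */
    int drop_pending;
    struct model_connection *connection;
};

struct model_connection {
    void *user_data;
};

/* Drop path with the change applied: the detached session is left
 * with drop_pending cleared so a later reap pass can free it. */
static void model_connection_drop(struct model_connection *connection)
{
    struct model_session *session;

    assert(connection != NULL);
    session = connection->user_data;

    if (session != NULL && session->connection == connection) {
        session->connection = NULL;
        session->drop_pending = 0;       /* was set to 1 before the change */
    }

    connection->user_data = NULL;
}

/* Assumed reaper rule: free detached sessions that are not drop_pending.
 * Returns the number of sessions freed. */
static int model_reap_stale_sessions(struct model_session *sessions,
                                     size_t count)
{
    size_t i;
    int reaped = 0;

    for (i = 0; i < count; i++) {
        if (sessions[i].alive &&
            sessions[i].connection == NULL &&
            !sessions[i].drop_pending) {
            sessions[i].alive = 0;
            reaped++;
        }
    }
    return reaped;
}
```

With two live sessions and one dropped connection, a reap pass in this model frees exactly the detached session and leaves the other one alone, which is the behaviour the `in_http_max_connections` results point to.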

Development

Successfully merging this pull request may close these issues.

[Windows] Fluent-Bit service got exception code 0xc0000005 for v5.0.6

2 participants