~ServiceStateMachine: Avoid call munmap on coroutine stack by xiexiaoy · Pull Request #422 · eloqdata/eloqdoc

xiexiaoy · 2026-01-12T04:36:05Z

Summary by CodeRabbit

Bug Fixes
- Deferred coroutine stack cleanup onto the executor resume path to reduce shutdown races and crashes.
Chores
- Added support for moving tasks into the defer/resume path to improve task-resumption flexibility.
Stability
- Allocate coroutine stacks with guard-page protection and stricter failure handling to harden runtime robustness.

coderabbitai · 2026-01-12T04:36:20Z

Walkthrough

ServiceStateMachine switches coroutine stack allocation to mmap/mprotect and defers munmap to the ServiceExecutor when running on the originating thread group; otherwise it unmaps immediately. ServiceExecutor gains a new rvalue Task&& overload deferCallOnMainStack, and ServiceExecutorCoroutine implements that overload to move tasks into the resume path.

Changes

Cohort / File(s)	Summary
ServiceStateMachine (allocation & destructor) `src/mongo/transport/service_state_machine.cpp`	Constructor now allocates coroutine stacks via `mmap` and applies `mprotect(PROT_NONE)` on a reserved region; on allocation failure it logs and aborts. Destructor either `munmap`s immediately when not on the originating thread group, or captures the stack in a lambda and enqueues a deferred `munmap` via the service executor when on-origin.
ServiceExecutor API `src/mongo/transport/service_executor.h`	Added virtual method `deferCallOnMainStack(uint16_t threadGroupId, Task&& task)` to accept an rvalue `Task` for deferred execution on the main stack.
ServiceExecutorCoroutine implementation `src/mongo/transport/service_executor_coroutine.h`, `src/mongo/transport/service_executor_coroutine.cpp`	Implemented `deferCallOnMainStack(uint16_t, Task&&)` to validate `threadGroupId` and move the provided `Task` into the resume path for the target thread group; existing `const Task&` resume overloads remain unchanged.

Sequence Diagram(s)

sequenceDiagram
    participant SSM as ServiceStateMachine
    participant TG as ThreadGroup (origin)
    participant Exec as ServiceExecutor
    participant Kernel as Kernel (munmap)

    SSM->>TG: is current thread group == origin?
    alt different thread group
        SSM->>Kernel: munmap(stack)  -- immediate
    else same thread group
        SSM->>Exec: deferCallOnMainStack(threadGroupId, lambda capturing stack)
        Exec->>TG: schedule/resume lambda on main stack
        TG->>Kernel: lambda runs later -> munmap(stack)  -- deferred
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

killAllExpiredTransactions: use a shared_ptr to manage coro context #212: Also changes coroutine stack allocation/lifecycle and deferred cleanup paths — closely related to mmap/protection and cleanup strategy.
Use brpc Module Interface to refactor ThreadGroup loop #232: Modifies ServiceExecutor API and ServiceExecutorCoroutine resume behavior — related to the new deferCallOnMainStack overload.
Revert coroutine stack #236: Alters ServiceStateMachine stack allocation/cleanup semantics similar to this change.

Suggested reviewers

liunyl
lzxddz

Poem

🐇 I tunneled a stack with mmap and care,
If I'm on my home turf, don't unmap it there.
I asked the executor to hold the key,
A deferred munmap will set it free.
Hop safe — I'll nibble logs with glee. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: deferring munmap calls on coroutine stacks instead of calling them immediately, which is the primary modification across the ServiceStateMachine and ServiceExecutor implementations.

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd0c5f5 and e734b28.

📒 Files selected for processing (4)

src/mongo/transport/service_executor.h
src/mongo/transport/service_executor_coroutine.cpp
src/mongo/transport/service_executor_coroutine.h
src/mongo/transport/service_state_machine.cpp

🚧 Files skipped from review as they are similar to previous changes (1)

src/mongo/transport/service_executor_coroutine.h

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.

🧬 Code graph analysis (2)

src/mongo/transport/service_executor.h (1)

src/mongo/transport/service_executor_coroutine.cpp (2)

deferCallOnMainStack (328-331)

deferCallOnMainStack (328-328)

src/mongo/transport/service_state_machine.cpp (1)

src/mongo/db/local_thread_state.cpp (2)

ID (19-29)

ID (19-19)

🔇 Additional comments (4)

src/mongo/transport/service_executor_coroutine.cpp (1)

328-331: LGTM!

The new deferCallOnMainStack method is well-implemented:

Proper bounds validation with invariant

Correct use of std::move(task) when passing to resumeTask

Aligns with the PR objective of deferring munmap to the resume queue

src/mongo/transport/service_executor.h (1)

117-118: LGTM!

The new virtual method declaration is appropriate:

Empty default implementation allows backward compatibility with other executor types

Rvalue reference parameter (Task&&) enforces move semantics at the call site

src/mongo/transport/service_state_machine.cpp (2)

264-278: LGTM - Proper stack allocation with guard page.

The mmap-based allocation with an mprotect guard page is a solid approach for coroutine stacks:

Allocates with MAP_PRIVATE | MAP_ANONYMOUS for private anonymous mapping

Protects the first page with PROT_NONE to catch stack overflows

Aborting on failure is appropriate since this is a critical resource
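
The allocation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `kSketchStackSize`, `allocateGuardedStack`, and `freeGuardedStack` are hypothetical names, and the real `kCoroStackSize` value is not shown in this thread.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical size chosen for illustration only.
constexpr std::size_t kSketchStackSize = 1 << 20;

// Maps an anonymous private region and makes its lowest page inaccessible.
// On platforms where the stack grows downward, an overflow then faults on the
// guard page instead of silently corrupting neighbouring memory.
inline void* allocateGuardedStack(std::size_t size) {
    const long pageSize = ::sysconf(_SC_PAGESIZE);
    assert(pageSize > 0 && size > static_cast<std::size_t>(pageSize));

    void* base = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        std::perror("mmap");
        std::abort();  // the stack is a critical resource, so fail fast
    }
    if (::mprotect(base, static_cast<std::size_t>(pageSize), PROT_NONE) != 0) {
        std::perror("mprotect");
        std::abort();
    }
    return base;
}

// Releases a region obtained from allocateGuardedStack.
inline void freeGuardedStack(void* base, std::size_t size) {
    ::munmap(base, size);
}
```

Only the pages above the guard page are usable; a coroutine library would typically be handed `base` plus the full size and start its stack pointer at the high end of the region.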

284-294: Core fix is correct - deferred munmap prevents crash on coroutine stack.

The logic correctly handles both scenarios:

LocalThread::ID() == -1: Not on a thread group (e.g., asio/network thread), immediate munmap is safe

Otherwise: Defer munmap via the resume queue to execute after the coroutine has fully unwound

However, the dassert at line 287 is a no-op in release builds. If LocalThread::ID() doesn't match _threadGroupId in production (due to a bug elsewhere), the task would silently be enqueued to an incorrect thread group. Consider whether an invariant is warranted for stricter validation, or if the current debug-only assertion is acceptable given the existing assumptions about thread affinity.
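
The two branches described above can be sketched in miniature. This is a single-threaded illustration under stated assumptions: `ResumeQueueSketch` and `releaseStack` are hypothetical names, and a real thread group would need synchronization and an actual coroutine switch between enqueueing and draining.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical miniature of a thread group's resume queue.
class ResumeQueueSketch {
public:
    void resumeTask(std::function<void()> task) {
        _tasks.push_back(std::move(task));
    }

    // Meant to be called from the worker's main stack, after the coroutine
    // that enqueued the task has yielded or returned.
    std::size_t drainOnMainStack() {
        std::size_t ran = 0;
        while (!_tasks.empty()) {
            auto task = std::move(_tasks.front());
            _tasks.pop_front();
            task();
            ++ran;
        }
        return ran;
    }

private:
    std::deque<std::function<void()>> _tasks;
};

// currentGroup == -1 models "not on any thread group".
// Returns true when the munmap was deferred, false when it ran immediately.
inline bool releaseStack(void* stack, std::size_t size, int currentGroup,
                         ResumeQueueSketch& queue) {
    assert(stack != nullptr);
    if (currentGroup == -1) {
        ::munmap(stack, size);  // not running on the stack being freed
        return false;
    }
    // Capture the pointer by value; the owning object may be gone by the time
    // the task runs.
    queue.resumeTask([stack, size] { ::munmap(stack, size); });
    return true;
}
```

The point of the deferral is ordering: the unmap runs only when the queue is drained from the main stack, so the code that is still executing on the coroutine stack never frees the memory under its own feet.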

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @src/mongo/transport/service_state_machine.cpp:
- Around line 280-290: The dassert after _serviceExecutor->schedule(...) can be
a no-op in release builds, risking that _coroStack is never freed if schedule()
fails; replace the dassert with real error handling in the block that schedules
deferred unmap: check the returned Status from _serviceExecutor->schedule (for
the call using ServiceExecutor::ScheduleFlags::kDeferredTask and
transport::ServiceExecutorTaskName::kSSMProcessMessage with _threadGroupId), and
on failure either call ::munmap(_coroStack, kCoroStackSize) directly as a
fallback, log the failure and/or use invariant() to abort on unexpected
failures, and also add a sanity check or comment ensuring _serviceExecutor
remains valid during teardown so schedule() can be relied upon.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30c7bf6 and ec00ef5.

📒 Files selected for processing (1)

src/mongo/transport/service_state_machine.cpp

🧰 Additional context used

🧬 Code graph analysis (1)

src/mongo/transport/service_state_machine.cpp (1)

src/mongo/db/local_thread_state.cpp (2)

ID (19-29)

ID (19-19)

🔇 Additional comments (1)

src/mongo/transport/service_state_machine.cpp (1)

282-288: Verify executor lifecycle guarantee and error handling in destructor.

The destructor calls _serviceExecutor->schedule() to schedule a munmap task and captures the returned Status, but never checks it. If the executor has been shut down or cannot accept tasks, the failure is silently ignored and the coroutine stack memory leaks.

Confirm:

The ServiceExecutor has a shutdown-safe guarantee that it outlives all ServiceStateMachine instances and can accept kDeferredTask tasks until the end of the process, or

Check the Status return value and handle scheduling failures (log, verify success, or handle the cleanup differently)

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @src/mongo/transport/service_state_machine.cpp:
- Line 289: The destructor currently calls schedule() and uses
invariant(status.isOK()), which can crash during shutdown when schedule()
returns ShutdownInProgress; replace the invariant with the same graceful
handling used elsewhere: check the returned Status, and on error call
_terminateAndLogIfError(status, "scheduling cleanup task") (or log and return)
instead of asserting, so cleanup in the destructor won't abort the process;
update the destructor's schedule() call site to mirror the error-checking
pattern used around _terminateAndLogIfError().

🧹 Nitpick comments (1)

src/mongo/transport/service_state_machine.cpp (1)
280-290: Signed/unsigned comparison may cause incorrect behavior when LocalThread::ID() returns -1.

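Based on the relevant snippet, LocalThread::ID() returns int16_t and can be -1 when not in a bthread context, while _threadGroupId is unsigned (the constructor takes uint16_t groupId). With exactly these two types, both operands are promoted to int before the comparison, so -1 and 65535 compare as unequal on platforms where int is wider than 16 bits. The mixed-sign comparison is still fragile: an explicit cast to uint16_t, or a wider unsigned type on either side, would turn -1 into the maximum value and could produce a false match that causes unnecessary deferred scheduling.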

When LocalThread::ID() returns -1, we're definitively not on any coroutine stack, so immediate unmap is always safe. Consider making this explicit:
Suggested fix
-    if (LocalThread::ID() != _threadGroupId.load(std::memory_order_relaxed)) {
+    auto currentThreadGroup = LocalThread::ID();
+    // If currentThreadGroup is -1 (not in any thread group) or differs from the SSM's
+    // thread group, we're not on the coroutine stack and can safely unmap immediately.
+    if (currentThreadGroup < 0 ||
+        static_cast<uint16_t>(currentThreadGroup) != _threadGroupId.load(std::memory_order_relaxed)) {
         ::munmap(_coroStack, kCoroStackSize);
     } else {
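
The difference between the raw comparison, an unguarded cast, and the explicit check suggested above can be sketched as follows. All function names here are hypothetical, and the sketch assumes a platform where int is wider than 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Explicit form of the check: returns true when the caller is NOT on the
// owner's thread group, i.e. when an immediate munmap would be safe.
inline bool canUnmapImmediately(std::int16_t currentGroup, std::uint16_t ownerGroup) {
    if (currentGroup < 0) {
        return true;  // -1 means "not on any thread group"
    }
    return static_cast<std::uint16_t>(currentGroup) != ownerGroup;
}

// Both operands are promoted to int, so the values are compared as written.
inline bool rawCompareEqual(std::int16_t a, std::uint16_t b) {
    return a == b;
}

// An unguarded cast is where -1 actually becomes 65535.
inline bool castCompareEqual(std::int16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(a) == b;
}

// Small self-check collecting the cases discussed in the review comment.
inline void promotionExamples() {
    assert(!rawCompareEqual(-1, 65535));
    assert(castCompareEqual(-1, 65535));
    assert(canUnmapImmediately(-1, 65535));
}
```

Checking the sign first keeps the cast safe and makes the "-1 means no thread group" convention visible at the call site.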

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec00ef5 and a793ee9.

📒 Files selected for processing (1)

src/mongo/transport/service_state_machine.cpp

🧰 Additional context used

🧬 Code graph analysis (1)

src/mongo/transport/service_state_machine.cpp (1)

src/mongo/db/local_thread_state.cpp (2)

ID (19-29)

ID (19-19)

coderabbitai · 2026-01-12T05:54:13Z

+                                       ServiceExecutor::ScheduleFlags::kDeferredTask,
+                                       transport::ServiceExecutorTaskName::kSSMProcessMessage,
+                                       _threadGroupId.load(std::memory_order_relaxed));
+        invariant(status.isOK());


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and find the relevant code
cd /tmp && git ls-files | grep -E "(service_state_machine|service_executor)" | head -20

Repository: eloqdata/eloqdoc

Length of output: 128

🏁 Script executed:

# Read the specific area around line 289 in service_state_machine.cpp
sed -n '280,300p' src/mongo/transport/service_state_machine.cpp

Repository: eloqdata/eloqdoc

Length of output: 1132

🏁 Script executed:

# Check the surrounding context to understand what schedule() is being called on
sed -n '270,310p' src/mongo/transport/service_state_machine.cpp

Repository: eloqdata/eloqdoc

Length of output: 1852

🏁 Script executed:

# Search for ServiceExecutor schedule method implementation
rg -n "class ServiceExecutor" --type cpp -A 30 | head -100

Repository: eloqdata/eloqdoc

Length of output: 8192

🏁 Script executed:

# Look for schedule method signature and documentation
rg -n "schedule\s*\(" --type cpp -g "*service_executor*" -B 2 -A 8 | head -150

Repository: eloqdata/eloqdoc

Length of output: 13068

🏁 Script executed:

# Search for other uses of schedule() to see typical error handling
rg -n "schedule\(" src/mongo/transport/service_state_machine.cpp -B 2 -A 2

Repository: eloqdata/eloqdoc

Length of output: 1092

🏁 Script executed:

# Look for other places where schedule() return value is checked
rg -n "schedule\(" --type cpp -B 1 -A 1 | grep -E "(ASSERT|Status|if|invariant)" -A 1 -B 1 | head -80

Repository: eloqdata/eloqdoc

Length of output: 5693

🏁 Script executed:

# Search for any tests that verify schedule() behavior during shutdown
rg -l "shutdown|ShutdownInProgress" --type cpp src/mongo/transport/ | head -10

Repository: eloqdata/eloqdoc

Length of output: 492

🏁 Script executed:

# Check the ServiceExecutor interface definition for documentation
find . -name "service_executor.h" -type f -exec head -100 {} \;

Repository: eloqdata/eloqdoc

Length of output: 4119

🏁 Script executed:

# Check for other invariant(status) patterns in service_state_machine.cpp
rg -n "invariant\(status" src/mongo/transport/service_state_machine.cpp

Repository: eloqdata/eloqdoc

Length of output: 97

🏁 Script executed:

# See how schedule() errors are handled elsewhere in the transport layer
rg -n "schedule\(" src/mongo/transport/ --type cpp -A 2 | grep -E "(if|Status|error)" -B 1 -A 1 | head -60

Repository: eloqdata/eloqdoc

Length of output: 4266

🏁 Script executed:

# Check specifically at line 678 in service_state_machine.cpp where there's another schedule call
sed -n '675,690p' src/mongo/transport/service_state_machine.cpp

Repository: eloqdata/eloqdoc

Length of output: 711

🏁 Script executed:

# Look at the test for schedule failure handling
sed -n '480,520p' src/mongo/transport/service_state_machine_test.cpp

Repository: eloqdata/eloqdoc

Length of output: 1829

🏁 Script executed:

# See what happens after the if check at line 691 (the else branch)
sed -n '688,710p' src/mongo/transport/service_state_machine.cpp

Repository: eloqdata/eloqdoc

Length of output: 682

🏁 Script executed:

# Get more context on how schedule failures should be handled
rg -n "ShutdownInProgress.*schedule" --type cpp -B 3 -A 3

Repository: eloqdata/eloqdoc

Length of output: 5358

Handle schedule() failure in destructor instead of using invariant().

At line 289, schedule() can fail with ShutdownInProgress during shutdown (confirmed by ServiceExecutor implementations and existing tests). Using invariant(status.isOK()) will crash the process. The same file demonstrates proper error handling at line 680: check the status and call _terminateAndLogIfError(). Either handle the error gracefully or log before the invariant for debuggability, especially since this is cleanup code in a destructor.
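
A minimal sketch of the "check the status and log instead of asserting" pattern is shown below. It does not use the real MongoDB `Status` or `ServiceExecutor` types: `SketchStatus` and `scheduleCleanup` are hypothetical stand-ins, and the scheduler is modelled as a plain callable.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for a status object returned by a scheduler.
struct SketchStatus {
    bool ok;
    std::string reason;
};

// Returns true if the cleanup task was accepted. On failure it logs and
// returns false rather than aborting, so a destructor that calls it does not
// take the process down during shutdown. The cleanup is then simply skipped.
inline bool scheduleCleanup(
    const std::function<SketchStatus(std::function<void()>)>& schedule,
    std::function<void()> cleanup) {
    assert(schedule && cleanup);
    SketchStatus status = schedule(std::move(cleanup));
    if (!status.ok) {
        std::cerr << "failed to schedule coroutine stack cleanup: "
                  << status.reason << '\n';
        return false;
    }
    return true;
}
```

Note that falling back to an immediate munmap on failure would not be safe when the caller is still running on the stack being released, which is why this sketch only logs.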

🤖 Prompt for AI Agents

In @src/mongo/transport/service_state_machine.cpp at line 289, The destructor currently calls schedule() and uses invariant(status.isOK()), which can crash during shutdown when schedule() returns ShutdownInProgress; replace the invariant with the same graceful handling used elsewhere: check the returned Status, and on error call _terminateAndLogIfError(status, "scheduling cleanup task") (or log and return) instead of asserting, so cleanup in the destructor won't abort the process; update the destructor's schedule() call site to mirror the error-checking pattern used around _terminateAndLogIfError().

githubzilla · 2026-01-12T06:13:22Z

+                                       ServiceExecutor::ScheduleFlags::kDeferredTask,
+                                       transport::ServiceExecutorTaskName::kSSMProcessMessage,
+                                       _threadGroupId.load(std::memory_order_relaxed));
+        invariant(status.isOK());


How do we guarantee that the scheduled munmap is executed after this coroutine has finished?

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @src/mongo/transport/service_state_machine.cpp:
- Around line 280-290: The destructor of ServiceStateMachine calls
_serviceExecutor->coroutineResumeFunctor(...) without verifying _serviceExecutor
is valid; either confirm/document that ServiceContext or ServiceEntryPointImpl
guarantees the executor outlives ServiceStateMachine instances, or add a
defensive check/assert in the destructor (e.g., if (!_serviceExecutor) skip
submitting deferMunmap or dassert(_serviceExecutor) before calling
coroutineResumeFunctor). Also make the dassert compare LocalThread::ID() to
_threadGroupId.load(std::memory_order_relaxed) for consistency with the later
explicit .load() usage (reference symbols: ServiceStateMachine destructor,
_serviceExecutor, coroutineResumeFunctor, ServiceContext, ServiceEntryPointImpl,
LocalThread::ID, _threadGroupId, dassert).

🧹 Nitpick comments (2)

src/mongo/transport/service_executor_coroutine.cpp (1)
322-328: Consider using std::move(task) inside the lambda to avoid an extra copy.

The task is correctly moved into the lambda capture, but inside the lambda body, task is an lvalue. When passed to resumeTask(Task task), this triggers a copy construction. Since this lambda is only invoked once, moving is safe.
♻️ Suggested improvement
 std::function<void()> ServiceExecutorCoroutine::coroutineResumeFunctor(uint16_t threadGroupId,
                                                                        Task&& task) {
     invariant(threadGroupId < _threadGroups.size());
-    return [thd_group = &_threadGroups[threadGroupId], task = std::move(task)]() {
-        thd_group->resumeTask(task);
+    return [thd_group = &_threadGroups[threadGroupId], task = std::move(task)]() mutable {
+        thd_group->resumeTask(std::move(task));
     };
 }
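
The extra copy can be observed with a small counting type. This is an illustrative sketch only: `CountingTask` and `consume` are hypothetical, with `consume` standing in for a by-value sink such as `resumeTask(Task task)`.

```cpp
#include <cassert>
#include <utility>

// Illustrative callable that counts how often it is copied.
struct CountingTask {
    int* copies;
    explicit CountingTask(int* c) : copies(c) { assert(copies != nullptr); }
    CountingTask(const CountingTask& other) : copies(other.copies) { ++*copies; }
    CountingTask(CountingTask&& other) noexcept : copies(other.copies) {}
    void operator()() const {}
};

// Hypothetical sink that takes its argument by value.
inline void consume(CountingTask task) {
    task();
}

// Without `mutable`, the captured task is a const lvalue inside the body,
// so passing it by value copies it.
inline int copiesWithoutMutable() {
    int copies = 0;
    CountingTask t(&copies);
    auto f = [task = std::move(t)]() { consume(task); };
    f();
    return copies;
}

// With `mutable`, the capture can be moved out. The lambda must then be
// invoked at most once, because the task is left in a moved-from state.
inline int copiesWithMutable() {
    int copies = 0;
    CountingTask t(&copies);
    auto f = [task = std::move(t)]() mutable { consume(std::move(task)); };
    f();
    return copies;
}
```

The first helper reports one copy and the second reports none, which matches the reasoning in the comment above.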
src/mongo/transport/service_state_machine.cpp (1)
283-283: Type mismatch in debug assertion: int16_t compared with uint16_t.

LocalThread::ID() returns int16_t (can be -1), but _threadGroupId stores uint16_t. While the comparison here is after confirming ID() != -1, an explicit cast or using the same type would improve clarity.
♻️ Suggested fix
-        dassert(LocalThread::ID() == _threadGroupId);
+        dassert(static_cast<uint16_t>(LocalThread::ID()) == _threadGroupId.load(std::memory_order_relaxed));

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a793ee9 and 0e61944.

📒 Files selected for processing (4)

src/mongo/transport/service_executor.h
src/mongo/transport/service_executor_coroutine.cpp
src/mongo/transport/service_executor_coroutine.h
src/mongo/transport/service_state_machine.cpp

🧰 Additional context used

🧬 Code graph analysis (3)

src/mongo/transport/service_state_machine.cpp (1)

src/mongo/db/local_thread_state.cpp (2)

ID (19-29)

ID (19-19)

src/mongo/transport/service_executor_coroutine.h (2)

src/mongo/transport/service_executor.h (3)

std (108-110)

std (112-114)

std (116-119)

src/mongo/transport/service_executor_coroutine.cpp (4)

coroutineResumeFunctor (316-320)

coroutineResumeFunctor (316-317)

coroutineResumeFunctor (322-328)

coroutineResumeFunctor (322-323)

src/mongo/transport/service_executor.h (1)

src/mongo/transport/service_executor_coroutine.cpp (4)

coroutineResumeFunctor (316-320)

coroutineResumeFunctor (316-317)

coroutineResumeFunctor (322-328)

coroutineResumeFunctor (322-323)

🔇 Additional comments (2)

src/mongo/transport/service_executor_coroutine.h (1)

144-145: LGTM!

The new rvalue overload declaration is correctly added alongside the existing const lvalue reference overload, providing move semantics support for deferred task execution.

src/mongo/transport/service_executor.h (1)

112-114: LGTM!

The new virtual overload is correctly added with a default empty implementation, consistent with the existing coroutineResumeFunctor and coroutineLongResumeFunctor methods.

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @src/mongo/transport/service_executor_coroutine.cpp:
- Around line 322-328: In ServiceExecutorCoroutine::coroutineResumeFunctor, the
lambda currently captures task by move but calls thd_group->resumeTask(task)
which passes task as an lvalue; change the lambda to be mutable and call
thd_group->resumeTask(std::move(task)) so the moved task is forwarded into
resumeTask, keeping the thd_group capture as-is to access _threadGroups.

In @src/mongo/transport/service_state_machine.cpp:
- Around line 277-291: The dassert in ServiceStateMachine::~ServiceStateMachine
that checks LocalThread::ID() == _threadGroupId.load(...) should be changed to
an invariant so the check runs in release builds; replace the dassert call with
invariant(LocalThread::ID() == _threadGroupId.load(std::memory_order_relaxed))
(or equivalent) immediately before constructing deferMunmap/submit to ensure we
detect a thread-group mismatch at runtime before calling
_serviceExecutor->coroutineResumeFunctor.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e61944 and c01f945.

📒 Files selected for processing (4)

src/mongo/transport/service_executor.h
src/mongo/transport/service_executor_coroutine.cpp
src/mongo/transport/service_executor_coroutine.h
src/mongo/transport/service_state_machine.cpp

🚧 Files skipped from review as they are similar to previous changes (2)

src/mongo/transport/service_executor.h
src/mongo/transport/service_executor_coroutine.h

🧰 Additional context used

🧬 Code graph analysis (1)

src/mongo/transport/service_state_machine.cpp (1)

src/mongo/db/local_thread_state.cpp (2)

ID (19-29)

ID (19-19)

🔇 Additional comments (2)

src/mongo/transport/service_state_machine.cpp (1)

284-289: LGTM - Safe capture and deferred execution.

The lambda correctly captures _coroStack by value before the destructor completes, ensuring the pointer is valid when the deferred munmap executes. Using std::move(deferMunmap) properly leverages the new rvalue overload to avoid unnecessary copies.

src/mongo/transport/service_executor_coroutine.cpp (1)

316-320: Existing overload captures by reference - ensure caller maintains task lifetime.

This overload captures task by reference, meaning the caller must ensure the Task object outlives the returned functor. This is appropriate for its use case in _migrateThreadGroup (line 755 in service_state_machine.cpp) where _resumeTask is a member variable with suitable lifetime.

coderabbitai · 2026-01-13T11:47:10Z

 ServiceStateMachine::~ServiceStateMachine() {
    MONGO_LOG(1) << "ServiceStateMachine::~ServiceStateMachine";
    _source = {};
-    ::munmap(_coroStack, kCoroStackSize);
+    if (LocalThread::ID() == -1) {
+        ::munmap(_coroStack, kCoroStackSize);
+    } else {
+        dassert(LocalThread::ID() == _threadGroupId.load(std::memory_order_relaxed));
+        auto deferMunmap = [coroStack = _coroStack] { ::munmap(coroStack, kCoroStackSize); };
+        // Enqueue the deferMunmap task to the resume queue instead of the task queue, because the
+        // task queue has closed when shutdown.
+        auto submit = _serviceExecutor->coroutineResumeFunctor(
+            _threadGroupId.load(std::memory_order_relaxed), std::move(deferMunmap));
+        submit();
+    }
 }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how ThreadGroup handles tasks during/after shutdown
rg -n -A5 "terminate\(\)" --type=cpp src/mongo/transport/

Repository: eloqdata/eloqdoc

Length of output: 6382

🏁 Script executed:

#!/bin/bash
# Look for ServiceExecutor shutdown and resume queue behavior
rg -n "resumeQueue\|resume_queue" --type=cpp src/mongo/transport/ | head -30

Repository: eloqdata/eloqdoc

Length of output: 42

🏁 Script executed:

#!/bin/bash
# Check ServiceExecutor interface and shutdown
rg -n -A10 "class ServiceExecutor" --type=cpp src/mongo/transport/ | head -40

Repository: eloqdata/eloqdoc

Length of output: 3188

🏁 Script executed:

#!/bin/bash
# Look for coroutineResumeFunctor implementation
rg -n -B2 -A8 "coroutineResumeFunctor" --type=cpp src/mongo/transport/ | head -50

Repository: eloqdata/eloqdoc

Length of output: 4739

🏁 Script executed:

#!/bin/bash
# Check the actual destructor implementation for context about _serviceExecutor lifetime
cat -n src/mongo/transport/service_state_machine.cpp | sed -n '277,300p'

Repository: eloqdata/eloqdoc

Length of output: 1361

🏁 Script executed:

#!/bin/bash
# Find coroutineResumeFunctor implementation
rg -n -A15 "std::function<void\(\)> ServiceExecutorCoroutine::coroutineResumeFunctor" --type=cpp src/mongo/transport/

Repository: eloqdata/eloqdoc

Length of output: 2482

🏁 Script executed:

#!/bin/bash
# Check _serviceExecutor member and null checks in ServiceStateMachine
rg -n "_serviceExecutor" --type=cpp src/mongo/transport/service_state_machine.cpp | head -20

Repository: eloqdata/eloqdoc

Length of output: 1010

🏁 Script executed:

#!/bin/bash
# Look for _serviceExecutor initialization in ServiceStateMachine
cat -n src/mongo/transport/service_state_machine.h | grep -A5 -B5 "_serviceExecutor"

Repository: eloqdata/eloqdoc

Length of output: 515

🏁 Script executed:

#!/bin/bash
# Check if there are any null checks for _serviceExecutor before use
rg -n "if.*_serviceExecutor|_serviceExecutor.*nullptr" --type=cpp src/mongo/transport/service_state_machine.cpp

Repository: eloqdata/eloqdoc

Length of output: 42

🏁 Script executed:

#!/bin/bash
# Look at the resumeTask implementation to understand how resume queue works
rg -n -A10 "void ThreadGroup::resumeTask" --type=cpp src/mongo/transport/service_executor_coroutine.cpp

Repository: eloqdata/eloqdoc

Length of output: 417

Consider using invariant instead of dassert to validate thread group consistency in production.

The dassert on line 283 only validates in debug builds. In release builds, if the current thread group doesn't match _threadGroupId, the deferred munmap task would be enqueued to a potentially incorrect thread group without detection. While coroutineResumeFunctor includes an invariant bounds check on threadGroupId, this only catches invalid IDs—not logical mismatches between LocalThread::ID() and the stored _threadGroupId. Consider promoting this to an invariant check if thread group consistency is critical for correctness.

The comment about the resume queue remaining operational during shutdown (while the task queue closes) is accurate—ThreadGroup::resumeTask will continue to accept and enqueue tasks.

🤖 Prompt for AI Agents

In @src/mongo/transport/service_state_machine.cpp around lines 277 - 291, The dassert in ServiceStateMachine::~ServiceStateMachine that checks LocalThread::ID() == _threadGroupId.load(...) should be changed to an invariant so the check runs in release builds; replace the dassert call with invariant(LocalThread::ID() == _threadGroupId.load(std::memory_order_relaxed)) (or equivalent) immediately before constructing deferMunmap/submit to ensure we detect a thread-group mismatch at runtime before calling _serviceExecutor->coroutineResumeFunctor.

githubzilla

+1

xiexiaoy added the trigger-ci label Jan 12, 2026

xiexiaoy linked an issue Jan 12, 2026 that may be closed by this pull request

eloqdoc crash when client close but its request is still under processing #420

Closed

coderabbitai Bot reviewed Jan 12, 2026

Comment thread src/mongo/transport/service_state_machine.cpp Outdated

xiexiaoy requested a review from githubzilla January 12, 2026 05:50

xiexiaoy force-pushed the fix_munmap_crash branch from ec00ef5 to a793ee9 Compare January 12, 2026 05:51

coderabbitai Bot reviewed Jan 12, 2026

githubzilla reviewed Jan 12, 2026

xiexiaoy marked this pull request as draft January 12, 2026 06:23

xiexiaoy force-pushed the fix_munmap_crash branch from a793ee9 to 0e61944 Compare January 13, 2026 11:32

xiexiaoy marked this pull request as ready for review January 13, 2026 11:37

coderabbitai Bot reviewed Jan 13, 2026

Comment thread src/mongo/transport/service_state_machine.cpp

xiexiaoy force-pushed the fix_munmap_crash branch from 0e61944 to c01f945 Compare January 13, 2026 11:44

coderabbitai Bot reviewed Jan 13, 2026

xiexiaoy force-pushed the fix_munmap_crash branch 2 times, most recently from f5afed5 to bd0c5f5 Compare January 14, 2026 03:35

~ServiceStateMachine: Avoid call munmap on coroutine stack

e734b28

xiexiaoy force-pushed the fix_munmap_crash branch from bd0c5f5 to e734b28 Compare January 14, 2026 03:42

xiexiaoy requested a review from githubzilla January 14, 2026 03:52

githubzilla reviewed Jan 14, 2026

githubzilla approved these changes Jan 14, 2026

xiexiaoy merged commit ebc8124 into eloqdata:main Jan 14, 2026
2 of 3 checks passed

xiexiaoy deleted the fix_munmap_crash branch January 14, 2026 04:03

xiexiaoy merged 1 commit into eloqdata:main from xiexiaoy:fix_munmap_crash
