~ServiceStateMachine: Avoid call munmap on coroutine stack#422
Conversation
WalkthroughServiceStateMachine switches coroutine stack allocation to mmap/mprotect and defers munmap to the ServiceExecutor when running on the originating thread group; otherwise it unmaps immediately. ServiceExecutor gains a new rvalue Changes
Sequence Diagram(s)sequenceDiagram
participant SSM as ServiceStateMachine
participant TG as ThreadGroup (origin)
participant Exec as ServiceExecutor
participant Kernel as Kernel (munmap)
SSM->>TG: is current thread group == origin?
alt different thread group
SSM->>Kernel: munmap(stack) -- immediate
else same thread group
SSM->>Exec: deferCallOnMainStack(threadGroupId, lambda capturing stack)
Exec->>TG: schedule/resume lambda on main stack
TG->>Kernel: lambda runs later -> munmap(stack) -- deferred
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧠 Learnings (1)📓 Common learnings🧬 Code graph analysis (2)src/mongo/transport/service_executor.h (1)
src/mongo/transport/service_state_machine.cpp (1)
🔇 Additional comments (4)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/mongo/transport/service_state_machine.cpp:
- Around line 280-290: The dassert after _serviceExecutor->schedule(...) can be
a no-op in release builds, risking that _coroStack is never freed if schedule()
fails; replace the dassert with real error handling in the block that schedules
deferred unmap: check the returned Status from _serviceExecutor->schedule (for
the call using ServiceExecutor::ScheduleFlags::kDeferredTask and
transport::ServiceExecutorTaskName::kSSMProcessMessage with _threadGroupId), and
on failure either call ::munmap(_coroStack, kCoroStackSize) directly as a
fallback, log the failure and/or use invariant() to abort on unexpected
failures, and also add a sanity check or comment ensuring _serviceExecutor
remains valid during teardown so schedule() can be relied upon.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/mongo/transport/service_state_machine.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
🧬 Code graph analysis (1)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/local_thread_state.cpp (2)
ID(19-29)ID(19-19)
🔇 Additional comments (1)
src/mongo/transport/service_state_machine.cpp (1)
282-288: Verify executor lifecycle guarantee and error handling in destructor.The destructor calls
_serviceExecutor->schedule()to schedule a munmap task, but the returnedStatusis captured but never checked. If the executor has been shut down or cannot accept tasks, this failure is silently ignored, leaking the coroutine stack memory.Confirm:
- The
ServiceExecutorhas a shutdown-safe guarantee that it outlives allServiceStateMachineinstances and can acceptkDeferredTasktasks until the end of the process, or- Check the
Statusreturn value and handle scheduling failures (log, verify success, or handle the cleanup differently)
ec00ef5 to
a793ee9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/mongo/transport/service_state_machine.cpp:
- Line 289: The destructor currently calls schedule() and uses
invariant(status.isOK()), which can crash during shutdown when schedule()
returns ShutdownInProgress; replace the invariant with the same graceful
handling used elsewhere: check the returned Status, and on error call
_terminateAndLogIfError(status, "scheduling cleanup task") (or log and return)
instead of asserting, so cleanup in the destructor won't abort the process;
update the destructor's schedule() call site to mirror the error-checking
pattern used around _terminateAndLogIfError().
🧹 Nitpick comments (1)
src/mongo/transport/service_state_machine.cpp (1)
280-290: Signed/unsigned comparison may cause incorrect behavior whenLocalThread::ID()returns -1.Based on the relevant snippet,
LocalThread::ID()returnsint16_tand can be -1 when not in a bthread context. If_threadGroupIdis unsigned (constructor takesuint16_t groupId), comparing -1 with an unsigned value involves implicit conversion: -1 becomes 65535. If_threadGroupIdever equals 65535, the comparison incorrectly evaluates as equal, causing unnecessary deferred scheduling.When
LocalThread::ID()returns -1, we're definitively not on any coroutine stack, so immediate unmap is always safe. Consider making this explicit:Suggested fix
- if (LocalThread::ID() != _threadGroupId.load(std::memory_order_relaxed)) { + auto currentThreadGroup = LocalThread::ID(); + // If currentThreadGroup is -1 (not in any thread group) or differs from the SSM's + // thread group, we're not on the coroutine stack and can safely unmap immediately. + if (currentThreadGroup < 0 || + static_cast<uint16_t>(currentThreadGroup) != _threadGroupId.load(std::memory_order_relaxed)) { ::munmap(_coroStack, kCoroStackSize); } else {
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/mongo/transport/service_state_machine.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
🧬 Code graph analysis (1)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/local_thread_state.cpp (2)
ID(19-29)ID(19-19)
| ServiceExecutor::ScheduleFlags::kDeferredTask, | ||
| transport::ServiceExecutorTaskName::kSSMProcessMessage, | ||
| _threadGroupId.load(std::memory_order_relaxed)); | ||
| invariant(status.isOK()); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the file structure and find the relevant code
cd /tmp && git ls-files | grep -E "(service_state_machine|service_executor)" | head -20Repository: eloqdata/eloqdoc
Length of output: 128
🏁 Script executed:
# Read the specific area around line 289 in service_state_machine.cpp
sed -n '280,300p' src/mongo/transport/service_state_machine.cppRepository: eloqdata/eloqdoc
Length of output: 1132
🏁 Script executed:
# Check the surrounding context to understand what schedule() is being called on
sed -n '270,310p' src/mongo/transport/service_state_machine.cppRepository: eloqdata/eloqdoc
Length of output: 1852
🏁 Script executed:
# Search for ServiceExecutor schedule method implementation
rg -n "class ServiceExecutor" --type cpp -A 30 | head -100Repository: eloqdata/eloqdoc
Length of output: 8192
🏁 Script executed:
# Look for schedule method signature and documentation
rg -n "schedule\s*\(" --type cpp -g "*service_executor*" -B 2 -A 8 | head -150Repository: eloqdata/eloqdoc
Length of output: 13068
🏁 Script executed:
# Search for other uses of schedule() to see typical error handling
rg -n "schedule\(" src/mongo/transport/service_state_machine.cpp -B 2 -A 2Repository: eloqdata/eloqdoc
Length of output: 1092
🏁 Script executed:
# Look for other places where schedule() return value is checked
rg -n "schedule\(" --type cpp -B 1 -A 1 | grep -E "(ASSERT|Status|if|invariant)" -A 1 -B 1 | head -80Repository: eloqdata/eloqdoc
Length of output: 5693
🏁 Script executed:
# Search for any tests that verify schedule() behavior during shutdown
rg -l "shutdown|ShutdownInProgress" --type cpp src/mongo/transport/ | head -10Repository: eloqdata/eloqdoc
Length of output: 492
🏁 Script executed:
# Check the ServiceExecutor interface definition for documentation
find . -name "service_executor.h" -type f -exec head -100 {} \;Repository: eloqdata/eloqdoc
Length of output: 4119
🏁 Script executed:
# Check for other invariant(status) patterns in service_state_machine.cpp
rg -n "invariant\(status" src/mongo/transport/service_state_machine.cppRepository: eloqdata/eloqdoc
Length of output: 97
🏁 Script executed:
# See how schedule() errors are handled elsewhere in the transport layer
rg -n "schedule\(" src/mongo/transport/ --type cpp -A 2 | grep -E "(if|Status|error)" -B 1 -A 1 | head -60Repository: eloqdata/eloqdoc
Length of output: 4266
🏁 Script executed:
# Check specifically at line 678 in service_state_machine.cpp where there's another schedule call
sed -n '675,690p' src/mongo/transport/service_state_machine.cppRepository: eloqdata/eloqdoc
Length of output: 711
🏁 Script executed:
# Look at the test for schedule failure handling
sed -n '480,520p' src/mongo/transport/service_state_machine_test.cppRepository: eloqdata/eloqdoc
Length of output: 1829
🏁 Script executed:
# See what happens after the if check at line 691 (the else branch)
sed -n '688,710p' src/mongo/transport/service_state_machine.cppRepository: eloqdata/eloqdoc
Length of output: 682
🏁 Script executed:
# Get more context on how schedule failures should be handled
rg -n "ShutdownInProgress.*schedule" --type cpp -B 3 -A 3Repository: eloqdata/eloqdoc
Length of output: 5358
Handle schedule() failure in destructor instead of using invariant().
At line 289, schedule() can fail with ShutdownInProgress during shutdown (confirmed by ServiceExecutor implementations and existing tests). Using invariant(status.isOK()) will crash the process. The same file demonstrates proper error handling at line 680: check the status and call _terminateAndLogIfError(). Either handle the error gracefully or log before the invariant for debuggability, especially since this is cleanup code in a destructor.
🤖 Prompt for AI Agents
In @src/mongo/transport/service_state_machine.cpp at line 289, The destructor
currently calls schedule() and uses invariant(status.isOK()), which can crash
during shutdown when schedule() returns ShutdownInProgress; replace the
invariant with the same graceful handling used elsewhere: check the returned
Status, and on error call _terminateAndLogIfError(status, "scheduling cleanup
task") (or log and return) instead of asserting, so cleanup in the destructor
won't abort the process; update the destructor's schedule() call site to mirror
the error-checking pattern used around _terminateAndLogIfError().
| ServiceExecutor::ScheduleFlags::kDeferredTask, | ||
| transport::ServiceExecutorTaskName::kSSMProcessMessage, | ||
| _threadGroupId.load(std::memory_order_relaxed)); | ||
| invariant(status.isOK()); |
There was a problem hiding this comment.
How to guarantee the scheduled mumap is executed after this coroutine finished?
a793ee9 to
0e61944
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/mongo/transport/service_state_machine.cpp:
- Around line 280-290: The destructor of ServiceStateMachine calls
_serviceExecutor->coroutineResumeFunctor(...) without verifying _serviceExecutor
is valid; either confirm/document that ServiceContext or ServiceEntryPointImpl
guarantees the executor outlives ServiceStateMachine instances, or add a
defensive check/assert in the destructor (e.g., if (!_serviceExecutor) skip
submitting deferMunmap or dassert(_serviceExecutor) before calling
coroutineResumeFunctor). Also make the dassert compare LocalThread::ID() to
_threadGroupId.load(std::memory_order_relaxed) for consistency with the later
explicit .load() usage (reference symbols: ServiceStateMachine destructor,
_serviceExecutor, coroutineResumeFunctor, ServiceContext, ServiceEntryPointImpl,
LocalThread::ID, _threadGroupId, dassert).
🧹 Nitpick comments (2)
src/mongo/transport/service_executor_coroutine.cpp (1)
322-328: Consider usingstd::move(task)inside the lambda to avoid an extra copy.The task is correctly moved into the lambda capture, but inside the lambda body,
taskis an lvalue. When passed toresumeTask(Task task), this triggers a copy construction. Since this lambda is only invoked once, moving is safe.♻️ Suggested improvement
std::function<void()> ServiceExecutorCoroutine::coroutineResumeFunctor(uint16_t threadGroupId, Task&& task) { invariant(threadGroupId < _threadGroups.size()); - return [thd_group = &_threadGroups[threadGroupId], task = std::move(task)]() { - thd_group->resumeTask(task); + return [thd_group = &_threadGroups[threadGroupId], task = std::move(task)]() mutable { + thd_group->resumeTask(std::move(task)); }; }src/mongo/transport/service_state_machine.cpp (1)
283-283: Type mismatch in debug assertion:int16_tcompared withuint16_t.
LocalThread::ID()returnsint16_t(can be -1), but_threadGroupIdstoresuint16_t. While the comparison here is after confirmingID() != -1, an explicit cast or using the same type would improve clarity.♻️ Suggested fix
- dassert(LocalThread::ID() == _threadGroupId); + dassert(static_cast<uint16_t>(LocalThread::ID()) == _threadGroupId.load(std::memory_order_relaxed));
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/mongo/transport/service_executor.hsrc/mongo/transport/service_executor_coroutine.cppsrc/mongo/transport/service_executor_coroutine.hsrc/mongo/transport/service_state_machine.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
🧬 Code graph analysis (3)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/local_thread_state.cpp (2)
ID(19-29)ID(19-19)
src/mongo/transport/service_executor_coroutine.h (2)
src/mongo/transport/service_executor.h (3)
std(108-110)std(112-114)std(116-119)src/mongo/transport/service_executor_coroutine.cpp (4)
coroutineResumeFunctor(316-320)coroutineResumeFunctor(316-317)coroutineResumeFunctor(322-328)coroutineResumeFunctor(322-323)
src/mongo/transport/service_executor.h (1)
src/mongo/transport/service_executor_coroutine.cpp (4)
coroutineResumeFunctor(316-320)coroutineResumeFunctor(316-317)coroutineResumeFunctor(322-328)coroutineResumeFunctor(322-323)
🔇 Additional comments (2)
src/mongo/transport/service_executor_coroutine.h (1)
144-145: LGTM!The new rvalue overload declaration is correctly added alongside the existing const lvalue reference overload, providing move semantics support for deferred task execution.
src/mongo/transport/service_executor.h (1)
112-114: LGTM!The new virtual overload is correctly added with a default empty implementation, consistent with the existing
coroutineResumeFunctorandcoroutineLongResumeFunctormethods.
0e61944 to
c01f945
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @src/mongo/transport/service_executor_coroutine.cpp:
- Around line 322-328: In ServiceExecutorCoroutine::coroutineResumeFunctor, the
lambda currently captures task by move but calls thd_group->resumeTask(task)
which passes task as an lvalue; change the lambda to be mutable and call
thd_group->resumeTask(std::move(task)) so the moved task is forwarded into
resumeTask, keeping the thd_group capture as-is to access _threadGroups.
In @src/mongo/transport/service_state_machine.cpp:
- Around line 277-291: The dassert in ServiceStateMachine::~ServiceStateMachine
that checks LocalThread::ID() == _threadGroupId.load(...) should be changed to
an invariant so the check runs in release builds; replace the dassert call with
invariant(LocalThread::ID() == _threadGroupId.load(std::memory_order_relaxed))
(or equivalent) immediately before constructing deferMunmap/submit to ensure we
detect a thread-group mismatch at runtime before calling
_serviceExecutor->coroutineResumeFunctor.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/mongo/transport/service_executor.hsrc/mongo/transport/service_executor_coroutine.cppsrc/mongo/transport/service_executor_coroutine.hsrc/mongo/transport/service_state_machine.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- src/mongo/transport/service_executor.h
- src/mongo/transport/service_executor_coroutine.h
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
🧬 Code graph analysis (1)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/local_thread_state.cpp (2)
ID(19-29)ID(19-19)
🔇 Additional comments (2)
src/mongo/transport/service_state_machine.cpp (1)
284-289: LGTM - Safe capture and deferred execution.The lambda correctly captures
_coroStackby value before the destructor completes, ensuring the pointer is valid when the deferredmunmapexecutes. Usingstd::move(deferMunmap)properly leverages the new rvalue overload to avoid unnecessary copies.src/mongo/transport/service_executor_coroutine.cpp (1)
316-320: Existing overload captures by reference - ensure caller maintains task lifetime.This overload captures
taskby reference, meaning the caller must ensure theTaskobject outlives the returned functor. This is appropriate for its use case in_migrateThreadGroup(line 755 in service_state_machine.cpp) where_resumeTaskis a member variable with suitable lifetime.
| ServiceStateMachine::~ServiceStateMachine() { | ||
| MONGO_LOG(1) << "ServiceStateMachine::~ServiceStateMachine"; | ||
| _source = {}; | ||
| ::munmap(_coroStack, kCoroStackSize); | ||
| if (LocalThread::ID() == -1) { | ||
| ::munmap(_coroStack, kCoroStackSize); | ||
| } else { | ||
| dassert(LocalThread::ID() == _threadGroupId.load(std::memory_order_relaxed)); | ||
| auto deferMunmap = [coroStack = _coroStack] { ::munmap(coroStack, kCoroStackSize); }; | ||
| // Enqueue the deferMunmap task to the resume queue instead of the task queue, because the | ||
| // task queue has closed when shutdown. | ||
| auto submit = _serviceExecutor->coroutineResumeFunctor( | ||
| _threadGroupId.load(std::memory_order_relaxed), std::move(deferMunmap)); | ||
| submit(); | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how ThreadGroup handles tasks during/after shutdown
rg -n -A5 "terminate\(\)" --type=cpp src/mongo/transport/Repository: eloqdata/eloqdoc
Length of output: 6382
🏁 Script executed:
#!/bin/bash
# Look for ServiceExecutor shutdown and resume queue behavior
rg -n "resumeQueue\|resume_queue" --type=cpp src/mongo/transport/ | head -30Repository: eloqdata/eloqdoc
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Check ServiceExecutor interface and shutdown
rg -n -A10 "class ServiceExecutor" --type=cpp src/mongo/transport/ | head -40Repository: eloqdata/eloqdoc
Length of output: 3188
🏁 Script executed:
#!/bin/bash
# Look for coroutineResumeFunctor implementation
rg -n -B2 -A8 "coroutineResumeFunctor" --type=cpp src/mongo/transport/ | head -50Repository: eloqdata/eloqdoc
Length of output: 4739
🏁 Script executed:
#!/bin/bash
# Check the actual destructor implementation for context about _serviceExecutor lifetime
cat -n src/mongo/transport/service_state_machine.cpp | sed -n '277,300p'Repository: eloqdata/eloqdoc
Length of output: 1361
🏁 Script executed:
#!/bin/bash
# Find coroutineResumeFunctor implementation
rg -n -A15 "std::function<void\(\)> ServiceExecutorCoroutine::coroutineResumeFunctor" --type=cpp src/mongo/transport/Repository: eloqdata/eloqdoc
Length of output: 2482
🏁 Script executed:
#!/bin/bash
# Check _serviceExecutor member and null checks in ServiceStateMachine
rg -n "_serviceExecutor" --type=cpp src/mongo/transport/service_state_machine.cpp | head -20Repository: eloqdata/eloqdoc
Length of output: 1010
🏁 Script executed:
#!/bin/bash
# Look for _serviceExecutor initialization in ServiceStateMachine
cat -n src/mongo/transport/service_state_machine.h | grep -A5 -B5 "_serviceExecutor"Repository: eloqdata/eloqdoc
Length of output: 515
🏁 Script executed:
#!/bin/bash
# Check if there are any null checks for _serviceExecutor before use
rg -n "if.*_serviceExecutor|_serviceExecutor.*nullptr" --type=cpp src/mongo/transport/service_state_machine.cppRepository: eloqdata/eloqdoc
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Look at the resumeTask implementation to understand how resume queue works
rg -n -A10 "void ThreadGroup::resumeTask" --type=cpp src/mongo/transport/service_executor_coroutine.cppRepository: eloqdata/eloqdoc
Length of output: 417
Consider using invariant instead of dassert to validate thread group consistency in production.
The dassert on line 283 only validates in debug builds. In release builds, if the current thread group doesn't match _threadGroupId, the deferred munmap task would be enqueued to a potentially incorrect thread group without detection. While coroutineResumeFunctor includes an invariant bounds check on threadGroupId, this only catches invalid IDs—not logical mismatches between LocalThread::ID() and the stored _threadGroupId. Consider promoting this to an invariant check if thread group consistency is critical for correctness.
The comment about the resume queue remaining operational during shutdown (while the task queue closes) is accurate—ThreadGroup::resumeTask will continue to accept and enqueue tasks.
🤖 Prompt for AI Agents
In @src/mongo/transport/service_state_machine.cpp around lines 277 - 291, The
dassert in ServiceStateMachine::~ServiceStateMachine that checks
LocalThread::ID() == _threadGroupId.load(...) should be changed to an invariant
so the check runs in release builds; replace the dassert call with
invariant(LocalThread::ID() == _threadGroupId.load(std::memory_order_relaxed))
(or equivalent) immediately before constructing deferMunmap/submit to ensure we
detect a thread-group mismatch at runtime before calling
_serviceExecutor->coroutineResumeFunctor.
f5afed5 to
bd0c5f5
Compare
bd0c5f5 to
e734b28
Compare
Summary by CodeRabbit
Bug Fixes
Chores
Stability
✏️ Tip: You can customize this high-level summary in your review settings.