getMoreCmd: migrate to original thread group#411
Conversation
WalkthroughNon-global CursorIds now embed an 8-bit thread group ID. Cursor allocation and a new extractor expose that thread group. GetMore now migrates execution to the cursor's thread group (via coroutine migration) for non-globally managed cursors; minor error-message reformatting in getmore validation. Changes
Sequence DiagramsequenceDiagram
participant Client
participant SEP as ServiceEntryPoint
participant GetMore as GetMoreCmd
participant CursorMgr as CursorManager
participant ThreadGrp as ThreadGroup
participant ClientCtx as ClientContext
Client->>SEP: send getMore request
SEP->>GetMore: dispatch request
GetMore->>CursorMgr: lookup cursor by CursorId
CursorMgr-->>GetMore: cursor (non-global) + metadata
rect `#f0f8ff`
note over GetMore,CursorMgr: determine target thread group
GetMore->>CursorMgr: threadGroupIdFromCursorId(id)
CursorMgr-->>GetMore: targetThreadGroupId
alt targetThreadGroupId != currentThreadGroupId
GetMore->>ThreadGrp: migrate coroutine to targetThreadGroupId
ThreadGrp->>ClientCtx: attach client to migrated context
ClientCtx-->>ThreadGrp: invariant: same client preserved
ThreadGrp-->>GetMore: migration complete
end
end
rect `#f0fff0`
note over GetMore: perform getMore on cursor
GetMore->>GetMore: resume execution, fetch next batch
GetMore-->>Client: return results
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @src/mongo/db/cursor_manager.cpp:
- Around line 702-709: The dassert check on threadGroupId (from
LocalThread::ID()) is only active in debug builds, so replace the debug-only
check with a production-enforced assertion (e.g., fassert or invariant) to
prevent silent truncation; specifically, change the dassert(threadGroupId <=
0xFF) to a production check (fassertFailedWithStatusNoTrace or
invariant(threadGroupId <= 0xFF)) before computing myPart and calling
cursorIdFromParts(_collectionCacheRuntimeId, myPart) so any ID > 255 fails
loudly instead of being masked.
🧹 Nitpick comments (1)
src/mongo/db/commands/getmore_cmd.cpp (1)
218-230: Consider reducing log verbosity for production.Lines 221–222 and 226–227 log before and after migration. While useful for debugging, this double logging on every cross-thread-group getMore may produce excessive log volume in production environments.
Suggested refinement
Consider consolidating to a single log entry or using a lower log level:
if (threadGroupId != LocalThread::ID()) { - log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor " - << request.cursorid << ". Current ThreadGroup " << LocalThread::ID(); Client* client = Client::getCurrent(); const CoroutineFunctors& coro = Client::getCurrent()->coroutineFunctors(); (*coro.migrateThreadGroupFuncPtr)(threadGroupId); log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor " - << request.cursorid << " done."; + << request.cursorid << " (from ThreadGroup " << LocalThread::ID() << ")"; invariant(Client::getCurrent() == client); }Alternatively, use
LOG(1)orLOGV2_DEBUGfor detailed migration traces.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/mongo/db/commands/getmore_cmd.cppsrc/mongo/db/cursor_manager.cppsrc/mongo/db/cursor_manager.h
🧰 Additional context used
🧬 Code graph analysis (3)
src/mongo/db/cursor_manager.cpp (1)
src/mongo/db/cursor_manager.h (1)
isGloballyManagedCursor(223-228)
src/mongo/db/cursor_manager.h (1)
src/mongo/db/cursor_manager.cpp (3)
threadGroupIdFromCursorId(718-723)threadGroupIdFromCursorId(718-718)id(96-96)
src/mongo/db/commands/getmore_cmd.cpp (2)
src/mongo/db/cursor_manager.cpp (2)
threadGroupIdFromCursorId(718-723)threadGroupIdFromCursorId(718-718)src/mongo/db/cursor_manager.h (1)
isGloballyManagedCursor(223-228)
🔇 Additional comments (2)
src/mongo/db/cursor_manager.h (1)
251-251: LGTM!The declaration correctly matches the implementation in
cursor_manager.cpp. The method provides the necessary interface for extracting thread group information from cursor IDs to support the getMore migration logic.src/mongo/db/commands/getmore_cmd.cpp (1)
86-87: LGTM!The formatting adjustments to error messages improve readability without changing the observable message content.
Also applies to: 93-94, 114-115, 120-121
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @src/mongo/db/commands/getmore_cmd.cpp:
- Around line 218-230: The code dereferences
Client::getCurrent()->coroutineFunctors() and calls
(*coro.migrateThreadGroupFuncPtr) without verifying that Client::getCurrent() is
non-null, that coroutineFunctors() returns a valid object, and that
migrateThreadGroupFuncPtr is non-null; add checks to fetch Client* client =
Client::getCurrent() once, verify client is non-null, get const
CoroutineFunctors& coro = client->coroutineFunctors() (or a pointer/reference
that can be null-checked), ensure coro.migrateThreadGroupFuncPtr is not null
before calling it, and if any check fails log an error and return or abort the
migration path (preserving the invariant(Client::getCurrent() == client)
behavior when migration succeeds).
In @src/mongo/db/cursor_manager.cpp:
- Around line 718-723: Update CursorManager::threadGroupIdFromCursorId: correct
the comment to say it extracts bits 24–31 of the full 64-bit CursorId (i.e., the
high byte of the lower 32 bits), and strengthen the precondition by replacing
the debug-only dassert(!isGloballyManagedCursor(id)) with a runtime check
(invariant or uassert) so calling with a globally managed cursor fails in all
builds and avoids returning a meaningless thread group id.
- Around line 702-709: The dassert on threadGroupId is only active in debug
builds and must be enforced in production: replace the dassert(threadGroupId <=
0xFF) with a hard runtime check that fails loudly (use fassert/invariant/uassert
style used in this codebase) if LocalThread::ID() > 0xFF, providing a clear
error message; keep the subsequent masking and id =
cursorIdFromParts(_collectionCacheRuntimeId, myPart) logic unchanged so the
function behavior is consistent when the check passes.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/mongo/db/commands/getmore_cmd.cppsrc/mongo/db/cursor_manager.cppsrc/mongo/db/cursor_manager.hsrc/mongo/transport/service_entry_point_impl.cpp
🧰 Additional context used
🧬 Code graph analysis (3)
src/mongo/db/cursor_manager.h (1)
src/mongo/db/cursor_manager.cpp (3)
threadGroupIdFromCursorId(718-723)threadGroupIdFromCursorId(718-718)id(96-96)
src/mongo/db/commands/getmore_cmd.cpp (2)
src/mongo/db/cursor_manager.cpp (2)
threadGroupIdFromCursorId(718-723)threadGroupIdFromCursorId(718-718)src/mongo/db/cursor_manager.h (1)
isGloballyManagedCursor(223-228)
src/mongo/db/cursor_manager.cpp (1)
src/mongo/db/cursor_manager.h (1)
isGloballyManagedCursor(223-228)
🔇 Additional comments (2)
src/mongo/db/cursor_manager.h (1)
251-251: LGTM!The function declaration is appropriate. The return type
int16_tcan accommodate the 8-bit thread group ID (0–255) being extracted.src/mongo/transport/service_entry_point_impl.cpp (1)
78-80: Validation correctly enforces the 8-bit thread group limit.The check ensures
reservedThreadNumfits within the 8-bit field allocated in the cursor ID encoding, preventing the silent truncation issue that would occur if this validation were missing.
| if (!CursorManager::isGloballyManagedCursor(request.cursorid)) { | ||
| int16_t threadGroupId = CursorManager::threadGroupIdFromCursorId(request.cursorid); | ||
| if (threadGroupId != LocalThread::ID()) { | ||
| log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor " | ||
| << request.cursorid << ". Current ThreadGroup " << LocalThread::ID(); | ||
| Client* client = Client::getCurrent(); | ||
| const CoroutineFunctors& coro = Client::getCurrent()->coroutineFunctors(); | ||
| (*coro.migrateThreadGroupFuncPtr)(threadGroupId); | ||
| log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor " | ||
| << request.cursorid << " done."; | ||
| invariant(Client::getCurrent() == client); | ||
| } | ||
| } |
There was a problem hiding this comment.
Validate coroutine functors before dereferencing.
Lines 224–225 access Client::getCurrent()->coroutineFunctors() and dereference the function pointer without null checks. If coroutine functors are not initialized or the migration function pointer is null, this will cause a null-pointer dereference.
🔎 Proposed fix: Add validation before migration
if (threadGroupId != LocalThread::ID()) {
log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor "
<< request.cursorid << ". Current ThreadGroup " << LocalThread::ID();
Client* client = Client::getCurrent();
const CoroutineFunctors& coro = Client::getCurrent()->coroutineFunctors();
+ uassert(ErrorCodes::OperationFailed,
+ "Coroutine migration not available",
+ coro.migrateThreadGroupFuncPtr != nullptr);
(*coro.migrateThreadGroupFuncPtr)(threadGroupId);
log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor "
<< request.cursorid << " done.";
invariant(Client::getCurrent() == client);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (!CursorManager::isGloballyManagedCursor(request.cursorid)) { | |
| int16_t threadGroupId = CursorManager::threadGroupIdFromCursorId(request.cursorid); | |
| if (threadGroupId != LocalThread::ID()) { | |
| log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor " | |
| << request.cursorid << ". Current ThreadGroup " << LocalThread::ID(); | |
| Client* client = Client::getCurrent(); | |
| const CoroutineFunctors& coro = Client::getCurrent()->coroutineFunctors(); | |
| (*coro.migrateThreadGroupFuncPtr)(threadGroupId); | |
| log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor " | |
| << request.cursorid << " done."; | |
| invariant(Client::getCurrent() == client); | |
| } | |
| } | |
| if (!CursorManager::isGloballyManagedCursor(request.cursorid)) { | |
| int16_t threadGroupId = CursorManager::threadGroupIdFromCursorId(request.cursorid); | |
| if (threadGroupId != LocalThread::ID()) { | |
| log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor " | |
| << request.cursorid << ". Current ThreadGroup " << LocalThread::ID(); | |
| Client* client = Client::getCurrent(); | |
| const CoroutineFunctors& coro = Client::getCurrent()->coroutineFunctors(); | |
| uassert(ErrorCodes::OperationFailed, | |
| "Coroutine migration not available", | |
| coro.migrateThreadGroupFuncPtr != nullptr); | |
| (*coro.migrateThreadGroupFuncPtr)(threadGroupId); | |
| log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor " | |
| << request.cursorid << " done."; | |
| invariant(Client::getCurrent() == client); | |
| } | |
| } |
🤖 Prompt for AI Agents
In @src/mongo/db/commands/getmore_cmd.cpp around lines 218 - 230, The code
dereferences Client::getCurrent()->coroutineFunctors() and calls
(*coro.migrateThreadGroupFuncPtr) without verifying that Client::getCurrent() is
non-null, that coroutineFunctors() returns a valid object, and that
migrateThreadGroupFuncPtr is non-null; add checks to fetch Client* client =
Client::getCurrent() once, verify client is non-null, get const
CoroutineFunctors& coro = client->coroutineFunctors() (or a pointer/reference
that can be null-checked), ensure coro.migrateThreadGroupFuncPtr is not null
before calling it, and if any check fails log an error and return or abort the
migration path (preserving the invariant(Client::getCurrent() == client)
behavior when migration succeeds).
| // | ||
| // EloqDoc: The first 2 bits are 0, the next 30 bits are the database identifier, | ||
| // the next 8 bits are thread_group_id, the last 24 bits are random. | ||
| uint32_t threadGroupId = LocalThread::ID(); | ||
| dassert(threadGroupId <= 0xFF); | ||
| uint32_t rand24 = static_cast<uint32_t>(_random->nextInt32()) & 0x00FFFFFFu; | ||
| uint32_t myPart = rand24 | ((threadGroupId & 0xFFu) << 24); | ||
| id = cursorIdFromParts(_collectionCacheRuntimeId, myPart); |
There was a problem hiding this comment.
Verify thread group ID constraint in production builds.
Line 706 uses dassert to check that threadGroupId <= 0xFF, which only executes in debug builds. In release builds, if LocalThread::ID() returns a value exceeding 255, line 708 will silently truncate it via the & 0xFFu mask. This could result in multiple thread groups mapping to the same 8-bit identifier, breaking cursor migration logic when a getMore arrives.
🔎 Proposed fix: Replace dassert with fassert or invariant
uint32_t threadGroupId = LocalThread::ID();
- dassert(threadGroupId <= 0xFF);
+ fassert(50900, threadGroupId <= 0xFF); // Use an appropriate error code
uint32_t rand24 = static_cast<uint32_t>(_random->nextInt32()) & 0x00FFFFFFu;
uint32_t myPart = rand24 | ((threadGroupId & 0xFFu) << 24);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // | |
| // EloqDoc: The first 2 bits are 0, the next 30 bits are the database identifier, | |
| // the next 8 bits are thread_group_id, the last 24 bits are random. | |
| uint32_t threadGroupId = LocalThread::ID(); | |
| dassert(threadGroupId <= 0xFF); | |
| uint32_t rand24 = static_cast<uint32_t>(_random->nextInt32()) & 0x00FFFFFFu; | |
| uint32_t myPart = rand24 | ((threadGroupId & 0xFFu) << 24); | |
| id = cursorIdFromParts(_collectionCacheRuntimeId, myPart); | |
| // | |
| // EloqDoc: The first 2 bits are 0, the next 30 bits are the database identifier, | |
| // the next 8 bits are thread_group_id, the last 24 bits are random. | |
| uint32_t threadGroupId = LocalThread::ID(); | |
| fassert(50900, threadGroupId <= 0xFF); // Use an appropriate error code | |
| uint32_t rand24 = static_cast<uint32_t>(_random->nextInt32()) & 0x00FFFFFFu; | |
| uint32_t myPart = rand24 | ((threadGroupId & 0xFFu) << 24); | |
| id = cursorIdFromParts(_collectionCacheRuntimeId, myPart); |
🤖 Prompt for AI Agents
In @src/mongo/db/cursor_manager.cpp around lines 702 - 709, The dassert on
threadGroupId is only active in debug builds and must be enforced in production:
replace the dassert(threadGroupId <= 0xFF) with a hard runtime check that fails
loudly (use fassert/invariant/uassert style used in this codebase) if
LocalThread::ID() > 0xFF, providing a clear error message; keep the subsequent
masking and id = cursorIdFromParts(_collectionCacheRuntimeId, myPart) logic
unchanged so the function behavior is consistent when the check passes.
| int16_t CursorManager::threadGroupIdFromCursorId(CursorId id) { | ||
| // Extract bits 32-39 from the cursor id. | ||
| dassert(!isGloballyManagedCursor(id)); | ||
| uint32_t part = static_cast<uint32_t>(id & 0xFFFFFFFFu); | ||
| return static_cast<int16_t>((part >> 24) & 0xFFu); | ||
| } |
There was a problem hiding this comment.
Correct the comment and strengthen the precondition check.
Two issues:
-
Incorrect comment (Line 719): The comment states "Extract bits 32-39 from the cursor id" but the implementation extracts bits 24–31 of the lower 32 bits (i.e., bits 24–31 of the full 64-bit CursorId, not bits 32–39).
-
Weak precondition (Line 720): The
dassertonly checks in debug builds. In production, calling this function with a globally managed cursor will produce a meaningless thread group ID without any warning. This should be aninvariantoruassertto catch misuse in all builds.
🔎 Proposed fixes
int16_t CursorManager::threadGroupIdFromCursorId(CursorId id) {
- // Extract bits 32-39 from the cursor id.
- dassert(!isGloballyManagedCursor(id));
+ // Extract bits 24-31 from the cursor id.
+ invariant(!isGloballyManagedCursor(id));
uint32_t part = static_cast<uint32_t>(id & 0xFFFFFFFFu);
return static_cast<int16_t>((part >> 24) & 0xFFu);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| int16_t CursorManager::threadGroupIdFromCursorId(CursorId id) { | |
| // Extract bits 32-39 from the cursor id. | |
| dassert(!isGloballyManagedCursor(id)); | |
| uint32_t part = static_cast<uint32_t>(id & 0xFFFFFFFFu); | |
| return static_cast<int16_t>((part >> 24) & 0xFFu); | |
| } | |
| int16_t CursorManager::threadGroupIdFromCursorId(CursorId id) { | |
| // Extract bits 24-31 from the cursor id. | |
| invariant(!isGloballyManagedCursor(id)); | |
| uint32_t part = static_cast<uint32_t>(id & 0xFFFFFFFFu); | |
| return static_cast<int16_t>((part >> 24) & 0xFFu); | |
| } |
🤖 Prompt for AI Agents
In @src/mongo/db/cursor_manager.cpp around lines 718 - 723, Update
CursorManager::threadGroupIdFromCursorId: correct the comment to say it extracts
bits 24–31 of the full 64-bit CursorId (i.e., the high byte of the lower 32
bits), and strengthen the precondition by replacing the debug-only
dassert(!isGloballyManagedCursor(id)) with a runtime check (invariant or
uassert) so calling with a globally managed cursor fails in all builds and
avoids returning a meaningless thread group id.
findCmd might output many records, and MongoDB will return them batch-by-batch. The findCmd and subsequent getMoreCmd might come from different connections. The CursorManager object is a member of Collection, and EloqDoc has changed Collection to thread-local; therefore, subsequent getMoreCmd should be migrated to the original threadgroup who create the cursor.
The original CursorId is encoded as:
2-bit flag | 30-bit collection id | 32-bit random.Separate the last 32-bit into
8-bit ThreadGroupId | 24-bit random.Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.