
getMoreCmd: migrate to original thread group #411

Merged
xiexiaoy merged 1 commit into eloqdata:main from xiexiaoy:getMoreCmd on Jan 7, 2026

Conversation

@xiexiaoy

@xiexiaoy xiexiaoy commented Jan 7, 2026

Collaborator

A findCmd may produce many records, which MongoDB returns batch by batch. The findCmd and the subsequent getMoreCmd requests may arrive on different connections. The CursorManager object is a member of Collection, and EloqDoc has made Collection thread-local; therefore, each subsequent getMoreCmd must be migrated to the thread group that created the cursor.

The original CursorId is encoded as: 2-bit flag | 30-bit collection id | 32-bit random.
This PR splits the low 32 bits into: 8-bit ThreadGroupId | 24-bit random.
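
As a rough illustration of the layout above, the following self-contained sketch packs and unpacks the fields. The names `makeCursorId`, `threadGroupOf`, and `collectionOf` are hypothetical and are not taken from the PR; the 2-bit flag is simply left at 0, which matches the source comment for collection-level cursors.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the CursorId type (a signed 64-bit integer).
using CursorId = std::int64_t;

// Hypothetical helper: pack the fields into one id.
// Layout, most significant bits first:
//   2-bit flag | 30-bit collection id | 8-bit thread group id | 24-bit random
inline CursorId makeCursorId(std::uint32_t collectionId,
                             std::uint32_t threadGroupId,
                             std::uint32_t random) {
    assert(collectionId <= 0x3FFFFFFFu);  // must fit in 30 bits
    assert(threadGroupId <= 0xFFu);       // must fit in 8 bits
    // Keep only 24 random bits and place the thread group id above them.
    const std::uint32_t low = (threadGroupId << 24) | (random & 0x00FFFFFFu);
    // The 2-bit flag stays 0 in this sketch.
    return static_cast<CursorId>((static_cast<std::uint64_t>(collectionId) << 32) | low);
}

// Hypothetical helper: recover the thread group id (bits 24-31, bit 0 = LSB).
inline std::uint32_t threadGroupOf(CursorId id) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(id) >> 24) & 0xFFu);
}

// Hypothetical helper: recover the 30-bit collection id (bits 32-61).
inline std::uint32_t collectionOf(CursorId id) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(id) >> 32) & 0x3FFFFFFFu);
}
```

One consequence of this split is that the random part shrinks from 32 to 24 bits, so the number of distinct ids per collection and thread group is smaller than before.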

Summary by CodeRabbit

  • Refactor
    • Improved cursor handling so cursors migrate correctly between thread groups and cursor identifiers now carry thread-group info for more reliable multi-threaded behavior.
  • Bug Fix
    • Added startup validation to catch invalid reserved-thread configurations early, preventing misconfiguration-related runtime issues.


@coderabbitai

coderabbitai Bot commented Jan 7, 2026


Walkthrough

Non-global CursorIds now embed an 8-bit thread group ID. Cursor allocation and a new extractor expose that thread group. GetMore now migrates execution to the cursor's thread group (via coroutine migration) for non-globally managed cursors; minor error-message reformatting in getmore validation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Cursor ID generation & API: src/mongo/db/cursor_manager.h, src/mongo/db/cursor_manager.cpp | Non-global CursorId composition now includes an 8-bit thread_group_id in bits 24–31 and 24 random bits. Added public static helper threadGroupIdFromCursorId(CursorId) returning int16_t. |
| GetMore command migration logic: src/mongo/db/commands/getmore_cmd.cpp | Added coroutine-based migration to the cursor's thread group when servicing non-global cursors whose thread group differs from the current one. Includes invariant checks and logging; also reformatted some validation error messages. |
| Service executor validation: src/mongo/transport/service_entry_point_impl.cpp | Added runtime assertion that reservedThreadNum < 256 when coroutines are enabled and reserved thread count is non-zero. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant SEP as ServiceEntryPoint
    participant GetMore as GetMoreCmd
    participant CursorMgr as CursorManager
    participant ThreadGrp as ThreadGroup
    participant ClientCtx as ClientContext

    Client->>SEP: send getMore request
    SEP->>GetMore: dispatch request
    GetMore->>CursorMgr: lookup cursor by CursorId
    CursorMgr-->>GetMore: cursor (non-global) + metadata

    rect `#f0f8ff`
      note over GetMore,CursorMgr: determine target thread group
      GetMore->>CursorMgr: threadGroupIdFromCursorId(id)
      CursorMgr-->>GetMore: targetThreadGroupId

      alt targetThreadGroupId != currentThreadGroupId
        GetMore->>ThreadGrp: migrate coroutine to targetThreadGroupId
        ThreadGrp->>ClientCtx: attach client to migrated context
        ClientCtx-->>ThreadGrp: invariant: same client preserved
        ThreadGrp-->>GetMore: migration complete
      end
    end

    rect `#f0fff0`
      note over GetMore: perform getMore on cursor
      GetMore->>GetMore: resume execution, fetch next batch
      GetMore-->>Client: return results
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I tunneled bits into an ID so spry,

Eight bits to know which warren to try.
When GetMore calls, I nudge the thread—hop!
Migration done, the cursor keeps on top.
A hop, a nibble, and code that won't sigh.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly matches the PR's primary objective: implementing thread group migration logic for getMoreCmd requests to ensure they execute in the correct thread group. |


@xiexiaoy xiexiaoy marked this pull request as ready for review January 7, 2026 07:12

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @src/mongo/db/cursor_manager.cpp:
- Around line 702-709: The dassert check on threadGroupId (from
LocalThread::ID()) is only active in debug builds, so replace the debug-only
check with a production-enforced assertion (e.g., fassert or invariant) to
prevent silent truncation; specifically, change the dassert(threadGroupId <=
0xFF) to a production check (fassertFailedWithStatusNoTrace or
invariant(threadGroupId <= 0xFF)) before computing myPart and calling
cursorIdFromParts(_collectionCacheRuntimeId, myPart) so any ID > 255 fails
loudly instead of being masked.
🧹 Nitpick comments (1)
src/mongo/db/commands/getmore_cmd.cpp (1)

218-230: Consider reducing log verbosity for production.

Lines 221–222 and 226–227 log before and after migration. While useful for debugging, this double logging on every cross-thread-group getMore may produce excessive log volume in production environments.

Suggested refinement

Consider consolidating to a single log entry or using a lower log level:

             if (threadGroupId != LocalThread::ID()) {
-                log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor "
-                      << request.cursorid << ". Current ThreadGroup " << LocalThread::ID();
                 Client* client = Client::getCurrent();
                 const CoroutineFunctors& coro = Client::getCurrent()->coroutineFunctors();
                 (*coro.migrateThreadGroupFuncPtr)(threadGroupId);
                 log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor "
-                      << request.cursorid << " done.";
+                      << request.cursorid << " (from ThreadGroup " << LocalThread::ID() << ")";
                 invariant(Client::getCurrent() == client);
             }

Alternatively, use LOG(1) or LOGV2_DEBUG for detailed migration traces.
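
One detail worth checking in a consolidated log line: if LocalThread::ID() reports the thread group the coroutine is currently running on, then reading it after the migration call would presumably yield the target group rather than the source. A minimal sketch of capturing the source id first is shown below. `currentThreadGroup`, `migrateTo`, and `migrateAndDescribe` are hypothetical stand-ins written for this example, not functions from the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for LocalThread::ID(): the id of the thread group
// the caller is currently running on.
inline std::int16_t& currentThreadGroup() {
    static std::int16_t id = 0;
    return id;
}

// Hypothetical stand-in for the migration functor. The real one would suspend
// the coroutine and resume it on the target thread group; here it only
// updates the simulated current id.
inline void migrateTo(std::int16_t target) {
    currentThreadGroup() = target;
}

// Migrates if needed and returns the single log line that would be emitted,
// or an empty string when the caller is already on the target thread group.
inline std::string migrateAndDescribe(std::int16_t target, std::int64_t cursorId) {
    const std::int16_t source = currentThreadGroup();  // capture before migrating
    if (source == target) {
        return {};
    }
    migrateTo(target);
    std::ostringstream msg;
    msg << "Migrated to ThreadGroup " << target << " for getMore on cursor " << cursorId
        << " (from ThreadGroup " << source << ")";
    return msg.str();
}
```

With this shape, one message still records both ends of the migration, and it could be emitted at a debug log level if per-request logging is too noisy.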

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f82b030 and 7b4f33e.

📒 Files selected for processing (3)
  • src/mongo/db/commands/getmore_cmd.cpp
  • src/mongo/db/cursor_manager.cpp
  • src/mongo/db/cursor_manager.h
🧰 Additional context used
🧬 Code graph analysis (3)
src/mongo/db/cursor_manager.cpp (1)
src/mongo/db/cursor_manager.h (1)
  • isGloballyManagedCursor (223-228)
src/mongo/db/cursor_manager.h (1)
src/mongo/db/cursor_manager.cpp (3)
  • threadGroupIdFromCursorId (718-723)
  • threadGroupIdFromCursorId (718-718)
  • id (96-96)
src/mongo/db/commands/getmore_cmd.cpp (2)
src/mongo/db/cursor_manager.cpp (2)
  • threadGroupIdFromCursorId (718-723)
  • threadGroupIdFromCursorId (718-718)
src/mongo/db/cursor_manager.h (1)
  • isGloballyManagedCursor (223-228)
🔇 Additional comments (2)
src/mongo/db/cursor_manager.h (1)

251-251: LGTM!

The declaration correctly matches the implementation in cursor_manager.cpp. The method provides the necessary interface for extracting thread group information from cursor IDs to support the getMore migration logic.

src/mongo/db/commands/getmore_cmd.cpp (1)

86-87: LGTM!

The formatting adjustments to error messages improve readability without changing the observable message content.

Also applies to: 93-94, 114-115, 120-121

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @src/mongo/db/commands/getmore_cmd.cpp:
- Around line 218-230: The code dereferences
Client::getCurrent()->coroutineFunctors() and calls
(*coro.migrateThreadGroupFuncPtr) without verifying that Client::getCurrent() is
non-null, that coroutineFunctors() returns a valid object, and that
migrateThreadGroupFuncPtr is non-null; add checks to fetch Client* client =
Client::getCurrent() once, verify client is non-null, get const
CoroutineFunctors& coro = client->coroutineFunctors() (or a pointer/reference
that can be null-checked), ensure coro.migrateThreadGroupFuncPtr is not null
before calling it, and if any check fails log an error and return or abort the
migration path (preserving the invariant(Client::getCurrent() == client)
behavior when migration succeeds).

In @src/mongo/db/cursor_manager.cpp:
- Around line 718-723: Update CursorManager::threadGroupIdFromCursorId: correct
the comment to say it extracts bits 24–31 of the full 64-bit CursorId (i.e., the
high byte of the lower 32 bits), and strengthen the precondition by replacing
the debug-only dassert(!isGloballyManagedCursor(id)) with a runtime check
(invariant or uassert) so calling with a globally managed cursor fails in all
builds and avoids returning a meaningless thread group id.
- Around line 702-709: The dassert on threadGroupId is only active in debug
builds and must be enforced in production: replace the dassert(threadGroupId <=
0xFF) with a hard runtime check that fails loudly (use fassert/invariant/uassert
style used in this codebase) if LocalThread::ID() > 0xFF, providing a clear
error message; keep the subsequent masking and id =
cursorIdFromParts(_collectionCacheRuntimeId, myPart) logic unchanged so the
function behavior is consistent when the check passes.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b4f33e and fde8ee6.

📒 Files selected for processing (4)
  • src/mongo/db/commands/getmore_cmd.cpp
  • src/mongo/db/cursor_manager.cpp
  • src/mongo/db/cursor_manager.h
  • src/mongo/transport/service_entry_point_impl.cpp
🧰 Additional context used
🧬 Code graph analysis (3)
src/mongo/db/cursor_manager.h (1)
src/mongo/db/cursor_manager.cpp (3)
  • threadGroupIdFromCursorId (718-723)
  • threadGroupIdFromCursorId (718-718)
  • id (96-96)
src/mongo/db/commands/getmore_cmd.cpp (2)
src/mongo/db/cursor_manager.cpp (2)
  • threadGroupIdFromCursorId (718-723)
  • threadGroupIdFromCursorId (718-718)
src/mongo/db/cursor_manager.h (1)
  • isGloballyManagedCursor (223-228)
src/mongo/db/cursor_manager.cpp (1)
src/mongo/db/cursor_manager.h (1)
  • isGloballyManagedCursor (223-228)
🔇 Additional comments (2)
src/mongo/db/cursor_manager.h (1)

251-251: LGTM!

The function declaration is appropriate. The return type int16_t can accommodate the 8-bit thread group ID (0–255) being extracted.

src/mongo/transport/service_entry_point_impl.cpp (1)

78-80: Validation correctly enforces the 8-bit thread group limit.

The check ensures reservedThreadNum fits within the 8-bit field allocated in the cursor ID encoding, preventing the silent truncation issue that would occur if this validation were missing.

Comment on lines +218 to +230
if (!CursorManager::isGloballyManagedCursor(request.cursorid)) {
    int16_t threadGroupId = CursorManager::threadGroupIdFromCursorId(request.cursorid);
    if (threadGroupId != LocalThread::ID()) {
        log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor "
              << request.cursorid << ". Current ThreadGroup " << LocalThread::ID();
        Client* client = Client::getCurrent();
        const CoroutineFunctors& coro = Client::getCurrent()->coroutineFunctors();
        (*coro.migrateThreadGroupFuncPtr)(threadGroupId);
        log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor "
              << request.cursorid << " done.";
        invariant(Client::getCurrent() == client);
    }
}


⚠️ Potential issue | 🔴 Critical

Validate coroutine functors before dereferencing.

Lines 224–225 access Client::getCurrent()->coroutineFunctors() and dereference the function pointer without null checks. If coroutine functors are not initialized or the migration function pointer is null, this will cause a null-pointer dereference.

🔎 Proposed fix: Add validation before migration
             if (threadGroupId != LocalThread::ID()) {
                 log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor "
                       << request.cursorid << ". Current ThreadGroup " << LocalThread::ID();
                 Client* client = Client::getCurrent();
                 const CoroutineFunctors& coro = Client::getCurrent()->coroutineFunctors();
+                uassert(ErrorCodes::OperationFailed,
+                        "Coroutine migration not available",
+                        coro.migrateThreadGroupFuncPtr != nullptr);
                 (*coro.migrateThreadGroupFuncPtr)(threadGroupId);
                 log() << "Migrate to ThreadGroup " << threadGroupId << " for getMore on cursor "
                       << request.cursorid << " done.";
                 invariant(Client::getCurrent() == client);
             }

Comment on lines +702 to 709
//
// EloqDoc: The first 2 bits are 0, the next 30 bits are the database identifier,
// the next 8 bits are thread_group_id, the last 24 bits are random.
uint32_t threadGroupId = LocalThread::ID();
dassert(threadGroupId <= 0xFF);
uint32_t rand24 = static_cast<uint32_t>(_random->nextInt32()) & 0x00FFFFFFu;
uint32_t myPart = rand24 | ((threadGroupId & 0xFFu) << 24);
id = cursorIdFromParts(_collectionCacheRuntimeId, myPart);


⚠️ Potential issue | 🔴 Critical

Verify thread group ID constraint in production builds.

Line 706 uses dassert to check that threadGroupId <= 0xFF, which only executes in debug builds. In release builds, if LocalThread::ID() returns a value exceeding 255, line 708 will silently truncate it via the & 0xFFu mask. This could result in multiple thread groups mapping to the same 8-bit identifier, breaking cursor migration logic when a getMore arrives.
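
The collision described above can be reproduced in isolation. The sketch below mirrors only the masking step from the snippet under review; `encodedThreadGroup`, `fitsInCursorId`, and `demonstrateCollision` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the masking in the snippet under review: only the low 8 bits of the
// thread group id survive in the cursor id.
inline std::uint32_t encodedThreadGroup(std::uint32_t threadGroupId) {
    return threadGroupId & 0xFFu;
}

// Hypothetical range check that a release-mode assertion would enforce.
inline bool fitsInCursorId(std::uint32_t threadGroupId) {
    return threadGroupId <= 0xFFu;
}

// Usage: thread groups 1 and 257 are distinct but encode to the same byte,
// so a getMore for a cursor created on group 257 would be routed to group 1.
inline void demonstrateCollision() {
    assert(encodedThreadGroup(1) == encodedThreadGroup(257));
    assert(!fitsInCursorId(257));
}
```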

🔎 Proposed fix: Replace dassert with fassert or invariant
             uint32_t threadGroupId = LocalThread::ID();
-            dassert(threadGroupId <= 0xFF);
+            fassert(50900, threadGroupId <= 0xFF);  // Use an appropriate error code
             uint32_t rand24 = static_cast<uint32_t>(_random->nextInt32()) & 0x00FFFFFFu;
             uint32_t myPart = rand24 | ((threadGroupId & 0xFFu) << 24);

Comment on lines +718 to +723
int16_t CursorManager::threadGroupIdFromCursorId(CursorId id) {
    // Extract bits 32-39 from the cursor id.
    dassert(!isGloballyManagedCursor(id));
    uint32_t part = static_cast<uint32_t>(id & 0xFFFFFFFFu);
    return static_cast<int16_t>((part >> 24) & 0xFFu);
}


⚠️ Potential issue | 🟠 Major

Correct the comment and strengthen the precondition check.

Two issues:

  1. Incorrect comment (Line 719): The comment states "Extract bits 32-39 from the cursor id" but the implementation extracts bits 24–31 of the lower 32 bits (i.e., bits 24–31 of the full 64-bit CursorId, not bits 32–39).

  2. Weak precondition (Line 720): The dassert only checks in debug builds. In production, calling this function with a globally managed cursor will produce a meaningless thread group ID without any warning. This should be an invariant or uassert to catch misuse in all builds.
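
To make the bit-position point concrete, here is a small standalone sketch, counting bit 0 as the least significant bit. `bits24to31` repeats the arithmetic of the function under review, while `bits32to39` shows what the original comment would literally describe; both names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the CursorId type (a signed 64-bit integer).
using CursorId = std::int64_t;

// Same arithmetic as the function under review: take the low 32 bits, then
// the top byte of that half, i.e. bits 24-31 of the 64-bit id.
inline std::int16_t bits24to31(CursorId id) {
    const std::uint32_t part = static_cast<std::uint32_t>(id & 0xFFFFFFFF);
    return static_cast<std::int16_t>((part >> 24) & 0xFFu);
}

// What "bits 32-39" would select instead: the lowest byte of the upper half,
// which belongs to the collection id field rather than the thread group id.
inline std::int16_t bits32to39(CursorId id) {
    return static_cast<std::int16_t>((static_cast<std::uint64_t>(id) >> 32) & 0xFFu);
}
```

For an id whose upper half ends in 0x11 and whose thread group byte is 0x22, the first helper returns 0x22 and the second returns 0x11, so the two descriptions are not interchangeable.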

🔎 Proposed fixes
 int16_t CursorManager::threadGroupIdFromCursorId(CursorId id) {
-    // Extract bits 32-39 from the cursor id.
-    dassert(!isGloballyManagedCursor(id));
+    // Extract bits 24-31 from the cursor id.
+    invariant(!isGloballyManagedCursor(id));
     uint32_t part = static_cast<uint32_t>(id & 0xFFFFFFFFu);
     return static_cast<int16_t>((part >> 24) & 0xFFu);
 }

@xiexiaoy xiexiaoy merged commit 84972d1 into eloqdata:main Jan 7, 2026
2 checks passed
@xiexiaoy xiexiaoy deleted the getMoreCmd branch January 7, 2026 08:08