[v1.x] Fix registerToolTask's getTask and getTaskResult handlers not being invoked#1335
LucaButBoring wants to merge 9 commits into modelcontextprotocol:v1.x
Conversation
🦋 Changeset detected — latest commit: fcf3d49. The changes in this PR will be included in the next version bump.
@claude review
private _registeredTools: { [name: string]: RegisteredTool } = {};
private _registeredPrompts: { [name: string]: RegisteredPrompt } = {};
private _experimental?: { tasks: ExperimentalMcpServerTasks };
private _taskToolMap: Map<string, string> = new Map();
🟡 _taskToolMap entries (taskId → toolName) are added at line 248 but never removed — there is no .delete() or .clear() call anywhere. For long-running servers processing many tasks, this map grows unboundedly even after tasks reach terminal states or expire via TTL. Consider adding cleanup when a task completes/fails/is cancelled, or lazily when _getTaskHandler finds a task no longer exists in the store.
Extended reasoning...
What the bug is
The _taskToolMap field (declared at line 85 as Map<string, string>) stores a mapping from taskId to the tool name that created it. Entries are added at line 248 via this._taskToolMap.set(taskResult.task.taskId, request.params.name) whenever a task-augmented tool call returns a CreateTaskResult. However, there is no corresponding .delete() call anywhere in the codebase — entries persist for the lifetime of the McpServer instance.
How it manifests
Every task creation adds a string → string entry to the map. When a task completes, fails, is cancelled, or expires via its TTL and gets cleaned up from the TaskStore, the corresponding _taskToolMap entry remains. Over time, for a server that processes many tasks, this map grows monotonically.
Step-by-step proof
1. A client calls `tools/call` with `task: { ttl: 60000 }` for a registered tool task.
2. The `CallToolRequestSchema` handler at line 243-249 executes `this._taskToolMap.set(taskResult.task.taskId, request.params.name)`.
3. The task completes — `TaskStore.storeTaskResult()` is called, the task enters a terminal state.
4. The task's TTL expires and `InMemoryTaskStore` cleans it up internally.
5. The `_taskToolMap` still holds the `taskId → toolName` entry. There is no code path that removes it.
6. Repeat steps 1-5 thousands of times — the map now holds thousands of stale entries.
Why existing code doesn't prevent it
Searching for all references to _taskToolMap reveals exactly three: the declaration (line 85), a .get() call (line 111), and the .set() call (line 248). No .delete(), .clear(), or any other cleanup mechanism exists.
Impact
Each entry is two short strings (taskId + toolName), so individual entries are small. For typical short-lived MCP server instances or low task throughput, this is unlikely to cause issues. However, for long-running servers processing a high volume of tasks (e.g., a persistent production server), memory usage will grow linearly and unboundedly over time.
Suggested fix
The simplest approach would be to add a lazy cleanup in _getTaskHandler: if the taskId is found in _taskToolMap but the task no longer exists in the TaskStore, delete the entry. Alternatively, cleanup could be added when a task reaches a terminal state (in the taskHandlerHooks or after storeTaskResult calls).
The cleanup added in the getTaskResult hook (using finally { this._taskToolMap.delete(taskId) }) addresses the happy path, but leaves entries in _taskToolMap for tasks that are cancelled or fail and where the client never calls tasks/result.
The getTaskResult hook is only invoked from the GetTaskPayloadRequestSchema handler (the tasks/result request). If a client cancels a task via tasks/cancel and moves on without calling tasks/result, the taskId → toolName entry remains in the map permanently. Same for tasks that expire via TTL without tasks/result being called.
The original suggestion of lazy cleanup in _getTaskHandler would cover these cases — when _getTaskHandler is called for a taskId that no longer exists in the store, it could remove the stale entry. Alternatively, cleaning up in the tasks/cancel handler path would handle the cancellation case specifically.
The finally-based deletion issue from the previous follow-up is now fixed -- the delete is on the success path only, so a retry after a transient error in getTaskResult will correctly find the handler again.
However, the cancel-path leak is still present. The code comment says "Cleanup on tasks/cancel would require a protocol-level hook and is intentionally left out here", but the original comment suggested lazy cleanup in _getTaskHandler, which requires no protocol hook:
private _getTaskHandler(taskId: string): ToolTaskHandler<ZodRawShapeCompat | undefined> | null {
const toolName = this._taskToolMap.get(taskId);
if (\!toolName) return null;
const tool = this._registeredTools[toolName];
    if (!tool || !("createTask" in (tool.handler as AnyToolHandler<ZodRawShapeCompat>))) {
this._taskToolMap.delete(taskId); // lazy-evict stale entry
return null;
}
return tool.handler as ToolTaskHandler<ZodRawShapeCompat | undefined>;
}
This covers the typical cancel-then-poll-to-confirm pattern (the subsequent tasks/get call would trigger lazy eviction). Truly abandoned entries (cancelled with no further polling) would still linger, but those are the smaller concern.
Previously, the code only called the underlying task store, and the tests were not complex enough to validate that the handlers were being called, so they missed this.
They weren't being populated correctly, and can't be without changing the TaskStore interface to require restoring the original request when retrieving a Task.
This removes the setTimeout logic we had in tests, which was masking an issue where the getTask handlers weren't being called. The appropriate logic has been moved into the getTask handlers themselves.
- Update collect-user-info-task example to use single-arg getTask/getTaskResult signature matching the updated TaskRequestHandler type - Clean up _taskToolMap entries after getTaskResult to prevent unbounded growth
Force-pushed from 30f2a51 to a4e24fb.
Rebased onto v1.x and pushed a couple of fixes from the review:
Build and tests pass locally now. @LucaButBoring — when you get a chance, mind taking a look to make sure this still matches your intent? Since #1332 was superseded by #1764 on main with a different approach, want to confirm we're happy keeping this simpler fix on v1.x rather than backporting the larger refactor.
@claude review
src/server/mcp.ts (Outdated)
private _taskToolMap: Map<string, string> = new Map();
  constructor(serverInfo: Implementation, options?: ServerOptions) {
-   this.server = new Server(serverInfo, options);
    const taskHandlerHooks = {
        getTask: async (taskId: string, extra: RequestHandlerExtra<ServerRequest, ServerNotification>) => {
            // taskStore is guaranteed to exist here because Protocol only calls hooks when taskStore is configured
            const taskStore = extra.taskStore!;
            const handler = this._getTaskHandler(taskId);
            if (handler) {
                return await handler.getTask({ ...extra, taskId, taskStore });
            }
            return await taskStore.getTask(taskId);
        },
        getTaskResult: async (taskId: string, extra: RequestHandlerExtra<ServerRequest, ServerNotification>) => {
            const taskStore = extra.taskStore!;
            const handler = this._getTaskHandler(taskId);
            try {
                if (handler) {
                    return await handler.getTaskResult({ ...extra, taskId, taskStore });
                }
                return await taskStore.getTaskResult(taskId);
            } finally {
                // Once the result has been retrieved the task is complete;
                // drop the taskId → toolName mapping to avoid unbounded growth.
                this._taskToolMap.delete(taskId);
            }
        }
    };
+   this.server = new Server(serverInfo, { ...options, taskHandlerHooks });
🔴 The _taskToolMap lifecycle introduced by this PR has three related gaps: (1) the finally block in getTaskResult unconditionally deletes the entry even on transient errors, so a client retry bypasses the custom handler; (2) the cancel path (tasks/cancel) never notifies McpServer to clean up _taskToolMap, causing entries to leak for the lifetime of the server; (3) handleAutomaticTaskPolling creates a task but never populates _taskToolMap, so if a client learns the task ID via TaskStatusNotification and calls tasks/get directly, the custom handler is bypassed. Fix (1) by only deleting on success; fix (2) by adding a cancelTask hook or exposing a cleanup method; fix (3) by calling this._taskToolMap.set(taskId, toolName) after wrappedHandler.createTask.
Extended reasoning...
Gap 1 — Premature deletion on error (bug_003)
In the getTaskResult hook (lines 98–111 of src/server/mcp.ts), the finally block unconditionally calls this._taskToolMap.delete(taskId) regardless of whether the handler succeeded or threw. The developer comment reads "Once the result has been retrieved the task is complete", which implies the intent was to clean up only on success.
Step-by-step proof:
1. Client calls `tasks/result` for a completed task.
2. `_getTaskHandler(taskId)` finds the registered custom handler in `_taskToolMap`.
3. `handler.getTaskResult({ ...extra, taskId, taskStore })` throws a transient error (e.g., task store temporarily unavailable).
4. The `finally` block runs and deletes the `_taskToolMap` entry.
5. The error propagates to the client.
6. Client retries `tasks/result`.
7. `_getTaskHandler(taskId)` returns `null` — the entry was already deleted.
8. Code falls back to `taskStore.getTaskResult(taskId)` directly, silently bypassing the custom handler.
The fix is straightforward: use a success flag (or move the delete call to after the return) so cleanup only happens when the result is successfully retrieved.
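The success-only cleanup can be sketched standalone; `taskToolMap`, `fetchResult`, and `demo` below are illustrative names, not the SDK's:

```typescript
// Sketch of success-path-only cleanup: the map entry is removed only
// after the result is retrieved, so a thrown error leaves it intact
// and a client retry still finds the custom handler.
const taskToolMap = new Map<string, string>();

async function getTaskResult(taskId: string, fetchResult: (id: string) => Promise<string>): Promise<string> {
    const result = await fetchResult(taskId); // a throw here skips the delete below
    taskToolMap.delete(taskId); // reached only on success
    return result;
}

// Demonstration: a transient failure preserves the mapping for a retry.
async function demo(): Promise<boolean> {
    taskToolMap.set('t1', 'my-tool');
    let calls = 0;
    const flaky = async (_id: string) => {
        calls += 1;
        if (calls === 1) throw new Error('transient');
        return 'ok';
    };
    let survivedFailure = false;
    try {
        await getTaskResult('t1', flaky);
    } catch {
        survivedFailure = taskToolMap.has('t1'); // entry must still be present
    }
    await getTaskResult('t1', flaky); // retry succeeds and evicts the entry
    return survivedFailure && !taskToolMap.has('t1');
}
```

The same shape works inside the hook itself: assign the awaited result first, delete the entry, then return.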
Gap 2 — Cancel path never cleans up (bug_001)
The _taskToolMap is only cleaned up inside the getTaskResult hook. When a client calls tasks/cancel, the CancelTaskRequestSchema handler in protocol.ts (lines ~533–574) calls this._clearTaskQueue(request.params.taskId) on the Protocol level, but McpServer._taskToolMap is a private field with no corresponding hook or callback. Since cancelled tasks are terminal and clients have no reason to call tasks/result after cancellation, the _taskToolMap entry for that task persists for the lifetime of the McpServer instance.
Step-by-step proof:
1. Tool registered with `registerToolTask`, `taskSupport: "required"`.
2. Client calls `tools/call` with task augmentation — `_taskToolMap.set(taskId, toolName)` is called at line ~252.
3. Server creates the task and returns `CreateTaskResult`.
4. Client calls `tasks/cancel` — Protocol handler runs, calls `_clearTaskQueue`, but `McpServer._taskToolMap` is not touched.
5. Client does NOT call `tasks/result` (task is terminal).
6. The entry `taskId → toolName` persists indefinitely.
Individual entries are small (two short strings), but on long-running servers handling many task cancellations, the map grows without bound. The fix requires either adding a cancelTask hook to taskHandlerHooks, or exposing a cleanup method that McpServer can wire up to the cancel path.
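A `cancelTask` hook of the kind suggested here does not exist yet; the sketch below, with an illustrative `TaskRegistry`, shows the intended lifecycle, assuming the protocol's cancel handler would invoke it:

```typescript
// Sketch of a hypothetical cancelTask hook: the protocol's cancel handler
// would call onCancel so the server-side registry can evict its
// taskId → toolName mapping when a task is cancelled, instead of leaking it.
class TaskRegistry {
    private taskToolMap = new Map<string, string>();

    register(taskId: string, toolName: string): void {
        this.taskToolMap.set(taskId, toolName);
    }

    // Would be wired into taskHandlerHooks as cancelTask in this sketch.
    onCancel(taskId: string): void {
        this.taskToolMap.delete(taskId);
    }

    size(): number {
        return this.taskToolMap.size;
    }
}

const registry = new TaskRegistry();
registry.register('task-1', 'long-tool'); // tools/call path populates the map
registry.onCancel('task-1');              // tasks/cancel path evicts the entry
const remaining = registry.size();        // no leak after cancellation
```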
Gap 3 — Auto-polling path never populates the map (bug_004)
In handleAutomaticTaskPolling (lines ~401–437), a task is created via wrappedHandler.createTask(taskExtra) but this._taskToolMap.set(taskId, toolName) is never called. Compare with the isTaskRequest path (lines ~249–258) which does populate the map.
Step-by-step proof:
1. Tool registered with `taskSupport: "optional"`.
2. Client calls `tools/call` without task augmentation — `handleAutomaticTaskPolling` is invoked.
3. `wrappedHandler.createTask(taskExtra)` runs; the task is stored in `taskStore`.
4. `requestTaskStore.storeTaskResult` is called internally, which sends a `TaskStatusNotification` (see `protocol.ts` lines ~1638–1641) containing the `taskId`.
5. A client with an active SSE stream receives this notification and learns the `taskId`.
6. Client calls `tasks/get` for that `taskId`.
7. Protocol dispatches to the `getTask` hook; `_getTaskHandler(taskId)` returns `null` (no entry in `_taskToolMap`).
8. Hook falls back to `taskStore.getTask(taskId)` directly, bypassing the registered custom handler.
Addressing the refutation: The refuter argues this scenario is implausible because clients using the optional-without-augmentation path are just waiting for a synchronous tools/call response and would not make parallel tasks/get calls. This is a fair observation for the common case. However, the server does emit TaskStatusNotification messages containing the taskId, and a well-behaved client with an open SSE stream can observe them. More importantly, the inconsistency is a design-level API contract violation: registerToolTask is supposed to wire up custom handlers for all task lifecycle requests, but for the optional-without-augmentation path, external calls to tasks/get silently bypass those handlers. The fix is simply to add this._taskToolMap.set(taskId, request.params.name) after wrappedHandler.createTask completes, matching the isTaskRequest path.
Gap 1 (premature deletion on error) is fixed in the latest commit (d044ace): the _taskToolMap.delete(taskId) at line 108 now executes only after result is successfully assigned, so a transient error causes the exception to propagate before the delete runs, preserving the entry for client retries.
However, Gaps 2 and 3 remain unaddressed:
Gap 2 (cancel path leaks entries): The code comment at lines 106–107 explicitly acknowledges this — "Cleanup on tasks/cancel would require a protocol-level hook and is intentionally left out here" — but leaves it unresolved. Any task that is cancelled without the client calling tasks/result afterwards will keep its _taskToolMap entry indefinitely.
Gap 3 (auto-polling path never populates the map): handleAutomaticTaskPolling (lines 419–438) calls wrappedHandler.createTask(taskExtra) and retrieves a taskId, but never calls this._taskToolMap.set(taskId, request.params.name). The internal polling loop calls wrappedHandler.getTask/wrappedHandler.getTaskResult directly and works correctly. However, if a client observes the taskId via a TaskStatusNotification on an open SSE stream and then independently issues a tasks/get request, _getTaskHandler(taskId) returns null (no map entry) and falls back to taskStore.getTask(taskId), silently bypassing the registered custom handler. The fix is a single line: add this._taskToolMap.set(taskId, request.params.name) after wrappedHandler.createTask(taskExtra) completes, mirroring what the isTaskRequest path does at line 252.
@claude review
        // taskStore is guaranteed to exist here because Protocol only calls hooks when taskStore is configured
        const taskStore = extra.taskStore!;
        const handler = this._getTaskHandler(taskId);
        if (handler) {
            return await handler.getTask({ ...extra, taskId, taskStore });
        }
        return await taskStore.getTask(taskId);
    },
    getTaskResult: async (taskId: string, extra: RequestHandlerExtra<ServerRequest, ServerNotification>) => {
        const taskStore = extra.taskStore!;
        const handler = this._getTaskHandler(taskId);
        const result = handler
            ? await handler.getTaskResult({ ...extra, taskId, taskStore })
            : await taskStore.getTaskResult(taskId);
        // Once the result has been retrieved the task is complete;
        // drop the taskId → toolName mapping to avoid unbounded growth.
        // Cleanup on tasks/cancel would require a protocol-level hook and is
        // intentionally left out here.
        this._taskToolMap.delete(taskId);
        return result;
    }
};
this.server = new Server(serverInfo, { ...options, taskHandlerHooks });
}

private _getTaskHandler(taskId: string): ToolTaskHandler<ZodRawShapeCompat | undefined> | null {
    const toolName = this._taskToolMap.get(taskId);
🔴 The McpServer constructor unconditionally overrides any taskHandlerHooks passed in options by spreading its own internal hooks last. Since this PR adds taskHandlerHooks to ProtocolOptions as a public documented field, developers who pass custom getTask/getTaskResult hooks to McpServer will have them silently discarded with no error or warning. Fix by merging user-provided hooks with the internal ones, removing taskHandlerHooks from the McpServer-facing type, or documenting that McpServer exclusively owns these hooks.
Extended reasoning...
What the bug is and how it manifests
This PR adds taskHandlerHooks to ProtocolOptions as a public, documented optional field (src/shared/protocol.ts:105-116). Since ServerOptions is ProtocolOptions & { capabilities? }, taskHandlerHooks is now a visible, typed parameter of McpServer's constructor. However, the constructor always overrides whatever the caller passes via: this.server = new Server(serverInfo, { ...options, taskHandlerHooks }). Because the internal taskHandlerHooks object is placed after ...options, any user-supplied hooks in options are silently overwritten.
The specific code path that triggers it
1. A developer reads the `ProtocolOptions` JSDoc and sees `taskHandlerHooks` as an available option.
2. They pass `{ taskHandlerHooks: { getTask: myCustomGetTask } }` to `new McpServer`.
3. The constructor creates its own internal `const taskHandlerHooks = { getTask: internalHook, getTaskResult: internalHook }`.
4. It calls `new Server(serverInfo, { ...options, taskHandlerHooks })` - the spread places `options.taskHandlerHooks` first, then the internal object overwrites it.
5. When a client sends `tasks/get`, Protocol invokes the internal hook, which calls `_getTaskHandler(taskId)`. For tasks not in `_taskToolMap` it falls back to `taskStore.getTask`. The user-supplied `myCustomGetTask` is never called.
Why existing safeguards do not catch it
TypeScript does not warn when a spread property is overridden by a later named property - { ...options, taskHandlerHooks } is valid TypeScript regardless of whether options.taskHandlerHooks is set. There is no runtime assertion, no deprecation notice, and no documentation stating that McpServer exclusively manages these hooks. The type system actively misleads developers by presenting the field as a supported configuration option.
Impact
Any developer who attempts to customize task-retrieval behavior through taskHandlerHooks when using McpServer gets silently incorrect behavior. This is hard to debug because: (1) no error is thrown, (2) the fallback to taskStore.getTask makes the code appear to work for the common case masking that all custom logic is bypassed, and (3) TypeScript types give no indication the option is ignored.
How to fix it
Three viable approaches: (a) Merge hooks - check for user-provided hooks inside the internal hook functions and call them as well; (b) Restrict the type - use Omit<ServerOptions, 'taskHandlerHooks'> for the McpServer constructor parameter so the field is not presented as valid; (c) Document the limitation - add a JSDoc warning that McpServer owns taskHandlerHooks exclusively and users needing custom hooks must access mcpServer.server directly.
Step-by-step proof
1. Developer writes: `const server = new McpServer(info, { taskStore, taskHandlerHooks: { getTask: myGetTask } })`.
2. Constructor runs: `const taskHandlerHooks = { getTask: internalHook, getTaskResult: internalHook }` - a new object ignoring the user-provided one.
3. `new Server(serverInfo, { taskStore, taskHandlerHooks: internalHook_version })` - the spread overwrote the user's version.
4. Client sends `{ method: 'tasks/get', params: { taskId: 'xyz' } }`.
5. Protocol dispatches to the GetTaskRequestSchema handler and reads `_options?.taskHandlerHooks?.getTask` - this is the internal hook.
6. Internal hook calls `_getTaskHandler('xyz')` which returns `null` (no `_taskToolMap` entry) and falls back to `taskStore.getTask('xyz')`.
7. `myGetTask` is never called, no error is thrown, and the developer has no way to know.
The attempted fix changes { ...options, taskHandlerHooks } to taskHandlerHooks: { ...options?.taskHandlerHooks, ...taskHandlerHooks }, but the core issue remains: internal hooks are still spread last, so they still win over any user-provided hooks.
For a user passing { taskHandlerHooks: { getTask: myGetTask } }:
1. `...options?.taskHandlerHooks` spreads `{ getTask: myGetTask }`
2. `...taskHandlerHooks` spreads `{ getTask: internalHook, getTaskResult: internalHook }`
3. Result: `{ getTask: internalHook, getTaskResult: internalHook }` — `myGetTask` is still silently discarded
The only difference from the original bug is that the override now happens at the individual property level rather than at the object level, but the behavior is identical: user-supplied hooks are still never called.
To actually fix this, pick one of the three original approaches:
- (a) Call user hooks inside the internal hook functions: e.g., check for `options?.taskHandlerHooks?.getTask` inside the internal `getTask` closure and call it (either first or as a fallback)
- (b) Restrict the type: change the `McpServer` constructor to accept `Omit<ServerOptions, "taskHandlerHooks">` so the field cannot be passed at all
- (c) Document: add a JSDoc note on the `McpServer` constructor stating that `taskHandlerHooks` in options is silently ignored and that `McpServer` exclusively owns those hooks
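Option (a) can be sketched as follows; `TaskHooks`, `buildHooks`, and `internalLookup` are illustrative names, and whether the internal or user hook wins for a known task ID is a design choice left to the maintainers:

```typescript
// Sketch of hook merging: the internal hook tries its own lookup first and
// falls back to the user-supplied hook instead of silently discarding it.
type GetTask = (taskId: string) => Promise<string | null>;
interface TaskHooks { getTask?: GetTask }

function buildHooks(userHooks: TaskHooks | undefined, internalLookup: GetTask): Required<TaskHooks> {
    return {
        getTask: async (taskId: string) => {
            const internal = await internalLookup(taskId);
            if (internal !== null) return internal; // internal mapping wins here
            // No internal entry: delegate to the user's hook if one was provided.
            return userHooks?.getTask ? userHooks.getTask(taskId) : null;
        }
    };
}

// Demonstration: the user's hook is reached for task IDs the internal map
// does not know about, instead of being overwritten by the spread.
const internalMap = new Map<string, string>([['known', 'internal-result']]);
const hooks = buildHooks(
    { getTask: async () => 'user-result' },
    async id => internalMap.get(id) ?? null
);
```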
@claude review
test('should call custom getTask and getTaskResult handlers when client polls task directly', async () => {
    vi.useFakeTimers();
    const taskStore = new InMemoryTaskStore();

    const getTaskSpy = vi.fn();
    const getTaskResultSpy = vi.fn();
    let taskCreatedAt: number;

    const mcpServer = new McpServer(
        {
            name: 'test server',
            version: '1.0'
        },
        {
            capabilities: {
                tools: {},
                tasks: {
                    requests: {
                        tools: {
                            call: {}
                        }
                    }
                }
            },
            taskStore
        }
    );
🔴 The new test calls vi.useFakeTimers() at line 6875 but only calls vi.useRealTimers() at line 6980 in the happy path, with no try/finally guard. If any assertion between those lines throws, fake timers remain active for all subsequent tests in the file including the entire Zod v4 iteration of describe.each(zodTestMatrix). Fix by wrapping the test body in try/finally calling vi.useRealTimers(), or by adding afterEach(() => vi.useRealTimers()) to the enclosing describe block.
Extended reasoning...
What the bug is and how it manifests
The test added at line 6874 begins with vi.useFakeTimers() at line 6875 and ends with vi.useRealTimers() at line 6980. There is no try/finally block. If any assertion throws before line 6980 — e.g. expect(getResult.status).toBe('completed') at line 6972 or the spy assertions at lines 6973-6978 — the thrown error unwinds the call stack and vi.useRealTimers() is never reached. Fake timers remain silently active for the rest of the suite.
The specific code path that triggers it
The test calls vi.useFakeTimers(), sets up an MCP server/client pair over InMemoryTransport, registers a task tool whose getTask handler checks Date.now() - taskCreatedAt >= 50 to decide whether to complete the task, advances fake time with vi.advanceTimersByTime(60), then issues tasks/get and tasks/result requests. If the custom handler logic, schema parsing, or any spy assertion fails, the test throws before vi.useRealTimers().
Why existing code does not prevent it
There is no afterEach(() => vi.useRealTimers()) anywhere in the enclosing describe.each(zodTestMatrix) block or its parents. The only afterEach in the entire file is at line 293, scoped to describe('tool()'), and only calls vi.restoreAllMocks() which does not restore fake timers. The vitest config has no global fakeTimers restoration. All four verifiers independently confirmed zero matches for afterEach.*useRealTimers in the file.
What the impact would be
Since the outer suite is describe.each(zodTestMatrix) which runs the block twice (Zod v3 then Zod v4), a test failure in the v3 iteration leaves fake timers active for every test in the v4 iteration. Any test exercising setTimeout — including InMemoryTaskStore TTL cleanup, Protocol request timeout machinery, and the pollInterval-based loop in handleAutomaticTaskPolling — will silently hang, stall, or produce wrong timing results, causing hard-to-diagnose cascading failures.
How to fix it
Wrap the test body in try/finally so cleanup always runs even on assertion failure:
test('should call custom getTask...', async () => {
vi.useFakeTimers();
try {
// ... test body ...
} finally {
vi.useRealTimers();
taskStore.cleanup();
}
});
Alternatively, add afterEach(() => vi.useRealTimers()) to the enclosing describe block.
Step-by-step proof
- Zod v3 iteration starts; test reaches vi.useFakeTimers() at line 6875.
- vi.advanceTimersByTime(60) is called; Date.now() returns T+60ms.
- client.request tasks/get is awaited. Suppose the getTask handler returns status 'working' due to a timing edge case.
- expect(getResult.status).toBe('completed') at line 6972 throws AssertionError.
- The test unwinds; vi.useRealTimers() at line 6980 is never called.
- Vitest marks the test failed and continues with fake timers still active.
- Zod v4 iteration begins with fake timers in effect; setTimeout-dependent tests hang or produce wrong results.
test('should automatically poll and return CallToolResult for tool with taskSupport "optional" called without task augmentation', async () => {
    const taskStore = new InMemoryTaskStore();
    const { releaseLatch, waitForLatch } = createLatch();

    // Spies to verify handler invocations
    const createTaskSpy = vi.fn();
    const getTaskSpy = vi.fn();
    const getTaskResultSpy = vi.fn();

    const mcpServer = new McpServer(
        {
🟡 The test should automatically poll and return CallToolResult for tool with taskSupport "optional" called without task augmentation incurs a real 100ms wall-clock delay because handleAutomaticTaskPolling calls await new Promise(resolve => setTimeout(resolve, pollInterval)) with pollInterval: 100 and no fake timers are active. CLAUDE.md line 25 requires vi.useFakeTimers() instead of real setTimeout delays in tests. Fix by either activating fake timers with vi.advanceTimersByTimeAsync(100) or using pollInterval: 0 in createTask for this test.
Extended reasoning...
What the bug is and how it manifests
The test at test/server/mcp.test.ts (around line 6408) does not call vi.useFakeTimers(). The tool is registered with createTask calling extra.taskStore.createTask({ ttl: 60000, pollInterval: 100 }). When the client calls callTool without task augmentation, McpServer.handleAutomaticTaskPolling is invoked server-side, and the polling loop executes a real 100ms sleep.
The specific code path that triggers it
Inside handleAutomaticTaskPolling (src/server/mcp.ts), the polling loop runs:
while (task.status !== 'completed' ...) {
await new Promise(resolve => setTimeout(resolve, pollInterval)); // real 100ms wait
const getTaskResult = await wrappedHandler.getTask(taskExtraComplete);
...
}

With `pollInterval = 100` from `createTask({ pollInterval: 100 })`, this `setTimeout` fires after a real 100ms wall-clock delay since no fake timers are active.
Why existing code does not prevent it
The PR correctly refactored task completion to happen inside the getTask handler instead of an external setTimeout. This eliminates the old race condition, but the production polling loop's own sleep was never addressed. Because getTask completes the task on its first call, the polling loop runs exactly once — but that one iteration still sleeps 100ms for real.
What the impact would be
Each test run adds 100ms of real wall-clock time. This violates CLAUDE.md line 25: 'Use vi.useFakeTimers() instead of real setTimeout/await delays in tests'. On a loaded CI machine, accumulated real-timer tests degrade suite performance. There is no correctness issue — the test passes — but it is a test quality violation.
How to fix it
The simplest fix is to pass pollInterval: 0 in the createTask call for this test, eliminating the sleep entirely. Alternatively, call vi.useFakeTimers() before the callTool invocation and advance time with vi.advanceTimersByTimeAsync(100) concurrently to let the polling loop proceed without a real wait.
Step-by-step proof
1. `client.callTool('task-tool', { value: 21 })` dispatches tools/call with no task param.
2. McpServer routes to `handleAutomaticTaskPolling` because `taskSupport === 'optional'` and no task augmentation.
3. `wrappedHandler.createTask(taskExtra)` runs; `createTask` returns a task with `pollInterval: 100`.
4. The polling loop starts: `await new Promise(resolve => setTimeout(resolve, 100))` — real 100ms wall-clock delay begins.
5. After 100ms elapses, `wrappedHandler.getTask(taskExtraComplete)` is called; the handler sees `task.status === 'working'`, stores the result, returns a completed task.
6. The loop exits; `wrappedHandler.getTaskResult` returns the result.
7. Total real wall-clock time added to the test suite: 100ms per invocation.
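Of the two fixes, the `pollInterval: 0` route is the simplest to show standalone; `pollUntilDone` and `Task` below are illustrative stand-ins for the SDK's polling loop:

```typescript
// Sketch of the pollInterval: 0 option: with a zero interval the polling
// loop's setTimeout resolves on the next macrotask, so a test pays
// essentially no real wall-clock cost.
interface Task { status: 'working' | 'completed' }

async function pollUntilDone(pollOnce: () => Promise<Task>, pollInterval: number): Promise<Task> {
    let task = await pollOnce();
    while (task.status !== 'completed') {
        await new Promise(resolve => setTimeout(resolve, pollInterval));
        task = await pollOnce();
    }
    return task;
}

// Demonstration: completion after three polls takes far less than the
// 100ms a real pollInterval: 100 would add.
async function demoElapsedMs(): Promise<number> {
    let calls = 0;
    const pollOnce = async (): Promise<Task> =>
        ++calls >= 3 ? { status: 'completed' } : { status: 'working' };
    const start = Date.now();
    await pollUntilDone(pollOnce, 0);
    return Date.now() - start;
}
```

The fake-timers alternative keeps `pollInterval: 100` but advances virtual time with `vi.advanceTimersByTimeAsync(100)` while the call is in flight.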
while (task.status !== 'completed' && task.status !== 'failed' && task.status !== 'cancelled') {
    await new Promise(resolve => setTimeout(resolve, pollInterval));
-   const updatedTask = await extra.taskStore.getTask(taskId);
+   const getTaskResult = await wrappedHandler.getTask(taskExtraComplete);
+   const updatedTask = getTaskResult;
    if (!updatedTask) {
        throw new McpError(ErrorCode.InternalError, `Task ${taskId} not found during polling`);
    }
    task = updatedTask;
}
🟣 This is a pre-existing issue: the polling loop in handleAutomaticTaskPolling (src/server/mcp.ts:427-435) uses await new Promise(resolve => setTimeout(resolve, pollInterval)) without checking extra.signal. If a client disconnects or sends notifications/cancelled mid-poll, the server continues polling until the task naturally reaches a terminal state, wasting resources and holding the handler slot open. This PR directly refactors this function and would be an ideal place to add abort signal support.
Extended reasoning...
What the bug is and how it manifests
The polling loop in handleAutomaticTaskPolling uses a bare setTimeout without any signal check:
while (task.status !== 'completed' && task.status !== 'failed' && task.status !== 'cancelled') {
await new Promise(resolve => setTimeout(resolve, pollInterval));
const getTaskResult = await wrappedHandler.getTask(taskExtraComplete);
...
}

When `extra.signal` fires (client disconnect or `notifications/cancelled`), the server never observes the abort. The full `pollInterval` (default 5 seconds per iteration) elapses before each loop check, and since `task.status` is only updated by `wrappedHandler.getTask`, the loop runs until the underlying task naturally terminates. For long-running tasks (minutes or hours), this can hold a handler slot and consume server resources long after the originating client has gone away.
The specific code path that triggers it
1. Tool registered with `taskSupport: optional`.
2. Client calls `tools/call` without task augmentation → `handleAutomaticTaskPolling` is invoked.
3. `wrappedHandler.createTask(taskExtra)` creates the task and enters the `while` loop.
4. Client disconnects or sends `notifications/cancelled`, which fires `extra.signal.abort`.
5. `await new Promise(resolve => setTimeout(resolve, pollInterval))` does not observe the signal; it runs for the full poll interval.
6. `wrappedHandler.getTask(taskExtraComplete)` is called, loop continues indefinitely.
Why existing code does not prevent it
The extra.signal abort signal is present in RequestHandlerExtra (protocol.ts:255) and is wired up by Protocol._onrequest via an AbortController that is aborted when notifications/cancelled arrives. However, handleAutomaticTaskPolling never reads extra.signal. No if (extra.signal.aborted) guard exists, and the setTimeout is not raced against the signal.
Pre-existing nature and why it matters here
This bug predates the PR — the old code also had await new Promise(resolve => setTimeout(resolve, pollInterval)) without a signal check. The PR changed the inner loop body (extra.taskStore.getTask → wrappedHandler.getTask and added taskExtraComplete), but did not introduce the missing signal check. All verifiers confirmed this is pre-existing.
However, since the PR directly modifies handleAutomaticTaskPolling, it is the natural moment to add the fix. The _waitForTaskUpdate helper in protocol.ts already demonstrates the correct pattern — racing the setTimeout against the abort signal.
How to fix it
Replace the bare setTimeout with a signal-aware wait, matching the _waitForTaskUpdate pattern:
// Option A: check at loop top
if (extra.signal.aborted) throw extra.signal.reason;
await new Promise(resolve => setTimeout(resolve, pollInterval));
// Option B: race against signal (cleaner, immediate cancellation)
await Promise.race([
new Promise(r => setTimeout(r, pollInterval)),
new Promise((_, reject) =>
extra.signal.addEventListener('abort', () => reject(extra.signal.reason), { once: true })
)
]);

Step-by-step proof
1. Register a tool with `taskSupport: 'optional'` and a 60-second task TTL, `pollInterval: 5000`.
2. Client calls `tools/call` without task augmentation.
3. Server enters `handleAutomaticTaskPolling`, `createTask` returns, task is `'working'`.
4. Client disconnects; `extra.signal.abort()` fires immediately.
5. Server is stuck at `await new Promise(resolve => setTimeout(resolve, 5000))` — signal ignored.
6. After 5 seconds, `wrappedHandler.getTask` is called; task is still `'working'`.
7. Loop repeats indefinitely (up to 60 seconds) until the task reaches a terminal state.
8. Server resources are held for up to 60 seconds after the client has gone.
Note: This is the v1 backport of #1332.
This PR fixes a bug where custom `getTask` and `getTaskResult` handlers registered via `registerToolTask` were never invoked. The `Protocol` class's task handlers bypassed them entirely and used `TaskStore` directly. This was a refactoring oversight that was missed due to (1) the existing tests not explicitly checking if those handlers were called, and (2) `setTimeout` being used in `createTask` in many tests inadvertently masking the issue.

This also removes the argument-forwarding to `getTask` and `getTaskResult`, as that was originally built before the current `TaskStore` design was finalized, which broke the assumption that the original request would reliably be stored by the implementor. The current `TaskStore` design allows the `Request` to be saved, but does not require that, and also exposes no way to directly retrieve it in `getTask` or `getTaskResult` (it was possible but no longer intended at the time of the rewrite). `getTask` and `getTaskResult` now only have the `extra` argument.

Motivation and Context
When using `registerToolTask`, developers could provide custom `getTask` and `getTaskResult` handlers. These handlers were never invoked because:

- The `Protocol` class's `tasks/get` and `tasks/result` handlers directly called `TaskStore` instead of forwarding to the custom handlers.
- `McpServer`'s backwards-compat polling wrapper also bypassed the custom handlers.
- Tests used `setTimeout` to complete tasks and did not explicitly assert on the handlers being called, inadvertently masking the issue since tasks completed regardless of whether handlers were invoked.

How Has This Been Tested?
Updated unit tests with stricter/more robust assertions.
Breaking Changes
Yes, due to `args` no longer being passed to `getTask` or `getTaskResult`. We could defer this part of the PR to v2.

Types of changes
Checklist
Additional context