GODRIVER-3810 Update WithTransaction to raise timeout error.#2344
GODRIVER-3810 Update WithTransaction to raise timeout error.#2344qingyang-hu wants to merge 2 commits intomongodb:masterfrom
Conversation
API Change Report./v2/mongocompatible changesTimeoutError: added |
🧪 Performance ResultsCommit SHA: 48ba30aThe following benchmark tests for version 69bd85a91f143b00077118bc had statistically significant changes (i.e., |z-score| > 1.96):
For a comprehensive view of all microbenchmark results for this PR's commit, please check out the Evergreen perf task for this patch. |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new TimeoutError type to better surface WithTransaction() timeouts (with proper timeout labeling), and updates the unified test runner plumbing so spec tests can assert on the WithTransaction() error value rather than treating it as a harness failure.
Changes:
- Add
mongo.TimeoutError(wrapping an underlying cause and reportingExceededTimeLimitErrorviaHasErrorLabel). - Update
Session.WithTransaction()to returnTimeoutError{Wrapped: err}for timeout-driven retry/backoff exits. - Update unified spec runner operation execution to propagate operation results (including
Err) and unskip a previously skipped CSOT convenient-transactions spec test.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
mongo/session.go |
Wraps certain overall-timeout exit paths in TimeoutError when retry/backoff would exceed the transaction timeout. |
mongo/errors.go |
Adds the exported TimeoutError type with Unwrap() and HasErrorLabel() support. |
internal/spectest/skip.go |
Removes a skip entry for the convenient-transactions timeout surfacing test. |
internal/integration/unified/unified_spec_runner.go |
Updates caller to handle operation.execute() returning a result + error. |
internal/integration/unified/testrunner_operation.go |
Updates thread/loop execution to match new operation.execute() signature. |
internal/integration/unified/session_operation_execution.go |
Changes withTransaction operation to return the WithTransaction() error in operationResult.Err for verification. |
internal/integration/unified/operation.go |
Changes operation.execute() signature to return *operationResult alongside an execution error. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| // Error implements the error interface. | ||
| func (e TimeoutError) Error() string { |
There was a problem hiding this comment.
TimeoutError.Error() calls e.Wrapped.Error() directly, which will panic if Wrapped is nil. Even if Wrapped is expected to be non-nil today, this is an exported error type and should be defensive (e.g., handle nil Wrapped by returning a generic timeout message).
| func (e TimeoutError) Error() string { | |
| func (e TimeoutError) Error() string { | |
| if e.Wrapped == nil { | |
| return "timeout error" | |
| } |
| // TimeoutError represents an error that occurred due to a timeout. | ||
| type TimeoutError struct { | ||
| Wrapped error | ||
| } | ||
|
|
||
| // Error implements the error interface. | ||
| func (e TimeoutError) Error() string { | ||
| return e.Wrapped.Error() | ||
| } | ||
|
|
||
| // Unwrap returns the underlying error. | ||
| func (e TimeoutError) Unwrap() error { | ||
| return e.Wrapped | ||
| } | ||
|
|
||
| // HasErrorLabel returns true if the error contains the specified label. | ||
| func (e TimeoutError) HasErrorLabel(label string) bool { | ||
| if label == "ExceededTimeLimitError" { | ||
| return true | ||
| } else if le := LabeledError(nil); errors.As(e.Wrapped, &le) { | ||
| return le.HasErrorLabel(label) | ||
| } | ||
| return false | ||
| } |
There was a problem hiding this comment.
Consider adding a unit test for TimeoutError to ensure it integrates with existing timeout detection and labeling (e.g., IsTimeout should return true via the ExceededTimeLimitError label, and HasErrorLabel should delegate to the wrapped error for other labels). There are already table-driven tests for IsTimeout in mongo/errors_test.go that can be extended.
| _, err = threadOp.execute(ctx, loopDone) | ||
| return err |
There was a problem hiding this comment.
runOnThread: the added closure assigns to the outer-scoped err variable (_, err = threadOp.execute(...)). Because the task runs asynchronously, this captures and mutates a variable from the parent stack frame, which is a data race and can also clobber the error used earlier in the case block. Use a new local variable inside the closure (e.g., _, execErr := ...; return execErr) to avoid capturing/mutating outer state.
| _, err = threadOp.execute(ctx, loopDone) | |
| return err | |
| _, execErr := threadOp.execute(ctx, loopDone) | |
| return execErr |
| var cbErr error | ||
| for idx, oper := range operations { | ||
| if err := oper.execute(ctx, loopDone); err != nil { | ||
| return nil, fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, err) | ||
| res, execErr := oper.execute(ctx, loopDone) | ||
| if execErr != nil { | ||
| err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr) | ||
| return nil, nil | ||
| } | ||
| if cbErr == nil && res != nil { | ||
| cbErr = res.Err | ||
| } | ||
| } |
There was a problem hiding this comment.
executeWithTransaction callback: once an operation inside the callback produces a non-nil operation error (res.Err), the callback should return that error immediately so subsequent operations are not executed. The current logic records the first res.Err in cbErr but continues executing later operations, which can change transactional behavior compared to a real application callback and can cause extra side effects or different errors.
| err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr) | ||
| return nil, nil |
There was a problem hiding this comment.
executeWithTransaction callback: on execErr, the callback currently returns (nil, nil), which causes WithTransaction to commit even though the callback did not successfully execute all operations, and relies on mutating the outer err to fail later. It would be safer to return the wrapped execution error from the callback so WithTransaction aborts/cleans up consistently and avoids committing partial work.
| err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr) | |
| return nil, nil | |
| return nil, fmt.Errorf("error executing operation %q at index %d: %w", oper.Name, idx, execErr) |
| select { | ||
| case <-timeout.C: | ||
| return nil, err | ||
| return nil, TimeoutError{Wrapped: err} | ||
| default: | ||
| } |
There was a problem hiding this comment.
WithTransaction now wraps some timeout paths with TimeoutError, but other uses of the same overall timeout timer still return the raw err (e.g., the commit retry loop). This can lead to inconsistent timeout signaling/labeling for callers. If the goal is to surface a timeout error for WithTransaction timeouts, consider wrapping all returns caused by the overall timeout timer consistently.
tadjik1
left a comment
There was a problem hiding this comment.
hi @qingyang-hu , thanks for your work, it looks good!
I left few comments in the code on places I think should be improved, please let me know what you think about it.
| if e.Wrapped == nil { | ||
| return "operation timed out" | ||
| } | ||
| return e.Wrapped.Error() |
There was a problem hiding this comment.
I would suggest add some string prefix so this error is distinguishable (similar to what we do for other errors
| return e.Wrapped.Error() | |
| return "operation timed out: " + e.Wrapped.Error() |
| return nil, fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, err) | ||
| res, execErr := oper.execute(ctx, loopDone) | ||
| if execErr != nil { | ||
| err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr) |
There was a problem hiding this comment.
Just to clarify: we capture err here because we want to get "3rd type of result" from this function, errors that are not related to WithTransaction logic, right? So we do want to get error, but we don't want driver to handle this as if it's a transaction error.
| case <-timeout.C: | ||
| sleep.Stop() | ||
| return nil, err | ||
| return nil, TimeoutError{Wrapped: err} |
There was a problem hiding this comment.
The spec says we have to distinguish between CSOT and non-CSOT errors:
Note 1: When the TIMEOUT_MS (calculated in step [1.3] is reached we MUST report a timeout error wrapping the last error that was encountered which triggered the retry behavior. If timeoutMS is set, then timeout error is a special type which is defined in CSOT , If timeoutMS is not set, then propagate it as timeout error if the language allows to expose the underlying error as a cause of a timeout error. If timeout error is thrown then it SHOULD expose error label(s) from the transient error.
https://github.com/mongodb/specifications/blob/master/source/transactions-convenient-api/transactions-convenient-api.md#sequence-of-actions
In the current approach we always return TimeoutErrors which are indistinguishable.
GODRIVER-3810
Summary
Create a
TimeoutErrortype to be used for backoff timeouts inWithTransaction().Background & Motivation