Skip to content

GODRIVER-3810 Update WithTransaction to raise timeout error.#2344

Open
qingyang-hu wants to merge 2 commits intomongodb:masterfrom
qingyang-hu:godriver3810
Open

GODRIVER-3810 Update WithTransaction to raise timeout error.#2344
qingyang-hu wants to merge 2 commits intomongodb:masterfrom
qingyang-hu:godriver3810

Conversation

@qingyang-hu
Copy link
Contributor

GODRIVER-3810

Summary

Create a TimeoutError type to be used for backoff timeouts in WithTransaction().

Background & Motivation

@mongodb-drivers-pr-bot
Copy link
Contributor

API Change Report

./v2/mongo

compatible changes

TimeoutError: added

@mongodb-drivers-pr-bot
Copy link
Contributor

mongodb-drivers-pr-bot bot commented Mar 19, 2026

🧪 Performance Results

Commit SHA: 48ba30a

The following benchmark tests for version 69bd85a91f143b00077118bc had statistically significant changes (i.e., |z-score| > 1.96):

Benchmark Measurement % Change Patch Value Stable Region H-Score Z-Score
BenchmarkBSONDeepDocumentDecoding ops_per_second_med -3.2753 16150.3925 Avg: 16697.2764
Med: 16656.1178
Stdev: 258.7851
0.7428 -2.1133
BenchmarkMultiFindMany ops_per_second_max -3.1489 4098360.6557 Avg: 4231608.6170
Med: 4237288.1356
Stdev: 56386.4810
0.7789 -2.3631
BenchmarkSmallDocInsertOne allocated_bytes_per_op -0.2312 5674.0000 Avg: 5687.1461
Med: 5688.0000
Stdev: 5.1291
0.8343 -2.5630
BenchmarkBSONDeepDocumentDecoding allocated_bytes_per_op 0.0113 15101.0000 Avg: 15099.2947
Med: 15099.0000
Stdev: 0.7127
0.7895 2.3927

For a comprehensive view of all microbenchmark results for this PR's commit, please check out the Evergreen perf task for this patch.

@qingyang-hu qingyang-hu marked this pull request as ready for review March 20, 2026 12:16
Copilot AI review requested due to automatic review settings March 20, 2026 12:16
@qingyang-hu qingyang-hu requested a review from a team as a code owner March 20, 2026 12:16
@qingyang-hu qingyang-hu requested a review from tadjik1 March 20, 2026 12:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new TimeoutError type to better surface WithTransaction() timeouts (with proper timeout labeling), and updates the unified test runner plumbing so spec tests can assert on the WithTransaction() error value rather than treating it as a harness failure.

Changes:

  • Add mongo.TimeoutError (wrapping an underlying cause and reporting ExceededTimeLimitError via HasErrorLabel).
  • Update Session.WithTransaction() to return TimeoutError{Wrapped: err} for timeout-driven retry/backoff exits.
  • Update unified spec runner operation execution to propagate operation results (including Err) and unskip a previously skipped CSOT convenient-transactions spec test.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
mongo/session.go Wraps certain overall-timeout exit paths in TimeoutError when retry/backoff would exceed the transaction timeout.
mongo/errors.go Adds the exported TimeoutError type with Unwrap() and HasErrorLabel() support.
internal/spectest/skip.go Removes a skip entry for the convenient-transactions timeout surfacing test.
internal/integration/unified/unified_spec_runner.go Updates caller to handle operation.execute() returning a result + error.
internal/integration/unified/testrunner_operation.go Updates thread/loop execution to match new operation.execute() signature.
internal/integration/unified/session_operation_execution.go Changes withTransaction operation to return the WithTransaction() error in operationResult.Err for verification.
internal/integration/unified/operation.go Changes operation.execute() signature to return *operationResult alongside an execution error.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}

// Error implements the error interface.
func (e TimeoutError) Error() string {
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeoutError.Error() calls e.Wrapped.Error() directly, which will panic if Wrapped is nil. Even if Wrapped is expected to be non-nil today, this is an exported error type and should be defensive (e.g., handle nil Wrapped by returning a generic timeout message).

Suggested change
func (e TimeoutError) Error() string {
func (e TimeoutError) Error() string {
if e.Wrapped == nil {
return "timeout error"
}

Copilot uses AI. Check for mistakes.
Comment on lines +830 to +853
// TimeoutError represents an error that occurred due to a timeout.
type TimeoutError struct {
Wrapped error
}

// Error implements the error interface.
func (e TimeoutError) Error() string {
return e.Wrapped.Error()
}

// Unwrap returns the underlying error.
func (e TimeoutError) Unwrap() error {
return e.Wrapped
}

// HasErrorLabel returns true if the error contains the specified label.
func (e TimeoutError) HasErrorLabel(label string) bool {
if label == "ExceededTimeLimitError" {
return true
} else if le := LabeledError(nil); errors.As(e.Wrapped, &le) {
return le.HasErrorLabel(label)
}
return false
}
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a unit test for TimeoutError to ensure it integrates with existing timeout detection and labeling (e.g., IsTimeout should return true via the ExceededTimeLimitError label, and HasErrorLabel should delegate to the wrapped error for other labels). There are already table-driven tests for IsTimeout in mongo/errors_test.go that can be extended.

Copilot uses AI. Check for mistakes.
Comment on lines +262 to +263
_, err = threadOp.execute(ctx, loopDone)
return err
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runOnThread: the added closure assigns to the outer-scoped err variable (_, err = threadOp.execute(...)). Because the task runs asynchronously, this captures and mutates a variable from the parent stack frame, which is a data race and can also clobber the error used earlier in the case block. Use a new local variable inside the closure (e.g., _, execErr := ...; return execErr) to avoid capturing/mutating outer state.

Suggested change
_, err = threadOp.execute(ctx, loopDone)
return err
_, execErr := threadOp.execute(ctx, loopDone)
return execErr

Copilot uses AI. Check for mistakes.
Comment on lines +108 to 118
var cbErr error
for idx, oper := range operations {
if err := oper.execute(ctx, loopDone); err != nil {
return nil, fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, err)
res, execErr := oper.execute(ctx, loopDone)
if execErr != nil {
err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr)
return nil, nil
}
if cbErr == nil && res != nil {
cbErr = res.Err
}
}
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executeWithTransaction callback: once an operation inside the callback produces a non-nil operation error (res.Err), the callback should return that error immediately so subsequent operations are not executed. The current logic records the first res.Err in cbErr but continues executing later operations, which can change transactional behavior compared to a real application callback and can cause extra side effects or different errors.

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +113
err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr)
return nil, nil
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executeWithTransaction callback: on execErr, the callback currently returns (nil, nil), which causes WithTransaction to commit even though the callback did not successfully execute all operations, and relies on mutating the outer err to fail later. It would be safer to return the wrapped execution error from the callback so WithTransaction aborts/cleans up consistently and avoids committing partial work.

Suggested change
err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr)
return nil, nil
return nil, fmt.Errorf("error executing operation %q at index %d: %w", oper.Name, idx, execErr)

Copilot uses AI. Check for mistakes.
Comment on lines 179 to 183
select {
case <-timeout.C:
return nil, err
return nil, TimeoutError{Wrapped: err}
default:
}
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithTransaction now wraps some timeout paths with TimeoutError, but other uses of the same overall timeout timer still return the raw err (e.g., the commit retry loop). This can lead to inconsistent timeout signaling/labeling for callers. If the goal is to surface a timeout error for WithTransaction timeouts, consider wrapping all returns caused by the overall timeout timer consistently.

Copilot uses AI. Check for mistakes.
@qingyang-hu qingyang-hu marked this pull request as draft March 20, 2026 12:38
@qingyang-hu qingyang-hu marked this pull request as ready for review March 20, 2026 20:03
Copy link
Member

@tadjik1 tadjik1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @qingyang-hu , thanks for your work, it looks good!

I left few comments in the code on places I think should be improved, please let me know what you think about it.

if e.Wrapped == nil {
return "operation timed out"
}
return e.Wrapped.Error()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest add some string prefix so this error is distinguishable (similar to what we do for other errors

Suggested change
return e.Wrapped.Error()
return "operation timed out: " + e.Wrapped.Error()

return nil, fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, err)
res, execErr := oper.execute(ctx, loopDone)
if execErr != nil {
err = fmt.Errorf("error executing operation %q at index %d: %v", oper.Name, idx, execErr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: we capture err here because we want to get "3rd type of result" from this function, errors that are not related to WithTransaction logic, right? So we do want to get error, but we don't want driver to handle this as if it's a transaction error.

case <-timeout.C:
sleep.Stop()
return nil, err
return nil, TimeoutError{Wrapped: err}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec says we have to distinguish between CSOT and non-CSOT errors:
Note 1: When the TIMEOUT_MS (calculated in step [1.3] is reached we MUST report a timeout error wrapping the last error that was encountered which triggered the retry behavior. If timeoutMS is set, then timeout error is a special type which is defined in CSOT , If timeoutMS is not set, then propagate it as timeout error if the language allows to expose the underlying error as a cause of a timeout error. If timeout error is thrown then it SHOULD expose error label(s) from the transient error.
https://github.com/mongodb/specifications/blob/master/source/transactions-convenient-api/transactions-convenient-api.md#sequence-of-actions

In the current approach we always return TimeoutErrors which are indistinguishable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants