JAVA-5950 Update Transactions Convenient API with exponential backoff on retries #1852
base: backpressure
Pull request overview
This PR implements exponential backoff with jitter for transaction retries in MongoDB's withTransaction convenience API. The implementation adds a configurable backoff mechanism that applies delays between retry attempts when transient transaction errors occur, following the MongoDB specification with a growth factor of 1.5 for transactions.
Key Changes
- Introduces the ExponentialBackoff utility class with factory methods for transaction retries (5ms base, 500ms max, 1.5x growth) and command retries (100ms base, 10s max, 2.0x growth)
- Integrates backoff logic into ClientSessionImpl.withTransaction() to delay between retry attempts
- Adjusts test configuration to verify backoff behavior with multiple retry attempts
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| driver-core/src/main/com/mongodb/internal/ExponentialBackoff.java | New utility class implementing exponential backoff with jitter using ThreadLocalRandom |
| driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java | Adds backoff delay before transaction retries and uses CSOT timeout when available |
| driver-core/src/test/unit/com/mongodb/internal/ExponentialBackoffTest.java | Comprehensive unit tests validating backoff calculations, growth factors, and maximum caps |
| driver-sync/src/test/functional/com/mongodb/client/WithTransactionProseTest.java | New functional test verifying exponential backoff behavior and adjusted existing test configuration |
driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java
| AtomicInteger retryCount = new AtomicInteger(0); | ||
|
|
||
| session.withTransaction(() -> { | ||
| retryCount.incrementAndGet(); // Count the attempt before the operation that might fail |
Copilot AI commented on Dec 9, 2025:
The test verifies the retry count but does not validate that exponential backoff delays are actually applied. Consider measuring elapsed time and asserting minimum expected delays to ensure backoff is functioning correctly. For example, with 3 retries at delays of ~5ms, ~7.5ms, and ~11.25ms, the total elapsed time should be at least the sum of minimum expected delays.
ExponentialBackoffTest covers these unit tests already.
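The delays quoted in the Copilot comment (~5ms, ~7.5ms, ~11.25ms) are what a 5ms base and 1.5x growth produce before jitter is applied. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, and the zero-based attempt index is an assumption; none of this is taken from the PR itself.

```java
// Hypothetical helper, not part of the PR: computes the un-jittered backoff
// for each attempt using the transaction constants quoted in the PR overview.
final class BackoffCapsSketch {
    private static final double BASE_MS = 5.0;    // 5ms base (from the PR overview)
    private static final double MAX_MS = 500.0;   // 500ms max (from the PR overview)
    private static final double GROWTH = 1.5;     // 1.5x growth (from the PR overview)

    private BackoffCapsSketch() {
    }

    // Assumption: attempts are numbered from 0, so attempt 0 yields the base delay.
    static double[] caps(final int attempts) {
        double[] result = new double[attempts];
        for (int i = 0; i < attempts; i++) {
            result[i] = Math.min(BASE_MS * Math.pow(GROWTH, i), MAX_MS);
        }
        return result;
    }

    static double sum(final double[] values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }
}
```

Here caps(3) yields 5.0, 7.5 and 11.25, which sum to 23.75. If the jitter is a uniform factor in [0, 1) applied to these values (an assumption; the thread does not spell out the jitter formula), each sampled delay can be anywhere between 0 and the corresponding value, so 23.75ms would be an upper bound on the total delay rather than a guaranteed minimum. An elapsed-time assertion would need to account for that.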
…Impl.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…s exceeded (ex operationContext.getTimeoutContext().getReadTimeoutMS())
@nhachicha Please take note of mongodb/specifications#1868
stIncMale left a comment:
- I haven't reviewed ExponentialBackoffTest, because it depends on ExponentialBackoff, where I left many suggestions.
- I haven't reviewed ClientSessionImpl, because it has to implement the new specification change DRIVERS-1934: clarify drivers back off before all transaction retries (#1868).
- The last reviewed commit is 90ec4d5.
| * | ||
| * @return ExponentialBackoff configured for command retries | ||
| */ | ||
| public static ExponentialBackoff forCommandRetry() { |
Let's not introduce unused code that we foresee to be needed in the future. In this PR, we only need backoffs for the convenient transaction API.
| public static ExponentialBackoff forTransactionRetry() { | ||
| return new ExponentialBackoff( | ||
| TRANSACTION_BASE_BACKOFF_MS, | ||
| TRANSACTION_MAX_BACKOFF_MS, | ||
| TRANSACTION_BACKOFF_GROWTH | ||
| ); | ||
| } |
Within this PR, I failed to find any benefit in expressing the backoff computation as the behavior of an object (an instance of ExponentialBackoff) rather than as a static method. Regardless of the approach, the ClientSessionImpl.withTransaction method has to maintain one new local variable: either the lazily initialized transactionBackoff, or the transactionAttempt (that's how it is named in the spec, but we are free to use a different name). Given this, I propose to go with the more straightforward approach of expressing the backoff computation as a static method [1], rather than as an object behavior.
If in the future we observe that this is not enough and the "object" approach is needed, we'll be able to change the approach. But we will have a clear reason for that.
Also, storing all the constants in each instance of ExponentialBackoff as instance fields is unnecessary. If we have to use the "object" approach in the future, we had better implement it in a way that does not duplicate constants as instance fields in each object instance (an interface / abstract class with two implementations is one way to achieve that).
P.S. Some other review comments I left are written within the current "object" approach (as opposed to the proposed "method" approach). If the suggestion in this comment is applied, those comments should not be automatically discarded; rather, each should be examined and adapted, if applicable, to the "method" approach.
[1] If in the future we need another static method for command retries, we will be able to move the computational logic into a private static method and call that method from two public methods, passing the suitable constants as method arguments.
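As a rough illustration of the "method" approach described in this comment, the backoff could be a single static entry point that delegates to a parameterized helper, so a second entry point for command retries could later reuse the same logic. This is only a sketch under stated assumptions: the class name, method names, and the zero-based attempt parameter are hypothetical, the formula (jitter times the capped exponential) is assumed rather than quoted from the PR or the spec, and the helper is package-private instead of private so that a test can inject the jitter.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the static-method approach; not the PR's code.
final class TransactionBackoffSketch {
    // Constant names mirror those visible in the PR diff; values come from the PR overview.
    private static final double TRANSACTION_BASE_BACKOFF_MS = 5;
    private static final double TRANSACTION_MAX_BACKOFF_MS = 500;
    private static final double TRANSACTION_BACKOFF_GROWTH = 1.5;

    private TransactionBackoffSketch() {
    }

    // Entry point for withTransaction retries. `attempt` is assumed to start at 0.
    static long calculateTransactionBackoffMs(final int attempt) {
        return calculateBackoffMs(attempt, TRANSACTION_BASE_BACKOFF_MS, TRANSACTION_MAX_BACKOFF_MS,
                TRANSACTION_BACKOFF_GROWTH, ThreadLocalRandom.current().nextDouble());
    }

    // Shared computation. Assumed formula: jitter * min(base * growth^attempt, max),
    // with jitter expected to be in [0, 1]. Taking jitter as a parameter keeps this deterministic.
    static long calculateBackoffMs(final int attempt, final double baseMs, final double maxMs,
            final double growth, final double jitter) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be non-negative");
        }
        double cap = Math.min(baseMs * Math.pow(growth, attempt), maxMs);
        return Math.round(jitter * cap);
    }
}
```

With this shape, withTransaction only needs to track an attempt counter, and no constants are copied into per-instance fields.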
| * limitations under the License. | ||
| */ | ||
|
|
||
| package com.mongodb.internal; |
Maybe the existing com.mongodb.internal.time package is a better place for ExponentialBackoff?
| * <li>{@link #forTransactionRetry()} - For withTransaction retries (5ms base, 500ms max, 1.5 growth)</li> | ||
| * <li>{@link #forCommandRetry()} - For command retries with overload (100ms base, 10000ms max, 2.0 growth)</li> |
Let's not duplicate method-level documentation in the class-level documentation. Any duplication opens up opportunities for inconsistencies. We duplicate public API documentation for the sake of user convenience (though I have always been opposed to that, because it has led to inconsistencies, and will lead to them again, which harms users), but internally, there seems to be no reason to duplicate documentation at all.
|
|
||
| /** | ||
| * Creates a backoff instance configured for withTransaction retries. | ||
| * Uses: 5ms base, 500ms max, 1.5 growth factor. |
If you insist that documenting this internally is valuable, which I doubt, let's do that in a way that prevents the documentation and the actual values from becoming inconsistent:
Current:
| * Uses: 5ms base, 500ms max, 1.5 growth factor. | |
Suggested:
| * Uses: {@value TRANSACTION_BASE_BACKOFF_MS} ms base, {@value TRANSACTION_MAX_BACKOFF_MS} ms max, | |
| * {@value TRANSACTION_BACKOFF_GROWTH} growth factor. |
P.S. Notice that I used a space between the numerical value and unit symbol. This is in accordance with the "5.4.3 Formatting the value of a quantity" section in The International System of Units (SI) Brochure released by The International Bureau of Weights and Measures (BIPM).
| session.withTransaction(() -> { | ||
| retryCount.incrementAndGet(); // Count the attempt before the operation that might fail | ||
| collection.insertOne(session, Document.parse("{ _id : 'backoff-test' }")); | ||
| return retryCount; | ||
| }); |
Let's simplify:
Current:
| session.withTransaction(() -> { | |
| retryCount.incrementAndGet(); // Count the attempt before the operation that might fail | |
| collection.insertOne(session, Document.parse("{ _id : 'backoff-test' }")); | |
| return retryCount; | |
| }); | |
Suggested:
| session.withTransaction(() -> { | |
| retryCount.incrementAndGet(); // Count the attempt before the operation that might fail | |
| return collection.insertOne(session, Document.parse("{ _id : 'backoff-test' }")); | |
| }); |
| } catch (InterruptedException e) { | ||
| Thread.currentThread().interrupt(); | ||
| throw new MongoClientException("Transaction retry interrupted", e); |
Let's use InterruptionUtil.interruptAndCreateMongoInterruptedException.
| long backoffMs = transactionBackoff.calculateDelayMs(); | ||
| try { | ||
| if (backoffMs > 0) { | ||
| Thread.sleep(backoffMs); | ||
| } | ||
| } catch (InterruptedException e) { | ||
| Thread.currentThread().interrupt(); | ||
| throw new MongoClientException("Transaction retry interrupted", e); | ||
| } |
Let's extract this code to a private static method. The withTransaction method is already too long, we should not make it longer without good reason.
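One way the extraction could look is sketched below. To keep the example self-contained, it throws a stand-in exception type instead of calling InterruptionUtil.interruptAndCreateMongoInterruptedException, the driver helper requested in the earlier comment. The class name, method name, and exception type here are hypothetical.

```java
// Hypothetical sketch of pulling the sleep logic out of withTransaction; not the PR's code.
final class BackoffSleepSketch {
    // Stand-in for the driver's interrupted-exception type, used only so the sketch compiles alone.
    static final class SketchInterruptedException extends RuntimeException {
        SketchInterruptedException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    private BackoffSleepSketch() {
    }

    // Sleeps for the given backoff; a non-positive value means "retry immediately".
    static void sleepBeforeRetry(final long backoffMs) {
        if (backoffMs <= 0) {
            return;
        }
        try {
            Thread.sleep(backoffMs);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers further up can still observe it.
            Thread.currentThread().interrupt();
            throw new SketchInterruptedException("Transaction retry interrupted", e);
        }
    }
}
```

The retry loop would then contain a single call to the helper, which keeps the interruption handling in one place.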
| try { | ||
| outer: | ||
| while (true) { | ||
| // Apply exponential backoff before retrying transaction |
I don't think this comment adds anything of value to a reader. Let's remove it.
| @Override | ||
| public <T> T withTransaction(final TransactionBody<T> transactionBody, final TransactionOptions options) { | ||
| notNull("transactionBody", transactionBody); | ||
| long startTime = ClientSessionClock.INSTANCE.now(); |
[just a comment on code that wasn't changed in this PR]
I have just noticed this ClientSessionClock: it uses a non-monotonic clock. Horrendous.
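For context on why a non-monotonic clock is a concern here: a wall-clock reading can jump forwards or backwards when the system time is adjusted, which can distort an elapsed-time check such as a retry time limit. The thread does not show how ClientSessionClock reads the time, so the following is only a generic sketch of measuring elapsed time with System.nanoTime(), which the Java documentation describes as suitable for elapsed-time measurement. The class and its names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a monotonic time limit; not the driver's ClientSessionClock.
final class MonotonicLimitSketch {
    private final long startNanos;
    private final long limitNanos;

    MonotonicLimitSketch(final long startNanos, final long limit, final TimeUnit unit) {
        this.startNanos = startNanos;
        this.limitNanos = unit.toNanos(limit);
    }

    static MonotonicLimitSketch startingNow(final long limit, final TimeUnit unit) {
        return new MonotonicLimitSketch(System.nanoTime(), limit, unit);
    }

    // Compares by subtraction, as the System.nanoTime() documentation recommends,
    // so the check stays correct even if the raw counter wraps around.
    boolean hasExpired(final long nowNanos) {
        return nowNanos - startNanos >= limitNanos;
    }

    boolean hasExpired() {
        return hasExpired(System.nanoTime());
    }
}
```

A retry loop could create the limit once with startingNow(...) and call hasExpired() before each retry; the limit value itself would come from wherever the driver defines it.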
Relevant specification changes:
JAVA-5950