
feat: add EndPoint.resolveAll() for multi-address DNS expansion (DRIVER-201) — Part 2/2 #890

Draft
nikagra wants to merge 1 commit into scylladb:scylla-4.x from nikagra:fix/DRIVER-201-endpoint-resolve-all

nikagra commented May 15, 2026

Problem

`EndPoint.resolve()` returns a single `SocketAddress`. When a hostname maps to multiple IPs, the driver can only try the first one at the connection layer. If that IP is unreachable the driver fails with `AllNodesFailedException` even though other IPs are available.

Fixes DRIVER-201 at the general connection layer.

Note: This is part 2 of 2. Part 1 (#889) fixes the initial contact endpoints by expanding hostnames in the load-balancing query plan. This PR fixes the underlying `EndPoint` API and `ChannelFactory` so that any connection attempt — pool connections, reconnections, cloud SNI — benefits from multi-address fallback.

Changes

`EndPoint` interface

  • `resolve()` is now `@Deprecated`.
  • New `resolveAll()` default method returns `SocketAddress[]`. The default implementation wraps `resolve()` for backward compatibility with third-party implementations.
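A minimal sketch of the interface change, assuming a simplified `EndPoint` that declares only the two resolution methods (the real interface declares more). `LegacyEndPoint` is a hypothetical third-party implementation written before `resolveAll()` existed; it still gets a one-element array through the default method:

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;

// Simplified, hypothetical stand-in for the driver's EndPoint interface:
// only the two resolution methods discussed above are shown.
interface EndPoint {

  /** @deprecated use {@link #resolveAll()}, which can return every address for a hostname. */
  @Deprecated
  SocketAddress resolve();

  /**
   * Returns every candidate address for this endpoint. The default wraps resolve() so
   * implementations that only override resolve() keep working with a one-element array.
   */
  default SocketAddress[] resolveAll() {
    return new SocketAddress[] {resolve()};
  }
}

// Hypothetical legacy implementation that only knows about resolve().
class LegacyEndPoint implements EndPoint {
  private final SocketAddress address;

  LegacyEndPoint(SocketAddress address) {
    this.address = address;
  }

  @Override
  @SuppressWarnings("deprecation")
  public SocketAddress resolve() {
    return address;
  }
}
```

Because the default lives on the interface, callers such as `ChannelFactory` can switch to `resolveAll()` unconditionally without breaking implementations that never override it.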

`DefaultEndPoint`

  • Overrides `resolveAll()`: for unresolved addresses calls `InetAddress.getAllByName()` and returns one `InetSocketAddress` per IP. Falls back to a single-element array (the unresolved address) if DNS fails, so the connect attempt surfaces a descriptive error.
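The expansion could look roughly like the following sketch. `DefaultEndPointSketch` is a hypothetical stand-in, not the driver's class; it only illustrates the branch on `isUnresolved()`, the `InetAddress.getAllByName()` expansion, and the fallback on `UnknownHostException`:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the DefaultEndPoint.resolveAll() behaviour described above.
class DefaultEndPointSketch {
  private final InetSocketAddress address;

  DefaultEndPointSketch(InetSocketAddress address) {
    this.address = address;
  }

  SocketAddress[] resolveAll() {
    if (!address.isUnresolved()) {
      // Already an IP: pass it through unchanged.
      return new SocketAddress[] {address};
    }
    try {
      InetAddress[] ips = InetAddress.getAllByName(address.getHostString());
      SocketAddress[] result = new SocketAddress[ips.length];
      for (int i = 0; i < ips.length; i++) {
        // One candidate per DNS record, all on the endpoint's port.
        result[i] = new InetSocketAddress(ips[i], address.getPort());
      }
      return result;
    } catch (UnknownHostException e) {
      // Keep the unresolved address so the connect attempt fails with a descriptive
      // error instead of the caller receiving an empty array.
      return new SocketAddress[] {address};
    }
  }
}
```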

`SniEndPoint`

  • Overrides `resolveAll()`: re-resolves the proxy hostname on each call, sorts all A-records by IP, and returns one `InetSocketAddress` per record — enabling the driver to try each proxy IP in sequence.
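A sketch of the SNI proxy expansion, assuming the proxy is identified by a hostname and port. `SniResolveSketch` and its `BY_IP` comparator are hypothetical: the comparator orders raw address bytes as unsigned values with IPv4 before IPv6, which is one reasonable reading of "sorts by IP", and throwing `IllegalArgumentException` on a DNS failure is an assumption about how the unresolvable-host case surfaces:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of the SniEndPoint.resolveAll() behaviour described above.
class SniResolveSketch {

  // Orders addresses by raw bytes compared as unsigned values, IPv4 (4 bytes) before
  // IPv6 (16 bytes), so every call yields the same order however DNS shuffled the records.
  static final Comparator<InetAddress> BY_IP = (a, b) -> {
    byte[] x = a.getAddress();
    byte[] y = b.getAddress();
    if (x.length != y.length) {
      return Integer.compare(x.length, y.length);
    }
    for (int i = 0; i < x.length; i++) {
      int cmp = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return 0;
  };

  static InetSocketAddress[] resolveAll(String proxyHost, int proxyPort) {
    InetAddress[] records;
    try {
      // Re-resolved on every call so DNS changes to the proxy are picked up.
      records = InetAddress.getAllByName(proxyHost);
    } catch (UnknownHostException e) {
      // Assumption: a resolution failure is reported as an unchecked exception.
      throw new IllegalArgumentException("Could not resolve proxy address " + proxyHost, e);
    }
    Arrays.sort(records, BY_IP);
    InetSocketAddress[] result = new InetSocketAddress[records.length];
    for (int i = 0; i < records.length; i++) {
      result[i] = new InetSocketAddress(records[i], proxyPort);
    }
    return result;
  }
}
```

Comparing bytes as unsigned matters: as signed Java bytes, `10.0.0.200` would sort before `10.0.0.3`.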

`ClientRoutesEndPoint`

  • Overrides `resolveAll()`: wraps the single topology-monitor-resolved address in a one-element array (single-address by design).

`ChannelFactory`

  • `connect()` now calls `endPoint.resolveAll()` instead of `endPoint.resolve()`.
  • New `tryNextCandidate()` iterates through the returned array; on per-address failure it logs and tries the next; only fails the overall `resultFuture` when all candidates are exhausted.
  • New `connectToAddress()` scopes protocol-version negotiation (downgrade retries) to a single address, so a downgrade retry reconnects to the address that rejected the protocol version instead of moving on to another candidate.
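The fallback loop can be sketched with `CompletableFuture`, assuming a per-address connect function that returns a `CompletionStage`. `CandidateFallback` and its signatures are hypothetical (the real `ChannelFactory` does this with Netty channels); only the control flow mirrors the description above:

```java
import java.net.SocketAddress;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Function;

// Hypothetical sketch of the tryNextCandidate() loop. connectToAddress stands in for the
// per-address connect, including any protocol-version downgrade retries.
final class CandidateFallback {

  static <C> CompletableFuture<C> connect(
      SocketAddress[] candidates, Function<SocketAddress, CompletionStage<C>> connectToAddress) {
    CompletableFuture<C> resultFuture = new CompletableFuture<>();
    tryNextCandidate(candidates, 0, connectToAddress, resultFuture, null);
    return resultFuture;
  }

  private static <C> void tryNextCandidate(
      SocketAddress[] candidates,
      int index,
      Function<SocketAddress, CompletionStage<C>> connectToAddress,
      CompletableFuture<C> resultFuture,
      Throwable lastError) {
    if (index >= candidates.length) {
      // All candidates exhausted (or none given): only now fail the overall future.
      resultFuture.completeExceptionally(
          lastError != null ? lastError : new IllegalStateException("No addresses to try"));
      return;
    }
    SocketAddress address = candidates[index];
    connectToAddress.apply(address).whenComplete((channel, error) -> {
      if (error == null) {
        resultFuture.complete(channel);
      } else {
        // Per-address failure: log it and try the next candidate.
        System.err.printf("Connection to %s failed (%s), trying next address%n", address, error);
        tryNextCandidate(candidates, index + 1, connectToAddress, resultFuture, error);
      }
    });
  }
}
```

Recursing from the completion callback keeps the loop non-blocking. When futures complete synchronously the stack grows with each failed candidate, which is negligible for the handful of records a DNS lookup typically returns.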

Tests

  • `DefaultEndPointTest`: 3 new cases — already-resolved passthrough, unresolved hostname expansion, unresolvable hostname fallback.
  • `SniEndPointTest`: new class covering `resolveAll()` happy path, unresolvable host exception, and `resolve()` sanity check.
  • All 13 existing `ChannelFactory` tests pass unchanged (`LocalEndPoint` does not override `resolveAll()`, so it gets the interface's single-element default).

…ER-201)

Addresses the endpoint-API aspect of DRIVER-201.

Problem: EndPoint.resolve() returns a single SocketAddress. When a
hostname maps to multiple IPs, the driver can only try the first one
and fails with AllNodesFailedException if it is unreachable — the
remaining IPs are invisible to the connection layer.

Solution (per @dkropachev's architectural direction):
- Deprecate EndPoint.resolve(). Add EndPoint.resolveAll() with a default
  implementation that wraps resolve() in a single-element array for
  backward compatibility with third-party implementations.
- DefaultEndPoint.resolveAll(): if the stored InetSocketAddress is
  unresolved, calls InetAddress.getAllByName() to expand the hostname
  to all known IPs, returning one InetSocketAddress per IP. Falls back
  to the single-element unresolved address if DNS fails, so the connect
  attempt surfaces a descriptive error rather than returning empty.
- SniEndPoint.resolveAll(): re-resolves the proxy hostname on each call
  and returns all A-records sorted by IP, enabling the caller to try
  each proxy address in sequence.
- ClientRoutesEndPoint.resolveAll(): delegates to resolve() (single-
  address topology-monitor lookup) and wraps in a one-element array.
- ChannelFactory.connect(): replaced endPoint.resolve() with
  endPoint.resolveAll(). Iterates through the returned candidates via
  tryNextCandidate(); on per-address failure logs and tries the next;
  only fails the overall resultFuture when all candidates are exhausted.
  Protocol-version negotiation (downgrade retries) is scoped to the
  same address via connectToAddress(), so a downgrade retry reconnects
  to the address that rejected the protocol version.

Tests:
- DefaultEndPointTest: 3 new cases — already-resolved passthrough,
  unresolved hostname expansion, unresolvable hostname fallback.
- SniEndPointTest: new class with cases for resolveAll() happy path,
  unresolvable host exception, and resolve() sanity check.
- All 13 existing ChannelFactory tests continue to pass (LocalEndPoint
  does not override resolveAll(), so it gets the single-element default).
nikagra marked this pull request as draft May 15, 2026 18:18
nikagra added a commit to nikagra/java-driver that referenced this pull request May 15, 2026
…VER-201)

newControlReconnectionQueryPlan() now creates copies of the original
contact-point nodes (with their unresolved hostname endpoints) instead
of synthetic nodes with resolved IPs. This ensures the control channel
carries the hostname endpoint, which is preserved in metadata after
topology refresh.

DNS expansion for connection fallback is handled by ChannelFactory
(PR scylladb#890), so the control-reconnection path does not need to inject
resolved-IP nodes into the query plan.

Also adds getContactPoints() stub back to LoadBalancingPolicyWrapperTest
so tests that cover the control-reconnect path continue to pass.
nikagra added a commit to nikagra/java-driver that referenced this pull request May 15, 2026
Before-init query plan now uses getContactPoints() (original unresolved
hostname nodes) instead of getResolvedContactPoints(). The DNS expansion
to all IPs happens at the ChannelFactory level (PR scylladb#890), so expanding
here was redundant and broke should_connect_with_mocked_hostname by
replacing hostname endpoints with resolved-IP endpoints.

Also remove the should_connect_when_first_dns_entry_is_non_responsive
integration test from this PR; it belongs in PR scylladb#890 where ChannelFactory
expansion actually enables it to pass.