
perf: replace SELECT * with explicit projections in topology monitor queries #858

Merged
dkropachev merged 1 commit into scylladb:scylla-4.x from nikagra:driver-367-optimize-system-local-query
Apr 3, 2026
Conversation

@nikagra nikagra commented Mar 27, 2026

Summary

  • Replaces SELECT * queries in DefaultTopologyMonitor with dynamic column projections that only fetch the columns the driver actually reads.
  • On the first query, SELECT * is still issued to discover which columns the server actually has. The response column list is then intersected with LOCAL_COLUMNS_OF_INTEREST / PEERS_COLUMNS_OF_INTEREST / PEERS_V2_COLUMNS_OF_INTEREST — three ImmutableSet<String> constants listing every column the driver reads from each table. Subsequent queries use the resulting projected column list.
  • A private intersectWithNeeded(List<String> serverColumns, ImmutableSet<String> needed) helper performs the intersection, preserving server-response order and silently dropping columns absent from the server (e.g. DSE-specific columns on non-DSE clusters, or version-specific columns).
  • DSE-specific columns (dse_version, graph, workload, workloads, server_id, storage_port, storage_port_ssl, jmx_port) are included in all *_COLUMNS_OF_INTEREST sets and are silently dropped by the intersection when the server does not expose them.
  • Calling resetColumnCaches() resets all three caches to null, causing the next query to re-issue SELECT * and re-learn the projection (used on schema-change events).
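
The intersection step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `ProjectionSketch`, the `buildQuery` signature, and the contents of the column set are assumptions made for the example, and a plain `java.util.Set` stands in for the `ImmutableSet<String>` the PR uses so the snippet has no Guava dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProjectionSketch {

  // Hypothetical subset of the columns the driver reads from system.local.
  // The real constant in the PR lists every column of interest.
  public static final Set<String> LOCAL_COLUMNS_OF_INTEREST =
      new LinkedHashSet<>(
          Arrays.asList("host_id", "rpc_address", "data_center", "rack", "tokens", "dse_version"));

  // Keeps only the server columns the driver needs, in server-response order.
  // Needed columns the server does not expose (here: dse_version) simply never appear.
  public static List<String> intersectWithNeeded(List<String> serverColumns, Set<String> needed) {
    List<String> result = new ArrayList<>();
    for (String column : serverColumns) {
      if (needed.contains(column)) {
        result.add(column);
      }
    }
    return result;
  }

  // Builds the projected query; falls back to SELECT * when nothing is projected.
  // The (table, projected, suffix) parameter shape is an assumption for this sketch.
  public static String buildQuery(String table, List<String> projected, String suffix) {
    String columns = projected.isEmpty() ? "*" : String.join(", ", projected);
    return "SELECT " + columns + " FROM " + table + suffix;
  }

  public static void main(String[] args) {
    // Column order as a (made-up) server might return it for SELECT *.
    List<String> serverColumns =
        Arrays.asList("key", "rack", "cluster_name", "host_id", "tokens");
    List<String> projected = intersectWithNeeded(serverColumns, LOCAL_COLUMNS_OF_INTEREST);
    // prints: SELECT rack, host_id, tokens FROM system.local WHERE key='local'
    System.out.println(buildQuery("system.local", projected, " WHERE key='local'"));
  }
}
```

Iterating over the server's column list, rather than over the needed set, is what preserves server-response order and drops absent columns without any special casing.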

Motivation

SELECT * FROM system.local WHERE key='local' and SELECT * FROM system.peers* fetch 30–40 columns per row, many of which are never accessed by the topology monitor (e.g. key, bootstrapped, cluster_name, cql_version, supported_features). This causes unnecessary serialization on the server, wire transfer, and deserialization on the client — contributing to the session initialization slowdown reported in #282.

Note: tokens is still fetched — it is required for token-aware routing and is read by nodeInfoBuilder(). The optimization eliminates columns the driver never reads, not all large columns.

SchemaAgreementChecker already follows the explicit-projection pattern. This PR applies the equivalent approach to DefaultTopologyMonitor, using a dynamic intersection rather than hard-coded query strings so that the projection adapts to whatever columns the server exposes.

Testing

  • All 27 DefaultTopologyMonitorTest tests pass.
  • ProtocolVersionMixedClusterIT updated to expect the 13-column projected query string that results from intersecting Simulacron's mock column list with LOCAL_COLUMNS_OF_INTEREST.

Closes: DRIVER-367
Parent epic: DRIVER-274

@nikagra nikagra force-pushed the driver-367-optimize-system-local-query branch 2 times, most recently from 5ae1d6d to 2cfced8 Compare March 31, 2026 18:06
@nikagra nikagra requested a review from Copilot April 3, 2026 11:53

Copilot AI left a comment


Pull request overview

This PR aims to reduce topology-monitor overhead by changing system table queries away from SELECT * and introducing mechanisms to control/refresh the chosen projections across reconnects.

Changes:

  • Added column-name caching and a resetColumnCaches() hook to influence how DefaultTopologyMonitor builds system table SELECT statements.
  • Added AdminResult#getColumnNames() (plus tests/helpers) to expose response metadata for the caching logic.
  • Updated unit/integration tests to account for the new query strings / caching behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| integration-tests/src/test/java/com/datastax/oss/driver/core/ProtocolVersionMixedClusterIT.java | Adjusts expected query sequence and adds a projected system.local query constant. |
| integration-tests/src/test/java/com/datastax/oss/driver/core/PeersV2NodeRefreshIT.java | Relaxes query matching to tolerate different SELECT projections. |
| core/src/test/java/com/datastax/oss/driver/internal/core/metadata/DefaultTopologyMonitorTest.java | Adds tests for projection caching behavior and cache reset semantics. |
| core/src/test/java/com/datastax/oss/driver/internal/core/metadata/AdminResultTestHelper.java | Adds helper to stub AdminResult#getColumnNames() in tests. |
| core/src/test/java/com/datastax/oss/driver/internal/core/adminrequest/AdminResultTest.java | New tests validating AdminResult#getColumnNames() behavior. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/TopologyMonitor.java | Adds default resetColumnCaches() API. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/DefaultTopologyMonitor.java | Implements column-name caches and uses them to build projected SELECT queries. |
| core/src/main/java/com/datastax/oss/driver/internal/core/control/ControlConnection.java | Resets topology monitor column caches on successful reconnect before refresh. |
| core/src/main/java/com/datastax/oss/driver/internal/core/adminrequest/AdminResult.java | Exposes result column names via getColumnNames(). |


@nikagra nikagra force-pushed the driver-367-optimize-system-local-query branch from 2cfced8 to e5f65a3 Compare April 3, 2026 12:33
@nikagra nikagra requested a review from dkropachev April 3, 2026 12:52
@nikagra nikagra marked this pull request as ready for review April 3, 2026 12:52
@nikagra nikagra force-pushed the driver-367-optimize-system-local-query branch from e5f65a3 to 086dad9 Compare April 3, 2026 17:28
@nikagra nikagra requested a review from Copilot April 3, 2026 17:30

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.



…R-367)

Replace SELECT * with explicit column projections in system.local and
system.peers/peers_v2 queries issued by DefaultTopologyMonitor.

The column list is learned dynamically at runtime: the first SELECT *
response populates a volatile cache per table; all subsequent queries
use the cached column names to build projected SELECTs. The caches are
reset to null on control-connection reconnect, which causes the next
call to re-issue SELECT * and re-learn whatever columns are available.

This avoids sending unnecessary data (e.g. large tokens columns) on
every topology refresh while remaining compatible with any server schema
version without a hardcoded column list.

Implementation details:
- AdminResult.getColumnNames(): new method returning the set of column
  names from the result metadata, used to populate the caches.
- DefaultTopologyMonitor: three volatile caches (localColumns,
  peersColumns, peersV2Columns), buildQuery() helpers, and
  resetColumnCaches() called by ControlConnection.onSuccessfulReconnect.
- Cache-population sites guard against caching an empty column set
  (e.g. when the server returns no rows) so the next call retries with
  SELECT * instead of projecting zero columns.
- Narrow single-node refreshNode() WHERE-clause queries always use
  SELECT * because projecting a one-row result gives negligible benefit
  and the fixed form is required by test infrastructure (Simulacron
  primes only SELECT * for that query shape).
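
The cache lifecycle described in the implementation details above (learn on the first response, skip empty column sets, reset on reconnect) might look roughly like the sketch below. Only `localColumns` and `resetColumnCaches()` are names taken from the commit message; the class `ColumnCacheSketch` and the methods `nextLocalQuery` and `onLocalResponse` are hypothetical and exist only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ColumnCacheSketch {

  // null means "not learned yet": the next query falls back to SELECT *.
  private volatile List<String> localColumns;

  // Hypothetical helper: returns the query to send for system.local.
  public String nextLocalQuery() {
    List<String> cached = localColumns; // single volatile read
    String projection = (cached == null) ? "*" : String.join(", ", cached);
    return "SELECT " + projection + " FROM system.local WHERE key='local'";
  }

  // Hypothetical helper: called with the column names of a query response.
  // An empty column set is never cached, so the next call retries with SELECT *.
  public void onLocalResponse(List<String> columnNames) {
    if (localColumns == null && !columnNames.isEmpty()) {
      localColumns = Collections.unmodifiableList(new ArrayList<>(columnNames));
    }
  }

  // Forget the learned projection, e.g. after a control-connection reconnect.
  public void resetColumnCaches() {
    localColumns = null;
  }

  public static void main(String[] args) {
    ColumnCacheSketch monitor = new ColumnCacheSketch();
    System.out.println(monitor.nextLocalQuery()); // cold cache: SELECT *
    monitor.onLocalResponse(Arrays.asList("host_id", "rack"));
    System.out.println(monitor.nextLocalQuery()); // warm cache: projected columns
    monitor.resetColumnCaches();
    System.out.println(monitor.nextLocalQuery()); // back to SELECT *
  }
}
```

Publishing an immutable list through a `volatile` field keeps readers lock-free. Two threads may both observe a cold cache and both issue `SELECT *`, which is harmless because each would store an equivalent column list.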

Tests:
- DefaultTopologyMonitorTest: 30 unit tests (2 new warm-cache/reset
  tests, 1 new empty-column-set guard test).
- AdminResultTest: 3 new tests for getColumnNames().
- ProtocolVersionMixedClusterIT: updated to expect projected query on
  the second system.local call after the cache is warmed.
- PeersV2NodeRefreshIT: updated hasNodeRefreshQuery() to match on the
  stable WHERE-clause suffix rather than the SELECT * literal.
@nikagra nikagra force-pushed the driver-367-optimize-system-local-query branch from 086dad9 to 4a63624 Compare April 3, 2026 17:50
@dkropachev dkropachev merged commit 9e5b95d into scylladb:scylla-4.x Apr 3, 2026
20 checks passed