[Client] Kvscan java client #3295
@wuchong @fresh-borzoni PTAL
Pull request overview
Extends the Fluss Java client to support full primary-key (KV) table scans (“KvScan”) via a new KvBatchScanner, exposing it through the existing TableScan#createBatchScanner(...) APIs and documenting the new behavior and configuration.
Changes:
- Add a KvBatchScanner implementation for streaming full-bucket KV scans via the ScanKv RPC with snapshot isolation semantics.
- Wire KV batch scanning into TableScan#createBatchScanner(...) for primary-key tables when no limit is set, and add a new client config option to control per-RPC payload size.
- Add unit/integration tests plus Java client documentation for full PK-table batch scans.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| website/docs/apis/java-client.md | Documents full PK-table batch scanning usage, semantics, projection, and related configs. |
| fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java | Adds client.scanner.kv.fetch.max-bytes config option for KV scan batch sizing. |
| fluss-client/src/main/java/org/apache/fluss/client/table/scanner/Scan.java | Updates scan API Javadocs to describe KV batch scan behavior vs. log-table limit requirement. |
| fluss-client/src/main/java/org/apache/fluss/client/table/scanner/TableScan.java | Routes PK-table batch scans (no limit) to KvBatchScanner and reads the new config. |
| fluss-client/src/main/java/org/apache/fluss/client/table/scanner/batch/KvBatchScanner.java | New KV full-scan batch scanner implementation (ScanKv open/continue/close, retries, projection). |
| fluss-client/src/test/java/org/apache/fluss/client/table/scanner/batch/KvBatchScannerTest.java | New protocol-level unit tests for request/response sequencing and error handling. |
| fluss-client/src/test/java/org/apache/fluss/client/table/TableKvScanITCase.java | New end-to-end IT coverage for whole-table, per-bucket, partitioned, projection, and snapshot behavior. |
f2f01a7 to 4879eb2
wuchong left a comment
I pushed a commit to fix the call_seq_id issue.
KvBatchScanner.sendContinuation() sends the wrong call_seq_id for the first continuation, which the server rejects with InvalidScanRequestException. The client initializes callSeqId = 0 and pre-increments, so the first continuation carries 1. The server's ScannerContext, however, initializes callSeqId = -1 (so the first valid client value is 0), and TabletService rejects the request when requestSeqId != context.getCallSeqId() + 1.
The new IT tests don't catch this because the scanned data fits in one batch, so no continuation is ever sent; KvBatchScannerTest only asserts the client's locally emitted sequence against a fake gateway.
To fix it, either initialize callSeqId = -1 and keep the pre-increment, or use post-increment so that wire values start at 0.
This can be reproduced with TableKvScanITCase#snapshotIsolationHidesPostOpenWrites, which I updated in the appended commit.
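For anyone following along, the mismatch can be modelled in isolation. The sketch below is not the Fluss code: ServerContext, Client, and countAccepted are hypothetical stand-ins that only mirror the rule described above (the server starts at -1 and accepts a request only when its id equals the last accepted id plus one), and the rejection is modelled as a boolean rather than an InvalidScanRequestException.

```java
import java.util.function.IntSupplier;

public class CallSeqIdSketch {

    /** Hypothetical stand-in for the server-side scanner context; starts at -1. */
    static final class ServerContext {
        private int callSeqId = -1;

        /** Accepts a request only if its id is exactly the last accepted id + 1. */
        boolean accept(int requestSeqId) {
            if (requestSeqId != callSeqId + 1) {
                return false; // the real server would reject the request here
            }
            callSeqId = requestSeqId;
            return true;
        }
    }

    /** Hypothetical client-side counter with a configurable start and increment style. */
    static final class Client {
        private int callSeqId;
        private final boolean preIncrement;

        Client(int initial, boolean preIncrement) {
            this.callSeqId = initial;
            this.preIncrement = preIncrement;
        }

        /** Returns the id that would go on the wire for the next continuation. */
        int next() {
            return preIncrement ? ++callSeqId : callSeqId++;
        }
    }

    /** Sends up to n continuations to a fresh server and counts how many are accepted. */
    static int countAccepted(IntSupplier client, int n) {
        ServerContext server = new ServerContext();
        int accepted = 0;
        for (int i = 0; i < n; i++) {
            if (!server.accept(client.getAsInt())) {
                break; // stop at the first rejection
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Start at 0 with pre-increment: first wire value is 1, rejected immediately.
        System.out.println(countAccepted(new Client(0, true)::next, 3));
        // Start at -1 with pre-increment: wire values 0, 1, 2 are all accepted.
        System.out.println(countAccepted(new Client(-1, true)::next, 3));
        // Start at 0 with post-increment: wire values 0, 1, 2 are all accepted.
        System.out.println(countAccepted(new Client(0, false)::next, 3));
    }
}
```

Under these assumptions, the first configuration never gets a continuation accepted, while both proposed fixes put 0 on the wire first and pass the server-side check.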
@polyzos Please take a look at the changes. If you have no concerns, I can merge this once CI passes.
closes #3126