
GCS client library migration in Java SDK - part 3 #37900

Open
shunping wants to merge 4 commits into apache:master from shunping:gcs-migration-3

Conversation

@shunping
Collaborator

  • Add open and create methods for GcsUtilV2.
  • Add bridge channel classes so that the existing method signatures are preserved as much as possible.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request, part three of the GCS client library migration, focuses on enabling fundamental read and write capabilities using the updated GcsUtilV2. It introduces new methods for opening and creating GCS files, along with adapter classes to ensure seamless integration with existing Java I/O channel interfaces. This change is crucial for progressing the migration while maintaining functional parity and allowing for a controlled rollout of the new client library features.

Highlights

  • GcsUtilV2 Read/Write Operations: Implemented open and create methods within GcsUtilV2 to enable direct read and write operations to Google Cloud Storage using the new client library.
  • Channel Bridge Classes: Introduced GcsSeekableByteChannel and GcsWritableByteChannel to act as bridge classes, adapting the com.google.cloud library's ReadChannel and WriteChannel to the standard Java SeekableByteChannel and WritableByteChannel interfaces, maintaining API compatibility.
  • GcsUtil Delegation: Modified the existing GcsUtil class to conditionally delegate open and create calls to the new GcsUtilV2 implementation when the V2 delegate is initialized, facilitating a gradual migration.
  • Integration Tests: Added new integration tests (testRead and testWriteAndRead) in GcsUtilParameterizedIT to verify the functionality of the new GCS read and write operations.
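The bridge classes themselves are not shown in this summary. As a rough illustration of the adapter idea they describe, here is a minimal SeekableByteChannel over an in-memory byte array; the PR's actual GcsSeekableByteChannel wraps the client library's ReadChannel instead, so every name and detail below is illustrative, not the PR's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;

// Illustrative sketch only: an in-memory byte[] stands in for the remote
// GCS object so the adapter shape is visible and runnable locally.
class InMemorySeekableByteChannel implements SeekableByteChannel {
  private final byte[] data; // stands in for the object's bytes
  private long position = 0;
  private boolean open = true;

  InMemorySeekableByteChannel(byte[] data) {
    this.data = data;
  }

  @Override
  public int read(ByteBuffer dst) {
    if (position >= data.length) {
      return -1; // EOF, mirroring the channel contract
    }
    int n = (int) Math.min(dst.remaining(), data.length - position);
    dst.put(data, (int) position, n);
    position += n;
    return n;
  }

  @Override
  public int write(ByteBuffer src) {
    throw new NonWritableChannelException(); // read-only adapter
  }

  @Override
  public long position() {
    return position;
  }

  @Override
  public SeekableByteChannel position(long newPosition) {
    this.position = newPosition;
    return this;
  }

  @Override
  public long size() {
    return data.length;
  }

  @Override
  public SeekableByteChannel truncate(long size) {
    throw new NonWritableChannelException();
  }

  @Override
  public boolean isOpen() {
    return open;
  }

  @Override
  public void close() {
    open = false;
  }
}
```

The same shape applies on the write side, where a WritableByteChannel delegates to the library's WriteChannel.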


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Contributor

Assigning reviewers:

R: @ahmedabu98 for label java.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@shunping
Collaborator Author

shunping commented Mar 25, 2026

cc'ed @BenWhitehead

Comment on lines +564 to +565
    return new GcsSeekableByteChannel(
        blob.getStorage().reader(blob.getBlobId(), sourceOptions), blob.getSize());
Contributor

I would change this to the following to avoid unnecessary ByteBuffer allocations.

Suggested change
- return new GcsSeekableByteChannel(
-     blob.getStorage().reader(blob.getBlobId(), sourceOptions), blob.getSize());
+ ReadChannel reader = blob.getStorage().reader(blob.getBlobId(), sourceOptions);
+ // disable internal buffering, and make the channel non-blocking
+ reader.setChunkSize(0);
+ return new GcsSeekableByteChannel(reader, blob.getSize());

Comment on lines +612 to +614
if (options.getExpectFileToNotExist()) {
writeOptionList.add(BlobWriteOption.doesNotExist());
}
Contributor

This should be fleshed out more. If there isn't a precondition[1] present when the writer is created, some internal RPCs won't be able to be automatically retried. An example of the other branch can be seen in this code sample for create from array: https://github.com/googleapis/java-storage/blob/ba5daed0c1d306f821cc26549142ff0bcfb80cbb/samples/snippets/src/main/java/com/example/storage/object/UploadObjectFromMemory.java#L53-L64
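Following the linked sample's pattern, the choice of precondition branches on whether the object already exists. A hypothetical helper sketching that branch logic is below; the string return values merely name the google-cloud-storage options (Storage.BlobWriteOption.doesNotExist() and generationMatch(...)), and the helper itself is not part of the PR or the library.

```java
// Hypothetical helper mirroring the linked UploadObjectFromMemory sample:
// given the current generation of the target object (null when the object
// does not yet exist), decide which write precondition to request.
class WritePreconditions {
  static String choosePrecondition(Long currentGeneration) {
    if (currentGeneration == null) {
      // Object absent: require that it still does not exist at write time,
      // i.e. Storage.BlobWriteOption.doesNotExist().
      return "doesNotExist";
    }
    // Object present: require that its generation is unchanged, i.e.
    // Storage.BlobWriteOption.generationMatch(generation), so a concurrent
    // overwrite fails fast instead of being silently clobbered.
    return "generationMatch:" + currentGeneration;
  }
}
```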


@Override
public int read(ByteBuffer dst) throws IOException {
int count = reader.read(dst);
Contributor

After making the channel non-blocking, you'll probably want to add the following change if you need to always fill the provided buffer as much as possible:

Suggested change
- int count = reader.read(dst);
+ int count = StorageChannelUtils.blockingFillFrom(dst, reader);
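For context, a hand-rolled loop with the behavior one would expect from such a fill utility — drain the source until the destination is full or EOF — might look like the sketch below. The exact contract of StorageChannelUtils.blockingFillFrom may differ; this version assumes it returns -1 only when EOF arrives before any byte is read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

class FillUtil {
  // Keep reading until the buffer is full or the channel reaches EOF.
  // Returns the number of bytes read, or -1 if EOF was hit before any
  // byte was read (mirroring the usual channel read contract).
  static int blockingFill(ByteBuffer dst, ReadableByteChannel src) throws IOException {
    int total = 0;
    while (dst.hasRemaining()) {
      int n = src.read(dst);
      if (n == -1) {
        return total == 0 ? -1 : total;
      }
      total += n;
    }
    return total;
  }
}
```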

assertEquals(0, channel.position());

// Read content into ByteBuffer
ByteBuffer buffer = ByteBuffer.allocate((int) expectedSize);
Contributor

For tests, I generally recommend allocating larger buffers than the actual size of the expected download. By doing so, you ensure that the EOF you receive is where you expect and not that you simply have a common subrange.
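This suggestion can be sketched with plain JDK channels: over-allocate the buffer, drain to EOF, and assert that the byte count matches the expected size exactly. Names below are illustrative; an in-memory channel stands in for the GCS download.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

class EofSketch {
  // Drain the channel into an over-allocated buffer and report how many
  // bytes arrived before EOF. Asserting this count against the expected
  // object size catches both short reads and trailing garbage.
  static int drain(ReadableByteChannel ch, ByteBuffer oversized) throws IOException {
    int total = 0;
    while (oversized.hasRemaining()) {
      int n = ch.read(oversized);
      if (n == -1) {
        break;
      }
      total += n;
    }
    return total;
  }

  static int demo() throws IOException {
    byte[] expected = "Hello, GCS!".getBytes(StandardCharsets.UTF_8);
    ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(expected));
    // Deliberately larger than the expected download, per the review comment.
    ByteBuffer buffer = ByteBuffer.allocate(expected.length + 64);
    return drain(ch, buffer);
  }
}
```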

Comment on lines +621 to +628
int bytesRead = 0;
while (buffer.hasRemaining()) {
  int read = channel.read(buffer);
  if (read == -1) {
    break;
  }
  bytesRead += read;
}
Contributor

Use our util so that your test is more clearly focused on the logic of the data than the movement of that data.

Suggested change
- int bytesRead = 0;
- while (buffer.hasRemaining()) {
-   int read = channel.read(buffer);
-   if (read == -1) {
-     break;
-   }
-   bytesRead += read;
- }
+ int bytesRead = StorageChannelUtils.blockingFillFrom(buffer, channel);

Comment on lines +638 to +647
MessageDigest digest = MessageDigest.getInstance("SHA-256");
digest.update(buffer);
byte[] hashBytes = digest.digest();

// Convert bytes to Hex String
StringBuilder sb = new StringBuilder();
for (byte b : hashBytes) {
  sb.append(String.format("%02x", b));
}
String actualHash = sb.toString();
Contributor

nit: I'd make this a helper method to make the test easier to read, since the test isn't actually testing SHA-256 computation and encoding.
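A hypothetical shape for such a helper, extracted from the test code above (the method name sha256Hex is invented for illustration):

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashUtil {
  // Helper the test could delegate to, so the test body stays focused on
  // the data rather than on digest plumbing.
  static String sha256Hex(ByteBuffer buffer) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      digest.update(buffer);
      byte[] hashBytes = digest.digest();
      StringBuilder sb = new StringBuilder(hashBytes.length * 2);
      for (byte b : hashBytes) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is mandated for every JDK, so this should be unreachable.
      throw new IllegalStateException("SHA-256 unavailable", e);
    }
  }
}
```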

@Test
public void testWriteAndRead() throws IOException {
final String bucketName = "apache-beam-temp-bucket-12345";
final GcsPath targetPath = GcsPath.fromComponents(bucketName, "test-object.txt");
Contributor

I would make this object name unique so that another test, or a past run of this test, doesn't interfere with the expectation that this test creates a new object.
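One common way to do that is to suffix a UUID onto the object name; a hypothetical helper along those lines:

```java
import java.util.UUID;

class TestNames {
  // Illustrative helper: a per-run unique object name so reruns and
  // parallel tests never collide on the same GCS object.
  static String uniqueObjectName(String prefix) {
    return prefix + "-" + UUID.randomUUID();
  }
}
```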

public void testWriteAndRead() throws IOException {
final String bucketName = "apache-beam-temp-bucket-12345";
final GcsPath targetPath = GcsPath.fromComponents(bucketName, "test-object.txt");
final String content = "Hello, GCS!";
Contributor

The default chunk size for an upload using the writer is 16 MiB. This means that when creating an object smaller than the chunk size, there will be one RPC to create the upload and one RPC to upload and finalize the bytes. If verifying chunking behavior is important, you'll want this content to be larger.
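If chunking behavior does need exercising, the test content can be generated deterministically at just over one chunk; the 16 MiB figure comes from the comment above, and this generator is illustrative rather than part of the PR.

```java
class LargeContent {
  // Default writer chunk size per the review comment above.
  static final int CHUNK_SIZE_BYTES = 16 * 1024 * 1024;

  // Build deterministic content slightly larger than one upload chunk so
  // the writer is forced to issue more than one data RPC.
  static byte[] multiChunkContent() {
    byte[] data = new byte[CHUNK_SIZE_BYTES + 1024];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) (i % 251); // deterministic, non-trivially repeating
    }
    return data;
  }
}
```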

}

// Verify content
assertEquals(content, readContent.toString());
Contributor

nit: I generally prefer asserting bytes rather than strings; that way, non-printable characters or similar-looking characters aren't sources of confusion in any assertion failure.
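A byte-level comparison along those lines (the helper name is invented for illustration; in a test one would typically just use assertArrayEquals on the two byte arrays):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteAssert {
  // Compare round-tripped content as raw bytes; a mismatch in invisible or
  // look-alike characters then shows up as a concrete byte-level difference
  // rather than two visually identical strings.
  static boolean sameBytes(String expected, byte[] actual) {
    return Arrays.equals(expected.getBytes(StandardCharsets.UTF_8), actual);
  }
}
```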
