Add Cloud Bigtable Change Stream integration tests#29127
Add Cloud Bigtable Change Stream integration tests#29127Abacn merged 8 commits intoapache:masterfrom
Conversation
Change-Id: I68a877d5686f1898686b18491c6b4aff5e699862
Change-Id: I490fca7ba2f24b15faa288d6f2c3f209db59f948
… set instanceId Change-Id: I4d5e5347dbba09a0e1ee170b8aa911d2f0a772ef
…set it explicitly prior to this Change-Id: I5ee3c663d3120ee85970ae5f24a962b9535323b3
Abacn
left a comment
There was a problem hiding this comment.
Thanks for adding the test. Had a few comment about the overall structure/scope of the change. Not yet reviewed the test (preferably bigtable team could review)
| @@ -28,10 +28,4 @@ public interface BigtableTestOptions extends TestPipelineOptions { | |||
| String getInstanceId(); | |||
|
|
|||
| void setInstanceId(String value); | |||
|
|
|||
| @Description("Project for Bigtable") | |||
There was a problem hiding this comment.
Interesting that this file has been moved back and forth: 4833b3f
what is the consideration that moving this back to main?
Also, what is the context that removing "setBigtableProject" option? In theory the project that Bigtable instance locates can be different from the project used to run Dataflow.
There was a problem hiding this comment.
A few reasons.
- I'm moving it so I can register it in
GcpIoPipelineOptionsRegistrar.javafor reasons explained in the comment below. - I'm removing
setBigtableProjectbecause of the reason below, we can't actually setbigtableProjectto pass to the tests anyways. So I don't think anything is depending on this so it should be safely removable. - By registering
bigtableProject, it conflicts with some of the tests we have because they also try to registerbigtableProject. So it was either remove here or change it in the tests. - I took a brief look at firestore and spanner, neither has an option to set a different project. So it doesn't seem like it's a common use case.
There was a problem hiding this comment.
If this goes to main it will affect all users. For example, the name "InstanceId" sounds generic. In the past we received similar reports concerning this: #27256 (comment)
in fact, other gcp component (e.g. Spanner) also has "instanceId" settings so could introduce confusion.
If it stay within Test scope one can still register it using another Registrar class. GcpIoPipelineOptionsRegistrar is just an auto service class. It loads the options at initialization. Besides it, one can register custom pipeline option classes within Bigtable test classes. This reduces the scope to necessary level.
ack for removal of setBigtableProject
| .add(PubsubOptions.class) | ||
| .add(FirestoreOptions.class) | ||
| .add(TestBigQueryOptions.class) | ||
| .add(BigtableTestOptions.class) |
There was a problem hiding this comment.
If this is the only place that refers to BigtableTestOption class, it's not necessary to move the Option class. We can register the option in the test class.
There was a problem hiding this comment.
Maybe there's an alternate solution. I want to be able to pass in "instanceId" as a parameter to the IT. But gcp project defines integrationTest to explicitly take a few parameters and pass them to beam beamTestPipelineOptions. So I added instanceId. But, unless instanceId is always defined (by registering it for all ITs), other tests fail because they don't have instanceId defined but we're passing instanceId.
So this ended up being the solution. Happy to try something else.
There was a problem hiding this comment.
In each IT class, one can get a TestPipelineOptions instance like this
and then one can use testOptions.as(BigtableTestOptions.class) to get BigtableTestOptions, e.g.
No Registrar involved. Current Bigtable ITs is doing similar things and it should suffice.
unless instanceId is always defined (by registering it for all ITs), other tests fail because they don't have instanceId defined but we're passing instanceId.
That is because "registering it for all ITs". If we do not register it for all ITs then irrelevant tests won't fail.
| return builder.build(); | ||
| } | ||
|
|
||
| public static BigtableIO.ReadChangeStream buildTestPipelineInput( |
There was a problem hiding this comment.
if this is the only method that org.apache.beam.sdk.io.gcp.bigtable.changestreams.it refers to, it is not necessary to change the Util to public class. Just spin up the pipeline in place or create BigtableChangeStreamTestUtils in the changestream package.
There was a problem hiding this comment.
The problem, which we (bigtable team) dug ourselves into, is that endTime is a package private field. We don't want to expose endTime as an option because it's generally not well supported. But endTime is useful to terminate tests.
So we have to create the pipeline in the same package as BigtableIO so we actually can't create the pipeline with endTime in the changestreams package.
There was a problem hiding this comment.
ack. If there are other options needed then good to promote the scope of this Util class
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
@jackdingilian to review the integration tests. |
…package Change-Id: I228dc61ca0b27131cd38a3ab24f136d0f924d9f7
| dataSettingsBuilder.setInstanceId(instanceId); | ||
| tableAdminSettingsBuilder.setInstanceId(instanceId); | ||
| dataSettingsBuilder.setAppProfileId(appProfileId); | ||
| // TODO: Remove this later. But for now, disable direct path. |
There was a problem hiding this comment.
Do we still need this? We should get this from the default config now I think, but I don't remember 100% why it was necessary here.
There was a problem hiding this comment.
We still disable direct path in BigtableChangeStreamAccessor. So it would make sense to keep it disabled when we write/read from bigtable.
|
Assigning reviewers. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
@Abacn Could you please take a look at the structure again. And if my responses make sense. Happy to make changes. Jack from Cloud Bigtable approved the test content. |
Abacn
left a comment
There was a problem hiding this comment.
Thanks, commented about the context of why we wanted test options stay in test and possible solutions to keep the TestOption in test scope. Actually FIreStoreOptions is an antipattern (see above) and deemed change
| @@ -28,10 +28,4 @@ public interface BigtableTestOptions extends TestPipelineOptions { | |||
| String getInstanceId(); | |||
|
|
|||
| void setInstanceId(String value); | |||
|
|
|||
| @Description("Project for Bigtable") | |||
There was a problem hiding this comment.
If this goes to main it will affect all users. For example, the name "InstanceId" sounds generic. In the past we received similar reports concerning this: #27256 (comment)
in fact, other gcp component (e.g. Spanner) also has "instanceId" settings so could introduce confusion.
If it stay within Test scope one can still register it using another Registrar class. GcpIoPipelineOptionsRegistrar is just an auto service class. It loads the options at initialization. Besides it, one can register custom pipeline option classes within Bigtable test classes. This reduces the scope to necessary level.
ack for removal of setBigtableProject
| .add(PubsubOptions.class) | ||
| .add(FirestoreOptions.class) | ||
| .add(TestBigQueryOptions.class) | ||
| .add(BigtableTestOptions.class) |
There was a problem hiding this comment.
In each IT class, one can get a TestPipelineOptions instance like this
and then one can use testOptions.as(BigtableTestOptions.class) to get BigtableTestOptions, e.g.
No Registrar involved. Current Bigtable ITs is doing similar things and it should suffice.
unless instanceId is always defined (by registering it for all ITs), other tests fail because they don't have instanceId defined but we're passing instanceId.
That is because "registering it for all ITs". If we do not register it for all ITs then irrelevant tests won't fail.
| return builder.build(); | ||
| } | ||
|
|
||
| public static BigtableIO.ReadChangeStream buildTestPipelineInput( |
There was a problem hiding this comment.
ack. If there are other options needed then good to promote the scope of this Util class
| "--tempRoot=${gcpTempRoot}", | ||
| "--firestoreDb=${firestoreDb}", | ||
| "--host=${host}", | ||
| "--instanceId=${instanceId}", |
There was a problem hiding this comment.
@Abacn Want to bring your attention here. I want to be able to run something like this
./gradlew :sdks:java:io:google-cloud-platform:integrationTest --tests=org.apache.beam.sdk.io.gcp.bigtable.changestreams.it.BigtableChangeStreamIT -PgcpProject=my-test-project -PinstanceId=my-test-instance
BigtableTestOptions has a field instanceId but in order to pass a specific string that's not the default string, the only that I found to do it is by adding it here.
This works fine for ChangeStreamIT because it registers BigtableTestOptions which defines instanceId, so we can pass --instanceId to beamTestPipelineOptions. However, all other integration tests that do not register BigtableTestOptions will fail because we are passing --instanceId to beamTestPipelineOptions but they don't have an option that reigsters instanceId.
That's why I moved the BigtableTestOptions to the main directory and registered it by default.
Is there an alternative way to pass instanceId to my test that does not require explicitly passing instanceId here?
There was a problem hiding this comment.
This works fine for ChangeStreamIT because it registers BigtableTestOptions which defines instanceId, so we can pass --instanceId to beamTestPipelineOptions. However, all other integration tests that do not register BigtableTestOptions will fail because we are passing --instanceId to beamTestPipelineOptions but they don't have an option that reigsters instanceId.
Thanks, understand. For irrelevant test this was due to --instanceId pipeline option not recognized, as you stated. This happens because currently all gcp integration tests are triggered by a single gradle command. Restructure test is a more substantial change. For now, we can keep it in main if this is the only approach to make it work, but consider to make the option specialized, like
- use "--bigtableInstanceId" instanceId is too generic. The options in main scope will affect all users including customers
…ecifying Change-Id: Ic66c4c061ed2f5979f6a530905e3cbbddd238f18
…es to avoid conflicts Change-Id: I489850e07812058e8c7ebb3c9878eae9d4bc9f06
Add a basic set of integration tests for Cloud Bigtable Change Stream.
Add functionality
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.