[Python] Disable soft delete policy when creating new default bucket. #31344
Abacn merged 4 commits into apache:master
Conversation
Assigning reviewers: R: @riteshghorse for label python. If you would like to opt out of this review, comment with one of the bot's available commands.
The PR bot will only process comments in the main thread (not review comments).
R: @Abacn

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control.
Abacn left a comment:

Thanks. I left a few comments regarding the integration test that was added.
```python
        result[1], 're-delete should not throw error: %s' % result[1])

  @pytest.mark.it_postcommit
  @mock.patch('apache_beam.io.gcp.gcsio.default_gcs_bucket_name')
```
It doesn't sound quite right that a PostCommit test needs a mock. And this mock isn't mocking a fake service; it's used to override the naming of the temp bucket. What happens if we don't hack it?

Also, this test does not run a pipeline, so should we configure it to run only on test-suites:direct:py3xx:postCommitIT? Presumably it currently runs on the Dataflow PostCommit IT suites, which is not quite right.
Thanks for the review, @Abacn. Below are my responses.

> It doesn't sound quite right that a PostCommit test needs a mock. And this mock isn't mocking a fake service; it's used to override the naming of the temp bucket. What happens if we don't hack it?

For a given project, default_gcs_bucket_name returns a fixed bucket name as the default. If we don't override it, we would need to create a dedicated project (rather than using apache-beam-testing, or whatever project users provide when running this test). Per the offline discussion with @damccorm, it seems overkill to create a project and then remove it afterward just for this test. Using mocking is a bit of a "hack", but it keeps the code clean. I am open to better suggestions.
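The patching pattern described above can be sketched as follows. This is an illustrative, self-contained example: `GcsIO`, `default_bucket_name`, and the bucket names are stand-ins, not the actual Beam classes; in the real test, `mock.patch` targets `apache_beam.io.gcp.gcsio.default_gcs_bucket_name`.

```python
from unittest import mock

class GcsIO:
    """Stand-in for the real client; not the Beam apache_beam.io.gcp.gcsio.GcsIO."""
    def default_bucket_name(self, project):
        # The real helper derives a fixed per-project default bucket name.
        return 'dataflow-staging-us-central1-' + project

gcs = GcsIO()

# Patch the helper so the test operates on a throwaway bucket name
# instead of the project's real default bucket.
with mock.patch.object(
        GcsIO, 'default_bucket_name', return_value='gcsio-it-temp-bucket'):
    name = gcs.default_bucket_name('apache-beam-testing')

assert name == 'gcsio-it-temp-bucket'

# Outside the patch context, the original behavior is restored.
assert gcs.default_bucket_name('proj').startswith('dataflow-staging-')
```

Because the patch is scoped to the `with` block (or to a decorated test method, as in the PR), the override cannot leak into other tests.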
> Also, this test does not run a pipeline, so should we configure it to run only on test-suites:direct:py3xx:postCommitIT? Presumably it currently runs on the Dataflow PostCommit IT suites, which is not quite right.

The other tests under gcsio_integration_test.py also test gcsio functionality with actual GCS operations, and they don't trigger any pipeline runs either.
Abacn left a comment:

Thanks for the explanation. This sounds good to me.
```python
    if existing_bucket:
      existing_bucket.delete()

    bucket = gcsio.get_or_create_default_gcs_bucket(google_cloud_options)
```
Just realized: in case upstream code changes and the mock is no longer effective, the following will delete the real default bucket. We should assert that the created bucket is the one with the injected name, to guard against deleting the real default bucket.
Good idea. Added the check. PTAL
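The guard described above can be sketched like this. It is a hypothetical illustration: `FakeBucket`, `cleanup_test_bucket`, and `INJECTED_BUCKET_NAME` are invented names standing in for the real test code, so the sketch runs without GCS credentials.

```python
INJECTED_BUCKET_NAME = 'gcsio-it-temp-bucket'  # the name the mock injects

class FakeBucket:
    """Minimal stand-in for a google-cloud-storage Bucket."""
    def __init__(self, name):
        self.name = name
        self.deleted = False

    def delete(self):
        self.deleted = True

def cleanup_test_bucket(bucket):
    # If the mock ever stops taking effect, bucket.name would be the
    # project's real default bucket; fail loudly instead of deleting it.
    assert bucket.name == INJECTED_BUCKET_NAME, (
        'refusing to delete unexpected bucket: %s' % bucket.name)
    bucket.delete()

# The throwaway test bucket is deleted as expected.
safe = FakeBucket(INJECTED_BUCKET_NAME)
cleanup_test_bucket(safe)
assert safe.deleted

# A bucket with any other name trips the guard and survives.
real = FakeBucket('dataflow-staging-us-central1-project')
try:
    cleanup_test_bucket(real)
except AssertionError:
    pass
assert not real.deleted
```

The assertion turns a silent, destructive failure mode (deleting the real default bucket) into an immediate, visible test failure.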
Run Python_Transforms PreCommit 3.8
Run Python_Transforms PreCommit 3.9
Run Python_Transforms PreCommit 3.10
Run Python_Transforms PreCommit 3.11
Addresses #31330 in the Python SDK. The logic is similar to #31324.
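The shape of the fix can be sketched as follows. This is a hedged illustration, not the actual Beam change: the `soft_delete_policy.retention_duration_seconds` attribute follows the google-cloud-storage client API, while `FakeSoftDeletePolicy`, `FakeBucket`, and `create_default_bucket` are stand-ins so the sketch runs without GCS credentials.

```python
class FakeSoftDeletePolicy:
    """Stand-in for google.cloud.storage.bucket.SoftDeletePolicy."""
    def __init__(self):
        # GCS enables soft delete by default with a 7-day retention.
        self.retention_duration_seconds = 7 * 24 * 3600

class FakeBucket:
    """Stand-in for a google-cloud-storage Bucket."""
    def __init__(self, name):
        self.name = name
        self.soft_delete_policy = FakeSoftDeletePolicy()

    def create(self):
        pass  # the real client issues the bucket-creation API call here

def create_default_bucket(bucket):
    # Setting retention to 0 disables soft delete, so temporary staging
    # objects are not retained (and billed) after deletion.
    bucket.soft_delete_policy.retention_duration_seconds = 0
    bucket.create()
    return bucket

b = create_default_bucket(FakeBucket('dataflow-staging-us-central1-proj'))
assert b.soft_delete_policy.retention_duration_seconds == 0
```

The key point is that the policy is set before `create()`, so the default staging bucket never accrues soft-deleted objects in the first place.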