Skip to content

AWS: Don't complete multipart upload on finalize for S3OutputStream#10874

Merged
amogh-jahagirdar merged 1 commit into
apache:mainfrom
jasonf20:dont-upload-partial-files
Aug 20, 2024
Merged

AWS: Don't complete multipart upload on finalize for S3OutputStream#10874
amogh-jahagirdar merged 1 commit into
apache:mainfrom
jasonf20:dont-upload-partial-files

Conversation

@jasonf20
Copy link
Copy Markdown
Contributor

@jasonf20 jasonf20 commented Aug 4, 2024

If a user didn't call close on the stream the file shouldn't be uploaded.

This means that if the user forgets to call close() on a completed file it will no longer be uploaded. However, this is preferred to having files which were intentionally not closed being uploaded and creating garbage.

@github-actions github-actions Bot added the AWS label Aug 4, 2024
If a user didn't call close on the stream the file shouldn't be
uploaded.

This means that if the user forgets to call close() on a completed file
it will no longer be uploaded. However, this is preferred to having
files which were intentionally not closed being uploaded.
@jasonf20 jasonf20 force-pushed the dont-upload-partial-files branch from aecc55f to 1a5b6ee Compare August 4, 2024 11:33
@amogh-jahagirdar amogh-jahagirdar changed the title S3OutputStream: Don't complete upload on finalize AWS: Don't complete upload on finalize for S3OutputStream Aug 6, 2024
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally support this change. We should not finish upload on GC's because anything that should be written out as part of the table must have been explicitly closed. It also saves machines from doing extra work during the garbage collection (theoretically reducing pause times)

I also don't think we should be relying on GC finishing uploads so I'm not really concerned about that side effect change.

Would be great to get @danielcweeks @jackye1995 opinion as well in case they see any issues.

@jasonf20
Copy link
Copy Markdown
Contributor Author

HI @danielcweeks @jackye1995 would appreciate it if you can take a look.

@jasonf20
Copy link
Copy Markdown
Contributor Author

@amogh-jahagirdar any way to move this forward?

Copy link
Copy Markdown
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me as well

@amogh-jahagirdar amogh-jahagirdar changed the title AWS: Don't complete upload on finalize for S3OutputStream AWS: Don't complete multipart upload on finalize for S3OutputStream Aug 20, 2024
@amogh-jahagirdar
Copy link
Copy Markdown
Contributor

@jasonf20 Thanks for your patience I think we can move forward as is. @Fokko Thank you for the review!

@amogh-jahagirdar amogh-jahagirdar merged commit b76d81a into apache:main Aug 20, 2024
@jasonf20 jasonf20 deleted the dont-upload-partial-files branch August 21, 2024 18:02
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants