Persist S3 client through Bucket __init__ #1
jimas14 wants to merge 3 commits into amanagr:thumbor-7
Conversation
    endpoint_url=endpoint,
    config=config
)
if self._client is None:
so, from what I understand, this is the only required change here. Other changes are rather cosmetic.
Yes, this is the operative change. The rest are PEP8 changes and very minor performance changes.
kkopachev left a comment
This looks good. I'd only keep Bucket.__init__ changes as most relevant.
The ultimate solution to this trouble would be to modify Thumbor so it can call pre-init methods on storage and other modules, and clean up right before server shutdown.
@kkopachev Thanks so much for the review. I just updated the PR if you could take another look, and I've updated the PR description with our latest findings.
This is awesome @jimas14! Great catch on the HeadObject operation!
Potentially relevant: thumbor-community#147 (comment) |
Our team was tasked with upgrading Thumbor (from 6.7.0) so it could preserve IPTC metadata. This requires a custom engine, https://github.com/scorphus/thumbor-wand-engine, which requires Python 3, which in turn requires Thumbor 7.0.0a5. After putting all the pieces together, including the code from @amanagr's PR thumbor-community#147 to accommodate async Thumbor 7, things worked, but under load testing we noticed a ton of asyncio errors and a minimum response time of over 3000ms.
We use the S3 loader, loader storage & result storage for our requests, and we noticed that S3 client creation / Bucket initialization was being called upwards of 5 times per request.
We took a pass at improving this by creating the S3 client only on Bucket's first init call and reusing that client for every subsequent request. These changes alone brought the minimum response time down to under 200ms (matching our 6.7.0 implementation) and performed very well in load tests (15 containers, numprocs=2, 1000 random requests by 10 users -- the same parameters as the first load test).
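The persisted-client idea can be sketched roughly like this. This is a minimal, stand-alone sketch: the class shape and the placeholder dict are assumptions, standing in for the real aiobotocore client setup (`endpoint_url`, `config`, etc.) in the PR.

```python
class Bucket:
    # Class-level cache: created on the first __init__ call,
    # then reused by every subsequent Bucket instance/request
    _client = None

    def __init__(self, name, region, endpoint):
        self.name = name
        if Bucket._client is None:
            # In the real PR this is where the aiobotocore S3 client
            # is created (session, endpoint_url, config, ...); a dict
            # stands in here so the sketch runs anywhere
            Bucket._client = {"region": region, "endpoint": endpoint}
        self.client = Bucket._client


b1 = Bucket("thumbs", "us-east-1", None)
b2 = Bucket("results", "us-east-1", None)
print(b1.client is b2.client)  # True: one client shared across requests
```

Because the expensive client construction happens once per process instead of several times per request, the per-request cost drops to a dictionary lookup.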
Update (7/14): After noticing huge latency spikes with this code in production, we took another pass at optimization. Testing locally, I saw that after N consecutive requests, processing would get blocked and lead to a ~45 second response. After analyzing the code again, I realized that `Bucket.exists` was never actually awaiting the `response['Body'].read()` call on the `storage.get` response, so the stream was never closed, leading to lots of asyncio `Unclosed response` errors (see aio-libs/aiobotocore#68). I updated the `exists` method to drop `storage.get` altogether in favor of `HeadObject`, which returns only metadata, so there are no streams to be read from. Then I updated the rest of the `response['Body'].read()` calls to be wrapped in try/with blocks so they're more resource-safe. I also removed the extra changes that @kkopachev recommended leaving out. I'm very interested to see what you all think. A couple of other small refactors were added as well. Credits to @cselvaraj for the idea & implementation.
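The two fixes described in the update can be sketched together. This is a hedged, self-contained illustration using fake client and body classes (assumptions, so the sketch runs without boto installed); the real code talks to aiobotocore, whose `head_object` call and streaming `Body` behave analogously.

```python
import asyncio


class FakeBody:
    """Stand-in for an aiobotocore streaming response body."""

    def __init__(self, data):
        self._data = data
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        self.closed = True  # the context manager guarantees the stream closes

    async def read(self):
        return self._data


class FakeClient:
    """Stand-in for an aiobotocore S3 client."""

    def __init__(self, objects):
        self._objects = objects

    async def head_object(self, Bucket, Key):
        # Metadata only: nothing to stream, nothing to leave unclosed
        if Key not in self._objects:
            raise KeyError(Key)  # real clients raise a botocore ClientError
        return {"ContentLength": len(self._objects[Key])}

    async def get_object(self, Bucket, Key):
        return {"Body": FakeBody(self._objects[Key])}


async def exists(client, bucket, key):
    # HeadObject replaces the old storage.get-based existence check
    try:
        await client.head_object(Bucket=bucket, Key=key)
        return True
    except KeyError:
        return False


async def get_bytes(client, bucket, key):
    response = await client.get_object(Bucket=bucket, Key=key)
    # async with closes the body even if read() raises, so no
    # "Unclosed response" warnings pile up under load
    async with response["Body"] as stream:
        return await stream.read()


client = FakeClient({"images/a.jpg": b"\xff\xd8"})
print(asyncio.run(exists(client, "b", "images/a.jpg")))    # True
print(asyncio.run(get_bytes(client, "b", "images/a.jpg")))  # b'\xff\xd8'
```

The key point is that the existence check now never opens a stream at all, and the one code path that does open a stream closes it deterministically.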
cc @amanagr @kkopachev
A couple of explanations: The class rename from `Storage` to `S3ResultStorage` was purely for easier identification in the profiler. The broad `except Exception` in `s3_storage.py` was added because the NoSuchKey exception wasn't being caught by `ClientError` or `BotoCoreError`, and I couldn't find where the NoSuchKey exception actually lives.
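One common way to narrow that broad `except` is to catch the generic client error and inspect the service error code it carries in `response["Error"]["Code"]` (for a missing S3 key, the code is `NoSuchKey`). A hedged sketch, with a stub exception class standing in for `botocore.exceptions.ClientError` so it runs anywhere:

```python
class ClientErrorStub(Exception):
    """Stand-in for botocore.exceptions.ClientError (illustration only)."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def get_or_none(fetch, key):
    # Catch the generic client error, but swallow only the
    # missing-key case; anything else is re-raised
    try:
        return fetch(key)
    except ClientErrorStub as err:
        if err.response["Error"]["Code"] == "NoSuchKey":
            return None
        raise


def fetch(key):
    raise ClientErrorStub("NoSuchKey")


print(get_or_none(fetch, "missing.jpg"))  # None
```

This keeps unexpected failures (throttling, access denied, network errors) visible instead of silently treating them as a cache miss.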