This repository was archived by the owner on Feb 11, 2026. It is now read-only.

B 22658 enhancements #13

Merged
cameroncaci merged 5 commits into main from b-22658-enhancements on Mar 20, 2025

Conversation


@cameroncaci cameroncaci commented Mar 19, 2025

B-22658

Summary

This enhancement allows for parallel processing at the table extraction level. Previously, we introduced parallel processing so that more than one table could be extracted at a time, but this introduced a new bottleneck. At the end of a full database extraction, if a very large table remained (audit_history, for example), only 1 processor would be extracting it from SQL to JSON. This was slow enough to cause timeouts while 5 processors sat unused. Now, all processors will be used to convert every batch in the table to JSON, and the results are then uploaded to S3. We queue all tables and batches to the CPU; by queueing them all without limit, we let the processors pick up every job as they become free.
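To illustrate the queueing described above, here is a minimal sketch, not the code from this PR. It assumes each table's row count is known up front, and the names `plan_jobs`, `batch_to_json`, and `extract_all` are hypothetical. The worker fabricates rows instead of querying SQL, and the pool is passed in so that either a thread pool or a process pool can be used.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def plan_jobs(row_counts, batch_size):
    """Return one (table, offset, limit) job per batch of every table."""
    jobs = []
    for table, total in row_counts.items():
        for offset in range(0, total, batch_size):
            jobs.append((table, offset, min(batch_size, total - offset)))
    return jobs


def batch_to_json(job):
    """Hypothetical worker: stands in for the real SQL -> JSON conversion."""
    table, offset, limit = job
    rows = [{"table": table, "row": offset + i} for i in range(limit)]
    return table, offset, json.dumps(rows)


def extract_all(row_counts, batch_size, executor):
    """Queue every batch of every table at once; the pool decides the order."""
    futures = [executor.submit(batch_to_json, job)
               for job in plan_jobs(row_counts, batch_size)]
    results = {}
    for future in as_completed(futures):
        table, offset, payload = future.result()
        results.setdefault(table, {})[offset] = payload
    return results


if __name__ == "__main__":
    # Assumed sizes for illustration only.
    with ThreadPoolExecutor(max_workers=6) as pool:
        out = extract_all({"audit_history": 10, "small_table": 3}, 4, pool)
    print({table: sorted(batches) for table, batches in out.items()})
```

Because a large table contributes many small jobs rather than one long one, no single worker ends up holding the whole table while the others sit idle.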

At the parent table level, the “manager” awaits a response from the worker and does not queue a job until a worker event triggers it to act. This prevents thread hogging and allows the workers to extract SQL to JSON without interruption. When the lower-level manager receives its event, it uploads the buffer if its byte size has reached the 50MB threshold or if it is the last batch. We then clear the buffer from memory immediately and let the next worker send its SQL -> JSON data up for the next upload.
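The buffering rule can be sketched as follows. This is an illustration under assumptions rather than the PR's implementation: `UploadBuffer` and its `upload` callback are hypothetical names, 50MB is taken to mean 50 * 1024 * 1024 bytes, and payloads are simply concatenated, which may differ from how the real code combines batches before sending them to S3.

```python
class UploadBuffer:
    """Collects worker payloads and uploads at a size threshold or on the last batch."""

    def __init__(self, upload, threshold_bytes=50 * 1024 * 1024):
        self.upload = upload                  # hypothetical callback, e.g. an S3 put
        self.threshold_bytes = threshold_bytes
        self.chunks = []
        self.size = 0

    def add(self, payload: bytes, is_last: bool = False):
        """Called by the manager each time a worker event delivers a batch."""
        self.chunks.append(payload)
        self.size += len(payload)
        if self.size >= self.threshold_bytes or is_last:
            self.flush()

    def flush(self):
        """Upload whatever is buffered, then release it from memory right away."""
        if not self.chunks:
            return
        self.upload(b"".join(self.chunks))   # assumption: plain concatenation
        self.chunks = []
        self.size = 0


if __name__ == "__main__":
    uploads = []
    buf = UploadBuffer(uploads.append, threshold_bytes=10)
    buf.add(b"12345")                 # below threshold, stays buffered
    buf.add(b"678901")                # crosses threshold, triggers an upload
    buf.add(b"tail", is_last=True)    # last batch always uploads
    print(len(uploads))
```

Keeping the upload decision in the manager means workers only convert and hand off data, and the buffer never grows far past the threshold before it is cleared.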

Links

Link to a fresh database extraction in Loadtest (6 processors)
https://us-gov-west-1.console.amazonaws-us-gov.com/cloudwatch/home?region=us-gov-west-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fdata-warehouse-data-warehouse/log-events/2025$252F03$252F19$252F$255B$2524LATEST$255D111ca372e0d3471bb1edd8c60e795515

Link to an incremental extraction in Loadtest (2 processors)
https://us-gov-west-1.console.amazonaws-us-gov.com/cloudwatch/home?region=us-gov-west-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fdata-warehouse-data-warehouse/log-events/2025$252F03$252F19$252F$255B$2524LATEST$255D96b4ac03e9ac4d4d8d6fe2d8c8a30923

Per the docs for database extraction for Advana, a full, fresh extraction must use 6 processors. Incremental works fine with 2 processors (based on Loadtest; we'll see once it is in stg/prd whether it needs more. We sent a notice to Advana that we'd like to test it in stg).


@traskowskycaci traskowskycaci left a comment


Significant performance increase here. Very nice!


@cameroncaci cameroncaci merged commit 72244c2 into main Mar 20, 2025
@cameroncaci cameroncaci deleted the b-22658-enhancements branch March 20, 2025 12:14
3 participants