B 22658 enhancements by cameroncaci · Pull Request #13 · transcom/aws-data-warehouse-lambda

cameroncaci · 2025-03-19T18:23:19Z

B-22658

Summary

This enhancement adds parallel processing at the table extraction level. Previously, we introduced parallel processing so that more than 1 table could be extracted at a time, but this introduced a new bottleneck. At the end of a full database extraction, if a very large table remained (audit_history, for example), only 1 processor would be extracting it from SQL to JSON. This caused timeouts because it was too slow while 5 processors sat unused. Now, all processors are used to convert every batch in the table to JSON, which is then uploaded to S3. We queue all tables and batches to the CPU. By queueing them all without limit, we let the processors pick up every job as they become available.
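As a rough illustration of the queueing idea above, the sketch below submits every (table, batch) pair to a pool up front and lets the pool decide which job runs next. It is not the code in this PR: the `TABLES` data, `iter_batches`, `batch_to_json`, and `extract_all` are hypothetical names, and `ThreadPoolExecutor` is used only as a stand-in so the example runs anywhere. The real Lambda may use a different mechanism for its processors.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory stand-in for SQL tables: table name -> list of rows.
TABLES = {
    "audit_history": [{"id": i} for i in range(10)],
    "moves": [{"id": i} for i in range(3)],
}


def iter_batches(rows, batch_size):
    """Yield consecutive slices of rows, batch_size rows at a time."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def batch_to_json(table, index, batch):
    """Worker job: serialize one batch of rows to a JSON string."""
    return table, index, json.dumps(batch)


def extract_all(tables, batch_size=4, max_workers=6):
    """Queue every (table, batch) pair at once and collect the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # No per-table limit: a large table's batches are spread over all workers.
        futures = [
            pool.submit(batch_to_json, name, i, batch)
            for name, rows in tables.items()
            for i, batch in enumerate(iter_batches(rows, batch_size))
        ]
        for future in as_completed(futures):
            table, index, payload = future.result()
            results.setdefault(table, {})[index] = payload
    return results
```

Calling `extract_all(TABLES)` returns a dict keyed by table name, with one JSON string per batch index, so the batches of a single large table no longer have to wait on one worker.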

At the parent table level, the “manager” awaits a response from the worker and does not queue a job until a worker event triggers it to act. This prevents thread hogging and lets the workers extract SQL to JSON without interruption. When the lower-level manager receives its event, it uploads if the buffer has reached the 50MB threshold or the batch is the last one. We then clear the buffer from memory immediately and let the next worker send its SQL -> JSON data up for the next upload.
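The manager and worker handoff described above could look something like the following sketch. It assumes a `queue.Queue` as the event channel and plain threads as workers, and the names `worker`, `manager`, `run`, and the `upload` callback are all hypothetical; the actual implementation in this PR and its S3 upload call may differ.

```python
import json
import queue
import threading

BUFFER_LIMIT_BYTES = 50 * 1024 * 1024  # 50MB threshold named in the description


def worker(batch, results):
    """Convert one batch of rows to a JSON line and hand it to the manager."""
    results.put((json.dumps(batch) + "\n").encode("utf-8"))


def manager(results, total_batches, upload, limit=BUFFER_LIMIT_BYTES):
    """Wait for worker output; upload when the buffer is full or on the last batch."""
    buffer = bytearray()
    for received in range(1, total_batches + 1):
        buffer.extend(results.get())  # blocks here until a worker delivers data
        is_last = received == total_batches
        if len(buffer) >= limit or is_last:
            upload(bytes(buffer))  # hypothetical callback standing in for the S3 upload
            buffer.clear()  # free the memory before the next worker's data arrives


def run(batches, upload, limit=BUFFER_LIMIT_BYTES):
    """Start one worker thread per batch and let the manager drain their output."""
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(b, results)) for b in batches]
    for t in threads:
        t.start()
    manager(results, len(batches), upload, limit)
    for t in threads:
        t.join()
```

For example, `run(batches, uploads.append)` with a list named `uploads` collects each uploaded chunk. Because the manager blocks on `results.get()`, it does no work of its own while the workers are still serializing.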

Links

Link to a fresh database extraction in Loadtest (6 processors)
https://us-gov-west-1.console.amazonaws-us-gov.com/cloudwatch/home?region=us-gov-west-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fdata-warehouse-data-warehouse/log-events/2025$252F03$252F19$252F$255B$2524LATEST$255D111ca372e0d3471bb1edd8c60e795515

Link to an incremental extraction in Loadtest (2 processors)
https://us-gov-west-1.console.amazonaws-us-gov.com/cloudwatch/home?region=us-gov-west-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fdata-warehouse-data-warehouse/log-events/2025$252F03$252F19$252F$255B$2524LATEST$255D96b4ac03e9ac4d4d8d6fe2d8c8a30923

Per the docs for database extraction for Advana, a full, fresh extraction must use 6 processors. Incremental extraction works fine with 2 processors in Loadtest; we'll see once it is in stg/prd whether it needs more. A notice has been sent to Advana that we'd like to test it in stg.

traskowskycaci

Significant performance increase here. Very nice!

cameroncaci · 2025-03-19T19:45:02Z

Advana concurred for a stg dump, logs here: https://us-gov-west-1.console.amazonaws-us-gov.com/cloudwatch/home?region=us-gov-west-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fdata-warehouse-data-warehouse/log-events/2025$252F03$252F19$252F$255B$2524LATEST$255D050f7fcf099c45ed8f40154ac1dbb618

cameroncaci added 5 commits March 14, 2025 14:45

swap if statement for readability

593c25f

Initial implementation of table extraction parallel processing suppor…

5434faa

…t. WARNING: This commit currently thread locks and is not 1-2 processor friendly

yield processors for workers

b961ab8

proper manager waiting and multi threading

4cd0dea

pass batch as a generator to workers and improve logging

9d65076

cameroncaci self-assigned this Mar 19, 2025

cameroncaci requested review from a team, TevinAdams, danieljordan-caci, deandreJones and traskowskycaci March 19, 2025 18:23

traskowskycaci approved these changes Mar 19, 2025

danieljordan-caci approved these changes Mar 19, 2025

cameroncaci merged commit 72244c2 into main Mar 20, 2025

cameroncaci deleted the b-22658-enhancements branch March 20, 2025 12:14
