
sources: Speed up MySQL ingestion by parallelizing #35569

Open
def- wants to merge 1 commit into MaterializeInc:main from def-:pr-pg-mysql-speedup

@def- def- commented Mar 20, 2026

Contributor

On a cluster with 8 workers, running on my 8 core / 16 thread dev server:

$ bin/mzcompose --find feature-benchmark down && bin/mzcompose --find feature-benchmark --release run default --scenario=MySqlInitialLoadMultiWorker --scale=+1
[...]
NAME                                | TYPE            |      THIS       |      OTHER      |  UNIT  | 'THIS' is
---------------------------------------------------------------------------------------------------------------------------------
MySqlInitialLoadMultiWorker         | wallclock       |           2.042 |          11.416 |   s    |    better:  5.6 times faster
MySqlInitialLoadMultiWorker         | memory_mz       |        1213.754 |        1019.012 |   MB   |    worse:  19.1% more
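The two summary figures in the last column can be reproduced from the THIS and OTHER columns. A minimal sketch, assuming the benchmark reports wallclock as the ratio OTHER / THIS and memory as the relative increase of THIS over OTHER (the helper names are hypothetical, not part of the feature-benchmark framework):

```python
def times_faster(this_seconds: float, other_seconds: float) -> float:
    """Speedup of THIS relative to OTHER (assumed formula: OTHER / THIS)."""
    return other_seconds / this_seconds


def percent_more(this_value: float, other_value: float) -> float:
    """Relative increase of THIS over OTHER, in percent (assumed formula)."""
    return (this_value / other_value - 1.0) * 100.0


if __name__ == "__main__":
    # Values copied from the benchmark table above.
    print(f"{times_faster(2.042, 11.416):.1f} times faster")  # 5.6 times faster
    print(f"{percent_more(1213.754, 1019.012):.1f}% more")    # 19.1% more
```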

Co-written with Claude 🤖

@github-actions

Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@def- def- force-pushed the pr-pg-mysql-speedup branch 3 times, most recently from 9c63b2a to f7effd1 Compare March 20, 2026 14:17
@def- def- changed the title Try to speed up Postgres/MySQL ingestion Try to speed up MySQL ingestion Mar 20, 2026
@def- def- force-pushed the pr-pg-mysql-speedup branch 10 times, most recently from 2ee8d42 to 2e9f0da Compare March 22, 2026 13:04
@def- def- changed the title Try to speed up MySQL ingestion Speed up MySQL ingestion by parallelizing Mar 22, 2026
@def- def- changed the title Speed up MySQL ingestion by parallelizing sources: Speed up MySQL ingestion by parallelizing Mar 23, 2026
@def- def- force-pushed the pr-pg-mysql-speedup branch 4 times, most recently from db20263 to f97084a Compare March 26, 2026 14:40
@def- def- marked this pull request as ready for review March 26, 2026 16:44
@def- def- requested a review from a team as a code owner March 26, 2026 16:44
@def- def- force-pushed the pr-pg-mysql-speedup branch from f97084a to 96687b4 Compare April 20, 2026 00:00
@def- def- force-pushed the pr-pg-mysql-speedup branch from 96687b4 to 792128d Compare May 27, 2026 13:02
@def- def- requested a review from a team as a code owner May 27, 2026 13:02
@def- def- requested a review from martykulma May 27, 2026 16:12
@def- def- force-pushed the pr-pg-mysql-speedup branch from 792128d to 11d7b39 Compare June 18, 2026 21:48
Partition each table's primary-key range across timely workers so they
read disjoint PK ranges of the initial snapshot concurrently, reducing
initial-load time for large tables with a single-column integer primary
key. A snapshot leader establishes the consistent point and broadcasts
SnapshotInfo to all workers over a timely feedback loop; each worker then
reads its range under a CONSISTENT SNAPSHOT transaction. Tables without a
suitable PK fall back to single-worker-per-table mode. Also adds a
MySqlInitialLoadMultiWorker feature benchmark.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
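
The partitioning described in the commit message can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code in this PR: it splits the inclusive PK range into equal-width slices (which only balances well when keys are roughly uniformly distributed), the names `partition_pk_range` and `worker_queries` are hypothetical, and the fallback arbitrarily assigns the whole table to worker 0. Identifiers are not escaped, and in the real flow each worker would run its query inside a `START TRANSACTION WITH CONSISTENT SNAPSHOT` transaction at the point the snapshot leader established.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (low, high) primary-key bounds


def partition_pk_range(min_pk: int, max_pk: int, workers: int) -> List[Range]:
    """Split [min_pk, max_pk] into at most `workers` disjoint, contiguous,
    inclusive sub-ranges whose sizes differ by at most one key."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    if min_pk > max_pk:
        return []  # empty table: nothing to read
    total = max_pk - min_pk + 1
    parts = min(workers, total)  # never create empty ranges
    base, extra = divmod(total, parts)
    ranges: List[Range] = []
    low = min_pk
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` slices get one more key
        high = low + size - 1
        ranges.append((low, high))
        low = high + 1
    return ranges


def worker_queries(
    table: str,
    pk_column: Optional[str],
    bounds: Optional[Range],
    workers: int,
) -> List[Optional[str]]:
    """Return one entry per worker: the snapshot query it runs, or None if idle."""
    if pk_column is None or bounds is None:
        # Fallback (assumption): no suitable PK, so one worker reads everything.
        return [f"SELECT * FROM {table}"] + [None] * (workers - 1)
    queries: List[Optional[str]] = [
        f"SELECT * FROM {table} WHERE {pk_column} BETWEEN {low} AND {high}"
        for low, high in partition_pk_range(bounds[0], bounds[1], workers)
    ]
    return queries + [None] * (workers - len(queries))


if __name__ == "__main__":
    for worker, query in enumerate(worker_queries("orders", "id", (1, 100), 8)):
        print(worker, query)
```

With 8 workers and keys 1 through 100, the first four workers get 13 keys each and the remaining four get 12, so every key is read by exactly one worker.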
@def- def- force-pushed the pr-pg-mysql-speedup branch from 11d7b39 to a7eed48 Compare June 19, 2026 07:10
