feat: lower `repartition_file_min_size` default from 10 MiB to 1 MiB #22439
run benchmarks
🤖 Benchmark running (GKE): comparing lower-repartition-file-min-size (5f6b84f) to 50d74a7 (merge-base) using: tpch
🤖 Benchmark running (GKE): comparing lower-repartition-file-min-size (5f6b84f) to 50d74a7 (merge-base) using: clickbench_partitioned
🤖 Benchmark running (GKE): comparing lower-repartition-file-min-size (5f6b84f) to 50d74a7 (merge-base) using: tpcds
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
5f6b84f to 918f2f5
`repartition_file_min_size` gates how aggressively `repartitioned()` splits file groups by byte range to fan a scan out across `target_partitions` worth of cores. At 10 MiB, the default leaves several SF1-sized dimension tables (TPC-H `part` ≈ 24 MiB, TPC-DS `customer_address` ≈ 7 MiB, …) on a single partition, so any CPU-bound per-batch work in the scan (filter evaluation, dictionary expansion, etc.) runs single-threaded even when the cluster has plenty of idle cores. At 1 MiB, those same files split cleanly into `target_partitions` byte ranges; e.g. TPC-H Q22 drops from 30 ms to 17 ms (~1.75× faster) on a 12-core SF1 run by parallelising the `part_with_promo` filter. The cost (more `open()` calls, more metadata loads) is small: in the worst case 10 opens per file instead of 1, each amortised over the row-group / page-index reads. The existing knob is still available for workloads where it matters. The csv_files.slt reset is switched from `SET ... = 10485760` to `RESET ...` so the test continues to round-trip the configured default regardless of what that default is.
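To illustrate why the test cleanup uses `RESET` instead of re-`SET`ting the old value, here is a minimal Rust model of session settings. It assumes that `SET` pins an explicit per-session override and `RESET` drops that override so the compiled-in default applies again. `SessionSettings` and its methods are hypothetical and are not DataFusion's configuration API.

```rust
use std::collections::HashMap;

// Hypothetical model (not DataFusion's config API): compiled-in defaults
// plus per-session overrides, mimicking SQL `SET` / `RESET` semantics.
struct SessionSettings {
    defaults: HashMap<&'static str, u64>,
    overrides: HashMap<&'static str, u64>,
}

impl SessionSettings {
    fn new(defaults: &[(&'static str, u64)]) -> Self {
        Self {
            defaults: defaults.iter().copied().collect(),
            overrides: HashMap::new(),
        }
    }

    /// `SET key = value`: pins an explicit value for this session.
    fn set(&mut self, key: &'static str, value: u64) {
        self.overrides.insert(key, value);
    }

    /// `RESET key`: removes the override so the current default applies.
    fn reset(&mut self, key: &'static str) {
        self.overrides.remove(key);
    }

    /// Effective value: the override if present, otherwise the default.
    fn get(&self, key: &str) -> Option<u64> {
        self.overrides.get(key).or_else(|| self.defaults.get(key)).copied()
    }
}

const KEY: &str = "repartition_file_min_size";

fn main() {
    const MIB: u64 = 1024 * 1024;
    // Model the new compiled-in default of 1 MiB.
    let mut session = SessionSettings::new(&[(KEY, MIB)]);

    // Old-style cleanup: hard-codes the previous default (10 MiB).
    session.set(KEY, 10_485_760);
    println!("after SET:   {:?}", session.get(KEY)); // Some(10485760)

    // New-style cleanup: RESET falls back to whatever the default is now.
    session.reset(KEY);
    println!("after RESET: {:?}", session.get(KEY)); // Some(1048576)
}
```

In this model, hard-coding `10485760` would leave the rest of the session at the old 10 MiB value once the default becomes 1 MiB, whereas `RESET` tracks the default automatically.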
918f2f5 to af56183
@Dandandan I know you evaluated this in #19690 and then closed the issue, presumably because the benchmarks did not reproduce an improvement you were seeing locally. Maybe it's just the new benchmark runner, but it looks like there are consistent improvements for TPC-H and some minor improvements for TPC-DS.
Yeah, I agree it looks good!
Thanks!
Summary
`repartition_file_min_size` gates how aggressively `repartitioned()` splits file groups by byte range to fan a scan out across `target_partitions` worth of cores. At 10 MiB, the default leaves several SF1-sized dimension tables (TPC-H `part` ≈ 24 MiB, TPC-DS `customer_address` ≈ 7 MiB, …) on a single partition, so any CPU-bound per-batch work in the scan (filter evaluation, dictionary expansion, etc.) is single-threaded even when the cluster has plenty of idle cores. At 1 MiB, those same files split cleanly into `target_partitions` byte ranges. The cost (more `open()` calls, more metadata loads) is small in absolute terms (≤10 extra opens per file in the worst case, each amortised over the row-group / page-index reads), and the existing knob is still available for workloads where it matters.
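To make the gating concrete, here is a minimal Rust sketch of a size-gated byte-range split. It is a simplified model, not DataFusion's actual `repartitioned()` logic: the helper `plan_byte_ranges` is hypothetical, it looks at a single file in isolation, and it assumes the file is cut into `target_partitions` roughly equal ranges once its size reaches the threshold.

```rust
// Simplified model of size-gated byte-range repartitioning.
// ASSUMPTION: `plan_byte_ranges` is hypothetical; it is not DataFusion's
// `repartitioned()` implementation, only an illustration of the threshold idea.

const MIB: u64 = 1024 * 1024;

/// Returns the half-open byte ranges `[start, end)` that one file is scanned as.
fn plan_byte_ranges(file_size: u64, target_partitions: u64, min_size: u64) -> Vec<(u64, u64)> {
    // Below the threshold, or nothing to fan out: keep a single range,
    // i.e. the whole file is read by one partition (one core).
    if file_size == 0 || target_partitions <= 1 || file_size < min_size {
        return vec![(0, file_size)];
    }
    // Ceiling division so the ranges cover every byte exactly once.
    let chunk = (file_size + target_partitions - 1) / target_partitions;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < file_size {
        let end = (start + chunk).min(file_size);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

fn main() {
    let file_size = 7 * MIB; // roughly the size quoted for TPC-DS `customer_address`
    let target_partitions = 12; // e.g. a 12-core machine

    let old = plan_byte_ranges(file_size, target_partitions, 10 * MIB);
    let new = plan_byte_ranges(file_size, target_partitions, MIB);

    println!("10 MiB threshold: {} range(s)", old.len()); // 1
    println!(" 1 MiB threshold: {} range(s)", new.len()); // 12
}
```

Under these assumptions, a 7 MiB file stays as one range (one core) with the 10 MiB threshold and becomes 12 contiguous ranges with the 1 MiB threshold, which is what lets per-batch filter work spread across cores.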
Benchmark numbers
12-core, SF1, with the existing dynamic-filter-pushdown defaults preserved:
Test plan