feat: Plumb Parquet virtual columns (row_number) through TableSchema and ParquetOpener by mbutrovich · Pull Request #22026 · apache/datafusion

mbutrovich · 2026-05-05T17:49:11Z

Which issue does this PR close?

Part of [EPIC] A collection of support for metadata columns in ListingTable #20135 (epic: virtual / metadata columns). Does not close that epic; see this comment describing the scope split.
Revives Expose virtual columns from the Arrow Parquet reader in datasource-parquet #20133 (auto-closed stale) — same core plumbing, credit to @jkylling.
Unblocks [native_datafusion] Add support for reading row index metadata columns datafusion-comet#3432 (remove native_datafusion fallback for Spark's _tmp_metadata_row_index).

Rationale for this change

arrow-rs 57.1.0+ supports Parquet virtual columns (row_number, row_group_index) via ArrowReaderOptions::with_virtual_columns, and DataFusion pins a new-enough arrow-rs for the API to be available. DataFusion does not yet plumb the option through ParquetOpener, so consumers (notably Comet) cannot project Spark's _tmp_metadata_row_index through the native_datafusion scan path.

This PR adds the minimal opener-boundary plumbing so TableSchema can carry virtual columns and the Parquet reader produces them. UX / SQL-layer surface for virtual columns stays deferred to the epic in #20135 — this follows the same framing alamb blessed for #20071 (the input_file_name() UDF).

What changes are included in this PR?

TableSchema::with_virtual_columns(...) builder + virtual_columns() getter. Layout: [file, partition, virtual]. Composable with with_table_partition_cols in either order.
TableSchema::schema_without_virtual_columns() — file + partition schema used by pushdown-planning paths that can't evaluate virtual-col refs.
ParquetOpener forwards the fields to ArrowReaderOptions::with_virtual_columns; augments the schemas passed to the expr-adapter / simplifier with virtual fields so virtual-col refs identity-rewrite; strips them from the projection fed to ProjectionMask::roots (which only understands file columns) and appends them to stream_schema so reassign_expr_columns resolves them by name.
New ParquetVirtualColumn enum with TryFrom<&FieldRef> (in datasource-parquet::virtual_column) gates which arrow-rs virtual extension types are accepted. Currently only RowNumber; adding a variant (e.g. RowGroupIndex) is a compile-time obligation. Replaces the earlier runtime string-allowlist so the contract lives in the type system.
ParquetSource::try_pushdown_filters classifies filters against the file+partition schema (not the full table schema) so predicates referencing virtual columns are reported as PushedDown::No and the FilterExec stays above the scan — arrow-rs's RowFilter addresses parquet leaves only and can't evaluate virtual-column refs, so silently pushing them would produce wrong results.
Defensive check in the opener: build_virtual_columns_state (run once per scan partition at morselizer-build time) errors when pushdown_filters=true and the predicate references a virtual column, with a clear remediation message pointing at try_pushdown_filters. This catches callers that bypass the optimizer and set the predicate on ParquetSource directly.
arrow-schema added as a direct dep (previously transitive via arrow) so the enum references RowNumber::NAME from arrow-rs instead of hardcoding the string.
Explicitly not in scope (follow-ups): ListingTable / SQL-layer surface, a three-arg constructor on TableSchema, ParquetSource::with_virtual_columns, and RowGroupIndex support.

Are these changes tested?

Yes. New unit tests in opener.rs:

test_row_index_basic — single row group, select data + row_number.
test_row_index_projection_only — select only row_number.
test_row_index_multi_row_group — 3 × 100 rows, verify absolute 0..300 across boundaries.
test_row_index_with_row_group_skip — predicate stats-prunes the middle row group; verify row numbers stay absolute (0..100 ++ 200..300). Critical correctness gate for Spark (and for Fix RowNumberReader when not all row groups are selected arrow-rs#8863).
test_row_index_with_partition_cols — partition + virtual + data columns compose correctly.
test_row_index_nullable_int64 — nullability flag flows through unchanged (matches Spark's _tmp_metadata_row_index declaration).
test_unsupported_virtual_extension_type_rejected — using RowGroupIndex (a real arrow-rs type deliberately not in the enum yet) errors with NotImplemented instead of silently forwarding.
test_row_index_predicate_pushdown_mixed_or_errors / _virtual_only_errors / _allowed_when_pushdown_disabled — exercise the opener's defensive check for virtual-col predicate refs with pushdown_filters=true, and confirm the pushdown_filters=false path is unaffected.
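
The row-group-skip case above can be illustrated with a small model. This is only a sketch of the invariant the test checks, not the arrow-rs reader: `RowGroup` and `absolute_row_numbers` are hypothetical names made up for this example, and pruning is reduced to a boolean per row group.

```rust
/// Hypothetical stand-in for a Parquet row group: only its row count matters here.
struct RowGroup {
    num_rows: u64,
}

/// Returns the row numbers a reader should emit for the selected row groups.
/// `selected[i]` says whether row group `i` survived pruning.
fn absolute_row_numbers(groups: &[RowGroup], selected: &[bool]) -> Vec<u64> {
    let mut out = Vec::new();
    // Absolute index of the first row of the row group being visited.
    let mut first_row = 0u64;
    for (group, keep) in groups.iter().zip(selected) {
        if *keep {
            out.extend(first_row..first_row + group.num_rows);
        }
        // Advance the offset even for skipped groups so numbering stays absolute.
        first_row += group.num_rows;
    }
    out
}
```

With three row groups of 100 rows each and the middle one pruned, this model yields 0..100 followed by 200..300, which is the shape test_row_index_with_row_group_skip asserts.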

In source.rs: test_try_pushdown_filters_rejects_virtual_column_refs pins the planner-boundary contract — file-col filters are PushedDown::Yes, virtual-only and mixed filters are PushedDown::No.

In virtual_column.rs: unit tests covering TryFrom<&FieldRef> for valid, missing-extension-type, and unsupported-extension-type inputs.

Plus a TableSchema unit test verifying the [file, partition, virtual] layout is stable regardless of builder-call order.
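
A minimal sketch of the layout guarantee, assuming a simplified model in which fields are plain names. `TableSchemaModel` and `names` are hypothetical; the method names mirror the ones listed in this PR, but the bodies are assumptions, not the DataFusion implementation.

```rust
/// Simplified, hypothetical model of a table schema; fields are just names.
#[derive(Default)]
struct TableSchemaModel {
    file: Vec<String>,
    partition: Vec<String>,
    virtual_cols: Vec<String>,
}

/// Helper for this sketch: turns string slices into owned names.
fn names(cols: &[&str]) -> Vec<String> {
    cols.iter().map(|c| c.to_string()).collect()
}

impl TableSchemaModel {
    fn new(file: &[&str]) -> Self {
        Self { file: names(file), ..Default::default() }
    }

    fn with_table_partition_cols(mut self, cols: &[&str]) -> Self {
        self.partition = names(cols);
        self
    }

    fn with_virtual_columns(mut self, cols: &[&str]) -> Self {
        self.virtual_cols = names(cols);
        self
    }

    /// Full table schema, always assembled as [file, partition, virtual].
    fn table_schema(&self) -> Vec<String> {
        self.file
            .iter()
            .chain(&self.partition)
            .chain(&self.virtual_cols)
            .cloned()
            .collect()
    }

    /// File + partition columns only, for paths that cannot see virtual columns.
    fn schema_without_virtual_columns(&self) -> Vec<String> {
        self.file.iter().chain(&self.partition).cloned().collect()
    }
}
```

Because each builder only stores its own group and the combined schema is assembled on demand, calling the two builders in either order gives the same layout in this model.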

Are there any user-facing changes?

Public API additions: TableSchema::with_virtual_columns(...), TableSchema::virtual_columns(), TableSchema::schema_without_virtual_columns(), and ParquetVirtualColumn (re-exported from datafusion-datasource-parquet). No existing API changed; no breaking changes.

…and ParquetOpener, gated behind a tested-only extension-type allowlist, to unblock Comet's native-DataFusion support for Spark's _tmp_metadata_row_index.

adriangb · 2026-05-05T19:20:52Z

My main concern is #22026 (comment).

The various schemas in opener.rs are already quite complex, this risks making it worse.

mbutrovich · 2026-05-05T20:08:37Z

> My main concern is #22026 (comment).
>
> The various schemas in opener.rs are already quite complex, this risks making it worse.

Thanks for the review @adriangb! Agreed it could make things more complicated, but if DataFusion is ever going to support these virtual columns it might be unavoidable. I think it's good to hash this stuff out in the smallest possible PR at the opener level. I'll push an update later today.

mbutrovich · 2026-05-05T20:52:29Z

Thanks again for the review @adriangb! Hopefully I addressed all of the feedback, but happy to keep chatting about it.

Mixed virtual/file predicates with pushdown_filters=true

Confirmed the silent-drop bug with failing tests. Root cause: ParquetSource::try_pushdown_filters called can_expr_be_pushed_down_with_schemas with the full table schema (now including virtual columns), so filters referencing row_number were marked PushedDown::Yes → FilterExec removed → the scan's build_row_filter couldn't resolve the virtual-col ref against physical_file_schema and silently dropped the conjunct.

Arrow-rs can't accept virtual-column refs in a RowFilter at all: ArrowPredicate::projection() returns a ProjectionMask over parquet leaves only, and virtual columns are synthesized after filter evaluation. So virtual columns are projectable but never pushable.

Fix: added TableSchema::schema_without_virtual_columns() (file + partition, excluding virtual) and try_pushdown_filters uses that. Virtual-col filters are now reported PushedDown::No and the FilterExec stays above the scan.

Defense-in-depth in the opener for callers who bypass the optimizer (e.g. manual plan builders): prepare_open_file rejects pushdown_filters=true + virtual-col predicate with a clear error pointing at with_pushdown_filters(false) or keeping the filter above the scan.

Tests: source.rs::test_try_pushdown_filters_rejects_virtual_column_refs (planner boundary), plus three opener-level tests covering mixed OR, virtual-only, and the allowed pushdown_filters=false case.
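
The two layers described in this comment can be sketched as follows. This is a standalone model under stated assumptions: `Filter`, `classify`, and `check_predicate` are hypothetical, filters are reduced to the column names they reference, and the local `PushedDown` enum only mirrors the two outcomes named here. The error is a plain `String` rather than a DataFusion error type.

```rust
/// Local stand-in for the two outcomes named in this PR; not the DataFusion type.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PushedDown {
    Yes,
    No,
}

/// Hypothetical filter, reduced to the column names it references.
struct Filter {
    columns: Vec<&'static str>,
}

/// Planner-side classification against the file + partition schema only.
/// A filter that references any column outside that schema (for example a
/// virtual column) is reported as not pushed down.
fn classify(filters: &[Filter], file_and_partition: &[&str]) -> Vec<PushedDown> {
    filters
        .iter()
        .map(|f| {
            let all_known = f
                .columns
                .iter()
                .all(|c| file_and_partition.iter().any(|s| s == c));
            if all_known { PushedDown::Yes } else { PushedDown::No }
        })
        .collect()
}

/// Opener-side guard for callers that bypass the planner and set the
/// predicate directly while filter pushdown is enabled.
fn check_predicate(
    pushdown_filters: bool,
    predicate: &Filter,
    virtual_columns: &[&str],
) -> Result<(), String> {
    let uses_virtual = predicate
        .columns
        .iter()
        .any(|c| virtual_columns.iter().any(|v| v == c));
    if pushdown_filters && uses_virtual {
        return Err(
            "predicate references a virtual column; disable pushdown_filters \
             or keep the filter above the scan"
                .to_string(),
        );
    }
    Ok(())
}
```

In this model a mixed filter such as one over `id` and `row_number` is classified `PushedDown::No`, and the guard only fires when pushdown is enabled and the predicate touches a virtual column.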

Ordering doc on virtual_columns

Struct field doc now spells out the [file, partition, virtual] layout, matching the builder methods.

Enum + TryFrom

Added ParquetVirtualColumn with TryFrom<&FieldRef> in a new virtual_column.rs. The runtime allowlist in the opener is replaced with ParquetVirtualColumn::try_from(field)?. Adding a new variant (e.g. RowGroupIndex) is now a compile-time obligation, and consumers can pattern-match instead of string-comparing extension-type names. Exposed as pub use ParquetVirtualColumn at the crate root.
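
To illustrate the enum-plus-TryFrom gate without depending on arrow-rs, here is a minimal sketch. `FieldModel`, `VirtualColumn`, `output_type`, and the `ROW_NUMBER_EXT` string are all hypothetical stand-ins; the real code converts from `&FieldRef`, takes the extension-type name from arrow-rs, and reports unsupported types as NotImplemented rather than a `String`.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for an Arrow field: a name plus an optional
/// extension-type name.
struct FieldModel {
    name: String,
    extension_type: Option<String>,
}

/// Placeholder extension-type name for this sketch, not an arrow-rs constant.
const ROW_NUMBER_EXT: &str = "example.row_number";

/// Only the variants listed here are accepted as virtual columns.
#[derive(Debug, PartialEq, Clone, Copy)]
enum VirtualColumn {
    RowNumber,
}

impl TryFrom<&FieldModel> for VirtualColumn {
    type Error = String;

    fn try_from(field: &FieldModel) -> Result<Self, Self::Error> {
        match field.extension_type.as_deref() {
            Some(ROW_NUMBER_EXT) => Ok(VirtualColumn::RowNumber),
            Some(other) => Err(format!(
                "field '{}' uses unsupported extension type '{}'",
                field.name, other
            )),
            None => Err(format!("field '{}' has no extension type", field.name)),
        }
    }
}

/// Exhaustive match: adding a variant to `VirtualColumn` makes this function
/// fail to compile until the new case is handled.
fn output_type(column: VirtualColumn) -> &'static str {
    match column {
        VirtualColumn::RowNumber => "Int64",
    }
}
```

The exhaustive `match` in `output_type` is what turns a new variant into a compile-time obligation: every consumer that pattern-matches on the enum has to decide how to handle it.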

adriangb · 2026-05-05T21:03:00Z

I think this would then have a negative interaction with the goal of turning filter pushdown on by default. Maybe we'll always have to apply some filters as a FilterExec and that's fine...

mbutrovich · 2026-05-05T21:24:54Z

> I think this would then have a negative interaction with the goal of turning filter pushdown on by default. Maybe we'll always have to apply some filters as a FilterExec and that's fine...

Comet conservatively never removes FilterExec nodes above scans with pushed down filters, though that maybe shouldn't be the case.

Wouldn't this only prevent filter pushdown for filters that reference virtual columns?

adriangb · 2026-05-05T22:03:48Z

> Wouldn't this only prevent filter pushdown for filters that reference virtual columns?

Yeah but it means we'll have to keep the split forever. Which might have been the case anyway and maybe a non issue.

It also means that any filter that references virtual columns cannot be pushed down even if part of it would benefit from doing so, e.g. row_id = 1 and pk = 1, but I'm not sure that's a realistic scenario. In the past we prevented pushdown of projection columns and that was a real issue: we'd see queries in prod from users along the lines of day = '...' OR pk = 1 that could not get pushed down.

adriangb · 2026-05-05T22:04:05Z

I plan to give this another review tomorrow.

comphead · 2026-05-05T23:28:32Z

run benchmark tpch tpcds

comphead · 2026-05-05T23:29:47Z

@mbutrovich from a high-level perspective, how would the row_number virtual column work when reading multiple Parquet files?

adriangbot · 2026-05-05T23:31:21Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4383929017-2034-5dnfv 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing virtual-columns-table-schema (bd513ec) to 2c7af17 (merge-base) diff using: tpcds
Results will be posted here when complete

adriangbot · 2026-05-05T23:31:52Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4383929017-2033-f8cjt 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing virtual-columns-table-schema (bd513ec) to 2c7af17 (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-05-05T23:44:45Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and virtual-columns-table-schema
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃   virtual-columns-table-schema ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 40.03 / 41.45 ±1.55 / 43.79 ms │ 39.49 / 40.62 ±1.19 / 42.39 ms │ no change │
│ QQuery 2  │ 21.07 / 21.56 ±0.69 / 22.87 ms │ 20.75 / 20.84 ±0.10 / 21.04 ms │ no change │
│ QQuery 3  │ 35.68 / 38.24 ±1.33 / 39.30 ms │ 35.44 / 37.61 ±1.64 / 39.23 ms │ no change │
│ QQuery 4  │ 18.04 / 18.37 ±0.17 / 18.52 ms │ 18.06 / 18.12 ±0.05 / 18.19 ms │ no change │
│ QQuery 5  │ 43.56 / 45.18 ±2.01 / 48.95 ms │ 43.18 / 44.20 ±0.87 / 45.76 ms │ no change │
│ QQuery 6  │ 17.06 / 17.19 ±0.14 / 17.45 ms │ 17.06 / 17.16 ±0.08 / 17.28 ms │ no change │
│ QQuery 7  │ 49.86 / 50.64 ±0.54 / 51.27 ms │ 50.04 / 52.14 ±2.32 / 56.63 ms │ no change │
│ QQuery 8  │ 46.49 / 46.74 ±0.14 / 46.88 ms │ 46.50 / 46.95 ±0.65 / 48.22 ms │ no change │
│ QQuery 9  │ 51.78 / 52.17 ±0.28 / 52.54 ms │ 51.72 / 52.33 ±0.52 / 53.01 ms │ no change │
│ QQuery 10 │ 65.29 / 65.42 ±0.11 / 65.57 ms │ 65.11 / 65.91 ±1.20 / 68.29 ms │ no change │
│ QQuery 11 │ 13.62 / 14.10 ±0.63 / 15.35 ms │ 13.68 / 14.39 ±1.31 / 17.00 ms │ no change │
│ QQuery 12 │ 26.16 / 26.42 ±0.24 / 26.78 ms │ 26.36 / 26.73 ±0.28 / 27.10 ms │ no change │
│ QQuery 13 │ 35.63 / 36.37 ±0.51 / 36.97 ms │ 35.10 / 36.02 ±0.71 / 36.92 ms │ no change │
│ QQuery 14 │ 26.54 / 27.04 ±0.62 / 28.24 ms │ 26.64 / 26.83 ±0.15 / 27.07 ms │ no change │
│ QQuery 15 │ 32.68 / 32.81 ±0.10 / 32.95 ms │ 32.57 / 33.23 ±0.62 / 34.39 ms │ no change │
│ QQuery 16 │ 15.17 / 15.27 ±0.06 / 15.36 ms │ 15.10 / 15.24 ±0.11 / 15.42 ms │ no change │
│ QQuery 17 │ 75.04 / 76.49 ±0.95 / 77.33 ms │ 75.97 / 77.19 ±1.14 / 79.00 ms │ no change │
│ QQuery 18 │ 67.84 / 68.82 ±0.96 / 70.42 ms │ 67.31 / 68.81 ±0.94 / 69.99 ms │ no change │
│ QQuery 19 │ 37.52 / 37.65 ±0.13 / 37.90 ms │ 37.42 / 37.70 ±0.22 / 38.08 ms │ no change │
│ QQuery 20 │ 38.52 / 38.72 ±0.15 / 38.88 ms │ 38.62 / 39.10 ±0.33 / 39.53 ms │ no change │
│ QQuery 21 │ 58.33 / 59.44 ±0.83 / 60.37 ms │ 59.62 / 60.74 ±0.71 / 61.68 ms │ no change │
│ QQuery 22 │ 23.78 / 23.97 ±0.18 / 24.28 ms │ 23.64 / 24.06 ±0.42 / 24.80 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                           ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 854.08ms │
│ Total Time (virtual-columns-table-schema)   │ 855.91ms │
│ Average Time (HEAD)                         │  38.82ms │
│ Average Time (virtual-columns-table-schema) │  38.90ms │
│ Queries Faster                              │        0 │
│ Queries Slower                              │        0 │
│ Queries with No Change                      │       22 │
│ Queries with Failure                        │        0 │
└─────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	5.5 GiB
Avg memory	5.0 GiB
CPU user	32.0s
CPU sys	2.2s
Peak spill	0 B

tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	5.5 GiB
Avg memory	5.0 GiB
CPU user	31.9s
CPU sys	2.3s
Peak spill	0 B

adriangbot · 2026-05-05T23:47:00Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and virtual-columns-table-schema
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃          virtual-columns-table-schema ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.47 / 7.04 ±0.91 / 8.85 ms │           6.37 / 6.84 ±0.83 / 8.50 ms │     no change │
│ QQuery 2  │        82.70 / 83.63 ±0.58 / 84.33 ms │        83.93 / 84.87 ±0.51 / 85.28 ms │     no change │
│ QQuery 3  │        31.32 / 31.64 ±0.18 / 31.82 ms │        31.09 / 31.27 ±0.15 / 31.53 ms │     no change │
│ QQuery 4  │    563.85 / 580.17 ±13.51 / 595.57 ms │     573.63 / 590.61 ±9.15 / 599.40 ms │     no change │
│ QQuery 5  │        55.02 / 56.17 ±1.00 / 57.36 ms │        55.18 / 55.96 ±0.51 / 56.58 ms │     no change │
│ QQuery 6  │        38.16 / 38.73 ±0.55 / 39.46 ms │        38.26 / 39.97 ±2.02 / 43.82 ms │     no change │
│ QQuery 7  │     116.31 / 118.29 ±1.77 / 121.63 ms │     115.76 / 117.67 ±1.98 / 121.39 ms │     no change │
│ QQuery 8  │        41.49 / 41.70 ±0.24 / 42.13 ms │        40.89 / 40.96 ±0.07 / 41.06 ms │     no change │
│ QQuery 9  │        55.77 / 59.88 ±2.62 / 63.46 ms │        54.56 / 57.98 ±2.05 / 60.75 ms │     no change │
│ QQuery 10 │        86.67 / 87.58 ±0.77 / 88.54 ms │        85.72 / 86.23 ±0.52 / 87.06 ms │     no change │
│ QQuery 11 │    351.22 / 364.88 ±11.43 / 377.34 ms │     363.72 / 370.19 ±3.52 / 373.64 ms │     no change │
│ QQuery 12 │        30.78 / 31.07 ±0.18 / 31.24 ms │        30.51 / 30.80 ±0.26 / 31.28 ms │     no change │
│ QQuery 13 │     138.03 / 138.75 ±0.79 / 140.19 ms │     134.83 / 136.57 ±1.54 / 138.86 ms │     no change │
│ QQuery 14 │     527.83 / 534.55 ±3.64 / 538.71 ms │     532.98 / 536.60 ±1.87 / 538.43 ms │     no change │
│ QQuery 15 │        63.93 / 65.02 ±1.28 / 67.43 ms │        66.70 / 69.00 ±1.77 / 71.51 ms │  1.06x slower │
│ QQuery 16 │           7.18 / 7.40 ±0.23 / 7.83 ms │           7.29 / 7.43 ±0.11 / 7.57 ms │     no change │
│ QQuery 17 │        86.73 / 88.11 ±1.57 / 90.98 ms │        85.31 / 87.53 ±2.44 / 92.00 ms │     no change │
│ QQuery 18 │     163.53 / 164.50 ±0.72 / 165.59 ms │     159.80 / 163.91 ±2.25 / 166.47 ms │     no change │
│ QQuery 19 │        44.35 / 44.67 ±0.26 / 45.13 ms │        44.77 / 45.03 ±0.39 / 45.80 ms │     no change │
│ QQuery 20 │        37.61 / 38.23 ±0.45 / 38.98 ms │        37.94 / 38.58 ±0.43 / 39.23 ms │     no change │
│ QQuery 21 │        19.01 / 19.36 ±0.20 / 19.59 ms │        19.36 / 19.60 ±0.18 / 19.88 ms │     no change │
│ QQuery 22 │        64.17 / 65.22 ±0.94 / 66.71 ms │        68.63 / 69.51 ±0.58 / 70.44 ms │  1.07x slower │
│ QQuery 23 │    504.82 / 524.83 ±20.74 / 562.39 ms │     510.61 / 522.29 ±9.80 / 533.95 ms │     no change │
│ QQuery 24 │     249.60 / 252.27 ±2.50 / 255.57 ms │     250.70 / 259.29 ±7.06 / 271.39 ms │     no change │
│ QQuery 25 │     120.57 / 122.44 ±1.79 / 125.60 ms │     121.80 / 123.77 ±1.59 / 126.18 ms │     no change │
│ QQuery 26 │        76.56 / 77.43 ±0.76 / 78.68 ms │        76.28 / 77.69 ±0.90 / 78.91 ms │     no change │
│ QQuery 27 │           7.11 / 7.26 ±0.14 / 7.53 ms │           7.38 / 7.65 ±0.15 / 7.77 ms │  1.05x slower │
│ QQuery 28 │        65.63 / 67.13 ±0.81 / 68.00 ms │        65.96 / 67.41 ±0.74 / 67.93 ms │     no change │
│ QQuery 29 │     105.15 / 107.19 ±1.27 / 109.13 ms │     106.33 / 107.90 ±2.12 / 111.97 ms │     no change │
│ QQuery 30 │                                  FAIL │                                  FAIL │  incomparable │
│ QQuery 31 │     117.44 / 118.81 ±1.06 / 120.34 ms │     117.93 / 120.44 ±1.44 / 121.93 ms │     no change │
│ QQuery 32 │        22.60 / 22.92 ±0.18 / 23.14 ms │        22.60 / 22.97 ±0.23 / 23.24 ms │     no change │
│ QQuery 33 │        42.06 / 42.87 ±0.61 / 43.74 ms │        41.40 / 42.57 ±1.83 / 46.21 ms │     no change │
│ QQuery 34 │        10.86 / 11.45 ±0.43 / 12.08 ms │        10.67 / 11.08 ±0.33 / 11.53 ms │     no change │
│ QQuery 35 │        85.94 / 87.24 ±1.76 / 90.64 ms │        85.23 / 85.60 ±0.34 / 86.16 ms │     no change │
│ QQuery 36 │           6.91 / 7.06 ±0.10 / 7.21 ms │           6.55 / 6.71 ±0.13 / 6.92 ms │     no change │
│ QQuery 37 │           7.69 / 7.82 ±0.09 / 7.93 ms │           7.52 / 7.76 ±0.16 / 7.98 ms │     no change │
│ QQuery 38 │        73.83 / 74.10 ±0.29 / 74.63 ms │        76.04 / 76.73 ±0.57 / 77.56 ms │     no change │
│ QQuery 39 │     105.55 / 107.98 ±2.12 / 110.73 ms │     109.68 / 111.93 ±1.47 / 114.04 ms │     no change │
│ QQuery 40 │        24.25 / 24.44 ±0.10 / 24.51 ms │        24.98 / 25.22 ±0.19 / 25.57 ms │     no change │
│ QQuery 41 │        14.39 / 14.59 ±0.13 / 14.77 ms │        15.22 / 15.33 ±0.07 / 15.42 ms │  1.05x slower │
│ QQuery 42 │        25.59 / 26.12 ±0.36 / 26.66 ms │        26.28 / 26.66 ±0.33 / 27.10 ms │     no change │
│ QQuery 43 │           5.65 / 5.76 ±0.10 / 5.89 ms │           5.84 / 6.56 ±0.91 / 8.35 ms │  1.14x slower │
│ QQuery 44 │        11.66 / 11.80 ±0.08 / 11.91 ms │        11.75 / 12.08 ±0.25 / 12.51 ms │     no change │
│ QQuery 45 │        45.21 / 47.41 ±1.80 / 49.02 ms │        47.82 / 48.61 ±1.29 / 51.19 ms │     no change │
│ QQuery 46 │        14.16 / 14.51 ±0.27 / 14.87 ms │        14.85 / 15.15 ±0.23 / 15.47 ms │     no change │
│ QQuery 47 │     252.56 / 265.14 ±7.41 / 275.21 ms │     250.20 / 253.65 ±2.99 / 258.15 ms │     no change │
│ QQuery 48 │     109.27 / 110.30 ±1.01 / 112.03 ms │     109.47 / 110.67 ±1.38 / 113.27 ms │     no change │
│ QQuery 49 │        85.89 / 86.30 ±0.24 / 86.62 ms │        86.03 / 87.00 ±0.62 / 87.98 ms │     no change │
│ QQuery 50 │        63.08 / 64.30 ±1.68 / 67.59 ms │        63.11 / 65.72 ±2.31 / 69.81 ms │     no change │
│ QQuery 51 │       93.81 / 97.35 ±2.10 / 100.26 ms │       96.59 / 98.01 ±1.29 / 100.06 ms │     no change │
│ QQuery 52 │        26.20 / 27.15 ±1.01 / 29.08 ms │        25.82 / 26.11 ±0.25 / 26.41 ms │     no change │
│ QQuery 53 │        32.39 / 32.49 ±0.08 / 32.62 ms │        32.17 / 33.24 ±1.49 / 36.18 ms │     no change │
│ QQuery 54 │        57.61 / 58.25 ±0.51 / 59.05 ms │        56.43 / 58.52 ±2.13 / 62.45 ms │     no change │
│ QQuery 55 │        25.19 / 25.68 ±0.51 / 26.65 ms │        25.73 / 26.27 ±0.31 / 26.66 ms │     no change │
│ QQuery 56 │        41.62 / 42.07 ±0.57 / 43.19 ms │        42.96 / 43.28 ±0.24 / 43.64 ms │     no change │
│ QQuery 57 │     187.59 / 191.07 ±2.00 / 193.20 ms │     191.85 / 193.30 ±1.38 / 195.41 ms │     no change │
│ QQuery 58 │     123.84 / 124.66 ±0.44 / 125.07 ms │     120.61 / 123.17 ±1.51 / 124.88 ms │     no change │
│ QQuery 59 │     121.67 / 122.20 ±0.56 / 122.97 ms │     120.57 / 121.92 ±0.88 / 123.10 ms │     no change │
│ QQuery 60 │        41.96 / 42.50 ±0.39 / 43.13 ms │        42.25 / 42.78 ±0.38 / 43.32 ms │     no change │
│ QQuery 61 │        14.24 / 14.30 ±0.07 / 14.43 ms │        14.44 / 14.53 ±0.07 / 14.64 ms │     no change │
│ QQuery 62 │        49.34 / 49.86 ±0.29 / 50.24 ms │        48.86 / 49.80 ±1.52 / 52.82 ms │     no change │
│ QQuery 63 │        32.72 / 33.05 ±0.19 / 33.27 ms │        32.19 / 32.43 ±0.28 / 32.97 ms │     no change │
│ QQuery 64 │     495.24 / 501.59 ±6.70 / 513.86 ms │     492.56 / 497.63 ±3.75 / 502.42 ms │     no change │
│ QQuery 65 │     149.29 / 152.59 ±2.31 / 155.63 ms │     153.14 / 156.85 ±2.60 / 161.08 ms │     no change │
│ QQuery 66 │        86.71 / 88.91 ±1.30 / 90.44 ms │        86.27 / 90.26 ±4.05 / 98.06 ms │     no change │
│ QQuery 67 │     262.50 / 269.09 ±4.74 / 274.49 ms │     266.01 / 272.73 ±4.14 / 278.81 ms │     no change │
│ QQuery 68 │        14.25 / 14.64 ±0.23 / 14.85 ms │        14.85 / 15.03 ±0.21 / 15.38 ms │     no change │
│ QQuery 69 │        81.94 / 84.11 ±2.12 / 88.00 ms │        82.21 / 85.06 ±5.13 / 95.32 ms │     no change │
│ QQuery 70 │     110.46 / 112.49 ±2.02 / 116.35 ms │     109.60 / 115.95 ±6.54 / 124.14 ms │     no change │
│ QQuery 71 │        38.30 / 39.55 ±1.99 / 43.46 ms │        37.36 / 37.54 ±0.15 / 37.77 ms │ +1.05x faster │
│ QQuery 72 │ 2175.03 / 2325.52 ±88.51 / 2444.48 ms │ 2314.95 / 2373.68 ±38.31 / 2425.89 ms │     no change │
│ QQuery 73 │        10.79 / 11.10 ±0.29 / 11.51 ms │        10.45 / 10.62 ±0.12 / 10.76 ms │     no change │
│ QQuery 74 │     206.09 / 208.67 ±1.49 / 210.17 ms │     195.10 / 200.32 ±6.41 / 211.59 ms │     no change │
│ QQuery 75 │     155.97 / 158.32 ±1.80 / 160.67 ms │     156.02 / 158.77 ±1.86 / 161.77 ms │     no change │
│ QQuery 76 │        37.66 / 38.75 ±1.68 / 42.04 ms │        37.89 / 38.50 ±0.47 / 39.26 ms │     no change │
│ QQuery 77 │        64.99 / 66.20 ±0.67 / 66.91 ms │        64.74 / 65.94 ±0.70 / 66.89 ms │     no change │
│ QQuery 78 │     202.83 / 206.45 ±3.27 / 210.58 ms │     201.87 / 206.98 ±4.10 / 210.88 ms │     no change │
│ QQuery 79 │        69.64 / 71.02 ±1.23 / 72.96 ms │        71.07 / 71.50 ±0.39 / 72.21 ms │     no change │
│ QQuery 80 │     106.92 / 109.00 ±2.04 / 112.87 ms │     106.83 / 108.13 ±1.09 / 109.68 ms │     no change │
│ QQuery 81 │        26.49 / 27.59 ±1.67 / 30.86 ms │        26.37 / 26.78 ±0.23 / 27.03 ms │     no change │
│ QQuery 82 │        18.25 / 18.61 ±0.21 / 18.87 ms │        18.61 / 18.73 ±0.10 / 18.91 ms │     no change │
│ QQuery 83 │        39.97 / 40.40 ±0.29 / 40.88 ms │        40.16 / 41.17 ±1.42 / 43.96 ms │     no change │
│ QQuery 84 │        45.46 / 46.40 ±1.58 / 49.54 ms │        45.58 / 45.84 ±0.33 / 46.49 ms │     no change │
│ QQuery 85 │     145.11 / 146.33 ±1.23 / 148.46 ms │     144.07 / 144.94 ±0.48 / 145.39 ms │     no change │
│ QQuery 86 │        27.17 / 27.58 ±0.27 / 27.96 ms │        26.17 / 26.48 ±0.26 / 26.84 ms │     no change │
│ QQuery 87 │        72.71 / 74.93 ±1.56 / 76.67 ms │        71.79 / 72.42 ±0.39 / 72.87 ms │     no change │
│ QQuery 88 │        66.63 / 67.67 ±1.03 / 69.60 ms │        67.28 / 68.12 ±0.93 / 69.86 ms │     no change │
│ QQuery 89 │        38.56 / 38.95 ±0.33 / 39.55 ms │        38.47 / 39.04 ±0.69 / 40.37 ms │     no change │
│ QQuery 90 │        19.05 / 19.39 ±0.20 / 19.68 ms │        18.98 / 19.13 ±0.09 / 19.22 ms │     no change │
│ QQuery 91 │        55.48 / 56.13 ±0.40 / 56.69 ms │        55.26 / 55.55 ±0.32 / 56.15 ms │     no change │
│ QQuery 92 │        32.86 / 33.09 ±0.13 / 33.24 ms │        31.83 / 33.11 ±1.92 / 36.93 ms │     no change │
│ QQuery 93 │        54.32 / 56.40 ±1.53 / 58.12 ms │        54.62 / 56.82 ±2.18 / 60.32 ms │     no change │
│ QQuery 94 │        42.09 / 42.63 ±0.45 / 43.37 ms │        42.15 / 43.01 ±0.74 / 44.10 ms │     no change │
│ QQuery 95 │        91.09 / 91.95 ±0.72 / 93.02 ms │        92.95 / 93.70 ±0.51 / 94.19 ms │     no change │
│ QQuery 96 │        25.62 / 25.81 ±0.13 / 25.94 ms │        25.27 / 25.68 ±0.31 / 26.18 ms │     no change │
│ QQuery 97 │        48.19 / 49.05 ±0.78 / 50.37 ms │        48.75 / 49.22 ±0.31 / 49.68 ms │     no change │
│ QQuery 98 │        44.16 / 44.72 ±0.39 / 45.16 ms │        44.17 / 45.28 ±0.74 / 46.37 ms │     no change │
│ QQuery 99 │        72.41 / 73.55 ±1.25 / 75.89 ms │        71.23 / 71.83 ±0.39 / 72.45 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                           ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                           │ 11275.90ms │
│ Total Time (virtual-columns-table-schema)   │ 11351.25ms │
│ Average Time (HEAD)                         │   115.06ms │
│ Average Time (virtual-columns-table-schema) │   115.83ms │
│ Queries Faster                              │          1 │
│ Queries Slower                              │          5 │
│ Queries with No Change                      │         92 │
│ Queries with Failure                        │          1 │
└─────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	60.0s
Peak memory	6.9 GiB
Avg memory	6.2 GiB
CPU user	258.3s
CPU sys	6.7s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	60.0s
Peak memory	6.7 GiB
Avg memory	6.0 GiB
CPU user	261.9s
CPU sys	7.2s
Peak spill	0 B

adriangb

I think this looks good. @mbutrovich do you intend to get this into 54? I think there's something to be said for waiting until 54 goes out at this point so we can do the rest of the work wiring up so we can derisk the design as a whole.

mbutrovich · 2026-05-14T18:48:09Z

> I think this looks good. @mbutrovich do you intend to get this into 54? I think there's something to be said for waiting until 54 goes out at this point so we can do the rest of the work wiring up so we can derisk the design as a whole.

I'd like it in 54 if we think the API at this layer is stable, but I see your argument: if the API needs a tweak when we hook everything up, we hit API stability challenges. I am okay to defer, but I was also not planning to hook it up to the front end any time soon, so deferring leaves the merge date indefinite, and it may not be completely wired up in 55 either.

adriangb · 2026-05-14T19:13:05Z

Gotcha. If you're okay deferring until 54 (which should just be a week or two) I think that'd make me feel more comfortable taking the risk. We don't have feature freezes officially but I think it's a good general approach to take. I asked in #20135 (comment) if anyone can drive the rest of this but I'd say once 54 is out we can merge this regardless. Thanks for working on this it's been quite the effort!

mbutrovich · 2026-05-14T19:40:54Z

No worries. This isn't urgently needed in Comet, it's just on the list of Spark gaps we want to close. Thanks for your help thus far!

adriangb · 2026-05-19T16:23:58Z

+        if self.virtual_columns.is_empty() {
+            self.virtual_columns = Arc::new(virtual_columns);
+        } else {
+            let existing = Arc::get_mut(&mut self.virtual_columns).expect(


I think this will panic if you:

1. Make a TableSchema
2. Call with_virtual_columns to add some virtual columns
3. Clone the TableSchema
4. Call with_virtual_columns again on one of the owned clones

Now with_table_partition_cols has the same bug so maybe it's okay, but I do think it's an unsafe contract. It also seems like the solution is relatively simple: use Arc::make_mut. I might make a PR for with_table_partition_cols.

Opened #22372 to fix with_table_partition_cols with Arc::make_mut (off main, so it can land independently of this PR).
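
The failure mode described in this review comment can be reproduced with the standard library alone. The following is a minimal sketch, not DataFusion's code: `SchemaSketch` and its two methods are hypothetical stand-ins for `TableSchema` and `with_virtual_columns`, and plain `String` names stand in for `FieldRef`s. The `get_mut` variant returns `None` at the point where the reviewed hunk would hit `expect` and panic.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `TableSchema`; column names replace `FieldRef`s.
#[derive(Clone, Debug)]
struct SchemaSketch {
    virtual_columns: Arc<Vec<String>>,
}

impl SchemaSketch {
    fn new() -> Self {
        Self { virtual_columns: Arc::new(Vec::new()) }
    }

    /// Same shape as the reviewed hunk, but returns `None` where `expect` would panic.
    fn try_append_with_get_mut(mut self, cols: Vec<String>) -> Option<Self> {
        if self.virtual_columns.is_empty() {
            self.virtual_columns = Arc::new(cols);
        } else {
            // `get_mut` only succeeds when this is the sole handle to the allocation.
            Arc::get_mut(&mut self.virtual_columns)?.extend(cols);
        }
        Some(self)
    }

    /// Copy-on-write variant: clones the inner `Vec` only if the `Arc` is shared.
    fn append_with_make_mut(mut self, cols: Vec<String>) -> Self {
        Arc::make_mut(&mut self.virtual_columns).extend(cols);
        self
    }
}
```

Owning `self` by value does not imply unique ownership of the inner `Arc`: after a `clone()`, both structs point at the same `Vec`, so `Arc::get_mut` yields `None`, while `Arc::make_mut` copies the `Vec` first and leaves the other clone untouched.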

adriangb · 2026-05-21T17:56:01Z

@mbutrovich I think we are ready to merge this. Sorry for the conflicts, we've been doing some cleanup / refactoring in opener.rs

mbutrovich · 2026-05-21T18:36:17Z

> @mbutrovich I think we are ready to merge this. Sorry for the conflicts, we've been doing some cleanup / refactoring in opener.rs

I'll get it cleaned up by end of week, thanks for the reminder!

…F + RowNumber

Reworks the Load sink path in three coupled directions:

1. **Streaming `LoadExec` -- unordered concurrent merger.** Replace the serial "open one file, drain it, open next" walker with a stream combinator chain: per-row `extract_row_inputs` -> per-row open future -> `futures::stream::buffer_unordered(N)` -> `try_flatten`. Concurrency cap N = `target_partitions` clamped [1, 64] (default 8). Output ordering across files is unspecified by design; intra-file batch order is preserved by sequential drain. Limit applied at the flattened output by slicing + early termination. Cancellation propagates via the standard stream-drop chain.
2. **Compile-time eager dispatch in `register_load_relation`.** When the compiled upstream `LogicalPlan` is bare `LogicalPlan::Values` and the sink has no DV, register a new `EagerLoadTableProvider` (custom `TableProvider`) instead of `LoadTableProvider`. The provider holds a pre-built `Vec<PartitionedFile>` with per-file `partition_values` populated from the row's passthrough literals -- same broadcast mechanism the streaming path uses. `scan()` returns a single `DataSourceExec` over a `FileGroup`, so DataFusion's native multi-partition fan-out, projection / limit pushdown, and `repartitioned()` apply for free. Anything else (non-Values upstream or DV present) falls through to the streaming path.
3. **DV applied via a `not_in_dv` ScalarUDF over the parquet `_row_number` virtual column.** Each per-file open future:
   - Resolves the kernel-side DV via `tokio::task::spawn_blocking(|| descriptor.read(...))` -- returns a `RoaringTreemap` (deleted row IDs) cheaply Arc-cloned into the UDF's closure.
   - Builds a per-file `DataSourceExec -> FilterExec(not_in_dv(_row_number)) -> ProjectionExec(drop _row_number)` stack via `build_per_file_plan`. The opener's `with_virtual_columns` (via `TableSchema::with_virtual_columns` from apache/datafusion#22026) injects `_row_number` into emitted batches so the FilterExec predicate can reference it; the trailing ProjectionExec drops it before the output stream sees it.
   - Reject `LoadSink` with `FileType::Json` AND `dv_ref.is_some()` at `LoadExec::new`: JSON has no row-number virtual column, and DVs only apply to Delta parquet data files in practice.

Critical fix in `FieldIdPhysicalExprAdapter::rewrite_column`: virtual columns are now reindexed to their position in the **physical** file schema (`physical_file_schema.fields().len()` for a single virtual) rather than passing the original index unchanged. The original index came from the LoadExec's projected `TableSchema` (= file ++ partition ++ virtual), which doesn't match either `logical_for_rewrite` (file ++ virtual, no partition) or `physical_for_rewrite` (physical_file ++ virtual). Schema evolution can also make logical and physical schemas diverge in length; reindexing against the physical schema's actual length lands the virtual at the correct position regardless.

Other touches:

- **Phase 3 (sync I/O drop)**: `file_size_for_row` no longer falls back to `std::fs::metadata` when the size column is unset/null. Sizes get resolved per-file via async object-store HEAD inside `build_per_file_plan` (`resolve_size_if_unknown`), avoiding a sync call inside an async future.
- **Phase 4 (projection-aware passthrough)**: precompute `projected_passthrough: Arc<Vec<usize>>` once at `LoadExec::new`; iterate that in `extract_row_inputs` instead of all of `sink.passthrough_columns`.
- **Phase 6 (factor helpers)**: shared primitives -- `RowInputs`, `extract_row_inputs`, `build_file_source`, `into_partitioned_file`, `adapter_factory_for`, `strip_field_metadata_recursive`, `make_not_in_dv_udf`, `resolve_dv_async`, `resolve_size_if_unknown`, `build_per_file_plan` -- live in `load_helpers.rs` and feed both the streaming `LoadExec` and the eager `EagerLoadTableProvider`.

Workspace patches: point `datafusion-*` at our local datafusion-fork on the `pr-22026` branch (open + approved by adriangb, 2026-05-14), and `datafusion-functions-json` at a local fork carrying the post-50.0 API-drift fix (drop redundant `as_any` impls, rename `Cast.data_type` -> `Cast.field`). Adds `roaring = "0.11.2"` and `tokio` direct deps.

Tests: scan_correctness gains a `scan_with_row_index_and_utf8_column` repro that locks in the virtual-column + Utf8 + supplied_schema interaction (was a latent regression triggered only by acceptance fixtures).

Acceptance: 3457 / 3457 pass (was 71 failures before this work). Three previously-expected-fail entries (`cdc_schema_evolution_read_all`, `cdf_with_schema_evolution_read_all`, `cm_id_matching_swapped_select_a_reads_e`) now pass via the eager / field-id-aware path and move into `FIXED_IN_DATAFUSION`.
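
The index mismatch behind the `rewrite_column` fix in the commit message above can be illustrated with plain name lists. This is a hedged sketch under stated assumptions: `concat` and `reindex_virtual` are hypothetical helpers, not functions from that repository or from DataFusion, and extending the single-virtual rule (`physical_file_schema.fields().len()`) to "physical length plus ordinal" for several virtual columns is an assumption made here for illustration.

```rust
/// Concatenate column groups in order, e.g. file ++ partition ++ virtual.
fn concat(groups: &[&[&'static str]]) -> Vec<&'static str> {
    groups.iter().flat_map(|g| g.iter().copied()).collect()
}

/// Map an index in the table schema (file ++ partition ++ virtual) to the
/// matching index in physical_file ++ virtual. Returns `None` when the
/// index does not point at a virtual column.
fn reindex_virtual(
    table_index: usize,
    table_file_len: usize,
    partition_len: usize,
    physical_file_len: usize,
) -> Option<usize> {
    let first_virtual = table_file_len + partition_len;
    // Ordinal of the virtual column within the virtual group.
    let ordinal = table_index.checked_sub(first_virtual)?;
    // Assumption: virtual columns directly follow the physical file columns.
    Some(physical_file_len + ordinal)
}
```

With a table schema `[a, b, year, _row_number]` the virtual column sits at index 3, but for an older file that only contains `a`, the physical-plus-virtual schema is `[a, _row_number]` and the same column sits at index 1. Passing index 3 through unchanged would therefore point past the end of the schema used for the rewrite.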

…Arc (apache#22372)

## Which issue does this PR close?

- No separate issue. This addresses a review observation from apache#22026: apache#22026 (comment)

## Rationale for this change

`TableSchema::with_table_partition_cols` appended to an existing partition-column list via `Arc::get_mut(...).expect(...)`. The `expect` message assumed that owning `self` implies sole ownership of the inner `Arc<Vec<FieldRef>>`, but that is not true. `TableSchema` derives `Clone`, and cloning only bumps the `Arc` refcount without copying the `Vec`. So this sequence panicked:

```rust
let ts = TableSchema::new(file_schema, vec![some_partition_col]);
let cloned = ts.clone(); // Arc refcount is now 2
let _ = cloned.with_table_partition_cols(more); // Arc::get_mut -> None -> expect() panics
```

`with_table_partition_cols` taking `mut self` gives unique ownership of the *struct*, not of the inner `Arc`.

## What changes are included in this PR?

- Make `with_table_partition_cols` **replace** the partition columns instead of appending to them, by assigning a fresh `Arc::new(partition_cols)`. This removes the in-place mutation branch entirely:
  - It never mutates the inner `Vec`, so it is safe even when the `Arc` is shared with a clone (copy-on-write isolation is automatic), fixing the panic without needing `Arc::make_mut`.
  - It matches builder-API expectations (a `with_x` setter replaces) and removes the risk of accidentally duplicating partition columns, as raised in review.
- No production code relied on the append behavior (every `TableSchema` is built via `new`/`from_file_schema`); only unit tests exercised it, and they are updated to assert replacement.

## Are these changes tested?

Yes:

- `test_with_table_partition_cols_replaces_existing` verifies that calling the method on a `TableSchema` that already has partition columns replaces them rather than appending.
- `test_with_table_partition_cols_after_clone_does_not_panic` clones a `TableSchema` and sets partition columns on the clone, verifying it does not panic and that the other clone is left unmodified (copy-on-write isolation).

Existing `TableSchema` tests continue to pass.

## Are there any user-facing changes?

`TableSchema::with_table_partition_cols` now replaces existing partition columns instead of appending to them. The previous append path panicked on any shared/cloned `TableSchema`, so no working usage relied on it. There are no API signature changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniël Heres <danielheres@gmail.com>

alamb · 2026-05-26T17:53:50Z

I was excited to try this feature out -- but I couldn't figure out how to query the virtual columns from SQL -- is that possible?

Or does this sentence mean you plan to do it as a follow-on PR?

UX / SQL-layer surface for virtual columns stays deferred to the epic in #20135 — this follows the same framing alamb blessed for #20071 (the input_file_name() UDF).

I personally prefer having APIs in the code that we are sure can work and that are documented with examples, so we know they are adequate to actually support the use case

I wonder if we should try and hook up the virtual columns in SQL as a draft PR to be sure this API is good enough 🤔

alamb · 2026-05-26T17:56:48Z

Actually, it seems like @AdamGS plans to make such a PR here: #20135 (comment)

adriangb · 2026-05-26T17:58:01Z

Yes the plan was to wire it up in a followup, ref discussion following #20135 (comment). @AdamGS is working on it. TLDR: it may not be trivial to hook up and get all cases working, but this low level implementation seems like it would likely work for any eventual front end implementation and it unblocks @mbutrovich

…pache#22496)

## Which issue does this PR close?

- No separate issue. Follows up on apache#22372 (panic fix in `TableSchema::with_table_partition_cols`) and the API discussion it spawned, and is informed by apache#22026 (which adds a third column group, virtual columns, to `TableSchema`).

## Rationale for this change

`TableSchema` has one required input (the file schema) and a growing set of *optional* column groups: partition columns today, virtual columns in apache#22026. The current API expresses this awkwardly:

- `new(file_schema, partition_cols)` privileges partition columns with a positional slot while virtual columns only get a builder method, an asymmetry that grows with every new column kind.
- `TableSchema` eagerly recomputes and caches the concatenated table schema on *every* incremental setter call, so `from_file_schema(s).with_table_partition_cols(p)` rebuilds it twice (three times once virtual columns are added). This is exactly why `new()`'s docs told callers to avoid the builder-style chain.
- The setter mutated an inner `Arc<Vec<FieldRef>>` in place, which is what caused the shared-`Arc` panic fixed in apache#22372.

A dedicated builder addresses all three, and mirrors the existing `FileScanConfigBuilder` (the type that *owns* a `TableSchema`).

## What changes are included in this PR?

- **`TableSchemaBuilder`**: `new(file_schema)` → `.with_table_partition_cols(impl Into<Fields>)` → `.build()`. The concatenated table schema is computed exactly **once**, in `build()`. The setter takes `impl Into<Fields>`, so an existing schema's `Fields` is accepted zero-copy.
- **Partition columns are now stored as `arrow::datatypes::Fields`** (an immutable `Arc<[FieldRef]>`) instead of `Arc<Vec<FieldRef>>`: one fewer indirection, shareable zero-copy, and, being immutable, the shared-`Arc` mutation panic is structurally impossible.
- **`TableSchema::table_partition_cols()` and the delegating `FileScanConfig::table_partition_cols()` now return `&Fields`.** `Fields` derefs to `&[FieldRef]`, so iteration/indexing/`len`/`is_empty` are unchanged; only the arrow `FileFormat` path needed `.to_vec()`.
- **`TableSchema::with_table_partition_cols` is deprecated** in favor of the builder. It now **replaces** rather than appends. (Note: `main` currently *appends* here, because the replace change in apache#22372 was not captured by that PR's squash merge, so this also restores the intended replace semantics.)
- `new` / `from_file_schema` are kept as conveniences that route through the builder.
- Documented in the 54.0.0 upgrade guide.

This intentionally leaves virtual columns out; apache#22026 should extend the builder with `with_virtual_columns` once it lands.

## Are these changes tested?

Yes. New unit tests cover building with partition columns, replace-on-repeat, zero-copy `Fields` input, and the deprecated setter's behavior; existing `TableSchema` / `FileScanConfig` tests and doctests pass. `cargo clippy --all-targets -- -D warnings` is clean across the datasource/proto/arrow/parquet/catalog-listing crates.

## Are there any user-facing changes?

Yes, please apply the `api change` label:

- `TableSchema::table_partition_cols()` / `FileScanConfig::table_partition_cols()` return `&Fields` instead of `&Vec<FieldRef>` (source-compatible for most uses via `Deref`).
- `TableSchema::with_table_partition_cols` is deprecated (use the builder) and now replaces rather than appends.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
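
The builder shape described in that commit message (replace-on-repeat setter, immutable shared storage, concatenated schema computed once in `build()`) can be sketched with the standard library only. Everything below is an illustrative assumption rather than the DataFusion implementation: `TableSchemaSketch` and `TableSchemaSketchBuilder` are hypothetical names, `String` stands in for `FieldRef`, and `Arc<[Field]>` stands in for arrow's `Fields`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow field; only the name matters here.
type Field = String;

#[derive(Debug, Clone)]
struct TableSchemaSketch {
    /// Immutable and shareable, so there is nothing to mutate in place.
    partition_cols: Arc<[Field]>,
    /// file ++ partition, computed a single time in `build()`.
    table_schema: Arc<[Field]>,
}

struct TableSchemaSketchBuilder {
    file_schema: Vec<Field>,
    partition_cols: Arc<[Field]>,
}

impl TableSchemaSketchBuilder {
    fn new(file_schema: Vec<Field>) -> Self {
        Self { file_schema, partition_cols: Arc::from(Vec::<Field>::new()) }
    }

    /// Replaces any previously set partition columns. An existing
    /// `Arc<[Field]>` is accepted as-is, without copying its contents.
    fn with_table_partition_cols(mut self, cols: impl Into<Arc<[Field]>>) -> Self {
        self.partition_cols = cols.into();
        self
    }

    /// The only place where the concatenated schema is materialized.
    fn build(self) -> TableSchemaSketch {
        let table_schema: Arc<[Field]> = self
            .file_schema
            .iter()
            .chain(self.partition_cols.iter())
            .cloned()
            .collect();
        TableSchemaSketch { partition_cols: self.partition_cols, table_schema }
    }
}
```

Because the setter only stores its argument, chaining several setters costs nothing until `build()`, and a repeated call simply overwrites the earlier value. A further column group such as virtual columns would, under the same assumptions, be one more field on the builder and one more `chain` in `build()`.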

mbutrovich added 2 commits May 5, 2026 13:21

Plumb Parquet virtual columns (e.g., row_number) through TableSchema …

09d32e9

…and ParquetOpener, gated behind a tested-only extension-type allowlist, to unblock Comet's native-DataFusion support for Spark's _tmp_metadata_row_index.

Cleanup.

f54b003

github-actions bot added the datasource (Changes to the datasource crate) label May 5, 2026

Merge branch 'main' into virtual-columns-table-schema

62882a6

mbutrovich requested review from adriangb, andygrove and comphead and removed request for adriangb May 5, 2026 18:14

mbutrovich mentioned this pull request May 5, 2026

[native_datafusion] Add support for reading row index metadata columns apache/datafusion-comet#3432

Open

Fix cargo docs.

59fc97f

adriangb reviewed May 5, 2026

Comment thread datafusion/datasource-parquet/src/opener.rs

Comment thread datafusion/datasource/src/table_schema.rs

Comment thread datafusion/datasource/src/table_schema.rs

asolimando mentioned this pull request May 5, 2026

feat: pushdown OFFSET to parquet for RG-level skipping #21828

Draft

mbutrovich mentioned this pull request May 5, 2026

Add with_virtual_columns to ParquetSource for reading virtual columns #20132

Open

mbutrovich added 3 commits May 5, 2026 16:24

Address PR feedback.

8d455c5

Address PR feedback.

dbb8f3b

Address PR feedback.

bd513ec

mbutrovich requested a review from adriangb May 5, 2026 20:52

Merge branch 'main' into virtual-columns-table-schema

35b5714

mbutrovich mentioned this pull request May 12, 2026

chore: native_datafusion use try_pushdown_filters apache/datafusion-comet#4299

Merged

mbutrovich and others added 3 commits May 12, 2026 14:37

Only validate predicate pushdown on virtual columns in debug mode.

f15709b

Clean up docs.

5c45dd0

Merge branch 'main' into virtual-columns-table-schema

3d8c29f

mbutrovich requested a review from adriangb May 12, 2026 18:45

mbutrovich added 3 commits May 13, 2026 11:05

Fix cargo doc

c561124

Merge remote-tracking branch 'origin/virtual-columns-table-schema' in…

12858b1

…to virtual-columns-table-schema

Fix cargo doc

bf6ea39

adriangb reviewed May 13, 2026

Comment thread datafusion/datasource-parquet/src/opener.rs Outdated

Address more PR feedback.

20e1a1b

mbutrovich moved this from Todo to In progress in Comet Development May 13, 2026

mbutrovich requested a review from adriangb May 14, 2026 18:09

adriangb approved these changes May 14, 2026

adriangb mentioned this pull request May 14, 2026

[EPIC] A collection of support for metadata columns in ListingTable #20135

Open

6 tasks

adriangb reviewed May 19, 2026

adriangb mentioned this pull request May 19, 2026

fix: avoid panic in TableSchema::with_table_partition_cols on shared Arc #22372

Merged

adriangb mentioned this pull request May 24, 2026

feat: add TableSchemaBuilder and store partition columns as Fields #22496

Merged

alamb mentioned this pull request May 26, 2026

Release DataFusion 55.0.0 (Jul 2026) #22393

Open

18 tasks


8 participants
