chore: update datafusion and related crates#1635
Conversation
this is to fix a bug reported in DF 51.0.0 apache/datafusion#19086 this impacts approx percentile queries
WalkthroughThe PR upgrades Arrow/DataFusion dependencies (arrow to 58.0.0, datafusion to 53.0.0, object_store to 0.13.1) and adapts the codebase to breaking API changes, including new trait imports, updated parquet writer APIs, refactored metrics instrumentation, and adjusted statistics configuration. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
Cargo.toml (1)
12-29: ⚡ Quick winAdd a regression test for the reported
approx_percentile_cont(..., 0.999)panic.This PR’s user-visible fix is entirely dependency-driven, so without a targeted query test there’s nothing here that proves the panic is gone or keeps it from coming back on the next crate bump. A small end-to-end query test around the exact percentile value from the issue would make this upgrade much safer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Cargo.toml` around lines 12 - 29, Add an end-to-end regression test that runs a SQL query invoking approx_percentile_cont(..., 0.999) to ensure the panic no longer occurs; create a new test (e.g., tests/regress_percentile.rs or in an existing query integration test module) that builds a small table of numeric values, executes a query using approx_percentile_cont(column, 0.999), asserts the query completes without panicking and verifies the returned percentile value (or at least that the result is Some/NotNull); ensure the test runs with the current dependency set (DataFusion/arrow/parquet) and is included in CI so future crate bumps will exercise approx_percentile_cont at the 0.999 edge case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@Cargo.toml`:
- Around line 12-29: Add an end-to-end regression test that runs a SQL query
invoking approx_percentile_cont(..., 0.999) to ensure the panic no longer
occurs; create a new test (e.g., tests/regress_percentile.rs or in an existing
query integration test module) that builds a small table of numeric values,
executes a query using approx_percentile_cont(column, 0.999), asserts the query
completes without panicking and verifies the returned percentile value (or at
least that the result is Some/NotNull); ensure the test runs with the current
dependency set (DataFusion/arrow/parquet) and is included in CI so future crate
bumps will exercise approx_percentile_cont at the 0.999 edge case.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 4edfca8c-fb35-4775-a4b4-2a04936c28e8
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
Cargo.tomlsrc/hottier.rssrc/parseable/streams.rssrc/query/stream_schema_provider.rssrc/storage/azure_blob.rssrc/storage/gcs.rssrc/storage/metrics_layer.rssrc/storage/s3.rs
this is to fix a bug reported in DF 51.0.0
apache/datafusion#19086
this impacts approx percentile queries approx_percentile_cont(..., 0.999) triggers the panic
Summary by CodeRabbit
Release Notes
Chores
Refactor