Skip to content

Upgrade to DataFusion 15.0.0#949

Merged
ayushdg merged 10 commits into
dask-contrib:mainfrom
andygrove:datafusion-15.0.0
Jan 5, 2023
Merged

Upgrade to DataFusion 15.0.0#949
ayushdg merged 10 commits into
dask-contrib:mainfrom
andygrove:datafusion-15.0.0

Conversation

@andygrove

@andygrove andygrove commented Dec 1, 2022

Copy link
Copy Markdown
Contributor

DataFusion 15.0.0 will be released in the next few days. This PR is in preparation for upgrading.


#[pyo3(name = "getDecimal128Value")]
pub fn decimal_128_value(&mut self) -> PyResult<(Option<i128>, u8, u8)> {
pub fn decimal_128_value(&mut self) -> PyResult<(Option<i128>, u8, i8)> {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decimal scale is now a signed value

@codecov-commenter

codecov-commenter commented Dec 1, 2022

Copy link
Copy Markdown

Codecov Report

Merging #949 (eb4037c) into main (747d8ab) will increase coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #949      +/-   ##
==========================================
+ Coverage   77.67%   77.80%   +0.13%     
==========================================
  Files          75       75              
  Lines        4215     4218       +3     
  Branches      765      766       +1     
==========================================
+ Hits         3274     3282       +8     
+ Misses        775      766       -9     
- Partials      166      170       +4     
Impacted Files Coverage Δ
dask_sql/physical/rel/logical/subquery_alias.py 76.92% <100.00%> (+6.92%) ⬆️
dask_sql/_version.py 35.31% <0.00%> (+1.41%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Comment on lines +129 to +130
SubqueryAlias: __sq_1
Projection: AVG(test.col_int32) AS __value

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Projections no longer have aliases, but use the generic SubqueryAlias operator instead

@andygrove andygrove marked this pull request as ready for review December 5, 2022 18:42
@ayushdg

ayushdg commented Dec 6, 2022

Copy link
Copy Markdown
Collaborator

rerun tests

Comment on lines -19 to +36
return dc

cc = dc.column_container

alias = rel.subquery_alias().getAlias()

return DataContainer(
dc.df,
cc.rename(
{
col: renamed_col
for col, renamed_col in zip(
cc.columns,
(f"{alias}.{col.split('.')[-1]}" for col in cc.columns),
)
}
),
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this DataFusion bump, we now optimize otherwise identical projections with different aliases into a single projection followed by the equivalent SubqueryAlias, e.g. turning

Projection: customer_total_return.ctr_store_sk, customer_total_return.ctr_total_return, alias=ctr2
  Projection: store_returns.sr_store_sk AS ctr_store_sk, SUM(store_returns.sr_fee) AS ctr_total_return, alias=customer_total_return

into

SubqueryAlias: ctr2
  SubqueryAlias: customer_total_return
    Projection: store_returns.sr_store_sk AS ctr_store_sk, SUM(store_returns.sr_fee) AS ctr_total_return

This exposed the fact that our SubqueryAlias plugin was a no-op that didn't actually do the required aliasing. These modifications should unblock failures stemming from this optimization.

@jdye64 jdye64 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking @charlesbluca comments about SubqueryAlias into account these changes LGTM

@ayushdg ayushdg left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since this pr was opened, but the changes lgtm!
I'm holding off on merging to run some benchmarks internally but should be good to merge in a day or so

@ayushdg ayushdg merged commit 767acf2 into dask-contrib:main Jan 5, 2023
@andygrove andygrove deleted the datafusion-15.0.0 branch January 5, 2023 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants