Skip to content

Add support for filter pushdown rule#924

Merged
ayushdg merged 21 commits into
dask-contrib:mainfrom
ayushdg:support-filter-pushdown-rule
Dec 1, 2022
Merged

Add support for filter pushdown rule#924
ayushdg merged 21 commits into
dask-contrib:mainfrom
ayushdg:support-filter-pushdown-rule

Conversation

@ayushdg

@ayushdg ayushdg commented Nov 16, 2022

Copy link
Copy Markdown
Collaborator

This PR enables using the filter_pushdown_rule from datafusion and allows passing filters down to the tablescan operation.

@codecov-commenter

codecov-commenter commented Nov 18, 2022

Copy link
Copy Markdown

Codecov Report

Merging #924 (acdab19) into main (6ae69a8) will increase coverage by 0.22%.
The diff coverage is 100.00%.

❗ Current head acdab19 differs from pull request most recent head 47eee60. Consider uploading reports for the commit 47eee60 to get more accurate results

@@            Coverage Diff             @@
##             main     #924      +/-   ##
==========================================
+ Coverage   75.81%   76.04%   +0.22%     
==========================================
  Files          73       73              
  Lines        4065     4082      +17     
  Branches      737      739       +2     
==========================================
+ Hits         3082     3104      +22     
+ Misses        823      814       -9     
- Partials      160      164       +4     
Impacted Files Coverage Δ
dask_sql/physical/rel/logical/table_scan.py 88.63% <100.00%> (+7.15%) ⬆️
dask_sql/_version.py 35.31% <0.00%> (+1.41%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Comment on lines +160 to +165
[
[("a", "==", 1), ("b", "<", 10)],
[("a", "==", 1), ("b", ">", 5)],
[("b", ">", 5), ("b", "<", 10)],
[("a", "==", 1)],
],

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version of datafusion applies some conversions that convert filters to a cnf like format. So the dnf here
(b > 5 AND b < 10) OR a = 1 gets remapped to
(b>5 or a=1) and (b<10 or a=1) by datafusion which becomes this is dnf
(b>5 and a=1) or (b<10 and a=1) or (b>5 and b<10) or (a=1) .

I don't know if there's an easy way to simplify these predicate expressions but some cases might lead to larger number of redundant filters because of this.

Comment thread dask_planner/src/sql/table.rs Outdated
Comment thread dask_planner/src/sql/table.rs
Comment thread dask_planner/src/sql/table.rs
sarahyurick added a commit to sarahyurick/dask-sql that referenced this pull request Nov 30, 2022
14,
16,
18,
21,

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into both these queries, for both it seems like they previously failed with metadata inference errors trying to do a comparison between a datetime64 and string object.

I'll open up another issue to investigate but in the meanwhile pushing those filters down to the io seems to fix the queries

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging into this before it seemed like the underlying issue was that we were at some point casting a column to Date32 and trying to compare to a Date32 scalar, but their dtypes didn't match up - not sure if that's helpful to make a simple reproducer for this

@ayushdg

ayushdg commented Dec 1, 2022

Copy link
Copy Markdown
Collaborator Author

rerun tests

@ayushdg ayushdg merged commit cf719c5 into dask-contrib:main Dec 1, 2022
@ayushdg ayushdg deleted the support-filter-pushdown-rule branch February 6, 2023 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants