-
Notifications
You must be signed in to change notification settings - Fork 29.3k
[SPARK-54881][SQL] Improve BooleanSimplification to handle negation of conjunction and disjunction in one pass
#53658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
24141c4
SPARK-54881. Fix to simplify boolean expression of form !(expr1 || ex…
ahshahid 9813a1e
SPARK-54881. Fix to simplify boolean expression of form !(expr1 || ex…
ahshahid 8adb0da
SPARK-54881. Fix to simplify boolean expression of form !(expr1 || ex…
ahshahid 0b8ac8e
SPARK-54881. Fix to simplify boolean expression of form !(expr1 || ex…
ahshahid ead1da5
SPARK-54881. refactored test
ahshahid cb645ed
SPARK-54881. Implemented review feedback by refactoring the code and …
ahshahid 174ff71
SPARK-54881. fixed comment in the test and added another test case
ahshahid File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe? I mean, before this PR the simplification logic of
actualExprTransformerwas called withtransformUp..., but now you call it withtransformDown...(please note that aNotnode can be deep down inaorb). Is there any reason why we invoke the logic withtransformUpor could the whole rule usetransformDownon expression trees?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not something like
And(actualExprTransformer.applyOrElse(Not(a), identity), actualExprTransformer.applyOrElse(Not(b), identity))just to be on the safe side?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's safe..
If the original logic is modified such that instead of transform up ,
transform down is used, then this bug would be fixed, but other cases like
that mentioned in Constant folding suite will break in idempotency.
To take care of both the cases, use of transform up and transform down is
needed...as in the pr. This reason is also mentioned in the initial PR details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would make sense to split the logic into 2 traversals? Keep the current
transformExpressionsUpWithPruning()with the current cases excluding these 2Not"pushdowns" and then atransformExpressionsDownWithPruning()with these 2 cases.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That in my view, would defeat the purpose of achieving idempotency in a minimum possible tree traversal. If we separate it in 2 traversals, then only for a part of subtree , the whole traversal will have to happen again.
As such I do not see any issue with the current code of subtree traversal of the newly added children to cause any issue.. Is there something which is making it suspicious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides that it is hard to reason about a nested traversals, my problem with the current inner
transformDownWithPruning()is that it can callactualExprTransformertop-down way not only on the newAndandNotnodes, but also on nodes ofaandbsubtrees if those containNotnodes.The current rule might be safe in top-down manner as well, but I feel it would be a bit cleaner to separate the traversals. But, on the other hand, separating the traversals would require 2 unique rule ids so the current PR has pros as well.
Anyways, I'm ok with this PR.
@cloud-fan, do you have any concerns or comments on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that rules like BooleanSimplification would work same bottom - up, or top - down in terms of functionality, so long as number of iterations to achieve idempotency is ignored.
If one goes top -down, some cases (Not) become optimal, while if you bottom - up ( other cases like depicted in ConstantFoldinghSuite become optimal).
The point is that the subtrees in NOT (Junction) before being acted upon by top- down rule , have already undergone the traversal of bottom- up, so the top - down would act only for pushing of Not, and moreover the traversal would terminate the moment subtree has no NOT pushed.
In my mind, I am comfortable with the behaviour.