feat: Add bias for RightSemi, RightAnti, RightMark join orientation #22957
Open
neilconway wants to merge 3 commits into
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
Which issue does this PR close?
- RightSemi over LeftSemi #22931
Rationale for this change
To evaluate a semi join, we support two orientations: LeftSemi or RightSemi (analogously for anti and mark joins; I'll just refer to semijoins here to simplify the discussion). Under RightSemi, we build the non-preserved ("filter") input and stream the preserved input; we do the inverse for LeftSemi. There are significant differences in evaluation behavior between these two orientations:
- RightSemi only needs to store the join keys for the build side; LeftSemi needs to store wider rows. By definition, the consumer of a semijoin can't be interested in any values from the filter side of the join. So even if the filter side has more rows than the preserved side, building the hash table on the filter side might still require less memory.
- RightSemi preserves the partitioning of the preserved input, whereas LeftSemi + CollectLeft emits with UnknownPartitioning.
- RightSemi works better with dynamic filter pushdown: I don't know the dynamic filter code super well, but I'd imagine that since RightSemi builds the filter side before streaming the preserved side, that gives us more information we can use to push down filters into the preserved-side scan.
The current optimizer rules don't reflect this:
- LeftSemi and RightSemi are considered symmetrically; whichever semijoin input is predicted to be smaller is placed on the build side
- LeftSemi is the default orientation
This PR revises these rules as follows:
- Prefer RightSemi over LeftSemi, unless the filter side is twice as large as the preserved side (configurable via the semi_join_swap_bias configuration variable)
- RightSemi
What changes are included in this PR?
- semi_join_swap_bias configuration variable
Are these changes tested?
Yes.
Are there any user-facing changes?
Changes in plans for user queries.
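
For reviewers who want the revised orientation rule in concrete form, here is a minimal sketch of the decision described in the rationale. It is not the code in this PR. The `SemiOrientation` enum and the `choose_orientation` function are hypothetical names. The sketch also assumes that sizes are optional estimates in some common unit, that "twice as large" means greater than or equal to `bias` times the preserved size, and that RightSemi is chosen when either estimate is missing.

```rust
/// Hypothetical orientation type for illustration only; it mirrors the join
/// type names used in the description but is not DataFusion's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SemiOrientation {
    /// Build the preserved input, stream the filter input.
    LeftSemi,
    /// Build the filter input, stream the preserved input.
    RightSemi,
}

/// Choose an orientation from estimated input sizes.
///
/// `bias` plays the role of `semi_join_swap_bias`; 2.0 corresponds to the
/// "twice as large" threshold in the description.
///
/// Assumptions of this sketch (not confirmed by the PR text):
/// - the comparison is `filter >= bias * preserved`;
/// - missing statistics fall back to RightSemi.
pub fn choose_orientation(
    preserved_size: Option<u64>,
    filter_size: Option<u64>,
    bias: f64,
) -> SemiOrientation {
    match (preserved_size, filter_size) {
        // The filter side is large enough that building it is no longer a win.
        (Some(p), Some(f)) if (f as f64) >= bias * (p as f64) => SemiOrientation::LeftSemi,
        // Otherwise prefer building the filter side.
        _ => SemiOrientation::RightSemi,
    }
}
```

With a bias of 2.0, a preserved input estimated at 100 and a filter input estimated at 150 stays RightSemi under this sketch, even though the filter side is larger, while a filter input of 200 or more flips to LeftSemi.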