Flink: Add equality delete conversion API and integration tests#16948
Open
mxm wants to merge 4 commits into
Open
Flink: Add equality delete conversion API and integration tests#16948mxm wants to merge 4 commits into
mxm wants to merge 4 commits into
Conversation
Flink's upsert mode writes equality deletes, which are cheap to produce but
force every reader to join the data against the delete files (merge-on-read).
ConvertEqualityDeletes is a maintenance task that rewrites them into deletion
vectors on a target branch, so reads apply deletes by position. The writer keeps
appending equality deletes to a staging branch while the conversion runs in the
background.
Example:
```
TableMaintenance.forTable(env, tableLoader)
.add(
ConvertEqualityDeletes.builder()
.stagingBranch("staging")
.targetBranch("main")
.equalityFieldColumns(ImmutableList.of("id"))
.scheduleOnEqDeleteFileCount(1)
.append();
```
The table must be V3 for deletion vectors, the equality field columns must match
the writer's, and every partition column must be an equality field, because the
converter keys rows on equality values alone and would otherwise resolve a
delete against the wrong partition. The conversion task has checks in place to
ensure this.
pvary
reviewed
Jun 24, 2026
wombatu-kun
reviewed
Jun 25, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Flink's upsert mode writes equality deletes, which are cheap to produce but force every reader to join the data against the delete files (merge-on-read). ConvertEqualityDeletes is a maintenance task that rewrites them into deletion vectors on a target branch, so reads apply deletes by position. The writer keeps appending equality deletes to a staging branch while the conversion runs in the background.
Example:
The table must be V3 for deletion vectors, the equality field columns must match the writer's, and every partition column must be an equality field, because the converter keys rows on equality values alone and would otherwise resolve a delete against the wrong partition. The conversion task has checks in place to ensure this.
This PR is factored out of #15996.