fix(upsert): skip expired metadata during upsert instead of comparing against it #18733
Open
tarun11Mavani wants to merge 1 commit into
… against it

When upsertMetadataTTL is configured, expired metadata entries were only cleaned up lazily at segment commit time. Records arriving after TTL expiry were still compared against stale metadata and could be rejected as out-of-order. This makes behavior inconsistent: the output depends on when cleanup last ran.

Add an inline TTL expiry check in doAddRecord() so that expired metadata entries are treated as if they don't exist. The new record always wins regardless of its comparison value, and the old doc is properly invalidated via replaceDocId. Also guard the partial upsert merge path in doUpdateRecord() to prevent merging stale data from expired records.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
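The add path described in this commit message can be sketched in a few lines. This is a minimal illustration, not the code in the PR: the class `TtlAwareUpsertSketch`, the simplified `Location` type, the `invalidatedDocIds` list, and the choice to update the largest seen comparison value before the check are all assumptions made for the example. Only the threshold formula (largest seen comparison value minus the TTL) and the "expired entry is treated as absent" rule come from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TtlAwareUpsertSketch {
  // Hypothetical, simplified stand-in for a record location: doc id plus comparison value.
  static final class Location {
    final int docId;
    final double comparisonValue;

    Location(int docId, double comparisonValue) {
      this.docId = docId;
      this.comparisonValue = comparisonValue;
    }
  }

  private final Map<String, Location> locations = new HashMap<>();
  private final double metadataTTL;
  private double largestSeenComparisonValue = Double.NEGATIVE_INFINITY;

  // Doc ids that were replaced, standing in for the invalidation done via replaceDocId.
  final List<Integer> invalidatedDocIds = new ArrayList<>();

  TtlAwareUpsertSketch(double metadataTTL) {
    this.metadataTTL = metadataTTL;
  }

  /** Returns true if the new record became the live row for its primary key. */
  synchronized boolean addRecord(String primaryKey, int docId, double comparisonValue) {
    // Assumption: the watermark is advanced before the expiry check.
    largestSeenComparisonValue = Math.max(largestSeenComparisonValue, comparisonValue);
    double threshold = largestSeenComparisonValue - metadataTTL;

    Location current = locations.get(primaryKey);
    // An entry strictly below the threshold is treated as if it did not exist.
    boolean expired = current != null && metadataTTL > 0 && current.comparisonValue < threshold;

    if (current == null || expired || comparisonValue >= current.comparisonValue) {
      if (current != null) {
        invalidatedDocIds.add(current.docId); // old doc must stop being queryable
      }
      locations.put(primaryKey, new Location(docId, comparisonValue));
      return true;
    }
    return false; // genuine out-of-order record against a live entry
  }

  public static void main(String[] args) {
    TtlAwareUpsertSketch manager = new TtlAwareUpsertSketch(100);
    manager.addRecord("A", 0, 500);
    manager.addRecord("B", 1, 1000); // moves the threshold to 900, so "A" is now expired
    System.out.println(manager.addRecord("A", 2, 400));
    System.out.println(manager.invalidatedDocIds);
  }
}
```

With a TTL of 100 and a largest seen value of 1000, the threshold is 900. The entry for "A" at 500 is below it, so the late record at 400 is accepted no matter whether lazy cleanup has run yet, which is the timing independence the commit is after.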
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18733 +/- ##
============================================
+ Coverage 56.80% 64.55% +7.74%
- Complexity 7 1305 +1298
============================================
Files 2580 3380 +800
Lines 149652 209651 +59999
Branches 24180 32779 +8599
============================================
+ Hits 85010 135336 +50326
- Misses 57444 63458 +6014
- Partials 7198 10857 +3659
Problem

When upsertMetadataTTL is configured, expired metadata entries are only cleaned up at segment commit time (which can be hours apart for low-throughput tables). Records arriving after TTL expiry are still compared against stale metadata: a record with a lower comparison value gets rejected as out-of-order even though the existing entry is expired and should be irrelevant.

Fix

Add an inline TTL expiry check in doAddRecord(). When the existing record's comparison value is below the TTL threshold (largestSeenComparisonValue - metadataTTL), skip the comparison entirely:

- replaceDocId (no duplicate queryable rows)
- RecordLocation

Also guard the partial upsert merge in doUpdateRecord(): skip reading the previous row when it's expired, preventing stale data from being merged into the new record.

Changes

- ConcurrentMapPartitionUpsertMetadataManager.doAddRecord(): TTL check before comparison
- ConcurrentMapPartitionUpsertMetadataManager.doUpdateRecord(): TTL guard on partial upsert merge

Not changed

- ConcurrentMapPartitionUpsertMetadataManagerForConsistentDeletes: metadataTTL and enableDeletedKeysCompactionConsistency are mutually exclusive, so the check can never fire there
- removeExpiredPrimaryKeys(): lazy cleanup at segment commit continues as-is
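The partial upsert guard can be illustrated the same way. The sketch below is an assumption-laden example rather than the PR's implementation: `PartialUpsertTtlGuardSketch`, `isExpired`, and `updateRecord` are hypothetical names, rows are modeled as plain maps, and the merge strategy (non-null new values overwrite previous ones) was picked only to make the effect visible.

```java
import java.util.HashMap;
import java.util.Map;

class PartialUpsertTtlGuardSketch {
  /** Hypothetical helper: true when an existing entry has fallen below the TTL threshold. */
  static boolean isExpired(double existingComparisonValue, double largestSeenComparisonValue,
      double metadataTTL) {
    return metadataTTL > 0 && existingComparisonValue < largestSeenComparisonValue - metadataTTL;
  }

  /**
   * Merges the new partial row into the previous row, unless the previous row is missing
   * or expired, in which case the new row is returned on its own.
   */
  static Map<String, Object> updateRecord(Map<String, Object> previousRow,
      double previousComparisonValue, Map<String, Object> newRow,
      double largestSeenComparisonValue, double metadataTTL) {
    if (previousRow == null
        || isExpired(previousComparisonValue, largestSeenComparisonValue, metadataTTL)) {
      // Guard: do not read the stale row at all.
      return new HashMap<>(newRow);
    }
    // Illustrative merge: start from the previous row, overwrite with non-null new values.
    Map<String, Object> merged = new HashMap<>(previousRow);
    for (Map.Entry<String, Object> entry : newRow.entrySet()) {
      if (entry.getValue() != null) {
        merged.put(entry.getKey(), entry.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> previous = new HashMap<>();
    previous.put("clicks", 5);
    previous.put("city", "NYC");
    Map<String, Object> incoming = new HashMap<>();
    incoming.put("clicks", 7);
    incoming.put("city", null);

    // Previous row at 500 is expired when the largest seen value is 1000 and the TTL is 100.
    System.out.println(updateRecord(previous, 500, incoming, 1000, 100));
    // Previous row at 950 is still live, so its city value is carried over.
    System.out.println(updateRecord(previous, 950, incoming, 1000, 100));
  }
}
```

In the expired case the result carries no value from the old row, so a field that the new record leaves unset stays unset instead of inheriting data that should already have aged out.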