[opt](memory) avoid retaining full segment key bounds buffers#63968
Merged
Conversation
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
### What problem does this PR solve? Issue Number: None Related PR: apache#63469 Problem Summary: Merge latest apache/doris master into PR apache#63968 and resolve the beta_rowset_writer.cpp conflict by preserving the PR change that avoids copy-then-resize for segment key bounds while keeping the upstream truncation marker behavior. ### Release note None ### Check List (For Author) - Test: Manual test - git diff --check - build-support/check-format.sh be/src/storage/rowset/beta_rowset_writer.cpp - Behavior changed: No - Does this need documentation: No
Contributor
Author
|
run buildall |
Contributor
|
/review |
Contributor
There was a problem hiding this comment.
Review result: no blocking issues found in this PR.
Critical checkpoint conclusions:
- Goal and proof: The change avoids retaining full key-bound buffers in the rowset writer by assigning truncated strings directly into stored segment statistics. Existing segment key-bound truncation tests cover the functional metadata behavior, but I did not find a dedicated memory-retention regression test.
- Scope and clarity: The modification is small and focused on the two segment-statistics insertion paths in
beta_rowset_writer.cpp. - Concurrency: The changed code builds the stored statistics before taking
_segid_statistics_map_mutex, and preserves the existing mutex protection when updating_segid_statistics_mapand_segments_key_bounds_truncated. No new lock-order or heavy-under-lock issue was introduced. - Lifecycle/static initialization: No new static/global lifecycle dependency was introduced.
- Configuration: No new config item was added. Existing truncation config behavior is preserved, including the random-test path falling back to
RowsetMeta::set_segments_key_boundstruncation. - Compatibility: No storage format, thrift/protobuf schema, or function-symbol compatibility change was introduced.
- Parallel paths: Both normal
add_segmentand segcompactionflush_segment_writer_for_segcompactionpaths in this writer were updated. The load-stream writer path only forwards segment stats to destination replicas and does not retain them locally in the same map. - Conditional checks: Existing truncation threshold semantics are preserved.
- Test coverage: No new tests were added. Existing key-bound truncation tests cover visible behavior; this optimization mainly targets retained protobuf string capacity.
- Test results: I ran
git diff --checksuccessfully.build-support/check-format.sh be/src/storage/rowset/beta_rowset_writer.cppcould not run in this runner because clang-format 16 is unavailable. - Observability: No new observability is needed for this narrow memory optimization.
- Transactions/persistence/data writes: The persisted key-bound values and truncation marker semantics remain unchanged; no transaction or delete-bitmap path behavior appears changed.
- FE/BE variable passing: Not applicable.
- Performance: The change removes the full-copy-then-resize pattern on the retained statistics path and keeps the existing rowset-meta truncation behavior, so it addresses the intended memory-retention issue without adding hot-path scans.
User focus: no additional user-provided review focus was present.
Contributor
TPC-H: Total hot run time: 29446 ms |
Contributor
TPC-DS: Total hot run time: 170723 ms |
Contributor
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
|
PR approved by at least one committer and no changes requested. |
Contributor
|
PR approved by anyone and no changes requested. |
gavinchou
approved these changes
Jun 4, 2026
This was referenced Jun 9, 2026
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Related PR: #63469
Problem Summary:
#63469truncates segment key bounds before storing segment statistics, but the current implementation first copies the fullKeyBoundsPBand then callsresize()on the protobuf string fields.For very long keys,
resize()reduces the visible string size but may keep the original large string capacity. After the truncatedSegmentStatisticsis moved into_segid_statistics_map, the rowset writer can still retain buffers sized for the original full key bounds.This PR changes the write path to build the stored
SegmentStatisticswith freshly assigned truncated key bound strings, avoiding the full-copy-then-resize pattern. The segcompaction segment stats path is updated in the same way.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)