[pull] master from apache:master#31
Open
pull[bot] wants to merge 6204 commits into
Problem Summary: Iceberg data location resolution skipped the legacy object-store.path property and fell back directly to write.folder-storage.path or table location when write.data.path was not set. This could choose the wrong data location for tables that still rely on object-store.path. This change keeps write.data.path as the highest priority, then checks object-store.path before write.folder-storage.path and the default table data directory.
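The priority order described above can be sketched in Java as follows. This is a minimal illustration, not the Doris implementation: the class and method names are hypothetical, the property keys are copied from the summary as written, and the `/data` suffix for the default table data directory is an assumption.

```java
import java.util.Map;

// Hypothetical helper illustrating the lookup order from the summary.
final class DataLocationSketch {
    // Highest priority first; keys are copied from the PR summary.
    private static final String[] KEYS = {
        "write.data.path",
        "object-store.path",
        "write.folder-storage.path",
    };

    static String resolve(Map<String, String> props, String tableLocation) {
        for (String key : KEYS) {
            String value = props.get(key);
            if (value != null && !value.isEmpty()) {
                return value; // first configured property wins
            }
        }
        // Assumed default: a "data" directory under the table location.
        return tableLocation + "/data";
    }
}
```

With only `object-store.path` and `write.folder-storage.path` set, the sketch returns the `object-store.path` value, which is the case the change addresses.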
Problem Summary: FileScanner kept passing raw IOContext pointers to several file readers, so DelegateReader could still create a shallow-copied IOContext on the hot scan path. That left different IOContext instances inside the same reader stack and could also dereference missing child stats pointers when an IOContext existed without file reader stats. This change keeps FileScanner's IOContext in a shared holder, passes it through CSV, text, JSON, native, Parquet, ORC, and table-format reader variants, and makes Native/Parquet/ORC use the shared DelegateReader API when a holder is available. Tracing/stat updates now check the nested stats pointer before use.
…63894) Currently, the update of partitions only depends on the visible version and visible time. If a balance occurs, the version and time of the partition will not be updated, which means that the updated partition will not be retrieved from the remote FE. When executing a query, the tablet on the BE node may no longer exist, resulting in query errors. To avoid this problem, a checksum will be calculated for the partition to determine whether the partition's metadata has changed.
…63809)
### What problem does this PR solve?
Problem Summary: External catalog meta cache statistics exposed cumulative eviction count, but did not provide a direct replacement frequency metric for judging whether cache capacity is too small. This PR adds `EVICTION_RATE` to `information_schema.catalog_meta_cache_statistics`, calculated as `eviction_count / request_count` and returned as `0` when there are no requests.
Hive partition metadata cache defaults were also too small for common external catalog workloads, causing frequent evictions without explicit tuning. This PR increases the default Hive single-partition cache capacity from 10,000 to 100,000 and the Hive partitioned-table values cache capacity from 1,000 to 10,000.
While checking similar cache entries, MaxCompute `partition_values` was found to cache table-level partition value structures but use the Hive single-partition capacity; it now follows the table-level partition values capacity.
### Release note
Add `EVICTION_RATE` to `information_schema.catalog_meta_cache_statistics`, increase default Hive partition meta cache capacities, and make MaxCompute `partition_values` use the table-level partition values capacity.
### Check List (For Author)
- Test: Unit Test
- `./run-fe-ut.sh --run org.apache.doris.datasource.metacache.MetaCacheEntryTest`
- `./run-fe-ut.sh --run org.apache.doris.datasource.hive.HiveMetaStoreCacheTest,org.apache.doris.datasource.maxcompute.MaxComputeExternalMetaCacheTest`
- Behavior changed: Yes. `catalog_meta_cache_statistics` includes `EVICTION_RATE`; default Hive partition meta cache capacities are larger; MaxCompute `partition_values` uses the table-level partition values capacity.
- Does this need documentation: No
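The `EVICTION_RATE` formula from the summary, including the zero-request case, might look like this in Java. The class and method names are hypothetical and only illustrate the arithmetic.

```java
// Hypothetical helper: eviction_count / request_count, 0 when there are no requests.
final class EvictionRateSketch {
    static double evictionRate(long evictionCount, long requestCount) {
        if (requestCount == 0) {
            return 0.0; // avoid division by zero, as the summary specifies
        }
        return (double) evictionCount / requestCount;
    }
}
```

A rate close to 1 would suggest that almost every request displaces an entry, which is the signal the summary says was missing for judging cache capacity.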
The scan operator unconditionally skipped VARBINARY column predicate and TopN runtime predicate pushdown. The commit that introduced the guard was for external Parquet/file scan reader predicate limitations, so applying it in the shared scan path also blocked non-file scans. This change adds a scan-operator hook for column predicate pushdown capability, keeps the default permissive, and makes FileScanOperatorX reject VARBINARY column predicates.
…3718) Problem Summary: `DataTypeVariantSerDe::write_column_to_arrow` always cast the Arrow builder to `arrow::StringBuilder`. During Parquet OUTFILE export, the Arrow block converter can switch utf8 columns to `large_utf8` when a batch is large, which gives variant serialization an `arrow::LargeStringBuilder` and crashes BE on the bad cast. This patch handles both `arrow::StringBuilder` and `arrow::LargeStringBuilder` for VARIANT Arrow serialization and adds a BE UT that reproduces the LargeStringBuilder path.
…other_func_low_ndv in agg_strategy (#64022)
### Release note
None
Remove the single replica compaction (SRC) feature end-to-end across BE, FE and regression tests. Doc: apache/doris-website#3870
### Why remove it
The main reason is **correctness risk in peer selection**. A follower replica had to pick a peer holding a "proper" version (`_find_rowset_to_fetch`) and fetch its compacted result, based on replica info that was only refreshed periodically. Because replicas progress through versions independently and this "leader" selection ran against a stale, time-sensitive view of the cluster, the choice of which peer to fetch from, and which version, was racy and could select a peer whose state no longer matched, leading to subtle inconsistencies.
### What problem does this PR solve?
Issue Number: None
Problem Summary:
Remove dead helper code from BE JSON-related implementations:
- Remove the unused `ExecuteReducer` template and its `JsonParser`/path
parsing helper chain from `function_json.cpp`.
- Remove the unused `convert_jsonb_to_rapidjson` declaration/definition
after its only live dependency was removed.
- Remove the commented-out test helper that referenced the deleted
conversion helper.
- Clean up now-unused includes and make small style cleanups around the
touched code.
This is an internal cleanup only and does not change JSON function
behavior.
### Release note
None
### Check List (For Author)
- Test: Manual test
- `ninja -C be/ut_build_ASAN
src/core/CMakeFiles/Core.dir/data_type_serde/data_type_jsonb_serde.cpp.o
src/exprs/CMakeFiles/Exprs.dir/function/function_json.cpp.o
test/CMakeFiles/doris_be_test.dir/core/column/column_variant_test.cpp.o`
- `build-support/clang-format.sh`
- `build-support/check-format.sh`
- `git diff --check`
- Behavior changed: No
- Does this need documentation: No
…st (#64024) ### What problem does this PR solve? Related PR: [63506](#63506) Problem Summary: `test_auth_remote_ip` only needs to verify that Arrow Flight SQL remote IP authentication allows a matched user to run `SELECT 1`. The shared `sql_impl` helper uses `PreparedStatement`, and Arrow Flight SQL JDBC 17 can report a close-time 8-byte client allocator leak after the prepared path has already consumed the result. This changes the case to use `JdbcUtils.executeQueryToList`, which uses `createStatement().executeQuery(...)`, so the test avoids the prepared statement cleanup path without ignoring `conn.close()` exceptions.
Issue Number: #48203
Related PR: #59223
doc: apache/doris-website#3891
Problem Summary: Support function `ARRAY_CROSS_PRODUCT`
```sql
Doris> SELECT CROSS_PRODUCT([1, 2, 3], [2, 3, 4]);
+-------------------------------------+
| CROSS_PRODUCT([1, 2, 3], [2, 3, 4]) |
+-------------------------------------+
| [-1, 2, -1]                         |
+-------------------------------------+
1 row in set (0.021 sec)

Doris> SELECT CROSS_PRODUCT([1, 2, 3], NULL);
+--------------------------------+
| CROSS_PRODUCT([1, 2, 3], NULL) |
+--------------------------------+
| NULL                           |
+--------------------------------+
1 row in set (0.009 sec)

Doris> SELECT CROSS_PRODUCT([1, NULL, 3], [1, 2, 3]);
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]function array_cross_product cannot have null

Doris> SELECT CROSS_PRODUCT([1, 2, 3, 4], [1, 2, 3, 4]);
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]function array_cross_product requires both input arrays to have exactly 3 elements, got 4 and 4
```
### What problem does this PR solve?
Problem Summary: Replace direct typeid_cast usage for Doris column type
checks with the column-specific check_and_get_column helper. This keeps
column downcast checks consistent across core column code, expression
evaluation, storage segment code, and related table reader tests without
changing behavior.
### Release note
None
…ure (#63565) `(int)(a.getId() - b.getId())` overflows when BE ID delta exceeds Integer.MAX_VALUE, breaking the Comparator contract and causing stream load to fail with "Comparison method violates its general contract!". Use `Long.compare` instead. Same fix applied to CloudSystemInfoService.
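The overflow is easy to reproduce in isolation. The sketch below uses plain `Long` values in place of backend objects; the class name and the two comparator constants are illustrative, not code from the PR.

```java
import java.util.Comparator;

// Illustrative comparators over backend IDs (plain Long values here).
final class BackendIdOrderSketch {
    // Buggy pattern: the long difference is truncated to int and can flip sign.
    static final Comparator<Long> BROKEN = (a, b) -> (int) (a - b);

    // Fixed pattern: Long.compare never overflows.
    static final Comparator<Long> FIXED = Long::compare;
}
```

For IDs `3_000_000_000L` and `0L`, the difference exceeds `Integer.MAX_VALUE`, so the truncated result is negative and `BROKEN` orders the larger ID first, while `FIXED` returns a positive value as expected.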
Deduplicate equivalent PENDING one-shot TABLE warm up jobs by destination cluster, normalized table set, and force flag. Deduplicate equivalent PENDING one-shot CLUSTER warm up jobs by source/destination cluster pair. Reuse the oldest matching pending job and return its job id instead of appending another pending duplicate. Keep RUNNING jobs out of deduplication and preserve the existing PERIODIC / EVENT_DRIVEN behavior. Add unit tests for table/cluster deduplication, replay handling, and regression coverage.
…ewOlapScanner (#61072)
- decouple realtime FileCache profile updates from the local/remote-bytes branch and update once per cycle when file cache is enabled
- reset file_cache_stats as a whole (`= {}`) after realtime reporting, instead of resetting only bytes_read_from_local/bytes_read_from_remote
- prevent inflated FileCache profile counters caused by repeated accumulation (e.g. LockWaitTimer, CacheGetOrSetTimer, BytesWriteIntoCache)
… to avoid reserved-keyword parse failure (#63747) ### What problem does this PR solve? Problem Summary: When a routine load job uses a column name that is a SQL reserved keyword (e.g., `group`) in a PRECEDING FILTER clause, the Nereids-to-legacy expression translator sets the slot label as the raw name (e.g., `group`) without quoting. When the legacy expression SQL is later re-parsed (e.g., during routine load reparse via `NereidsLoadUtils.parseExpressionSeq`), the unquoted reserved keyword causes a parse failure, pausing the routine load job. This PR quotes the slot label using `SqlUtils.getIdentSql()` so that reserved-keyword column names are properly backtick-quoted in the translated legacy expression SQL, preventing the parse failure.
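To illustrate why quoting matters, here is a minimal Java sketch of backtick-quoting an identifier. The helper is hypothetical and is not `SqlUtils.getIdentSql()`; doubling embedded backticks follows the MySQL convention and is an assumption about what a quoting helper would need to do.

```java
// Hypothetical identifier-quoting helper, for illustration only.
final class IdentQuoteSketch {
    static String quote(String name) {
        // Double any embedded backtick, then wrap the whole name in backticks.
        return "`" + name.replace("`", "``") + "`";
    }
}
```

A filter over a column named `group` would then be rendered with the quoted name, so re-parsing the generated SQL no longer trips over the reserved keyword.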
### What problem does this PR solve? Issue Number: close #63603 Problem Summary: `MysqlProto.readLenEncodedString` reads a length-encoded integer and passes it straight to `new byte[(int) length]` with no bound. The length is fully attacker-controlled (a `0xFE` lead byte carries an 8-byte value), and it is read before authentication from `MysqlAuthPacket.readFrom` (the auth-response field at `MysqlAuthPacket.java:93` and the connection-attributes loop at `MysqlAuthPacket.java:110-118`). A small handshake response can therefore request a ~2 GiB allocation, and a length with the high bit set casts to a negative size (`NegativeArraySizeException`). This PR rejects a length that is negative or larger than the bytes remaining in the buffer before allocating. A well-formed length-encoded string's payload always fits in the remaining buffer, so valid input is unaffected. One guard covers both reach paths.
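The guard can be sketched as below. The class and method are hypothetical stand-ins rather than the actual `MysqlProto` code; the sketch only shows the check-before-allocate pattern the summary describes.

```java
import java.nio.ByteBuffer;

// Hypothetical reader showing the bounds check before allocation.
final class LenEncGuardSketch {
    static byte[] readBytes(ByteBuffer buf, long length) {
        // Reject negative lengths (high bit set) and lengths beyond the remaining payload.
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("invalid length-encoded string length: " + length);
        }
        byte[] out = new byte[(int) length]; // safe: length <= remaining() <= Integer.MAX_VALUE
        buf.get(out);
        return out;
    }
}
```

Because the check compares against `buf.remaining()`, a well-formed packet is never rejected, and the allocation size is bounded by the bytes the client actually sent.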
…on't enable fqdn mode in fe.conf because of using dns resolve firstly but not ip directly (#62139)
### What problem does this PR solve?
Fix slow `SHOW FRONTENDS` when FQDN mode is not enabled in fe.conf. The slowness came from resolving hosts through DNS first instead of using the IP directly.
### What problem does this PR solve?
`COM_RESET_CONNECTION` was accepted by Doris, but its behavior was not compatible with MySQL. The previous implementation cleared the current catalog/database state and returned OK after only a partial reset. This could make pooled clients, such as C# MySqlConnector with `ConnectionReset=True`, fail later unqualified SQL with `Current database is not set`. Other session-scoped state, including user variables and prepared statements, also needed to be reset consistently.
### What is changed?
- Preserve the current catalog/database state across `COM_RESET_CONNECTION` so pooled connections can continue using the selected database.
- Reset session variables, user variables, prepared statements, running query state, insert result, command state, and returned row count.
- Roll back transaction state during reset and return an error if rollback fails.
- Drop temporary tables during reset and return an error if cleanup fails.
- Return OK with the autocommit server status when reset succeeds.
- Return the MySQL-compatible unknown prepared statement error when executing a statement cleared by reset.
- Extend regression and FE unit coverage for reset behavior, error handling, and current database preservation.
Related PR: #63145 Problem Summary: This re-submits the OSS mTLS framework work from #63145 under my account and rebases it onto the latest apache/doris master. The change ports the public mTLS scaffolding, configuration, protocol startup split, certificate-auth contracts, and TLS validation tests while excluding enterprise module directories. After the rebase, the previous FE UT failures were fixed: the ALTER/CREATE USER TLS unit tests now use Mockito static mocking instead of external JMockit parameter injection, and the MetaServiceProxy success-path test now stubs the mock client as using the latest channel configuration so the proxy does not replace it before executing the request. --------- Co-authored-by: Siyang Tang <tangsiyang@selectdb.com>
…e writes (#62880)
### What problem does this PR solve?
Related PR: #62578
1. PR #62578 moved MaxCompute write block ID allocation from BE-local counters to FE. Instead of calling FE through the BE JNI C++ bridge:
   MaxCompute connector Java -> BE JNI C++ -> FE
   the MaxCompute connector now requests FE directly through thrift:
   MaxCompute connector Java -> FE
   A new MaxComputeFeClient is added under the MaxCompute connector to handle FE methods.
2. Removes the hardcoded `MAX_BLOCK_COUNT` variable from `fe/fe-core/src/main/java/org/apache/doris/datasource/maxcompute/MCTransaction.java` and moves it to the FE config `max_compute_write_max_block_count`. The default value is still 20000, so the existing behavior is preserved.
### What problem does this PR solve? Problem Summary: Routine load lag was refreshed mainly when task scheduling needed to recheck latest offsets after consuming the cached end offset. If producers continued appending data while the job was running, the cached latest offsets could become stale, so the reported routine load lag was not real-time enough. This PR refreshes routine load lag cache during `RoutineLoadScheduler` rounds. The metric path still only reads in-memory state and does not call Kafka directly. For routine load jobs, the latest offset cache is refreshed for current progress partitions. Concurrent updates from job scheduling and task scheduling are handled with monotonic atomic max updates, so latest offsets do not regress. Kafka metadata requests also use snapshots of broker/topic/converted properties.
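A monotonic atomic max update of the kind mentioned above can be sketched with `AtomicLong.accumulateAndGet`. The class is hypothetical and tracks a single partition's latest offset; how Doris stores these values is not shown in the summary.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for one partition's cached latest offset.
final class LatestOffsetSketch {
    private final AtomicLong latest = new AtomicLong(-1L);

    // Atomically keeps the larger of the current and candidate offsets,
    // so a stale update from a concurrent scheduler cannot move the value backwards.
    long advanceTo(long candidate) {
        return latest.accumulateAndGet(candidate, Math::max);
    }

    long get() {
        return latest.get();
    }
}
```

If job scheduling reports offset 10 and task scheduling later reports a stale 7, the cached value stays at 10.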
the old one is invalid
… refactoring #62306 and legacy issues (#62821)
### What problem does this PR solve?
Related PR: #62306
Problem Summary: This PR fixes some issues caused by the refactoring #62306 and legacy issues:
1. For Iceberg/Paimon systems, it's necessary to pass metadata partition values for each split. Simply relying on information from files to obtain partition values is unreliable, especially for tables migrated from Hive.
2. Condition cache conflicts with CountReader and Lazy RF; see comments in `be/src/exec/scan/file_scanner.cpp` for details.
3. PR #62306 omitted handling of Iceberg name_mapping.
### Release note
None
### What problem does this PR solve? Related PR: #63469 Problem Summary: `#63469` truncates segment key bounds before storing segment statistics, but the current implementation first copies the full `KeyBoundsPB` and then calls `resize()` on the protobuf string fields. For very long keys, `resize()` reduces the visible string size but may keep the original large string capacity. After the truncated `SegmentStatistics` is moved into `_segid_statistics_map`, the rowset writer can still retain buffers sized for the original full key bounds. This PR changes the write path to build the stored `SegmentStatistics` with freshly assigned truncated key bound strings, avoiding the full-copy-then-resize pattern. The segcompaction segment stats path is updated in the same way.
… format (#63570)
V3 layout:
|data1..dataN|varuint_len1..varuint_lenN|data_block_size(u32)|num_elems(u32)|
Benchmark (median of 10 reps): page pre-decode is ~1.0–3.6x faster than V2 (largest for short values), and the contiguous layout compresses ~1–11% smaller after ZSTD.
`create_texpr_literal_node<TYPE_VARBINARY>` treated the input pointer as `std::string*`, but Doris `Field` stores `TYPE_VARBINARY` values as `StringView`. When TopN predicate conversion builds a VARBINARY literal from a `Field`, the helper reinterprets a `StringView*` as a `std::string*`, which can make `std::string` assignment read a bogus size and request a huge allocation under ASAN. This PR reads VARBINARY literal input as `StringView`, copies the exact byte range into the thrift literal, and adds VARBINARY coverage for `create_texpr_node_from(Field, TYPE_VARBINARY, ...)` and `VLiteral` round trip. It also wires the `const void*` helper for `TYPE_VARBINARY`.
…e candidate (#64062)
### What problem does this PR solve?
Problem Summary: Add `tools/release-tools/`, a set of helper scripts for a Release Manager (RM) to cut an Apache Doris **source** release candidate in three steps:
- `01-check-env.sh`: check / prepare the GPG signing environment and ASF credentials.
- `02-package-sign-upload.sh`: `git archive` the tag, GPG-sign, generate sha512, upload to the dev SVN.
- `03-vote-mail.sh`: generate the `[VOTE]` email draft.
- `release.env`: shared config (version, paths, signing key, SVN URLs, email); edit per release.
- `README.md`: usage.
The scripts are reusable across releases (everything version-specific lives in `release.env`). Branch prep, issue cleanup, patch merges and tag creation are out of scope.
`test_show_create_table_nereids` duplicates `test_show_create_table`, and the out file of `test_show_create_table` is not needed.
#64562) ### What problem does this PR solve? Issue Number: None Related PR: None Problem Summary: `regression-test/suites/load_p2/tvf/test_s3_tvf.groovy` configured several S3 TVF attributes with both a virtual-host style URI and `use_path_style=true`: ```text uri = "s3://${bucket}.${endpoint}/..." use_path_style = "true" ``` These two settings conflict. Aliyun OSS rejects path-style access for this bucket with `HTTP 403 SecondLevelDomainForbidden` and the message `Please use virtual hosted style to access`. The regression case could pass on the previous endpoint, but failed after the P2 environment switched to the Aliyun internal Beijing endpoint where OSS enforces virtual-host style access. This PR fixes the regression case by removing the six active `addProperty("use_path_style", "true")` settings whose URI is already in virtual-host form, so the SDK sends requests in the addressing style required by OSS. The remaining S3 TVF attributes in this file do not set `use_path_style` and keep their previous behavior. This PR also improves the failure path in the test. Previously, `assertTrue(attribute.expectFiled)` threw immediately and the later `logger.info("error: ", ex)` line was skipped, so the failure only showed a bare assertion line. The exception is now logged before the assertion, and the assertion message includes the loop index, table name, property map, and original error message. 
Manual verification used the same OSS bucket, key prefix, and endpoint while only changing the addressing style:
```text
Path-style request:
https://oss-cn-beijing.aliyuncs.com/doris-regression-bj/?prefix=regression/load/data/kd16=abcdefg/&max-keys=2
Result: HTTP 403 SecondLevelDomainForbidden

Virtual-host request:
https://doris-regression-bj.oss-cn-beijing.aliyuncs.com/?prefix=regression/load/data/kd16=abcdefg/&max-keys=2
Result: HTTP 200 OK
```
The same behavior was reproduced through Doris S3 TVF: the query fails with `use_path_style=true` and succeeds after the property is removed.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Improve BE unit coverage for low-covered `be/src/core` paths. The change removes unused core helpers (`ArenaWithFreeLists`, `nested_utils`, and an unused `LargeIntValue` hash helper), then adds focused tests for int128 utilities, large integer stream conversion, column filter helper behavior, comparison specializations, and several data type serde branches including Nothing, Bitmap, HLL, QuantileState, Variant, Map, and Struct. ### Release note None
### What problem does this PR solve?
The original description of `sum0` is hard to understand. This PR adjusts the comments.
…63366) Problem Summary: Move local exchange (LE) planning from BE's `_plan_local_exchange` (pipeline build time) to a new FE-side planner. The FE planner mirrors BE semantics, brings several correctness fixes, and is gated by a session variable so the legacy BE path stays available as a fallback.
**Core design**
- New `AddLocalExchange` pass runs after `DistributePlanner`, walking each fragment's plan tree bottom-up via the polymorphic `PlanNode.enforceAndDeriveLocalExchange()`. Each node declares what distribution it requires of its children; the framework inserts `LocalExchangeNode` where needed.
- `LocalExchangeNode` represents intra-fragment data redistribution and supports PASSTHROUGH, GLOBAL/LOCAL/BUCKET HASH_SHUFFLE, BROADCAST, PASS_TO_ONE, ADAPTIVE_PASSTHROUGH, LOCAL_MERGE_SORT, NOOP.
- Per-BE instance semantics: `maxPerBeInstances` (max pipeline instances assigned to any single BE) is used instead of global instance count to match BE's `_num_instances` check. Planning is a no-op when `maxPerBeInstances == 1`.
- Serial → non-serial fan-out: when a serial operator feeds a non-serial parent without an intermediate LE, the framework inserts a PASSTHROUGH LE to restore N-task parallelism, matching BE's `required_data_distribution()` rule.
- Requirement-based exchange type resolution via `LocalExchangeTypeRequire`: `RequireHash` adapts to any hash flavour, `RequireSpecific` preserves the exact requested type.
**AggregationNode correctness fixes**
PR #62438 introduced a semantic split for `required_data_distribution=HASH` (correctness-required vs performance-only). BE's `!_needs_finalize && !enable_local_exchange_before_agg → base` early-return conflates both intents in `AggSinkOperatorX` and `DistinctStreamingAggOperatorX`, wrongly catching FIRST_MERGE (correctness) / non-streaming dedup (correctness) and producing PASSTHROUGH-over-serial-child → wrong aggregation results.
The FE planner adds the missing `!isMerge()` / `useStreamingPreagg=true` guards so FIRST_MERGE and non-streaming dedup always emit HASH, regardless of the flag. Also adds `requiresShuffleForCorrectness()` (mirrors BE's `is_shuffled_operator()`) so SetOperationNode propagates the "downstream depends on hash" flag correctly through chains.
**Session variables**
- `enable_local_shuffle_planner` (default true): use FE planner; when false, BE plans LE itself via the legacy path.
- `enable_local_shuffle`: master switch.
- `enable_local_exchange_before_agg`: mirrors #62438.
**Architectural notes**
This PR puts the FE planner in the driver's seat for LE insertion but intentionally keeps BE-side machinery as a fallback:
1. `is_serial_operator` is still computed on both sides; any future change to BE's per-operator C++ override must be mirrored in FE.
2. Legacy BE planner (`pipeline_fragment_context.cpp::_plan_local_exchange`) is preserved and gated by `runtime_state.h::plan_local_shuffle()`; the two paths are mutually exclusive.
3. `_propagate_local_exchange_num_tasks` is kept as a runtime safety net for paired-pipeline num_tasks mismatches.
**Thrift enum rename**
The intra-fragment exchange enum is renamed `ExchangeType` → `TLocalPartitionType` (and `HASH_SHUFFLE` → `GLOBAL_EXECUTION_HASH_SHUFFLE`) for clarity; the BE operator headers are updated mechanically. This accounts for the otherwise-mechanical BE `.h` churn.
### Release note
Add session variable `enable_local_shuffle_planner` (default true) to control whether local exchange nodes are planned in FE (new path) or in BE (legacy `_plan_local_exchange`). The two paths are mutually exclusive; the legacy path remains intact behind this flag.
Co-authored-by: Gabriel <liwenqiang@selectdb.com>
### What problem does this PR solve? Some function implementations cloned nullable null maps, array offsets, or pass-through columns even though the result only needs to share immutable column data. This change reuses those COW subcolumns directly in non-mutating paths and keeps explicit clones for paths that modify result data. ### Release note None
…ion (#64593)
Optimize `percentile_reservoir` aggregation performance by reducing per-row aggregate function overhead and using faster internal sorting for reservoir samples.
before:
```sql
Doris> select percentile_reservoir(FUniqID, 0.9999) from hits_100m;
+---------------------------------------+
| percentile_reservoir(FUniqID, 0.9999) |
+---------------------------------------+
|                 9.222511254540202e+18 |
+---------------------------------------+
1 row in set (1.292 sec)
```
now:
```sql
Doris> select percentile_reservoir(FUniqID, 0.9999) from hits_100m;
+---------------------------------------+
| percentile_reservoir(FUniqID, 0.9999) |
+---------------------------------------+
|                 9.222511254540202e+18 |
+---------------------------------------+
1 row in set (0.537 sec)
```
### What problem does this PR solve? Problem Summary: Consecutive TopN nodes were merged only when the child order key list was a prefix of the parent order key list. When the parent order key list was shorter and was instead a prefix of the child list, the rule kept both TopN nodes even though the child ordering can serve as a deterministic tie-breaker for the parent ordering. This change allows that prefix direction, keeps the longer order key list in the merged TopN, and adjusts LogicalTopN.withOrderKeys typing so callers preserve their child type.
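The order-key part of this rule can be sketched in Java as a two-way prefix check. The class and methods are hypothetical and model only the key lists; limits, offsets, and the actual `LogicalTopN` types are left out.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the order-key compatibility check between two stacked TopN nodes.
final class TopNOrderKeySketch {
    // True when every element of `prefix` matches the start of `full`, in order.
    static <T> boolean isPrefix(List<T> prefix, List<T> full) {
        return prefix.size() <= full.size()
                && full.subList(0, prefix.size()).equals(prefix);
    }

    // Returns the longer key list when either list is a prefix of the other;
    // empty when the orderings are incompatible and the nodes cannot be merged.
    static <T> Optional<List<T>> mergedOrderKeys(List<T> parent, List<T> child) {
        if (isPrefix(child, parent)) {
            return Optional.of(parent); // direction that was already handled
        }
        if (isPrefix(parent, child)) {
            return Optional.of(child); // newly allowed direction
        }
        return Optional.empty();
    }
}
```

For a parent ordered by `[a]` over a child ordered by `[a, b]`, the sketch keeps `[a, b]`, so `b` acts as the tie-breaker described above.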
Problem Summary: concat_ws has a BE execution path for a single array argument. When the array column row itself is NULL, the executor still walked the nested array data and could return values from nested storage instead of treating the NULL array row as empty input. Also, if the optimizer rewrite is disabled, multiple array arguments can reach this BE array path and were silently executed using only the first array argument. This change keeps concat_ws return nullability unchanged, skips nested data for NULL array rows, and rejects array-form concat_ws calls unless the executor receives exactly separator plus one array argument. ### Release note Fix wrong concat_ws results for nullable array inputs and return an error for unsupported multiple-array execution without optimizer rewrite.
…4668)
## Problem
`test_sql_block_rule_status` fails intermittently on the community P0 pipeline (observed ~2/8 builds, clustered under high CI load across several unrelated PRs). The failure is the exact-value assertion on the `BLOCKS` column:
```
assertEquals("1", statusRows[0][9].toString()) // expected 1, actual 2
```
## Root cause
`BLOCKS` is read from `SqlBlockRule.getBlockCount()`, a **process-wide, monotonically increasing** `LongCounterMetric`. The rule under test is created with `global=true`, so its counter is shared cluster-wide and is **not isolated** to this test's single matching query. On a quiet FE a single matching query deterministically yields `BLOCKS == 1`, but any additional matching evaluation of the same statement under concurrent CI load (e.g. a transient JDBC/network statement re-delivery) bumps the shared counter past 1. The defect is the test asserting an exact value on a shared monotonic counter, not the counting logic itself. This is a pre-existing flake, not introduced by any specific PR.
## Fix
Assert the meaningful invariant (*the rule fired at least once*) instead of an exact, racy count:
```groovy
assertTrue(Integer.parseInt(statusRows[0][9].toString()) >= 1, "BLOCKS should be >= 1 but was ${statusRows[0][9]}")
```
## Summary
- add Variant NestedGroup to the build feature list output in build.sh
## Validation
- bash -n build.sh
- git diff --check
…4221)
### What problem does this PR solve?
`vtablet_writer` and `vtablet_writer_v2` used fixed 10ms polling loops while waiting for downstream node channels / load streams to finish close or reach quorum success. When downstream recovery is slow, upstream close wait may repeatedly scan unfinished channels and consume unnecessary CPU.
This PR changes close wait to an event-driven wakeup model:
- `vtablet_writer`:
  - Adds a close wait condition variable and version counter in `IndexChannel`.
  - `VNodeChannel` notifies close wait when the last add-block RPC finishes or when the channel is cancelled.
  - `IndexChannel::close_wait()` waits on the notification instead of polling every 10ms.
- `vtablet_writer_v2`:
  - Adds close wait notification helpers in `LoadStreamStub`.
  - Stream close and cancel paths notify close wait.
  - `VTabletWriterV2::_close_wait()` waits on stream close events instead of polling every 10ms.
The existing quorum success logic and max wait timeout behavior are preserved. A bounded fallback wait is kept so timeout and cancellation state can still be refreshed even if no downstream event arrives.
… reuse the cdc reader (#64423)
### What problem does this PR solve?
Problem Summary:
For from-to (MySQL/PG CDC) streaming jobs, once a job enters the incremental (binlog) phase, two issues hurt throughput:
- On the **FE** side, every polling round (default `max_interval` = 10s) re-selects a BE via global round-robin, so the task drifts across BEs with no job→BE affinity.
- On the **cdc_client** side, although per-job reader ownership and a per-job fixed replication slot already exist, the live reader is not actually reused: the stream reader is closed and rebuilt on every round.
As a result every round rebuilds the reader. For PG this means reconnecting the replication slot and re-locating the WAL position (~15s each round), which together with large-transaction buffering is a major cause of idle / low-throughput stalls in the incremental phase.
Stream load does not support `compress_type=zstd` in the shared load format parser. Async group commit also checks only legacy compressed CSV format enum values when estimating compressed input size, so `compress_type` based compressed input is not handled consistently by stream load and HTTP stream load. This PR adds ZSTD parsing in `LoadUtil::parse_format`, adds a shared `LoadUtil::is_compressed_load` helper for `compress_type` and legacy compressed CSV format types, and uses it in stream load and HTTP stream group commit paths. This PR also adds BE UT and regression coverage for ZSTD CSV/JSON stream load and group commit stream/HTTP stream load.
### What problem does this PR solve? Issue Number: None Related PR: #57133 Problem Summary: `BaseTabletsChannel::_write_block_data` can run concurrently with `incremental_open` for the same tablets channel. `_tablet_writers` is an `std::unordered_map` protected by `_tablet_writers_lock` when writers are inserted, but the tablet load rowset info lookup read the map without holding the lock. A concurrent `emplace` may rehash `_tablet_writers`, so the unlocked lookup can race with bucket reallocation. This patch protects the lookup with `_tablet_writers_lock` and avoids using unordered_map iterators after the lock is released. The actual writer operations still run outside `_tablet_writers_lock`, so the lock remains scoped to the map access.
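The locking shape described above (hold the lock only for the map lookup, then operate on the writer outside it) can be illustrated with a small Python sketch. The class and method names here are hypothetical stand-ins, and Python's dict does not have the rehash hazard of `std::unordered_map`; the sketch only mirrors the scoping of the lock.

```python
import threading


class TabletsChannelSketch:
    """Hypothetical stand-in for a channel that owns per-tablet writers."""

    def __init__(self):
        self._writers = {}
        self._writers_lock = threading.Lock()

    def add_writer(self, tablet_id, writer):
        # Insertions may restructure the map, so they happen under the lock.
        with self._writers_lock:
            self._writers[tablet_id] = writer

    def write(self, tablet_id, rows):
        # Hold the lock only for the lookup and keep a reference to the writer,
        # not a handle into the map, once the lock is released.
        with self._writers_lock:
            writer = self._writers.get(tablet_id)
        if writer is None:
            return False
        # The actual writer operation runs outside the map lock.
        writer.extend(rows)
        return True
```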
### What problem does this PR solve?
Add an explicit block check to reject null column or type pointers at operator sink/get_block boundaries, while keeping the existing type compatibility check unchanged.
### Release note
None
This PR exposes `variant_enable_nested_group` as a public VARIANT property and wires the related configuration through parser/type serialization.
Main changes:
- Allow `variant_enable_nested_group` in VARIANT predefined fields.
- Disable doc mode and sparse-column related options when NestedGroup is enabled.
- Serialize `variant_enable_nested_group` in `VariantType#toSql`.
- Add `variant_nested_group_max_depth` config and make the default NestedGroup write provider explicitly return not-supported status when the write path is unavailable.
- Update FE/BE tests for parser behavior, type serialization, and disabled NestedGroup write-path handling.
### What problem does this PR solve? Routine load submit failures can renew a task directly from the scheduler after the task has begun a transaction. That path mutates the job's `routineLoadTaskInfoList` without holding the job write lock, racing with scheduler idle-slot counting that reads the same list. This PR protects the submit-failure renew path with the job write lock, matching the existing timeout and transaction-status renew paths, and adds unit coverage for the locking behavior.
… ARN (#64766) ### What problem does this PR solve? S3 storage vault creation only treated role ARN as the credential-provider path. When users configured `s3.credentials_provider_type` without `s3.role_arn`, FE did not persist the provider type into `ObjectStoreInfoPB`, and Cloud meta-service still required AK/SK for the vault. The recycler also only read credential provider type inside the role ARN branch. This change allows S3 storage vaults to use an explicit credentials provider type without role ARN. FE now writes `cred_provider_type` when `s3.credentials_provider_type` or `AWS_CREDENTIALS_PROVIDER_TYPE` is set, Cloud meta-service accepts credential-provider-based S3 vaults without AK/SK, and the recycler reads the provider type independently from role ARN.
### What problem does this PR solve?
Before this change, S3-compatible glob listing derived the object-store
`ListObjects` prefix by stopping at the first glob metacharacter. For a
path like:
`s3://bucket/asin_trend/sale/month/date=2025-{0[3-9],1[0-2]}-01/mp_id=8/0/0/436/*`
the old behavior listed the broad prefix:
`asin_trend/sale/month/date=2025-`
and then filtered all returned object keys in FE. If many unrelated
objects existed under `date=2025-*`, for example other dates, `mp_id`s,
or deeper paths, S3 TVF planning could spend a long time listing and
filtering files before query execution started.
After this change, Doris expands safely enumerable glob fragments before
issuing object-store list requests. The same path is now listed through
narrower prefixes such as:
`asin_trend/sale/month/date=2025-03-01/mp_id=8/0/0/436/`
...
`asin_trend/sale/month/date=2025-12-01/mp_id=8/0/0/436/`
Doris still applies the full glob regex after listing, so result
correctness is unchanged. The optimization only reduces the remote
listing scope. Expansion is limited to bounded brace alternations and
positive character classes, with a hard cap to avoid generating too many
prefixes. Existing pagination behavior through `startAfter` and
`maxFile` is preserved.
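A rough sketch of the prefix expansion idea, written in Python for illustration only. The function names and the cap of 64 are assumptions, not the Doris implementation; the sketch takes the object key pattern (without the `s3://bucket/` part), expands bounded brace alternations and positive character classes, and stops at the first construct it cannot enumerate or when the cap would be exceeded. It does not handle commas inside character classes nested in braces.

```python
def _class_chars(body):
    """Expand a positive class body such as '3-9' or 'abc'; None if a range is invalid."""
    chars, i = [], 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = ord(body[i]), ord(body[i + 2])
            if lo > hi:
                return None
            chars.extend(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def _matching_brace(pattern, start):
    depth = 0
    for i in range(start, len(pattern)):
        if pattern[i] == "{":
            depth += 1
        elif pattern[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    return -1


def _split_alternatives(body):
    """Split a brace body on commas that are not inside nested braces."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def _expand(pattern, limit):
    """Return (prefixes, complete); complete is False if expansion stopped early."""
    prefixes, i = [""], 0
    while i < len(pattern):
        ch, nxt = pattern[i], i + 1
        if ch in "*?":
            return prefixes, False
        if ch == "[":
            end = pattern.find("]", i + 2)
            if end == -1 or pattern[i + 1] in "!^":  # unterminated or negated class
                return prefixes, False
            options, nxt = _class_chars(pattern[i + 1:end]), end + 1
        elif ch == "{":
            end = _matching_brace(pattern, i)
            if end == -1:
                return prefixes, False
            options = []
            for alt in _split_alternatives(pattern[i + 1:end]):
                sub, complete = _expand(alt, limit)
                if not complete:
                    return prefixes, False
                options.extend(sub)
            nxt = end + 1
        else:
            options = [ch]
        if options is None or len(prefixes) * len(options) > limit:
            return prefixes, False  # hard cap: fall back to the prefixes built so far
        prefixes = [p + o for p in prefixes for o in options]
        i = nxt
    return prefixes, True


def expand_prefixes(pattern, limit=64):
    """List prefixes to pass to the object store; the full glob is still applied afterwards."""
    return _expand(pattern, limit)[0]
```

For the key pattern from the example above, `expand_prefixes` yields ten prefixes, one per month from `date=2025-03-01` through `date=2025-12-01`, each ending in `mp_id=8/0/0/436/`.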
## Summary
- add a BE CMake option for Variant NestedGroup extension modules
- let the Storage target replace the default provider with external extension sources when the option is enabled
- let BE unit tests include matching external extension test sources when the option is enabled
- keep the change limited to existing build files shared by the public and private trees
…er-row serialization (#64612) The old per-row `estimateSingleRowPayloadBytes` ZSTD-serialized a one-row batch for every row, which was CPU-heavy and overestimated the size by about 25x. This change sums `FieldVector.getBufferSize()` over the whole batch instead and rotates the block lazily.
### What problem does this PR solve?
Problem Summary: FE metrics exposed current connection counts but did not expose the configured maximum total connection count or each user's `max_user_connections` value. This change adds `doris_fe_connection_max` from `qe_max_connection` and `doris_fe_user_connection_max` with a user label. User max connection metrics are initialized from all user properties when MetricRepo starts and are synchronized when users are created, dropped, or when user properties are updated or replayed.
### Release note
Add FE metrics `doris_fe_connection_max` and `doris_fe_user_connection_max`.
… test (#64756)
### What problem does this PR solve?
Issue Number: #64464
Related PR: N/A
Problem Summary:
When querying a SQL Server JDBC catalog, a predicate on a `bit` column such as `WHERE bit_value = '1'` is folded to a boolean literal during analysis. The pushed-down predicate must be rendered as an integer (`= 1` / `= 0`) for SQL Server, never the `TRUE` / `FALSE` keyword: SQL Server has no boolean literal and reports `SQLServerException: Invalid column name 'TRUE'` (see #64464).
On current master this is already handled correctly. The JDBC pushdown path was refactored to the connector SPI (`PluginDrivenScanNode` -> `ExprToConnectorExpressionConverter` -> `JdbcQueryBuilder`), and `JdbcQueryBuilder.formatBooleanLiteral()` renders booleans per dialect (`SQLSERVER` / `ORACLE` / `OCEANBASE_ORACLE` / `DB2` -> `1`/`0`, others -> `TRUE`/`FALSE`). `JdbcQueryBuilderTest` already unit-tests this.
What was missing is the **end-to-end** regression test that the issue triage explicitly asked for. This PR adds it to the SQL Server docker JDBC suite (`test_sqlserver_jdbc_catalog.groovy`), covering `bit_value = '1'`, `bit_value = '0'` and `bit_value in ('1', '0')`: it asserts via `explain` that the pushed remote SQL renders `[bit_value] = 1` / `[bit_value] = 0`, and executes the queries end-to-end (which throw `Invalid column name 'TRUE'` on the buggy path).
Note: `branch-4.0` still uses the old `JdbcScanNode` / `ExprToSqlVisitor` path, which renders the dialect-agnostic `TRUE`/`FALSE` and is what triggers the bug reported in #64464. That branch needs a separate code fix; this regression test alone would fail there and is not sufficient on its own.
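The per-dialect rendering described above can be illustrated with a short Python sketch. It is not the Java `JdbcQueryBuilder` code: the function names are hypothetical, only the dialect list and the `1`/`0` versus `TRUE`/`FALSE` split come from the description, and the bracket quoting is applied to SQL Server only as an assumption for the example.

```python
# Dialects that, per the description, have no boolean keyword and need 1/0.
INTEGER_BOOLEAN_DIALECTS = {"SQLSERVER", "ORACLE", "OCEANBASE_ORACLE", "DB2"}


def format_boolean_literal(dialect, value):
    """Render a boolean literal for the remote SQL of the given dialect."""
    if dialect.upper() in INTEGER_BOOLEAN_DIALECTS:
        return "1" if value else "0"
    return "TRUE" if value else "FALSE"


def render_equals(dialect, column, value):
    """Render `column = <bool>`; bracket quoting is only modelled for SQL Server."""
    quoted = f"[{column}]" if dialect.upper() == "SQLSERVER" else column
    return f"{quoted} = {format_boolean_literal(dialect, value)}"


print(render_equals("SQLSERVER", "bit_value", True))
print(render_equals("MYSQL", "flag", False))
```

With these definitions the SQL Server call produces `[bit_value] = 1`, the form the regression test checks for in the `explain` output.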
#64705)
### What problem does this PR solve?
Issue Number: N/A
Related PR: N/A
Problem Summary:
This PR avoids blocking external meta cache invalidation on slow miss loads in FE. Previously, `MetaCacheEntry` relied on Caffeine's synchronous loading path for cache misses. When an external metadata loader became slow, operations that invalidate the same cache, such as `REFRESH CATALOG` and the corresponding replay path, could wait on the slow load and block the replay-related invalidation flow.
Implementation summary:
- Keep the existing `LoadingCache` to preserve current hit-path behavior and `refreshAfterWrite` support.
- Add a manual miss-load path behind a new FE config switch, using `getIfPresent()` instead of synchronous `LoadingCache.get()` for misses.
- Deduplicate concurrent miss loads with striped locks inside `MetaCacheEntry`.
- Add an entry-level `invalidateGeneration` counter. Each invalidate increments the generation before clearing cache state.
- Record the generation before a manual miss load, check it once before `put()`, and check it again after `put()`. If invalidation happens during the race window, the just-loaded value is removed so stale data is not kept in cache.
- Keep null miss-load results uncached so the manual path does not attempt to put null into Caffeine.
Configuration:
- Add FE config `enable_external_meta_cache_manual_miss_load`, default `false`.
- When it is `false`, `MetaCacheEntry` keeps the original synchronous Caffeine miss-load behavior.
- When it is `true`, `MetaCacheEntry` uses the manual miss-load path plus `invalidateGeneration` protection.
Scope and limitations:
- This change applies to `MetaCacheEntry` used by external metadata cache paths in FE. It does not cover the legacy `MetaCache`.
- `LegacyMetaCacheFactory` is intentionally not refactored in this PR. A follow-up PR will rework that path with `MetaCache`, and the legacy factory changes are left to that dedicated refactor.
- The protection is designed for manual miss loads. It does not make Caffeine's asynchronous `refreshAfterWrite` reload generation-aware.
- As a result, `refreshAfterWrite` is still preserved, but an async refresh result may still write back after an invalidate. That is an intentional trade-off in this version.
- The new regression case is valuable as a reference and for suitable environments, but it may be skipped in standard CI because it depends on JDBC regression setup, FE debug points, and an external MySQL/JDBC environment.
### Release note
None
### Check List (For Author)
- Test
- [ ] Regression test
- [x] Unit Test
- [x] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason
Manual test:
1. Reproduced the blocking path with `REFRESH CATALOG` against a JDBC external catalog and a debug point that sleeps in `PluginDrivenExternalTable.initSchema`.
2. Repeated the baseline scenario 5 times with `enable_external_meta_cache_manual_miss_load=false` and observed `REFRESH CATALOG` blocked for about 14s while `DESC` stayed slow.
3. Repeated the optimized scenario 5 times with `enable_external_meta_cache_manual_miss_load=true` and observed `REFRESH CATALOG` return within about 1s while `DESC` remained slow.
4. Added a regression case as a manual-test reference because its execution depends on JDBC regression environment and FE debug-point availability.
Unit test:
- `FE_UT_PARALLEL=1 ./run-fe-ut.sh --run org.apache.doris.datasource.metacache.MetaCacheEntryTest`
- Behavior changed:
- [x] Yes.
Behavior change:
- `REFRESH CATALOG` and the corresponding FE invalidation path are no longer blocked by slow external metadata miss loads in this `MetaCacheEntry` implementation.
- Does this need documentation?
- [x] No.
- [ ] Yes.
### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label
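The `invalidateGeneration` idea from the meta cache change above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the FE code: `GenerationGuardedCache` is a hypothetical class, a plain dict stands in for Caffeine, and because one lock covers both the generation and the map here, a single generation check at write time plays the role of the check-before-`put()` and check-after-`put()` pair described in the PR.

```python
import threading


class GenerationGuardedCache:
    """Hypothetical sketch: manual miss loads that never block invalidation."""

    def __init__(self, loader, stripes=16):
        self._loader = loader
        self._data = {}
        self._data_lock = threading.Lock()
        self._generation = 0
        # Striped locks deduplicate concurrent miss loads for keys in the same stripe.
        self._stripes = [threading.Lock() for _ in range(stripes)]

    def invalidate_all(self):
        # Only takes the short data lock, so it never waits on a slow loader.
        with self._data_lock:
            self._generation += 1
            self._data.clear()

    def get(self, key):
        with self._data_lock:
            if key in self._data:
                return self._data[key]
        with self._stripes[hash(key) % len(self._stripes)]:
            with self._data_lock:
                if key in self._data:  # another thread loaded it meanwhile
                    return self._data[key]
                generation = self._generation  # recorded before the load
            value = self._loader(key)  # slow load runs outside the data lock
            if value is None:
                return None  # null results are not cached
            with self._data_lock:
                # Skip caching if an invalidation happened during the load.
                if self._generation == generation:
                    self._data[key] = value
            return value
```

The caller still receives the loaded value when an invalidation races with the load; the value is simply not retained, so the next lookup reloads it.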
Related PR: #63191
Problem Summary:
Arrow 17 defaults to `C++17` when CMAKE_CXX_STANDARD is not specified, while Doris BE is built with `C++20`. This can make header-defined inline/template code from Arrow Flight and its dependencies be compiled under different C++ standard modes in the same final binary.
In particular, Arrow Status-related inline paths may generate different implementations across C++17 and C++20, such as different initialization strategies for function-local static std::string objects:
code:
```cpp
const std::string& get_empty_string() {
    static const std::string s = "";
    return s;
}
```
cpp17 lazy initialization:
```asm
get_empty_string[abi:cxx11]():
        push rbp
        mov rbp, rsp
        sub rsp, 64
        cmp byte ptr [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]], 0
        jne .LBB0_4
        lea rdi, [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        call __cxa_guard_acquire@PLT
        cmp eax, 0
        je .LBB0_4
        lea rdx, [rbp - 33]
        mov qword ptr [rbp - 32], rdx
        mov rax, qword ptr [rbp - 32]
        mov qword ptr [rbp - 8], rax
        lea rdi, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        lea rsi, [rip + .L.str]
        call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::basic_string<std::allocator<char>>(char const*, std::allocator<char> const&)
        jmp .LBB0_3
.LBB0_3:
        lea rax, [rbp - 33]
        mov qword ptr [rbp - 24], rax
        mov rdi, qword ptr [rbp - 24]
        call std::__new_allocator<char>::~__new_allocator() [base object destructor]
        lea rdi, [rip + std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::~basic_string() [base object destructor]]
        lea rsi, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        lea rdx, [rip + __dso_handle]
        call __cxa_atexit@PLT
        lea rdi, [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        call __cxa_guard_release@PLT
```
cpp20 constant initialization:
```asm
get_empty_string[abi:cxx11]():
        push rbp
        mov rbp, rsp
        lea rax, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        pop rbp
        ret
get_empty_string[abi:cxx11]()::s[abi:cxx11]:
        .quad get_empty_string[abi:cxx11]()::s[abi:cxx11]+16
        .quad 0
        .zero 16
```
Mixing those definitions through `weak/COMDAT` symbols is not a supported build model and can surface as runtime crashes in Flight error/status handling paths.
### What problem does this PR solve?
ColumnElementView exposed ptr_at() only to adapt predicate set lookups
that accepted raw void pointers. For string columns this required a
mutable temporary StringRef inside the view, making the element access
API harder to reason about.
This change removes ColumnElementView::ptr_at() and the string staging
field, updates in-list predicate evaluation to use
get_element()/get_data() with HybridSetBase::find(data, size), and
replaces the predicate selector macro with a small templated helper that
accepts named lambdas.
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
### What problem does this PR solve?
`from_base64` and `from_base64_binary` call the base64 decoder after sizing the output buffer from `len / 4 * 3`. For invalid input whose length is not a multiple of four, this can pass an undersized destination buffer into the decoder before the function marks the row as invalid.
Root cause: the functions only handled decoder failure after invoking the decoder, but did not reject impossible base64 lengths first.
This patch returns `NULL` for inputs with invalid base64 length before decoding, keeping the existing invalid-input SQL behavior while avoiding unsafe decoder calls.
### Release note
None
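The length pre-check can be sketched in Python as below. This is only an illustration of the ordering (validate the length first, decode second), using Python's standard `base64` module rather than the BE decoder; `max_decoded_size` and this `from_base64` are hypothetical helpers, and `None` stands in for SQL `NULL`.

```python
import base64
import binascii


def max_decoded_size(encoded_len):
    """Upper bound on decoded bytes; only meaningful when encoded_len % 4 == 0."""
    return encoded_len // 4 * 3


def from_base64(text):
    """Return decoded bytes, or None (standing in for SQL NULL) for invalid input."""
    # Reject impossible lengths before the decoder ever sees the input.
    if len(text) % 4 != 0:
        return None
    try:
        decoded = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return None
    # With a valid length, the sizing formula is a safe upper bound.
    assert len(decoded) <= max_decoded_size(len(text))
    return decoded
```

In a C++ decoder that writes into a caller-sized buffer, the same early return is what keeps the `len / 4 * 3` sizing formula a valid upper bound.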
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.1)