[fix](load): quote slot labels in routine-load legacy expr translation to avoid reserved-keyword parse failure#63747
Conversation
…tion to avoid reserved-keyword parse failure
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
should move getIdentSql from InitMaterializationContextHook to sqlUtils |
…Utils Move the getIdentSql and escapeUnquote methods to the shared SqlUtils class so they can be reused by PlanUtils for quoting slot labels in routine-load legacy expression translation.
|
run buildall |
TPC-H: Total hot run time: 31629 ms |
TPC-DS: Total hot run time: 171490 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
please fix failure p0 cases |
|
run buildall |
TPC-H: Total hot run time: 29078 ms |
TPC-DS: Total hot run time: 171401 ms |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
/review |
There was a problem hiding this comment.
Review result: request changes.
Critical checkpoint conclusions:
- Goal/test proof: the PR fixes identifier quoting for translated predicates and adds a routine-load regression case for keyword column names, but one config edit intended to enable the test is ineffective.
- Scope/focus: the code and test changes are mostly focused on identifier quoting; the regression-conf edit is not aligned with how the framework loads config.
- Concurrency/lifecycle: no concurrency, lock-order, or special lifecycle issue was found in the actual PR diff.
- Config/incompatibility: no Doris runtime config item is added, but the added regression
otherConfigsblock is not consumed byConfig.fromConfigObject, so it does not change the test environment as intended. - Parallel paths: identifier quoting was applied to both MV SQL assembly and PlanUtils expression translation; no missing parallel production path was confirmed.
- Test coverage: the new routine-load test covers the keyword-column case when Kafka tests are enabled by the active pipeline config; the default regression-conf edit does not enable it.
- Observability/persistence/data writes/performance: no issue was found in these areas for the actual PR diff.
User focus: no additional user-provided review focus was supplied.
|
run buildall |
TPC-H: Total hot run time: 29147 ms |
TPC-DS: Total hot run time: 169514 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Review summary: I did not find additional blocking issues in the current head.
Critical checkpoint conclusions:
- Goal and tests: The PR centralizes identifier quoting and uses it in Nereids-to-legacy expression conversion so routine-load filters containing reserved-word columns can be serialized and reparsed. The added FE unit test covers the reparse path for
group, and the Kafka regression case exercises routine-load execution when Kafka tests are enabled. - Scope: The implementation is small and focused; the previous dead
otherConfigsblock noted in the existing review thread is no longer present in this head. - Concurrency and lifecycle: No new shared mutable state, locking, threads, or special lifecycle management are introduced.
- Configuration and compatibility: No new effective config items or storage/protocol format changes are introduced.
- Parallel code paths: The shared
PlanUtils.translateToLegacyExprpath is used by routine-load predicate conversion, including the Kafka and Kinesis task reparse paths, so the fix is applied at the common conversion point. - Conditional checks and error handling: No new special conditionals or unchecked
Status-style flows are introduced in the FE change. - Test coverage: Coverage is reasonable for the reported reserved-identifier reparse problem. The Kafka regression is gated by existing
enableKafkaTestbehavior. - Observability, transactions, persistence, data writes: No new observability needs, transaction behavior, or persistence semantics are introduced by this quoting change.
- Performance: Identifier quoting adds only small string construction on expression serialization/conversion paths, not a hot execution path.
- User focus: No additional user-provided focus points were specified.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
When a routine load job uses a column name that is a SQL reserved keyword (e.g.,
group) in a PRECEDING FILTER clause, theNereids-to-legacy expression translator sets the slot label as the raw name (e.g.,
group) without quoting. When the legacyexpression SQL is later re-parsed (e.g., during routine load reparse via
NereidsLoadUtils.parseExpressionSeq), the unquotedreserved keyword causes a parse failure, pausing the routine load job.
This PR quotes the slot label using
SqlUtils.getIdentSql()so that reserved-keyword column names are properly backtick-quoted inthe translated legacy expression SQL, preventing the parse failure.
Release note
None
Check List (For Author)
Test
Behavior changed:
toSqlWithoutTbl()output but does not affect semantics.Does this need documentation?
Check List (For Reviewer who merge this PR)