[Fix](thirdparty) Unify arrow cpp standard with be#64390
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Requesting changes because the PR only partially removes the mixed C++ standard condition.
Critical checkpoint conclusions:
- Goal/test: The goal is to make Arrow/Flight-related code use the same C++ standard as BE. Arrow, RE2, gRPC, and Abseil are covered, and third-party build checks passed, but Protobuf and Paimon still leave mixed-standard paths.
- Scope: The change is small and focused, but incomplete for the actual dependency graph.
- Concurrency/lifecycle/config/compat: No runtime concurrency, lifecycle ownership, dynamic config, transaction, storage-format, or FE/BE protocol compatibility concerns were introduced.
- Parallel paths: A default
paimon_cpppath still reuses Doris Arrow headers while compiling as C++17; Protobuf's new flag is overwritten by its own CMake. - Tests/results: CI third-party builds passed, but there is no verification that all relevant third-party compile commands use C++20; the upstream CMake files show two paths do not.
- Observability/performance: Not applicable for this build-script-only change.
- User focus: No additional user-provided review focus was present.
| LDFLAGS="${ldflags}" \ | ||
| "${CMAKE_CMD}" -DCMAKE_POLICY_VERSION_MINIMUM=3.5 \ | ||
| -DCMAKE_BUILD_TYPE=Release \ | ||
| -DCMAKE_CXX_STANDARD="${TP_CXX_STANDARD}" \ |
There was a problem hiding this comment.
Protobuf v21.11 ignores this flag. This build invokes the top-level Protobuf CMake via ../.., and that file unconditionally does set(CMAKE_CXX_STANDARD 11) after project(), so the command-line -DCMAKE_CXX_STANDARD=20 is overwritten and libprotobuf/libprotoc still build as C++11. If this fix relies on keeping the Flight/Protobuf inline/template boundary in the same standard mode as BE, this needs an upstream patch or another enforced flag that actually wins; otherwise this line gives a false sense that Protobuf was rebuilt as C++20.
|
|
||
| LDFLAGS="${ldflags}" \ | ||
| "${CMAKE_CMD}" -DCMAKE_POLICY_VERSION_MINIMUM=3.5 \ | ||
| -DCMAKE_CXX_STANDARD="${TP_CXX_STANDARD}" \ |
There was a problem hiding this comment.
Arrow itself is built as C++20 here, but the same Arrow headers are still compiled under C++17 by the default paimon_cpp build. The default package list builds paimon_cpp after Arrow, thirdparty/paimon-cpp-cache.cmake sets PAIMON_USE_EXTERNAL_ARROW ON, and the paimon-cpp patch symlinks Doris's Arrow include dir into that build; upstream paimon-cpp then unconditionally sets CMAKE_CXX_STANDARD 17. That leaves Arrow header inline code compiled as C++17 in paimon-cpp objects and as C++20 in BE/Arrow, so the mixed-standard condition this PR is trying to remove still exists. Please make paimon-cpp use the same standard as TP_CXX_STANDARD, including patching its CMake to respect the requested value.
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Requesting changes.
Critical checkpoint conclusions:
- Goal and proof: The goal is to eliminate mixed C++ standards for Arrow/Flight and relevant dependency inline/template code. The PR only partially achieves this; Protobuf itself, gRPC, Arrow, Paimon, RE2, and Abseil are covered, but at least
brpcstill compiles installed Protobuf headers under C++11 and is linked into the C++20 BE. No build/test evidence in the PR proves the final linked binary no longer has mixed-standard Protobuf consumers. - Scope/focus: The change is small and mostly focused, but the selected package list is incomplete for the dependency boundary it is trying to make uniform.
- Concurrency/lifecycle/config/storage/transaction: Not applicable; this is third-party build logic.
- Compatibility/parallel paths: Blocking issue found in a parallel Protobuf consumer path (
brpc). Existing comments about Protobuf itself and Paimon were treated as known context and not duplicated. - Tests/results: I did not run a third-party rebuild in this review. A clean build that verifies the affected third-party compile standards should be added or reported.
- Observability/performance: Not applicable beyond build reproducibility.
User focus: no additional focus points were provided in .code-review.DWgKsg/review_focus.txt.
|
PR approved by at least one committer and no changes requested. |
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Reviewed the four third-party files in scope and the existing inline review threads. I found one new issue: the added paimon patch line fails git diff --check due trailing whitespace. I did not duplicate the existing mixed-standard Protobuf/brpc comments.
Critical checkpoint conclusions:
- Goal/test: The PR aims to align Arrow/Paimon C++ standard usage with BE. Arrow and Paimon propagation are present, but
git diff --checkcurrently fails; I did not run a full third-party build. - Scope: The changes are focused to third-party build/patch wiring.
- Concurrency/lifecycle: No runtime concurrency change. The Arrow Status static-initialization risk from deleting the old patch was checked; the final Arrow/Paimon C++20 path addresses that specific risk, while broader non-Arrow mixed-standard concerns remain in existing review threads.
- Config/compatibility/parallel paths: No runtime config was added. Existing threads already cover Protobuf/brpc/other-consumer standard compatibility concerns, so I did not submit duplicate comments.
- Tests/style: Review-only.
git diff --checkwas run on the changed files and fails on the submitted inline comment. - Data, transactions, observability, and performance: Not applicable to these third-party build-script changes, and no additional issue was found.
User focus: No additional user-provided review focus.
Subagent conclusions: optimizer-rewrite reported no new findings. tests-session-config produced TS-1, submitted below as an inline comment, and TS-2, dismissed as a duplicate of existing mixed-standard threads. Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same current ledger/comment set.
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
1 similar comment
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Related PR: #63191 Problem Summary: Arrow 17 defaults to `C++17` when CMAKE_CXX_STANDARD is not specified, while Doris BE is built with `C++20`. This can make header-defined inline/template code from Arrow Flight and its dependencies be compiled under different C++ standard modes in the same final binary. In particular, Arrow Status-related inline paths may generate different implementations across C++17 and C++20, such as different initialization strategies for function-local static std::string objects: code: ```cpp const std::string& get_empty_string() { static const std::string s = ""; return s; } ``` cpp17 lazy initialization: ```asm get_empty_string[abi:cxx11](): push rbp mov rbp, rsp sub rsp, 64 cmp byte ptr [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]], 0 jne .LBB0_4 lea rdi, [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]] call __cxa_guard_acquire@PLT cmp eax, 0 je .LBB0_4 lea rdx, [rbp - 33] mov qword ptr [rbp - 32], rdx mov rax, qword ptr [rbp - 32] mov qword ptr [rbp - 8], rax lea rdi, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]] lea rsi, [rip + .L.str] call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::basic_string<std::allocator<char>>(char const*, std::allocator<char> const&) jmp .LBB0_3 .LBB0_3: lea rax, [rbp - 33] mov qword ptr [rbp - 24], rax mov rdi, qword ptr [rbp - 24] call std::__new_allocator<char>::~__new_allocator() [base object destructor] lea rdi, [rip + std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::~basic_string() [base object destructor]] lea rsi, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]] lea rdx, [rip + __dso_handle] call __cxa_atexit@PLT lea rdi, [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]] call __cxa_guard_release@PLT ``` cpp20 constant initialization: ```asm get_empty_string[abi:cxx11](): push rbp mov rbp, rsp lea rax, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]] pop rbp ret get_empty_string[abi:cxx11]()::s[abi:cxx11]: .quad get_empty_string[abi:cxx11]()::s[abi:cxx11]+16 .quad 0 .zero 16 ``` Mixing those definitions through `weak/COMDAT` symbols is not a supported build model and can surface as runtime crashes in Flight error/status handling paths.
Related PR: #63191
Problem Summary:
Arrow 17 defaults to
C++17when CMAKE_CXX_STANDARD is not specified, while Doris BE is built withC++20. This can make header-defined inline/template code from Arrow Flight and its dependencies be compiled under different C++ standard modes in the same final binary.In particular, Arrow Status-related inline paths may generate different implementations across C++17 and C++20, such as different initialization strategies for function-local static std::string objects:
code:
cpp17 lazy initialization:
cpp20 constant initialization:
Mixing those definitions through
weak/COMDATsymbols is not a supported build model and can surface as runtime crashes in Flight error/status handling paths.