[Fix](thirdparty) Unify arrow cpp standard with be #64390

Merged: HappenLee merged 3 commits into apache:master from linrrzqqq:unify-arrow-cpp-standard on Jun 25, 2026

@linrrzqqq

Collaborator

Related PR: #63191

Problem Summary:

Arrow 17 defaults to C++17 when CMAKE_CXX_STANDARD is not specified, while Doris BE is built with C++20. As a result, header-defined inline and template code from Arrow Flight and its dependencies can be compiled under different C++ standard modes within the same final binary.

In particular, Arrow Status-related inline paths may generate different implementations across C++17 and C++20, such as different initialization strategies for function-local static std::string objects:

code:

const std::string& get_empty_string() {
    static const std::string s = "";
    return s;
} 

C++17, lazy initialization (guarded on first call):

get_empty_string[abi:cxx11]():
        push    rbp
        mov     rbp, rsp
        sub     rsp, 64
        cmp     byte ptr [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]], 0
        jne     .LBB0_4
        lea     rdi, [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        call    __cxa_guard_acquire@PLT
        cmp     eax, 0
        je      .LBB0_4
        lea     rdx, [rbp - 33]
        mov     qword ptr [rbp - 32], rdx
        mov     rax, qword ptr [rbp - 32]
        mov     qword ptr [rbp - 8], rax
        lea     rdi, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        lea     rsi, [rip + .L.str]
        call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::basic_string<std::allocator<char>>(char const*, std::allocator<char> const&)
        jmp     .LBB0_3
.LBB0_3:
        lea     rax, [rbp - 33]
        mov     qword ptr [rbp - 24], rax
        mov     rdi, qword ptr [rbp - 24]
        call    std::__new_allocator<char>::~__new_allocator() [base object destructor]
        lea     rdi, [rip + std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::~basic_string() [base object destructor]]
        lea     rsi, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        lea     rdx, [rip + __dso_handle]
        call    __cxa_atexit@PLT
        lea     rdi, [rip + guard variable for get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        call    __cxa_guard_release@PLT

C++20, constant initialization (no guard variable or runtime constructor call):

get_empty_string[abi:cxx11]():
        push    rbp
        mov     rbp, rsp
        lea     rax, [rip + get_empty_string[abi:cxx11]()::s[abi:cxx11]]
        pop     rbp
        ret

get_empty_string[abi:cxx11]()::s[abi:cxx11]:
        .quad   get_empty_string[abi:cxx11]()::s[abi:cxx11]+16
        .quad   0
        .zero   16

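These inline definitions are emitted as weak/COMDAT symbols, and the linker keeps a single copy of each. Mixing definitions compiled under different standards this way is not a supported build model, and it can surface as runtime crashes in Flight error/status handling paths.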
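
One way to see which mode a given translation unit was built in is to probe `__cplusplus` together with the `__cpp_lib_constexpr_string` feature-test macro, which standard libraries raise to `201907L` once `std::string` is usable in constant expressions. The following is a minimal sketch only; the names `standard_label`, `kConstexprString`, and `kThisUnit` are hypothetical and are not part of Arrow or Doris.

```cpp
#include <string>       // defines __cpp_lib_constexpr_string when available
#include <string_view>

// Hypothetical helper: maps a __cplusplus value to a readable label.
constexpr std::string_view standard_label(long cplusplus_value) {
    if (cplusplus_value >= 202002L) return "C++20 or later";
    if (cplusplus_value >= 201703L) return "C++17";
    return "pre-C++17";
}

// True only when the standard library advertises constexpr std::string
// (value 201907L). That capability is what lets a compiler
// constant-initialize a function-local static std::string.
#if defined(__cpp_lib_constexpr_string) && __cpp_lib_constexpr_string >= 201907L
inline constexpr bool kConstexprString = true;
#else
inline constexpr bool kConstexprString = false;
#endif

// Label for the translation unit that includes this code.
inline constexpr std::string_view kThisUnit = standard_label(__cplusplus);
```

Compiling the same file once with `-std=c++17` and once with `-std=c++20` and printing `kThisUnit` and `kConstexprString` should show the two modes diverging, which is the condition the listings above illustrate.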

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@linrrzqqq

Collaborator Author

run buildall

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 53.93% (21148/39217) |
| Line Coverage | 37.65% (201628/535517) |
| Region Coverage | 33.66% (158012/469458) |
| Branch Coverage | 34.74% (69266/199410) |

@linrrzqqq

Collaborator Author

/review

@github-actions bot left a comment

Contributor

Requesting changes because the PR only partially removes the mixed C++ standard condition.

Critical checkpoint conclusions:

  • Goal/test: The goal is to make Arrow/Flight-related code use the same C++ standard as BE. Arrow, RE2, gRPC, and Abseil are covered, and third-party build checks passed, but Protobuf and Paimon still leave mixed-standard paths.
  • Scope: The change is small and focused, but incomplete for the actual dependency graph.
  • Concurrency/lifecycle/config/compat: No runtime concurrency, lifecycle ownership, dynamic config, transaction, storage-format, or FE/BE protocol compatibility concerns were introduced.
  • Parallel paths: A default paimon_cpp path still reuses Doris Arrow headers while compiling as C++17; Protobuf's new flag is overwritten by its own CMake.
  • Tests/results: CI third-party builds passed, but there is no verification that all relevant third-party compile commands use C++20; the upstream CMake files show two paths do not.
  • Observability/performance: Not applicable for this build-script-only change.
  • User focus: No additional user-provided review focus was present.
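
The verification gap noted under Tests/results could be narrowed by inspecting each compile command for its effective `-std=` flag. The sketch below assumes the command lines are already available as plain strings (for example, taken from a `compile_commands.json` exported by CMake); the helper names `effective_std_flag` and `builds_as_cxx20` are hypothetical, and the tokenizer only splits on spaces, so quoted arguments are not handled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the value of the last "-std=" flag in a
// compiler command line, or an empty view if none is present. GCC and Clang
// use the last -std option given, so a later flag overrides an earlier one.
constexpr std::string_view effective_std_flag(std::string_view command) {
    constexpr std::string_view kPrefix = "-std=";
    std::string_view result;
    std::size_t pos = 0;
    while (pos < command.size()) {
        // Skip spaces, then take one space-delimited token.
        while (pos < command.size() && command[pos] == ' ') ++pos;
        std::size_t end = command.find(' ', pos);
        if (end == std::string_view::npos) end = command.size();
        std::string_view token = command.substr(pos, end - pos);
        if (token.substr(0, kPrefix.size()) == kPrefix) {
            result = token.substr(kPrefix.size());
        }
        pos = end;
    }
    return result;
}

// Accepts both the strict and the GNU-extension spelling of C++20.
constexpr bool builds_as_cxx20(std::string_view command) {
    const std::string_view std_flag = effective_std_flag(command);
    return std_flag == "c++20" || std_flag == "gnu++20";
}
```

Running `builds_as_cxx20` over every third-party compile command and listing the ones that return false would give a concrete report of which packages still build under another standard.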

Comment thread thirdparty/build-thirdparty.sh Outdated
LDFLAGS="${ldflags}" \
"${CMAKE_CMD}" -DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_STANDARD="${TP_CXX_STANDARD}" \

Contributor

Protobuf v21.11 ignores this flag. This build invokes the top-level Protobuf CMake via ../.., and that file unconditionally does set(CMAKE_CXX_STANDARD 11) after project(), so the command-line -DCMAKE_CXX_STANDARD=20 is overwritten and libprotobuf/libprotoc still build as C++11. If this fix relies on keeping the Flight/Protobuf inline/template boundary in the same standard mode as BE, this needs an upstream patch or another enforced flag that actually wins; otherwise this line gives a false sense that Protobuf was rebuilt as C++20.


LDFLAGS="${ldflags}" \
"${CMAKE_CMD}" -DCMAKE_POLICY_VERSION_MINIMUM=3.5 \
-DCMAKE_CXX_STANDARD="${TP_CXX_STANDARD}" \

Contributor

Arrow itself is built as C++20 here, but the same Arrow headers are still compiled under C++17 by the default paimon_cpp build. The default package list builds paimon_cpp after Arrow, thirdparty/paimon-cpp-cache.cmake sets PAIMON_USE_EXTERNAL_ARROW ON, and the paimon-cpp patch symlinks Doris's Arrow include dir into that build; upstream paimon-cpp then unconditionally sets CMAKE_CXX_STANDARD 17. That leaves Arrow header inline code compiled as C++17 in paimon-cpp objects and as C++20 in BE/Arrow, so the mixed-standard condition this PR is trying to remove still exists. Please make paimon-cpp use the same standard as TP_CXX_STANDARD, including patching its CMake to respect the requested value.

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 73.94% (28324/38305) |
| Line Coverage | 57.96% (308577/532416) |
| Region Coverage | 54.77% (258308/471646) |
| Branch Coverage | 56.12% (112060/199685) |

@linrrzqqq

Collaborator Author

/review

@github-actions bot left a comment

Contributor

Requesting changes.

Critical checkpoint conclusions:

  • Goal and proof: The goal is to eliminate mixed C++ standards for Arrow/Flight and relevant dependency inline/template code. The PR only partially achieves this; Protobuf itself, gRPC, Arrow, Paimon, RE2, and Abseil are covered, but at least brpc still compiles installed Protobuf headers under C++11 and is linked into the C++20 BE. No build/test evidence in the PR proves the final linked binary no longer has mixed-standard Protobuf consumers.
  • Scope/focus: The change is small and mostly focused, but the selected package list is incomplete for the dependency boundary it is trying to make uniform.
  • Concurrency/lifecycle/config/storage/transaction: Not applicable; this is third-party build logic.
  • Compatibility/parallel paths: Blocking issue found in a parallel Protobuf consumer path (brpc). Existing comments about Protobuf itself and Paimon were treated as known context and not duplicated.
  • Tests/results: I did not run a third-party rebuild in this review. A clean build that verifies the affected third-party compile standards should be added or reported.
  • Observability/performance: Not applicable beyond build reproducibility.

User focus: no additional focus points were provided in .code-review.DWgKsg/review_focus.txt.

Comment thread thirdparty/build-thirdparty.sh
HappenLee previously approved these changes on Jun 12, 2026

@HappenLee left a comment

Contributor

LGTM

@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Jun 12, 2026
@linrrzqqq

Collaborator Author

run buildall

@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 53.91% (21118/39173) |
| Line Coverage | 37.64% (201385/535091) |
| Region Coverage | 33.62% (157729/469124) |
| Branch Coverage | 34.71% (69152/199219) |

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 73.93% (28287/38260) |
| Line Coverage | 57.95% (308294/531972) |
| Region Coverage | 54.88% (258636/471291) |
| Branch Coverage | 56.19% (112098/199482) |

@github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) on Jun 24, 2026
@linrrzqqq

Collaborator Author

run buildall

@linrrzqqq

Collaborator Author

/review

@github-actions bot left a comment

Contributor

Reviewed the four third-party files in scope and the existing inline review threads. I found one new issue: the added paimon patch line fails git diff --check due to trailing whitespace. I did not duplicate the existing mixed-standard Protobuf/brpc comments.

Critical checkpoint conclusions:

  • Goal/test: The PR aims to align Arrow/Paimon C++ standard usage with BE. Arrow and Paimon propagation are present, but git diff --check currently fails; I did not run a full third-party build.
  • Scope: The changes are focused on third-party build/patch wiring.
  • Concurrency/lifecycle: No runtime concurrency change. The Arrow Status static-initialization risk from deleting the old patch was checked; the final Arrow/Paimon C++20 path addresses that specific risk, while broader non-Arrow mixed-standard concerns remain in existing review threads.
  • Config/compatibility/parallel paths: No runtime config was added. Existing threads already cover Protobuf/brpc/other-consumer standard compatibility concerns, so I did not submit duplicate comments.
  • Tests/style: Review-only. git diff --check was run on the changed files and fails at the line flagged in the submitted inline comment.
  • Data, transactions, observability, and performance: Not applicable to these third-party build-script changes, and no additional issue was found.

User focus: No additional user-provided review focus.

Subagent conclusions: optimizer-rewrite reported no new findings. tests-session-config produced TS-1, submitted below as an inline comment, and TS-2, dismissed as a duplicate of existing mixed-standard threads. Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same current ledger/comment set.
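
The trailing-whitespace failure reported above can be approximated with a small check. This is a sketch of one rule only, not a reimplementation of `git diff --check`; the helper names `has_trailing_whitespace` and `count_flagged_lines` are hypothetical, and the input is assumed to use `\n` line separators.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: flags a line (without its terminator) that ends
// in a space or a tab.
constexpr bool has_trailing_whitespace(std::string_view line) {
    if (line.empty()) return false;
    const char last = line.back();
    return last == ' ' || last == '\t';
}

// Counts flagged lines in text whose lines are separated by '\n'.
constexpr int count_flagged_lines(std::string_view text) {
    int flagged = 0;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find('\n', start);
        if (end == std::string_view::npos) end = text.size();
        if (has_trailing_whitespace(text.substr(start, end - start))) ++flagged;
        start = end + 1;
    }
    return flagged;
}
```

Feeding the added lines of a patch through `count_flagged_lines` before pushing gives a quick local signal for this class of style failure.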

Comment thread thirdparty/patches/paimon-cpp-buildutils-static-deps.patch
@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 64.15% (24596/38339) |
| Line Coverage | 47.82% (255375/534012) |
| Region Coverage | 44.56% (210896/473242) |
| Branch Coverage | 45.62% (91553/200667) |

@HappenLee left a comment

Contributor

LGTM

@HappenLee merged commit c7f3559 into apache:master on Jun 25, 2026
35 checks passed
github-actions bot pushed a commit that referenced this pull request on Jun 25, 2026
@linrrzqqq deleted the unify-arrow-cpp-standard branch on June 25, 2026 02:01
3 participants