[Bug] BE UT crashes under ARM64 + LSAN build due to allocation size exceeding 4GB limit #64187

@heguanhui

Version

trunk (master branch)

What's Wrong?

When running BE UT with CMAKE_BUILD_TYPE=LSAN on ARM64 architecture, the test cases IntersectOperatorTest::test_sink_large_string_data_over_4g and ExceptOperatorTest::test_sink_large_string_data_over_4g crash the entire UT process with the following error:

==15521==ERROR: LeakSanitizer: requested allocation size 0x200000000 exceeds maximum supported size of 0x100000000
    #0 ... in realloc
    #1 ... in doris::DefaultMemoryAllocator::realloc(void*, unsigned long) allocator.h:100
    #2 ... in doris::Allocator::realloc(...) allocator.cpp:411
    #3 ... in doris::PODArrayBase::realloc(...) pod_array.h:191
    #4 ... in doris::PODArrayBase::reserve(...) pod_array.h:261
    #5 ... in doris::PODArrayBase::resize(...) pod_array.h:267
    #6 ... in doris::ColumnStr::insert_range_from_ignore_overflow(...) column_string.cpp:113
    #7 ... in doris::MutableBlock::merge_impl_ignore_overflow(...) block.h:586
    #8 ... in doris::MutableBlock::merge_ignore_overflow(...) block.h:564
    #9 ... in doris::SetSinkOperatorX<true>::sink(...) set_sink_operator.cpp:89
    #10 ... in IntersectOperatorTest_test_sink_large_string_data_over_4g_Test::TestBody() set_operator_test.cpp:480

After this error, the UT process exits immediately, preventing any subsequent tests from running.

What You Expected?

The UT process should not crash. Tests that are incompatible with LSAN's ARM64 allocation limit should be gracefully skipped so that the rest of the UT suite can continue running.

How to Reproduce?

  1. Build BE UT on ARM64 with CMAKE_BUILD_TYPE=LSAN
  2. Run: ./run-be-ut.sh --run --filter=IntersectOperatorTest.test_sink_large_string_data_over_4g
  3. Observe the crash

Root Cause Analysis

The crash is caused by the interaction of two factors:

1. LSAN's ARM64 allocation limit

In LSAN's source code (compiler-rt/lib/lsan/lsan_allocator.cpp), the maximum allowed single allocation size is defined per architecture:

#if defined(__i386__) || defined(__arm__)
static const uptr kMaxAllowedMallocSize = 1ULL << 30;       // 1GB
#elif defined(__mips64) || defined(__aarch64__)
static const uptr kMaxAllowedMallocSize = 4ULL << 30;       // 4GB (ARM64)
#else
static const uptr kMaxAllowedMallocSize = 1ULL << 40;       // 1TB (x86_64)
#endif

On ARM64 (__aarch64__), kMaxAllowedMallocSize = 0x100000000 (4GB). Any single allocation request exceeding this limit triggers a fatal error.

2. PODArray's power-of-two rounding

PODArrayBase::reserve() (pod_array.h:259-262) rounds up the requested size to the next power of two via round_up_to_power_of_two_or_zero(). When the test accumulates ~4.1GB of string data in ColumnStr::chars, the resize request of ~4.1GB gets rounded up to 8GB (0x200000000), which exceeds LSAN's 4GB limit on ARM64.

The detailed calculation for the Intersect test (4200 rows × 1MB per row, batched at 500 rows):

| Batch | chars.size() | resize target | round_up_to_2^N | Realloc? |
|---|---|---|---|---|
| 5 | 1.95GB | 2.44GB | 0x100000000 (4GB) | Yes |
| 6-8 | 2.44~3.42GB | 2.93~3.91GB | 0x100000000 (4GB) | No (capacity sufficient) |
| 9 | 3.91GB | 4.10GB | 0x200000000 (8GB) | Yes → CRASH |

This issue does not occur on x86_64 because LSAN's limit there is 1TB.

Are you willing to submit PR?

  • Yes
