example/rdma_performance: TSan run reports data races #3295

@daming6

Describe the bug
After building and linking with the TSan options, run the following commands in two separate shell windows:
taskset -c 0-95 ./kpl_tools server -use_rdma 0 -thread_num 96
taskset -c 48-95 ./kpl_tools client -thread_num 48 -queue_depth 85 -attachment_size 131072 -use_rdma 0 -connection_type pooled
This produces a number of TSan reports. All of them are data races, and they fall into two categories: "Location is global" and "Location is heap block of size xx at 0xxx allocated by main thread".
An example of the first category:

WARNING: ThreadSanitizer: data race (pid=6737)
Write of size 8 at 0xfffff6e40618 by thread T3:
#0 logging::add_vlog_site(int const**, char const*, int, int) src/butil/logging.cc:1863 (libbrpc.so+0x22548c)
#1 bthread::TaskControl::worker_thread(void*) src/bthread/task_control.cpp:120 (libbrpc.so+0x2db1b4)
#2 (libtsan.so.2+0x3c874)

Previous read of size 8 at 0xfffff6e40618 by thread T5:
#0 bthread::TaskControl::worker_thread(void*) src/bthread/task_control.cpp:120 (libbrpc.so+0x2dad30)
#1 (libtsan.so.2+0x3c874)

Location is global 'bthread::TaskControl::worker_thread(void*)::vlocal' of size 8 at 0xfffff6e40618 (libbrpc.so+0x930618)

Thread T3 'brpc_wkr:0-0' (tid=6741, running) created by main thread at:
#0 pthread_create (libtsan.so.2+0x6ad40)
#1 bthread::TaskControl::init(int) src/bthread/task_control.cpp:263 (libbrpc.so+0x2de51c)
#2 bthread::get_or_new_task_control() src/bthread/bthread.cpp:117 (libbrpc.so+0x2b1b9c)
#3 bthread::start_from_non_worker(unsigned long*, bthread_attr_t const*, void* (*)(void*), void*) src/bthread/bthread.cpp:274 (libbrpc.so+0x2af8dc)
#4 bthread_start_background src/bthread/bthread.cpp:356 (libbrpc.so+0x2af8dc)
#5 GlobalInitializeOrDieImpl src/brpc/global.cpp:651 (libbrpc.so+0x40f91c)
#6 pthread_once (libtsan.so.2+0x4c654)
#7 brpc::GlobalInitializeOrDie() src/brpc/global.cpp:656 (libbrpc.so+0x40d690)
#8 brpc::Server::InitializeOnce() src/brpc/server.cpp:676 (libbrpc.so+0x486a3c)
#9 brpc::Server::StartInternal(butil::EndPoint const&, brpc::PortRange const&, brpc::ServerOptions const*) src/brpc/server.cpp:862 (libbrpc.so+0x494020)
#10 brpc::Server::Start(butil::EndPoint const&, brpc::ServerOptions const*) src/brpc/server.cpp:1279 (libbrpc.so+0x496bd0)
#11 brpc::Server::Start(int, brpc::ServerOptions const*) src/brpc/server.cpp:1298 (libbrpc.so+0x496e64)
#12 brpc::StartDummyServerAt(int, brpc::ProfilerLinker) src/brpc/server.cpp:1974 (libbrpc.so+0x497144)
#13 main /home/caolx5/brpc-master/example/rdma_performance/client.cpp:282 (client+0x40a3c4)

Thread T5 'brpc_wkr:0-1' (tid=6743, running) created by main thread at:
#0 pthread_create (libtsan.so.2+0x6ad40)
#1 bthread::TaskControl::init(int) src/bthread/task_control.cpp:263 (libbrpc.so+0x2de51c)
#2 bthread::get_or_new_task_control() src/bthread/bthread.cpp:117 (libbrpc.so+0x2b1b9c)
#3 bthread::start_from_non_worker(unsigned long*, bthread_attr_t const*, void* (*)(void*), void*) src/bthread/bthread.cpp:274 (libbrpc.so+0x2af8dc)
#4 bthread_start_background src/bthread/bthread.cpp:356 (libbrpc.so+0x2af8dc)
#5 GlobalInitializeOrDieImpl src/brpc/global.cpp:651 (libbrpc.so+0x40f91c)
#6 pthread_once (libtsan.so.2+0x4c654)
#7 brpc::GlobalInitializeOrDie() src/brpc/global.cpp:656 (libbrpc.so+0x40d690)
#8 brpc::Server::InitializeOnce() src/brpc/server.cpp:676 (libbrpc.so+0x486a3c)
#9 brpc::Server::StartInternal(butil::EndPoint const&, brpc::PortRange const&, brpc::ServerOptions const*) src/brpc/server.cpp:862 (libbrpc.so+0x494020)
#10 brpc::Server::Start(butil::EndPoint const&, brpc::ServerOptions const*) src/brpc/server.cpp:1279 (libbrpc.so+0x496bd0)
#11 brpc::Server::Start(int, brpc::ServerOptions const*) src/brpc/server.cpp:1298 (libbrpc.so+0x496e64)
#12 brpc::StartDummyServerAt(int, brpc::ProfilerLinker) src/brpc/server.cpp:1974 (libbrpc.so+0x497144)
#13 main /home/caolx5/brpc-master/example/rdma_performance/client.cpp:282 (client+0x40a3c4)

SUMMARY: ThreadSanitizer: data race src/butil/logging.cc:1863 in logging::add_vlog_site(int const**, char const*, int, int)

The stack points to the BAIDU_VLOG_IS_ON function-like macro in logging.h:

#define BAIDU_VLOG_IS_ON(verbose_level, filepath)                   \
({ static const int* vlocal = &::logging::VLOG_UNINITIALIZED;       \
    const int saved_verbose_level = (verbose_level);                \
    (saved_verbose_level >= 0)/*VLOG(-1) is forbidden*/ &&          \
        (*vlocal >= saved_verbose_level) &&                         \
        ((vlocal != &::logging::VLOG_UNINITIALIZED) ||              \
         (::logging::add_vlog_site(&vlocal, filepath, __LINE__,     \
                                   saved_verbose_level))); })   // add_vlog_site writes through &vlocal

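The logging module applies no lock or other inter-thread synchronization between the read of vlocal in BAIDU_VLOG_IS_ON and the write through &vlocal in add_vlog_site, so the threads race on that variable.
Alternatively, if the logging module is protected by some higher-level mechanism such as an asynchronous queue, this would be a false positive.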
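
To make the access pattern concrete, here is a minimal sketch of a lazily registered log site whose cached pointer is a std::atomic, so that concurrent first calls no longer read and write a plain variable. It is not brpc's code: kUninitialized, kSiteLevel, register_site and vlog_is_on are hypothetical stand-ins for VLOG_UNINITIALIZED, the per-site level, add_vlog_site and the macro body, and whether the race reported above is actually harmful is not settled here.

```cpp
#include <atomic>
#include <limits>

// Hypothetical stand-in for ::logging::VLOG_UNINITIALIZED.
const int kUninitialized = std::numeric_limits<int>::max();
// Hypothetical per-site verbosity level that registration hands out.
const int kSiteLevel = 1;

// Hypothetical registration step: publishes the site's level pointer and
// reports whether logging at `level` is enabled for this site.
bool register_site(std::atomic<const int*>* site, int level) {
    site->store(&kSiteLevel, std::memory_order_release);
    return kSiteLevel >= level;
}

// Same shape as the macro's check, but the cached pointer is loaded
// atomically, so a concurrent register_site() does not race with it.
bool vlog_is_on(std::atomic<const int*>* site, int level) {
    const int* p = site->load(std::memory_order_acquire);
    return level >= 0 && *p >= level &&
           (p != &kUninitialized || register_site(site, level));
}
```

A caller would keep one `static std::atomic<const int*> site(&kUninitialized);` per call site and pass its address to vlog_is_on, mirroring the role of the function-local static vlocal in the macro.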

An example of the second category:

WARNING: ThreadSanitizer: data race (pid=6737)
Write of size 8 at 0xffffeb401e60 by thread T7:
#0 bthread::TimerThread::Bucket::schedule(void (*)(void*), void*, timespec const&) src/bthread/timer_thread.cpp:212 (libbrpc.so+0x309b1c)
#1 bthread::TimerThread::schedule(void (*)(void*), void*, timespec const&) src/bthread/timer_thread.cpp:231 (libbrpc.so+0x309d9c)
#2 bthread::TaskGroup::_add_sleep_event(void*) src/bthread/task_group.cpp:940 (libbrpc.so+0x2ff160)
#3 bthread::TaskGroup::sched_to(bthread::TaskGroup**, bthread::TaskMeta*) src/bthread/task_group.cpp:792 (libbrpc.so+0x2fea6c)
#4 bthread::TaskGroup::sched_to(bthread::TaskGroup**, unsigned long) src/bthread/task_group_inl.h:82 (libbrpc.so+0x300ea8)
#5 bthread::TaskGroup::sched(bthread::TaskGroup**) src/bthread/task_group.cpp:700 (libbrpc.so+0x300ea8)
#6 bthread::TaskGroup::usleep(bthread::TaskGroup**, unsigned long) src/bthread/task_group.cpp:986 (libbrpc.so+0x301560)
#7 bthread_usleep src/bthread/bthread.cpp:569 (libbrpc.so+0x2b0c6c)
#8 GlobalUpdate src/brpc/global.cpp:248 (libbrpc.so+0x40dfa8)
#9 bthread::TaskGroup::task_runner(long) src/bthread/task_group.cpp:391 (libbrpc.so+0x301798)
#10 bthread_make_fcontext (libbrpc.so+0x2b7734)
#11 bthread::TaskGroup::sched_to(bthread::TaskGroup**, unsigned long) src/bthread/task_group_inl.h:82 (libbrpc.so+0x301de4)
#12 bthread::TaskGroup::run_main_task() src/bthread/task_group.cpp:209 (libbrpc.so+0x301de4)
#13 bthread::TaskControl::worker_thread(void*) src/bthread/task_control.cpp:126 (libbrpc.so+0x2daed4)
#14 (libtsan.so.2+0x3c874)

Previous read of size 8 at 0xffffeb401e60 by thread T2:
#0 bthread::TimerThread::Bucket::consume_tasks() src/bthread/timer_thread.cpp:174 (libbrpc.so+0x309110)
#1 bthread::TimerThread::run() src/bthread/timer_thread.cpp:364 (libbrpc.so+0x30ab10)
#2 bthread::TimerThread::run_this(void*) src/bthread/timer_thread.cpp:125 (libbrpc.so+0x30b9a8)
#3 (libtsan.so.2+0x3c874)

Location is heap block of size 896 at 0xffffeb401c00 allocated by main thread:
#0 operator new[](unsigned long, std::align_val_t, std::nothrow_t const&) (libtsan.so.2+0x928b0)
#1 bthread::TimerThread::start(bthread::TimerThreadOptions const*) src/bthread/timer_thread.cpp:159 (libbrpc.so+0x308ef0)
#2 init_global_timer_thread src/bthread/timer_thread.cpp:476 (libbrpc.so+0x309674)
#3 pthread_once (libtsan.so.2+0x4c654)
#4 bthread::get_or_create_global_timer_thread() src/bthread/timer_thread.cpp:485 (libbrpc.so+0x3098f4)
#5 bthread::TaskControl::init(int) src/bthread/task_control.cpp:248 (libbrpc.so+0x2de460)
#6 bthread::get_or_new_task_control() src/bthread/bthread.cpp:117 (libbrpc.so+0x2b1b9c)
#7 bthread::start_from_non_worker(unsigned long*, bthread_attr_t const*, void* (*)(void*), void*) src/bthread/bthread.cpp:274 (libbrpc.so+0x2af8dc)
#8 bthread_start_background src/bthread/bthread.cpp:356 (libbrpc.so+0x2af8dc)
#9 GlobalInitializeOrDieImpl src/brpc/global.cpp:651 (libbrpc.so+0x40f91c)
#10 pthread_once (libtsan.so.2+0x4c654)
#11 brpc::GlobalInitializeOrDie() src/brpc/global.cpp:656 (libbrpc.so+0x40d690)
#12 brpc::Server::InitializeOnce() src/brpc/server.cpp:676 (libbrpc.so+0x486a3c)
#13 brpc::Server::StartInternal(butil::EndPoint const&, brpc::PortRange const&, brpc::ServerOptions const*) src/brpc/server.cpp:862 (libbrpc.so+0x494020)
#14 brpc::Server::Start(butil::EndPoint const&, brpc::ServerOptions const*) src/brpc/server.cpp:1279 (libbrpc.so+0x496bd0)
#15 brpc::Server::Start(int, brpc::ServerOptions const*) src/brpc/server.cpp:1298 (libbrpc.so+0x496e64)
#16 brpc::StartDummyServerAt(int, brpc::ProfilerLinker) src/brpc/server.cpp:1974 (libbrpc.so+0x497144)
#17 main /home/caolx5/brpc-master/example/rdma_performance/client.cpp:282 (client+0x40a3c4)

Thread T7 'brpc_wkr:0-3' (tid=6745, running) created by main thread at:
#0 pthread_create (libtsan.so.2+0x6ad40)
#1 bthread::TaskControl::init(int) src/bthread/task_control.cpp:263 (libbrpc.so+0x2de51c)
#2 bthread::get_or_new_task_control() src/bthread/bthread.cpp:117 (libbrpc.so+0x2b1b9c)
#3 bthread::start_from_non_worker(unsigned long*, bthread_attr_t const*, void* (*)(void*), void*) src/bthread/bthread.cpp:274 (libbrpc.so+0x2af8dc)
#4 bthread_start_background src/bthread/bthread.cpp:356 (libbrpc.so+0x2af8dc)
#5 GlobalInitializeOrDieImpl src/brpc/global.cpp:651 (libbrpc.so+0x40f91c)
#6 pthread_once (libtsan.so.2+0x4c654)
#7 brpc::GlobalInitializeOrDie() src/brpc/global.cpp:656 (libbrpc.so+0x40d690)
#8 brpc::Server::InitializeOnce() src/brpc/server.cpp:676 (libbrpc.so+0x486a3c)
#9 brpc::Server::StartInternal(butil::EndPoint const&, brpc::PortRange const&, brpc::ServerOptions const*) src/brpc/server.cpp:862 (libbrpc.so+0x494020)
#10 brpc::Server::Start(butil::EndPoint const&, brpc::ServerOptions const*) src/brpc/server.cpp:1279 (libbrpc.so+0x496bd0)
#11 brpc::Server::Start(int, brpc::ServerOptions const*) src/brpc/server.cpp:1298 (libbrpc.so+0x496e64)
#12 brpc::StartDummyServerAt(int, brpc::ProfilerLinker) src/brpc/server.cpp:1974 (libbrpc.so+0x497144)
#13 main /home/caolx5/brpc-master/example/rdma_performance/client.cpp:282 (client+0x40a3c4)

Thread T2 'brpc_timer' (tid=6740, running) created by main thread at:
#0 pthread_create (libtsan.so.2+0x6ad40)
#1 bthread::TimerThread::start(bthread::TimerThreadOptions const*) src/bthread/timer_thread.cpp:164 (libbrpc.so+0x308f70)
#2 init_global_timer_thread src/bthread/timer_thread.cpp:476 (libbrpc.so+0x309674)
#3 pthread_once (libtsan.so.2+0x4c654)
#4 bthread::get_or_create_global_timer_thread() src/bthread/timer_thread.cpp:485 (libbrpc.so+0x3098f4)
#5 bthread::TaskControl::init(int) src/bthread/task_control.cpp:248 (libbrpc.so+0x2de460)
#6 bthread::get_or_new_task_control() src/bthread/bthread.cpp:117 (libbrpc.so+0x2b1b9c)
#7 bthread::start_from_non_worker(unsigned long*, bthread_attr_t const*, void* (*)(void*), void*) src/bthread/bthread.cpp:274 (libbrpc.so+0x2af8dc)
#8 bthread_start_background src/bthread/bthread.cpp:356 (libbrpc.so+0x2af8dc)
#9 GlobalInitializeOrDieImpl src/brpc/global.cpp:651 (libbrpc.so+0x40f91c)
#10 pthread_once (libtsan.so.2+0x4c654)
#11 brpc::GlobalInitializeOrDie() src/brpc/global.cpp:656 (libbrpc.so+0x40d690)
#12 brpc::Server::InitializeOnce() src/brpc/server.cpp:676 (libbrpc.so+0x486a3c)
#13 brpc::Server::StartInternal(butil::EndPoint const&, brpc::PortRange const&, brpc::ServerOptions const*) src/brpc/server.cpp:862 (libbrpc.so+0x494020)
#14 brpc::Server::Start(butil::EndPoint const&, brpc::ServerOptions const*) src/brpc/server.cpp:1279 (libbrpc.so+0x496bd0)
#15 brpc::Server::Start(int, brpc::ServerOptions const*) src/brpc/server.cpp:1298 (libbrpc.so+0x496e64)
#16 brpc::StartDummyServerAt(int, brpc::ProfilerLinker) src/brpc/server.cpp:1974 (libbrpc.so+0x497144)
#17 main /home/caolx5/brpc-master/example/rdma_performance/client.cpp:282 (client+0x40a3c4)

SUMMARY: ThreadSanitizer: data race src/bthread/timer_thread.cpp:212 in bthread::TimerThread::Bucket::schedule(void (*)(void*), void*, timespec const&)

The stack points to the following two functions in timer_thread.cpp:

TimerThread::Task* TimerThread::Bucket::consume_tasks() {
    Task* head = NULL;
    if (_task_head) { // NOTE: schedule() and consume_tasks() are sequenced // _task_head is read here
        // by TimerThread._nearest_run_time and fenced by TimerThread._mutex.
        // We can avoid touching the mutex and related cacheline when the
        // bucket is actually empty.
        BAIDU_SCOPED_LOCK(_mutex);
        if (_task_head) {
            head = _task_head;
            _task_head = NULL;
            _nearest_run_time = std::numeric_limits<int64_t>::max();
        }
    }
    return head;
}

TimerThread::Bucket::ScheduleResult
TimerThread::Bucket::schedule(void (*fn)(void*), void* arg,
                              const timespec& abstime) {
    butil::ResourceId<Task> slot_id;
    Task* task = butil::get_resource<Task>(&slot_id);
    if (task == NULL) {
        ScheduleResult result = { INVALID_TASK_ID, false };
        return result;
    }
    task->next = NULL;
    task->fn = fn;
    task->arg = arg;
    task->run_time = butil::timespec_to_microseconds(abstime);
    uint32_t version = task->version.load(butil::memory_order_relaxed);
    if (version == 0) {  // skip 0.
        task->version.fetch_add(2, butil::memory_order_relaxed);
        version = 2;
    }
    const TaskId id = make_task_id(slot_id, version);
    task->task_id = id;
    bool earlier = false;
    {
        BAIDU_SCOPED_LOCK(_mutex);
        task->next = _task_head;
        _task_head = task;  // _task_head is written here
        if (task->run_time < _nearest_run_time) {
            _nearest_run_time = task->run_time;
            earlier = true;
        }
    }
    ScheduleResult result = { id, earlier };
    return result;
}
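This looks like TSan failing to recognize the BAIDU_SCOPED_LOCK(_mutex) wrapper macro, so it concludes that the threads are not synchronized by a lock: one thread writes _task_head while another thread reads it at the same time (see the comments marked in the code above), and a data race is reported.
This second kind of report may share a root cause with two other issues (#2864 and #1687): brpc has not been properly adapted for TSan.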
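
Another possible reading, which nothing in this report confirms: the read marked in consume_tasks() is the emptiness check that runs before BAIDU_SCOPED_LOCK(_mutex) is taken, and TSan flags a plain read outside a lock even when the matching write is made under that lock. The sketch below shows the same "check without the lock, then recheck under it" pattern with the head pointer made a std::atomic. It uses std::mutex and a cut-down, hypothetical Task and Bucket rather than brpc's types, and it is an illustration of the pattern, not a proposed patch.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>
#include <mutex>

// Hypothetical, simplified task node; the real Task has more fields.
struct Task {
    Task* next = nullptr;
    int64_t run_time = 0;
};

// Simplified bucket: the head pointer is atomic, so the unlocked emptiness
// check in consume_tasks() is a synchronized access rather than a plain read.
class Bucket {
public:
    // Pushes a task; returns true if it is now the earliest in the bucket.
    bool schedule(Task* task) {
        std::lock_guard<std::mutex> lock(_mutex);
        task->next = _task_head.load(std::memory_order_relaxed);
        _task_head.store(task, std::memory_order_release);
        if (task->run_time < _nearest_run_time) {
            _nearest_run_time = task->run_time;
            return true;
        }
        return false;
    }

    // Detaches and returns the whole list, or nullptr if the bucket is empty.
    Task* consume_tasks() {
        // Fast path: skip the mutex when the bucket looks empty.
        if (_task_head.load(std::memory_order_acquire) == nullptr) {
            return nullptr;
        }
        std::lock_guard<std::mutex> lock(_mutex);
        // Recheck under the lock, as the original code does.
        Task* head = _task_head.load(std::memory_order_relaxed);
        if (head != nullptr) {
            _task_head.store(nullptr, std::memory_order_relaxed);
            _nearest_run_time = std::numeric_limits<int64_t>::max();
        }
        return head;
    }

private:
    std::mutex _mutex;
    std::atomic<Task*> _task_head{nullptr};
    int64_t _nearest_run_time = std::numeric_limits<int64_t>::max();
};
```

The list itself is still only modified under the mutex; the atomic only covers the pointer that the fast path inspects.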

To Reproduce
After building and linking with the TSan options, run the following commands in two separate shell windows:
taskset -c 0-95 ./kpl_tools server -use_rdma 0 -thread_num 96
taskset -c 48-95 ./kpl_tools client -thread_num 48 -queue_depth 85 -attachment_size 131072 -use_rdma 0 -connection_type pooled

Expected behavior
If brpc has not been adapted for TSan, then it simply cannot be run under TSan. If it has been adapted, the data race reports above should not appear.

Versions
OS: openEuler 24.03 (LTS-SP2)
Compiler: gcc 12.3.1
brpc: 1.16
protobuf: protobuf-25.1-12.oe2403sp2.aarch64

Additional context/screenshots
