[lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Wed Mar 22 10:41:01 EDT 2023


On 2023-03-22 04:02, Ondřej Surý via lttng-dev wrote:
> Hi,
> 
> this happens with all the patches fully applied and doesn't seem to be caused by anything I am doing :)
> 
> WARNING: ThreadSanitizer: data race (pid=3995296)
>    Write of size 8 at 0x7fb51e5fd048 by thread T296:
>      #0 __tsan_memset <null> (badcache_test+0x49257d) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
>      #1 urcu_mb_synchronize_rcu /home/ondrej/Projects/userspace-rcu/src/urcu.c:412:2 (liburcu-mb.so.8+0x35e0) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #2 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 (liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
> 
>    Previous atomic write of size 4 at 0x7fb51e5fd048 by thread T295:
>      #0 urcu_adaptative_wake_up /home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:138:2 (liburcu-mb.so.8+0x8cb9) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #1 urcu_wake_all_waiters /home/ondrej/Projects/userspace-rcu/src/../src/urcu-wait.h:214:3 (liburcu-mb.so.8+0x41de) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #2 urcu_mb_synchronize_rcu /home/ondrej/Projects/userspace-rcu/src/urcu.c:522:2 (liburcu-mb.so.8+0x3766) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #3 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:381:4 (liburcu-mb.so.8+0x9c38) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
> 
>    Location is stack of thread T296.
> 
>    Thread T296 (tid=3995606, running) created by thread T272 at:
>      #0 pthread_create <null> (badcache_test+0x44d5fb) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
>      #1 call_rcu_data_init /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 (liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #2 __create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 (liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #3 urcu_mb_create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 (liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 (libisc-9.19.12-dev.so+0x7a0a0) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 (libisc-9.19.12-dev.so+0x77890) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #6 isc__trampoline_run /home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 (libisc-9.19.12-dev.so+0xaa6be) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
> 
>    Thread T295 (tid=3995605, running) created by thread T261 at:
>      #0 pthread_create <null> (badcache_test+0x44d5fb) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7)
>      #1 call_rcu_data_init /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:460:8 (liburcu-mb.so.8+0x5b26) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #2 __create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:514:2 (liburcu-mb.so.8+0x53b5) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #3 urcu_mb_create_call_rcu_data /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:524:9 (liburcu-mb.so.8+0x52bd) (BuildId: c613f5290cb1c2233fc80714aec4bd742c418823)
>      #4 loop_run /home/ondrej/Projects/bind9/lib/isc/loop.c:293:31 (libisc-9.19.12-dev.so+0x7a0a0) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #5 loop_thread /home/ondrej/Projects/bind9/lib/isc/loop.c:327:2 (libisc-9.19.12-dev.so+0x77890) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
>      #6 isc__trampoline_run /home/ondrej/Projects/bind9/lib/isc/trampoline.c:202:11 (libisc-9.19.12-dev.so+0xaa6be) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
> 
> SUMMARY: ThreadSanitizer: data race (/home/ondrej/Projects/bind9/tests/dns/.libs/badcache_test+0x49257d) (BuildId: 166dea93b2dca28264fc85c79b56d116cd491ed7) in __tsan_memset
> 
> This is between:
> - DEFINE_URCU_WAIT_NODE(wait, URCU_WAIT_WAITING);
> and
> - uatomic_or(&wait->state, URCU_WAIT_TEARDOWN);
> 
> That's pretty much weird because the "Write" happens on stack local variable,

Yes, it's the initialization of this "wait state" variable, which is located on the stack of the thread being blocked.

At initialization time, there are no concurrent accesses to this variable.

> while the "Previous write" happens after futex,

That "previous" write was clearly for an unrelated prior blocking of the same thread. Basically what is missing here is information about the lifetime of this urcu_wait_node object. After urcu_adaptative_busy_wait() has observed the state (uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN), it knows that the object has no concurrent user anymore, which means the thread can move forward and reclaim this area of the stack for other uses.

So somehow we should add an annotation about the lifetime of this object, which begins with DEFINE_URCU_WAIT_QUEUE() and ends right after "urcu_posix_assert(uatomic_read(&wait->state) & URCU_WAIT_TEARDOWN);".

Thanks,

Mathieu


  which lead me to the fact that
> ThreadSanitizer doesn't intercept futex, but we can annotate the futexes:
> 
> https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4
> 
> Oh boy...
> 
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej at sury.org
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



More information about the lttng-dev mailing list