[lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up
Ondřej Surý
ondrej at sury.org
Wed Mar 22 07:01:02 EDT 2023
> On 22. 3. 2023, at 9:02, Ondřej Surý via lttng-dev <lttng-dev at lists.lttng.org> wrote:
>
> That's pretty much weird because the "Write" happens on stack local variable,
> while the "Previous write" happens after futex, which lead me to the fact that
> ThreadSanitizer doesn't intercept futex, but we can annotate the futexes:
>
> https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4
FTR neither annotating the futex with __tsan_acquire(addr) and __tsan_release(addr)
nor falling back to compat_futex_async() for ThreadSanitizer has helped.
It seems to me that TSAN still doesn't understand the synchronization between
RCU read-critical sections and call_rcu/synchronize_rcu() as I am also getting
following reports:
Write of size 8 at 0x7b54000009c0 by thread T102:
#0 __tsan_memset <null> (badcache_test+0x49257d) (BuildId: a7c1595d61e3ee411276cf89a536a8daefa959a3)
#1 mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:324:3 (libisc-9.19.12-dev.so+0x7d136) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
#2 isc__mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:684:2 (libisc-9.19.12-dev.so+0x7e0c3) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
#3 bcentry_destroy_rcu /home/ondrej/Projects/bind9/lib/dns/badcache.c:163:2 (libdns-9.19.12-dev.so+0x4e071) (BuildId: 8a550b795003cd1075ff29590734c806d84e76e6)
#4 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:389:5 (liburcu-mb.so.8+0x9d6b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
Previous atomic write of size 8 at 0x7b54000009c0 by main thread (mutexes: write M0):
#0 ___cds_wfcq_append /home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:202:2 (liburcu-mb.so.8+0xa8ae) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
#1 _cds_wfcq_enqueue /home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:223:9 (liburcu-mb.so.8+0xac09) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
#2 _call_rcu /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:719:2 (liburcu-mb.so.8+0x604f) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
#3 urcu_mb_barrier /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:932:3 (liburcu-mb.so.8+0x4d1b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
#4 badcache_flush /home/ondrej/Projects/bind9/lib/dns/badcache.c:329:2 (libdns-9.19.12-dev.so+0x4d8b3) (BuildId: 8a550b795003cd1075ff29590734c806d84e76e6)
[...]
E.g. ThreadSanitizer reports a race between a place where bcentry->rcu_head is added to call_rcu() queue
and when call_rcu callbacks are called. Annotating the bcentry with acquire/release here helps with this
particular data race, but it does not feel right to me to add annotation at this level.
The code is not very complicated there:
static void
bcentry_destroy_rcu(struct rcu_head *rcu_head) {
dns_bcentry_t *bad = caa_container_of(rcu_head, dns_bcentry_t,
rcu_head);
/* __tsan_release(bad); <-- this helps */
dns_badcache_t *bc = bad->bc;
isc_mem_put(bc->mctx, bad, sizeof(*bad));
dns_badcache_detach(&bc);
}
static void
bcentry_evict(struct cds_lfht *ht, dns_bcentry_t *bad) {
/* There can be multiple deleters now */
if (cds_lfht_del(ht, &bad->ht_node) == 0) {
/* __tsan_acquire(bad); <- this helps */
call_rcu(&bad->rcu_head, bcentry_destroy_rcu);
}
}
static void
badcache_flush(dns_badcache_t *bc, struct cds_lfht *ht) {
struct cds_lfht *oldht = rcu_xchg_pointer(&bc->ht, ht);
synchronize_rcu();
rcu_read_lock();
dns_bcentry_t *bad = NULL;
struct cds_lfht_iter iter;
cds_lfht_for_each_entry (oldht, &iter, bad, ht_node) {
bcentry_evict(oldht, bad);
}
rcu_read_unlock();
rcu_barrier();
RUNTIME_CHECK(cds_lfht_destroy(oldht, NULL) == 0);
}
Any ideas?
Ondrej
--
Ondřej Surý (He/Him)
ondrej at sury.org
More information about the lttng-dev
mailing list