[lttng-dev] ThreadSanitizer: data race between urcu_mb_synchronize_rcu and urcu_adaptative_wake_up
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Wed Mar 22 10:57:58 EDT 2023
On 2023-03-22 07:01, Ondřej Surý via lttng-dev wrote:
>> On 22. 3. 2023, at 9:02, Ondřej Surý via lttng-dev <lttng-dev at lists.lttng.org> wrote:
>>
>> That's rather odd, because the "Write" happens on a stack-local variable,
>> while the "Previous write" happens after a futex call, which led me to conclude
>> that ThreadSanitizer doesn't intercept futex. We can, however, annotate the futexes:
>>
>> https://groups.google.com/g/thread-sanitizer/c/T0G_NyyZ3s4
>
> FTR, neither annotating the futex with __tsan_acquire(addr) and __tsan_release(addr)
> nor falling back to compat_futex_async() under ThreadSanitizer has helped.
>
> It seems to me that TSAN still doesn't understand the synchronization between
> RCU read-side critical sections and call_rcu()/synchronize_rcu(), as I am also
> getting the following reports:
>
> Write of size 8 at 0x7b54000009c0 by thread T102:
> #0 __tsan_memset <null> (badcache_test+0x49257d) (BuildId: a7c1595d61e3ee411276cf89a536a8daefa959a3)
> #1 mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:324:3 (libisc-9.19.12-dev.so+0x7d136) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
> #2 isc__mem_put /home/ondrej/Projects/bind9/lib/isc/mem.c:684:2 (libisc-9.19.12-dev.so+0x7e0c3) (BuildId: a33cd26e483b73684928b4782627f1278c001605)
> #3 bcentry_destroy_rcu /home/ondrej/Projects/bind9/lib/dns/badcache.c:163:2 (libdns-9.19.12-dev.so+0x4e071) (BuildId: 8a550b795003cd1075ff29590734c806d84e76e6)
> #4 call_rcu_thread /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:389:5 (liburcu-mb.so.8+0x9d6b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
>
> Previous atomic write of size 8 at 0x7b54000009c0 by main thread (mutexes: write M0):
> #0 ___cds_wfcq_append /home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:202:2 (liburcu-mb.so.8+0xa8ae) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
> #1 _cds_wfcq_enqueue /home/ondrej/Projects/userspace-rcu/src/../include/urcu/static/wfcqueue.h:223:9 (liburcu-mb.so.8+0xac09) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
> #2 _call_rcu /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:719:2 (liburcu-mb.so.8+0x604f) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
> #3 urcu_mb_barrier /home/ondrej/Projects/userspace-rcu/src/../src/urcu-call-rcu-impl.h:932:3 (liburcu-mb.so.8+0x4d1b) (BuildId: d4f5ea9d96625c7b7d2b2efb590b208f7b83cb6f)
> #4 badcache_flush /home/ondrej/Projects/bind9/lib/dns/badcache.c:329:2 (libdns-9.19.12-dev.so+0x4d8b3) (BuildId: 8a550b795003cd1075ff29590734c806d84e76e6)
> [...]
>
> E.g. ThreadSanitizer reports a race between the place where bcentry->rcu_head is added to the
> call_rcu() queue and the point where the call_rcu callbacks are invoked. Annotating the bcentry
> with acquire/release here fixes this particular report, but it does not feel right to me to add
> annotations at this level.
>
> The code is not very complicated there:
>
> static void
> bcentry_destroy_rcu(struct rcu_head *rcu_head) {
> 	dns_bcentry_t *bad = caa_container_of(rcu_head, dns_bcentry_t,
> 					      rcu_head);
> 	/* __tsan_release(bad); <-- this helps */
> 	dns_badcache_t *bc = bad->bc;
>
> 	isc_mem_put(bc->mctx, bad, sizeof(*bad));
>
> 	dns_badcache_detach(&bc);
> }
>
> static void
> bcentry_evict(struct cds_lfht *ht, dns_bcentry_t *bad) {
> 	/* There can be multiple deleters now */
> 	if (cds_lfht_del(ht, &bad->ht_node) == 0) {
> 		/* __tsan_acquire(bad); <-- this helps */
> 		call_rcu(&bad->rcu_head, bcentry_destroy_rcu);
> 	}
> }
>
> static void
> badcache_flush(dns_badcache_t *bc, struct cds_lfht *ht) {
> 	struct cds_lfht *oldht = rcu_xchg_pointer(&bc->ht, ht);
>
> 	synchronize_rcu();
>
> 	rcu_read_lock();
> 	dns_bcentry_t *bad = NULL;
> 	struct cds_lfht_iter iter;
> 	cds_lfht_for_each_entry (oldht, &iter, bad, ht_node) {
> 		bcentry_evict(oldht, bad);
> 	}
> 	rcu_read_unlock();
> 	rcu_barrier();
> 	RUNTIME_CHECK(cds_lfht_destroy(oldht, NULL) == 0);
> }
>
> Any ideas?
I suspect what happens here is that TSAN considers the following store
within ___cds_wfcq_append() racy:

	/*
	 * Implicit memory barrier after uatomic_xchg() orders store to
	 * q->tail before store to old_tail->next.
	 *
	 * At this point, dequeuers see a NULL tail->p->next, which
	 * indicates that the queue is being appended to. The following
	 * store will append "node" to the queue from a dequeuer
	 * perspective.
	 */
	CMM_STORE_SHARED(old_tail->next, new_head);
This pairs with:

	/*
	 * Waiting for enqueuer to complete enqueue and return the next node.
	 */
	static inline struct cds_wfcq_node *
	___cds_wfcq_node_sync_next(struct cds_wfcq_node *node, int blocking)
	{
		struct cds_wfcq_node *next;
		int attempt = 0;

		/*
		 * Adaptative busy-looping waiting for enqueuer to complete enqueue.
		 */
		while ((next = CMM_LOAD_SHARED(node->next)) == NULL) {
			if (___cds_wfcq_busy_wait(&attempt, blocking))
				return CDS_WFCQ_WOULDBLOCK;
		}
		return next;
	}
So the release semantic is provided by the implicit SEQ_CST barrier in
___cds_wfcq_append():

	old_tail = uatomic_xchg(&tail->p, new_tail);	(release)

and the acquire semantic is provided by the implicit SEQ_CST barrier in
___cds_wfcq_splice():

	/*
	 * Memory barrier implied before uatomic_xchg() orders store to
	 * src_q->head before store to src_q->tail. This is required by
	 * concurrent enqueue on src_q, which exchanges the tail before
	 * updating the previous tail's next pointer.
	 */
	tail = uatomic_xchg(&src_q_tail->p, &src_q_head->node);

Notice how the release/acquire semantic is provided by tail->p, which is
atomically modified _before_ we set the node->next pointer.
With this information, is there a specific annotation that would make sense?
Thanks,
Mathieu
>
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej at sury.org
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com