[lttng-dev] Deadlock between call_rcu thread and RCU-bp thread doing registration in rcu_read_lock()
mathieu.desnoyers at efficios.com
Fri Apr 10 16:26:02 EDT 2015
----- Original Message -----
> I use rcu-bp (0.8.6) and get deadlock between call_rcu thread and
> threads willing to do rcu_read_lock():
> 1. Some thread is in read-side critical section.
> 2. call_rcu thread waits for readers in stack of rcu_bp_register(), i.e.
> holds mutex.
> 3. Another thread enters into critical section via rcu_read_lock() and
> blocks on the mutex taken by thread 2.
> Such deadlock is quite unexpected for me. Especially if RCU is used for
> reference counting.
Let's have a look at the reproducer below.
> Originally it happened with rculfhash, below is minimized reproducer:
> #include <pthread.h>
> #include <stdlib.h>
> #include <urcu-bp.h>
> struct Node {
>     struct rcu_head rcu_head;
> };
>
> static void free_node(struct rcu_head * head)
> {
>     struct Node *node = caa_container_of(head, struct Node, rcu_head);
>     free(node);
> }
>
> static void * reader_thread(void * arg)
> {
>     rcu_read_lock();    /* first use in this thread triggers lazy registration */
>     rcu_read_unlock();
>     return NULL;
> }
>
> int main(int argc, char * argv[])
> {
>     rcu_read_lock();
>     struct Node * node = malloc(sizeof(*node));
>     call_rcu(&node->rcu_head, free_node);
>     pthread_t read_thread_info;
>     pthread_create(&read_thread_info, NULL, reader_thread, NULL);
>     pthread_join(read_thread_info, NULL);
This "pthread_join" blocks until reader_thread exits. It blocks
while holding the RCU read-side lock. Quoting README.md:
"### Interaction with mutexes
One must be careful to do not cause deadlocks due to interaction of
`synchronize_rcu()` and RCU read-side with mutexes. If `synchronize_rcu()`
is called with a mutex held, this mutex (or any mutex which has this
mutex in its dependency chain) should not be acquired from within a RCU
read-side critical section.
This is especially important to understand in the context of the
QSBR flavor: a registered reader thread being "online" by
default should be considered as within a RCU read-side critical
section unless explicitly put "offline". Therefore, if
`synchronize_rcu()` is called with a mutex held, this mutex, as
well as any mutex which has this mutex in its dependency chain
should only be taken when the RCU reader thread is "offline"
(this can be performed by calling `rcu_thread_offline()`)."
So what appears to happen here is that urcu-bp lazy registration
grabs the rcu_gp_lock mutex the first time a thread calls rcu_read_lock().
This mutex is also held while synchronize_rcu() waits for reader
threads to leave their read-side critical sections. So synchronize_rcu()
in the call_rcu thread can block on the read-side lock held by main()
(which is waiting in pthread_join), and this in turn blocks the lazy
registration of reader_thread, because it needs to grab that same mutex.
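
To make the dependency chain above concrete, here is a rough sketch that writes it down as a wait-for graph and checks it for a cycle. This is not urcu code: the thread labels and the has_cycle()/deadlocks() helpers are hypothetical names picked for illustration, and the edges simply encode the three "blocks on" relations described above.

```c
/* Hypothetical labels for the three threads seen in the backtrace. */
enum { MAIN_THREAD, READER_THREAD, CALL_RCU_THREAD, NR_THREADS };

/*
 * waits_for[t] is the thread that t is blocked on, or -1 if t can run.
 * Returns 1 if following the edges from some thread leads back to it.
 */
int has_cycle(const int waits_for[NR_THREADS])
{
    for (int start = 0; start < NR_THREADS; start++) {
        int t = start;

        /* A cycle through `start` needs at most NR_THREADS hops. */
        for (int step = 0; step < NR_THREADS && t != -1; step++) {
            t = waits_for[t];
            if (t == start)
                return 1;
        }
    }
    return 0;
}

/* Build the graph for the reproducer; returns 1 if it deadlocks. */
int deadlocks(int main_holds_read_lock)
{
    int waits_for[NR_THREADS];

    /* main() blocks in pthread_join() until reader_thread exits. */
    waits_for[MAIN_THREAD] = READER_THREAD;

    /* Lazy registration needs the mutex held by the call_rcu thread. */
    waits_for[READER_THREAD] = CALL_RCU_THREAD;

    /*
     * synchronize_rcu() only waits for main() if main() is inside a
     * read-side critical section; otherwise the call_rcu thread can run.
     */
    waits_for[CALL_RCU_THREAD] = main_holds_read_lock ? MAIN_THREAD : -1;

    return has_cycle(waits_for);
}
```

With these assumptions, deadlocks(1) reports the cycle from the backtrace, and deadlocks(0) shows that the cycle disappears as soon as main() is outside of its read-side critical section when it blocks.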
So the issue here is caused by holding the RCU read-side lock
while calling pthread_join.
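
In the reproducer, that would mean calling rcu_read_unlock() before pthread_join() rather than after it. As a toy model of that ordering using plain pthreads only (gp_lock and main_in_read_side are hypothetical stand-ins for rcu_gp_lock and the read-side state of main(), not the actual urcu-bp implementation), something along these lines runs to completion:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Stand-ins for urcu-bp internals; NOT the real implementation. */
static pthread_mutex_t gp_lock = PTHREAD_MUTEX_INITIALIZER; /* plays rcu_gp_lock */
static atomic_int main_in_read_side;  /* nonzero while main() is "read-locked" */

/* Plays the call_rcu thread: holds gp_lock while waiting for the reader. */
static void *grace_period_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gp_lock);
    while (atomic_load(&main_in_read_side))
        sched_yield();  /* keep polling until main() leaves its critical section */
    pthread_mutex_unlock(&gp_lock);
    return NULL;
}

/* Plays reader_thread: its first read lock must take gp_lock to register. */
static void *model_reader_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gp_lock);
    pthread_mutex_unlock(&gp_lock);
    return NULL;
}

/* Returns 0 once both threads have been joined, -1 on thread creation failure. */
int run_fixed_ordering(void)
{
    pthread_t gp, reader;

    atomic_store(&main_in_read_side, 1);   /* models rcu_read_lock() */
    if (pthread_create(&gp, NULL, grace_period_thread, NULL) != 0) {
        atomic_store(&main_in_read_side, 0);
        return -1;
    }
    /* Models rcu_read_unlock(): done BEFORE blocking in pthread_join(). */
    atomic_store(&main_in_read_side, 0);

    if (pthread_create(&reader, NULL, model_reader_thread, NULL) != 0) {
        pthread_join(gp, NULL);
        return -1;
    }
    pthread_join(reader, NULL);            /* no read-side lock held here */
    pthread_join(gp, NULL);
    return 0;
}
```

If the second atomic_store() were moved after pthread_join(reader, NULL), this model would hang in the same way as the three threads in the backtrace.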
For the QSBR flavor, you will want to put the main() thread
offline before waiting on pthread_join.
Does this answer your question?
>     rcu_read_unlock();
>     return 0;
> }
> Thread 3 (Thread 0x7f8e2ab05700 (LWP 7386)):
> #0 0x00000035cacdf343 in *__GI___poll (fds=<optimized out>,
> nfds=<optimized out>, timeout=<optimized out>) at
> #1 0x000000383880233e in wait_for_readers
> (input_readers=0x7f8e2ab04cf0, cur_snap_readers=0x0,
> qsreaders=0x7f8e2ab04ce0) at urcu-bp.c:211
> #2 0x0000003838802af2 in synchronize_rcu_bp () at urcu-bp.c:272
> #3 0x00000038388043a3 in call_rcu_thread (arg=0x1f7f030) at
> #4 0x00000035cb0079d1 in start_thread (arg=0x7f8e2ab05700) at
> #5 0x00000035cace8b6d in clone () at
> Thread 2 (Thread 0x7f8e2a304700 (LWP 7387)):
> #0 __lll_lock_wait () at
> #1 0x00000035cb009508 in _L_lock_854 () from /lib64/libpthread.so.0
> #2 0x00000035cb0093d7 in __pthread_mutex_lock (mutex=0x3838a05ca0
> <rcu_gp_lock>) at pthread_mutex_lock.c:61
> #3 0x0000003838801ed9 in mutex_lock (mutex=<optimized out>) at
> #4 0x000000383880351e in rcu_bp_register () at urcu-bp.c:493
> #5 0x000000383880382e in _rcu_read_lock_bp () at urcu/static/urcu-bp.h:159
> #6 rcu_read_lock_bp () at urcu-bp.c:296
> #7 0x0000000000400801 in reader_thread ()
> #8 0x00000035cb0079d1 in start_thread (arg=0x7f8e2a304700) at
> #9 0x00000035cace8b6d in clone () at
> Thread 1 (Thread 0x7f8e2ab06740 (LWP 7385)):
> #0 0x00000035cb00822d in pthread_join (threadid=140248569890560,
> thread_return=0x0) at pthread_join.c:89
> #1 0x000000000040088f in main ()
> Eugene Ivanov