[lttng-dev] Deadlock in call_rcu_thread when destroy rculfhash node with nested rculfhash

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Wed Jun 7 22:05:53 UTC 2017


----- On Jun 1, 2017, at 9:01 AM, Mathieu Desnoyers <mathieu.desnoyers at efficios.com> wrote: 

> ----- On Oct 21, 2016, at 4:19 AM, Evgeniy Ivanov <i at eivanov.com> wrote:

>> On Wed, Oct 19, 2016 at 6:03 PM, Mathieu Desnoyers
>> <mathieu.desnoyers at efficios.com> wrote:

>>> This is because we use call_rcu internally to trigger the hash table
>>> resize.

>>> In cds_lfht_destroy, we start by waiting for any "in-flight" resize to complete.
>>> Unfortunately, this requires the call_rcu worker thread to make progress. If
>>> cds_lfht_destroy is called from the call_rcu worker thread, it will wait
>>> forever.
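
One way to stay clear of the wait described above is to keep the reclamation callback from destroying the nested table itself, and instead hand the table to an ordinary thread. The following is a minimal sketch of that idea in plain pthreads. It does not call liburcu, and every name in it (nested_table, defer_destroy, drain_deferred) is hypothetical; it only models the hand-off.

```c
/* Model only: plain pthreads, no liburcu calls. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for a nested hash table owned by a top-level node. */
struct nested_table {
	struct nested_table *next_deferred;	/* link in the deferred list */
	int destroyed;				/* set once torn down */
};

static pthread_mutex_t deferred_lock = PTHREAD_MUTEX_INITIALIZER;
static struct nested_table *deferred_head;

/*
 * Meant to be called from the reclamation callback (the role free_Node()
 * plays in the reported stack). It only queues the table and never waits
 * on the worker thread.
 */
static void defer_destroy(struct nested_table *t)
{
	assert(t != NULL);
	pthread_mutex_lock(&deferred_lock);
	t->next_deferred = deferred_head;
	deferred_head = t;
	pthread_mutex_unlock(&deferred_lock);
}

/*
 * Meant to be called from an ordinary thread once all callbacks have run.
 * Detaches the whole list under the lock, then tears tables down outside
 * it. Returns the number of tables processed.
 */
static int drain_deferred(void)
{
	struct nested_table *list;
	int n = 0;

	pthread_mutex_lock(&deferred_lock);
	list = deferred_head;
	deferred_head = NULL;
	pthread_mutex_unlock(&deferred_lock);

	while (list) {
		struct nested_table *next = list->next_deferred;

		/* A real program would destroy the nested table here. */
		list->destroyed = 1;
		list = next;
		n++;
	}
	return n;
}
```

In the scenario from this thread, the natural place for the drain step would be the thread that waits in rcu_barrier(), right after the barrier returns, since that thread is not the call_rcu worker.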

>>> One alternative would be to implement our own worker thread scheme
>>> for the rcu HT resize rather than use the call_rcu worker thread. This
>>> would simplify cds_lfht_destroy requirements a lot.

>>> Ideally I'd like to re-use all the call_rcu work dispatch/worker handling
>>> scheme, just as a separate work queue.
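
To make the "separate work queue" idea concrete, here is a minimal sketch of a dedicated worker thread that runs queued work items, so that anything waiting for a resize depends on this thread rather than on the call_rcu worker. It is an illustration in plain pthreads under assumed names (workqueue_start, workqueue_queue, workqueue_stop, fake_resize), not the code that was merged into liburcu.

```c
/* Model only: plain pthreads, hypothetical names, not the liburcu code. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct work {
	struct work *next;
	void (*func)(void *arg);
	void *arg;
};

struct workqueue {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	struct work *head, *tail;	/* FIFO of pending work */
	int stopping;			/* set by workqueue_stop() */
	pthread_t thread;
};

/* Worker loop: sleep until work arrives, run it outside the lock. */
static void *worker_main(void *data)
{
	struct workqueue *wq = data;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		struct work *w;

		while (!wq->head && !wq->stopping)
			pthread_cond_wait(&wq->wake, &wq->lock);
		if (!wq->head)
			break;		/* stopping and fully drained */
		w = wq->head;
		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		pthread_mutex_unlock(&wq->lock);
		w->func(w->arg);
		free(w);
		pthread_mutex_lock(&wq->lock);
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

/* Returns 0 on success, or the pthread_create() error code. */
static int workqueue_start(struct workqueue *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->wake, NULL);
	wq->head = wq->tail = NULL;
	wq->stopping = 0;
	return pthread_create(&wq->thread, NULL, worker_main, wq);
}

/* Append one work item and wake the worker. Returns 0, or -1 if out of memory. */
static int workqueue_queue(struct workqueue *wq, void (*func)(void *), void *arg)
{
	struct work *w = malloc(sizeof(*w));

	assert(func != NULL);
	if (!w)
		return -1;
	w->next = NULL;
	w->func = func;
	w->arg = arg;
	pthread_mutex_lock(&wq->lock);
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
	pthread_cond_signal(&wq->wake);
	pthread_mutex_unlock(&wq->lock);
	return 0;
}

/* Let the worker finish all queued work, then join it. */
static void workqueue_stop(struct workqueue *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stopping = 1;
	pthread_cond_signal(&wq->wake);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->thread, NULL);
	pthread_cond_destroy(&wq->wake);
	pthread_mutex_destroy(&wq->lock);
}

/* Stand-in for a resize job: just counts how many times it ran. */
static void fake_resize(void *arg)
{
	++*(int *)arg;
}
```

A caller would start the queue once, queue fake_resize-style jobs with workqueue_queue(), and call workqueue_stop() at teardown. Because the jobs run on their own thread, a destroy routine invoked from some other worker can wait for them without waiting on itself.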

>>> Thoughts ?

>> Thank you for explaining. Sounds like a plan: in our production setup there is no
>> issue with having an extra thread for table resizes, and nested tables are an
>> important feature.

> I finally managed to find some time to implement a solution, feedback
> would be welcome!

> Here are the RFC patches:

> https://lists.lttng.org/pipermail/lttng-dev/2017-May/027183.html
> https://lists.lttng.org/pipermail/lttng-dev/2017-May/027184.html

I just merged commits derived from those patches into the liburcu master branch.

Thanks, 

Mathieu 

> Thanks,

> Mathieu

>>> Thanks,

>>> Mathieu

>>> ----- On Oct 19, 2016, at 6:03 AM, Evgeniy Ivanov <i at eivanov.com> wrote:

>>>> Sorry, I found a partial answer in the docs, which state that cds_lfht_destroy
>>>> should not be called from a call_rcu thread context. Why does this limitation exist?

>>>> On Wed, Oct 19, 2016 at 12:56 PM, Evgeniy Ivanov <i at eivanov.com> wrote:

>>>>> Hi,

>>>>> Each node of the top-level rculfhash has a nested rculfhash. Some thread clears the
>>>>> top-level map and then uses rcu_barrier() to wait until everything is destroyed
>>>>> (this is done to check for leaks). Recently it started to deadlock occasionally,
>>>>> with the following stacks:

>>>>> Thread1:

>>>>> __poll
>>>>> cds_lfht_destroy <---- nested map
>>>>> ...
>>>>> free_Node(rcu_head*) <----- node of top level map
>>>>> call_rcu_thread

>>>>> Thread2:

>>>>> syscall
>>>>> rcu_barrier_qsbr
>>>>> destroy_all
>>>>> main

>>>>> Did call_rcu_thread deadlock with the barrier thread? Or is it some kind of
>>>>> internal deadlock because of the nested maps?

>>>>> --
>>>>> Cheers,
>>>>> Evgeniy

>>>> --
>>>> Cheers,
>>>> Evgeniy

>>>> _______________________________________________
>>>> lttng-dev mailing list
>>>> lttng-dev at lists.lttng.org
>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com

>> --
>> Cheers,
>> Evgeniy

> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 