[lttng-dev] User-space RCU: call rcu_barrier() before dissociating helper thread?

Fri Apr 30 14:41:43 EDT 2021

----- On Apr 29, 2021, at 9:49 AM, lttng-dev lttng-dev at lists.lttng.org wrote:

> In multipath-tools, we are using a custom RCU helper thread, which is cleaned
> out on exit:
> 
> https://github.com/opensvc/multipath-tools/blob/23a01fa679481ff1144139222fbd2c4c863b78f8/multipathd/main.c#L3058
> 
> I put a call to rcu_barrier() there in order to make sure all callbacks had
> finished before detaching the helper thread.
> 
> Now we got a report that rcu_barrier() isn't available before user-space RCU 0.8
> (https://github.com/opensvc/multipath-tools/issues/5) (and RHEL7 / Centos7
> still has 0.7.16).
> 
> Question: was it over-cautious or otherwise wrong to call rcu_barrier() before
> set_thread_call_rcu_data(NULL)? Can we maybe just skip this call? If no, what
> would be the recommended way for liburcu < 0.8 to dissociate a helper thread?
> 
> (Note: I'm not currently subscribed to lttng-dev).

First of all, there is a significant reason why liburcu does not free the "default"
call_rcu worker thread data structures at process exit: a call_rcu callback may
very well invoke call_rcu() itself to re-enqueue more work.

AFAIU this is somewhat similar to what happens in the Linux kernel RCU implementation
when the machine needs to be shut down or rebooted: there may indeed never be any point
in time where it is safe to free the call_rcu worker thread data structures without leaks,
because a call_rcu callback may re-enqueue further work indefinitely.

So my understanding is that you implement your own call_rcu worker thread because the
one provided by liburcu leaks data structures on process exit, and you expect that
calling rcu_barrier() once will suffice to ensure quiescence of the call_rcu worker thread
data structures. Unfortunately, this does not cover the scenario where a call_rcu
callback re-enqueues additional work.
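
To make the re-enqueue scenario concrete, here is a toy model in plain C. It is
not liburcu code: every toy_* name is hypothetical, and the "barrier" only mimics
the idea of running the callbacks that were queued when it was invoked. It shows
that a single barrier leaves work pending as soon as a callback re-enqueues itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of this is liburcu API; every toy_* name is hypothetical. */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
};

/* Single FIFO of pending callbacks, standing in for one worker's queue. */
static struct toy_head *toy_first;
static struct toy_head **toy_last = &toy_first;
static size_t toy_pending;
static int toy_requeues_left;

static void toy_reset(void)
{
	toy_first = NULL;
	toy_last = &toy_first;
	toy_pending = 0;
}

/* Append a callback to the tail of the queue. */
static void toy_call_rcu(struct toy_head *head, void (*func)(struct toy_head *))
{
	head->next = NULL;
	head->func = func;
	*toy_last = head;
	toy_last = &head->next;
	toy_pending++;
}

/* Run only the callbacks that were queued when the barrier started. */
static void toy_barrier(void)
{
	size_t n = toy_pending;

	while (n-- > 0) {
		struct toy_head *head = toy_first;

		assert(head != NULL);
		toy_first = head->next;
		if (toy_first == NULL)
			toy_last = &toy_first;
		toy_pending--;
		head->func(head);	/* may call toy_call_rcu() again */
	}
}

/* Callback that re-enqueues itself while the budget lasts. */
static void toy_requeue_cb(struct toy_head *head)
{
	if (toy_requeues_left > 0) {
		toy_requeues_left--;
		toy_call_rcu(head, toy_requeue_cb);
	}
}

/* Queue one callback, run one barrier, report how many callbacks remain. */
static size_t toy_pending_after_one_barrier(int requeues)
{
	static struct toy_head head;

	toy_reset();
	toy_requeues_left = requeues;
	toy_call_rcu(&head, toy_requeue_cb);
	toy_barrier();
	return toy_pending;
}
```

In this model, toy_pending_after_one_barrier(0) leaves the queue empty, while any
positive re-enqueue budget leaves one callback still queued after the barrier.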

So without knowing more details on the reasons why you wish to clean up memory at
process exit, and why it would be valid to do so in your particular use-case, it's
rather difficult for me to elaborate a complete answer.

I can see that maybe we could change liburcu so that it frees all call_rcu data
structures _if_ they happen to be empty of callbacks at process exit, after invoking
rcu_barrier() once. That should take care of not leaking data structures in the
common case where call_rcu callbacks do not enqueue further callbacks.
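
A minimal sketch of that exit policy, again as a toy model rather than liburcu
code (struct toy_worker and both functions are hypothetical, and the queue is
reduced to a counter): run one barrier, then free only if nothing is left queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: not liburcu API; all toy_* names are hypothetical. */
struct toy_worker {
	size_t pending;		/* callbacks currently queued */
	size_t requeue_budget;	/* how many more times callbacks will re-enqueue */
	bool freed;		/* set once the worker's data structures are released */
};

/* One barrier: run the callbacks queued at entry; each may re-enqueue one. */
static void toy_worker_barrier(struct toy_worker *w)
{
	size_t n = w->pending;

	while (n-- > 0) {
		w->pending--;
		if (w->requeue_budget > 0) {
			w->requeue_budget--;
			w->pending++;
		}
	}
}

/* Exit policy: one barrier, then free only if the queue is empty. */
static bool toy_exit_cleanup(struct toy_worker *w)
{
	assert(w != NULL);
	toy_worker_barrier(w);
	if (w->pending == 0)
		w->freed = true;	/* common case: nothing re-enqueued */
	return w->freed;		/* false means the structures are left in place */
}
```

With no re-enqueueing callbacks the worker is freed; if any callback re-enqueued
work during the barrier, the structures are left alone, as they are today.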

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com