[lttng-dev] RCU API usage from call_rcu callbacks?

Wed Mar 22 10:45:28 EDT 2023

On Wed, Mar 22, 2023 at 09:57:25AM -0400, Mathieu Desnoyers wrote:
> On 2023-03-22 07:08, Ondřej Surý via lttng-dev wrote:
> > Hi,
> > 
> > the documentation is pretty silent on this, and asking here is probably going to be faster
> > than me trying to use the source to figure this out.
> > 
> > Is it legal to call call_rcu() from within a call_rcu() callback?
> 
> Yes. call_rcu callbacks can be chained.
> 
> Note that you'll need to issue rcu_barrier() on program exit as many times as you chained call_rcu callbacks if you intend to make sure no queued callbacks still exist on program clean shutdown. See this comment above urcu_call_rcu_exit():
> 
>  * Teardown the default call_rcu worker thread if there are no queued
>  * callbacks on process exit. This prevents leaking memory.
>  *
>  * Here is how an application can ensure graceful teardown of this
>  * worker thread:
>  *
>  * - An application queuing call_rcu callbacks should invoke
>  *   rcu_barrier() before it exits.
>  * - When chaining call_rcu callbacks, the number of calls to
>  *   rcu_barrier() on application exit must match at least the maximum
>  *   number of chained callbacks.
>  * - If an application chains callbacks endlessly, it would have to be
>  *   modified to stop chaining callbacks when it detects an application
>  *   exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
>  *   after setting that flag.

This trick can also be used to gracefully shut down in the presence
of bounded chaining using but one rcu_barrier() call.
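
To make the barrier counting concrete, here is a minimal single-threaded
toy model. It is not liburcu code: toy_call(), toy_barrier(), chained_cb()
and the "exiting" flag are all hypothetical names, and toy_barrier() only
assumes the property discussed above, namely that a barrier covers the
callbacks queued before it was invoked and not the ones they chain.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 64

typedef void (*toy_cb)(int depth);

struct toy_item {
    toy_cb fn;
    int depth;
};

static struct toy_item queue[QUEUE_CAP];
static size_t head, tail;
static bool exiting;    /* hypothetical "application is exiting" flag */

/* Toy stand-in for call_rcu(): append a callback to the queue. */
static void toy_call(toy_cb fn, int depth)
{
    queue[tail % QUEUE_CAP] = (struct toy_item){ fn, depth };
    tail++;
}

/* Toy stand-in for rcu_barrier(): run only what was queued before the call. */
static void toy_barrier(void)
{
    size_t end = tail;

    while (head < end) {
        struct toy_item it = queue[head % QUEUE_CAP];

        head++;
        it.fn(it.depth);
    }
}

/* Bounded chain: requeue until depth hits 0, but stop once exiting is set. */
static void chained_cb(int depth)
{
    if (depth > 0 && !exiting)
        toy_call(chained_cb, depth - 1);
}

/* Count the barriers needed to drain a chain of chain_len callbacks. */
static int barriers_needed(int chain_len, bool use_flag)
{
    int n = 0;

    head = tail = 0;
    exiting = false;
    toy_call(chained_cb, chain_len - 1);
    if (use_flag)
        exiting = true;    /* set the flag before the first barrier */
    while (tail - head > 0) {
        toy_barrier();
        n++;
    }
    return n;
}
```

In this model a chain of three callbacks needs three barriers without the
flag and one barrier with it. A real multi-threaded program also has to
order the flag store against callbacks that are already running, which
this sketch does not attempt to show.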

							Thanx, Paul

>  * - The statements above apply to a library which queues call_rcu
>  *   callbacks, only it needs to invoke rcu_barrier in its library
>  *   destructor.
> 
> 
> > 
> > What about the other RCU (and CDS) API calls?
> 
> They can be called from call_rcu callbacks unless stated otherwise. For instance, rcu_barrier() cannot be called from a call_rcu worker thread.
> 
> > 
> > How does that interact with create_call_rcu_data()?  I have <n> event loops and I am
> > initializing <n> 1:1 call_rcu helper threads as I need to do some per-thread initialization
> > as some of the destroy-like functions use random numbers (don't ask).
> 
> As I recall, set_thread_call_rcu_data() will associate a call_rcu worker instance with the current thread. So all subsequent call_rcu() invocations from that thread will be queued into this per-thread call_rcu queue, and handled by that call_rcu worker thread.
> 
> But I wonder why you inherently need this 1:1 mapping, rather than using the content of the structure containing the rcu_head to figure out which per-thread data should be used?
> 
> If you manage to separate the context from the worker thread instances, then you could use per-cpu call_rcu worker threads, which will eventually scale even better when I integrate the liburcu call_rcu API with sys_rseq concurrency ids [1].
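
A rough sketch of the suggestion above, with the context looked up from
the structure that embeds the rcu_head rather than from the worker thread.
Everything here is an illustrative assumption: struct toy_rcu_head stands
in for liburcu's struct rcu_head so that the snippet compiles on its own,
toy_container_of() mirrors what caa_container_of() does, and loop_ctx,
node and node_destroy_cb() are hypothetical names.

```c
#include <stddef.h>

/* Stand-in for liburcu's struct rcu_head (assumption, for self-containment). */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-event-loop state, e.g. a random number generator. */
struct loop_ctx {
    int loop_id;
    unsigned int rng_state;
    int destroyed;          /* number of objects destroyed with this context */
};

/* Hypothetical reclaimable object that remembers its owning loop. */
struct node {
    struct loop_ctx *owner;
    struct toy_rcu_head rcu;
};

/* Callback: the context comes from the object, not from the worker thread. */
static void node_destroy_cb(struct toy_rcu_head *head)
{
    struct node *n = toy_container_of(head, struct node, rcu);
    struct loop_ctx *ctx = n->owner;

    /* Advance the owner's generator, as a destroy-like function might. */
    ctx->rng_state = ctx->rng_state * 1103515245u + 12345u;
    ctx->destroyed++;
    /* A heap-allocated node would be freed here. */
}
```

With this layout any worker thread can run the callback, so the 1:1
mapping is no longer required. The price is that the per-loop context must
be safe to access from whichever thread ends up running the callback,
which this sketch leaves out.
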
> 
> > 
> > If it's legal to call call_rcu() from a call_rcu thread, which thread is going to be used?
> 
> A call_rcu() invoked from a call_rcu worker thread queues its callback onto the queue handled by that same worker thread. This works because the worker thread sets
> 
>   URCU_TLS(thread_call_rcu_data) = crdp;
> 
> early in call_rcu_thread(). So any chained call_rcu is handled by the same call_rcu worker thread doing the chaining, with the exception of teardown where the pending callbacks are moved to the default worker thread.
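
The queue selection described above can be pictured with a small
single-threaded toy model. It is not liburcu code: current_queue stands in
for URCU_TLS(thread_call_rcu_data), and toy_call_rcu(), toy_worker_run()
and chain_cb() are hypothetical names. The model also leaves out the
per-CPU lookup and the teardown case.

```c
#include <stddef.h>

#define TOY_CAP 16

typedef void (*toy_fn)(int remaining);

struct toy_queue {
    const char *name;
    toy_fn fns[TOY_CAP];
    int args[TOY_CAP];
    size_t head, tail;
    int executed;           /* callbacks run from this queue */
};

static struct toy_queue default_queue = { .name = "default" };

/* Stands in for URCU_TLS(thread_call_rcu_data) of the calling thread. */
static struct toy_queue *current_queue;

/* Toy call_rcu(): use the calling thread's queue, else the default one. */
static void toy_call_rcu(toy_fn fn, int arg)
{
    struct toy_queue *q = current_queue ? current_queue : &default_queue;

    q->fns[q->tail % TOY_CAP] = fn;
    q->args[q->tail % TOY_CAP] = arg;
    q->tail++;
}

/* Toy worker: bind its own queue first, then run the callbacks. */
static void toy_worker_run(struct toy_queue *q)
{
    struct toy_queue *saved = current_queue;

    current_queue = q;      /* mirrors the assignment done by the worker */
    while (q->head < q->tail) {
        size_t i = q->head++ % TOY_CAP;

        q->executed++;
        q->fns[i](q->args[i]);
    }
    current_queue = saved;
}

/* Chained callback: requeues itself until remaining reaches 0. */
static void chain_cb(int remaining)
{
    if (remaining > 0)
        toy_call_rcu(chain_cb, remaining - 1);
}
```

Queuing chain_cb with an argument of 2 on a worker queue and then running
that worker executes all three callbacks on the same queue, and the
default queue stays empty.
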
> 
> Thanks,
> 
> Mathieu
> 
> [1] https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoyers@efficios.com/
> 
> 
> > 
> > Thank you,
> > Ondrej
> > --
> > Ondřej Surý (He/Him)
> > ondrej at sury.org
> > 
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>