[lttng-dev] RCU API usage from call_rcu callbacks?

Wed Mar 22 09:57:25 EDT 2023

On 2023-03-22 07:08, Ondřej Surý via lttng-dev wrote:
> Hi,
> 
> the documentation is pretty silent on this, and asking here is probably going to be faster
> than me trying to use the source to figure this out.
> 
> Is it legal to call_rcu() from within the call_rcu() callback?

Yes. call_rcu callbacks can be chained.

Note that if you want to make sure no queued callbacks remain on a clean program shutdown, you'll need to call rcu_barrier() on exit at least as many times as the longest chain of call_rcu callbacks. See this comment above urcu_call_rcu_exit():

  * Teardown the default call_rcu worker thread if there are no queued
  * callbacks on process exit. This prevents leaking memory.
  *
  * Here is how an application can ensure graceful teardown of this
  * worker thread:
  *
  * - An application queuing call_rcu callbacks should invoke
  *   rcu_barrier() before it exits.
  * - When chaining call_rcu callbacks, the number of calls to
  *   rcu_barrier() on application exit must match at least the maximum
  *   number of chained callbacks.
  * - If an application chains callbacks endlessly, it would have to be
  *   modified to stop chaining callbacks when it detects an application
  *   exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
  *   after setting that flag.
  * - The statements above apply to a library which queues call_rcu
  *   callbacks, only it needs to invoke rcu_barrier in its library
  *   destructor.
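
To illustrate why the number of rcu_barrier() calls has to match the chain depth, here is a minimal single-threaded toy model in C. It is not liburcu code: toy_call_rcu(), toy_rcu_barrier() and struct toy_head are hypothetical stand-ins, and the model assumes only that a barrier covers the callbacks already queued when it starts, not the ones those callbacks queue in turn.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct rcu_head (hypothetical, not the liburcu type). */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
};

/* Single FIFO standing in for one call_rcu worker queue. */
static struct toy_head *q_first, *q_last;
static size_t q_len;

/* Append a callback to the queue. */
static void toy_call_rcu(struct toy_head *head,
			 void (*func)(struct toy_head *head))
{
	head->func = func;
	head->next = NULL;
	if (q_last)
		q_last->next = head;
	else
		q_first = head;
	q_last = head;
	q_len++;
}

/*
 * Modeling assumption: a barrier only covers callbacks queued before it
 * started. Callbacks queued while it runs are left for the next barrier.
 */
static void toy_rcu_barrier(void)
{
	size_t n = q_len;

	while (n--) {
		struct toy_head *h = q_first;

		assert(h != NULL);
		q_first = h->next;
		if (!q_first)
			q_last = NULL;
		q_len--;
		h->func(h);	/* may call toy_call_rcu() again (chaining) */
	}
}

struct chain {
	struct toy_head head;
	int remaining;		/* how many more times to re-queue itself */
};

/* Callback that chains itself until `remaining` reaches zero. */
static void chain_cb(struct toy_head *head)
{
	struct chain *c = (struct chain *)((char *)head -
					   offsetof(struct chain, head));

	if (c->remaining > 0) {
		c->remaining--;
		toy_call_rcu(&c->head, chain_cb);
	}
}

/* Queue a chain of `chained` callbacks, count barriers until empty. */
static int barriers_until_empty(int chained)
{
	struct chain c = { .remaining = chained - 1 };
	int barriers = 0;

	toy_call_rcu(&c.head, chain_cb);
	while (q_len > 0) {
		toy_rcu_barrier();
		barriers++;
	}
	return barriers;
}
```

In this model, barriers_until_empty(3) returns 3: each barrier runs one link of the chain, which queues the next link behind it.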

> 
> What about the other RCU (and CDS) API calls?

They can be called from call_rcu callbacks unless stated otherwise. For instance, rcu_barrier() cannot be called from a call_rcu worker thread.

> 
> How does that interact with create_call_rcu_data()?  I have <n> event loops and I am
> initializing <n> 1:1 call_rcu helper threads as I need to do some per-thread initialization
> as some of the destroy-like functions use random numbers (don't ask).

As I recall, set_thread_call_rcu_data() associates a call_rcu worker instance with the current thread. All subsequent call_rcu() invocations from that thread are then queued into this per-thread call_rcu queue and handled by the corresponding call_rcu worker thread.

But I wonder why you inherently need this 1:1 mapping, rather than using the content of the structure containing the rcu_head to figure out which per-thread data should be used?
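
A rough sketch of that idea in C, under assumptions: struct cb_head is a stand-in for struct rcu_head so the example builds without liburcu, and struct loop_ctx, struct node, node_new() and node_reclaim() are hypothetical names. The callback recovers the enclosing object from the head pointer and reads the owning context from the object, so it does not matter which worker thread runs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct rcu_head, so the sketch builds without liburcu. */
struct cb_head {
	void (*func)(struct cb_head *head);
};

/* Hypothetical per-event-loop state needed by the destroy path. */
struct loop_ctx {
	int id;
	unsigned int destroyed;	/* unsynchronized: sketch is single-threaded */
};

/* Object reclaimed through a callback; it remembers its owning loop. */
struct node {
	struct loop_ctx *owner;
	struct cb_head head;
};

/* Recover the enclosing struct node from a pointer to its head member. */
#define node_of(ptr) \
	((struct node *)((char *)(ptr) - offsetof(struct node, head)))

static struct node *node_new(struct loop_ctx *owner)
{
	struct node *n = calloc(1, sizeof(*n));

	if (n)
		n->owner = owner;
	return n;
}

/* Reclaim callback: the context comes from the object, not the thread. */
static void node_reclaim(struct cb_head *head)
{
	struct node *n = node_of(head);

	assert(n->owner != NULL);
	n->owner->destroyed++;
	free(n);
}
```

With real worker threads, whatever the callback touches in the owning context would have to be safe to access from a thread other than the event loop's own.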

If you manage to separate the context from the worker thread instances, then you could use per-cpu call_rcu worker threads, which will eventually scale even better when I integrate the liburcu call_rcu API with sys_rseq concurrency ids [1].

> 
> If it's legal to call_rcu() from call_rcu thread, which thread is going to be used?

The call_rcu invoked from the call_rcu worker thread will queue the call_rcu callback onto the queue handled by that worker thread. It does so by setting

   URCU_TLS(thread_call_rcu_data) = crdp;

early in call_rcu_thread(). So any chained call_rcu is handled by the same call_rcu worker thread that does the chaining, except at teardown, where the pending callbacks are moved to the default worker thread.
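
The queue selection described above can be modeled with a small toy in C. This is a sketch, not liburcu code: struct cb, struct cb_queue, queue_cb(), set_thread_queue() and worker_pass() are hypothetical names, the worker is run inline instead of in its own thread, and the per-CPU queues and the teardown exception are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy callback head and queue (hypothetical, not liburcu types). */
struct cb {
	struct cb *next;
	void (*func)(struct cb *cb);
};

struct cb_queue {
	struct cb *first, *last;
	size_t len;
};

static struct cb_queue default_queue;
/* Models URCU_TLS(thread_call_rcu_data): NULL means "use the default". */
static _Thread_local struct cb_queue *thread_queue;

/* Models set_thread_call_rcu_data() for the calling thread. */
static void set_thread_queue(struct cb_queue *q)
{
	thread_queue = q;
}

/* Enqueue on the calling thread's queue, else on the default queue. */
static void queue_cb(struct cb *cb, void (*func)(struct cb *cb))
{
	struct cb_queue *q = thread_queue ? thread_queue : &default_queue;

	cb->func = func;
	cb->next = NULL;
	if (q->last)
		q->last->next = cb;
	else
		q->first = cb;
	q->last = cb;
	q->len++;
}

/*
 * One pass of a worker over its queue. The worker points the
 * thread-local at its own queue before running callbacks, so chained
 * callbacks land on the same queue. The save/restore exists only
 * because this toy runs the worker inline in the caller's thread.
 */
static void worker_pass(struct cb_queue *q)
{
	struct cb_queue *saved = thread_queue;
	size_t n = q->len;

	thread_queue = q;
	while (n--) {
		struct cb *cb = q->first;

		assert(cb != NULL);
		q->first = cb->next;
		if (!q->first)
			q->last = NULL;
		q->len--;
		cb->func(cb);
	}
	thread_queue = saved;
}

static struct cb chained;

static void chained_cb(struct cb *cb)
{
	(void)cb;
}

/* First callback chains a second one from inside the worker. */
static void first_cb(struct cb *cb)
{
	(void)cb;
	queue_cb(&chained, chained_cb);
}
```

After one worker_pass() over a queue holding first_cb, the chained callback sits on that same queue and the default queue stays empty.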

Thanks,

Mathieu

[1] https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoyers@efficios.com/

> 
> Thank you,
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej at sury.org

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com