[lttng-dev] RCU API usage from call_rcu callbacks?
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Wed Mar 22 09:57:25 EDT 2023
On 2023-03-22 07:08, Ondřej Surý via lttng-dev wrote:
> Hi,
>
> the documentation is pretty silent on this, and asking here is probably going to be faster
> than me trying to use the source to figure this out.
>
> Is it legal to call_rcu() from within the call_rcu() callback?
Yes. call_rcu callbacks can be chained.
Note that if you want to guarantee that no queued callbacks remain at clean program shutdown, you need to call rcu_barrier() on exit at least as many times as the depth of your call_rcu callback chain. See the comment above urcu_call_rcu_exit():
* Teardown the default call_rcu worker thread if there are no queued
* callbacks on process exit. This prevents leaking memory.
*
* Here is how an application can ensure graceful teardown of this
* worker thread:
*
* - An application queuing call_rcu callbacks should invoke
*   rcu_barrier() before it exits.
* - When chaining call_rcu callbacks, the number of calls to
*   rcu_barrier() on application exit must match at least the maximum
*   number of chained callbacks.
* - If an application chains callbacks endlessly, it would have to be
*   modified to stop chaining callbacks when it detects an application
*   exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
*   after setting that flag.
* - The statements above apply to a library which queues call_rcu
*   callbacks, only it needs to invoke rcu_barrier in its library
*   destructor.
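To illustrate why the barrier count has to follow the chain depth, here is a toy, single-threaded model. It is not liburcu: toy_call_rcu(), toy_barrier(), toy_exiting and struct chained are hypothetical names made up for this sketch, and toy_barrier() only models the guarantee that callbacks queued before the barrier have run, while anything they re-queue is still pending afterwards. "Depth" here counts every callback in the chain, including the first one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a call_rcu queue entry. NOT the liburcu rcu_head. */
struct toy_head {
    struct toy_head *next;
    void (*func)(struct toy_head *head);
};

static struct toy_head *toy_queue_first;
static struct toy_head **toy_queue_last = &toy_queue_first;
static int toy_exiting;     /* hypothetical "application is exiting" flag */

/* Append a callback to the toy queue. */
void toy_call_rcu(struct toy_head *head, void (*func)(struct toy_head *))
{
    head->next = NULL;
    head->func = func;
    *toy_queue_last = head;
    toy_queue_last = &head->next;
}

/*
 * Model of the barrier guarantee: run only the callbacks that were
 * queued before this call. Callbacks they chain stay on the queue.
 */
void toy_barrier(void)
{
    struct toy_head *batch = toy_queue_first;

    toy_queue_first = NULL;
    toy_queue_last = &toy_queue_first;
    while (batch) {
        struct toy_head *next = batch->next;  /* read before re-queue */

        batch->func(batch);
        batch = next;
    }
}

struct chained {
    struct toy_head head;
    int remaining;  /* re-queues left; negative means chain endlessly */
    int runs;       /* how many times the callback ran */
};

/* Callback that chains itself until the chain ends or exit is flagged. */
void chained_cb(struct toy_head *head)
{
    struct chained *c = (struct chained *)
        ((char *)head - offsetof(struct chained, head));

    c->runs++;
    if (toy_exiting || c->remaining == 0)
        return;
    if (c->remaining > 0)
        c->remaining--;
    toy_call_rcu(head, chained_cb);
}

/* Count how many barriers it takes to drain a chain of the given depth. */
int barriers_until_empty(int depth)
{
    struct chained c = { .remaining = depth - 1 };
    int barriers = 0;

    assert(depth >= 1);
    toy_call_rcu(&c.head, chained_cb);
    while (toy_queue_first != NULL) {
        toy_barrier();
        barriers++;
    }
    return barriers;
}
```

In this model a chain of depth 3 needs three barriers before the queue is empty, and an endless chain (remaining set to a negative value) only drains once toy_exiting is set before the barrier. A real application would do the equivalent with call_rcu() and rcu_barrier(), with proper synchronization on the exit flag.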
>
> What about the other RCU (and CDS) API calls?
They can be called from a call_rcu callback unless stated otherwise. For instance, rcu_barrier() cannot be called from a call_rcu worker thread.
>
> How does that interact with create_call_rcu_data()? I have <n> event loops and I am
> initializing <n> 1:1 call_rcu helper threads as I need to do some per-thread initialization
> as some of the destroy-like functions use random numbers (don't ask).
As I recall, set_thread_call_rcu_data() associates a call_rcu worker instance with the current thread. All subsequent call_rcu() invocations from that thread are then queued onto that per-thread call_rcu queue and handled by the associated call_rcu worker thread.
But I wonder why you inherently need this 1:1 mapping, rather than using the content of the structure containing the rcu_head to figure out which per-thread data should be used?
If you manage to separate the context from the worker thread instances, then you could use per-cpu call_rcu worker threads, which will eventually scale even better when I integrate the liburcu call_rcu API with sys_rseq concurrency ids [1].
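As a rough sketch of what "using the content of the structure containing the rcu_head" could look like: the callback recovers the enclosing object from the head pointer and follows a pointer to its owning context, so it does not matter which worker thread runs it. Everything here is an assumption for illustration: struct rcu_head_standin replaces liburcu's struct rcu_head so the sketch builds without the library, and struct loop_ctx, struct item, next_random() and item_destroy_cb() are hypothetical names. With liburcu, caa_container_of() from <urcu/compiler.h> would do the pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for liburcu's struct rcu_head (assumption, for a
 * self-contained sketch only). */
struct rcu_head_standin {
    void *opaque[2];
};

/* Hypothetical per-event-loop state, e.g. a private RNG. */
struct loop_ctx {
    unsigned int rng_state;
    int destroyed;          /* number of items destroyed for this loop */
};

/* Object reclaimed through call_rcu; it remembers its owning loop. */
struct item {
    int value;
    struct loop_ctx *owner;
    struct rcu_head_standin rcu;
};

/* Recover the enclosing item from a pointer to its embedded head. */
#define item_from_head(ptr) \
    ((struct item *)((char *)(ptr) - offsetof(struct item, rcu)))

/* Simple LCG step, standing in for "destroy needs random numbers". */
static unsigned int next_random(struct loop_ctx *ctx)
{
    ctx->rng_state = ctx->rng_state * 1103515245u + 12345u;
    return ctx->rng_state;
}

/*
 * Callback body: finds the per-loop context through the item itself,
 * independent of which worker thread invokes it.
 */
void item_destroy_cb(struct rcu_head_standin *head)
{
    struct item *it = item_from_head(head);
    struct loop_ctx *ctx = it->owner;

    assert(ctx != NULL);
    (void) next_random(ctx);    /* use the owner's RNG state */
    ctx->destroyed++;
    /* Real code would release the item here, e.g. free(it). */
}
```

The sketch ignores synchronization: if the callback runs on a worker thread other than the owning event loop, any state it touches in loop_ctx needs to be safe for concurrent access (a lock or atomics, for instance).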
>
> If it's legal to call_rcu() from call_rcu thread, which thread is going to be used?
A call_rcu() invoked from a call_rcu worker thread queues its callback onto the queue handled by that same worker thread. The worker thread arranges this by setting
URCU_TLS(thread_call_rcu_data) = crdp;
early in call_rcu_thread(). So any chained call_rcu is handled by the same call_rcu worker thread that does the chaining. The exception is teardown, where the pending callbacks are moved to the default worker thread.
Thanks,
Mathieu
[1] https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoyers@efficios.com/
>
> Thank you,
> Ondrej
> --
> Ondřej Surý (He/Him)
> ondrej at sury.org
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com