On 2023-03-22 07:08, Ondřej Surý via lttng-dev wrote:
> Hi,
> the documentation is pretty silent on this, and asking here is probably going to be faster
> than me trying to use the source to figure this out.
> Is it legal to call_rcu() from within the call_rcu() callback?

Yes. call_rcu callbacks can be chained.

Note that if you want to make sure no queued callbacks remain on a clean program shutdown, you need to call rcu_barrier() on exit at least as many times as the maximum depth of your call_rcu callback chain. See this comment above urcu_call_rcu_exit():

  * Teardown the default call_rcu worker thread if there are no queued
  * callbacks on process exit. This prevents leaking memory.
  * Here is how an application can ensure graceful teardown of this
  * worker thread:
  * - An application queuing call_rcu callbacks should invoke
  *   rcu_barrier() before it exits.
  * - When chaining call_rcu callbacks, the number of calls to
  *   rcu_barrier() on application exit must match at least the maximum
  *   number of chained callbacks.
  * - If an application chains callbacks endlessly, it would have to be
  *   modified to stop chaining callbacks when it detects an application
  *   exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
  *   after setting that flag.
  * - The statements above apply to a library which queues call_rcu
  *   callbacks, only it needs to invoke rcu_barrier in its library
  *   destructor.

> What about the other RCU (and CDS) API calls?

They can be called from a call_rcu callback unless stated otherwise. For instance, rcu_barrier() cannot be called from a call_rcu worker thread.

> How does that interact with create_call_rcu_data()?  I have <n> event loops and I am
> initializing <n> 1:1 call_rcu helper threads as I need to do some per-thread initialization
> as some of the destroy-like functions use random numbers (don't ask).

As I recall, set_thread_call_rcu_data() associates a call_rcu worker instance with the current thread. All subsequent call_rcu() invocations from that thread are then queued into this per-thread call_rcu queue and handled by that call_rcu worker thread.

But I wonder why you inherently need this 1:1 mapping, rather than using the content of the structure containing the rcu_head to figure out which per-thread data should be used?

If you manage to separate the context from the worker thread instances, then you could use per-cpu call_rcu worker threads, which will eventually scale even better when I integrate the liburcu call_rcu API with sys_rseq concurrency ids [1].

> If it's legal to call_rcu() from call_rcu thread, which thread is going to be used?

The call_rcu invoked from the call_rcu worker thread will queue the call_rcu callback onto the queue handled by that worker thread. It does so by setting

   URCU_TLS(thread_call_rcu_data) = crdp;

early in call_rcu_thread(). So any chained call_rcu is handled by the same call_rcu worker thread that is doing the chaining. The exception is teardown, when the pending callbacks are moved to the default worker thread.



[1] https://lore.kernel.org/lkml/20221122203932.231377-1-mathieu.desnoyers@efficios.com/

