[lttng-dev] call_rcu seems inefficient without futex

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon Jan 27 10:38:05 EST 2020


----- On Jan 23, 2020, at 7:19 PM, lttng-dev lttng-dev at lists.lttng.org wrote:

> Hi,
> 
> I recently installed knot dns for a very small FreeBSD server. I noticed
> that it uses a surprising amount of CPU, even when there is no load:
> about 0.25%. That's not huge, but it seems unnecessarily high when my
> QPS is less than 0.01.
> 
> After some profiling, I came to the conclusion that this is caused by
> call_rcu_wait using futex_async to repeatedly wait. Since there is no
> futex on FreeBSD (without the Linux compatibility layer), this
> effectively turns into a permanent busy waiting loop.
> 
> I think futex_noasync can be used here instead. call_rcu_wait is only
> supposed to be called from call_rcu_thread, never from a signal context.
> call_rcu calls get_call_rcu_data, which may call
> get_default_call_rcu_data, which calls pthread_mutex_lock through
> call_rcu_lock. Therefore, call_rcu is not async-signal-safe already.

call_rcu() is meant to be async-signal-safe and lock-free once the
initialization you describe has been performed on first use. Paul, do
you know where we have documented this in liburcu?
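
For readers following the thread, the "lock only on first use, lock-free
afterwards" pattern can be sketched as below. This is a minimal
illustration, not liburcu's code: struct worker_data, init_lock,
default_data and get_default_data() are hypothetical names, and C11
atomics stand in for whatever primitives the library actually uses.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-process default worker state. */
struct worker_data {
	int id;
};

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(struct worker_data *) default_data;

/*
 * Return the default data, creating it on first use.
 * Only the first-use (slow) path takes the mutex; once the pointer is
 * published, later calls return from the lock-free fast path.
 */
static struct worker_data *get_default_data(void)
{
	struct worker_data *d;

	d = atomic_load_explicit(&default_data, memory_order_acquire);
	if (d)
		return d;	/* fast path: no lock taken */

	pthread_mutex_lock(&init_lock);
	/* Re-check under the lock: another thread may have initialized. */
	d = atomic_load_explicit(&default_data, memory_order_relaxed);
	if (!d) {
		d = calloc(1, sizeof(*d));
		if (d)
			atomic_store_explicit(&default_data, d,
					      memory_order_release);
	}
	pthread_mutex_unlock(&init_lock);
	return d;
}
```

Under this sketch's assumptions, only calls made after the first one
completes avoid the mutex, so the first call would have to happen
outside any signal handler.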

> Also, I think it only makes sense to use call_rcu around a RCU write,
> which contradicts the README saying that only RCU reads are allowed in
> signal handlers.

I am not sure what you mean by "use call_rcu around a RCU write".

Is there anything similar to sys_futex on FreeBSD?

It would be good to look into alternative fixes that do not change the
guarantees provided by call_rcu() in that fallback scenario (no futex
available). For your use case, one option is to tweak the retry delay
in compat_futex_async(). Currently
src/compat_futex.c:compat_futex_async() has a 10ms delay. Would 100ms
be more acceptable?
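
To make the trade-off concrete, here is a minimal sketch of a polling
wait with a tunable retry delay. It is an illustration under stated
assumptions, not the code in src/compat_futex.c: the names
poll_wait_while_equal() and wakeups_per_second() are hypothetical, and
C11 atomics are used for the shared word. With a 10ms delay an idle
waiter wakes about 100 times per second; with 100ms, about 10 times.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on idle wakeups per second for a given retry delay. */
static int wakeups_per_second(int delay_ms)
{
	assert(delay_ms > 0);
	return 1000 / delay_ms;
}

/*
 * Polling stand-in for a futex wait (hypothetical helper).
 * Sleeps delay_ms between checks for as long as *uaddr == val.
 * Returns 0 once the value differs, or -1 if poll() fails for a
 * reason other than EINTR. If sleeps is non-NULL, it receives the
 * number of poll() calls made.
 */
static int poll_wait_while_equal(_Atomic int32_t *uaddr, int32_t val,
				 int delay_ms, unsigned long *sleeps)
{
	unsigned long n = 0;
	int ret = 0;

	/* A negative poll() timeout would block forever. */
	assert(uaddr != NULL && delay_ms >= 0);

	while (atomic_load(uaddr) == val) {
		n++;
		/* poll() with no descriptors is used as a plain sleep. */
		if (poll(NULL, 0, delay_ms) < 0 && errno != EINTR) {
			ret = -1;
			break;
		}
	}
	if (sleeps)
		*sleeps = n;
	return ret;
}
```

A longer delay lowers idle CPU use, but a waiter may then take up to
one full delay to notice that the value changed.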

Thanks,

Mathieu

> 
> I applied "sed -i -e 's/futex_async/futex_noasync/'
> src/urcu-call-rcu-impl.h" and knot seems to work correctly with only
> 0.01% CPU now. I also ran tests/unit and tests/regression with default
> and signal backends and all completed successfully.
> 
> I think that the other two usages of futex_async are also a little
> suspicious, but I didn't look too closely.
> 
> Thanks,
> Alex.

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

