librseq memory allocator portability

Mon Feb 3 09:52:46 EST 2025

On 2025-02-02 16:48, Ondřej Surý via lttng-dev wrote:
> Hey Mathieu,
> 
> I’m actually thinking about using librseq for some more lightweight stuff first - like statistic counters (especially those related to memory allocations), and my first obvious question would be - how portable is the librseq memory allocator?
> 
> I am not keen on having a code with ifdef spaghetti to support BSDs, so I think that it probably should provide some per-thread memory pools at the expense of just consuming more memory on platforms without rseq system call.

Hi Ondrej,

I have good news: the librseq mempool allocator is completely
independent of the rseq system call. It's just a memory
allocator.

AFAIR, the only optional dependency which is Linux-specific
is on memfd_create for the RSEQ_MEMPOOL_POPULATE_COW_INIT
populate policy. The RSEQ_MEMPOOL_POPULATE_COW_ZERO policy
can be used as a fallback on other platforms. That fallback
uses more memory on systems where only a few of the possible
cores are used and the allocated memory is initialized to
non-zero values.

Per-cpu indexing can fall back to sched_getcpu(3) when
rseq is not available; sched_getcpu(3) should be common
enough across platforms. This is actually what the librseq
rseq_current_cpu() does as a fallback, so you could just
use that static inline helper.
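
As a rough sketch of that kind of fallback (this is not the
librseq implementation), a per-cpu index helper built only on
sched_getcpu(3) could look like the code below. The name
cpu_index_fallback(), the nr_slots parameter, the modulo
folding and the use of slot 0 on error are all assumptions
made up for this example:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc only declares sched_getcpu() with this */
#endif
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: return an index in [0, nr_slots) for the
 * CPU the caller is currently running on. The thread can migrate
 * right after the call, so the result is only a hint and the
 * indexed data still needs atomic (or rseq) updates.
 */
static inline int cpu_index_fallback(int nr_slots)
{
	int cpu;

	assert(nr_slots > 0);
	cpu = sched_getcpu();	/* returns -1 on error */
	if (cpu < 0)
		return 0;	/* assumption: share slot 0 on error */
	return cpu % nr_slots;	/* fold into the available slots */
}
```

In real code I would expect you to call rseq_current_cpu()
rather than roll your own helper.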

Now in terms of strategy for using rseq critical sections
for a split-counter use-case, let's see our options:

- You'll want to use rseq_load_add_store__ptr() on your
   fast path. It will return a negative error in case of
   abort, or if the environment does not support rseq:

   A) either your Linux kernel does not have CONFIG_RSEQ or
      is too old,
   B) or your GNU libc does not have rseq support, and you
      did not explicitly call the librseq thread registration,
   C) or your environment does not support rseq at all
      (e.g. BSD).

In the caller code, when handling rseq_load_add_store__ptr()
errors, I would recommend falling back to an atomic
increment of a _second_ counter. That way the fallback also
works flawlessly when it is triggered by an rseq abort
(e.g. a debugger single-stepping within the rseq critical
section).

Then you'll want to sum up the per-cpu counters for the rseq
fast-path and for the atomic counter slow-path whenever you
read the counter value.
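
To make the scheme concrete, here is a minimal single-threaded
sketch of the split counter. It does not call librseq:
fast_path_add() is a hypothetical stand-in for the
rseq_load_add_store__ptr() fast path, and
sketch_rseq_available, NR_CPUS_SKETCH and struct
split_counter are names invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed number of CPUs */

struct split_counter {
	intptr_t percpu[NR_CPUS_SKETCH];	/* fast-path counters */
	atomic_intptr_t fallback;		/* second, atomic counter */
};

/* Set to 0 to simulate an abort or a platform without rseq. */
static int sketch_rseq_available;

/*
 * Stand-in for the rseq fast path: returns 0 on success and a
 * negative value on failure. The plain add is only acceptable
 * here because the sketch is single-threaded; the real fast path
 * performs the update within an rseq critical section.
 */
static int fast_path_add(struct split_counter *c, int cpu, intptr_t v)
{
	if (!sketch_rseq_available)
		return -1;
	c->percpu[cpu] += v;
	return 0;
}

/* Try the fast path, fall back to the atomic counter on error. */
static void split_counter_add(struct split_counter *c, int cpu, intptr_t v)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	if (fast_path_add(c, cpu, v) < 0)
		atomic_fetch_add_explicit(&c->fallback, v,
					  memory_order_relaxed);
}

/* Sum the per-cpu counters and the atomic fallback counter. */
static intptr_t split_counter_read(struct split_counter *c)
{
	intptr_t sum = atomic_load_explicit(&c->fallback,
					    memory_order_relaxed);

	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		sum += c->percpu[i];
	return sum;
}
```

Because the fallback lands in a separate counter, it never
races with a fast-path update of the per-cpu counters, and
the reader gets the total by adding both parts together.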

We may need to tweak the librseq code a little bit to handle
case (C) for BSDs, as they have not been a target so far.
This would likely mean adding a few #ifdefs to cover that
case seamlessly.

We have an internal project progressing at EfficiOS which aims
to implement the equivalent of the Linux kernel static keys for
userspace on various architectures. The goal is to dynamically
change the code behavior between a no-op and a jump, thus selecting
between rseq and atomics from a library or program constructor
based on rseq availability.
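
Until something like that exists, the selection step can be
approximated with a plain function pointer set from a
constructor. To be clear, this is not the static-key mechanism
(nothing is patched, every call pays for an indirect branch);
it only illustrates choosing the implementation once at
startup. sketch_probe_rseq() is a hypothetical probe that
always reports "unavailable" here, and all other names are
invented for the example. It relies on the GCC/Clang
constructor attribute.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_intptr_t slow_count;	/* atomic implementation */
static intptr_t fast_count;		/* placeholder for rseq data */

/* Hypothetical probe: a real one would ask librseq or the kernel. */
static int sketch_probe_rseq(void)
{
	return 0;
}

static void add_atomic(intptr_t v)
{
	atomic_fetch_add_explicit(&slow_count, v, memory_order_relaxed);
}

/* Placeholder for an rseq-based add; not a real critical section. */
static void add_rseq_stub(intptr_t v)
{
	fast_count += v;
}

/* Safe default until the constructor has run. */
static void (*counter_add)(intptr_t) = add_atomic;

/* Runs before main(): select the implementation once. */
__attribute__((constructor))
static void select_counter_impl(void)
{
	counter_add = sketch_probe_rseq() ? add_rseq_stub : add_atomic;
}

static void counter_inc(void)
{
	assert(counter_add);
	counter_add(1);
}
```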

Please let me know if anything is unclear.

Thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com