From mathieu.desnoyers at efficios.com  Tue Nov  4 15:17:17 2025
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Tue, 4 Nov 2025 15:17:17 -0500
Subject: [RELEASE] Userspace RCU 0.15.4
Message-ID: <43ad1f76-3682-47f4-b3c9-62a94053db3c@efficios.com>

Hi,

This is a patchlevel release of the Userspace RCU library.

The most relevant change in this release is the removal of a redundant
memory barrier on x86 for store and RMW operations with the
CMM_SEQ_CST_FENCE memory ordering. This addresses a performance
regression for users of the pre-0.15 uatomic API who build against a
liburcu configured to use compiler builtins for atomics
(--enable-compiler-atomic-builtins).

As a reminder, the CMM_SEQ_CST_FENCE MO is a superset of SEQ_CST: it
provides sequential consistency _and_ acts as a full memory barrier,
similar to the semantics associated with cmpxchg() and
atomic_add_return() within the LKMM.

Here is the rationale for this change:

/*
 * On x86, an atomic store with sequential consistency is always
 * implemented with an exchange operation, which has an implicit lock
 * prefix when a memory operand is used.
 *
 * Indeed, on x86, only loads can be reordered with prior stores.
 * Therefore, to preserve sequential consistency, either load operations
 * or store operations need to carry a memory barrier. All major
 * toolchains have selected the store operations to carry this barrier,
 * to avoid penalizing load operations.
 *
 * Therefore, assuming that the toolchain in use follows this
 * convention, it is safe to rely on this implicit memory barrier to
 * implement the `CMM_SEQ_CST_FENCE` memory order, and thus no further
 * barrier needs to be emitted.
 */
#define cmm_seq_cst_fence_after_atomic_store(...)	\
	do { } while (0)

/*
 * Keep the default implementation (emit a memory barrier) after load
 * operations for `CMM_SEQ_CST_FENCE`. The rationale is explained above
 * for `cmm_seq_cst_fence_after_atomic_store()`.
 */
/* #define cmm_seq_cst_fence_after_atomic_load(...)
 */

/*
 * On x86, atomic read-modify-write operations always have a lock
 * prefix, either implicitly or explicitly, for sequential consistency.
 *
 * Therefore, no further memory barrier needs to be emitted for these
 * operations under the `CMM_SEQ_CST_FENCE` memory order.
 */
#define cmm_seq_cst_fence_after_atomic_rmw(...)	\
	do { } while (0)

Changelog:

2025-11-04 Userspace RCU 0.15.4
	* uatomic: Fix redundant memory barriers for atomic builtin operations
	* Cleanup: Remove useless declarations from urcu-qsbr
	* src/urcu-bp.c: assert => urcu_posix_assert
	* ppc.h: improve ppc64 caa_get_cycles on Darwin

Thanks,

Mathieu

Project website: https://liburcu.org
Git repository: https://git.liburcu.org/userspace-rcu.git

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


From mathieu.desnoyers at efficios.com  Mon Nov 10 20:35:35 2025
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 10 Nov 2025 20:35:35 -0500
Subject: [RELEASE] Userspace RCU 0.15.5
Message-ID: 

Hi,

This is the 0.15.5 release of liburcu.

The most relevant change introduced by this release is the use of
"lock; addl" to replace the "mfence" instruction for cmm_smp_mb() on
x86-64 when users build liburcu without compiler-builtin atomics
("--enable-compiler-atomic-builtins"). This is motivated by the fact
that "lock; addl" is significantly faster than "mfence". Users wishing
to synchronize with I/O already need to use cmm_mb().

Detailed changelog:

2025-11-10 Userspace RCU 0.15.5
	* x86: Define cmm_smp_mb() as lock; addl rather than mfence
	* Introduce barrier test
	* Add test_uatomic to gitignore
	* Cleanup: Remove stray space
	* benchmark: Add uatomic benchmark

Project website: https://liburcu.org
Git repository: https://git.liburcu.org/userspace-rcu.git

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com