[lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu
Dmitry Vyukov
dvyukov at google.com
Wed May 17 06:21:33 EDT 2023
On Tue, 16 May 2023 at 17:47, Olivier Dion <odion at efficios.com> wrote:
>
> On Tue, 16 May 2023, Dmitry Vyukov via lttng-dev <lttng-dev at lists.lttng.org> wrote:
> > On Mon, 15 May 2023 at 22:18, Olivier Dion <odion at efficios.com> wrote:
>
> Hi Dmitry,
>
> > Where can I see the actual changes? Preferably side-by-side diffs with
> > full context. Are they (or can they be) uploaded somewhere on
> > github/gerrit?
>
> Here's the link to the first patch of the set on our Gerrit
> <https://review.lttng.org/c/userspace-rcu/+/9737>.
>
> > Are there any remaining open questions?
>
> On the Gerrit no. But I would add this for the known issues:
>
> We have a regression test for forking. We get the following when
> running it:
>
> ==23432==ThreadSanitizer: starting new threads after multi-threaded fork is not supported. Dying (set die_after_fork=0 to override)
> With TSAN_OPTIONS=die_after_fork=0, here are the results for GCC and
> Clang.
>
> * gcc 11.3.0
> ==25266==ThreadSanitizer: dup thread with used id 0x7fd40dafe600
>
> Looks like this was fixed with recent merged of TSAN in gcc 13.
>
> * clang 14.0.6
Hi Olivier,
Forking under tsan is a bit tricky. But I see we now take a number of
internal mutexes around fork (but I think still not all of them), so
maybe it's not that bad. If TSAN_OPTIONS=die_after_fork=0 works
reasonably reliably for you, then export it for testing.
Older compilers are missing a number of bug fixes.
So if you can restrict testing to newer compilers only, that's the way to go.
> Tests pass but are very slow. This seems to be because we're calling
> exit(3) in the childs. Changing it to _exit(2) solves the issue. The
> only thing I can think of is that exit(3) is not async-signal-safe like
> _exit(2), yet we do other none-async safe calls like malloc(3) and
> free(3) in the childs.
By default tsan sleeps for 1 second at exit, since exit sequence is a
common source of races. Perhaps it's just these sleeps. Try
TSAN_OPTIONS=atexit_sleep_ms=0 (or maybe =50).
> Here are some perf records of that:
>
> With exit(3)
> ```
> 12.35% test_urcu_fork. [unknown] [k] 0xffffffff89ebd22b
> 8.69% test_urcu_fork. [unknown] [k] 0xffffffff89ebd8fa
> 7.50% test_urcu_fork. [unknown] [k] 0xffffffff8a001360
> 6.11% test_urcu_fork. test_urcu_fork.tap [.] __sanitizer::internal_memset
> 5.85% test_urcu_fork. test_urcu_fork.tap [.] __tsan::ForkChildAfter
> 1.96% test_urcu_fork. [unknown] [k] 0xffffffff892f8efd
> 1.88% test_urcu_fork. [unknown] [k] 0xffffffff89ec8e2c
> 1.29% test_urcu_fork. [unknown] [k] 0xffffffff890a99d1
> 0.79% test_urcu_fork. [unknown] [k] 0xffffffff8918c566
> 0.75% test_urcu_fork. test_urcu_fork.tap [.] __sanitizer::ThreadRegistry::OnFork
> 0.71% test_urcu_fork. [unknown] [k] 0xffffffff89349574
> 0.64% test_urcu_fork. [unknown] [k] 0xffffffff89ee424f
> 0.57% test_urcu_fork. test_urcu_fork.tap [.] __tsan::MetaMap::AllocBlock
> 0.55% test_urcu_fork. [unknown] [k] 0xffffffff89367249
> 0.53% test_urcu_fork. test_urcu_fork.tap [.] __tsan_read8
> 0.51% test_urcu_fork. [unknown] [k] 0xffffffff89ec8e05
> ```
>
> With _exit(2)
> ```
> 12.26% test_urcu_fork. [unknown] [k] 0xffffffff89ebd8fa
> 9.51% test_urcu_fork. [unknown] [k] 0xffffffff89ebd22b
> 6.78% test_urcu_fork. test_urcu_fork.tap [.] __sanitizer::internal_memset
> 6.49% test_urcu_fork. [unknown] [k] 0xffffffff8a001360
> 4.92% test_urcu_fork. test_urcu_fork.tap [.] __tsan::ForkChildAfter
> 2.92% test_urcu_fork. [unknown] [k] 0xffffffff892f8efd
> 2.19% test_urcu_fork. [unknown] [k] 0xffffffff89ec8e2c
> 1.88% test_urcu_fork. [unknown] [k] 0xffffffff890a99d1
> 0.83% test_urcu_fork. test_urcu_fork.tap [.] __tsan::MetaMap::AllocBlock
> 0.72% test_urcu_fork. [unknown] [k] 0xffffffff892f8f1c
> 0.67% test_urcu_fork. [unknown] [k] 0xffffffff8918c566
> 0.65% test_urcu_fork. [unknown] [k] 0xffffffff89ec16dd
> 0.62% test_urcu_fork. test_urcu_fork.tap [.] __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate
> 0.61% test_urcu_fork. test_urcu_fork.tap [.] __tsan_write4
> 0.60% test_urcu_fork. [unknown] [k] 0xffffffff89367249
> 0.59% test_urcu_fork. [unknown] [k] 0xffffffff89ee424f
> 0.50% test_urcu_fork. test_urcu_fork.tap [.] __sanitizer::ThreadRegistry::OnFork
> 0.50% test_urcu_fork. test_urcu_fork.tap [.] __tsan_write8
> 0.47% test_urcu_fork. test_urcu_fork.tap [.] __tsan::ForkBefore
> 0.45% test_urcu_fork. test_urcu_fork.tap [.] __tsan_func_entry
> ```
>
> > Is there CI coverage with -fsanitize=thread?
>
> Since the urcu-signal flavor deadlocks with TSAN (see reproducer in
> commit log), the CI simply timeouts. I do however have a job with TSAN
> as an matrix axis here <https://ci.lttng.org/job/dev_odion_liburcu/>.
Sounds good.
More information about the lttng-dev
mailing list