[lttng-dev] [PATCH 00/11] Add support for TSAN to liburcu

Dmitry Vyukov dvyukov at google.com
Wed May 17 06:57:36 EDT 2023


I've skimmed through the changes. They look sane. You know what you
are doing. So unless you have any other open issues, I don't have much
to add.

On Wed, 17 May 2023 at 12:21, Dmitry Vyukov <dvyukov at google.com> wrote:
> > Hi Dmitry,
> >
> > > Where can I see the actual changes? Preferably side-by-side diffs with
> > > full context. Are they (or can they be) uploaded somewhere on
> > > github/gerrit?
> >
> > Here's the link to the first patch of the set on our Gerrit
> > <https://review.lttng.org/c/userspace-rcu/+/9737>.
> >
> > > Are there any remaining open questions?
> >
> > On the Gerrit no.  But I would add this for the known issues:
> >
> > We have a regression test for forking.  We get the following when
> > running it:
> >
> > ==23432==ThreadSanitizer: starting new threads after multi-threaded fork is not supported. Dying (set die_after_fork=0 to override)
>
> > With TSAN_OPTIONS=die_after_fork=0, here are the results for GCC and
> > Clang.
> >
> > * gcc 11.3.0
> >   ==25266==ThreadSanitizer: dup thread with used id 0x7fd40dafe600
> >
> >   Looks like this was fixed with recent merged of TSAN in gcc 13.
> >
> > * clang 14.0.6
>
> Hi Olivier,
>
> Forking under tsan is a bit tricky. But I see we now take a number of
> internal mutexes around  fork (but I think still not all of them), so
> maybe it's not that bad. If  TSAN_OPTIONS=die_after_fork=0 works
> reasonably reliably for you, then export it for testing.
>
> Older compilers are missing a number of bug fixes.
> So if you can restrict testing to newer compilers only, that's the way to go.
>
>
> > Tests pass but are very slow.  This seems to be because we're calling
> > exit(3) in the childs.  Changing it to _exit(2) solves the issue.  The
> > only thing I can think of is that exit(3) is not async-signal-safe like
> > _exit(2), yet we do other none-async safe calls like malloc(3) and
> > free(3) in the childs.
>
> By default tsan sleeps for 1 second at exit, since exit sequence is a
> common source of races. Perhaps it's just these sleeps. Try
> TSAN_OPTIONS=atexit_sleep_ms=0 (or maybe =50).
>
>
> > Here are some perf records of that:
> >
> > With exit(3)
> > ```
> >   12.35%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ebd22b
> >    8.69%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ebd8fa
> >    7.50%  test_urcu_fork.  [unknown]                [k] 0xffffffff8a001360
> >    6.11%  test_urcu_fork.  test_urcu_fork.tap       [.] __sanitizer::internal_memset
> >    5.85%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan::ForkChildAfter
> >    1.96%  test_urcu_fork.  [unknown]                [k] 0xffffffff892f8efd
> >    1.88%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ec8e2c
> >    1.29%  test_urcu_fork.  [unknown]                [k] 0xffffffff890a99d1
> >    0.79%  test_urcu_fork.  [unknown]                [k] 0xffffffff8918c566
> >    0.75%  test_urcu_fork.  test_urcu_fork.tap       [.] __sanitizer::ThreadRegistry::OnFork
> >    0.71%  test_urcu_fork.  [unknown]                [k] 0xffffffff89349574
> >    0.64%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ee424f
> >    0.57%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan::MetaMap::AllocBlock
> >    0.55%  test_urcu_fork.  [unknown]                [k] 0xffffffff89367249
> >    0.53%  test_urcu_fork.  test_urcu_fork.tap       [.] __tsan_read8
> >    0.51%  test_urcu_fork.  [unknown]                [k] 0xffffffff89ec8e05
> > ```
> >
> > With _exit(2)
> > ```
> >   12.26%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ebd8fa
> >    9.51%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ebd22b
> >    6.78%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::internal_memset
> >    6.49%  test_urcu_fork.  [unknown]             [k] 0xffffffff8a001360
> >    4.92%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::ForkChildAfter
> >    2.92%  test_urcu_fork.  [unknown]             [k] 0xffffffff892f8efd
> >    2.19%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ec8e2c
> >    1.88%  test_urcu_fork.  [unknown]             [k] 0xffffffff890a99d1
> >    0.83%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::MetaMap::AllocBlock
> >    0.72%  test_urcu_fork.  [unknown]             [k] 0xffffffff892f8f1c
> >    0.67%  test_urcu_fork.  [unknown]             [k] 0xffffffff8918c566
> >    0.65%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ec16dd
> >    0.62%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator32<__sanitizer::AP32>, __sanitizer::LargeMmapAllocatorPtrArrayStatic>::Allocate
> >    0.61%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_write4
> >    0.60%  test_urcu_fork.  [unknown]             [k] 0xffffffff89367249
> >    0.59%  test_urcu_fork.  [unknown]             [k] 0xffffffff89ee424f
> >    0.50%  test_urcu_fork.  test_urcu_fork.tap    [.] __sanitizer::ThreadRegistry::OnFork
> >    0.50%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_write8
> >    0.47%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan::ForkBefore
> >    0.45%  test_urcu_fork.  test_urcu_fork.tap    [.] __tsan_func_entry
> > ```
> >
> > > Is there CI coverage with -fsanitize=thread?
> >
> > Since the urcu-signal flavor deadlocks with TSAN (see reproducer in
> > commit log), the CI simply timeouts.  I do however have a job with TSAN
> > as an matrix axis here <https://ci.lttng.org/job/dev_odion_liburcu/>.
>
> Sounds good.


More information about the lttng-dev mailing list