[lttng-dev] New TLS usage in libgcc_s.so.1, compatibility impact
Adhemerval Zanella Netto
adhemerval.zanella at linaro.org
Mon Jan 15 08:55:06 EST 2024
On 15/01/24 09:46, Szabolcs Nagy wrote:
> The 01/13/2024 13:49, Florian Weimer wrote:
>> This commit
>>
>> commit 8abddb187b33480d8827f44ec655f45734a1749d
>> Author: Andrew Burgess <andrew.burgess at embecosm.com>
>> Date: Sat Aug 5 14:31:06 2023 +0200
>>
>> libgcc: support heap-based trampolines
>>
>> Add support for heap-based trampolines on x86_64-linux, aarch64-linux,
>> and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and
>> __builtin_nested_func_ptr_deleted functions for these targets.
>>
>> Co-Authored-By: Maxim Blinov <maxim.blinov at embecosm.com>
>> Co-Authored-By: Iain Sandoe <iain at sandoe.co.uk>
>> Co-Authored-By: Francois-Xavier Coudert <fxcoudert at gcc.gnu.org>
>>
>> added TLS usage to libgcc_s.so.1. The way that libgcc_s is currently
>> built, it ends up using a dynamic TLS variant on the Linux targets.
>> This means that there is no up-front TLS allocation with glibc (but
>> there would be one with musl).
>>
>> There is still a compatibility impact because glibc assigns a TLS module
>> ID upfront. This seems to be what causes the
>> ust/libc-wrapper/test_libc-wrapper test in lttng-tools to fail. We end
>> up with an infinite regress during process termination because
>> libgcc_s.so.1 has been loaded, resulting in a DTV update. When this
>> happens, the bottom of the stack looks like this:
>>
>> #4447 0x00007ffff7f288f0 in free () from /lib64/liblttng-ust-libc-wrapper.so.1
>> #4448 0x00007ffff7fdb142 in free (ptr=<optimized out>)
>> at ../include/rtld-malloc.h:50
>> #4449 _dl_update_slotinfo (req_modid=3, new_gen=2) at ../elf/dl-tls.c:822
>> #4450 0x00007ffff7fdb214 in update_get_addr (ti=0x7ffff7f2bfc0,
>> gen=<optimized out>) at ../elf/dl-tls.c:916
>> #4451 0x00007ffff7fddccc in __tls_get_addr ()
>> at ../sysdeps/x86_64/tls_get_addr.S:55
>> #4452 0x00007ffff7f288f0 in free () from /lib64/liblttng-ust-libc-wrapper.so.1
>> #4453 0x00007ffff7fdb142 in free (ptr=<optimized out>)
>> at ../include/rtld-malloc.h:50
>> #4454 _dl_update_slotinfo (req_modid=2, new_gen=2) at ../elf/dl-tls.c:822
>> #4455 0x00007ffff7fdb214 in update_get_addr (ti=0x7ffff7f39fa0,
>> gen=<optimized out>) at ../elf/dl-tls.c:916
>> #4456 0x00007ffff7fddccc in __tls_get_addr ()
>> at ../sysdeps/x86_64/tls_get_addr.S:55
>> #4457 0x00007ffff7f36113 in lttng_ust_cancelstate_disable_push ()
>> from /lib64/liblttng-ust-common.so.1
>> #4458 0x00007ffff7f4c2e8 in ust_lock_nocheck () from /lib64/liblttng-ust.so.1
>> #4459 0x00007ffff7f5175a in lttng_ust_cleanup () from /lib64/liblttng-ust.so.1
>> #4460 0x00007ffff7fca0f2 in _dl_call_fini (
>> closure_map=closure_map at entry=0x7ffff7fbe000) at dl-call_fini.c:43
>> #4461 0x00007ffff7fce06e in _dl_fini () at dl-fini.c:114
>> #4462 0x00007ffff7d82fe6 in __run_exit_handlers () from /lib64/libc.so.6
>>
>> Cc:ing <lttng-dev at lists.lttng.org> for awareness.
>>
>> The issue also requires a recent glibc with changes to DTV management:
>> commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls
>> access after dlopen [BZ #19924]"). If I understand things correctly,
>> before this glibc change, we didn't deallocate the old DTV, so there was
>> no call to the free function.
>
> with 19924 fixed, after a dlopen or dlclose every thread updates
> its dtv on the next dynamic tls access.
>
> before that, dtv was only updated up to the generation of the
> module being accessed for a particular tls access.
>
> so hitting the free in the dtv update path is now more likely
> but the free is not new, it was there before.
>
> also note that this is unlikely to happen on aarch64 since
> tlsdesc only does dynamic tls access after a 512byte static
> tls reservation runs out.
>
>>
>> On the glibc side, we should recommend that intercepting mallocs and its
>> dependencies use initial-exec TLS because that kind of TLS does not use
>> malloc. If intercepting mallocs using dynamic TLS work at all, that's
>> totally by accident, and was in the past helped by glibc bug 19924. (I
>
> right.
>
>> don't think there is anything special about libgcc_s.so.1 that triggers
>> the test failure above, it is just an object with dynamic TLS that is
>> implicitly loaded via dlopen at the right stage of the test.) In this
>> particular case, we can also paper over the test failure in glibc by not
>> call free at all because the argument is a null pointer:
>>
>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
>> index 7b3dd9ab60..14c71cbd06 100644
>> --- a/elf/dl-tls.c
>> +++ b/elf/dl-tls.c
>> @@ -819,7 +819,8 @@ _dl_update_slotinfo (unsigned long int req_modid, size_t new_gen)
>> dtv entry free it. Note: this is not AS-safe. */
>> /* XXX Ideally we will at some point create a memory
>> pool. */
>> - free (dtv[modid].pointer.to_free);
>> + if (dtv[modid].pointer.to_free != NULL)
>> + free (dtv[modid].pointer.to_free);
>> dtv[modid].pointer.val = TLS_DTV_UNALLOCATED;
>> dtv[modid].pointer.to_free = NULL;
>
> can be done, but !=NULL is more likely since we do modid reuse
> after dlclose.
>
> there is also a realloc in dtv resizing which happens when more
> than 16 modules with tls are loaded after thread creation
> (DTV_SURPLUS).
>
> i'm not sure if it's worth supporting malloc interposers that
> only work sometimes.
>
Maybe one option would to try reinstate the async-signal-safe TLS
code to avoid malloc/free in dynamic TLS altogether. We revert it on
2.14 release cause it broke ASAN/LSAN [1], but I think we might try
to reinstate on 2.40 and work with sanitizer project to get this sort
out.
[1] https://sourceware.org/pipermail/libc-alpha/2014-January/047931.html
More information about the lttng-dev
mailing list