[lttng-dev] What is the effect of not preloading lttng-ust-fork?

Jonathan Rajotte Julien Jonathan.rajotte-julien at efficios.com
Wed Apr 26 16:05:34 UTC 2017


Hi,

On 2017-04-23 09:56 PM, Douglas Graham wrote:
> Hi,
> 
>  
> 
> We have an init-like top-level application that starts a bunch of other daemons. This top-level daemon is started with LD_PRELOAD= /usr/lib/liblttng-ust-fork.so.0 so all the daemons that it starts inherit that.  Sometimes we want to debug one of these child applications, so we arrange that instead of starting it directly, the top-level daemon starts it like so: “gdbserver :23 <path/to/real/daemon>”.  This arranges that the child daemon not actually run until a remote debugger attaches.  This scheme seemed to work reasonably well until recently.  It still works maybe half the time, but the other half, gdbserver crashes in lttng related code.  The backtrace is below.  As a quick fix, I have tried starting gdbserver and the child application like so: “/usr/bin/env -u LD_PRELOAD gdbserver :23 <path/to/real/daemon>”.  In the limited testing I’ve done so far, this is looking good, but I’m wondering what other problems might turn up if I don’t preload liblttng-ust-fork into
> gdbserver and the application it is debugging.  Gdbserver does fork after all, and the application might as well.  Does this seem like a reasonable long-term solution for using the remote debugger?
> 
>  
> 
> Here’s the crash backtrace:
> 
>  
> 
> (gdb) bt
> 
> #0  0xb6e8c810 in __GI_raise (sig=sig at entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
> 
> #1  0xb6e9028c in __GI_abort () at abort.c:89
> 
> #2  0xb6e08198 in lttng_ust_elf_destroy (elf=0x2, elf at entry=0xb6f9a088 <lock>)
> 
>     at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-elf.c:342
> 
> #3  0xb6e08bec in get_elf_info (bin_data=0xb6541308) at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-statedump.c:165
> 
> #4  trace_baddr (bin_data=0xb6541308) at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-statedump.c:175
> 
> #5  extract_bin_info_events (info=info at entry=0xb6542360, size=size at entry=40, _data=_data at entry=0xb65423b8)
> 
>     at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-statedump.c:298
> 
> #6  0xb6f65a78 in __GI___dl_iterate_phdr (callback=0xb6e08a34 <extract_bin_info_events>, data=data at entry=0xb65423b8) at dl-iteratephdr.c:76
> 
> #7  0xb6e09c9c in do_baddr_statedump (owner=0xb6e5a618 <global_apps>)
> 
>     at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-statedump.c:332
> 
> #8  do_lttng_ust_statedump (owner=owner at entry=0xb6e5a618 <global_apps>)
> 
>     at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-statedump.c:355
> 
> #9  0xb6e007dc in lttng_handle_pending_statedump (owner=owner at entry=0xb6e5a618 <global_apps>)
> 
>     at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-events.c:857
> 
> #10 0xb6df864c in handle_pending_statedump (sock_info=0xb6e5a618 <global_apps>)
> 
>     at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-comm.c:599
> 
> #11 handle_message (lum=0xb654286c, sock=<optimized out>, sock_info=<optimized out>)
> 
>     at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-comm.c:957
> 
> #12 ust_listener_thread (arg=0xb6e5a618 <global_apps>) at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-comm.c:1520
> 
> #13 0xb6d49f6c in start_thread (arg=0xb6543420) at pthread_create.c:339
> 
>  
> 
> I am not able to easily examine the data in lttng_ust_elf_destroy() because as you can see from the argument list to that function, gdb is confused about the value of the “elf” pointer (I find that gdb often gets confused on ARM when unwinding from a call to abort()).  But I can see that it aborted  because a close() failed:
> 
>  
> 
>         if (close(elf->fd)) {
> 
>                 abort();
> 
>         }
> 
>  
> 
> The two other threads in the gdbserver application were here at the time of the crash:
> 
>  
> 
> #0  syscall () at ../sysdeps/unix/sysv/linux/arm/syscall.S:37
> 
> #1  0xb6df7cd0 in futex (val3=0, uaddr2=0x0, timeout=0x0, val=0, op=0, uaddr=<optimized out>)
> 
>     at /repo/ngrbuilder/jenkins/ProductionBuild_0/xcs-build/build/xenon/bitbake_build/tmp/sysroots/xenon/usr/include/urcu/futex.h:65
> 
> #2  futex_async (op=0, val=0, timeout=0x0, uaddr2=0x0, val3=0, uaddr=<optimized out>)
> 
>     at /repo/ngrbuilder/jenkins/ProductionBuild_0/xcs-build/build/xenon/bitbake_build/tmp/sysroots/xenon/usr/include/urcu/futex.h:97
> 
> #3  wait_for_sessiond (sock_info=<optimized out>) at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-comm.c:1276
> 
> #4  ust_listener_thread (arg=0xb6e585ec <local_apps>) at /usr/src/debug/lttng-ust/2.8.2-r3.eriR0/git/liblttng-ust/lttng-ust-comm.c:1330
> 
> #5  0xb6d49f6c in start_thread (arg=0xb6d43420) at pthread_create.c:339
> 
>  
> 
> #0  0xb6efd7b8 in execve () at ../sysdeps/unix/syscall-template.S:84
> 
> #1  0xb6efd8c8 in execv (path=<optimized out>, argv=<optimized out>) at execv.c:25
> 
> #2  0x00035b40 in linux_create_inferior (program=0x6bc80 "/opt/radiosw/bin/ngr2mpw.elf", allargs=0x6bc70)


I was not able to reproduce locally the problem so testing the following and reporting to us the result would be very helpful.

After some digging, linux_create_inferior seems to perform bulk closing after the fork.

    linux_create_inferior -> close_most_fds -> fdwalk with do_close.
	
lttng-ust live inside the process hence gdbserver could close the fds used by lttng-ust. Since lttng-ust can close fd already closed it would results in a double close.

I would suspect this is the primary source of problem. You will need to ld_preload the liblttng-ust-fd library available with lttng-ust 2.9+ to prevent this kind of problem.

The commit introducing liblttng-ust-fd:

commit 95e6d2685eec83a144f533a447ffd355fadf9d79
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date:   Mon Sep 12 12:04:13 2016 -0400

    Implement liblttng-ust-fd
    
    This library overrides close() and closeall() libc functions, and uses
    lttng_ust_safe_close_fd() to check whether the application can
    interact with the file descriptor or if it should be left to lttng-ust.
    
    This takes care of bugs caused by applications doing bulk close() or
    closefrom() of file descriptors soon after forking.
    
    Fixes: #253
    Fixes: #626
    
    CC: Aravind HT <aravind.ht at gmail.com>
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>


It looks like you are using lttng-ust 2.8. You can either upgrade and use liblttng-ust-fd;liblttng-ust-fork or use the wrapper functionality of gdbserver. e.g.:

/usr/bin/env -u LD_PRELOAD gdbserver --wrapper env LD_PRELOAD=<path/to/liblttng-ust-fork> -- :23 <path/to/real/daemon>

This should prevent the LD_PRELOAD for gdbserver and will LD_PRELOAD liblttng-ust-fork for the daemon.

Cheers

>     at /usr/src/debug/gdb/7.10.1-r0/gdb-7.10.1/gdb/gdbserver/linux-low.c:889
> 
> #3  0x00018e30 in start_inferior (argv=0x6bc70) at /usr/src/debug/gdb/7.10.1-r0/gdb-7.10.1/gdb/gdbserver/server.c:233
> 
> #4  0x0001aa58 in captured_main (argc=argc at entry=-1225151608, argv=argv at entry=0x3fda8 <__libc_csu_init+32>)
> 
>     at /usr/src/debug/gdb/7.10.1-r0/gdb-7.10.1/gdb/gdbserver/server.c:3516
> 
> #5  0x0001250c in main (argc=-1225151608, argv=0x3fda8 <__libc_csu_init+32>)
> 
>  
> 
> Thanks,
> 
> Doug
> 
>  
> 
>  
> 
>  
> 
> 
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> 

-- 
Jonathan R. Julien
Efficios


More information about the lttng-dev mailing list