[lttng-dev] [lttng-tools 2/2] Fix: relayd: revents may be 0 when using poll()

Jérémie Galarneau jeremie.galarneau at efficios.com
Thu Nov 27 21:21:38 EST 2014


I applied both patches and the
tests/regression/tools/streaming/test_ust seems to be hanging when it
tries to enable events.
Make sure you build with --disable-epoll.

Here's the session daemon's (annotated) backtrace, threads 8 and 9 are
relevant here:

Thread 11 (Thread 0x7fd301bbb700 (LWP 20906)):
#0  0x00007fd3020cb7bd in poll () from /usr/lib/libc.so.6
#1  0x000000000045e40b in compat_poll_wait (events=0x7fd301bbabd0, timeout=-1)
    at compat-poll.c:279
#2  0x000000000042052a in thread_ht_cleanup (data=0x0) at ht-cleanup.c:70
#3  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 10 (Thread 0x7fd3013ba700 (LWP 20908)):
#0  0x00007fd3020cb7bd in poll () from /usr/lib/libc.so.6
#1  0x000000000045e40b in compat_poll_wait (events=0x7fd3013b9ba0, timeout=-1)
    at compat-poll.c:279
#2  0x0000000000447f77 in thread_manage_health (data=0x0) at main.c:3875
#3  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 9 (Thread 0x7fd300bb9700 (LWP 20909)):
*** Okay, we're waiting for the consumer's status... Taking an awfully
long time... ***
#0  0x00007fd3025b478d in recvmsg () from /usr/lib/libpthread.so.0
#1  0x000000000044e471 in lttcomm_recv_unix_sock (sock=22,
buf=0x7fd300bb6f70, len=4)
    at unix.c:176
#2  0x0000000000410d4e in consumer_socket_recv (socket=0x7fd2dc004940,
msg=0x7fd300bb6f70,
    len=4) at consumer.c:106
#3  0x0000000000410e59 in consumer_recv_status_reply
(sock=0x7fd2dc004940) at consumer.c:141
#4  0x000000000041259a in consumer_send_msg (sock=0x7fd2dc004940,
msg=0x7fd300bb6fe0)
    at consumer.c:752
#5  0x0000000000439fd3 in ust_consumer_get_channel (socket=0x7fd2dc004940,
    ua_chan=0x7fd2dc0139c0) at ust-consumer.c:262
#6  0x00000000004317ff in do_consumer_create_channel (usess=0x7fd2dc003cd0,
    ua_sess=0x7fd2dc0062b0, ua_chan=0x7fd2dc0139c0, bitness=64,
registry=0x7fd2dc004d50)
    at ust-app.c:2200
#7  0x000000000043243d in create_channel_per_uid (app=0x7fd2d80008e0,
usess=0x7fd2dc003cd0,
    ua_sess=0x7fd2dc0062b0, ua_chan=0x7fd2dc0139c0) at ust-app.c:2534
#8  0x000000000043280a in do_create_channel (app=0x7fd2d80008e0,
usess=0x7fd2dc003cd0,
    ua_sess=0x7fd2dc0062b0, ua_chan=0x7fd2dc0139c0) at ust-app.c:2654
#9  0x00000000004329ba in create_ust_app_channel (ua_sess=0x7fd2dc0062b0,
    uchan=0x7fd2dc005610, app=0x7fd2d80008e0, type=LTTNG_UST_CHAN_PER_CPU,
    usess=0x7fd2dc003cd0, ua_chanp=0x0) at ust-app.c:2727
#10 0x0000000000435560 in ust_app_create_channel_glb (usess=0x7fd2dc003cd0,
    uchan=0x7fd2dc005610) at ust-app.c:3668
#11 0x000000000040e0a5 in channel_ust_create (usess=0x7fd2dc003cd0,
attr=0x7fd2dc0053a0,
    type=LTTNG_BUFFER_PER_UID) at channel.c:362
#12 0x00000000004199ba in cmd_enable_channel (session=0x7fd2dc007d00,
domain=0x7fd2dc001a5c,
    attr=0x7fd2dc0053a0, wpipe=-1) at cmd.c:1011
#13 0x000000000041a72c in cmd_enable_event (session=0x7fd2dc007d00,
domain=0x7fd2dc001a5c,
    channel_name=0x7fd2dc001b90 "", event=0x7fd2dc001c90,
filter_expression=0x0, filter=0x0,
    exclusion=0x0, wpipe=-1) at cmd.c:1527
#14 0x0000000000446668 in process_client_msg (cmd_ctx=0x7fd2dc000900, sock=20,
    sock_error=0x7fd300bb8bcc) at main.c:3266
#15 0x0000000000448c96 in thread_manage_clients (data=0x0) at main.c:4152
#16 0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#17 0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 8 (Thread 0x7fd3003b8700 (LWP 20910)):

#0  0x00007fd3025b411c in __lll_lock_wait () from /usr/lib/libpthread.so.0
^
*** Thread 9 is taking this lock in ust_app_create_channel_glb() ***

#1  0x00007fd3025ae895 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
#2  0x000000000041405c in session_lock_list () at session.c:128
#3  0x0000000000442a85 in thread_dispatch_ust_registration (data=0x0)
at main.c:1845
#4  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#5  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 7 (Thread 0x7fd2ffbb7700 (LWP 20911)):
#0  0x00007fd3020cb7bd in poll () from /usr/lib/libc.so.6
#1  0x000000000045e40b in compat_poll_wait (events=0x7fd2ffbb6b90, timeout=-1)
    at compat-poll.c:279
#2  0x0000000000442e9c in thread_registration_apps (data=0x0) at main.c:1991
#3  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 6 (Thread 0x7fd2ff3b6700 (LWP 20912)):
#0  0x00007fd3020cb7bd in poll () from /usr/lib/libc.so.6
#1  0x000000000045e40b in compat_poll_wait (events=0x7fd2ff3b5bd0, timeout=-1)
    at compat-poll.c:279
#2  0x0000000000441b57 in thread_manage_apps (data=0x0) at main.c:1469
#3  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 5 (Thread 0x7fd2febb5700 (LWP 20914)):
#0  0x00007fd3020cb7bd in poll () from /usr/lib/libc.so.6
#1  0x000000000045e40b in compat_poll_wait (events=0x7fd2febb4bc0, timeout=-1)
    at compat-poll.c:279
#2  0x000000000043b169 in ust_thread_manage_notify (data=0x0) at ust-thread.c:74
#3  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 4 (Thread 0x7fd2fe3b4700 (LWP 20915)):
#0  0x00007fd3020cb7bd in poll () from /usr/lib/libc.so.6
#1  0x000000000045e40b in compat_poll_wait (events=0x7fd2fe3b3bd0, timeout=-1)
    at compat-poll.c:279
#2  0x000000000043d376 in agent_thread_manage_registration (data=0x0)
at agent-thread.c:315
#3  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 3 (Thread 0x7fd2fdbb3700 (LWP 20962)):
#0  0x00007fd3020cb7bd in poll () from /usr/lib/libc.so.6
#1  0x000000000045e40b in compat_poll_wait (events=0x7fd2fdbb2b90, timeout=-1)
    at compat-poll.c:279
#2  0x0000000000441385 in thread_manage_consumer (data=0x68fcc0
<ustconsumer64_data>)
    at main.c:1293
#3  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 2 (Thread 0x7fd2fd3b2700 (LWP 20970)):
#0  0x00007fd3020cb7bd in poll () from /usr/lib/libc.so.6
#1  0x000000000045e40b in compat_poll_wait (events=0x7fd2fd3b1b90, timeout=-1)
    at compat-poll.c:279
#2  0x0000000000441385 in thread_manage_consumer (data=0x691e00
<ustconsumer32_data>)
    at main.c:1293
#3  0x00007fd3025ac314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fd3020d45bd in clone () from /usr/lib/libc.so.6

Thread 1 (Thread 0x7fd3046dca00 (LWP 20905)):
#0  0x00007fd3025ad505 in pthread_join () from /usr/lib/libpthread.so.0
#1  0x000000000044cbd3 in main (argc=4, argv=0x7fff0898e418) at main.c:5489



Basically, the session daemon seems to be waiting on the consumer's
reply... Let's look at that side of the equation.

Thread 7 (Thread 0x7fe85c22e700 (LWP 20971)):
#0  0x00007fe85c5317bd in poll () from /usr/lib/libc.so.6
#1  0x0000000000424e40 in compat_poll_wait (events=0x7fe85c22db90, timeout=-1)
    at compat-poll.c:279
#2  0x0000000000404d4c in thread_manage_health (data=0x0) at
health-consumerd.c:246
#3  0x00007fe85ca12314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fe85c53a5bd in clone () from /usr/lib/libc.so.6

Thread 6 (Thread 0x7fe85ba2d700 (LWP 20972)):
#0  0x00007fe85c5317bd in poll () from /usr/lib/libc.so.6
#1  0x0000000000424e40 in compat_poll_wait (events=0x7fe85ba2cb50, timeout=-1)
    at compat-poll.c:279
#2  0x000000000040d220 in consumer_thread_channel_poll (data=0xbf3010)
at consumer.c:2765
#3  0x00007fe85ca12314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fe85c53a5bd in clone () from /usr/lib/libc.so.6

Thread 5 (Thread 0x7fe85b22c700 (LWP 20973)):
#0  0x00007fe85c5317bd in poll () from /usr/lib/libc.so.6
#1  0x0000000000424e40 in compat_poll_wait (events=0x7fe85b22bb90, timeout=-1)
    at compat-poll.c:279
#2  0x000000000040b118 in consumer_thread_metadata_poll
(data=0xbf3010) at consumer.c:2195
#3  0x00007fe85ca12314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fe85c53a5bd in clone () from /usr/lib/libc.so.6

Thread 4 (Thread 0x7fe85aa2b700 (LWP 20975)):
#0  0x00007fe85c5317bd in poll () from /usr/lib/libc.so.6
#1  0x000000000040be2d in consumer_thread_data_poll (data=0xbf3010) at
consumer.c:2436
#2  0x00007fe85ca12314 in start_thread () from /usr/lib/libpthread.so.0
#3  0x00007fe85c53a5bd in clone () from /usr/lib/libc.so.6

Thread 3 (Thread 0x7fe85a22a700 (LWP 20976)):
#0  0x00007fe85c5317bd in poll () from /usr/lib/libc.so.6
#1  0x00000000004080c4 in lttng_consumer_poll_socket
(consumer_sockpoll=0x7fe85a229be0)
    at consumer.c:1111
#2  0x000000000040e25b in consumer_thread_sessiond_poll
(data=0xbf3010) at consumer.c:3067
#3  0x00007fe85ca12314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fe85c53a5bd in clone () from /usr/lib/libc.so.6

Thread 2 (Thread 0x7fe859a29700 (LWP 20977)):
#0  0x00007fe85c48691f in do_sigwaitinfo () from /usr/lib/libc.so.6
#1  0x00007fe85c4869bd in sigwaitinfo () from /usr/lib/libc.so.6
#2  0x000000000041153d in consumer_timer_thread (data=0xbf3010) at
consumer-timer.c:504
#3  0x00007fe85ca12314 in start_thread () from /usr/lib/libpthread.so.0
#4  0x00007fe85c53a5bd in clone () from /usr/lib/libc.so.6

Thread 1 (Thread 0x7fe85e8fda00 (LWP 20969)):
#0  0x00007fe85ca13505 in pthread_join () from /usr/lib/libpthread.so.0
#1  0x000000000040445f in main (argc=9, argv=0x7fff7993e848) at
lttng-consumerd.c:486

I'm thinking your patch is only missing a callsite which is causing
the consumer to not pick-up on the socket's POLLIN event.

I'm looking into this, but wanted to keep you in the loop.

Regards,
Jérémie

On Wed, Nov 26, 2014 at 4:29 PM, Philippe Proulx
<eeppeliteloop at gmail.com> wrote:
> Refs: #747
>
> Signed-off-by: Philippe Proulx <eeppeliteloop at gmail.com>
> ---
>  src/bin/lttng-relayd/main.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/bin/lttng-relayd/main.c b/src/bin/lttng-relayd/main.c
> index 00b8735..07cb8f5 100644
> --- a/src/bin/lttng-relayd/main.c
> +++ b/src/bin/lttng-relayd/main.c
> @@ -2499,6 +2499,11 @@ restart:
>
>                         health_code_update();
>
> +                       /* Do not waste time if no returned events for this fd */
> +                       if (!revents) {
> +                               continue;
> +                       }
> +
>                         /* Thread quit pipe has been closed. Killing thread. */
>                         ret = check_thread_quit_pipe(pollfd, revents);
>                         if (ret) {
> --
> 2.1.3
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev



-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com



More information about the lttng-dev mailing list