[lttng-dev] lttng upgrade to 2.8.1, need help.

Aravind HT aravind.ht at gmail.com
Tue Nov 15 19:14:29 UTC 2016


Hi,


I was trying to upgrade to lttng-2.8.1 on my yocto build ( using
http://cgit.openembedded.org/cgit.cgi/openembedded-core/tree/meta/recipes-kernel/lttng/lttng-tools_git.bb?h=master)
and I am running into two sets of problems
1) "lttng stop <session-name>"command hanging/waiting for ever. ( Focus of
this mail )
2) hash-table related crashes being seen.( asserts, segmentation faults ) (
Not focussing on this in this mail. Will try to follow up with the details
for this. )



LTTNG HANGING :

With both versions of liburcu 0.8.6 and 0.9.2 (
http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-support/liburcu/liburcu_0.9.2.bb)
I see this hang. I started out with 0.8.6 and saw issues and tried with
 0.9.2 but still the same.

1) 'lttng stop <session-name>' command just wait for data. I think the
hang has already happened and its just that we discover it at a user
level via this command.

A couple of markers to see that consumerd is hanging are :-

a. gdb'ing into consumerd ( 32bit and 64 bit, which ever as per the
above ) will show its doing a recvmsg() without returning

Thread 5 (Thread 0x7f43e27c5700 (LWP 3024)):
#0  0x00007f43e90b0b6d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000042d2e7 in lttng_recvmsg_nosigpipe (msg=<optimized
out>, sockfd=<optimized out>) at ../../src/common/compat/socket.h:36
#2  lttcomm_recv_unix_sock (sock=26, buf=0x7f43e27c15c0, len=12668) at
unix.c:186
#3  0x00000000004231d1 in lttng_ustconsumer_request_metadata
(ctx=0x64f030, channel=0x7f43d00d5e90, timer=16384, timer at entry=0,
wait=0) at ust-consumer.c:2855
#4  0x000000000042655b in lttng_ustconsumer_sync_metadata
(ctx=0x64f030, metadata=0x7f43d00d9630) at ust-consumer.c:2275
#5  0x0000000000413350 in do_sync_metadata (ctx=<optimized out>,
metadata=<optimized out>) at consumer-stream.c:462
#6  consumer_stream_sync_metadata (ctx=0x64f030, session_id=0) at
consumer-stream.c:539
#7  0x0000000000422101 in lttng_ustconsumer_read_subbuffer
(stream=0x7f43d0088a90, ctx=0x64f030) at ust-consumer.c:2573

b. Launch a new app such that its part of an active session during this period.

The backend directory for this app will get created ( on the node on
which lttng_relayd is collecting traces <live-streaming> ), but the
directory itself will be empty (trace files of 0 bytes) and does not
show any traces.


2) The hang is partial where parts of sesssiond and consumerd are
found working ( ????, many threads in these components are working
while some are hung ).
1.b-above shows that its a partial hang. Some parts of tracing (
registration ) are working fine, while the flushing/clearing of traces
from subbuffers is not happening.


I have tried upgrading urcu libs, 0.8.6.x to 0.9.2.y and it doesnt seem to help.


3) The health-check feature is not picking this up immediately.

Complete stack traces for sessiond and consumerd below, not sure who
is hanging to whose neck :-)

####################################### COMPLETE STACK TRACES
##########################################

#### CONSUMERD-64 threads ####
Thread 8 (Thread 0x7f43e88f1700 (LWP 3021)):
#0  0x00007f43e8de6733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000041f7c9 in compat_epoll_wait (events=0x7f43e88f0bc0,
timeout=-1) at compat-epoll.c:233
#2  0x00000000004056cd in thread_manage_health (data=0x14) at
health-consumerd.c:244
#3  0x00007f43e90a8294 in start_thread (arg=0x7f43e88f1700) at
pthread_create.c:336
#4  0x00007f43e8de613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7f43e37c7700 (LWP 3022)):
#0  0x00007f43e8de6733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000041f7c9 in compat_epoll_wait (events=0x7f43e37c6bb0,
timeout=-1) at compat-epoll.c:233
#2  0x000000000040d8af in consumer_thread_channel_poll (data=0x16) at
consumer.c:2855
#3  0x00007f43e90a8294 in start_thread (arg=0x7f43e37c7700) at
pthread_create.c:336
#4  0x00007f43e8de613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7f43e2fc6700 (LWP 3023)):
#0  0x00007f43e8de6733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000041f7c9 in compat_epoll_wait (events=0x7f43e2fc5bb0,
timeout=-1) at compat-epoll.c:233
#2  0x000000000040b4ae in consumer_thread_metadata_poll
(data=0x64f030) at consumer.c:2272
#3  0x00007f43e90a8294 in start_thread (arg=0x7f43e2fc6700) at
pthread_create.c:336
#4  0x00007f43e8de613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7f43e27c5700 (LWP 3024)):
#0  0x00007f43e90b0b6d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000042d2e7 in lttng_recvmsg_nosigpipe (msg=<optimized
out>, sockfd=<optimized out>) at ../../src/common/compat/socket.h:36
#2  lttcomm_recv_unix_sock (sock=26, buf=0x7f43e27c15c0, len=12668) at
unix.c:186
#3  0x00000000004231d1 in lttng_ustconsumer_request_metadata
(ctx=0x64f030, channel=0x7f43d00d5e90, timer=16384, timer at entry=0,
wait=0) at ust-consumer.c:2855
#4  0x000000000042655b in lttng_ustconsumer_sync_metadata
(ctx=0x64f030, metadata=0x7f43d00d9630) at ust-consumer.c:2275
#5  0x0000000000413350 in do_sync_metadata (ctx=<optimized out>,
metadata=<optimized out>) at consumer-stream.c:462
#6  consumer_stream_sync_metadata (ctx=0x64f030, session_id=0) at
consumer-stream.c:539
#7  0x0000000000422101 in lttng_ustconsumer_read_subbuffer
(stream=0x7f43d0088a90, ctx=0x64f030) at ust-consumer.c:2573
#8  0x000000000040f75c in lttng_consumer_read_subbuffer
(stream=0x7f43d0088a90, ctx=0x64f030) at consumer.c:3271
#9  0x000000000040c765 in consumer_thread_data_poll (data=0x1a) at
consumer.c:2637
#10 0x00007f43e90a8294 in start_thread (arg=0x7f43e27c5700) at
pthread_create.c:336
#11 0x00007f43e8de613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7f43dbfff700 (LWP 3025)):
#0  0x00007f43e8ddabcd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000408992 in lttng_consumer_poll_socket
(consumer_sockpoll=0x7f43dbffebc0) at consumer.c:1134
#2  0x000000000040f23b in consumer_thread_sessiond_poll
(data=0x64f030) at consumer.c:3173
#3  0x00007f43e90a8294 in start_thread (arg=0x7f43dbfff700) at
pthread_create.c:336
#4  0x00007f43e8de613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7f43e1fc4700 (LWP 3026)):
#0  0x00007f43e8d3219f in do_sigwaitinfo (set=<optimized out>,
set at entry=0x7f43e1fc3ad0, info=info at entry=0x7f43e1fc3b50) at
../sysdeps/unix/sysv/linux/sigwaitinfo.c:55
#1  0x00007f43e8d3223d in __GI___sigwaitinfo (set=0x7f43e1fc3ad0,
info=0x7f43e1fc3b50) at ../sysdeps/unix/sysv/linux/sigwaitinfo.c:80
#2  0x0000000000411996 in consumer_timer_thread (data=0x64f030) at
consumer-timer.c:576
#3  0x00007f43e90a8294 in start_thread (arg=0x7f43e1fc4700) at
pthread_create.c:336
#4  0x00007f43e8de613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7f43cbafd700 (LWP 8020)):
---Type <return> to continue, or q <return> to quit---
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f43e9b1aa00 in futex (val3=<optimized out>,
uaddr2=<optimized out>, timeout=<optimized out>, val=<optimized out>,
op=<optimized out>, uaddr=<optimized out>)
    at /usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu/futex.h:65
#2  futex_async (timeout=<optimized out>, uaddr2=<optimized out>,
val3=<optimized out>, val=<optimized out>, op=<optimized out>,
uaddr=<optimized out>)
    at /usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu/futex.h:97
#3  wait_gp () at /usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu.c:244
#4  wait_for_readers (input_readers=0x7f43cbafcbe0,
cur_snap_readers=0x0, qsreaders=0x7f43cbafcbf0) at
/usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu.c:327
#5  0x00007f43e9b1abd0 in synchronize_rcu_memb () at
/usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu.c:468
#6  0x00007f43e9b1b254 in call_rcu_thread (arg=0x7f43d0012bd0) at
/usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu-call-rcu-impl.h:362
#7  0x00007f43e90a8294 in start_thread (arg=0x7f43cbafd700) at
pthread_create.c:336
#8  0x00007f43e8de613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7f43e9f39980 (LWP 3017)):
#0  0x00007f43e90a945d in pthread_join (threadid=139929430522320,
thread_return=0x0) at pthread_join.c:90
#1  0x0000000000404fc8 in main (argc=0, argv=0x0) at lttng-consumerd.c:590



################# sessiond threads ########################



Thread 12 (Thread 0x7ff72976b700 (LWP 1697)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff72976ab30,
timeout=-1) at compat-epoll.c:233
#2  0x000000000042528a in thread_ht_cleanup (data=0xc) at ht-cleanup.c:71
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff72976b700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7ff723fff700 (LWP 1699)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff723ffeb40,
timeout=-1) at compat-epoll.c:233
#2  0x000000000044c0cd in thread_manage_health (data=0x1c) at main.c:4264
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff723fff700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 10 (Thread 0x7ff7237fe700 (LWP 1700)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff7237fdb30,
timeout=-1) at compat-epoll.c:233
#2  0x000000000044b235 in thread_manage_clients (data=0x17) at main.c:4443
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff7237fe700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 9 (Thread 0x7ff722ffd700 (LWP 1701)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000000000457962 in futex (val3=<optimized out>,
uaddr2=<optimized out>, timeout=<optimized out>, val=<optimized out>,
op=<optimized out>, uaddr=<optimized out>)
    at /build/home/htaravind/yocto-1.8/build/tmp/sysroots/re-64b/usr/include/urcu/futex.h:65
#2  futex_async (timeout=<optimized out>, uaddr2=<optimized out>,
val3=<optimized out>, val=<optimized out>, op=<optimized out>,
uaddr=<optimized out>)
    at /build/home/htaravind/yocto-1.8/build/tmp/sysroots/re-64b/usr/include/urcu/futex.h:97
#3  futex_nto1_wait (futex=0x69e0c0 <ust_cmd_queue>) at futex.c:91
#4  0x0000000000444d24 in thread_dispatch_ust_registration
(data=0x69e0c0 <ust_cmd_queue>) at main.c:2100
#5  0x00007ff72a84c294 in start_thread (arg=0x7ff722ffd700) at
pthread_create.c:336
#6  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 8 (Thread 0x7ff7227fc700 (LWP 1702)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff7227fbb30,
timeout=-1) at compat-epoll.c:233
#2  0x000000000044323a in thread_registration_apps (data=0x16) at main.c:2197
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff7227fc700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7ff721ffb700 (LWP 1703)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff721ffab30,
timeout=-1) at compat-epoll.c:233
#2  0x0000000000442b7a in thread_manage_apps (data=0x1a) at main.c:1638
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff721ffb700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7ff7217fa700 (LWP 1704)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff7217f9b30,
timeout=-1) at compat-epoll.c:233
#2  0x000000000043f554 in ust_thread_manage_notify (data=0x15) at
ust-thread.c:74
---Type <return> to continue, or q <return> to quit---
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff7217fa700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7ff720ff9700 (LWP 1705)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff720ff8b30,
timeout=-1) at compat-epoll.c:233
#2  0x00000000004412cd in agent_thread_manage_registration (data=0x18)
at agent-thread.c:267
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff720ff9700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7ff7207f8700 (LWP 3018)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff7207f7b30,
timeout=-1) at compat-epoll.c:233
#2  0x0000000000446528 in thread_manage_consumer (data=0x699a40
<ustconsumer64_data>) at main.c:1449
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff7207f8700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ff71fff7700 (LWP 3028)):
#0  0x00007ff72a386733 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000045f9c9 in compat_epoll_wait (events=0x7ff71fff6b30,
timeout=-1) at compat-epoll.c:233
#2  0x0000000000446528 in thread_manage_consumer (data=0x697900
<ustconsumer32_data>) at main.c:1449
#3  0x00007ff72a84c294 in start_thread (arg=0x7ff71fff7700) at
pthread_create.c:336
#4  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7ff71f7f6700 (LWP 3036)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007ff72b61256a in futex (val3=<optimized out>,
uaddr2=<optimized out>, timeout=<optimized out>, val=<optimized out>,
op=<optimized out>, uaddr=<optimized out>)
    at /usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu/futex.h:65
#2  futex_async (timeout=<optimized out>, uaddr2=<optimized out>,
val3=<optimized out>, val=<optimized out>, op=<optimized out>,
uaddr=<optimized out>)
    at /usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu/futex.h:97
#3  call_rcu_wait (crdp=<optimized out>) at
/usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu-call-rcu-impl.h:246
#4  call_rcu_thread (arg=0x7ff71402dbb0) at
/usr/src/debug/liburcu/0.9.2-r0/userspace-rcu-0.9.2/urcu-call-rcu-impl.h:381
#5  0x00007ff72a84c294 in start_thread (arg=0x7ff71f7f6700) at
pthread_create.c:336
#6  0x00007ff72a38613d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7ff72bc6ca00 (LWP 1611)):
#0  0x00007ff72a84d45d in pthread_join (threadid=140699387271632,
thread_return=0x0) at pthread_join.c:90
#1  0x000000000040c136 in main (argc=0, argv=0x0) at main.c:6161




Kindly let me know if you need more info, this is blocking my upgrade.


Regards,
Aravind.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20161116/b815686f/attachment-0001.html>


More information about the lttng-dev mailing list