[lttng-dev] Crash in LTTng lttng-tools 2.12 snapshot_channel

Codres, Bogdan Bogdan.Codres at windriver.com
Wed Dec 8 08:45:15 EST 2021


Hello all,

My name is Bogdan Codres from Wind River.

Recently, we received a crash report from one of our customers. The crash happened only once,
and we do not have a clear way to reproduce it.

The crash happened on ARMv7 and the version of lttng-tools was 2.12.
This is the backtrace of the crash:


(gdb) bt
#0 __libc_do_syscall () at libc-do-syscall.S:49
#1 0xb6e13ad4 in __libc_signal_restore_set (set=0xb39f94e0) at ../sysdeps/unix/sysv/linux/internal-signals.h:84
#2 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3 0xb6e061a6 in __GI_abort () at abort.c:79
#4 0xb6e0ed90 in __assert_fail_base (fmt=0xb6ebfed0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x517e10 "!stream->trace_chunk", assertion@entry=0xb39fe300 "\001", file=0x51d844 "../../../../git/src/common/ust-consumer/ust-consumer.c", file@entry=0x0, line=1124, line@entry=5363780, function=function@entry=0x51d234 <__PRETTY_FUNCTION__.15949> "snapshot_channel") at assert.c:92
#5 0xb6e0ee0e in __GI___assert_fail (assertion=0xb39fe300 "\001", file=0x0, line=5363780, line@entry=1124, function=0x51d234 <__PRETTY_FUNCTION__.15949> "snapshot_channel") at assert.c:101
#6 0x004f5840 in snapshot_channel (channel=0xb42008d0, key=1, path=path@entry=0xb39f9964 "ust/uid/0/32-bit", relayd_id=relayd_id@entry=18446744073709551615, nb_packets_per_stream=0, ctx=ctx@entry=0x544048) at ../../../../git/src/common/ust-consumer/ust-consumer.c:1124
#7 0x004f9a08 in lttng_ustconsumer_recv_cmd (ctx=0x544048, sock=30, consumer_sockpoll=<optimized out>) at ../../../../git/src/common/ust-consumer/ust-consumer.c:1790
#8 0x004dfac0 in consumer_thread_sessiond_poll (data=0x544048) at ../../../../git/src/common/consumer/consumer.c:3361
#9 0xb6ee7b00 in start_thread (arg=0x98396ec3) at pthread_create.c:486
#10 0xb6e853bc in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /sysroots/armv7at2-neon-wrs-linux-gnueabi/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


The failing assertion is assert(!stream->trace_chunk), i.e. the stream already has a trace_chunk set when snapshot_channel runs.

There is a comment on the function saying "the caller must take RCU read side lock and channel lock".
snapshot_channel itself takes the RCU read-side lock, but from what I could see, nothing takes the channel lock in the functions that call snapshot_channel.

The crash looks like a race condition. If the comment on the function is correct and the channel lock is indeed missing,
that could explain the race, so I wonder whether anyone else has seen this.

I looked through the source code and saw that lttng_kconsumer_recv_cmd, which has a similar structure
to lttng_ustconsumer_recv_cmd, calls pthread_mutex_lock(&channel->lock) in its LTTNG_CONSUMER_SNAPSHOT_CHANNEL case:


else {
    pthread_mutex_lock(&channel->lock);
    if (msg.u.snapshot_channel.metadata == 1) {
        ret = lttng_kconsumer_snapshot_metadata(channel, key,
                msg.u.snapshot_channel.pathname,
                msg.u.snapshot_channel.relayd_id, ctx);
        if (ret < 0) {
            ERR("Snapshot metadata failed");
            ret_code = LTTCOMM_CONSUMERD_SNAPSHOT_FAILED;
        }
    } else {
        ret = lttng_kconsumer_snapshot_channel(channel, key,
                msg.u.snapshot_channel.pathname,
                msg.u.snapshot_channel.relayd_id,
                msg.u.snapshot_channel.nb_packets_per_stream,
                ctx);
        if (ret < 0) {
            ERR("Snapshot channel failed");
            ret_code = LTTCOMM_CONSUMERD_SNAPSHOT_FAILED;
        }
    }
    pthread_mutex_unlock(&channel->lock);

So my question is: shouldn't lttng_ustconsumer_recv_cmd also take the channel mutex,
as lttng_kconsumer_recv_cmd does?
What is your opinion on this issue?
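
To make the question concrete, here is a minimal, self-contained sketch of the locking pattern in question. It is not lttng-tools code: every mock_* name, the worker threads and run_snapshot_demo are hypothetical stand-ins invented for illustration, and the sketch assumes that two callers can reach the snapshot path for the same channel concurrently, which I have not confirmed for the UST consumer. It only shows that holding the channel lock around the snapshot call keeps an assertion of the same form as assert(!stream->trace_chunk) from tripping.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real lttng-tools structures. */
struct mock_stream {
    void *trace_chunk;            /* non-NULL while a snapshot owns the stream */
};

struct mock_channel {
    pthread_mutex_t lock;         /* plays the role of channel->lock */
    struct mock_stream stream;
    long snapshots_done;
};

static int chunk_token;           /* dummy object used as a fake trace chunk */

/* Mock snapshot routine. The caller must hold channel->lock. */
static int mock_snapshot_channel(struct mock_channel *channel)
{
    struct mock_stream *stream = &channel->stream;

    /* Same shape as the assertion that failed in the backtrace. */
    assert(!stream->trace_chunk);
    stream->trace_chunk = &chunk_token;   /* attach a chunk for the snapshot */
    channel->snapshots_done++;
    stream->trace_chunk = NULL;           /* release it when done */
    return 0;
}

/* Mock command handler: takes the channel lock around the snapshot call,
 * following the pattern of the kernel consumer excerpt. */
static int mock_handle_snapshot_cmd(struct mock_channel *channel)
{
    int ret;

    pthread_mutex_lock(&channel->lock);
    ret = mock_snapshot_channel(channel);
    pthread_mutex_unlock(&channel->lock);
    return ret;
}

struct worker_arg {
    struct mock_channel *channel;
    int iterations;
};

static void *worker(void *data)
{
    struct worker_arg *arg = data;

    for (int i = 0; i < arg->iterations; i++)
        mock_handle_snapshot_cmd(arg->channel);
    return NULL;
}

/* Runs two threads that snapshot the same channel concurrently.
 * Returns the number of completed snapshots, or -1 on setup failure. */
static long run_snapshot_demo(int iterations)
{
    struct mock_channel channel = { .stream = { NULL }, .snapshots_done = 0 };
    struct worker_arg arg = { &channel, iterations };
    pthread_t threads[2];
    int created = 0;
    long done;

    if (pthread_mutex_init(&channel.lock, NULL) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        if (pthread_create(&threads[i], NULL, worker, &arg) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    done = channel.snapshots_done;
    pthread_mutex_destroy(&channel.lock);
    return created == 2 ? done : -1;
}
```

Calling run_snapshot_demo(1000) from a main function (built with -pthread) should complete 2000 snapshots without hitting the assertion. Removing the lock and unlock calls in mock_handle_snapshot_cmd makes the two threads race on trace_chunk, which is the kind of interleaving I suspect in the real crash.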



Best Regards,
Ph.D. eng. Bogdan Codres
Senior Engineer at RDC-EMEA, Professional Services, Wind River