[lttng-dev] Relayd trace drops

Wed Dec 16 08:02:25 EST 2015

Hi Mathieu,

Ive been able to analyze this sessiond - consumerd hang scenario and I see
that the fix is not available on 2.6.1 or on 2.7.0.

Root Cause :

The hang is mainly caused because of a race condition for the
consumer_metadata_socket between the following two flows of code
Flow 1. do_sync_metadata() -> lttng_ustconsumer_sync_metadat() ->
lttng_ustconsumer_request_metadata() -> lttncomm_recv_unix_sock() ->
recvmsg()

and

Flow 2. lttng_consumer_recv_cmd() -> lttng_ust_consumer_recv_cmd() ->
lttng_ustconsumer_recv_metadata()

Flow 1 above initiates a metadata requests and recvmsg()s for the reply
from sessiond on the consumer_metadata_socket ( it has taken the
metadata_socket_lock and the stream->lock ) , but simultaneously this
socket is being polled from Flow 2. Once in a while Flow 2 gets to receive
while Flow 1 just hangs on recvmsg() . Since Flow 1 has taken the
stream->lock, this then introduces a deadlock at a later stage
(on_buffer_ready() when the metadata is being flushed. )

Think using the metadata_socket_lock in Flow 2 before attempting to receive
may address this issue, but is a bit complicated as the message would have
already been received at the point of using that lock. The other
alternative is to remove the complete recv() from Flow 1 and let that be
taken care by Flow 2.

Please let me know if this has been fixed in the later builds, may be I
searched it wrong.

Regards,
Aravind.

On Sat, Dec 12, 2015 at 4:13 AM, Mathieu Desnoyers <
mathieu.desnoyers at efficios.com> wrote:

> Hi,
>
> Please upgrade to 2.6.1 or 2.7.0, those are old issues.
>
> Thanks,
>
> Mathieu
>
>
> ----- On Dec 11, 2015, at 1:42 PM, Aravind HT <aravind.ht at gmail.com>
> wrote:
>
>
> NPBS understand it, I will continue to investigate this, got a bit side
> tracked with the sessiond hang issue that I mentioned earlier.
> Making some progress on that, will update as soon as I find the root
> cause, it looks like a mutex deadlock between
> the following threads in the consumerd daemon
>
> #1  0x0000000000410b5d in lttcomm_recv_unix_sock (sock=26, buf=buf at entry=0x7f16acca57f0,
> len=len at entry=4476) at unix.c:175
> #2  0x000000000041dc96 in lttng_ustconsumer_request_metadata (ctx=ctx at entry=0x640d10,
> channel=0x7f16980192b0, timer=timer at entry=0, wait=wait at entry=0) at
> ust-consumer.c:2331
> #3  0x000000000041ff3c in lttng_ustconsumer_sync_metadata (ctx=ctx at entry=0x640d10,
> metadata=metadata at entry=0x7f169801aa40) at ust-consumer.c:1829
> #4  0x000000000040eabf in do_sync_metadata (ctx=<optimized out>,
> metadata=<optimized out>) at consumer-stream.c:468
> #5  consumer_stream_sync_metadata (ctx=ctx at entry=0x640d10, session_id=8)
> at consumer-stream.c:545
> #6  0x000000000041d316 in lttng_ustconsumer_read_subbuffer
> (stream=stream at entry=0x7f1698018fa0, ctx=ctx at entry=0x640d10) at
> ust-consumer.c:2063
> #7  0x000000000040b613 in lttng_consumer_read_subbuffer
> (stream=0x7f1698018fa0, ctx=0x640d10) at consumer.c:3203
> #8  0x0000000000409608 in consumer_thread_data_poll (data=0x640d10) at
> consumer.c:2557
> #9  0x00007f16af6a6fe3 in start_thread (arg=0x7f16acca7700) at
> pthread_create.c:312
> #10 0x00007f16af3dcafd in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> and
>
> consumer_thread_metadata_poll() calling lttng_consumer_read_subbuffer() /
> on_buffer_ready() calling pthread_mutex_lock(stream->lock) and then hanging
>
>
> This again can be reproduced by the same script, just running consecutive
> sessions of short duration (<10 seconds) with traces getting captured on a
> remote node.
> I hit this every time I run the script.
>
> Regards,
> Aravind.
>
>
>
> On Fri, Dec 11, 2015 at 11:33 PM, Jonathan Rajotte Julien <
> Jonathan.rajotte-julien at efficios.com> wrote:
>
>> Hi Aravind,
>>
>> On 2015-12-11 12:16 PM, Aravind HT wrote:
>>
>>> Hi,
>>>
>>> Did you get a chance to reproduce the problem with the scripts ? Let me
>>> know if you need any help running it.
>>>
>>
>> Time is a scarce resource unfortunately. But until I have the time to
>> look at it you might want to investigate if you are able to reproduce it on
>> 2.6.1/2.7.0/master on 2 regular (x86_64, ubuntu/debian) VM this would
>> really help.
>>
>> Cheers!
>>
>>
>>> Regards,
>>> Aravind.
>>>
>>>
>>> On Wed, Dec 9, 2015 at 3:04 PM, Aravind HT <aravind.ht at gmail.com
>>> <mailto:aravind.ht at gmail.com>> wrote:
>>>
>>>     Sorry about that, not sure how it got missed.
>>>     Here it is.
>>>
>>>
>>>     # This needs a two node set up (1. local current node 2. remote node
>>> )
>>>     # relayd runs on the current node where traces are captured from
>>>     the remote node
>>>     # remote node runs test applications which generate traces.
>>>     # the launch_script_RN is executed on the current node and uses
>>>     ssh  to execute commands on the remote node. So this part may not
>>>     work in every case and may prompt for a password.
>>>     # if experiencing problems with ssh , kindly check
>>>
>>> http://serverfault.com/questions/241588/how-to-automate-ssh-login-with-password
>>>
>>>     # ====================       To Run  =============================
>>>     launch_script_RN.py self_profile -c /tmp/configFile.txt
>>>
>>>
>>>
>>>     # configFile.txt is the file which has configuration params that
>>>     launchScript
>>>     # needs to configure lttng sessions. Below is an explanation of
>>>     the different options.
>>>     # ===================     configFile.txt
>>> =============================
>>>
>>>     [section1]
>>>     # final out put file path
>>>     OutputFile = /tmp/Final_report.txt
>>>     # the remote node hostname on which test applications run and the
>>>     test sessions will be created; this should be something that could
>>>     be used with ssh. Traces will be transported from this node to the
>>>     lttng_relayd running on the current node.
>>>     Node = MY_REMOTE_NODE
>>>     # Sub buffer size to start this with
>>>     SubBufSize = 16k
>>>     # Sub buffer count
>>>     SubBufCount = 4
>>>     # per uid buffer
>>>     BufferScheme = --buffers-uid
>>>     # yes
>>>     EnableTracing = yes
>>>     # Bit rate of the test applications. Comman seperated example "1,
>>>     3, 3, 50" sayss 4 test applications producing 1, 3, 3, and 50 Kb/s
>>>     traces.
>>>     # So with the below, we just start with 1 test application
>>>     producing 10 kb/s
>>>     TestApps = 10
>>>     # session life time in seonds
>>>     TestTime = 10
>>>     # Max number of successive sessions to configure. if n then n-1
>>>     sessions are run, ex MaxRun = 2 will run 1 session.
>>>     MaxRun = 100
>>>
>>>
>>>     # ====================  Place the following files under
>>>     ===============
>>>
>>>     # /tmp on the remote node
>>>     clean_RemNode_apps.sh
>>>     report_lttng_script.sh
>>>
>>>     # rest of the scripts under /usr/sbin on the current local node on
>>>     which lttng_realyd runs
>>>     # Define a trace point MY_TRACE to take a single string arg with
>>>     LOG_TEST_APP_PROFILING as the provider, compile test lttng_testApp
>>>     and place it under /usr/sbin of the remote host
>>>
>>>     # in launch_script_RN.py change currentNodeIP to the IP address on
>>>     which relayd is receiving, default ports are used.
>>>
>>>     # lttng_relayd is started as
>>>     /usr/bin/lttng-relayd -o /var/log/lttng-traces -d
>>>
>>>     # lttng_sessiond is started as
>>>     /usr/bin/lttng-sessiond --consumerd32-path
>>>     /usr/lib/lttng/libexec/lttng-consumerd --consumerd32-libdir
>>>     /usr/lib/ --consumerd64-path
>>>     /usr/lib64/lttng/libexec/lttng-consumerd --consumerd64-libdir
>>>     /usr/lib64/ -b --no-kernel
>>>
>>>
>>>
>>>
>>>     Regards,
>>>     Aravind.
>>>
>>>     On Wed, Dec 9, 2015 at 12:57 AM, Jonathan Rajotte Julien
>>>     <Jonathan.rajotte-julien at efficios.com
>>>     <mailto:Jonathan.rajotte-julien at efficios.com>> wrote:
>>>
>>>         Hi Aravind,
>>>
>>>         There is no README in the archive you sent.
>>>
>>>         Cheers
>>>
>>>         On 2015-12-08 07:51 AM, Aravind HT wrote:
>>>
>>>             Hi,
>>>
>>>             I am trying to upgrade in parallel, but this issue may
>>>             still be present after I upgrade or may be temporarily
>>>             masked. So I need to find the root cause for this and then
>>>             see if its available on the latest before committing to
>>>             upgrade.
>>>
>>>             There is another issue i'm hitting, the lttng list command
>>>             hangs after lttng destroy session when running the profiling.
>>>
>>>             I found that consumerd 64 goes into an infinite loop
>>>             waiting to flush metadata in
>>>             lttng_ustconsumer_recv_metadata() :: while
>>>             (consumer_metadata_cache_flushed(channel, offset + len,
>>>             timer)) .
>>>             In consumer_metadata_cache,
>>>             channel->metadata_stream->endpoint_status is
>>>             CONSUMER_ENDPOINT_ACTIVE,
>>>             metadata_stream->ust_metadata_pushed is 0 with offset
>>>             having some value. This call always returns a 1 from the
>>>             last else{} block resulting in an infinite loop. Upon
>>>             searching the forum, I found the same issue being reported
>>>             here :
>>>
>>> https://www.mail-archive.com/lttng-dev@lists.lttng.org/msg07982.html
>>>
>>>             Regards,
>>>             Aravind.
>>>
>>>
>>>             On Tue, Dec 8, 2015 at 12:43 AM, Jonathan Rajotte
>>>             <jonathan.r.julien at gmail.com
>>>             <mailto:jonathan.r.julien at gmail.com>> wrote:
>>>
>>>                 Hi Aravind,
>>>
>>>                 Did you have the chance to upgrade to 2.6.1.If so
>>>             where you able
>>>                 to reproduce?
>>>
>>>                 Cheers
>>>
>>>                 On Mon, Dec 7, 2015 at 1:07 PM, Aravind HT
>>>             <aravind.ht at gmail.com <mailto:aravind.ht at gmail.com>>
>>>                 wrote:
>>>
>>>                     Hi,
>>>
>>>                     I have attached the complete profiling scripts
>>>             here, its a bit
>>>                     shabby, im new to python.
>>>
>>>                     There is a README which has the details on how to
>>>             execute it.
>>>                     Im using a Yocto 1.6 on x86_64 platforms on both
>>>             the nodes.
>>>
>>>
>>>                     Running this script when there are other sessions
>>>             running
>>>                     seems to reproduce this problem easily.
>>>                     Please try it and let me know if you have any issues
>>>                     reproducing the problem.
>>>
>>>                     Regards,
>>>                     Aravind.
>>>
>>>                     On Sat, Dec 5, 2015 at 5:23 PM, Jérémie Galarneau
>>>                     <jeremie.galarneau at efficios.com
>>>             <mailto:jeremie.galarneau at efficios.com>
>>>                     <mailto:jeremie.galarneau at efficios.com
>>>             <mailto:jeremie.galarneau at efficios.com>>> wrote:
>>>
>>>                         On Fri, Dec 4, 2015 at 11:06 PM, Aravind HT
>>>                         <aravind.ht at gmail.com
>>>             <mailto:aravind.ht at gmail.com> <mailto:aravind.ht at gmail.com
>>>
>>>             <mailto:aravind.ht at gmail.com>>> wrote:
>>>                         > I am using 2.6.0 .I will try to share the
>>>             code that I'm
>>>                         using here in some
>>>                         > time. If there are any specific fixes that
>>>             are relevant
>>>                         to this issue, see
>>>                         > if you can provide a link to them. I would
>>>             ideally like
>>>                         to try them out
>>>                         > before trying a full upgrade to the latest
>>>             versions.
>>>
>>>                         Hi,
>>>
>>>                         Can you provide more information on the
>>>             system? Which
>>>                         distribution,
>>>                         architecture, kernel version?
>>>
>>>                         The verbose sessiond logs might help pinpoint any
>>>                         unexpected behaviour
>>>                         here (are all applications registering as
>>>             expected?).
>>>
>>>                         Jérémie
>>>
>>>                         >
>>>                         > On Fri, Dec 4, 2015 at 6:11 PM, Jérémie
>>>             Galarneau
>>>                         > <jeremie.galarneau at efficios.com
>>>             <mailto:jeremie.galarneau at efficios.com>
>>>                         <mailto:jeremie.galarneau at efficios.com
>>>             <mailto:jeremie.galarneau at efficios.com>>> wrote:
>>>                         >>
>>>                         >> Hi Aravind,
>>>                         >>
>>>                         >> Can't say I have looked at everything you
>>>             sent yet, but
>>>                         as a
>>>                         >> preemptive question, which version are we
>>>             talking about
>>>                         here? 2.6.0 or
>>>                         >> 2.6.1? 2.6.1 contains a lot of relay daemon
>>>             fixes.
>>>                         >>
>>>                         >> Thanks,
>>>                         >> Jérémie
>>>                         >>
>>>                         >> On Thu, Dec 3, 2015 at 7:01 AM, Aravind HT
>>>                         <aravind.ht at gmail.com
>>>             <mailto:aravind.ht at gmail.com>> wrote:
>>>                         >> > Hi,
>>>                         >> >
>>>                         >> > I am trying to obtain the performance
>>>             characteristics
>>>                         of lttng with the
>>>                         >> > use
>>>                         >> > of test applications. Traces are being
>>>             produced on a
>>>                         local node and
>>>                         >> > delivered to relayd that is running on a
>>>             separate
>>>                         node for storage.
>>>                         >> >
>>>                         >> > An lttng session with the test
>>>             applications producing
>>>                         an initial bit
>>>                         >> > rate of
>>>                         >> > 10 kb/s is started and run for about 30
>>>             seconds. The
>>>                         starting sub-buffer
>>>                         >> > size is kept at 128 kb and sub-buf count
>>>             at 4. The
>>>                         session is then
>>>                         >> > stopped
>>>                         >> > and destroyed and traces are analyzed to
>>>             see if there
>>>                         are any drops.
>>>                         >> > This is
>>>                         >> > being done in a loop with every
>>>             subsequent session
>>>                         having an increment
>>>                         >> > of 2
>>>                         >> > kb/s as long as there are no drops. If
>>>             there are
>>>                         drops, I increase the
>>>                         >> > buffer size by a factor of x2 without
>>>             incrementing
>>>                         the bit rate.
>>>                         >> >
>>>                         >> > I see trace drops happening consistently
>>>             with test
>>>                         apps producing traces
>>>                         >> > at
>>>                         >> > less than 40 kb/s, it doesnt seem to help
>>>             even if I
>>>                         started with 1mb x 4
>>>                         >> > sub-buffers.
>>>                         >> >
>>>                         >> > Analysis :
>>>                         >> >
>>>                         >> > I have attached the lttng_relayd ,
>>>             lttng_consumerd_64
>>>                         logs and the
>>>                         >> > entire
>>>                         >> > trace directory, hope you will be able to
>>>             view it.
>>>                         >> > I have modified lttng_relayd code to dump
>>>             the traces
>>>                         being captured in
>>>                         >> > the
>>>                         >> > lttng_relayd logs along with debug info.
>>>                         >> >
>>>                         >> > Each test app is producing logs in the
>>>             form of  :
>>>                         >> > "TraceApp PID - 31940 THID - 31970
>>>             @threadRate - 1032
>>>                         b/s appRate - 2079
>>>                         >> > b/s
>>>                         >> > threadTraceNum - 9 appTraceNum - 18
>>>      sleepTime - 192120"
>>>                         >> >
>>>                         >> > The test application PID, test
>>>             application thread id,
>>>                         thread bit rate,
>>>                         >> > test
>>>                         >> > app bit rate, thread trace number and
>>>             application
>>>                         trace number s are
>>>                         >> > part of
>>>                         >> > the trace. So in the above trace, the
>>>             thread is
>>>                         producing at 1 kb/s and
>>>                         >> > the
>>>                         >> > whole test app is producing at 2 kb/s.
>>>                         >> >
>>>                         >> > If we look at the babeltrace out put, we
>>>             see that the
>>>                         Trace with
>>>                         >> > TraceApp
>>>                         >> > PID - 31940 appTraceNum 2 is missing ,
>>>             with 1, 3, 4,
>>>                         5 and so on being
>>>                         >> > successfully captured.
>>>                         >> > I looked at the lttng_relayd logs and
>>>             found that
>>>                         trace of "appTraceNum
>>>                         >> > 2" is
>>>                         >> > not delivered/generated by the consumerd
>>>             to the
>>>                         relayd in sequence with
>>>                         >> > other traces. To rule out that this is
>>>             not a test
>>>                         application problem,
>>>                         >> > you
>>>                         >> > can look at line ltttng_relayd log :
>>>             12778 and see
>>>                         traces from
>>>                         >> > appTraceNum -
>>>                         >> > 1 to appTraceNum - 18 including the
>>>             appTraceNum 2 are
>>>                         "re-delivered" by
>>>                         >> > the
>>>                         >> > consumerd to the relayd.
>>>                         >> > Essentially, I see appTraceNum 1 through
>>>             appTraceNum
>>>                         18 being delivered
>>>                         >> > twice, once individually where
>>>             appTraceNum 2 is
>>>                         missing and once as a
>>>                         >> > group
>>>                         >> > at line 12778 where its present.
>>>                         >> >
>>>                         >> >
>>>                         >> > Request help with
>>>                         >> > 1. why traces are delivered twice, is it
>>>             by design or
>>>                         a genuine problem
>>>                         >> > ?
>>>                         >> > 2. how to avoid traces being dropped even
>>>             though
>>>                         buffers are
>>>                         >> > sufficiently
>>>                         >> > large enough ?
>>>                         >> >
>>>                         >> >
>>>                         >> > Regards,
>>>                         >> > Aravind.
>>>                         >> >
>>>                         >> >
>>>             _______________________________________________
>>>                         >> > lttng-dev mailing list
>>>                         >> > lttng-dev at lists.lttng.org
>>>             <mailto:lttng-dev at lists.lttng.org>
>>>                         >> >
>>>             http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>                         >> >
>>>                         >>
>>>                         >>
>>>                         >>
>>>                         >> --
>>>                         >> Jérémie Galarneau
>>>                         >> EfficiOS Inc.
>>>                         >> http://www.efficios.com
>>>                         >
>>>                         >
>>>
>>>
>>>
>>>                         --
>>>                         Jérémie Galarneau
>>>                         EfficiOS Inc.
>>>             http://www.efficios.com
>>>
>>>
>>>
>>>             _______________________________________________
>>>                     lttng-dev mailing list
>>>             lttng-dev at lists.lttng.org
>>>             <mailto:lttng-dev at lists.lttng.org>
>>>             <mailto:lttng-dev at lists.lttng.org
>>>             <mailto:lttng-dev at lists.lttng.org>>
>>>             http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>
>>>
>>>
>>>
>>>                 --     Jonathan Rajotte Julien
>>>
>>>
>>>
>>>
>>>             _______________________________________________
>>>             lttng-dev mailing list
>>>             lttng-dev at lists.lttng.org <mailto:lttng-dev at lists.lttng.org>
>>>             http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>
>>>
>>>         --         Jonathan R. Julien
>>>         Efficios
>>>
>>>
>>>
>>>
>> --
>> Jonathan R. Julien
>> Efficios
>>
>>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20151216/a12df355/attachment-0001.html>