[lttng-dev] Relayd trace drops

Aravind HT aravind.ht at gmail.com
Wed Dec 30 03:53:41 EST 2015


Hi Mathieu,

I will look into the commit. Appreciate you replying during the vacation,
have a nice one.
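
For reference, the pattern that commit describes, releasing the registry lock
for the duration of the blocking push so the consumerd request path can still
take it, looks roughly like the sketch below. The names are placeholders, not
the actual sessiond functions:

/* build with: gcc -pthread push_pattern.c */
#include <pthread.h>

struct registry {
    pthread_mutex_t lock;
    /* ... metadata cache, offsets, etc. ... */
};

/* Stand-in for the sessiond -> consumerd -> relayd push, which can block. */
static int push_metadata_blocking(struct registry *reg)
{
    (void) reg;
    return 0;
}

static int push_metadata(struct registry *reg)
{
    int ret;

    pthread_mutex_lock(&reg->lock);
    /* snapshot whatever state the push needs while the lock is held */
    pthread_mutex_unlock(&reg->lock);

    /* The blocking I/O happens without the registry lock, so a concurrent
     * metadata request from consumerd can still grab it: no circular wait. */
    ret = push_metadata_blocking(reg);

    pthread_mutex_lock(&reg->lock);
    /* publish the result, e.g. advance the "pushed up to" offset */
    pthread_mutex_unlock(&reg->lock);
    return ret;
}

int main(void)
{
    struct registry reg;

    pthread_mutex_init(&reg.lock, NULL);
    return push_metadata(&reg);
}
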
Happy new year to all.

Regards,
Aravind.

On Wed, Dec 30, 2015 at 7:35 AM, Mathieu Desnoyers <
mathieu.desnoyers at efficios.com> wrote:

> Hi Aravind,
>
> You may want to have a look at this commit, which appears
> in 2.6.1 too:
>
> commit c585821bc78955b3d747fcd733aa1d2b81a3258e
> Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> Date:   Wed Aug 19 14:44:59 2015 -0700
>
>     Fix: sessiond vs consumerd push/get metadata deadlock
>
>     We need to unlock the registry while we push metadata to break a
>     circular dependency between the consumerd metadata lock and the
>     sessiond registry lock. Indeed, pushing metadata to the consumerd
>     awaits that it gets pushed all the way to relayd, but doing so
>     requires grabbing the metadata lock. If a concurrent metadata request
>     is being performed by consumerd, this can try to grab the registry
>     lock on the sessiond while holding the metadata lock on the consumer
>     daemon. Those push and pull schemes are performed on two different
>     bidirectional communication sockets.
>
>     Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
>     Signed-off-by: Jérémie Galarneau <jeremie.galarneau at efficios.com>
>
> Happy new year! By the way, everyone at EfficiOS is on vacation until
> January 4th
> (that includes me). ;-)
>
> Mathieu
>
> ----- On Dec 28, 2015, at 12:24 PM, Aravind HT <aravind.ht at gmail.com>
> wrote:
>
> Hi,
> I've not had a chance to try it on 2.6.1 or 2.7.0 yet; I will try and let
> you know.
> However, I was able to come up with a fix for 2.6.0, and I do not see any
> such changes in 2.6.1 or 2.7.0, so I'm forwarding you the patch.
>
>
> diff -Naur ./src/bin/lttng-sessiond/ust-consumer.c /tmp/LTTNG_BKP_28_12_1/bin/lttng-sessiond/ust-consumer.c
> --- ./src/bin/lttng-sessiond/ust-consumer.c 2015-12-28 07:45:34.610430830 -0800
> +++ /tmp/LTTNG_BKP_28_12_1/bin/lttng-sessiond/ust-consumer.c 2015-12-28 07:58:18.906918555 -0800
> @@ -1,3 +1,4 @@
> +
>  /*
>   * Copyright (C) 2011 - David Goulet <david.goulet at polymtl.ca>
>   *
> @@ -486,17 +487,48 @@
>      ust_reg = reg_pid->registry->reg.ust;
>      }
>      assert(ust_reg);
> +    /*
> +     * ust_reg->lock can also have been taken from push_metadata, making this
> +     * metadata request wait without sending a response to the consumer, which
> +     * gets stuck on the recv() call waiting for a response from sessiond.
> +     * This metadata request from the consumer would have taken the
> +     * stream->lock in do_sync_metadata(), making push_metadata wait for the
> +     * stream->lock to be released, which makes the sessiond thread pushing
> +     * the metadata wait on the recv() for a response from the consumer,
> +     * creating a deadlock. To avoid this deadlock, make the request-metadata
> +     * flow fail in case sessiond is already pushing metadata on its own.
> +     */
> +    ret = pthread_mutex_trylock(&ust_reg->lock);
> +    if (!ret) {
> +        ret_push = ust_app_push_metadata(ust_reg, socket, 1);
> +        pthread_mutex_unlock(&ust_reg->lock);
> +        if (ret_push < 0) {
> +            ERR("Error pushing metadata");
> +            /*
> +             * -EPIPE is returned if the registry's metadata_closed flag was
> +             * set; this can happen if ust_app_unregister() was called when
> +             * there was a problem with the sock (epoll()) from
> +             * thread_manage_apps(). That would make the current function
> +             * return an error, upon which the whole thread_manage_consumer
> +             * would be killed. To avoid this, send an error back to
> +             * consumerd and continue business as usual.
> +             */
> +            if (ret_push == -EPIPE) {
> +                DBG("App metadata was closed, sending ERR msg to consumerd\n");
> +                memset(&msg, 0, sizeof(msg));
> +                msg.cmd_type = LTTNG_ERR_UND;
> +                (void) consumer_send_msg(socket, &msg);
> +                ret = 0;
> +            } else {
> +                ret = -1;
> +            }
> +        } else {
> +            DBG("UST Consumer metadata pushed successfully");
> +            ret = 0;
> +        }
> +    } else {
> +        DBG("ust_reg->lock is taken, cannot push metadata for this request at present\n");
> +        memset(&msg, 0, sizeof(msg));
> +        msg.cmd_type = LTTNG_ERR_UND;
> +        (void) consumer_send_msg(socket, &msg);
> +        ret = 0;
> -    pthread_mutex_lock(&ust_reg->lock);
> -    ret_push = ust_app_push_metadata(ust_reg, socket, 1);
> -    pthread_mutex_unlock(&ust_reg->lock);
> -    if (ret_push < 0) {
> -        ERR("Pushing metadata");
> -        ret = -1;
> -        goto end;
> -    }
> -    DBG("UST Consumer metadata pushed successfully");
> -    ret = 0;
> +    }
>
>  end:
>      rcu_read_unlock();
>
>
> Regards,
> Aravind.
>
> On Wed, Dec 16, 2015 at 7:48 PM, Mathieu Desnoyers <
> mathieu.desnoyers at efficios.com> wrote:
>
>>
>> ----- On Dec 16, 2015, at 8:02 AM, Aravind HT <aravind.ht at gmail.com>
>> wrote:
>>
>> Hi Mathieu,
>> I've been able to analyze this sessiond/consumerd hang scenario, and I
>> see that the fix is not available in 2.6.1 or 2.7.0.
>>
>>
>> Did you actually reproduce the issue with 2.6.1 or 2.7.0 ? Can we get the
>> associated backtrace reproduced on those versions ?
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>> Root Cause:
>>
>> The hang is mainly caused by a race for the consumer_metadata_socket
>> between the following two flows of code:
>>
>> Flow 1. do_sync_metadata() -> lttng_ustconsumer_sync_metadata() ->
>> lttng_ustconsumer_request_metadata() -> lttcomm_recv_unix_sock() ->
>> recvmsg()
>>
>> and
>>
>> Flow 2. lttng_consumer_recv_cmd() -> lttng_ustconsumer_recv_cmd() ->
>> lttng_ustconsumer_recv_metadata()
>>
>> Flow 1 initiates a metadata request and then recvmsg()s for the reply
>> from sessiond on the consumer_metadata_socket (it has taken the
>> metadata_socket_lock and the stream->lock), but that same socket is
>> simultaneously being polled by Flow 2. Once in a while Flow 2 gets to
>> receive the reply while Flow 1 just hangs on recvmsg(). Since Flow 1 has
>> taken the stream->lock, this then introduces a deadlock at a later stage,
>> in on_buffer_ready(), when the metadata is being flushed.
>>
>> I think taking the metadata_socket_lock in Flow 2 before attempting to
>> receive might address this issue, but that is a bit complicated, as the
>> message would already have been received by the time that lock could be
>> taken. The other alternative is to remove the recv() from Flow 1 entirely
>> and let Flow 2 take care of it.
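>>
>> To make the race concrete, here is a minimal standalone sketch (not LTTng
>> code; thread and variable names are made up) of two threads recv()ing on
>> the same socket while only one reply is ever sent. Whichever thread wins
>> the race gets the reply; the other stays blocked, which is the shape of
>> the hang above:
>>
>> /* build with: gcc -pthread race_demo.c -o race_demo */
>> #include <pthread.h>
>> #include <stdio.h>
>> #include <sys/socket.h>
>> #include <unistd.h>
>>
>> static int sv[2];  /* sv[0]: shared "consumerd" end, sv[1]: "sessiond" end */
>>
>> /* Stand-in for Flow 2: the poll path reading the shared socket. */
>> static void *poll_thread(void *unused)
>> {
>>     char buf[32];
>>     ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
>>     printf("poll thread received %zd bytes\n", n);
>>     return NULL;
>> }
>>
>> /* Stand-in for Flow 1: the thread that requested metadata and now blocks
>>  * waiting for the reply (while holding stream->lock in the real consumerd). */
>> static void *request_thread(void *unused)
>> {
>>     char buf[32];
>>     ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
>>     printf("request thread received %zd bytes\n", n);
>>     return NULL;
>> }
>>
>> int main(void)
>> {
>>     pthread_t t1, t2;
>>
>>     socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
>>     pthread_create(&t1, NULL, poll_thread, NULL);
>>     pthread_create(&t2, NULL, request_thread, NULL);
>>     sleep(1);
>>     send(sv[1], "reply", 5, 0);  /* single reply: only one recv() wins */
>>     sleep(1);
>>     /* In the real daemons there is no second message, so the losing
>>      * thread would stay blocked; send one here only so the demo exits. */
>>     send(sv[1], "bye", 3, 0);
>>     pthread_join(t1, NULL);
>>     pthread_join(t2, NULL);
>>     return 0;
>> }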
>>
>> Please let me know if this has been fixed in the later builds; maybe I
>> searched wrong.
>>
>>
>> Regards,
>> Aravind.
>>
>> On Sat, Dec 12, 2015 at 4:13 AM, Mathieu Desnoyers <
>> mathieu.desnoyers at efficios.com> wrote:
>>
>>> Hi,
>>>
>>> Please upgrade to 2.6.1 or 2.7.0, those are old issues.
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>
>>> ----- On Dec 11, 2015, at 1:42 PM, Aravind HT <aravind.ht at gmail.com>
>>> wrote:
>>>
>>>
>>> NPBS, understood. I will continue to investigate this; I got a bit
>>> sidetracked with the sessiond hang issue that I mentioned earlier.
>>> I'm making some progress on that and will update as soon as I find the
>>> root cause. It looks like a mutex deadlock between the following threads
>>> in the consumerd daemon:
>>>
>>> #1  0x0000000000410b5d in lttcomm_recv_unix_sock (sock=26, buf=buf@entry=0x7f16acca57f0, len=len@entry=4476) at unix.c:175
>>> #2  0x000000000041dc96 in lttng_ustconsumer_request_metadata (ctx=ctx@entry=0x640d10, channel=0x7f16980192b0, timer=timer@entry=0, wait=wait@entry=0) at ust-consumer.c:2331
>>> #3  0x000000000041ff3c in lttng_ustconsumer_sync_metadata (ctx=ctx@entry=0x640d10, metadata=metadata@entry=0x7f169801aa40) at ust-consumer.c:1829
>>> #4  0x000000000040eabf in do_sync_metadata (ctx=<optimized out>, metadata=<optimized out>) at consumer-stream.c:468
>>> #5  consumer_stream_sync_metadata (ctx=ctx@entry=0x640d10, session_id=8) at consumer-stream.c:545
>>> #6  0x000000000041d316 in lttng_ustconsumer_read_subbuffer (stream=stream@entry=0x7f1698018fa0, ctx=ctx@entry=0x640d10) at ust-consumer.c:2063
>>> #7  0x000000000040b613 in lttng_consumer_read_subbuffer (stream=0x7f1698018fa0, ctx=0x640d10) at consumer.c:3203
>>> #8  0x0000000000409608 in consumer_thread_data_poll (data=0x640d10) at consumer.c:2557
>>> #9  0x00007f16af6a6fe3 in start_thread (arg=0x7f16acca7700) at pthread_create.c:312
>>> #10 0x00007f16af3dcafd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>>
>>> and
>>>
>>> consumer_thread_metadata_poll(), which calls
>>> lttng_consumer_read_subbuffer() / on_buffer_ready(), which in turn calls
>>> pthread_mutex_lock(stream->lock) and then hangs.
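>>>
>>> What turns the lost wakeup into a full stall is holding stream->lock
>>> across a blocking call. A tiny standalone sketch of just that pattern
>>> (made-up names, not consumerd code): thread A takes a mutex and blocks
>>> in recv() on a socket that never receives data, thread B then waits on
>>> the same mutex. B uses a timed lock here only so the demo terminates and
>>> can report that the mutex is stuck:
>>>
>>> /* build with: gcc -pthread lock_across_recv.c */
>>> #include <errno.h>
>>> #include <pthread.h>
>>> #include <stdio.h>
>>> #include <sys/socket.h>
>>> #include <time.h>
>>> #include <unistd.h>
>>>
>>> static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;
>>> static int sv[2];
>>>
>>> /* Like the data-poll thread: holds stream_lock, then blocks in recv(). */
>>> static void *holder(void *unused)
>>> {
>>>     char buf[16];
>>>
>>>     pthread_mutex_lock(&stream_lock);
>>>     recv(sv[0], buf, sizeof(buf), 0);   /* no data ever arrives */
>>>     pthread_mutex_unlock(&stream_lock);
>>>     return NULL;
>>> }
>>>
>>> /* Like the metadata-poll thread: just wants stream_lock. */
>>> static void *waiter(void *unused)
>>> {
>>>     struct timespec deadline;
>>>
>>>     clock_gettime(CLOCK_REALTIME, &deadline);
>>>     deadline.tv_sec += 2;
>>>     if (pthread_mutex_timedlock(&stream_lock, &deadline) == ETIMEDOUT)
>>>         printf("stream_lock is still held by the blocked thread\n");
>>>     else
>>>         pthread_mutex_unlock(&stream_lock);
>>>     return NULL;
>>> }
>>>
>>> int main(void)
>>> {
>>>     pthread_t a, b;
>>>
>>>     socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
>>>     pthread_create(&a, NULL, holder, NULL);
>>>     sleep(1);
>>>     pthread_create(&b, NULL, waiter, NULL);
>>>     pthread_join(b, NULL);
>>>     return 0;   /* exiting also tears down the still-blocked holder */
>>> }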
>>>
>>>
>>> This again can be reproduced by the same script, just running
>>> consecutive sessions of short duration (<10 seconds) with traces getting
>>> captured on a remote node.
>>> I hit this every time I run the script.
>>>
>>> Regards,
>>> Aravind.
>>>
>>>
>>>
>>> On Fri, Dec 11, 2015 at 11:33 PM, Jonathan Rajotte Julien <
>>> Jonathan.rajotte-julien at efficios.com> wrote:
>>>
>>>> Hi Aravind,
>>>>
>>>> On 2015-12-11 12:16 PM, Aravind HT wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Did you get a chance to reproduce the problem with the scripts ? Let
>>>>> me know if you need any help running it.
>>>>>
>>>>
>>>> Time is a scarce resource, unfortunately. But until I have the time to
>>>> look at it, you might want to investigate whether you are able to
>>>> reproduce it on 2.6.1/2.7.0/master on two regular (x86_64,
>>>> Ubuntu/Debian) VMs; this would really help.
>>>>
>>>> Cheers!
>>>>
>>>>
>>>>> Regards,
>>>>> Aravind.
>>>>>
>>>>>
>>>>> On Wed, Dec 9, 2015 at 3:04 PM, Aravind HT <aravind.ht at gmail.com
>>>>> <mailto:aravind.ht at gmail.com>> wrote:
>>>>>
>>>>>     Sorry about that, not sure how it got missed.
>>>>>     Here it is.
>>>>>
>>>>>
>>>>>     # This needs a two node set up (1. local current node 2. remote
>>>>> node )
>>>>>     # relayd runs on the current node where traces are captured from
>>>>>     the remote node
>>>>>     # remote node runs test applications which generate traces.
>>>>>     # the launch_script_RN is executed on the current node and uses
>>>>>     ssh  to execute commands on the remote node. So this part may not
>>>>>     work in every case and may prompt for a password.
>>>>>     # if experiencing problems with ssh, kindly check
>>>>>
>>>>> http://serverfault.com/questions/241588/how-to-automate-ssh-login-with-password
>>>>>
>>>>>     # ====================       To Run  =============================
>>>>>     launch_script_RN.py self_profile -c /tmp/configFile.txt
>>>>>
>>>>>
>>>>>
>>>>>     # configFile.txt is the file which has configuration params that
>>>>>     launchScript
>>>>>     # needs to configure lttng sessions. Below is an explanation of
>>>>>     the different options.
>>>>>     # ===================     configFile.txt
>>>>> =============================
>>>>>
>>>>>     [section1]
>>>>>     # final output file path
>>>>>     OutputFile = /tmp/Final_report.txt
>>>>>     # the remote node hostname on which test applications run and the
>>>>>     # test sessions will be created; this should be something that could
>>>>>     # be used with ssh. Traces will be transported from this node to the
>>>>>     # lttng_relayd running on the current node.
>>>>>     Node = MY_REMOTE_NODE
>>>>>     # Sub buffer size to start this with
>>>>>     SubBufSize = 16k
>>>>>     # Sub buffer count
>>>>>     SubBufCount = 4
>>>>>     # per uid buffer
>>>>>     BufferScheme = --buffers-uid
>>>>>     # yes
>>>>>     EnableTracing = yes
>>>>>     # Bit rate of the test applications. Comma-separated; for example
>>>>>     # "1, 3, 3, 50" says 4 test applications producing 1, 3, 3, and
>>>>>     # 50 kb/s of traces.
>>>>>     # So with the below, we just start with 1 test application
>>>>>     # producing 10 kb/s
>>>>>     TestApps = 10
>>>>>     # session lifetime in seconds
>>>>>     TestTime = 10
>>>>>     # Max number of successive sessions to configure. If n, then n-1
>>>>>     # sessions are run; e.g. MaxRun = 2 will run 1 session.
>>>>>     MaxRun = 100
>>>>>
>>>>>
>>>>>     # ====================  Place the following files under
>>>>>     ===============
>>>>>
>>>>>     # /tmp on the remote node
>>>>>     clean_RemNode_apps.sh
>>>>>     report_lttng_script.sh
>>>>>
>>>>>     # rest of the scripts go under /usr/sbin on the current local node
>>>>>     # on which lttng_relayd runs
>>>>>     # Define a tracepoint MY_TRACE that takes a single string arg, with
>>>>>     # LOG_TEST_APP_PROFILING as the provider (a possible provider header
>>>>>     # is sketched after these notes), compile the test lttng_testApp and
>>>>>     # place it under /usr/sbin of the remote host
>>>>>
>>>>>     # in launch_script_RN.py change currentNodeIP to the IP address on
>>>>>     which relayd is receiving, default ports are used.
>>>>>
>>>>>     # lttng_relayd is started as
>>>>>     /usr/bin/lttng-relayd -o /var/log/lttng-traces -d
>>>>>
>>>>>     # lttng_sessiond is started as
>>>>>     /usr/bin/lttng-sessiond --consumerd32-path
>>>>>     /usr/lib/lttng/libexec/lttng-consumerd --consumerd32-libdir
>>>>>     /usr/lib/ --consumerd64-path
>>>>>     /usr/lib64/lttng/libexec/lttng-consumerd --consumerd64-libdir
>>>>>     /usr/lib64/ -b --no-kernel
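>>>>>
>>>>>     # For reference (as noted above), one possible definition of that
>>>>>     # tracepoint using the usual lttng-ust provider boilerplate; this is
>>>>>     # only an illustration, not the exact header the original test app
>>>>>     # uses:
>>>>>
>>>>>     /* tp.h */
>>>>>     #undef TRACEPOINT_PROVIDER
>>>>>     #define TRACEPOINT_PROVIDER LOG_TEST_APP_PROFILING
>>>>>
>>>>>     #undef TRACEPOINT_INCLUDE
>>>>>     #define TRACEPOINT_INCLUDE "./tp.h"
>>>>>
>>>>>     #if !defined(_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
>>>>>     #define _TP_H
>>>>>
>>>>>     #include <lttng/tracepoint.h>
>>>>>
>>>>>     TRACEPOINT_EVENT(
>>>>>         LOG_TEST_APP_PROFILING,
>>>>>         MY_TRACE,
>>>>>         TP_ARGS(const char *, msg),
>>>>>         TP_FIELDS(
>>>>>             ctf_string(msg, msg)
>>>>>         )
>>>>>     )
>>>>>
>>>>>     #endif /* _TP_H */
>>>>>
>>>>>     #include <lttng/tracepoint-event.h>
>>>>>
>>>>>     # The test app then calls
>>>>>     #     tracepoint(LOG_TEST_APP_PROFILING, MY_TRACE, "some string");
>>>>>     # with tp.c defining TRACEPOINT_CREATE_PROBES and TRACEPOINT_DEFINE
>>>>>     # before including tp.h, linked with -llttng-ust -ldl.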
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     Regards,
>>>>>     Aravind.
>>>>>
>>>>>     On Wed, Dec 9, 2015 at 12:57 AM, Jonathan Rajotte Julien
>>>>>     <Jonathan.rajotte-julien at efficios.com
>>>>>     <mailto:Jonathan.rajotte-julien at efficios.com>> wrote:
>>>>>
>>>>>         Hi Aravind,
>>>>>
>>>>>         There is no README in the archive you sent.
>>>>>
>>>>>         Cheers
>>>>>
>>>>>         On 2015-12-08 07:51 AM, Aravind HT wrote:
>>>>>
>>>>>             Hi,
>>>>>
>>>>>             I am trying to upgrade in parallel, but this issue may
>>>>>             still be present after I upgrade, or may be temporarily
>>>>>             masked. So I need to find the root cause for this and then
>>>>>             see if a fix is available on the latest before committing
>>>>>             to an upgrade.
>>>>>
>>>>>             There is another issue I'm hitting: the lttng list command
>>>>>             hangs after lttng destroy session when running the
>>>>>             profiling.
>>>>>
>>>>>             I found that the 64-bit consumerd goes into an infinite
>>>>>             loop waiting to flush metadata in
>>>>>             lttng_ustconsumer_recv_metadata() :: while
>>>>>             (consumer_metadata_cache_flushed(channel, offset + len,
>>>>>             timer)).
>>>>>             In consumer_metadata_cache_flushed(),
>>>>>             channel->metadata_stream->endpoint_status is
>>>>>             CONSUMER_ENDPOINT_ACTIVE and
>>>>>             metadata_stream->ust_metadata_pushed is 0, with offset
>>>>>             having some value. The call then always returns 1 from the
>>>>>             last else {} block, resulting in an infinite loop. Upon
>>>>>             searching the forum, I found the same issue reported here:
>>>>> https://www.mail-archive.com/lttng-dev@lists.lttng.org/msg07982.html
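>>>>>
>>>>>             Reduced to a standalone sketch (made-up names, not the
>>>>>             actual consumerd code), the shape of that wait loop shows
>>>>>             why it can spin forever: the predicate only returns 0 when
>>>>>             the endpoint goes inactive or the pushed offset catches up,
>>>>>             and nothing in the loop makes either happen:
>>>>>
>>>>>             struct metadata_stream {
>>>>>                 int endpoint_active;            /* CONSUMER_ENDPOINT_ACTIVE */
>>>>>                 unsigned long metadata_pushed;  /* bytes pushed so far */
>>>>>             };
>>>>>
>>>>>             /* Returns 1 while the caller should keep waiting. */
>>>>>             static int cache_flushed_up_to(struct metadata_stream *s,
>>>>>                     unsigned long offset)
>>>>>             {
>>>>>                 if (!s->endpoint_active)
>>>>>                     return 0;   /* stream gone: stop waiting */
>>>>>                 if (s->metadata_pushed >= offset)
>>>>>                     return 0;   /* flushed far enough */
>>>>>                 return 1;       /* keep waiting */
>>>>>             }
>>>>>
>>>>>             static void wait_metadata_flushed(struct metadata_stream *s,
>>>>>                     unsigned long offset)
>>>>>             {
>>>>>                 while (cache_flushed_up_to(s, offset)) {
>>>>>                     /* nothing here advances metadata_pushed or
>>>>>                      * deactivates the endpoint, so with offset > 0 and
>>>>>                      * metadata_pushed stuck at 0 this never terminates */
>>>>>                 }
>>>>>             }
>>>>>
>>>>>             int main(void)
>>>>>             {
>>>>>                 struct metadata_stream s = { 1, 0 };
>>>>>
>>>>>                 /* would spin forever: wait_metadata_flushed(&s, 4096); */
>>>>>                 return cache_flushed_up_to(&s, 4096);  /* 1: keep waiting */
>>>>>             }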
>>>>>
>>>>>             Regards,
>>>>>             Aravind.
>>>>>
>>>>>
>>>>>             On Tue, Dec 8, 2015 at 12:43 AM, Jonathan Rajotte
>>>>>             <jonathan.r.julien at gmail.com
>>>>>             <mailto:jonathan.r.julien at gmail.com>> wrote:
>>>>>
>>>>>                 Hi Aravind,
>>>>>
>>>>>                 Did you have the chance to upgrade to 2.6.1? If so,
>>>>>                 were you able to reproduce?
>>>>>
>>>>>                 Cheers
>>>>>
>>>>>                 On Mon, Dec 7, 2015 at 1:07 PM, Aravind HT
>>>>>             <aravind.ht at gmail.com <mailto:aravind.ht at gmail.com>>
>>>>>                 wrote:
>>>>>
>>>>>                     Hi,
>>>>>
>>>>>                     I have attached the complete profiling scripts
>>>>>                     here; they're a bit shabby, I'm new to Python.
>>>>>
>>>>>                     There is a README which has the details on how to
>>>>>                     execute it.
>>>>>                     I'm using Yocto 1.6 on x86_64 platforms on both
>>>>>                     nodes.
>>>>>
>>>>>
>>>>>                     Running this script when there are other sessions
>>>>>             running
>>>>>                     seems to reproduce this problem easily.
>>>>>                     Please try it and let me know if you have any
>>>>> issues
>>>>>                     reproducing the problem.
>>>>>
>>>>>                     Regards,
>>>>>                     Aravind.
>>>>>
>>>>>                     On Sat, Dec 5, 2015 at 5:23 PM, Jérémie Galarneau
>>>>>                     <jeremie.galarneau at efficios.com
>>>>>             <mailto:jeremie.galarneau at efficios.com>
>>>>>                     <mailto:jeremie.galarneau at efficios.com
>>>>>             <mailto:jeremie.galarneau at efficios.com>>> wrote:
>>>>>
>>>>>                         On Fri, Dec 4, 2015 at 11:06 PM, Aravind HT
>>>>>                         <aravind.ht at gmail.com
>>>>>             <mailto:aravind.ht at gmail.com> <mailto:aravind.ht at gmail.com
>>>>>
>>>>>             <mailto:aravind.ht at gmail.com>>> wrote:
>>>>>                         > I am using 2.6.0. I will try to share the
>>>>>             code that I'm
>>>>>                         using here in some
>>>>>                         > time. If there are any specific fixes that
>>>>>             are relevant
>>>>>                         to this issue, see
>>>>>                         > if you can provide a link to them. I would
>>>>>             ideally like
>>>>>                         to try them out
>>>>>                         > before trying a full upgrade to the latest
>>>>>             versions.
>>>>>
>>>>>                         Hi,
>>>>>
>>>>>                         Can you provide more information on the
>>>>>             system? Which
>>>>>                         distribution,
>>>>>                         architecture, kernel version?
>>>>>
>>>>>                         The verbose sessiond logs might help pinpoint
>>>>> any
>>>>>                         unexpected behaviour
>>>>>                         here (are all applications registering as
>>>>>             expected?).
>>>>>
>>>>>                         Jérémie
>>>>>
>>>>>                         >
>>>>>                         > On Fri, Dec 4, 2015 at 6:11 PM, Jérémie
>>>>>             Galarneau
>>>>>                         > <jeremie.galarneau at efficios.com
>>>>>             <mailto:jeremie.galarneau at efficios.com>
>>>>>                         <mailto:jeremie.galarneau at efficios.com
>>>>>             <mailto:jeremie.galarneau at efficios.com>>> wrote:
>>>>>                         >>
>>>>>                         >> Hi Aravind,
>>>>>                         >>
>>>>>                         >> Can't say I have looked at everything you
>>>>>             sent yet, but
>>>>>                         as a
>>>>>                         >> preemptive question, which version are we
>>>>>             talking about
>>>>>                         here? 2.6.0 or
>>>>>                         >> 2.6.1? 2.6.1 contains a lot of relay daemon
>>>>>             fixes.
>>>>>                         >>
>>>>>                         >> Thanks,
>>>>>                         >> Jérémie
>>>>>                         >>
>>>>>                         >> On Thu, Dec 3, 2015 at 7:01 AM, Aravind HT
>>>>>                         <aravind.ht at gmail.com
>>>>>             <mailto:aravind.ht at gmail.com>> wrote:
>>>>>                         >> > Hi,
>>>>>                         >> >
>>>>>                         >> > I am trying to obtain the performance
>>>>>             characteristics
>>>>>                         of lttng with the
>>>>>                         >> > use
>>>>>                         >> > of test applications. Traces are being
>>>>>             produced on a
>>>>>                         local node and
>>>>>                         >> > delivered to relayd that is running on a
>>>>>             separate
>>>>>                         node for storage.
>>>>>                         >> >
>>>>>                         >> > An lttng session with the test
>>>>>             applications producing
>>>>>                         an initial bit
>>>>>                         >> > rate of
>>>>>                         >> > 10 kb/s is started and run for about 30
>>>>>             seconds. The
>>>>>                         starting sub-buffer
>>>>>                         >> > size is kept at 128 kb and sub-buf count
>>>>>             at 4. The
>>>>>                         session is then
>>>>>                         >> > stopped
>>>>>                         >> > and destroyed and traces are analyzed to
>>>>>             see if there
>>>>>                         are any drops.
>>>>>                         >> > This is
>>>>>                         >> > being done in a loop with every
>>>>>             subsequent session
>>>>>                         having an increment
>>>>>                         >> > of 2
>>>>>                         >> > kb/s as long as there are no drops. If
>>>>>             there are
>>>>>                         drops, I increase the
>>>>>                         >> > buffer size by a factor of x2 without
>>>>>             incrementing
>>>>>                         the bit rate.
>>>>>                         >> >
>>>>>                         >> > I see trace drops happening consistently
>>>>>             with test
>>>>>                         apps producing traces
>>>>>                         >> > at
>>>>>                         >> > less than 40 kb/s, it doesn't seem to help
>>>>>             even if I
>>>>>                         started with 1mb x 4
>>>>>                         >> > sub-buffers.
>>>>>                         >> >
>>>>>                         >> > Analysis :
>>>>>                         >> >
>>>>>                         >> > I have attached the lttng_relayd ,
>>>>>             lttng_consumerd_64
>>>>>                         logs and the
>>>>>                         >> > entire
>>>>>                         >> > trace directory, hope you will be able to
>>>>>             view it.
>>>>>                         >> > I have modified lttng_relayd code to dump
>>>>>             the traces
>>>>>                         being captured in
>>>>>                         >> > the
>>>>>                         >> > lttng_relayd logs along with debug info.
>>>>>                         >> >
>>>>>                         >> > Each test app is producing logs in the
>>>>>             form of  :
>>>>>                         >> > "TraceApp PID - 31940 THID - 31970
>>>>>             @threadRate - 1032
>>>>>                         b/s appRate - 2079
>>>>>                         >> > b/s
>>>>>                         >> > threadTraceNum - 9 appTraceNum - 18
>>>>>        sleepTime - 192120"
>>>>>                         >> >
>>>>>                         >> > The test application PID, test
>>>>>             application thread id,
>>>>>                         thread bit rate,
>>>>>                         >> > test
>>>>>                         >> > app bit rate, thread trace number and
>>>>>             application
>>>>>                         trace number s are
>>>>>                         >> > part of
>>>>>                         >> > the trace. So in the above trace, the
>>>>>             thread is
>>>>>                         producing at 1 kb/s and
>>>>>                         >> > the
>>>>>                         >> > whole test app is producing at 2 kb/s.
>>>>>                         >> >
>>>>>                         >> > If we look at the babeltrace output, we
>>>>>             see that the
>>>>>                         Trace with
>>>>>                         >> > TraceApp
>>>>>                         >> > PID - 31940 appTraceNum 2 is missing ,
>>>>>             with 1, 3, 4,
>>>>>                         5 and so on being
>>>>>                         >> > successfully captured.
>>>>>                         >> > I looked at the lttng_relayd logs and
>>>>>             found that
>>>>>                         trace of "appTraceNum
>>>>>                         >> > 2" is
>>>>>                         >> > not delivered/generated by the consumerd
>>>>>             to the
>>>>>                         relayd in sequence with
>>>>>                         >> > other traces. To rule out that this is
>>>>>             not a test
>>>>>                         application problem,
>>>>>                         >> > you
>>>>>                         >> > can look at line ltttng_relayd log :
>>>>>             12778 and see
>>>>>                         traces from
>>>>>                         >> > appTraceNum -
>>>>>                         >> > 1 to appTraceNum - 18 including the
>>>>>             appTraceNum 2 are
>>>>>                         "re-delivered" by
>>>>>                         >> > the
>>>>>                         >> > consumerd to the relayd.
>>>>>                         >> > Essentially, I see appTraceNum 1 through
>>>>>             appTraceNum
>>>>>                         18 being delivered
>>>>>                         >> > twice, once individually where
>>>>>             appTraceNum 2 is
>>>>>                         missing and once as a
>>>>>                         >> > group
>>>>>                         >> > at line 12778 where its present.
>>>>>                         >> >
>>>>>                         >> >
>>>>>                         >> > Request help with
>>>>>                         >> > 1. why traces are delivered twice, is it
>>>>>             by design or
>>>>>                         a genuine problem
>>>>>                         >> > ?
>>>>>                         >> > 2. how to avoid traces being dropped even
>>>>>                         >> > though buffers are sufficiently large?
>>>>>                         >> >
>>>>>                         >> >
>>>>>                         >> > Regards,
>>>>>                         >> > Aravind.
>>>>>                         >> >
>>>>>                         >> >
>>>>>             _______________________________________________
>>>>>                         >> > lttng-dev mailing list
>>>>>                         >> > lttng-dev at lists.lttng.org
>>>>>             <mailto:lttng-dev at lists.lttng.org>
>>>>>                         >> >
>>>>>             http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>>>                         >> >
>>>>>                         >>
>>>>>                         >>
>>>>>                         >>
>>>>>                         >> --
>>>>>                         >> Jérémie Galarneau
>>>>>                         >> EfficiOS Inc.
>>>>>                         >> http://www.efficios.com
>>>>>                         >
>>>>>                         >
>>>>>
>>>>>
>>>>>
>>>>>                         --
>>>>>                         Jérémie Galarneau
>>>>>                         EfficiOS Inc.
>>>>>             http://www.efficios.com
>>>>>
>>>>>
>>>>>
>>>>>             _______________________________________________
>>>>>                     lttng-dev mailing list
>>>>>             lttng-dev at lists.lttng.org
>>>>>             <mailto:lttng-dev at lists.lttng.org>
>>>>>             <mailto:lttng-dev at lists.lttng.org
>>>>>             <mailto:lttng-dev at lists.lttng.org>>
>>>>>             http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 --     Jonathan Rajotte Julien
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>             _______________________________________________
>>>>>             lttng-dev mailing list
>>>>>             lttng-dev at lists.lttng.org <mailto:
>>>>> lttng-dev at lists.lttng.org>
>>>>>             http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>>>
>>>>>
>>>>>         --         Jonathan R. Julien
>>>>>         Efficios
>>>>>
>>>>>
>>>>>
>>>>>
>>>> --
>>>> Jonathan R. Julien
>>>> Efficios
>>>>
>>>>
>>>
>>> _______________________________________________
>>> lttng-dev mailing list
>>> lttng-dev at lists.lttng.org
>>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>
>>>
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com
>>>
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>>
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>

