[lttng-dev] Relayd trace drops

Jonathan Rajotte jonathan.r.julien at gmail.com
Mon Dec 7 14:13:25 EST 2015


Hi Aravind,

Did you have the chance to upgrade to 2.6.1? If so, were you able to
reproduce?

Cheers

On Mon, Dec 7, 2015 at 1:07 PM, Aravind HT <aravind.ht at gmail.com> wrote:

> Hi,
>
> I have attached the complete profiling scripts here; it's a bit shabby, I'm
> new to Python.
>
> There is a README which has the details on how to execute it.
> I'm using Yocto 1.6 on x86_64 platforms on both nodes.
>
>
> Running this script when there are other sessions running seems to
> reproduce this problem easily.
> Please try it and let me know if you have any issues reproducing the
> problem.
>
> Regards,
> Aravind.
>
> On Sat, Dec 5, 2015 at 5:23 PM, Jérémie Galarneau <
> jeremie.galarneau at efficios.com> wrote:
>
>> On Fri, Dec 4, 2015 at 11:06 PM, Aravind HT <aravind.ht at gmail.com> wrote:
>> > I am using 2.6.0. I will try to share the code that I'm using here in
>> > some time. If there are any specific fixes that are relevant to this
>> > issue, see if you can provide a link to them. I would ideally like to
>> > try them out before trying a full upgrade to the latest versions.
>>
>> Hi,
>>
>> Can you provide more information on the system? Which distribution,
>> architecture, kernel version?
>>
>> The verbose sessiond logs might help pinpoint any unexpected behaviour
>> here (are all applications registering as expected?).
>>
>> Jérémie
>>
>> >
>> > On Fri, Dec 4, 2015 at 6:11 PM, Jérémie Galarneau
>> > <jeremie.galarneau at efficios.com> wrote:
>> >>
>> >> Hi Aravind,
>> >>
>> >> Can't say I have looked at everything you sent yet, but as a
>> >> preemptive question, which version are we talking about here? 2.6.0 or
>> >> 2.6.1? 2.6.1 contains a lot of relay daemon fixes.
>> >>
>> >> Thanks,
>> >> Jérémie
>> >>
>> >> On Thu, Dec 3, 2015 at 7:01 AM, Aravind HT <aravind.ht at gmail.com>
>> wrote:
>> >> > Hi,
>> >> >
>> >> > I am trying to obtain the performance characteristics of lttng with the
>> >> > use of test applications. Traces are being produced on a local node and
>> >> > delivered to relayd that is running on a separate node for storage.
>> >> >
>> >> > An lttng session with the test applications producing an initial bit
>> >> > rate of 10 kb/s is started and run for about 30 seconds. The starting
>> >> > sub-buffer size is kept at 128 kb and the sub-buffer count at 4. The
>> >> > session is then stopped and destroyed, and the traces are analyzed to
>> >> > see if there are any drops. This is done in a loop, with every
>> >> > subsequent session having an increment of 2 kb/s as long as there are
>> >> > no drops. If there are drops, I double the buffer size without
>> >> > incrementing the bit rate.
>> >> >
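>> >> > In case it helps, the session loop can be driven with the lttng CLI
>> >> > roughly like this (a simplified sketch, not the attached script itself;
>> >> > the relayd URL and the start_test_apps / traces_dropped helpers are
>> >> > placeholders):
>> >> >
>> >> > import subprocess
>> >> > import time
>> >> >
>> >> > def run_session(rate_bps, subbuf_kb, duration=30):
>> >> >     # Create a session that streams to the remote relayd.
>> >> >     subprocess.check_call(["lttng", "create", "perf-test",
>> >> >                            "--set-url", "net://<relayd-host>"])
>> >> >     # Userspace channel with the sub-buffer geometry under test.
>> >> >     subprocess.check_call(["lttng", "enable-channel", "-u", "ch0",
>> >> >                            "--subbuf-size", "%dk" % subbuf_kb,
>> >> >                            "--num-subbuf", "4"])
>> >> >     subprocess.check_call(["lttng", "enable-event", "-u", "-a",
>> >> >                            "-c", "ch0"])
>> >> >     subprocess.check_call(["lttng", "start"])
>> >> >     start_test_apps(rate_bps)   # placeholder: spawn the trace apps
>> >> >     time.sleep(duration)
>> >> >     subprocess.check_call(["lttng", "stop"])
>> >> >     subprocess.check_call(["lttng", "destroy", "perf-test"])
>> >> >
>> >> > rate_bps, subbuf_kb = 10 * 1024, 128
>> >> > while True:
>> >> >     run_session(rate_bps, subbuf_kb)
>> >> >     if traces_dropped("perf-test"):  # placeholder: scan traces for gaps
>> >> >         subbuf_kb *= 2               # drops: double the sub-buffer size
>> >> >     else:
>> >> >         rate_bps += 2 * 1024         # no drops: bump bit rate by 2 kb/s
>> >> >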
>> >> > I see trace drops happening consistently with test apps producing
>> >> > traces at less than 40 kb/s; it doesn't seem to help even if I start
>> >> > with 1 MB x 4 sub-buffers.
>> >> >
>> >> > Analysis :
>> >> >
>> >> > I have attached the lttng_relayd and lttng_consumerd_64 logs and the
>> >> > entire trace directory; I hope you will be able to view them.
>> >> > I have modified the lttng_relayd code to dump the traces being captured
>> >> > in the lttng_relayd logs along with debug info.
>> >> >
>> >> > Each test app is producing logs in the form of:
>> >> > "TraceApp PID - 31940 THID - 31970 @threadRate - 1032 b/s appRate - 2079 b/s
>> >> > threadTraceNum - 9 appTraceNum - 18 sleepTime - 192120"
>> >> >
>> >> > The test application PID, test application thread id, thread bit rate,
>> >> > test app bit rate, thread trace number and application trace number are
>> >> > part of the trace. So in the above trace, the thread is producing at
>> >> > 1 kb/s and the whole test app is producing at 2 kb/s.
>> >> >
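>> >> > The drops themselves can be spotted by scanning the babeltrace text
>> >> > output for gaps in the per-PID appTraceNum sequence, roughly like this
>> >> > (a simplified sketch; the output file name is a placeholder):
>> >> >
>> >> > import re
>> >> > from collections import defaultdict
>> >> >
>> >> > # Pull the test app PID and appTraceNum out of each trace line.
>> >> > pattern = re.compile(r"TraceApp PID - (\d+) .*appTraceNum - (\d+)")
>> >> >
>> >> > seen = defaultdict(set)
>> >> > with open("babeltrace_output.txt") as f:  # saved babeltrace output
>> >> >     for line in f:
>> >> >         m = pattern.search(line)
>> >> >         if m:
>> >> >             pid, num = int(m.group(1)), int(m.group(2))
>> >> >             seen[pid].add(num)
>> >> >
>> >> > for pid, nums in sorted(seen.items()):
>> >> >     missing = sorted(set(range(1, max(nums) + 1)) - nums)
>> >> >     if missing:
>> >> >         print("PID %d is missing appTraceNum(s): %s" % (pid, missing))
>> >> >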
>> >> > If we look at the babeltrace output, we see that the trace with TraceApp
>> >> > PID - 31940 appTraceNum 2 is missing, with 1, 3, 4, 5 and so on being
>> >> > successfully captured.
>> >> > I looked at the lttng_relayd logs and found that the trace with
>> >> > "appTraceNum - 2" is not delivered/generated by the consumerd to the
>> >> > relayd in sequence with the other traces. To rule out a test application
>> >> > problem, you can look at line 12778 of the lttng_relayd log and see that
>> >> > traces from appTraceNum - 1 to appTraceNum - 18, including appTraceNum 2,
>> >> > are "re-delivered" by the consumerd to the relayd.
>> >> > Essentially, I see appTraceNum 1 through appTraceNum 18 being delivered
>> >> > twice: once individually, where appTraceNum 2 is missing, and once as a
>> >> > group at line 12778, where it is present.
>> >> >
>> >> >
>> >> > Requesting help with:
>> >> > 1. Why are traces delivered twice? Is it by design or a genuine problem?
>> >> > 2. How do we avoid traces being dropped even though the buffers are
>> >> > sufficiently large?
>> >> >
>> >> >
>> >> > Regards,
>> >> > Aravind.
>> >> >
>> >> > _______________________________________________
>> >> > lttng-dev mailing list
>> >> > lttng-dev at lists.lttng.org
>> >> > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jérémie Galarneau
>> >> EfficiOS Inc.
>> >> http://www.efficios.com
>> >
>> >
>>
>>
>>
>> --
>> Jérémie Galarneau
>> EfficiOS Inc.
>> http://www.efficios.com
>>
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
>


-- 
Jonathan Rajotte Julien

