[lttng-dev] Relayd trace drops

Aravind HT aravind.ht at gmail.com
Mon Dec 7 13:07:24 EST 2015


Hi,

I have attached the complete profiling scripts here. They are a bit rough,
as I'm new to Python.

There is a README with details on how to run them.
I'm using Yocto 1.6 on x86_64 platforms on both nodes.


Running this script when there are other sessions running seems to
reproduce this problem easily.
Please try it and let me know if you have any issues reproducing the
problem.

Regards,
Aravind.

On Sat, Dec 5, 2015 at 5:23 PM, Jérémie Galarneau
<jeremie.galarneau at efficios.com> wrote:

> On Fri, Dec 4, 2015 at 11:06 PM, Aravind HT <aravind.ht at gmail.com> wrote:
> > I am using 2.6.0. I will try to share the code that I'm using here in
> > some time. If there are any specific fixes that are relevant to this
> > issue, please provide a link to them. I would ideally like to try them
> > out before trying a full upgrade to the latest versions.
>
> Hi,
>
> Can you provide more information on the system? Which distribution,
> architecture, kernel version?
>
> The verbose sessiond logs might help pinpoint any unexpected behaviour
> here (are all applications registering as expected?).
>
> Jérémie
>
> >
> > On Fri, Dec 4, 2015 at 6:11 PM, Jérémie Galarneau
> > <jeremie.galarneau at efficios.com> wrote:
> >>
> >> Hi Aravind,
> >>
> >> Can't say I have looked at everything you sent yet, but as a
> >> preemptive question, which version are we talking about here? 2.6.0 or
> >> 2.6.1? 2.6.1 contains a lot of relay daemon fixes.
> >>
> >> Thanks,
> >> Jérémie
> >>
> >> On Thu, Dec 3, 2015 at 7:01 AM, Aravind HT <aravind.ht at gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I am trying to obtain the performance characteristics of lttng using
> >> > test applications. Traces are produced on a local node and delivered
> >> > to a relayd running on a separate node for storage.
> >> >
> >> > An lttng session, with the test applications producing an initial bit
> >> > rate of 10 kb/s, is started and run for about 30 seconds. The starting
> >> > sub-buffer size is 128 kb and the sub-buffer count is 4. The session is
> >> > then stopped and destroyed, and the traces are analyzed to see if there
> >> > are any drops. This is done in a loop, with every subsequent session
> >> > incrementing the bit rate by 2 kb/s as long as there are no drops. If
> >> > there are drops, I double the buffer size without incrementing the bit
> >> > rate.
> >> >
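
For readers following along, the search loop described above can be sketched
roughly as below. This is not the attached script, only a minimal
illustration of the stepping rule. The names next_step, sweep and
run_session are hypothetical, and the stop limits (max_rate_kbps,
max_subbuf_kb) are assumptions added so the loop terminates; the description
above does not state any. run_session stands in for "create the session, run
the test apps for about 30 seconds, stop, destroy, and check the trace for
drops".

```python
from typing import Callable, List, Tuple


def next_step(rate_kbps: int, subbuf_kb: int, dropped: bool,
              rate_step_kbps: int = 2) -> Tuple[int, int]:
    """Return the (rate, sub-buffer size) to use for the next session.

    On drops: double the sub-buffer size and keep the rate.
    Otherwise: keep the sub-buffer size and raise the rate by one step.
    """
    if dropped:
        return rate_kbps, subbuf_kb * 2
    return rate_kbps + rate_step_kbps, subbuf_kb


def sweep(run_session: Callable[[int, int, int], bool],
          start_rate_kbps: int = 10,
          start_subbuf_kb: int = 128,
          num_subbuf: int = 4,
          max_rate_kbps: int = 40,      # assumed stop limit
          max_subbuf_kb: int = 1024     # assumed stop limit
          ) -> List[Tuple[int, int, bool]]:
    """Run sessions until a limit is exceeded; return the per-session history.

    run_session(rate_kbps, subbuf_kb, num_subbuf) is a hypothetical callable
    that runs one tracing session and returns True if events were dropped.
    """
    history = []
    rate, subbuf = start_rate_kbps, start_subbuf_kb
    while rate <= max_rate_kbps and subbuf <= max_subbuf_kb:
        dropped = run_session(rate, subbuf, num_subbuf)
        history.append((rate, subbuf, dropped))
        rate, subbuf = next_step(rate, subbuf, dropped)
    return history
```

Passing the session runner in as a callable keeps the stepping rule testable
without a relayd or any test applications running.
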
> >> > I see trace drops happening consistently with test apps producing
> >> > traces at less than 40 kb/s. It doesn't seem to help even if I start
> >> > with 1 mb x 4 sub-buffers.
> >> >
> >> > Analysis:
> >> >
> >> > I have attached the lttng_relayd and lttng_consumerd_64 logs and the
> >> > entire trace directory; I hope you will be able to view them.
> >> > I have modified the lttng_relayd code to dump the traces being captured
> >> > into the lttng_relayd logs along with debug info.
> >> >
> >> > Each test app produces logs of the form:
> >> > "TraceApp PID - 31940 THID - 31970 @threadRate - 1032 b/s appRate - 2079 b/s
> >> > threadTraceNum - 9 appTraceNum - 18  sleepTime - 192120"
> >> >
> >> > The test application PID, thread ID, thread bit rate, application bit
> >> > rate, thread trace number and application trace number are part of
> >> > the trace. So in the above trace, the thread is producing at 1 kb/s
> >> > and the whole test app is producing at 2 kb/s.
> >> >
> >> > If we look at the babeltrace output, we see that the trace with
> >> > TraceApp PID - 31940 appTraceNum 2 is missing, while 1, 3, 4, 5 and so
> >> > on are successfully captured.
> >> > I looked at the lttng_relayd logs and found that the trace with
> >> > "appTraceNum 2" is not delivered/generated by the consumerd to the
> >> > relayd in sequence with the other traces. To rule out a test
> >> > application problem, you can look at line 12778 of the lttng_relayd
> >> > log and see that traces appTraceNum - 1 through appTraceNum - 18,
> >> > including appTraceNum 2, are "re-delivered" by the consumerd to the
> >> > relayd.
> >> > Essentially, I see appTraceNum 1 through appTraceNum 18 being
> >> > delivered twice: once individually, where appTraceNum 2 is missing,
> >> > and once as a group at line 12778, where it is present.
> >> >
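
The gap and duplicate check described above can be sketched as follows. This
is only an illustration, not the attached script: the regular expression is
an assumption based on the sample payload quoted earlier, the function names
are hypothetical, and it assumes appTraceNum starts at 1 for each PID. It
works on any iterable of text lines that contain the payload (babeltrace
output or the modified relayd log), and it cannot detect events missing after
the highest appTraceNum seen.

```python
import re
from collections import Counter, defaultdict

# Assumed payload layout, taken from the sample line in this thread:
# "TraceApp PID - 31940 THID - 31970 ... appTraceNum - 18  sleepTime - ..."
PAYLOAD_RE = re.compile(
    r"TraceApp PID - (?P<pid>\d+) THID - (?P<tid>\d+)"
    r".*?appTraceNum - (?P<app_num>\d+)"
)


def app_trace_counts(lines):
    """Count how often each appTraceNum appears, grouped by application PID."""
    counts = defaultdict(Counter)
    for line in lines:
        match = PAYLOAD_RE.search(line)
        if match:
            counts[int(match["pid"])][int(match["app_num"])] += 1
    return counts


def missing_and_duplicated(counts):
    """Map each PID to (missing appTraceNums, appTraceNums seen more than once).

    "Missing" means absent from 1..max seen for that PID.
    """
    report = {}
    for pid, seen in counts.items():
        expected = range(1, max(seen) + 1)
        missing = [n for n in expected if n not in seen]
        duplicated = sorted(n for n, c in seen.items() if c > 1)
        report[pid] = (missing, duplicated)
    return report
```

Run against the babeltrace output, the missing list should show the gap; run
against the relayd log dump, the duplicated list should show which events
were delivered more than once.
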
> >> >
> >> > Request help with:
> >> > 1. Why are traces delivered twice? Is it by design or a genuine
> >> > problem?
> >> > 2. How can I avoid traces being dropped even though the buffers are
> >> > sufficiently large?
> >> >
> >> >
> >> > Regards,
> >> > Aravind.
> >> >
> >> > _______________________________________________
> >> > lttng-dev mailing list
> >> > lttng-dev at lists.lttng.org
> >> > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> >> >
> >>
> >>
> >>
> >> --
> >> Jérémie Galarneau
> >> EfficiOS Inc.
> >> http://www.efficios.com
> >
> >
>
>
>
> --
> Jérémie Galarneau
> EfficiOS Inc.
> http://www.efficios.com
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Lttng_dev_mailing_scripts.rar
Type: application/rar
Size: 5726 bytes
Desc: not available
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20151207/2e6e63c1/attachment.rar>