[lttng-dev] Relayd trace drops

Tue Dec 8 14:27:47 EST 2015

Hi Aravind,

There is no README in the archive you sent.

Cheers

On 2015-12-08 07:51 AM, Aravind HT wrote:
> Hi,
>
> I am trying to upgrade in parallel, but this issue may still be
> present after I upgrade, or may only be temporarily masked. So I need
> to find the root cause first and then check whether a fix is available
> in the latest version before committing to an upgrade.
>
> There is another issue I'm hitting: the lttng list command hangs after
> lttng destroy session when running the profiling.
>
> I found that consumerd 64 goes into an infinite loop waiting to flush
> metadata in lttng_ustconsumer_recv_metadata(), at
> while (consumer_metadata_cache_flushed(channel, offset + len, timer)).
> In consumer_metadata_cache, channel->metadata_stream->endpoint_status
> is CONSUMER_ENDPOINT_ACTIVE and metadata_stream->ust_metadata_pushed is
> 0, while offset has some non-zero value. The call therefore always
> returns 1 from the last else {} block, resulting in an infinite loop.
> Searching the list archives, I found the same issue reported here:
> https://www.mail-archive.com/lttng-dev@lists.lttng.org/msg07982.html
>
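
To make the reported condition concrete, here is a small Python model of the check as it is described above. This is only a sketch under assumptions: the email confirms the final branch (return 1) and the observed state; the two earlier branches, the INACTIVE placeholder and the helper names cache_flushed and wait_for_flush are illustrative guesses, not the lttng-tools source.

```python
from dataclasses import dataclass

# Status names: ACTIVE is quoted in the report, INACTIVE is a placeholder.
ACTIVE = "CONSUMER_ENDPOINT_ACTIVE"
INACTIVE = "CONSUMER_ENDPOINT_INACTIVE"


@dataclass
class MetadataStream:
    endpoint_status: str
    ust_metadata_pushed: int


def cache_flushed(stream, offset):
    """Return 0 when the caller may stop waiting, 1 when it must keep waiting.

    Assumed shape: only the last branch is confirmed by the report.
    """
    if stream.ust_metadata_pushed >= offset:
        return 0  # everything up to offset has been pushed (assumption)
    if stream.endpoint_status != ACTIVE:
        return 0  # nothing will consume the data anymore (assumption)
    return 1  # the "last else" block from the report


def wait_for_flush(stream, offset, max_polls):
    """Bounded stand-in for the while loop: poll count, or None if stuck."""
    for polls in range(max_polls):
        if cache_flushed(stream, offset) == 0:
            return polls
    return None


if __name__ == "__main__":
    # State from the report: endpoint active, nothing pushed, offset > 0.
    stuck = MetadataStream(ACTIVE, 0)
    print(wait_for_flush(stuck, 4096, 1000))
```

With the reported state nothing inside the loop changes ust_metadata_pushed or endpoint_status, so in this model the wait can only end if another thread updates one of them.
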
> Regards,
> Aravind.
>
>
> On Tue, Dec 8, 2015 at 12:43 AM, Jonathan Rajotte 
> <jonathan.r.julien at gmail.com> wrote:
>
>     Hi Aravind,
>
>     Did you have the chance to upgrade to 2.6.1? If so, were you able
>     to reproduce the problem?
>
>     Cheers
>
>     On Mon, Dec 7, 2015 at 1:07 PM, Aravind HT <aravind.ht at gmail.com>
>     wrote:
>
>         Hi,
>
>         I have attached the complete profiling scripts here. They are a
>         bit shabby; I'm new to Python.
>
>         There is a README which has the details on how to execute it.
>         I'm using Yocto 1.6 on x86_64 platforms on both nodes.
>
>
>         Running this script when there are other sessions running
>         seems to reproduce this problem easily.
>         Please try it and let me know if you have any issues
>         reproducing the problem.
>
>         Regards,
>         Aravind.
>
>         On Sat, Dec 5, 2015 at 5:23 PM, Jérémie Galarneau
>         <jeremie.galarneau at efficios.com> wrote:
>
>             On Fri, Dec 4, 2015 at 11:06 PM, Aravind HT
>             <aravind.ht at gmail.com> wrote:
>             > I am using 2.6.0. I will try to share the code that I'm using
>             > here in some time. If there are any specific fixes that are
>             > relevant to this issue, please provide a link to them if you
>             > can. I would ideally like to try them out before trying a
>             > full upgrade to the latest versions.
>
>             Hi,
>
>             Can you provide more information on the system? Which
>             distribution, architecture, kernel version?
>
>             The verbose sessiond logs might help pinpoint any unexpected
>             behaviour here (are all applications registering as expected?).
>
>             Jérémie
>
>             >
>             > On Fri, Dec 4, 2015 at 6:11 PM, Jérémie Galarneau
>             > <jeremie.galarneau at efficios.com> wrote:
>             >>
>             >> Hi Aravind,
>             >>
>             >> Can't say I have looked at everything you sent yet, but as a
>             >> preemptive question, which version are we talking about here?
>             >> 2.6.0 or 2.6.1? 2.6.1 contains a lot of relay daemon fixes.
>             >>
>             >> Thanks,
>             >> Jérémie
>             >>
>             >> On Thu, Dec 3, 2015 at 7:01 AM, Aravind HT
>             >> <aravind.ht at gmail.com> wrote:
>             >> > Hi,
>             >> >
>             >> > I am trying to characterize the performance of lttng using
>             >> > test applications. Traces are produced on a local node and
>             >> > delivered to a relayd running on a separate node for storage.
>             >> >
>             >> > An lttng session with the test applications producing an
>             >> > initial bit rate of 10 kb/s is started and run for about 30
>             >> > seconds. The starting sub-buffer size is kept at 128 kb and
>             >> > the sub-buffer count at 4. The session is then stopped and
>             >> > destroyed, and the traces are analyzed to see if there are
>             >> > any drops. This is done in a loop, with every subsequent
>             >> > session having an increment of 2 kb/s as long as there are
>             >> > no drops. If there are drops, I double the buffer size
>             >> > without incrementing the bit rate.
>             >> >
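
For readers following along, the loop described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the attached script: run_session is a hypothetical callable standing in for "create the session, trace for about 30 seconds, stop, destroy, analyze", and the drop rule in fake_run_session is invented purely so the example runs.

```python
def sweep(run_session, start_rate_kbps=10, rate_step_kbps=2,
          start_subbuf_kb=128, subbuf_count=4, max_sessions=20):
    """Repeat sessions, raising the rate on success and doubling buffers on drops.

    run_session(rate_kbps, subbuf_kb, subbuf_count) is a hypothetical hook that
    must return True when the resulting trace has dropped events.
    Returns a list of (rate_kbps, subbuf_kb, dropped) tuples, one per session.
    """
    rate, subbuf = start_rate_kbps, start_subbuf_kb
    history = []
    for _ in range(max_sessions):
        dropped = run_session(rate, subbuf, subbuf_count)
        history.append((rate, subbuf, dropped))
        if dropped:
            subbuf *= 2             # drops: double the sub-buffer size, same rate
        else:
            rate += rate_step_kbps  # no drops: next session is 2 kb/s faster
    return history


def fake_run_session(rate_kbps, subbuf_kb, subbuf_count):
    """Made-up drop rule for demonstration only; not a model of lttng."""
    return rate_kbps > subbuf_kb / 10


if __name__ == "__main__":
    for step in sweep(fake_run_session, max_sessions=4):
        print(step)
```

Keeping the rate and the buffer size as the only two variables makes each session's outcome attributable to a single change, which is what the described procedure relies on.
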
>             >> > I see trace drops happening consistently with test apps
>             >> > producing traces at less than 40 kb/s; it doesn't seem to
>             >> > help even if I start with 1 mb x 4 sub-buffers.
>             >> >
>             >> > Analysis:
>             >> >
>             >> > I have attached the lttng_relayd and lttng_consumerd_64 logs
>             >> > and the entire trace directory; I hope you will be able to
>             >> > view them. I have modified the lttng_relayd code to dump the
>             >> > traces being captured into the lttng_relayd logs along with
>             >> > debug info.
>             >> >
>             >> > Each test app is producing logs in the form of:
>             >> > "TraceApp PID - 31940 THID - 31970 @threadRate - 1032 b/s
>             >> > appRate - 2079 b/s threadTraceNum - 9 appTraceNum - 18
>             >> > sleepTime - 192120"
>             >> >
>             >> > The test application PID, test application thread ID, thread
>             >> > bit rate, test app bit rate, thread trace number and
>             >> > application trace number are part of the trace. So in the
>             >> > above trace, the thread is producing at 1 kb/s and the whole
>             >> > test app is producing at 2 kb/s.
>             >> >
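
As an aside, the record format quoted above is regular enough to pull apart with a regular expression. The following Python sketch assumes the fields always appear in this order, separated by whitespace; RECORD_RE, parse_record and the dictionary keys are names chosen here for illustration.

```python
import re

# Field order and separators are taken from the sample record in the email.
RECORD_RE = re.compile(
    r"TraceApp PID - (?P<pid>\d+)\s+THID - (?P<thid>\d+)\s+"
    r"@threadRate - (?P<thread_rate>\d+) b/s\s+"
    r"appRate - (?P<app_rate>\d+) b/s\s+"
    r"threadTraceNum - (?P<thread_trace_num>\d+)\s+"
    r"appTraceNum - (?P<app_trace_num>\d+)\s+"
    r"sleepTime - (?P<sleep_time>\d+)"
)


def parse_record(text):
    """Return the record's fields as a dict of ints, or None if no match."""
    match = RECORD_RE.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    sample = ("TraceApp PID - 31940 THID - 31970 @threadRate - 1032 b/s "
              "appRate - 2079 b/s threadTraceNum - 9 appTraceNum - 18  "
              "sleepTime - 192120")
    print(parse_record(sample))
```

Using search rather than a full-line match lets the same pattern work on babeltrace output or relayd log lines that carry extra text around the record.
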
>             >> > If we look at the babeltrace output, we see that the trace
>             >> > with TraceApp PID - 31940 appTraceNum 2 is missing, with 1,
>             >> > 3, 4, 5 and so on being successfully captured.
>             >> > I looked at the lttng_relayd logs and found that the trace
>             >> > with "appTraceNum 2" is not delivered/generated by the
>             >> > consumerd to the relayd in sequence with the other traces.
>             >> > To rule out a test application problem, you can look at line
>             >> > 12778 of the lttng_relayd log and see that the traces from
>             >> > appTraceNum - 1 to appTraceNum - 18, including appTraceNum
>             >> > 2, are "re-delivered" by the consumerd to the relayd.
>             >> > Essentially, I see appTraceNum 1 through appTraceNum 18
>             >> > being delivered twice: once individually, where appTraceNum
>             >> > 2 is missing, and once as a group at line 12778, where it is
>             >> > present.
>             >> >
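
The check described above (which appTraceNum values never arrive, and which arrive more than once) can be automated along these lines. A minimal Python sketch, assuming per-application numbering starts at 1 and that the numbers have already been extracted for a single PID; missing_and_repeated is a name invented for this example.

```python
from collections import Counter


def missing_and_repeated(trace_nums):
    """Return (missing, repeated) sequence numbers for one application.

    Assumes numbering starts at 1 and that the highest number seen is the
    last one produced, so a dropped final record cannot be detected here.
    """
    counts = Counter(trace_nums)
    if not counts:
        return [], []
    expected = range(1, max(counts) + 1)
    missing = [n for n in expected if n not in counts]
    repeated = sorted(n for n, seen in counts.items() if seen > 1)
    return missing, repeated


if __name__ == "__main__":
    individual = [1] + list(range(3, 19))   # 2 never shows up on its own
    group = list(range(1, 19))              # later batch with 1..18
    print(missing_and_repeated(individual))
    print(missing_and_repeated(individual + group))
```

Running it separately on the individually delivered records and on the combined log distinguishes a record that is truly lost from one that only shows up in the later batch.
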
>             >> >
>             >> > I would like help with:
>             >> > 1. Why are traces delivered twice? Is it by design or a
>             >> > genuine problem?
>             >> > 2. How can I avoid traces being dropped even though the
>             >> > buffers are sufficiently large?
>             >> >
>             >> >
>             >> > Regards,
>             >> > Aravind.
>             >> >
>             >> > _______________________________________________
>             >> > lttng-dev mailing list
>             >> > lttng-dev at lists.lttng.org
>             >> > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>             >> >
>             >>
>             >>
>             >>
>             >> --
>             >> Jérémie Galarneau
>             >> EfficiOS Inc.
>             >> http://www.efficios.com
>             >
>             >
>
>
>
>             --
>             Jérémie Galarneau
>             EfficiOS Inc.
>             http://www.efficios.com
>
>
>
>
>
>
>
>     -- 
>     Jonathan Rajotte Julien
>
>
>
>

-- 
Jonathan R. Julien
Efficios