[lttng-dev] Babeltrace performance issue in live-reading mode

liguang li liguang.lee6 at gmail.com
Wed Oct 11 10:44:25 UTC 2017


Hi,


On Thu, Sep 21, 2017 at 11:03 PM, Jonathan Rajotte Julien <
Jonathan.rajotte-julien at efficios.com> wrote:

> Hi,
>
> On 2017-09-21 05:34 AM, liguang li wrote:
> > Hi,
> >
> > On Wed, Sep 20, 2017 at 10:58 PM, Jonathan Rajotte Julien <
> > Jonathan.rajotte-julien at efficios.com> wrote:
> >
> >     Hi,
> >
> >     On 2017-09-20 05:12 AM, liguang li wrote:
> >     >
> >     >
> >     > On Tue, Sep 19, 2017 at 10:57 PM, Jonathan Rajotte-Julien <
> >     > jonathan.rajotte-julien at efficios.com> wrote:
> >     >
> >     >     On Tue, Sep 19, 2017 at 03:53:27PM +0800, liguang li wrote:
> >     >     >    Hi,
> >     >     >    On Mon, Sep 18, 2017 at 11:18 PM, Jonathan Rajotte-Julien
> >     >     >    <jonathan.rajotte-julien at efficios.com> wrote:
> >     >     >
> >     >     >      Hi,
> >     >     >      On Mon, Sep 18, 2017 at 11:32:07AM +0800, liguang li
> >     >     >      wrote:
> >     >     >      >    Hi,
> >     >     >      >
> >     >     >      >    Create a session in live-reading mode and run an
> >     >     >      >    application which has a very high event throughput,
> >     >     >      >    then print the events with babeltrace. We found that
> >     >     >      >    the live trace viewer is showing events from a few
> >     >     >      >    seconds ago, and as time
> >     >     >
> >     >     >      Could you provide us with the versions used for
> >     >     >      babeltrace, lttng-tools and lttng-ust?
> >     >     >
> >     >     >     Babeltrace: 1.5.1
> >     >
> >     >     Update to babeltrace 1.5.3.
> >     >
> >     >     >     Lttng-tools: 2.8.6
> >     >
> >     >     Update to lttng-tools 2.8.8
> >     >
> >     >     >     Lttng-ust: 2.8.2
> >     >     >
> >     >     >      >    goes on, the delay will be bigger and bigger.
> >     >     >
> >     >     >      A similar issue was observed a couple of months ago,
> >     >     >      which involved multiple delayed-ack problems during
> >     >     >      communication between lttng-relayd and babeltrace.
> >     >     >
> >     >     >      The following fixes were merged:
> >     >     >
> >     >     >      [1] https://github.com/lttng/lttng-tools/commit/b6025e9476332b75eb8184345c3eb3e924780088
> >     >     >      [2] https://github.com/efficios/babeltrace/commit/de417d04317ca3bc30f59685a9d19de670e4b11d
> >     >     >      [3] https://github.com/efficios/babeltrace/commit/4594dbd8f7c2af2446a3e310bee74ba4a2e9d648
> >     >     >
> >     >     >      In the event that you are already using an updated
> >     >     >      version of babeltrace and lttng-tools, it would be
> >     >     >      pertinent to provide us with a simple reproducer so we
> >     >     >      can assess the issue.
> >     >
> >     >     The version you are using does not include the mentioned
> >     >     fixes. Please update and redo your experiment.
> >     >
> >     >
> >     > I tested this with the versions you listed; the issue still
> >     > exists.
> >
> >     Given that previous versions had a major timing problem there, I
> >     would expect to see some improvement.
> >
> >     In that case, we will need a lot more information on your
> >     benchmarking strategy. We will need a simple reproducer, your
> >     benchmark code (R, gnuplot, etc.) and your overall methodology to be
> >     able to reproduce the issue locally. Otherwise, it will be very hard
> >     to come to any conclusion.
> >
> >
> > Sorry, we cannot provide detailed information about the benchmark code
> > due to company regulations.
> >
> >
> >     >
> >     >
> >     >     Cheers
> >     >
> >     >     >
> >     >     >
> >     >     >     Steps:
> >     >     >     lttng create session --live -U net://*
> >     >     >     lttng enable-channel -s session -u ch1
> >     >     >     lttng enable-event -s session -c ch1 -u -a
> >     >     >     lttng start
> >     >     >
> >     >     >     Run a high event throughput application, which is a
> >     >     >     multithreaded application.
> >     >
> >     > In the multithreaded application, each tracepoint carries the
> >     > system wall-clock time, so we can easily reproduce this issue by
> >     > comparing the recorded event's time with the current system wall
> >     > time.
> >     >
> >     >
> >     >     >     babeltrace -i lttng-live net://*
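
Putting the steps above together, the setup looks roughly like this. The
session, channel and host names below (mysession, ch1, relayd-host,
target-host) are placeholders, not the real values from our environment:

    # create a live session streaming to the relay daemon
    lttng create mysession --live -U net://relayd-host
    lttng enable-channel -s mysession -u ch1
    lttng enable-event -s mysession -c ch1 -u -a
    lttng start mysession
    # run the multithreaded workload, then attach the live viewer;
    # "babeltrace -i lttng-live net://relayd-host" alone lists the
    # sessions available on the relay daemon
    babeltrace -i lttng-live net://relayd-host/host/target-host/mysession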
> >     >     >
> >     >     >     After a while, we found that the timestamp of the event
> >     >     >     printed by babeltrace is different from the time on the
> >     >     >     host which runs the application, and the delay gets
> >     >     >     bigger and bigger as time goes on.
> >     >     >
> >     >     >
> >     >     >      Cheers
> >     >     >      >    I checked the source code and found that Babeltrace
> >     >     >      >    in live-reading mode reads the recorded events from
> >     >     >      >    the CTF files, then parses and prints them in a
> >     >     >      >    single thread. The process is a little slow; do you
> >     >     >      >    have any ideas to improve it?
> >     >
> >     >
> >     > From my understanding of the source code, the process of parsing
> >     > and printing events consumes a lot of time. For example, the
> >     > multithreaded application will use 3 CPUs, so in a given time
> >     > interval 3 sub-buffers will be filled, sent to the lttng-relayd
> >     > daemon and recorded into the CTF files. If in that same interval
> >     > babeltrace only handles the events of 2 sub-buffers, then the issue
> >     > happens: the backlog grows by one sub-buffer per interval.
> >
> >     Did you perform a bisection of where the delay comes from? Reception
> >     of packets? Formatting of events?
>
> ?
>
> >     What is the throughput of the application?
>
>
> ?
> This is a key element if anyone wants to test at home. It can be
> expressed in events/s or events/ms.
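
One rough way to get this number from a bounded on-disk run (the trace
path below is a placeholder, and the duration should be the length of
the capture window): babeltrace prints one line per event, so counting
lines and dividing by the capture duration gives an approximate rate.

    # count events in the recorded trace and derive an event rate
    EVENTS=$(babeltrace /path/to/streamed/trace | wc -l)
    DURATION=60   # seconds the workload was traced
    echo "$((EVENTS / DURATION)) events/s"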
>
> >     How many tracepoint definitions are there?
> >     Does babeltrace catch up if a quiescent period is given?
> >
> >
> > I think so.
> >
> >
> >     Could you provide us with statistics, timing data, etc.?
> >
> >     What type of delay are we talking about?
> >
> >
> > For example, the system wall time is 5:18 PM now, but Babeltrace is
> > still printing events which have a timestamp of 5:10 PM. After the
> > application is stopped, Babeltrace still needs some time to print the
> > remaining recorded events.
> >
> > I think the root cause is that the parsing and printing process is a
> > bit slow, so I want to know whether there is any method to improve the
> > performance of this process.
>
> A simple experiment would be to run a bounded workload to perform a
> quick timing evaluation of babeltrace with a streamed trace (on disk)
> and in live mode. Calculate the time it takes babeltrace to read the
> on-disk trace (time babeltrace path_to_streamed_trace) and perform the
> same experiment with a live trace. It is important to bound your
> experiment and have mostly the same workload in both experiments.
>
>
I made a simple example to illustrate the issue I met.

I ran the application for about 60 seconds and stored the trace on disk.
Reading that trace back with babeltrace takes about 150 seconds, so the
processing speed is much lower than the speed at which the trace is
generated.
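
Roughly, the comparison follows the bounded experiment suggested above
(the trace path, host and session names are placeholders, not our real
values):

    # time how long babeltrace needs to read the ~60 s on-disk trace;
    # redirecting stdout avoids also measuring terminal rendering
    time babeltrace /path/to/streamed/trace > /dev/null

    # same bounded workload, but read through lttng-relayd in live mode;
    # babeltrace should exit once the live session has been destroyed
    time babeltrace -i lttng-live \
        net://relayd-host/host/target-host/mysession > /dev/null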

I think the performance of babeltrace in live mode will not be greater
than when reading the trace on disk. So in live mode the timestamps of
the events will lag (babeltrace prints events whose timestamps are a few
minutes old), and I would like to know whether there are any methods to
improve the performance of babeltrace in live mode.

Regards,
Liguang


> This will give us a rough estimate of the disparity between both scenarios.
>
> Cheers
>