[lttng-dev] Babeltrace performance issue in live-reading mode
Jonathan Rajotte Julien
Jonathan.rajotte-julien at efficios.com
Thu Sep 21 15:03:51 UTC 2017
Hi,
On 2017-09-21 05:34 AM, liguang li wrote:
> Hi,
>
> On Wed, Sep 20, 2017 at 10:58 PM, Jonathan Rajotte Julien <Jonathan.rajotte-julien at efficios.com> wrote:
>
> Hi,
>
> On 2017-09-20 05:12 AM, liguang li wrote:
> >
> >
> > > On Tue, Sep 19, 2017 at 10:57 PM, Jonathan Rajotte-Julien <jonathan.rajotte-julien at efficios.com> wrote:
> >
> > On Tue, Sep 19, 2017 at 03:53:27PM +0800, liguang li wrote:
> > > Hi,
> > > On Mon, Sep 18, 2017 at 11:18 PM, Jonathan Rajotte-Julien
> > > <jonathan.rajotte-julien at efficios.com> wrote:
> > >
> > > Hi,
> > > On Mon, Sep 18, 2017 at 11:32:07AM +0800, liguang li wrote:
> > > > Hi,
> > > >
> > > > Create a session in live-reading mode, run an application with a very high
> > > > event throughput, then print the events with babeltrace. We found the live
> > > > trace viewer is viewing events from a few seconds ago, and as time
> > >
> > > Could you provide us the versions used for babeltrace, lttng-tools and
> > > lttng-ust?
> > >
> > > Babeltrace: 1.5.1
> >
> > Update to babeltrace 1.5.3.
> >
> > > Lttng-tools: 2.8.6
> >
> > Update to lttng-tools 2.8.8
> >
> > > Lttng-ust: 2.8.2
> > >
> > > > goes on, the delay gets bigger and bigger.
> > >
> > > A similar issue was observed a couple of months ago, caused by multiple
> > > delayed-ack problems during communication between lttng-relayd and babeltrace.
> > >
> > > The following fixes were merged:
> > >
> > > [1] https://github.com/lttng/lttng-tools/commit/b6025e9476332b75eb8184345c3eb3e924780088
> > > [2] https://github.com/efficios/babeltrace/commit/de417d04317ca3bc30f59685a9d19de670e4b11d
> > > [3] https://github.com/efficios/babeltrace/commit/4594dbd8f7c2af2446a3e310bee74ba4a2e9d648
> > >
> > > In the event that you are already using an updated version of babeltrace
> > > and
> > > lttng-tools, it would be pertinent to provide us with a simple
> > > reproducer so we
> > > can assess the issue.
> >
> > The version you are using does not include the mentioned fixes. Please update
> > and redo your experiment.
> >
> >
> > We tested this with the versions you listed; the issue still exists.
>
> Given that previous versions had a major timing problem there, I would expect to see some improvement.
>
> In that case, we will need a lot more information on your benchmarking strategy.
> We will need a simple reproducer, your benchmark code (R, gnuplot, etc.) and your overall methodology
> to be able to reproduce the issue locally. Otherwise, it will be very hard
> to come to any conclusion.
>
>
> Sorry, we cannot provide detailed information about the benchmark code due to company regulations.
>
>
> >
> >
> > Cheers
> >
> > >
> > >
> > > Steps:
> > > lttng create session --live -U net://*
> > > lttng enable-channel -s session -u ch1
> > > lttng enable-event -s session -c ch1 -u -a
> > > lttng start
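(For anyone who wants to try this at home, a fully spelled-out live setup could look
roughly like the following; "mysession", "ch1" and localhost are placeholders, since
the actual net://* target is elided above:)

    lttng-relayd -d
    lttng create mysession --live -U net://localhost
    lttng enable-channel -u -s mysession ch1
    lttng enable-event -u -a -s mysession -c ch1
    lttng start mysession
    # attach the live viewer to the session exposed by lttng-relayd
    babeltrace -i lttng-live net://localhost/host/$(hostname)/mysession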
> > >
> > > Run a high event throughput application, which is a multithreaded
> > > application.
> >
> > In the multithreaded application, each tracepoint records the system wall-clock
> > time, so we can easily reproduce this issue by comparing the timestamp of the
> > recorded event with the current system wall time.
> >
> >
> > > babeltrace -i lttng-live net://*
> > >
> > > After a while, we found the timestamp of the events printed by babeltrace is
> > > different from the current time on the host running the application, and the
> > > delay gets bigger and bigger as time goes on.
> > >
> > >
> > > Cheers
> > > > I checked the source code and found that Babeltrace in live-reading mode
> > > > reads the recorded events from the CTF files, then parses and prints them
> > > > in a single thread. This process is a little slow; do you have any ideas to
> > > > improve it?
> >
> >
> > From my understanding of the source code, parsing and printing the events
> > consumes a lot of time. For example, if the multithreaded application uses 3
> > CPUs, then within a given time interval 3 sub-buffers are filled, sent to the
> > lttng-relayd daemon and recorded into the CTF files. If, in that same interval,
> > babeltrace only handles the events of 2 sub-buffers, the issue appears.
>
> Did you perform a bisection of where the delay comes from? Reception of packets? Formatting of events?
> What is the throughput of the application?
This is a key element if anyone wants to test this at home. It can be expressed in events/s or events/ms.
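(A rough way to obtain that number, assuming a copy of the trace is also streamed to
disk, is to count the events babeltrace prints and divide by the duration of the run:)

    # babeltrace prints one line per event by default; divide the count by the
    # run duration in seconds to get an approximate rate in events/s
    babeltrace /path/to/streamed_trace | wc -l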
> How many tracepoint definitions?
> Does babeltrace catch up if a quiescent period is given?
>
>
> I think so.
>
>
> Could you provide us with statistics, timing data, etc.?
>
> What type of delay are we talking about?
>
>
> For example, the system wall time is now 5:18 PM, but Babeltrace is still printing events
> timestamped 5:10 PM. And after we stop the application, Babeltrace still needs some time to
> print the remaining recorded events.
>
> I think the root cause is that the parsing and printing process is a bit slow. So I want to know if there
> is any method to improve the performance of this process.
A simple experiment would be to run a bounded workload
and perform a quick timing evaluation of babeltrace with a streamed trace (on disk) and in live mode.
Measure the time it takes babeltrace to read the on-disk trace (time babeltrace path_to_streamed_trace)
and perform the same experiment with a live trace. It is important to bound your experiment
and to use mostly the same workload in both experiments.
This will give us a rough estimate of the disparity between the two scenarios.
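(A minimal sketch of that comparison; the trace path and the session name are
placeholders, and output is redirected to /dev/null so that terminal rendering does
not dominate the measurement. The live timing assumes the session is destroyed once
the bounded workload completes, so that the viewer exits:)

    # offline: how long does babeltrace take to read the trace from disk?
    time babeltrace /path/to/streamed_trace > /dev/null

    # live: attach to the same bounded run while it streams
    time babeltrace -i lttng-live net://localhost/host/$(hostname)/mysession > /dev/null

If the live run is much slower than the on-disk run for the same bounded workload,
the extra time points at the live path (network, ack handling) rather than at
parsing and formatting itself.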
Cheers