[lttng-dev] Profiling LTTng tracepoint latency on different arm platforms

Sun Sep 10 10:18:31 EDT 2023

Hey Mathieu,

We see that recording a tracepoint goes through multiple reserve-commit-write
stages, where atomics and shared-memory accesses take up a large part of the
recording time. Is there a "light mode" of recording a tracepoint that involves
less logic, or a mode that can potentially have lower latency?

Also, are there any recent docs to share regarding tracepoint latency?

Regards,

Anas.

________________________________
From: Yitschak, Yehuda
Sent: Wednesday, June 21, 2023 5:21:35 PM
To: Mathieu Desnoyers; Mousa, Anas; lttng-dev at lists.lttng.org
Subject: RE: [EXTERNAL][lttng-dev] Profiling LTTng tracepoint latency on different arm platforms

> On 6/21/23 01:39, Yitschak, Yehuda wrote:
> >> On 6/20/23 10:20, Mathieu Desnoyers via lttng-dev wrote:
> >>> On 6/20/23 06:27, Mousa, Anas via lttng-dev wrote:
> >>>> Hello,
> >>>
> >>>>
> >>>>
> >>>> Are there any suggestions to root cause the high latency and potentially
> >>>> improve it on platform 1?
> >>>>
> >>>> Thanks and best regards,
> >>>>
> >>>> Anas.
> >>>>
> >>>
> >>> I recommend using "perf" when tracing with the sample program in a
> >>> loop to figure out the hot spots. With that information on the "fast"
> >>> and "slow" system, we might be able to figure out what differs.
> >>>
> >>> Also, comparing the kernel configurations of the two systems can help.
> >>> Also comparing the glibc versions of the two systems would be relevant.
> >>>
> >>
> >> Also make sure you benchmark the lttng "snapshot" mode [1] to make
> >> sure you don't run into a situation where the disk/network I/O
> >> throughput cannot cope with the generated event throughput, thus
> >> causing the ring buffer to discard events. This would therefore
> >> "speed up" tracing from the application perspective because
> >> discarding an event is faster than writing it to a ring buffer.
> >
> > You mean we should avoid the "discard" loss mode and use the "overwrite"
> > loss mode, since discard mode can fake fast performance?
>
> Yes. In addition to using "overwrite-when-buffer-full" mode, the "snapshot"
> session also ensures that no consumer daemon extracts the trace data
> (unless an explicit snapshot record is performed), which allows comparing
> the ring buffer producer performance with minimal noise.
>
> If you really want to benchmark the discard-when-buffer-full mode and the
> consumer daemon I/O behavior, then you need to take into account
> event discarded counts and the actual trace data size that was written to
> disk.

Since you mentioned this, is there any "stat" command that lists events such as discards, disk writes, etc.?
I looked this up in the past but couldn't find anything.
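
For illustration, the difference between the two loss modes discussed above can
be shown with a toy model. This is only a sketch and not LTTng code: the names
`ToyRingBuffer`, `record`, and `run` are hypothetical, and the model assumes no
consumer ever drains the buffer, so it fills up after `capacity` events. It
shows why a benchmark of discard mode has to count discarded events: most
"record" calls take the cheap path and never write anything.

```python
from collections import deque


class ToyRingBuffer:
    """Toy fixed-capacity event buffer (hypothetical, not LTTng's design)."""

    def __init__(self, capacity, mode):
        if mode not in ("discard", "overwrite"):
            raise ValueError("mode must be 'discard' or 'overwrite'")
        self.capacity = capacity
        self.mode = mode
        self.events = deque()
        self.written = 0      # events actually stored in the buffer
        self.discarded = 0    # events dropped because the buffer was full
        self.overwritten = 0  # old events replaced by newer ones

    def record(self, event):
        """Try to record one event; return True if it was written."""
        if len(self.events) == self.capacity:
            if self.mode == "discard":
                # Cheap path: nothing is copied, only a counter is bumped.
                self.discarded += 1
                return False
            # Overwrite mode: drop the oldest event to make room.
            self.events.popleft()
            self.overwritten += 1
        self.events.append(event)
        self.written += 1
        return True


def run(mode, n_events=10, capacity=4):
    """Record n_events into a buffer that is never drained by a consumer."""
    buf = ToyRingBuffer(capacity, mode)
    for i in range(n_events):
        buf.record(i)
    return buf


if __name__ == "__main__":
    for mode in ("discard", "overwrite"):
        buf = run(mode)
        print(mode, "written:", buf.written, "discarded:", buf.discarded,
              "overwritten:", buf.overwritten, "kept:", list(buf.events))
```

In this model, discard mode performs a full write for only the first `capacity`
events, while overwrite mode performs one for every event. Dividing elapsed
time by the number of attempted events would therefore flatter discard mode
unless the discarded count is taken into account.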

>
> Thanks,
>
> Mathieu
>
> >
> >>
> >> Thanks,
> >>
> >> Mathieu
> >>
> >> [1] https://lttng.org/docs/v2.13/#doc-taking-a-snapshot
> >>
> >>> Thanks,
> >>>
> >>> Mathieu
> >>>
> >>>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
> >>
> >> _______________________________________________
> >> lttng-dev mailing list
> >> lttng-dev at lists.lttng.org
> >> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
