[lttng-dev] Latency tracing using LTTng
anup.pemmaiah at gmail.com
Thu Dec 28 05:02:47 UTC 2017
On Wed, Dec 27, 2017 at 6:40 PM, Jonathan Rajotte-Julien <
jonathan.rajotte-julien at efficios.com> wrote:
> On Wed, Dec 27, 2017 at 05:21:30PM -0800, Anup Pemmaiah wrote:
> > Hi,
> > I am trying to find the reason for an intermittent latency that I am
> > seeing on my Linux system running an RT_PREEMPT kernel. I have patched it
> > on Ubuntu 16.04 LTS, have stopped the desktop lightdm service, and this is
> > on an
> Could we get more details on the version of the RT kernel?
The kernel with the RT_PREEMPT patch is 4.4.38-rt49.
> > Arm aarch64 platform.
> > At a very high level, there are two processes interacting with each
> > other. Process-1 does some work and sends UDP messages to Process-2
> > (using the FastRTPS real-time publish-subscribe protocol).
> Are you using the eProsima implementation? If so, could you give us more
> information on the tracing strategy?
Yes. In my situation, it is part of the ROS (http://www.ros.org/) stack, so
the ROS middleware uses eProsima underneath for publishing and subscribing.
My process sits on top of the ROS stack and uses it for message passing.
Right now I have not assigned any RT priorities. All tasks are running with
SCHED_OTHER at the default prio of 20.
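Since everything is still under SCHED_OTHER, it may be worth moving the publisher onto a real-time scheduling class and seeing whether the latency changes. A minimal sketch using `ps` and `chrt` (from util-linux); the command line `./process1` is hypothetical, and on a PREEMPT_RT system you would want to stay below the priorities of the kernel's own RT threads:

```shell
# Inspect the scheduling class of a task (TS = SCHED_OTHER, FF = SCHED_FIFO,
# RR = SCHED_RR); here the current shell stands in for Process-1.
ps -o pid,cls,rtprio,ni,comm -p $$

# Hypothetical commands to run or re-prioritize Process-1 under SCHED_FIFO
# priority 50 (requires root):
#   sudo chrt -f 50 ./process1
#   sudo chrt -f -p 50 <pid-of-process1>
```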
I load the modules and enable tracking as follows:
$ echo 100000 > /sys/kernel/debug/latency/syscalls/default/threshold
$ sudo lttng-sessiond --daemonize
$ echo 1 > /sys/kernel/debug/latency/syscalls/default/tracking_on
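For what it's worth, the same sequence can be scripted with a guard so a missing debugfs directory fails loudly instead of silently writing nowhere. The module name `latency_tracker_syscalls` is an assumption based on the efficios/latency-tracker tree; adjust it to whatever your build actually installs:

```shell
#!/bin/sh
# Sketch: configure latency-tracker syscall tracking with a 100000 threshold.
# Module name and debugfs layout are assumptions; adapt to your build.
set -e
LAT=/sys/kernel/debug/latency/syscalls/default
sudo modprobe latency_tracker_syscalls 2>/dev/null || true  # hypothetical name
[ -d "$LAT" ] || { echo "latency-tracker debugfs dir not found" >&2; exit 1; }
echo 100000 | sudo tee "$LAT/threshold" >/dev/null
sudo lttng-sessiond --daemonize
echo 1 | sudo tee "$LAT/tracking_on" >/dev/null
```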
Process-1 has a thread which keeps doing some work and publishing messages
every 100 ms. I start Process-1 as LD_PRELOAD=liblttng-profile.so <Process-1>.
It gives some relevant information about a few stack traces.
Timestamp             Channel  CPU  Event type                    Contents
16:13:32.971 988 199  u_5      5    lttng_profile:off_cpu_sample  overhead=313869, context._vtid=11713  11713
16:13:32.973 972 119  u_0      0    lttng_profile:off_cpu_sample  overhead=809697, context._vtid=11713  11713
16:13:32.973 981 912  u_4      4    lttng_profile:off_cpu_sample
16:13:32.973 999 544  u_3      3    lttng_profile:off_cpu_sample  /opt/ros2/lib/, overhead=1289108, context._vtid=11712  11712
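Assuming the `overhead` field in these off-CPU samples is in nanoseconds (I have not verified the unit against the lttng-profile source), the blocked times shown here are all well under the 100 ms publish period:

```shell
# Convert the sampled overhead values (assumed nanoseconds) to milliseconds
# for comparison with the 100 ms publish period.
printf '313869\n809697\n1289108\n' |
awk '{ printf "%d ns = %.3f ms\n", $1, $1 / 1e6 }'
# -> 313869 ns = 0.314 ms
#    809697 ns = 0.810 ms
#    1289108 ns = 1.289 ms
```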
Once in a while, when I kill the process with <ctrl-c>, it hangs, but not
always.
Process-2 has a thread which subscribes to the messages.
When I start LD_PRELOAD=liblttng-profile.so <Process-2>, it just hangs.
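When a process hangs like that, before reaching for `kill -9` it is worth checking whether it is blocked in the kernel. A sketch using `/proc`, with a `sleep` as a stand-in for the hung process (on the real system you would substitute the PID of Process-2):

```shell
# Start a dummy process as a stand-in for the hung Process-2.
sleep 60 &
PID=$!
# State: S = sleeping, D = uninterruptible (blocked in the kernel), R = running.
grep -E '^(Name|State)' "/proc/$PID/status"
# For a D-state task, the kernel stack (readable as root) shows where it is
# stuck:
#   sudo cat /proc/$PID/stack
kill "$PID"
```

A task stuck in D state would also explain why `kill -9` did not take effect: SIGKILL cannot be delivered until the task leaves the kernel.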
> > Data is sent from Process-1 to Process-2 every 100 ms. I see that often
> > there is a delay in Process-2 receiving the message.
> > I am trying to use LTTng along with other supported profiling tools like
> > latency-tracker and lttng-profile.
> > Something like the description in this link
> > http://lttng.org/blog/2015/03/18/full-stack-latencies is what I am looking
> > for, but I am facing issues. The latency_tracker kernel modules and also
> > lttng-profile are a bit flaky, as they sometimes work and sometimes panic
> > the kernel. Simple test cases as provided in
> > https://github.com/fdoray/latency_tracker_demo work sometimes. And also
> Not sure if those aged properly.
> If you did not already, I would recommend you take a look at the
> GitHub page that this project references and uses. It has changed quite a
> bit in the last 3 years:
>  https://github.com/efficios/latency-tracker
Yes, that is where I downloaded the source from, and what I followed.
> > profiling Process-1 works by providing some useful traces (with
> > intermittent kernel panics), but when profiling Process-2 the process
> > just hangs.
> What version of lttng-modules are you using? The latency_tracker_demo
> script seems to use lttng-modules 2.6. 2.6 is not supported anymore; it
> would be interesting to perform your experimentation with a newer version.
Sorry for not being clear in my earlier msg. I did not use the install
scripts. All the kernel modules, LTTng modules, latency-tracker modules, and
user space libs (lttng-ust, lttng-tools, lttng-profile) were downloaded from
GitHub a few weeks ago, built from source, and loaded.
From the latency_tracker_demo, I just built the sleep-demo and used
trace.sh to do the tracing.
> Still, could you provide the panics call stack?
I do not see a panic trace every time. In the previous run, when I tried
to kill Process-2 with "kill -9 <pid>" because it was hung, the system just
rebooted without a panic.
This is from the syslog, and I see the same output on the console:
Dec 27 16:53:37 kernel: [12566.627939] ring buffer relay-metadata: 0 records written, 0 records overrun
Dec 27 16:53:37 kernel: [12566.634983] ring buffer relay-metadata, cpu -1: non-consumed data
Dec 27 16:53:37 kernel: [12566.634983]   [ 12288 bytes written, 8192 bytes read ]
Dec 27 16:53:37 kernel: [12566.646189] ring buffer: relay-metadata, cpu -1: 8192 bytes committed
Dec 27 16:53:37 kernel: [12566.652780] ring buffer relay-discard, cpu 0: 0 records written, 0 records overrun
Dec 27 16:53:38 kernel: [12566.660865] ring buffer relay-discard, cpu 1: 0 records written, 0 records overrun
Dec 27 16:53:38 kernel: [12566.668924] ring buffer relay-discard, cpu 2: 0 records written, 0 records overrun
Dec 27 16:53:38 kernel: [12566.676897] ring buffer relay-discard, cpu 3: 0 records written, 0 records overrun
Dec 27 16:53:38 kernel: [12566.684809] ring buffer relay-discard, cpu 4: 0 records written, 0 records overrun
Dec 27 16:53:38 kernel: [12566.692681] ring buffer relay-discard, cpu 5: 0 records written, 0 records overrun
Dec 27 16:54:16 avahi-daemon: Registering new address record for 10.31.33.48 on eth0.IPv4.
Dec 27 16:54:16 rsyslogd-2007: action 'action 9' suspended, next retry is Wed Dec 27 16:55:46 2017 [v8.16.0 try
Dec 27 16:54:16 systemd-timesyncd: Network configuration changed, trying to establish connection.
Dec 27 16:54:16 systemd-timesyncd: Synchronized to time server
Dec 27 16:54:17 rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="924" x-info="http://www.rsyslog.com"] start
Dec 27 16:54:17 rsyslogd-2222: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ]
Dec 27 16:54:17 rsyslogd: rsyslogd's groupid changed to 115
Dec 27 16:54:17 rsyslogd: rsyslogd's userid changed to 108
Dec 27 16:54:17 kernel: [ 0.000000] Booting Linux on physical CPU
Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup subsys
Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup subsys cpu
Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup subsys
Dec 27 16:54:17 kernel: [ 0.000000] Linux version 4.4.38-rt49-tegra (anup.pemmaiah at anup-pemmaiah) (gcc version 4.9.2 (GCC) ) #1 SMP PREEMPT RT Thu Nov 30 17:12:21 PST 2017
Dec 27 16:54:17 kernel: [ 0.000000] Boot CPU: AArch64 Processor
Dec 27 16:54:17 kernel: [ 0.000000] earlycon: Early serial console
> > Not sure how up to date the above pasted links are, because I notice there
> > is no more usec_threshold parameter, and I use the threshold
> > in /sys/kernel/debug/latency/syscalls/default/threshold
> > I understand each situation is different, but I was curious if any of you
> > have comments/suggestions that would help in debugging latency-related
> > issues using LTTng. I have looked in many places to get the required
> > info, but with not much help.
> Most of the maintainers are on vacation during this part of the year, so it
> may take some time to get answers. I'm sure that Julien Desfossez will be
> interested in your use case.
I understand, and any suggestions will be very helpful.
> > Your comments/suggestions will be greatly appreciated.
> > Thanks
> > Anup
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> Jonathan Rajotte-Julien