[lttng-dev] Latency tracing using LTTng

Thu Dec 28 05:02:47 UTC 2017

On Wed, Dec 27, 2017 at 6:40 PM, Jonathan Rajotte-Julien <
jonathan.rajotte-julien at efficios.com> wrote:

> Hi,
>
> On Wed, Dec 27, 2017 at 05:21:30PM -0800, Anup Pemmaiah wrote:
> > Hi,
> >
> > I am trying to find the reason of a intermittent latency that I am
> noticing
> > on my Linux system running on a RT_PREEMPT kernel. I have patched it on a
> > Ubuntu 16.04 LTS. Have stopped the desktop lightdm service and this is
> on a
>
> Could we get more details on the version of the RT kernel?
> Version/patchset?
>

The kernel and RT_PREEMPT patch is 4.4.38-rt49

>
> > Arm aarch64 platform.
> >
> > At a very high level, there are two processes interacting with each
> other.
> > Process-1 does some work and sends UDP messages to Process-2 (Using the
> > FASTRTPS real time Publish-Subscribe protocol).
>
> Are you using the eProsima implementation? If so, could you give us more
> information on the tracing strategy?
>

Yes. In my situation, it is part of the ROS(http://www.ros.org/) stack. So
ROS middlewear uses eProsima underneath for publishing and subscribing. My
process sits on top of ROS stack and
uses it for message passing.

Right now I have not addigned at RT priorities. All tasks are running with
SCHED_OTHER with default prio 20.

I load the modules as follows

$modprobe latency_tracker

$modprobe latency_tracker_syscalls

$modprobe lttng-probe-latency_tracker

$echo 100000 > /sys/kernel/debug/latency/syscalls/default/threshold

Start LTTng

$sudo lttng-sessiond --daemonize

$echo 1 > /sys/kernel/debug/latency/syscalls/default/tracking_on

Process-1 has a thread which keeps doing some work and publishing messages
every 100ms. I start Process-1 as

LD_PRELOAD=liblttng-profile.so <Process-1>

It gives some relevant information about few stack traces.

Timestamp Channel CPU Event type Contents Binary Location Function
Location Source
Location TID
16:13:32.971 988 199 u_5 5 lttng_profile:off_cpu_sample
stack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28)
[0x7f98233238]
/home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne14input_socket_c11read_packetERNS_8packet_sE+0x160)
[0x7f9826e418]
/home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZN8velodyne8driver_c16read_from_socketEv+0x30)
[0x7f9826c390]
/home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8fu,
overhead=313869, context._vtid=11713 11713

16:13:32.973 972 119 u_0 0 lttng_profile:off_cpu_sample
stack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28)
[0x7f98233238]
/home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne14input_socket_c11read_packetERNS_8packet_sE+0x160)
[0x7f9826e418]
/home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZN8velodyne8driver_c16read_from_socketEv+0x30)
[0x7f9826c390]
/home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8fu,
overhead=809697, context._vtid=11713 11713

16:13:32.973 981 912 u_4 4 lttng_profile:off_cpu_sample
stack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28)
[0x7f98233238]
/home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8functionIFvvEE+0x94)
[0x7f9826c4a4]
/home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN8velodyne8driver_cEKFvSt8functionIFvvEEEEPS4_St5_BindIFS2_IMS4_FvvEESB_EEEEE6_M_runEv+0x8c)
[0x7f9826d9fc]
/usr/lib/aarch64-linux-gnu/libstdc++.so., overhead=469267,
context._vtid=11715 11715

16:13:32.973 999 544 u_3 3 lttng_profile:off_cpu_sample
stack=/lib/aarch64-linux-gnu/libpthread.so.0(sendmsg+0x2c)
[0x7f9823303c]
/opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14UDPv4Transport17SendThroughSocketEPKhjRKNS1_9Locator_tERN4asio21basic_datagram_socketINS8_2ip3udpEEE+0xf8)
[0x7f9647ad88]
/opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14UDPv4Transport4SendEPKhjRKNS1_9Locator_tES7_+0x158)
[0x7f9647c3e0]
/opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14SenderResource4SendEPKhjRKNS1_9Locator_tE+0x4c)
[0x7f9644b58c]
/opt/ros2/lib/, overhead=1289108, context._vtid=11712 11712

Once in a while when I kill the process with <ctrl-c> it hangs but not
always.

Process-2 has a thread which subscribes for the messages.

When I start LD_PRELOAD=liblttng-profile.so <Process-2>, it just hangs

>
> > Data is sent from Process-1 to Process-2 every 100ms. I see that often
> > there is a delay in Process-2 receiving the message.
> >
> >
> > Trying to use LTTng and along with other supported profiling tools like
> > latency-tracker, lttng-profile.
> > Something like the description in this link
> > http://lttng.org/blog/2015/03/18/full-stack-latencies is what I am
> looking
> > for, but I am facing issues. The latency_tracker kernel modules and also
> > lttng-profile is a bit flaky as it sometimes work and sometimes panics
> the
> > kernel.  Simple test cases as provided in
> > https://github.com/fdoray/latency_tracker_demo works sometimes. And also
>
> Not sure if those aged properly.
>
> If you did not already, I would recommend you take a look at the
> latency-tracker
> github page [1] that this project references and uses. It changed quite a
> lot
> in the last 3 years.
>
> [1] https://github.com/efficios/latency-tracker

Yes, that is where I downloaded the source and followed.

>
>
>
> > profiling Process-1 works by providing some useful traces(intermittently
> > kernel panics), but profiling Process-2 the process just hangs.
>
> What version of lttng-modules are you using? The latency_tracker_demo
> install
> script seems to use lttng-modules 2.6. 2.6 is not supported anymore, it
> would be
> interesting to perform you experimentation with a newer version.
>

Sorry for not being clear in my earlier msg. I did not use the install
scripts. All the kernel mdoules, LTTng modules, latency-tracker modules and
user space libs
lttng-ust, lttng-tools, lttng-profile was downloaded from git hub few weeks
ago and build from source and loaded.

>From the latency_tracker_demo, I just built the sleep-demo and used the
trace.sh to do the tracing

>
> Still, could you provide the panics call stack?
>

I do not see a panic trace all the time. In the previous run, when I tried
to kill the Process-2 with "kill -9 <pid>", as it was hung, the system just
rebooted without panic.

This is from the syslog, and I see the same output on console

1613 Dec 27 16:53:37  kernel: [12566.627939] ring buffer relay-metadata: 0
records written, 0 records overrun
1614 Dec 27 16:53:37  kernel: [12566.634983] ring buffer relay-metadata,
cpu -1: non-consumed data
1615 Dec 27 16:53:37  kernel: [12566.634983]   [ 12288 bytes written, 8192
bytes read ]
1616 Dec 27 16:53:37  kernel: [12566.646189] ring buffer: relay-metadata,
cpu -1: 8192 bytes committed
1617 Dec 27 16:53:37  kernel: [12566.652780] ring buffer relay-discard, cpu
0: 0 records written, 0 records overrun
1618 Dec 27 16:53:38  kernel: [12566.660865] ring buffer relay-discard, cpu
1: 0 records written, 0 records overrun
1619 Dec 27 16:53:38  kernel: [12566.668924] ring buffer relay-discard, cpu
2: 0 records written, 0 records overrun
1620 Dec 27 16:53:38  kernel: [12566.676897] ring buffer relay-discard, cpu
3: 0 records written, 0 records overrun
1621 Dec 27 16:53:38  kernel: [12566.684809] ring buffer relay-discard, cpu
4: 0 records written, 0 records overrun
1622 Dec 27 16:53:38  kernel: [12566.692681] ring buffer relay-discard, cpu
5: 0 records written, 0 records overrun
1623 Dec 27 16:54:16  avahi-daemon[948]: Registering new address record for
10.31.33.48 on eth0.IPv4.
1624 Dec 27 16:54:16  rsyslogd-2007: action 'action 9' suspended, next
retry is Wed Dec 27 16:55:46 2017 [v8.16.0 t     ry
http://www.rsyslog.com/e/2007 ]
1625 Dec 27 16:54:16  systemd-timesyncd[816]: Network configuration
changed, trying to establish connection.
1626 Dec 27 16:54:16  systemd-timesyncd[816]: Synchronized to time server
91.189.94.4:123 (ntp.ubuntu.com).
1627 Dec 27 16:54:17  rsyslogd: [origin software="rsyslogd"
swVersion="8.16.0" x-pid="924" x-info="http://www.rsysl     og.com"] start
1628 Dec 27 16:54:17  rsyslogd-2222: command 'KLogPermitNonKernelFacility'
is currently not permitted - did you alr     eady set it via a RainerScript
command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ]
1629 Dec 27 16:54:17  rsyslogd: rsyslogd's groupid changed to 115
1630 Dec 27 16:54:17  rsyslogd: rsyslogd's userid changed to 108
1631 Dec 27 16:54:17  kernel: [    0.000000] Booting Linux on physical CPU
0x100
1632 Dec 27 16:54:17  kernel: [    0.000000] Initializing cgroup subsys
cpuset
1633 Dec 27 16:54:17  kernel: [    0.000000] Initializing cgroup subsys cpu
1634 Dec 27 16:54:17  kernel: [    0.000000] Initializing cgroup subsys
cpuacct
1635 Dec 27 16:54:17  kernel: [    0.000000] Linux version
4.4.38-rt49-tegra (anup.pemmaiah at anup-pemmaiah) (gcc ver     sion 4.9.2
(GCC) ) #1 SMP PREEMPT RT Thu Nov 30 17:12:21 PST 2017
1636 Dec 27 16:54:17  kernel: [    0.000000] Boot CPU: AArch64 Processor
[411fd073]
1637 Dec 27 16:54:17  kernel: [    0.000000] earlycon: Early serial console
at MMIO32

>
> >
> > Not sure how updated the above pasted links are, because I notice there
> is
> > no more usec_threshold parameter and I use the threshold
> > in /sys/kernel/debug/latency/syscalls/default/threshold
> >
> > I understand each situation is different, but was curious if  any of you
> > have comments/suggestions that would help in debugging latency related
> > issues using LTTng. I have looked many places to get required info but of
> > not much help.
>
> Most of the maintainers are on vacation during this part of the year so it
> might
> take some time to get answers. I'm sure that Julien Desfossez will be
> interested in your use case.
>
> Cheers
>

I undersatnd and any suggestions will be very helpful

>
> >
> > Your comments/suggestions will be greatly appreciated.
> >
> > Thanks
> > Anup
>
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
>
> --
> Jonathan Rajotte-Julien
> EfficiOS
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20171227/ecc7c053/attachment-0001.html>