[lttng-dev] Latency tracing using LTTng

Anup Pemmaiah anup.pemmaiah at gmail.com
Thu Jan 4 19:05:26 UTC 2018


Hi,

I was curious whether any of the experts had comments/suggestions on the
issue I am facing.

Thanks
Anup

On Wed, Dec 27, 2017 at 9:02 PM, Anup Pemmaiah <anup.pemmaiah at gmail.com>
wrote:

>
>
> On Wed, Dec 27, 2017 at 6:40 PM, Jonathan Rajotte-Julien <
> jonathan.rajotte-julien at efficios.com> wrote:
>
>> Hi,
>>
>> On Wed, Dec 27, 2017 at 05:21:30PM -0800, Anup Pemmaiah wrote:
>> > Hi,
>> >
>> > I am trying to find the reason for an intermittent latency that I am
>> > noticing on my Linux system running an RT_PREEMPT kernel. I have
>> > patched it onto Ubuntu 16.04 LTS and have stopped the desktop lightdm
>> > service. This is on an
>>
>> Could we get more details on the version of the RT kernel?
>> Version/patchset?
>>
>
> The kernel with the RT_PREEMPT patchset is 4.4.38-rt49
>
>
>>
>> > Arm aarch64 platform.
>> >
>> > At a very high level, there are two processes interacting with each
>> > other. Process-1 does some work and sends UDP messages to Process-2
>> > (using the Fast RTPS real-time publish-subscribe protocol).
>>
>> Are you using the eProsima implementation? If so, could you give us more
>> information on the tracing strategy?
>>
>
>
>
> Yes. In my situation, it is part of the ROS (http://www.ros.org/)
> stack, so the ROS middleware uses eProsima's implementation underneath
> for publishing and subscribing. My process sits on top of the ROS stack
> and uses it for message passing.
>
>
> Right now I have not assigned any RT priorities. All tasks are running
> with SCHED_OTHER at the default prio of 20.
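>
> (For reference, if I later want to move Process-1 to an RT class, I
> would expect something like the following to work; the priority value
> 80 is just a placeholder I picked:)
>
> $sudo chrt -f -p 80 <pid-of-Process-1>   # switch the task to SCHED_FIFO, prio 80
> $chrt -p <pid-of-Process-1>              # verify the new policy/priority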
>
> I load the modules as follows
>
> $modprobe latency_tracker
>
> $modprobe latency_tracker_syscalls
>
> $modprobe lttng-probe-latency_tracker
>
> $echo 100000 > /sys/kernel/debug/latency/syscalls/default/threshold
>
> Start LTTng
>
> $sudo lttng-sessiond --daemonize
>
> $echo 1 > /sys/kernel/debug/latency/syscalls/default/tracking_on
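>
> (The tracing session itself is set up the same way as in the demo's
> trace.sh. Roughly, the equivalent manual steps would be the following;
> the kernel wildcard 'latency_tracker_*' is my guess at what
> lttng-probe-latency_tracker registers, while the lttng_profile:*
> user-space events are the ones visible in the traces below:)
>
> $lttng create latency-session
> $lttng enable-event -k 'latency_tracker_*'
> $lttng enable-event -u 'lttng_profile:*'
> $lttng start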
>
>
> Process-1 has a thread which keeps doing some work and publishing messages
> every 100ms. I start Process-1 as
>
> LD_PRELOAD=liblttng-profile.so <Process-1>
>
> It gives some relevant information in the form of a few stack traces:
>
> Timestamp | Channel | CPU | Event type | Contents | Binary Location | Function Location | Source Location | TID
>
> 16:13:32.971988199  u_5  5  lttng_profile:off_cpu_sample
>   stack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28) [0x7f98233238]
>   /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne14input_socket_c11read_packetERNS_8packet_sE+0x160) [0x7f9826e418]
>   /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZN8velodyne8driver_c16read_from_socketEv+0x30) [0x7f9826c390]
>   /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8fu,
>   overhead=313869, context._vtid=11713  11713
>
> 16:13:32.973972119  u_0  0  lttng_profile:off_cpu_sample
>   stack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28) [0x7f98233238]
>   /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne14input_socket_c11read_packetERNS_8packet_sE+0x160) [0x7f9826e418]
>   /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZN8velodyne8driver_c16read_from_socketEv+0x30) [0x7f9826c390]
>   /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8fu,
>   overhead=809697, context._vtid=11713  11713
>
> 16:13:32.973981912  u_4  4  lttng_profile:off_cpu_sample
>   stack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28) [0x7f98233238]
>   /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8functionIFvvEE+0x94) [0x7f9826c4a4]
>   /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN8velodyne8driver_cEKFvSt8functionIFvvEEEEPS4_St5_BindIFS2_IMS4_FvvEESB_EEEEE6_M_runEv+0x8c) [0x7f9826d9fc]
>   /usr/lib/aarch64-linux-gnu/libstdc++.so.,
>   overhead=469267, context._vtid=11715  11715
>
> 16:13:32.973999544  u_3  3  lttng_profile:off_cpu_sample
>   stack=/lib/aarch64-linux-gnu/libpthread.so.0(sendmsg+0x2c) [0x7f9823303c]
>   /opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14UDPv4Transport17SendThroughSocketEPKhjRKNS1_9Locator_tERN4asio21basic_datagram_socketINS8_2ip3udpEEE+0xf8) [0x7f9647ad88]
>   /opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14UDPv4Transport4SendEPKhjRKNS1_9Locator_tES7_+0x158) [0x7f9647c3e0]
>   /opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14SenderResource4SendEPKhjRKNS1_9Locator_tE+0x4c) [0x7f9644b58c]
>   /opt/ros2/lib/,
>   overhead=1289108, context._vtid=11712  11712
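>
> (The mangled C++ frames can be made readable with c++filt; for example,
> the read_packet frame above demangles as:)
>
> $echo '_ZNK8velodyne14input_socket_c11read_packetERNS_8packet_sE' | c++filt
> velodyne::input_socket_c::read_packet(velodyne::packet_s&) const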
>
> Once in a while, when I kill the process with <ctrl-c>, it hangs, but
> not always.
>
> Process-2 has a thread that subscribes to the messages.
>
> When I start Process-2 the same way (LD_PRELOAD=liblttng-profile.so
> <Process-2>), it just hangs.
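>
> (While it is hung, the kernel-side wait channel can be inspected via
> procfs, assuming the kernel exposes /proc/<pid>/stack:)
>
> $sudo cat /proc/<pid-of-Process-2>/stack      # kernel stack of the hung task
> $grep State /proc/<pid-of-Process-2>/status   # scheduling state (D, S, Z, ...)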
>
>
>
>>
>> > Data is sent from Process-1 to Process-2 every 100 ms. I often see a
>> > delay in Process-2 receiving the message.
>> >
>> >
>> > I am trying to use LTTng along with other supported profiling tools
>> > like latency-tracker and lttng-profile. Something like what is
>> > described in http://lttng.org/blog/2015/03/18/full-stack-latencies is
>> > what I am looking for, but I am facing issues. The latency_tracker
>> > kernel modules and lttng-profile are a bit flaky: they sometimes work
>> > and sometimes panic the kernel. The simple test cases provided in
>> > https://github.com/fdoray/latency_tracker_demo work sometimes. Also,
>>
>> Not sure how well those have aged.
>>
>> If you have not already, I would recommend you take a look at the
>> latency-tracker GitHub page [1] that this project references and uses.
>> It has changed quite a lot in the last three years.
>>
>> [1] https://github.com/efficios/latency-tracker
>
>
> Yes, that is where I downloaded the source from, and that is what I
> followed.
>
>>
>>
>>
>> > profiling Process-1 works and provides some useful traces (with
>> > intermittent kernel panics), but when profiling Process-2, the
>> > process just hangs.
>>
>> What version of lttng-modules are you using? The latency_tracker_demo
>> install script seems to use lttng-modules 2.6, which is not supported
>> anymore; it would be interesting to perform your experiments with a
>> newer version.
>>
>
> Sorry for not being clear in my earlier message. I did not use the
> install scripts. All of the kernel modules (the LTTng and
> latency-tracker modules) and the user-space pieces (lttng-ust,
> lttng-tools, lttng-profile) were downloaded from GitHub a few weeks
> ago, built from source, and loaded.
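>
> (Concretely, each piece was built along these usual lines;
> latency-tracker is shown as an example, and the exact build/load steps
> are from memory:)
>
> $git clone https://github.com/efficios/latency-tracker.git
> $cd latency-tracker
> $make                               # builds against the running kernel's headers
> $sudo insmod ./latency_tracker.ko   # or install it where modprobe can find it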
>
> From latency_tracker_demo, I just built the sleep-demo and used its
> trace.sh to do the tracing.
>
>
>>
>> Still, could you provide the panic's call stack?
>>
>
> I do not see a panic trace every time. In the previous run, when I
> tried to kill the hung Process-2 with "kill -9 <pid>", the system just
> rebooted without a panic.
>
>
> This is from the syslog, and I see the same output on the console:
>
> Dec 27 16:53:37  kernel: [12566.627939] ring buffer relay-metadata: 0 records written, 0 records overrun
> Dec 27 16:53:37  kernel: [12566.634983] ring buffer relay-metadata, cpu -1: non-consumed data
> Dec 27 16:53:37  kernel: [12566.634983]   [ 12288 bytes written, 8192 bytes read ]
> Dec 27 16:53:37  kernel: [12566.646189] ring buffer: relay-metadata, cpu -1: 8192 bytes committed
> Dec 27 16:53:37  kernel: [12566.652780] ring buffer relay-discard, cpu 0: 0 records written, 0 records overrun
> Dec 27 16:53:38  kernel: [12566.660865] ring buffer relay-discard, cpu 1: 0 records written, 0 records overrun
> Dec 27 16:53:38  kernel: [12566.668924] ring buffer relay-discard, cpu 2: 0 records written, 0 records overrun
> Dec 27 16:53:38  kernel: [12566.676897] ring buffer relay-discard, cpu 3: 0 records written, 0 records overrun
> Dec 27 16:53:38  kernel: [12566.684809] ring buffer relay-discard, cpu 4: 0 records written, 0 records overrun
> Dec 27 16:53:38  kernel: [12566.692681] ring buffer relay-discard, cpu 5: 0 records written, 0 records overrun
> Dec 27 16:54:16  avahi-daemon[948]: Registering new address record for 10.31.33.48 on eth0.IPv4.
> Dec 27 16:54:16  rsyslogd-2007: action 'action 9' suspended, next retry is Wed Dec 27 16:55:46 2017 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
> Dec 27 16:54:16  systemd-timesyncd[816]: Network configuration changed, trying to establish connection.
> Dec 27 16:54:16  systemd-timesyncd[816]: Synchronized to time server 91.189.94.4:123 (ntp.ubuntu.com).
> Dec 27 16:54:17  rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="924" x-info="http://www.rsyslog.com"] start
> Dec 27 16:54:17  rsyslogd-2222: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ]
> Dec 27 16:54:17  rsyslogd: rsyslogd's groupid changed to 115
> Dec 27 16:54:17  rsyslogd: rsyslogd's userid changed to 108
> Dec 27 16:54:17  kernel: [    0.000000] Booting Linux on physical CPU 0x100
> Dec 27 16:54:17  kernel: [    0.000000] Initializing cgroup subsys cpuset
> Dec 27 16:54:17  kernel: [    0.000000] Initializing cgroup subsys cpu
> Dec 27 16:54:17  kernel: [    0.000000] Initializing cgroup subsys cpuacct
> Dec 27 16:54:17  kernel: [    0.000000] Linux version 4.4.38-rt49-tegra (anup.pemmaiah at anup-pemmaiah) (gcc version 4.9.2 (GCC) ) #1 SMP PREEMPT RT Thu Nov 30 17:12:21 PST 2017
> Dec 27 16:54:17  kernel: [    0.000000] Boot CPU: AArch64 Processor [411fd073]
> Dec 27 16:54:17  kernel: [    0.000000] earlycon: Early serial console at MMIO32
>
>
>>
>> >
>> > Not sure how up to date the links pasted above are, because I notice
>> > there is no longer a usec_threshold parameter; I use the threshold in
>> > /sys/kernel/debug/latency/syscalls/default/threshold instead.
>> >
>> > I understand each situation is different, but I was curious whether
>> > any of you have comments/suggestions that would help in debugging
>> > latency-related issues using LTTng. I have looked in many places for
>> > the required information, but without much success.
>>
>> Most of the maintainers are on vacation during this part of the year,
>> so it might take some time to get answers. I'm sure that Julien
>> Desfossez will be interested in your use case.
>>
>> Cheers
>>
>
> I understand, and any suggestions will be very helpful.
>
>>
>> >
>> > Your comments/suggestions will be greatly appreciated.
>> >
>> > Thanks
>> > Anup
>>
>>
>>
>> --
>> Jonathan Rajotte-Julien
>> EfficiOS
>>
>
>