[lttng-dev] Latency tracing using LTTng
Julien Desfossez
jdesfossez at efficios.com
Wed Jan 10 21:56:35 UTC 2018
Hi Anup,
Sorry for the delay.
I just tried the examples from the blog post (it has been a while !), I
had to update liblttng-profile-base/module_abi.h in lttng-profile like that:
#define LTTNGPROFILE_PROC "syscalls"
And it works, I did not patch lttng-modules to get the kernel events,
but I see them in ftrace, I will rebase the
0001-connect-to-latency_tracker-tracepoints.patch from the
latency-tracker to a more recent version of lttng-modules and it should
work as is. I also see the stack traces in the user-space and kernel trace.
I also see the flakyness you are talking about when sometimes I launch
sleep-demo and nothing happens, I will try to have a look at that.
I did not observe any kernel panic or hang, but I just tried on an
x86-64 4.4 non-RT kernel, so there might be some differences here.
Now for the main problem, which seems related to Process-2 that never
works with lttng-profile, do you think you could investigate a bit more
(see if it gets registered in dmesg), maybe the LD_PRELOAD is
overridden/lost if it is started by a launcher script, etc ?
Ideally, if you could get a minimal C program that has the same
behavior, that would help us investigate this a bit more.
Thanks,
Julien
On 2018-01-04 02:05 PM, Anup Pemmaiah wrote:
> Hi,
>
> Was curious if any of the experts had any comments/suggestions on the
> issue I am facing.
>
> Thanks
> Anup
>
> On Wed, Dec 27, 2017 at 9:02 PM, Anup Pemmaiah <anup.pemmaiah at gmail.com
> <mailto:anup.pemmaiah at gmail.com>> wrote:
>
>
>
> On Wed, Dec 27, 2017 at 6:40 PM, Jonathan Rajotte-Julien
> <jonathan.rajotte-julien at efficios.com
> <mailto:jonathan.rajotte-julien at efficios.com>> wrote:
>
> Hi,
>
> On Wed, Dec 27, 2017 at 05:21:30PM -0800, Anup Pemmaiah wrote:
> > Hi,
> >
> > I am trying to find the reason of a intermittent latency that I am noticing
> > on my Linux system running on a RT_PREEMPT kernel. I have patched it on a
> > Ubuntu 16.04 LTS. Have stopped the desktop lightdm service and this is on a
>
> Could we get more details on the version of the RT kernel?
> Version/patchset?
>
>
> The kernel and RT_PREEMPT patch is 4.4.38-rt49
>
>
>
> > Arm aarch64 platform.
> >
> > At a very high level, there are two processes interacting with each other.
> > Process-1 does some work and sends UDP messages to Process-2 (Using the
> > FASTRTPS real time Publish-Subscribe protocol).
>
> Are you using the eProsima implementation? If so, could you give
> us more
> information on the tracing strategy?
>
>
>
>
> Yes. In my situation, it is part of the ROS(http://www.ros.org/)
> stack. So ROS middlewear uses eProsima underneath for publishing and
> subscribing. My process sits on top of ROS stack and
> uses it for message passing.
>
>
> Right now I have not addigned at RT priorities. All tasks are
> running with SCHED_OTHER with default prio 20.
>
> I load the modules as follows
>
> $modprobe latency_tracker
>
> $modprobe latency_tracker_syscalls
>
> $modprobe lttng-probe-latency_tracker
>
> $echo 100000 > /sys/kernel/debug/latency/syscalls/default/threshold
>
> Start LTTng
>
> $sudo lttng-sessiond --daemonize
>
> $echo 1 > /sys/kernel/debug/latency/syscalls/default/tracking_on
>
>
> Process-1 has a thread which keeps doing some work and publishing
> messages every 100ms. I start Process-1 as
>
> LD_PRELOAD=liblttng-profile.so <Process-1>
>
> It gives some relevant information about few stack traces.
>
> TimestampChannelCPUEvent typeContentsBinary LocationFunction
> LocationSource LocationTID
> 16:13:32.971 988
> 199u_55lttng_profile:off_cpu_samplestack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28)
> [0x7f98233238]
> /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne14input_socket_c11read_packetERNS_8packet_sE+0x160)
> [0x7f9826e418]
> /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZN8velodyne8driver_c16read_from_socketEv+0x30)
> [0x7f9826c390]
> /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8fu,
> overhead=313869, context._vtid=1171311713
>
> 16:13:32.973 972
> 119u_00lttng_profile:off_cpu_samplestack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28)
> [0x7f98233238]
> /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne14input_socket_c11read_packetERNS_8packet_sE+0x160)
> [0x7f9826e418]
> /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZN8velodyne8driver_c16read_from_socketEv+0x30)
> [0x7f9826c390]
> /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8fu,
> overhead=809697, context._vtid=1171311713
>
> 16:13:32.973 981
> 912u_44lttng_profile:off_cpu_samplestack=/lib/aarch64-linux-gnu/libpthread.so.0(__nanosleep+0x28)
> [0x7f98233238]
> /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNK8velodyne8driver_c3runESt8functionIFvvEE+0x94)
> [0x7f9826c4a4]
> /home/nvidia/workspace/test_672/grand_central/apex_ws/install/lib/libvelodyne_driver.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt7_Mem_fnIMN8velodyne8driver_cEKFvSt8functionIFvvEEEEPS4_St5_BindIFS2_IMS4_FvvEESB_EEEEE6_M_runEv+0x8c)
> [0x7f9826d9fc]
> /usr/lib/aarch64-linux-gnu/libstdc++.so., overhead=469267,
> context._vtid=1171511715
>
> 16:13:32.973 999
> 544u_33lttng_profile:off_cpu_samplestack=/lib/aarch64-linux-gnu/libpthread.so.0(sendmsg+0x2c)
> [0x7f9823303c]
> /opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14UDPv4Transport17SendThroughSocketEPKhjRKNS1_9Locator_tERN4asio21basic_datagram_socketINS8_2ip3udpEEE+0xf8)
> [0x7f9647ad88]
> /opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14UDPv4Transport4SendEPKhjRKNS1_9Locator_tES7_+0x158)
> [0x7f9647c3e0]
> /opt/ros2/lib/libfastrtps.so.1(_ZN8eprosima8fastrtps4rtps14SenderResource4SendEPKhjRKNS1_9Locator_tE+0x4c)
> [0x7f9644b58c]
> /opt/ros2/lib/, overhead=1289108, context._vtid=1171211712
>
> Once in a while when I kill the process with <ctrl-c> it hangs but
> not always.
>
> Process-2 has a thread which subscribes for the messages.
>
> When I start LD_PRELOAD=liblttng-profile.so <Process-2>, it just hangs
>
>
>
>
> > Data is sent from Process-1 to Process-2 every 100ms. I see that often
> > there is a delay in Process-2 receiving the message.
> >
> >
> > Trying to use LTTng and along with other supported profiling tools like
> > latency-tracker, lttng-profile.
> > Something like the description in this link
> > http://lttng.org/blog/2015/03/18/full-stack-latencies
> <http://lttng.org/blog/2015/03/18/full-stack-latencies> is what
> I am looking
> > for, but I am facing issues. The latency_tracker kernel modules and also
> > lttng-profile is a bit flaky as it sometimes work and sometimes panics the
> > kernel. Simple test cases as provided in
> > https://github.com/fdoray/latency_tracker_demo
> <https://github.com/fdoray/latency_tracker_demo> works
> sometimes. And also
>
> Not sure if those aged properly.
>
> If you did not already, I would recommend you take a look at the
> latency-tracker
> github page [1] that this project references and uses. It
> changed quite a lot
> in the last 3 years.
>
> [1] https://github.com/efficios/latency-tracker
> <https://github.com/efficios/latency-tracker>
>
>
> Yes, that is where I downloaded the source and followed.
>
>
>
>
> > profiling Process-1 works by providing some useful traces(intermittently
> > kernel panics), but profiling Process-2 the process just hangs.
>
> What version of lttng-modules are you using? The
> latency_tracker_demo install
> script seems to use lttng-modules 2.6. 2.6 is not supported
> anymore, it would be
> interesting to perform you experimentation with a newer version.
>
>
> Sorry for not being clear in my earlier msg. I did not use the
> install scripts. All the kernel mdoules, LTTng modules,
> latency-tracker modules and user space libs
> lttng-ust, lttng-tools, lttng-profile was downloaded from git hub
> few weeks ago and build from source and loaded.
>
> From the latency_tracker_demo, I just built the sleep-demo and used
> the trace.sh to do the tracing
>
>
>
> Still, could you provide the panics call stack?
>
>
> I do not see a panic trace all the time. In the previous run, when I
> tried to kill the Process-2 with "kill -9 <pid>", as it was hung,
> the system just rebooted without panic.
>
>
> This is from the syslog, and I see the same output on console
>
> 1613 Dec 27 16:53:37 kernel: [12566.627939] ring buffer
> relay-metadata: 0 records written, 0 records overrun
> 1614 Dec 27 16:53:37 kernel: [12566.634983] ring buffer
> relay-metadata, cpu -1: non-consumed data
> 1615 Dec 27 16:53:37 kernel: [12566.634983] [ 12288 bytes
> written, 8192 bytes read ]
> 1616 Dec 27 16:53:37 kernel: [12566.646189] ring buffer:
> relay-metadata, cpu -1: 8192 bytes committed
> 1617 Dec 27 16:53:37 kernel: [12566.652780] ring buffer
> relay-discard, cpu 0: 0 records written, 0 records overrun
> 1618 Dec 27 16:53:38 kernel: [12566.660865] ring buffer
> relay-discard, cpu 1: 0 records written, 0 records overrun
> 1619 Dec 27 16:53:38 kernel: [12566.668924] ring buffer
> relay-discard, cpu 2: 0 records written, 0 records overrun
> 1620 Dec 27 16:53:38 kernel: [12566.676897] ring buffer
> relay-discard, cpu 3: 0 records written, 0 records overrun
> 1621 Dec 27 16:53:38 kernel: [12566.684809] ring buffer
> relay-discard, cpu 4: 0 records written, 0 records overrun
> 1622 Dec 27 16:53:38 kernel: [12566.692681] ring buffer
> relay-discard, cpu 5: 0 records written, 0 records overrun
> 1623 Dec 27 16:54:16 avahi-daemon[948]: Registering new address
> record for 10.31.33.48 on eth0.IPv4.
> 1624 Dec 27 16:54:16 rsyslogd-2007: action 'action 9' suspended,
> next retry is Wed Dec 27 16:55:46 2017 [v8.16.0 t ry
> http://www.rsyslog.com/e/2007 ]
> 1625 Dec 27 16:54:16 systemd-timesyncd[816]: Network configuration
> changed, trying to establish connection.
> 1626 Dec 27 16:54:16 systemd-timesyncd[816]: Synchronized to time
> server 91.189.94.4:123 <http://91.189.94.4:123> (ntp.ubuntu.com
> <http://ntp.ubuntu.com>).
> 1627 Dec 27 16:54:17 rsyslogd: [origin software="rsyslogd"
> swVersion="8.16.0" x-pid="924" x-info="http://www.rsysl og.com
> <http://og.com>"] start
> 1628 Dec 27 16:54:17 rsyslogd-2222: command
> 'KLogPermitNonKernelFacility' is currently not permitted - did you
> alr eady set it via a RainerScript command (v6+ config)?
> [v8.16.0 try http://www.rsyslog.com/e/2222 ]
> 1629 Dec 27 16:54:17 rsyslogd: rsyslogd's groupid changed to 115
> 1630 Dec 27 16:54:17 rsyslogd: rsyslogd's userid changed to 108
> 1631 Dec 27 16:54:17 kernel: [ 0.000000] Booting Linux on
> physical CPU 0x100
> 1632 Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup
> subsys cpuset
> 1633 Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup
> subsys cpu
> 1634 Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup
> subsys cpuacct
> 1635 Dec 27 16:54:17 kernel: [ 0.000000] Linux version
> 4.4.38-rt49-tegra (anup.pemmaiah at anup-pemmaiah) (gcc ver sion
> 4.9.2 (GCC) ) #1 SMP PREEMPT RT Thu Nov 30 17:12:21 PST 2017
> 1636 Dec 27 16:54:17 kernel: [ 0.000000] Boot CPU: AArch64
> Processor [411fd073]
> 1637 Dec 27 16:54:17 kernel: [ 0.000000] earlycon: Early serial
> console at MMIO32
>
>
>
> >
> > Not sure how updated the above pasted links are, because I notice there is
> > no more usec_threshold parameter and I use the threshold
> > in /sys/kernel/debug/latency/syscalls/default/threshold
> >
> > I understand each situation is different, but was curious if any of you
> > have comments/suggestions that would help in debugging latency related
> > issues using LTTng. I have looked many places to get required info but of
> > not much help.
>
> Most of the maintainers are on vacation during this part of the
> year so it might
> take some time to get answers. I'm sure that Julien Desfossez
> will be
> interested in your use case.
>
> Cheers
>
>
> I undersatnd and any suggestions will be very helpful
>
>
> >
> > Your comments/suggestions will be greatly appreciated.
> >
> > Thanks
> > Anup
>
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org <mailto:lttng-dev at lists.lttng.org>
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> <https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev>
>
>
> --
> Jonathan Rajotte-Julien
> EfficiOS
>
>
>
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
More information about the lttng-dev
mailing list