[lttng-dev] Latency tracing using LTTng
jdesfossez at efficios.com
Wed Jan 10 21:56:35 UTC 2018
Sorry for the delay.
I just tried the examples from the blog post (it has been a while!). I
had to update liblttng-profile-base/module_abi.h in lttng-profile like this:
#define LTTNGPROFILE_PROC "syscalls"
And it works. I did not patch lttng-modules to get the kernel events,
but I see them in ftrace; I will rebase the
0001-connect-to-latency_tracker-tracepoints.patch from the
latency-tracker onto a more recent version of lttng-modules and it should
work as is. I also see the stack traces in the user-space and kernel traces.
I also see the flakiness you are talking about: sometimes I launch
sleep-demo and nothing happens. I will try to have a look at that.
I did not observe any kernel panic or hang, but I just tried on an
x86-64 4.4 non-RT kernel, so there might be some differences here.
Now for the main problem, which seems related to Process-2 never working
with lttng-profile: do you think you could investigate a bit more
(check in dmesg whether it gets registered, whether the LD_PRELOAD is
overridden/lost when it is started by a launcher script, etc.)?
Ideally, if you could get a minimal C program that has the same
behavior, that would help us investigate this a bit more.
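Something along these lines would be a good starting point (just a sketch,
assuming the hang also shows up with a plain blocking UDP receive loop; the
file name, the lib_is_mapped() helper, the port and the buffer size below are
placeholders I made up). It also checks /proc/self/maps at startup, which
should tell us whether liblttng-profile.so really got preloaded when
Process-2 is started through its launcher:

/*
 * udp-recv-demo.c: minimal stand-in for Process-2 (a sketch, not the real
 * subscriber).
 * Build: gcc -Wall -O2 -o udp-recv-demo udp-recv-demo.c
 * Run:   LD_PRELOAD=liblttng-profile.so ./udp-recv-demo
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if a library whose path contains "name" is mapped into
 * this process, by scanning /proc/self/maps. */
static int lib_is_mapped(const char *name)
{
	FILE *maps = fopen("/proc/self/maps", "r");
	char line[512];
	int found = 0;

	if (!maps)
		return 0;
	while (fgets(line, sizeof(line), maps)) {
		if (strstr(line, name)) {
			found = 1;
			break;
		}
	}
	fclose(maps);
	return found;
}

int main(void)
{
	struct sockaddr_in addr;
	int fd;

	/* If the launcher script drops LD_PRELOAD, this will say "NOT". */
	fprintf(stderr, "liblttng-profile.so is %sloaded\n",
		lib_is_mapped("liblttng-profile") ? "" : "NOT ");

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		return EXIT_FAILURE;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(5005);	/* arbitrary port for the test */

	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
		perror("bind");
		return EXIT_FAILURE;
	}

	/* Blocking receive loop, roughly what the subscriber thread does. */
	for (;;) {
		char buf[1500];
		ssize_t len = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);

		if (len < 0) {
			perror("recvfrom");
			break;
		}
		printf("received %zd bytes\n", len);
	}

	close(fd);
	return EXIT_SUCCESS;
}

If that one also hangs under LD_PRELOAD, we at least know the problem is
not specific to the ROS/FASTRTPS stack.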
On 2018-01-04 02:05 PM, Anup Pemmaiah wrote:
> Was curious if any of the experts had any comments/suggestions on the
> issue I am facing.
> On Wed, Dec 27, 2017 at 9:02 PM, Anup Pemmaiah <anup.pemmaiah at gmail.com>
> wrote:
> On Wed, Dec 27, 2017 at 6:40 PM, Jonathan Rajotte-Julien
> <jonathan.rajotte-julien at efficios.com> wrote:
> On Wed, Dec 27, 2017 at 05:21:30PM -0800, Anup Pemmaiah wrote:
> > Hi,
> > I am trying to find the reason for an intermittent latency that I am noticing
> > on my Linux system running an RT_PREEMPT kernel. I have patched it on an
> > Ubuntu 16.04 LTS. I have stopped the desktop lightdm service and this is on an
> Could we get more details on the version of the RT kernel?
> The kernel with the RT_PREEMPT patch is 4.4.38-rt49
> > Arm aarch64 platform.
> > At a very high level, there are two processes interacting with each other.
> > Process-1 does some work and sends UDP messages to Process-2 (using the
> > FASTRTPS real-time Publish-Subscribe protocol).
> Are you using the eProsima implementation? If so, could you give us more
> information on the tracing strategy?
> Yes. In my situation, it is part of the ROS (http://www.ros.org/)
> stack. The ROS middleware uses eProsima underneath for publishing and
> subscribing. My process sits on top of the ROS stack and
> uses it for message passing.
> Right now I have not assigned any RT priorities. All tasks are
> running with SCHED_OTHER with the default prio of 20.
> I load the modules as follows
> $modprobe latency_tracker
> $modprobe latency_tracker_syscalls
> $modprobe lttng-probe-latency_tracker
> $echo 100000 > /sys/kernel/debug/latency/syscalls/default/threshold
> Start LTTng
> $sudo lttng-sessiond --daemonize
> $echo 1 > /sys/kernel/debug/latency/syscalls/default/tracking_on
> Process-1 has a thread which keeps doing some work and publishing
> messages every 100ms. I start Process-1 as
> LD_PRELOAD=liblttng-profile.so <Process-1>
> It gives some relevant information about a few stack traces.
> (Columns: Timestamp, Channel, CPU, Event type, Contents, Binary Location,
> Function Location, Source Location, TID)
> 16:13:32.971 988   overhead=313869, context._vtid=1171311713
> 16:13:32.973 972   overhead=809697, context._vtid=1171311713
> 16:13:32.973 981   /usr/lib/aarch64-linux-gnu/libstdc++.so., overhead=469267,
> 16:13:32.973 999   /opt/ros2/lib/, overhead=1289108, context._vtid=1171211712
> Once in a while, when I kill the process with <ctrl-c>, it hangs, but
> not always.
> Process-2 has a thread which subscribes to the messages.
> When I start LD_PRELOAD=liblttng-profile.so <Process-2>, it just hangs.
> > Data is sent from Process-1 to Process-2 every 100ms. I see that often
> > there is a delay in Process-2 receiving the message.
> > I am trying to use LTTng along with other supported profiling tools like
> > latency-tracker and lttng-profile.
> > Something like the description in this link
> > http://lttng.org/blog/2015/03/18/full-stack-latencies is what I am looking
> > for, but I am facing issues. The latency_tracker kernel modules and also
> > lttng-profile are a bit flaky, as they sometimes work and sometimes panic
> > the kernel. Simple test cases as provided in
> > https://github.com/fdoray/latency_tracker_demo work
> > sometimes. And also
> Not sure if those aged properly.
> If you did not already, I would recommend you take a look at the
> GitHub page that this project references and uses. It changed quite a lot
> in the last 3 years:
> https://github.com/efficios/latency-tracker
> Yes, that is where I downloaded the source from and what I followed.
> > profiling Process-1 works by providing some useful traces (with intermittent
> > kernel panics), but when profiling Process-2 the process just hangs.
> What version of lttng-modules are you using? The latency_tracker_demo
> install script seems to use lttng-modules 2.6. 2.6 is not supported
> anymore; it would be interesting to perform your experimentation with a
> newer version.
> Sorry for not being clear in my earlier msg. I did not use the
> install scripts. All the kernel modules, LTTng modules,
> latency-tracker modules and user-space libs
> (lttng-ust, lttng-tools, lttng-profile) were downloaded from GitHub
> a few weeks ago, built from source, and loaded.
> From the latency_tracker_demo, I just built the sleep-demo and used
> trace.sh to do the tracing.
> Still, could you provide the panic's call stack?
> I do not see a panic trace all the time. In the previous run, when I
> tried to kill Process-2 with "kill -9 <pid>", as it was hung,
> the system just rebooted without a panic.
> This is from the syslog, and I see the same output on the console:
> 1613 Dec 27 16:53:37 kernel: [12566.627939] ring buffer
> relay-metadata: 0 records written, 0 records overrun
> 1614 Dec 27 16:53:37 kernel: [12566.634983] ring buffer
> relay-metadata, cpu -1: non-consumed data
> 1615 Dec 27 16:53:37 kernel: [12566.634983] [ 12288 bytes
> written, 8192 bytes read ]
> 1616 Dec 27 16:53:37 kernel: [12566.646189] ring buffer:
> relay-metadata, cpu -1: 8192 bytes committed
> 1617 Dec 27 16:53:37 kernel: [12566.652780] ring buffer
> relay-discard, cpu 0: 0 records written, 0 records overrun
> 1618 Dec 27 16:53:38 kernel: [12566.660865] ring buffer
> relay-discard, cpu 1: 0 records written, 0 records overrun
> 1619 Dec 27 16:53:38 kernel: [12566.668924] ring buffer
> relay-discard, cpu 2: 0 records written, 0 records overrun
> 1620 Dec 27 16:53:38 kernel: [12566.676897] ring buffer
> relay-discard, cpu 3: 0 records written, 0 records overrun
> 1621 Dec 27 16:53:38 kernel: [12566.684809] ring buffer
> relay-discard, cpu 4: 0 records written, 0 records overrun
> 1622 Dec 27 16:53:38 kernel: [12566.692681] ring buffer
> relay-discard, cpu 5: 0 records written, 0 records overrun
> 1623 Dec 27 16:54:16 avahi-daemon: Registering new address
> record for 10.31.33.48 on eth0.IPv4.
> 1624 Dec 27 16:54:16 rsyslogd-2007: action 'action 9' suspended,
> next retry is Wed Dec 27 16:55:46 2017 [v8.16.0 try
> http://www.rsyslog.com/e/2007 ]
> 1625 Dec 27 16:54:16 systemd-timesyncd: Network configuration
> changed, trying to establish connection.
> 1626 Dec 27 16:54:16 systemd-timesyncd: Synchronized to time
> server 220.127.116.11:123 (ntp.ubuntu.com).
> 1627 Dec 27 16:54:17 rsyslogd: [origin software="rsyslogd"
> swVersion="8.16.0" x-pid="924" x-info="http://www.rsyslog.com"] start
> 1628 Dec 27 16:54:17 rsyslogd-2222: command
> 'KLogPermitNonKernelFacility' is currently not permitted - did you
> already set it via a RainerScript command (v6+ config)?
> [v8.16.0 try http://www.rsyslog.com/e/2222 ]
> 1629 Dec 27 16:54:17 rsyslogd: rsyslogd's groupid changed to 115
> 1630 Dec 27 16:54:17 rsyslogd: rsyslogd's userid changed to 108
> 1631 Dec 27 16:54:17 kernel: [ 0.000000] Booting Linux on
> physical CPU 0x100
> 1632 Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup
> subsys cpuset
> 1633 Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup
> subsys cpu
> 1634 Dec 27 16:54:17 kernel: [ 0.000000] Initializing cgroup
> subsys cpuacct
> 1635 Dec 27 16:54:17 kernel: [ 0.000000] Linux version
> 4.4.38-rt49-tegra (anup.pemmaiah at anup-pemmaiah) (gcc version
> 4.9.2 (GCC) ) #1 SMP PREEMPT RT Thu Nov 30 17:12:21 PST 2017
> 1636 Dec 27 16:54:17 kernel: [ 0.000000] Boot CPU: AArch64
> Processor [411fd073]
> 1637 Dec 27 16:54:17 kernel: [ 0.000000] earlycon: Early serial
> console at MMIO32
> > Not sure how up to date the above pasted links are, because I notice there
> > is no longer a usec_threshold parameter, and I use the threshold file
> > in /sys/kernel/debug/latency/syscalls/default/threshold
> > I understand each situation is different, but was curious if any of you
> > have comments/suggestions that would help in debugging latency-related
> > issues using LTTng. I have looked in many places to get the required info,
> > but without much help.
> Most of the maintainers are on vacation during this part of the
> year, so it might take some time to get answers. I'm sure that
> Julien Desfossez will be interested in your use case.
> I understand, and any suggestions will be very helpful.
> > Your comments/suggestions will be greatly appreciated.
> > Thanks
> > Anup
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> Jonathan Rajotte-Julien