[lttng-dev] TMF: inconsistencies in the context vtid reporting

Sébastien Barthélémy barthelemy at crans.org
Fri Mar 22 06:02:44 EDT 2013


Hello,

I'm using simultaneous kernel and userspace tracing in order to
understand how a multi threaded application (mis)behaves at shutdown.

The idea is to use userspace tracepoints in the application, augmented
with the vtid context, in order to know which thread is calling what
and to relate that with the threads lifespan as reconstructed from the
kernel scheduler traces.

First, is the approach ok? Am I right to expect the vtid attached to
ust event to match the tid shown in the reconstructed kernel scheduler
state?
Should I expect "time shifts" between the kernel and userspace traces
(which would make the correlation harder).

I was hoping to see the unix signals in the kernel trace too but could
not find them so far.

So now, the problem is that I get inconsistencies: some UST events
report a vtid which is shown as waiting by the kernel trace (both in
the Control Flow and the Resources views). How could a waiting thread
trigger an UST event?

I believe my kernel trace is complete, since the following command
shows no output.

    $ babeltrace lttng-traces/auto-20130321-172308/kernel/ > /dev/null

My userspace trace is not complete. But I believe it does not matter,
because the missing events are seconds before the time window I'm
interested in, and AFAIK, there is no state reconstruction involved
(but may be there is?).

I eventually came to a more blatant inconsistency: the displayed
context vtid of an event is not always the same.

See the two attached screenshots. The same UST event is highlighted,
displayed once as part of a single UST trace and then in an experiment
(same UST trace + kernel trace)

the reported vtid is not the same: 15443 and 15411 respectively.

babeltrace goes for 15443:

    $ babeltrace
lttng-traces/auto-20130321-172308/ust/naoqi-bin-15411-20130321-172659/
| grep  "17:27:13.883622748"
    [warning] Tracer discarded 165 events between [17:27:00.328855374]
and [17:27:00.328857812]. You should consider recording a new trace
with larger buffers or with fewer events enabled.
    [17:27:13.883622748] (+0.000087543) ald-0987-de:naoqi-bin:15411
qi_alcommon:module_metacall_stop: { cpu_id = 1 }, { vtid = 15443, vpid
= 15411 }, { methodId = 34, moduleName = "ALMotion", methodName =
"getAngles::(mb)" }

I cannot tell it for sure but I think I also saw the same event in the
experiment view with the wrong vtid.

I'm using TMF 2.0.0.201303211612 (nightly build from yesterday).

And by the way, thank your for the support I already got on IRC
yesterday, you're a dream project to work with.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tmf_exp.png
Type: image/png
Size: 192266 bytes
Desc: not available
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20130322/6cc74401/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tmf_trace.png
Type: image/png
Size: 182554 bytes
Desc: not available
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20130322/6cc74401/attachment-0003.png>


More information about the lttng-dev mailing list