[ltt-dev] calling lttctl_destroy_trace() hangs in 2.6.27-lttng-0.43
Andrew McDermott
andrew.mcdermott at windriver.com
Wed Oct 22 07:57:04 EDT 2008
Hi,
I have a combined lttd that also contains the logic for stopping the
trace and destroying the channels, i.e., I don't use lttctl to stop the
trace; the logic behind `lttctl -R -n <trace>' is in my modified lttd.
The rationale is that calling `lttctl -R' is too expensive in certain
configurations on my embedded target(s).
My modified version of the daemon calls lttctl_stop() and then
lttctl_destroy_trace() when a SIGTERM or SIGINT is received but I find
that in the latest versions (2.6.27-lttng-0.43 and ltt-control-0.55) the
sendto/recvfrom pair in lttctl_destroy_trace() hangs during sendto().
Here's a trace of lttd and the receipt of SIGTERM. Note: the first
sendto/recvfrom is the call to lttctl_stop().
poll([{fd=6, events=POLLIN|POLLPRI}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=19, events=POLLIN|POLLPRI}, {fd=21, events=POLLIN|POLLPRI}, {fd=23, events=POLLIN|POLLPRI}, {fd=25, events=POLLIN|POLLPRI}, {fd=27, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}], 12, -1) = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn() = ? (mask now [])
sendto(4, "0\2\0\0\21\0\1\0\0\0\0\0\37\26\0\0wrsv-trace\0\0\0\0\0"..., 560, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 560
recvfrom(4, "$\0\0\0\2\0\0\0\0\0\0\0\37\26\0\0\0\0\0\0000\2\0\0\21\0"..., 580, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
sendto(4, "0\2\0\0\21\0\1\0\0\0\0\0\37\26\0\0wrsv-trace\0\0\0\0\0"..., 560, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12
[ Note: The call to sendto() in lttctl_destroy_trace() never returns. ]
However, if I issue an `lttctl -R -n <trace>' then the sendto/recvfrom
completes without hanging and the daemon exits as you would expect.
I noticed the following comment/code in ltt/ltt-tracer.c:
	/*
	 * Wait for lttd readers to release the files, therefore making sure
	 * the last subbuffers have been read.
	 */
	if (atomic_read(&trace->kref.refcount) > 1) {
		int ret = 0;
		__wait_event_interruptible(trace->kref_wq,
			(atomic_read(&trace->kref.refcount) == 1), ret);
	}
If I comment out this block then the sendto/recvfrom no longer hangs.
So, I have a few questions:
1. What prevents sendto() from completing when lttctl_destroy_trace()
is called from within lttd while the trace channels are still open?
2. Is there anything I can do (e.g. close the channels first before
calling lttctl_destroy_trace()) to prevent the sendto hanging? (I
experimented with this approach but I'm pretty sure I would be
missing any outstanding data in the kernel buffers on exit.)
3. What is the implication of leaving this code commented out?
4. If I comment this block out, will I still get notified that there
is data still to be copied from the kernel buffers?
Thanks,
Andy.