[ltt-dev] calling lttctl_destroy_trace() hangs in 2.6.27-lttng-0.43

Wed Oct 22 07:57:04 EDT 2008

Hi,

I have a combined lttd that includes the logic for stopping the trace
and destroying the channels.  That is, I don't use lttctl to stop the
trace; the equivalent of `lttctl -R -n <trace>' is built into my
modified lttd.  The rationale is that calling `lttctl -R' is too
expensive in certain configurations on my embedded target(s).

My modified version of the daemon calls lttctl_stop() and then
lttctl_destroy_trace() when a SIGTERM or SIGINT is received.  However,
with the latest versions (2.6.27-lttng-0.43 and ltt-control-0.55) the
sendto/recvfrom pair in lttctl_destroy_trace() hangs during sendto().

Here's a trace of lttd and the receipt of SIGTERM.  Note: the first
sendto/recvfrom is the call to lttctl_stop().

    poll([{fd=6, events=POLLIN|POLLPRI}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=19, events=POLLIN|POLLPRI}, {fd=21, events=POLLIN|POLLPRI}, {fd=23, events=POLLIN|POLLPRI}, {fd=25, events=POLLIN|POLLPRI}, {fd=27, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}], 12, -1) = ? ERESTART_RESTARTBLOCK (To be restarted)
    --- SIGTERM (Terminated) @ 0 (0) ---
    sigreturn()                             = ? (mask now [])
    sendto(4, "0\2\0\0\21\0\1\0\0\0\0\0\37\26\0\0wrsv-trace\0\0\0\0\0"..., 560, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 560
    recvfrom(4, "$\0\0\0\2\0\0\0\0\0\0\0\37\26\0\0\0\0\0\0000\2\0\0\21\0"..., 580, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
    sendto(4, "0\2\0\0\21\0\1\0\0\0\0\0\37\26\0\0wrsv-trace\0\0\0\0\0"..., 560, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12

[ Note: The call to sendto() in lttctl_destroy_trace() never returns. ]

However, if I issue an `lttctl -R -n <trace>' from a separate process,
the sendto/recvfrom completes without hanging and the daemon exits as
you would expect.

I noticed the following comment/code in ltt/ltt-tracer.c:

	/*
	 * Wait for lttd readers to release the files, therefore making sure
	 * the last subbuffers have been read.
	 */
	if (atomic_read(&trace->kref.refcount) > 1) {
		int ret = 0;
		__wait_event_interruptible(trace->kref_wq,
			(atomic_read(&trace->kref.refcount) == 1), ret);
	}

If I comment out this block then the sendto/recvfrom no longer hangs.
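
To make the wait condition concrete, here is a small user-space model
of my reading of that block.  It assumes that the trace holds one
reference of its own and that each open channel file holds one more;
that assumption comes from the comment ("Wait for lttd readers to
release the files"), not from tracing the kernel code.  The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of trace->kref.refcount (assumption: 1 for the trace itself
 * plus 1 per channel file held open by a reader such as lttd). */
struct trace_model {
	int refcount;
};

void channel_open(struct trace_model *t)
{
	t->refcount++;
}

void channel_close(struct trace_model *t)
{
	assert(t->refcount > 1);	/* never drop the trace's own ref */
	t->refcount--;
}

/* Mirrors the guard around __wait_event_interruptible(): destroy
 * sleeps while any reader still holds a reference. */
bool destroy_would_wait(const struct trace_model *t)
{
	return t->refcount > 1;
}
```

Under this model, a process that holds all 12 channel files open (as in
the poll() call above) and issues the destroy request itself would be
waiting for references that only it can release.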

So, I have a few questions:

 1. What prevents the sendto() from completing when
    lttctl_destroy_trace() is called from within lttd while the trace
    channels are still open?

 2. Is there anything I can do (e.g. close the channels first before
    calling lttctl_destroy_trace()) to prevent the sendto hanging?  (I
    experimented with this approach but I'm pretty sure I would be
    missing any outstanding data in the kernel buffers on exit.)

 3. What is the implication of leaving this code commented out?

 4. If I comment this block out, will I still be notified that there is
    data still to be copied from the kernel buffers?

Thanks,
Andy.