[ltt-dev] calling lttctl_destroy_trace() hangs in 2.6.27-lttng-0.43

Mathieu Desnoyers compudj at krystal.dyndns.org
Fri Oct 24 10:54:59 EDT 2008


* Andrew McDermott (andrew.mcdermott at windriver.com) wrote:
> 
> Hi,
> 
> I have a combined lttd that has logic in it for stopping the trace and
> destroying the channels.  i.e., I don't use lttctl to stop the trace;
> the logic that is `lttctl -R -n <trace>' is in my modified lttd.  The
> rationale for this is that it is too expensive in certain configurations
> to call `lttctl -R' on my embedded target(s).
> 
> My modified version of the daemon calls lttctl_stop() and then
> lttctl_destroy_trace() when a SIGTERM or SIGINT is received but I find
> that in the latest versions (2.6.27-lttng-0.43 and ltt-control-0.55) the
> sendto/recvfrom pair in lttctl_destroy_trace() hangs during sendto().
> 
> Here's a trace of lttd and the receipt of SIGTERM.  Note: the first
> sendto/recvfrom is the call to lttctl_stop().
> 
>     poll([{fd=6, events=POLLIN|POLLPRI}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN|POLLPRI}, {fd=19, events=POLLIN|POLLPRI}, {fd=21, events=POLLIN|POLLPRI}, {fd=23, events=POLLIN|POLLPRI}, {fd=25, events=POLLIN|POLLPRI}, {fd=27, events=POLLIN|POLLPRI}, {fd=29, events=POLLIN|POLLPRI}], 12, -1) = ? ERESTART_RESTARTBLOCK (To be restarted)
>     --- SIGTERM (Terminated) @ 0 (0) ---
>     sigreturn()                             = ? (mask now [])
>     sendto(4, "0\2\0\0\21\0\1\0\0\0\0\0\37\26\0\0wrsv-trace\0\0\0\0\0"..., 560, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 560
>     recvfrom(4, "$\0\0\0\2\0\0\0\0\0\0\0\37\26\0\0\0\0\0\0000\2\0\0\21\0"..., 580, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
>     sendto(4, "0\2\0\0\21\0\1\0\0\0\0\0\37\26\0\0wrsv-trace\0\0\0\0\0"..., 560, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12
> 
> [ Note: The call to sendto() in lttctl_destroy_trace() never returns. ]
> 
> However, if I issue an `lttctl -R -n <trace>' then the sendto/recvfrom
> completes without hanging and the daemon exits as you would expect.
> 
> I noticed that in ltt/ltt-tracer.c the following comment/code:
> 
> 	/*
> 	 * Wait for lttd readers to release the files, therefore making sure
> 	 * the last subbuffers have been read.
> 	 */
> 	if (atomic_read(&trace->kref.refcount) > 1) {
> 		int ret = 0;
> 		__wait_event_interruptible(trace->kref_wq,
> 			(atomic_read(&trace->kref.refcount) == 1), ret);
> 	}
> 
> If I comment out this block then the sendto/recvfrom no longer hangs.
> 
> So, I have a few questions:
> 
>  1. What prevents the sendto() completing when using
>     lttctl_destroy_trace() from within lttd where the trace channels are
>     still open?
> 

lttd still holds the buffers for reading. This code makes sure lttctl
waits for the readers to release the buffers before it continues. This
is very important when lttctl happens to be run in batch mode (e.g. in
autotest) and we have to know when the trace has been fully consumed
before we restart the machine.

>  2. Is there anything I can do (e.g. close the channels first before
>     calling lttctl_destroy_trace()) to prevent the sendto hanging?  (I
>     experimented with this approach but I'm pretty sure I would be
>     missing any outstanding data in the kernel buffers on exit.)

Hrm, yes, you'd miss the last subbuffer switch.

> 
>  3. What is the implication of leaving this code commented out?
> 

You would lose the feature I described above, but given that you are
doing everything within lttd, I guess you are not too worried about it.

>  4. If I comment this block out will I get notified that there is
>     data still to be copied from the kernel buffers?

This block is really just there to make sure the control path blocks
until there are no readers left. The "nice way" to do it would be to
create a separate lttd thread that does the destroy. That way the
destroy call would not block the reader threads, which could finish
consuming the subbuffers. Once they release the buffers, the control
thread would unblock and everything would be fine.
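
To make the threading idea concrete, here is a minimal user-space sketch.
It is not lttd code and makes no real lttctl calls: fake_destroy_trace()
is a hypothetical stand-in that only imitates the blocking behaviour of
the kernel wait quoted above, and the reader threads, counters and
run_shutdown_demo() are likewise invented for illustration. The point it
tries to show is that the blocking destroy lives in its own thread, so
the readers keep draining and the destroy returns once the last reader
has released its buffer.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_READERS        4
#define SUBBUFS_PER_READER 3

/* Stand-in for kernel-side state (all names here are hypothetical). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t readers_gone = PTHREAD_COND_INITIALIZER;
static int open_readers;        /* readers still holding a buffer   */
static int consumed;            /* subbuffers consumed so far       */
static int destroyed;           /* set once the fake destroy is done */
static int consumed_at_destroy; /* snapshot taken when destroy ends */

/*
 * Hypothetical stand-in for the blocking destroy request: like the
 * kernel wait on the trace refcount, it sleeps until no reader holds
 * a buffer any more.
 */
static void fake_destroy_trace(void)
{
	pthread_mutex_lock(&lock);
	while (open_readers > 0)
		pthread_cond_wait(&readers_gone, &lock);
	assert(open_readers == 0);
	destroyed = 1;
	consumed_at_destroy = consumed;
	pthread_mutex_unlock(&lock);
}

/* Dedicated control thread: the only place that blocks on destroy. */
static void *control_thread(void *arg)
{
	(void)arg;
	fake_destroy_trace();
	return NULL;
}

/* Reader thread: drains its remaining subbuffers, then releases. */
static void *reader_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < SUBBUFS_PER_READER; i++) {
		pthread_mutex_lock(&lock);
		consumed++;		/* "read" one subbuffer */
		pthread_mutex_unlock(&lock);
	}
	pthread_mutex_lock(&lock);
	if (--open_readers == 0)	/* last reader wakes the destroyer */
		pthread_cond_signal(&readers_gone);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/*
 * Runs one shutdown: returns the number of subbuffers that had been
 * consumed when the destroy completed, or -1 on failure.
 */
int run_shutdown_demo(void)
{
	pthread_t readers[NUM_READERS], control;
	int started = 0;

	open_readers = NUM_READERS;
	consumed = 0;
	destroyed = 0;
	consumed_at_destroy = 0;

	/* Start the blocking destroy first; readers are not held up. */
	if (pthread_create(&control, NULL, control_thread, NULL) != 0)
		return -1;
	for (int i = 0; i < NUM_READERS; i++) {
		if (pthread_create(&readers[i], NULL, reader_thread, NULL) != 0)
			break;
		started++;
	}
	/* Account for readers that never started so destroy can finish. */
	pthread_mutex_lock(&lock);
	open_readers -= NUM_READERS - started;
	if (open_readers == 0)
		pthread_cond_signal(&readers_gone);
	pthread_mutex_unlock(&lock);

	for (int i = 0; i < started; i++)
		pthread_join(readers[i], NULL);
	pthread_join(control, NULL);

	return (destroyed && started == NUM_READERS) ? consumed_at_destroy : -1;
}
```

In a real daemon the signal path would presumably only start (or wake)
the control thread rather than issue the destroy itself; how that maps
onto the actual lttd and liblttctl code is left open here.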

Mathieu

> 
> Thanks,
> Andy.
> 
> _______________________________________________
> ltt-dev mailing list
> ltt-dev at lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



