[ltt-dev] lttctl locks up with RT Linux

jpaul at gdrs.com jpaul at gdrs.com
Thu Apr 22 18:25:01 EDT 2010


Hey Mathieu:

Thanks for looking at this. I'm a bit new to debugging at this level, so
you may need to provide me a bit more info on what you need. I attempted
to use "pstack" on the lttctl and lttd tasks ... no luck as pstack also
locked up.

I put a bit of tracing into liblttctl and discovered the lockup occurs
when a write of "traceName" (whatever traceName happens to be) occurs to
the "/mnt/debugfs/ltt/destroy_trace" file.
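
For reference, the write that hangs can be exercised outside of liblttctl
with a small standalone program. The following is only a minimal sketch:
write_trace_name() is a hypothetical helper of my own, not a liblttctl
function, and it assumes the control file takes the bare trace name with
no trailing newline.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of liblttctl): write a trace name to a
 * control file such as /mnt/debugfs/ltt/destroy_trace.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_trace_name(const char *path, const char *trace_name)
{
    size_t len = strlen(trace_name);
    ssize_t n;
    int err;
    int fd;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    /* Log before the write so the last message shows where a hang occurs. */
    fprintf(stderr, "writing \"%s\" to %s\n", trace_name, path);
    n = write(fd, trace_name, len);
    if (n < 0)
        err = -errno;
    else if ((size_t)n != len)
        err = -EIO;     /* short write, treated as an error in this sketch */
    else
        err = 0;
    fprintf(stderr, "write returned %zd\n", n);

    close(fd);
    return err;
}
```

Calling write_trace_name("/mnt/debugfs/ltt/destroy_trace", "trace1") from
main() as root should print the first message and then, on the RT kernel,
presumably never print the second one.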

I'm guessing that you would like some tracing of the ltt kernel module.
Is there something that I can turn on, or another way I could get a
stack dump of that module after lockup?  I'll do a little research this
weekend on kernel debugging techniques.

I can certainly sprinkle in some printk statements in the ltt kernel
module source. Doing so provided the following info:

- Control entered _ltt_trace_destroy (single underscore)
- Control entered del_timer_sync(&ltt_async_wakeup_timer) and never
returned

Does that help, or should I continue farther down this path?

Thanks

JP

-----Original Message-----
From: Mathieu Desnoyers [mailto:compudj at krystal.dyndns.org] 
Sent: Thursday, April 22, 2010 12:06 PM
To: John P. Paul
Cc: ltt-dev at lists.casi.polymtl.ca
Subject: Re: [ltt-dev] lttctl locks up with RT Linux

* jpaul at gdrs.com (jpaul at gdrs.com) wrote:
> Greetings:
> 
> I'm using a 2.6.33.2 kernel. I have LTT up and running on the plain
> vanilla kernel, but "lttctl -D trace1" never returns on the RT version
> of the same kernel. I've downloaded and integrated the following pieces:
> 
> patch-2.6.33.2-lttng-0.211
> ltt-control-0.84-07042010
> lttv-0.12.31.04072010
> 
> Note that I've had to manually apply several of the patches from the
> patch file. I can provide a list if desired.
> 
> After the lockup, I can do an ls on the /tmp/trace directory and see
> that the following files have a non-zero length (remaining files in the
> trace directory have a zero length):
> 
> fs_0, fs_1, kernel_0, kernel_1
> 
> I'm running on an Intel Core2 Duo system. I've built all the LTT
> components into the kernel, so I do not have to load any modules at
> runtime. I do execute an ltt-armall prior to issuing any "lttctl -C -w
> /tmp/trace trace1" commands.
> 
> When the above occurs, I usually have to hard power down the machine
> as a root issued "reboot" does not reboot the machine (even after trying
> to kill the running ltt processes).
> 
> Any suggestions on how to get this working under the RT kernel would
> be appreciated. Does LTT even function properly for RT kernels? If not,
> it would be of great benefit to have it do so.  Please let me know if
> additional debug info would be helpful.

I bet there is something fishy on RT with __ltt_trace_destroy(). Having
an output of where the CPU is stalled in lttng code would help.


> 
> A couple additional notes:
> 
> - LTTV docs state that it requires glib 2.4 or greater. I believe this
> is incorrect due to the following dependency:
> 
> $ rpm -qa glib2
> glib2-2.12.3-4.el5_3.1  << my default glib (RHEL5.x base)
> 
> state.c: In function 'copy_process_state':
> state.c:1344: error: 'GHashTableIter' undeclared (first use in this function)
> 
> I've installed glib-2.22.5 to get around the above issue.

OK, the dependency seems to be glib 2.16 now. Will update the README
and LTTng manual accordingly.

Thanks,

Mathieu







