[ltt-dev] lttctl locks up with RT Linux
Mathieu Desnoyers
compudj at krystal.dyndns.org
Tue May 11 07:59:39 EDT 2010
* jpaul at gdrs.com (jpaul at gdrs.com) wrote:
> Hey Mathieu:
>
> Thanks for looking at this. I'm a bit new to debugging at this level, so
> you may need to provide me a bit more info on what you need. I attempted
> to use "pstack" on the lttctl and lttd tasks ... no luck as pstack also
> locked up.
>
> I put a bit of tracing into liblttctl and discovered the lockup occurs
> when a write of "traceName" (whatever traceName happens to be) occurs to
> the "/mnt/debugfs/ltt/destroy_trace" file.
>
> I'm guessing that you would like some tracing of the ltt kernel module.
> Is there something that I can turn on, or another way I could get a
> stack dump of that module after lockup? I'll do a little research this
> weekend on kernel debugging techniques.
>
> I can certainly sprinkle in some printk statements in the ltt kernel
> module source. Doing so provided the following info:
>
> - Control entered _ltt_trace_destroy (single underscore)
> - Control entered del_timer_sync(&ltt_async_wakeup_timer) and never
> returned
>
> Does that help, or should I continue farther down this path?
Can you try the following patch to see if it fixes your problem?
lttng fix rt kernel teardown deadlock
LTTng has a teardown bug on RT (deadlock):
The teardown path deletes a timer synchronously (del_timer_sync()) while
holding the traces mutex, and the timer handler takes the same mutex, which
leads to a deadlock.
Fix this by taking an RCU read lock in the timer handler (instead of the
RT-specific fix using the mutex), and by doing synchronize_rcu() in addition
to synchronize_sched() upon updates.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
---
ltt/ltt-tracer.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
Index: linux-2.6-lttng/ltt/ltt-tracer.c
===================================================================
--- linux-2.6-lttng.orig/ltt/ltt-tracer.c 2010-05-11 07:50:46.000000000 -0400
+++ linux-2.6-lttng/ltt/ltt-tracer.c 2010-05-11 07:55:46.000000000 -0400
@@ -41,6 +41,14 @@
#include <linux/vmalloc.h>
#include <asm/atomic.h>
+static void synchronize_trace(void)
+{
+ synchronize_sched();
+#ifdef CONFIG_PREEMPT_RT
+ synchronize_rcu();
+#endif
+}
+
static void async_wakeup(unsigned long data);
static DEFINE_TIMER(ltt_async_wakeup_timer, async_wakeup, 0, 0);
@@ -321,7 +329,7 @@ void ltt_module_unregister(enum ltt_modu
ltt_filter_unregister();
ltt_run_filter_owner = NULL;
/* Wait for preempt sections to finish */
- synchronize_sched();
+ synchronize_trace();
break;
case LTT_FUNCTION_FILTER_CONTROL:
ltt_filter_control_functor = ltt_filter_control_default;
@@ -429,13 +437,13 @@ static void async_wakeup(unsigned long d
* PREEMPT_RT does not allow spinlocks to be taken within preempt
* disable sections (spinlock taken in wake_up). However, mainline won't
* allow mutex to be taken in interrupt context. Ugly.
- * A proper way to do this would be to turn the timer into a
- * periodically woken up thread, but it adds to the footprint.
+ * Take a standard RCU read lock for RT kernels, which implies that we
+ * also have to synchronize_rcu() upon updates.
*/
#ifndef CONFIG_PREEMPT_RT
rcu_read_lock_sched();
#else
- ltt_lock_traces();
+ rcu_read_lock();
#endif
list_for_each_entry_rcu(trace, &ltt_traces.head, list) {
trace_async_wakeup(trace);
@@ -443,7 +451,7 @@ static void async_wakeup(unsigned long d
#ifndef CONFIG_PREEMPT_RT
rcu_read_unlock_sched();
#else
- ltt_unlock_traces();
+ rcu_read_unlock();
#endif
mod_timer(&ltt_async_wakeup_timer, jiffies + LTT_PERCPU_TIMER_INTERVAL);
@@ -901,7 +909,7 @@ int ltt_trace_alloc(const char *trace_na
set_kernel_trace_flag_all_tasks();
}
list_add_rcu(&trace->list, &ltt_traces.head);
- synchronize_sched();
+ synchronize_trace();
ltt_unlock_traces();
@@ -974,7 +982,7 @@ static int _ltt_trace_destroy(struct ltt
}
/* Everything went fine */
list_del_rcu(&trace->list);
- synchronize_sched();
+ synchronize_trace();
if (list_empty(&ltt_traces.head)) {
clear_kernel_trace_flag_all_tasks();
/*
@@ -1195,7 +1203,7 @@ static int _ltt_trace_stop(struct ltt_tr
trace->nr_channels);
trace->active = 0;
ltt_traces.num_active_traces--;
- synchronize_sched(); /* Wait for each tracing to be finished */
+ synchronize_trace(); /* Wait for each tracing to be finished */
}
module_put(ltt_run_filter_owner);
/* Everything went fine */
@@ -1327,12 +1335,12 @@ static void __exit ltt_exit(void)
list_for_each_entry_rcu(trace, &ltt_traces.head, list)
_ltt_trace_stop(trace);
/* Wait for quiescent state. Readers have preemption disabled. */
- synchronize_sched();
+ synchronize_trace();
/* Safe iteration is now permitted. It does not have to be RCU-safe
* because no readers are left. */
list_for_each_safe(pos, n, &ltt_traces.head) {
trace = container_of(pos, struct ltt_trace, list);
- /* _ltt_trace_destroy does a synchronize_sched() */
+ /* _ltt_trace_destroy does a synchronize_trace() */
_ltt_trace_destroy(trace);
__ltt_trace_destroy(trace);
}
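For anyone who wants to see the locking pattern in isolation, here is a
minimal user-space sketch of the deadlock described in the patch header. It is
an analogy only, not LTTng code: traces_mutex, handler() and
teardown_blocks_handler() are hypothetical names, a pthread stands in for the
timer handler, and pthread_join() stands in for del_timer_sync(). The handler
uses pthread_mutex_trylock() so that the program reports the conflict instead
of hanging.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the LTTng traces mutex. */
static pthread_mutex_t traces_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for the timer handler, which needs traces_mutex to walk the
 * trace list. trylock is used so the sketch reports the conflict; a plain
 * pthread_mutex_lock() here would never return.
 */
static void *handler(void *arg)
{
	int *blocked = arg;
	int ret = pthread_mutex_trylock(&traces_mutex);

	if (ret == EBUSY) {
		*blocked = 1;	/* mutex is held by the teardown path */
	} else if (ret == 0) {
		*blocked = 0;
		pthread_mutex_unlock(&traces_mutex);
	}
	return NULL;
}

/*
 * Stand-in for the teardown path: take the mutex, then wait for the
 * handler to finish while still holding it.
 * Returns 1 if the handler could not take the mutex, 0 if it could,
 * and -1 if the thread could not be created.
 */
int teardown_blocks_handler(void)
{
	pthread_t tid;
	int blocked = -1;

	pthread_mutex_lock(&traces_mutex);
	if (pthread_create(&tid, NULL, handler, &blocked) != 0) {
		pthread_mutex_unlock(&traces_mutex);
		return -1;
	}
	pthread_join(tid, NULL);	/* plays the role of del_timer_sync() */
	pthread_mutex_unlock(&traces_mutex);
	return blocked;
}
```

Called from a main() and built with -pthread, teardown_blocks_handler() should
return 1: the handler cannot make progress while the teardown path holds the
mutex and waits for it. The patch above avoids this by having the handler take
an RCU read lock instead of the mutex.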
>
> Thanks
>
> JP
>
> -----Original Message-----
> From: Mathieu Desnoyers [mailto:compudj at krystal.dyndns.org]
> Sent: Thursday, April 22, 2010 12:06 PM
> To: John P. Paul
> Cc: ltt-dev at lists.casi.polymtl.ca
> Subject: Re: [ltt-dev] lttctl locks up with RT Linux
>
> * jpaul at gdrs.com (jpaul at gdrs.com) wrote:
> > Greetings:
> >
> > I'm using a 2.6.33.2 kernel. I have LTT up and running on the plain
> vanilla kernel, but "lttctl -D trace1" never returns on the RT version
> of the same kernel. I've downloaded and integrated the following pieces:
> >
> > patch-2.6.33.2-lttng-0.211
> > ltt-control-0.84-07042010
> > lttv-0.12.31.04072010
> >
> > Note that I've had to manually apply several of the patches from the
> patch file. I can provide a list if desired.
> >
> > After the lockup, I can do an ls on the /tmp/trace directory and see
> that the following files have a non-zero length (remaining files in the
> trace directory have a zero length):
> >
> > fs_0, fs_1, kernel_0, kernel_1
> >
> > I'm running on an Intel Core2 Duo system. I've built all the LTT
> components into the kernel, so I do not have to load any modules at
> runtime. I do execute an ltt-armall prior to issuing any "lttctl -C -w
> /tmp/trace trace1" commands.
> >
> > When the above occurs, I usually have to hard power down the machine
> as a root issued "reboot" does not reboot the machine (even after trying
> to kill the running ltt processes).
> >
> > Any suggestions on how to get this working under the RT kernel would
> be appreciated. Does LTT even function properly for RT kernels? If not,
> it would be of great benefit to have it do so. Please let me know if
> additional debug info would be helpful.
>
> I bet there is something fishy on RT with __ltt_trace_destroy(). Having
> an output of where the CPU is stalled in lttng code would help.
>
>
> >
> > A couple additional notes:
> >
> > - LTTV docs state that it requires glib 2.4 or greater. I believe this
> is incorrect due to the following dependency:
> >
> > $ rpm -qa glib2
> > glib2-2.12.3-4.el5_3.1 << my default glib (RHEL5.x base)
> >
> > state.c: In function 'copy_process_state':
> > state.c:1344: error: 'GHashTableIter' undeclared (first use in this
> function)
> >
> > I've installed glib-2.22.5 to get around the above issue.
>
> OK, the dependency seems to be glib 2.16 now. Will update the README
> and LTTng manual accordingly.
>
> Thanks,
>
> Mathieu
>
>
>
> --
> This is an e-mail from General Dynamics Robotic Systems. It is for the intended recipient only and may contain confidential and privileged information. No one else may read, print, store, copy, forward or act in reliance on it or its attachments. If you are not the intended recipient, please return this message to the sender and delete the message and any attachments from your computer. Your cooperation is appreciated.
>
>
> _______________________________________________
> ltt-dev mailing list
> ltt-dev at lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com