[ltt-dev] lttctl locks up with RT Linux

jpaul at gdrs.com jpaul at gdrs.com
Tue May 11 10:50:03 EDT 2010


Thanks Mathieu. I'm going to have to look at this a bit more. This may have fixed one problem and cause another as "lttctl -C -w /tmp/trace trace1" does not return and I'm getting a stack dump in the messages file (see below). The last msg I see from that "lttctl -C" command is "lttctl: Creating trace". The "lttctl: Forking lttd" is not displayed.

JP

May 11 10:05:00 ans-ebb3 kernel:  P5
May 11 10:05:00 ans-ebb3 kernel: INFO: RCU detected CPU stalls: P5 (detected by 0, t=10002 jiffies)
May 11 10:05:00 ans-ebb3 kernel: sending NMI to all CPUs:
May 11 10:05:00 ans-ebb3 kernel: NMI backtrace for cpu 1
May 11 10:05:00 ans-ebb3 kernel: Modules linked in: battery sbshc ehci_hcd sbs scsi_dh ohci_hcd dm_multipath uhci_hcd scsi_mod dm_mirror sd_mod libata ahci dm_mod dm_log dm_region_hash tpm_bios tpm intel_agp iTCO_vendor_support i2c_core pcspkr tpm_tis iTCO_wdt i2c_i801 autofs4 e1000e button serio_raw sg output video ac parport lp parport_pc ipv6 nvidia(P) [last unloaded: nvidia]
May 11 10:05:00 ans-ebb3 kernel: 
May 11 10:05:00 ans-ebb3 kernel: Pid: 0, comm: swapper Tainted: P            2.6.33.2-rt13.11.trace #10 PXT    /To Be Filled By O.E.M.
May 11 10:05:00 ans-ebb3 kernel: EIP: 0060:[<c062e4fa>] EFLAGS: 00000246 CPU: 1
May 11 10:05:00 ans-ebb3 kernel: EIP is at poll_idle+0x1e/0x5c
May 11 10:05:00 ans-ebb3 kernel: EAX: f74bc000 EBX: 6b655cfc ECX: 6b655cfc EDX: 000001a2
May 11 10:05:00 ans-ebb3 kernel: ESI: 000001a2 EDI: 00000000 EBP: f74bcf84 ESP: f74bcf7c
May 11 10:05:00 ans-ebb3 kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 preempt:04010002
May 11 10:05:00 ans-ebb3 kernel: Process swapper (pid: 0, ti=f74bc000 task=f74d90c0 task.ti=f74bc000)
May 11 10:05:00 ans-ebb3 kernel: Stack:
May 11 10:05:00 ans-ebb3 kernel:  f765b02c f765b01c f74bcf94 c062e7c9 00000001 c0877c90 f74bcfa4 c0401c0d
May 11 10:05:00 ans-ebb3 kernel: <0> 00000001 00000000 f74bcfb0 c06ca177 01020800 00000000 00000000 00000000
May 11 10:05:00 ans-ebb3 kernel: <0> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 000000d8
May 11 10:05:00 ans-ebb3 kernel: Call Trace:
May 11 10:05:00 ans-ebb3 kernel:  [<c062e7c9>] ? cpuidle_idle_call+0x6c/0xdf
May 11 10:05:00 ans-ebb3 kernel:  [<c0401c0d>] ? cpu_idle+0x49/0x80
May 11 10:05:00 ans-ebb3 kernel:  [<c06ca177>] ? start_secondary+0x1b9/0x1be
May 11 10:05:00 ans-ebb3 kernel: Code: ff ff ff 8d 65 f4 89 d8 5b 5e 5f 5d c3 55 89 e5 56 53 e8 7b 61 e2 ff 89 c3 89 d6 fb 89 e0 25 00 f0 ff ff eb 02 f3 90 f6 40 08 08 <74> f8 e8 60 61 e2 ff 29 d8 19 f2 e8 95 b6 e0 ff b9 40 42 0f 00 
May 11 10:05:00 ans-ebb3 kernel: Call Trace:
May 11 10:05:00 ans-ebb3 kernel:  [<c062e7c9>] cpuidle_idle_call+0x6c/0xdf
May 11 10:05:00 ans-ebb3 kernel:  [<c0401c0d>] cpu_idle+0x49/0x80
May 11 10:05:00 ans-ebb3 kernel:  [<c06ca177>] start_secondary+0x1b9/0x1be
May 11 10:05:00 ans-ebb3 kernel: Pid: 0, comm: swapper Tainted: P            2.6.33.2-rt13.11.trace #10
May 11 10:05:00 ans-ebb3 kernel: Call Trace:
May 11 10:05:00 ans-ebb3 kernel:  [<c04019d4>] ? show_regs+0x1a/0x20
May 11 10:05:00 ans-ebb3 kernel:  [<c06d0d7a>] nmi_watchdog_tick+0x9b/0x162
May 11 10:05:00 ans-ebb3 kernel:  [<c06d02e1>] do_nmi+0xd1/0x2e1
May 11 10:05:00 ans-ebb3 kernel:  [<c056355d>] ? ltt_chan_for_each_channel+0x35/0x4f
May 11 10:05:00 ans-ebb3 kernel:  [<c05628df>] ? ltt_chanbuf_idle_switch+0x0/0x1c
May 11 10:05:00 ans-ebb3 kernel:  [<c06cfd41>] nmi_stack_correct+0x28/0x2d
May 11 10:05:00 ans-ebb3 kernel:  [<c062e4fa>] ? poll_idle+0x1e/0x5c
May 11 10:05:00 ans-ebb3 kernel:  [<c062e7c9>] cpuidle_idle_call+0x6c/0xdf
May 11 10:05:00 ans-ebb3 kernel:  [<c0401c0d>] cpu_idle+0x49/0x80
May 11 10:05:00 ans-ebb3 kernel:  [<c06ca177>] start_secondary+0x1b9/0x1be
May 11 10:05:00 ans-ebb3 kernel: NMI backtrace for cpu 0
May 11 10:05:00 ans-ebb3 kernel: Modules linked in: battery sbshc ehci_hcd sbs scsi_dh ohci_hcd dm_multipath uhci_hcd scsi_mod dm_mirror sd_mod libata ahci dm_mod dm_log dm_region_hash tpm_bios tpm intel_agp iTCO_vendor_support i2c_core pcspkr tpm_tis iTCO_wdt i2c_i801 autofs4 e1000e button serio_raw sg output video ac parport lp parport_pc ipv6 nvidia(P) [last unloaded: nvidia]
May 11 10:05:00 ans-ebb3 kernel: 
May 11 10:05:00 ans-ebb3 kernel: Pid: 0, comm: swapper Tainted: P            2.6.33.2-rt13.11.trace #10 PXT    /To Be Filled By O.E.M.
May 11 10:05:00 ans-ebb3 kernel: EIP: 0060:[<c0570ab6>] EFLAGS: 00000807 CPU: 0
May 11 10:05:00 ans-ebb3 kernel: EIP is at delay_tsc+0x0/0x8a
May 11 10:05:00 ans-ebb3 kernel: EAX: 001c7b29 EBX: 00000001 ECX: 04f29000 EDX: 001c7b28
May 11 10:05:00 ans-ebb3 kernel: ESI: 00000000 EDI: c084a6c0 EBP: c0828e6c ESP: c0828e68
May 11 10:05:00 ans-ebb3 kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 preempt:04020002
May 11 10:05:00 ans-ebb3 kernel: Process swapper (pid: 0, ti=c0828000 task=c08407a0 task.ti=c0828000)
May 11 10:05:00 ans-ebb3 kernel: Stack:
May 11 10:05:00 ans-ebb3 kernel:  c0570a4f c0828e78 c0418e84 c084a740 c0828eb0 c047c35b c07c1944 00000000
May 11 10:05:00 ans-ebb3 kernel: <0> 00002712 c5805268 c084a6c0 c084a6ec 00000003 c084a740 00000083 00000000
May 11 10:05:00 ans-ebb3 kernel: <0> 00000000 c08407a0 c0828ec0 c047c45f 00000000 00000000 c0828ed4 c0441fbd
May 11 10:05:00 ans-ebb3 kernel: Call Trace:
May 11 10:05:00 ans-ebb3 kernel:  [<c0570a4f>] ? __const_udelay+0x2f/0x31
May 11 10:05:01 ans-ebb3 kernel:  [<c0418e84>] ? arch_trigger_all_cpu_backtrace+0x42/0x4f
May 11 10:05:01 ans-ebb3 kernel:  [<c047c35b>] ? __rcu_pending+0x258/0x309
May 11 10:05:01 ans-ebb3 kernel:  [<c047c45f>] ? rcu_check_callbacks+0x53/0x114
May 11 10:05:01 ans-ebb3 kernel:  [<c0441fbd>] ? update_process_times+0x32/0x43
May 11 10:05:01 ans-ebb3 kernel:  [<c04587b7>] ? tick_sched_timer+0x6d/0x9b
May 11 10:05:01 ans-ebb3 kernel:  [<c045874a>] ? tick_sched_timer+0x0/0x9b
May 11 10:05:01 ans-ebb3 kernel:  [<c044f8fc>] ? __run_hrtimer+0xeb/0x154
May 11 10:05:01 ans-ebb3 kernel:  [<c044fc4d>] ? hrtimer_interrupt+0xed/0x21e
May 11 10:05:01 ans-ebb3 kernel:  [<c04518c5>] ? sched_clock_idle_wakeup_event+0x16/0x18
May 11 10:05:01 ans-ebb3 kernel:  [<c0418512>] ? smp_apic_timer_interrupt+0xb4/0x106
May 11 10:05:01 ans-ebb3 kernel:  [<c06cfa66>] ? apic_timer_interrupt+0x2a/0x30
May 11 10:05:01 ans-ebb3 kernel:  [<c062e4f6>] ? poll_idle+0x1a/0x5c
May 11 10:05:01 ans-ebb3 kernel:  [<c062e7c9>] ? cpuidle_idle_call+0x6c/0xdf
May 11 10:05:01 ans-ebb3 kernel:  [<c0401c0d>] ? cpu_idle+0x49/0x80
May 11 10:05:01 ans-ebb3 kernel:  [<c06bdde4>] ? rest_init+0x80/0x82
May 11 10:05:01 ans-ebb3 kernel:  [<c087a859>] ? start_kernel+0x2fb/0x300
May 11 10:05:01 ans-ebb3 kernel:  [<c087a08c>] ? i386_start_kernel+0x8c/0x91
May 11 10:05:01 ans-ebb3 kernel: Code: c0 8b 0c 8d 18 7b 87 c0 6b c0 14 ba 40 ff 8d c0 55 89 e5 69 94 0a 98 00 00 00 fa 00 00 00 f7 e2 8d 42 01 ff 15 90 84 85 c0 5d c3 <55> 89 e5 57 56 89 c6 53 83 ec 04 89 e0 25 00 f0 ff ff ff 40 14 
May 11 10:05:01 ans-ebb3 kernel: Call Trace:
May 11 10:05:01 ans-ebb3 kernel:  [<c0570a4f>] ? __const_udelay+0x2f/0x31
May 11 10:05:01 ans-ebb3 kernel:  [<c0418e84>] arch_trigger_all_cpu_backtrace+0x42/0x4f
May 11 10:05:01 ans-ebb3 kernel:  [<c047c35b>] __rcu_pending+0x258/0x309
May 11 10:05:01 ans-ebb3 kernel:  [<c047c45f>] rcu_check_callbacks+0x53/0x114
May 11 10:05:01 ans-ebb3 kernel:  [<c0441fbd>] update_process_times+0x32/0x43
May 11 10:05:01 ans-ebb3 kernel:  [<c04587b7>] tick_sched_timer+0x6d/0x9b
May 11 10:05:01 ans-ebb3 kernel:  [<c045874a>] ? tick_sched_timer+0x0/0x9b
May 11 10:05:01 ans-ebb3 kernel:  [<c044f8fc>] __run_hrtimer+0xeb/0x154
May 11 10:05:01 ans-ebb3 kernel:  [<c044fc4d>] hrtimer_interrupt+0xed/0x21e
May 11 10:05:01 ans-ebb3 kernel:  [<c04518c5>] ? sched_clock_idle_wakeup_event+0x16/0x18
May 11 10:05:01 ans-ebb3 kernel:  [<c0418512>] smp_apic_timer_interrupt+0xb4/0x106
May 11 10:05:01 ans-ebb3 kernel:  [<c06cfa66>] apic_timer_interrupt+0x2a/0x30
May 11 10:05:01 ans-ebb3 kernel:  [<c062e4f6>] ? poll_idle+0x1a/0x5c
May 11 10:05:01 ans-ebb3 kernel:  [<c062e7c9>] cpuidle_idle_call+0x6c/0xdf
May 11 10:05:01 ans-ebb3 kernel:  [<c0401c0d>] cpu_idle+0x49/0x80
May 11 10:05:01 ans-ebb3 kernel:  [<c06bdde4>] rest_init+0x80/0x82
May 11 10:05:01 ans-ebb3 kernel:  [<c087a859>] start_kernel+0x2fb/0x300
May 11 10:05:01 ans-ebb3 kernel:  [<c087a08c>] i386_start_kernel+0x8c/0x91
May 11 10:05:01 ans-ebb3 kernel: Pid: 0, comm: swapper Tainted: P            2.6.33.2-rt13.11.trace #10
May 11 10:05:01 ans-ebb3 kernel: Call Trace:
May 11 10:05:01 ans-ebb3 kernel:  [<c04019d4>] ? show_regs+0x1a/0x20
May 11 10:05:01 ans-ebb3 kernel:  [<c06d0d7a>] nmi_watchdog_tick+0x9b/0x162
May 11 10:05:01 ans-ebb3 kernel:  [<c06d02e1>] do_nmi+0xd1/0x2e1
May 11 10:05:01 ans-ebb3 kernel:  [<c05645a8>] ? ltt_vtrace+0x0/0x79b
May 11 10:05:02 ans-ebb3 kernel:  [<c06cfd41>] nmi_stack_correct+0x28/0x2d
May 11 10:05:02 ans-ebb3 kernel:  [<c0570ab6>] ? delay_tsc+0x0/0x8a
May 11 10:05:02 ans-ebb3 kernel:  [<c0570a4f>] ? __const_udelay+0x2f/0x31
May 11 10:05:02 ans-ebb3 kernel:  [<c0418e84>] arch_trigger_all_cpu_backtrace+0x42/0x4f
May 11 10:05:02 ans-ebb3 kernel:  [<c047c35b>] __rcu_pending+0x258/0x309
May 11 10:05:02 ans-ebb3 kernel:  [<c047c45f>] rcu_check_callbacks+0x53/0x114
May 11 10:05:02 ans-ebb3 kernel:  [<c0441fbd>] update_process_times+0x32/0x43
May 11 10:05:02 ans-ebb3 kernel:  [<c04587b7>] tick_sched_timer+0x6d/0x9b
May 11 10:05:02 ans-ebb3 kernel:  [<c045874a>] ? tick_sched_timer+0x0/0x9b
May 11 10:05:02 ans-ebb3 kernel:  [<c044f8fc>] __run_hrtimer+0xeb/0x154
May 11 10:05:02 ans-ebb3 kernel:  [<c044fc4d>] hrtimer_interrupt+0xed/0x21e
May 11 10:05:02 ans-ebb3 kernel:  [<c04518c5>] ? sched_clock_idle_wakeup_event+0x16/0x18
May 11 10:05:02 ans-ebb3 kernel:  [<c0418512>] smp_apic_timer_interrupt+0xb4/0x106
May 11 10:05:02 ans-ebb3 kernel:  [<c06cfa66>] apic_timer_interrupt+0x2a/0x30
May 11 10:05:02 ans-ebb3 kernel:  [<c062e4f6>] ? poll_idle+0x1a/0x5c
May 11 10:05:02 ans-ebb3 kernel:  [<c062e7c9>] cpuidle_idle_call+0x6c/0xdf
May 11 10:05:02 ans-ebb3 kernel:  [<c0401c0d>] cpu_idle+0x49/0x80
May 11 10:05:02 ans-ebb3 kernel:  [<c06bdde4>] rest_init+0x80/0x82
May 11 10:05:02 ans-ebb3 kernel:  [<c087a859>] start_kernel+0x2fb/0x300
May 11 10:05:02 ans-ebb3 kernel:  [<c087a08c>] i386_start_kernel+0x8c/0x91



-----Original Message-----
From: Mathieu Desnoyers [mailto:compudj at krystal.dyndns.org]
Sent: Tue 5/11/2010 7:59 AM
To: John P. Paul
Cc: ltt-dev at lists.casi.polymtl.ca
Subject: Re: [ltt-dev] lttctl locks up with RT Linux
 
* jpaul at gdrs.com (jpaul at gdrs.com) wrote:
> Hey Mathieu:
> 
> Thanks for looking at this. I'm a bit new to debugging at this level, so
> you may need to provide me a bit more info on what you need. I attempted
> to use "pstack" on the lttctl and lttd tasks ... no luck as pstack also
> locked up.
> 
> I put a bit of tracing into liblttctl and discovered the lockup occurs
> when a write of "traceName" (whatever traceName happens to be) occurs to
> the "/mnt/debugfs/ltt/destroy_trace" file.
> 
> I'm guessing that you would like some tracing of the ltt kernel module.
> Is there something that I can turn on, or another way I could get a
> stack dump of that module after lockup?  I'll do a little research this
> weekend on kernel debugging techniques.
> 
> I can certainly sprinkle in some printk statements in the ltt kernel
> module source. Doing provided the following info:
> 
> - Control entered _ltt_trace_destroy (single underscore)
> - Control entered del_timer_sync(&ltt_async_wakeup_timer) and never
> returned
> 
> Does that help, or should I continue farther down this path?

Can you try the following patch to see if it fixes your problem ?


lttng fix rt kernel teardown deadlock

LTTng has a teardown bug on RT (deadlock):

Deleting a timer (sync) while holding the traces mutex, and the handler takes
the same mutex, which leads to a deadlock.

Fix this by taking a RCU read lock in the timer (instead of the RT-specific fix
using the mutex), and by doing synchronize_rcu() in addition to
synchronize_sched() upon updates.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
---
 ltt/ltt-tracer.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

Index: linux-2.6-lttng/ltt/ltt-tracer.c
===================================================================
--- linux-2.6-lttng.orig/ltt/ltt-tracer.c	2010-05-11 07:50:46.000000000 -0400
+++ linux-2.6-lttng/ltt/ltt-tracer.c	2010-05-11 07:55:46.000000000 -0400
@@ -41,6 +41,14 @@
 #include <linux/vmalloc.h>
 #include <asm/atomic.h>
 
+static void synchronize_trace(void)
+{
+	synchronize_sched();
+#ifdef CONFIG_PREEMPT_RT
+	synchronize_rcu();
+#endif
+}
+
 static void async_wakeup(unsigned long data);
 
 static DEFINE_TIMER(ltt_async_wakeup_timer, async_wakeup, 0, 0);
@@ -321,7 +329,7 @@ void ltt_module_unregister(enum ltt_modu
 		ltt_filter_unregister();
 		ltt_run_filter_owner = NULL;
 		/* Wait for preempt sections to finish */
-		synchronize_sched();
+		synchronize_trace();
 		break;
 	case LTT_FUNCTION_FILTER_CONTROL:
 		ltt_filter_control_functor = ltt_filter_control_default;
@@ -429,13 +437,13 @@ static void async_wakeup(unsigned long d
 	 * PREEMPT_RT does not allow spinlocks to be taken within preempt
 	 * disable sections (spinlock taken in wake_up). However, mainline won't
 	 * allow mutex to be taken in interrupt context. Ugly.
-	 * A proper way to do this would be to turn the timer into a
-	 * periodically woken up thread, but it adds to the footprint.
+	 * Take a standard RCU read lock for RT kernels, which imply that we
+	 * also have to synchronize_rcu() upon updates.
 	 */
 #ifndef CONFIG_PREEMPT_RT
 	rcu_read_lock_sched();
 #else
-	ltt_lock_traces();
+	rcu_read_lock();
 #endif
 	list_for_each_entry_rcu(trace, &ltt_traces.head, list) {
 		trace_async_wakeup(trace);
@@ -443,7 +451,7 @@ static void async_wakeup(unsigned long d
 #ifndef CONFIG_PREEMPT_RT
 	rcu_read_unlock_sched();
 #else
-	ltt_unlock_traces();
+	rcu_read_unlock();
 #endif
 
 	mod_timer(&ltt_async_wakeup_timer, jiffies + LTT_PERCPU_TIMER_INTERVAL);
@@ -901,7 +909,7 @@ int ltt_trace_alloc(const char *trace_na
 		set_kernel_trace_flag_all_tasks();
 	}
 	list_add_rcu(&trace->list, &ltt_traces.head);
-	synchronize_sched();
+	synchronize_trace();
 
 	ltt_unlock_traces();
 
@@ -974,7 +982,7 @@ static int _ltt_trace_destroy(struct ltt
 	}
 	/* Everything went fine */
 	list_del_rcu(&trace->list);
-	synchronize_sched();
+	synchronize_trace();
 	if (list_empty(&ltt_traces.head)) {
 		clear_kernel_trace_flag_all_tasks();
 		/*
@@ -1195,7 +1203,7 @@ static int _ltt_trace_stop(struct ltt_tr
 			trace->nr_channels);
 		trace->active = 0;
 		ltt_traces.num_active_traces--;
-		synchronize_sched(); /* Wait for each tracing to be finished */
+		synchronize_trace(); /* Wait for each tracing to be finished */
 	}
 	module_put(ltt_run_filter_owner);
 	/* Everything went fine */
@@ -1327,12 +1335,12 @@ static void __exit ltt_exit(void)
 	list_for_each_entry_rcu(trace, &ltt_traces.head, list)
 		_ltt_trace_stop(trace);
 	/* Wait for quiescent state. Readers have preemption disabled. */
-	synchronize_sched();
+	synchronize_trace();
 	/* Safe iteration is now permitted. It does not have to be RCU-safe
 	 * because no readers are left. */
 	list_for_each_safe(pos, n, &ltt_traces.head) {
 		trace = container_of(pos, struct ltt_trace, list);
-		/* _ltt_trace_destroy does a synchronize_sched() */
+		/* _ltt_trace_destroy does a synchronize_trace() */
 		_ltt_trace_destroy(trace);
 		__ltt_trace_destroy(trace);
 	}



--
This is an e-mail from General Dynamics Robotic Systems. It is for the intended recipient only and may contain confidential and privileged information. No one else may read, print, store, copy, forward or act in reliance on it or its attachments. If you are not the intended recipient, please return this message to the sender and delete the message and any attachments from your computer. Your cooperation is appreciated.





More information about the lttng-dev mailing list