[ltt-dev] use percpu variable for ltt_nesting

Mathieu Desnoyers compudj at krystal.dyndns.org
Wed Sep 3 11:44:21 EDT 2008


Hi Jiaying,

Excellent find!

I originally introduced ltt_nesting mostly as a debugging facility, but
it became clear that leaving it in was safer (so that infinite recursion
would be detected). However, I never got around to converting it into a
proper per-CPU variable, so yes, there is a huge amount of cache-line
bouncing here.
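
To make the cache-line bouncing concrete, here is a rough userspace sketch
(not LTTng code). It assumes a 64-byte cache line, 16 CPUs and a 4-byte
unsigned int; every name ending in _sketch, as well as nesting_packed and
nesting_padded, is made up for illustration. A packed array puts all 16
counters in the same line, so every increment on one CPU invalidates that
line on the others. The padded layout only approximates what a per-CPU
variable achieves: the kernel keeps each CPU's copy in a separate per-CPU
area rather than padding an array.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SKETCH 64	/* assumed cache line size, in bytes */
#define NR_CPUS_SKETCH    16	/* assumed CPU count, as in the 16-way test */

/* "Before" layout: counters packed back to back, indexed by CPU number. */
static unsigned int nesting_packed[NR_CPUS_SKETCH];

/* "After" layout (analogy only): one counter per cache line. */
struct nesting_slot {
	alignas(CACHE_LINE_SKETCH) unsigned int nesting;
};
static struct nesting_slot nesting_padded[NR_CPUS_SKETCH];

/* Distance in bytes between two neighbouring counters in each layout. */
static size_t packed_stride(void)
{
	return (size_t)((char *)&nesting_packed[1] - (char *)&nesting_packed[0]);
}

static size_t padded_stride(void)
{
	return (size_t)((char *)&nesting_padded[1] - (char *)&nesting_padded[0]);
}

/* Number of counters with the given stride that share one cache line. */
static size_t slots_per_line(size_t stride)
{
	size_t n;

	assert(stride > 0);
	n = CACHE_LINE_SKETCH / stride;
	return n ? n : 1;
}
```

Under those assumptions, slots_per_line(packed_stride()) is 16 (all CPUs
write to one line) while slots_per_line(padded_stride()) is 1.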

I have just prepared patches that apply the fix to each LTTng patch in
the patchset individually. I'll post them shortly.

Thanks,

Mathieu

* Jiaying Zhang (jiayingz at google.com) wrote:
> Hi Mathieu,
> 
> I found that lttng sometimes performs very poorly on multi-CPU systems, and
> the more processors the system has, the more overhead I see with lttng
> enabled. Here are the results I collected with the tbench benchmark
> (http://samba.org/ftp/tridge/dbench/).
> 
> single processor:
>    lttng disabled: 236.07 MB/sec
>    lttng enabled:  210.569 MB/sec
> 
> 16 processors:
>    lttng disabled:  4173.83 MB/sec
>    lttng enabled:  1766.77 MB/sec
> 
> After playing with the code for a while and asking around, I found that this
> performance issue is caused by CPU contention while updating the ltt_nesting
> variable used in ltt/ltt-serialize.c:ltt_vtrace:
>   ltt_nesting[smp_processor_id()]++;
>   ...
>   ltt_nesting[smp_processor_id()]--;
> 
> Attached is a patch that changes ltt_nesting into a per_cpu variable. With
> the patch applied, tbench throughput with lttng enabled reaches about
> 3600 MB/sec on the 16-processor system.
> 
> Jiaying

> Index: linux-2.6.26/include/linux/ltt-core.h
> ===================================================================
> --- linux-2.6.26.orig/include/linux/ltt-core.h	2008-09-02 21:36:27.000000000 -0700
> +++ linux-2.6.26/include/linux/ltt-core.h	2008-09-02 21:36:52.000000000 -0700
> @@ -25,7 +25,7 @@
>  
>  
>  /* Keep track of trap nesting inside LTT */
> -extern unsigned int ltt_nesting[];
> +extern unsigned int per_cpu_var(ltt_nesting);
>  
>  typedef int (*ltt_run_filter_functor)(void *trace, uint16_t eID);
>  extern ltt_run_filter_functor ltt_run_filter;
> Index: linux-2.6.26/ltt/ltt-relay.c
> ===================================================================
> --- linux-2.6.26.orig/ltt/ltt-relay.c	2008-09-02 21:36:27.000000000 -0700
> +++ linux-2.6.26/ltt/ltt-relay.c	2008-09-02 21:36:52.000000000 -0700
> @@ -1013,7 +1013,7 @@
>  	/*
>  	 * Perform retryable operations.
>  	 */
> -	if (ltt_nesting[smp_processor_id()] > 4) {
> +	if (per_cpu(ltt_nesting, smp_processor_id()) > 4) {
>  		local_inc(&ltt_buf->events_lost);
>  		return NULL;
>  	}
> @@ -1212,7 +1212,7 @@
>  	"buffer full : event lost in blocking "
>  	"mode. Increase LTT_RESERVE_CRITICAL.\n");
>  	printk(KERN_ERR "LTT nesting level is %u.\n",
> -		ltt_nesting[cpu]);
> +		per_cpu(ltt_nesting, cpu));
>  	printk(KERN_ERR "LTT avail size %lu.\n",
>  		dbg->avail_size);
>  	printk(KERN_ERR "avai write : %lu, read : %lu\n",
> Index: linux-2.6.26/ltt/ltt-serialize.c
> ===================================================================
> --- linux-2.6.26.orig/ltt/ltt-serialize.c	2008-09-02 21:36:27.000000000 -0700
> +++ linux-2.6.26/ltt/ltt-serialize.c	2008-09-02 21:36:52.000000000 -0700
> @@ -647,7 +647,7 @@
>  		cpu = smp_processor_id();
>  	else
>  		cpu = private_data->cpu;
> -	ltt_nesting[smp_processor_id()]++;
> +	per_cpu(ltt_nesting, smp_processor_id())++;
>  
>  	if (unlikely(private_data && private_data->trace))
>  		dest_trace = private_data->trace;
> @@ -703,7 +703,7 @@
>  		va_end(args_copy);
>  		ltt_commit_slot(channel, &transport_data, buffer, slot_size);
>  	}
> -	ltt_nesting[smp_processor_id()]--;
> +	per_cpu(ltt_nesting, smp_processor_id())--;
>  	preempt_enable();
>  }
>  EXPORT_SYMBOL_GPL(ltt_vtrace);
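
For readers following the hunks above, the role of the counter can be
sketched in userspace roughly as follows. This is an illustration under
stated assumptions, not the LTTng implementation: _Thread_local stands in
for a per-CPU variable, the limit of 4 is taken from the "> 4" test in the
ltt-relay.c hunk, and every name ending in _sketch is hypothetical. The
"reenter" argument simulates a probe that triggers tracing again from
inside the tracer.

```c
#include <assert.h>

#define MAX_NESTING_SKETCH 4	/* mirrors the "> 4" test in the patch */

/* _Thread_local plays the role of a per-CPU variable in this sketch. */
static _Thread_local unsigned int nesting_sketch;
static _Thread_local unsigned long events_written_sketch;
static _Thread_local unsigned long events_lost_sketch;

/*
 * Record one event. If reenter is non-zero, recording the event triggers
 * another trace call, as a tracer that instruments its own code path would.
 */
static void trace_event_sketch(unsigned int reenter)
{
	nesting_sketch++;
	if (nesting_sketch > MAX_NESTING_SKETCH) {
		/* Too deep: count the event as lost and stop recursing. */
		events_lost_sketch++;
	} else {
		events_written_sketch++;
		if (reenter > 0)
			trace_event_sketch(reenter - 1);
	}
	assert(nesting_sketch > 0);
	nesting_sketch--;
}
```

With this sketch, a call such as trace_event_sketch(100) records four
events, drops the fifth, and unwinds with the counter back at zero instead
of recursing a hundred levels deep.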


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



