[ltt-dev] use percpu variable for ltt_nesting

Mathieu Desnoyers compudj at krystal.dyndns.org
Tue Sep 9 17:51:40 EDT 2008


Hi Jiaying,

I'd like to know a little more about the setup that generated these
numbers. Can you send me the .config and the dmesg output of a machine
running LTTng? I am particularly interested in whether LTTng detected
synchronized TSCs. I would also like you to try forcing TIF_KERNEL_TRACE
to 0 (by bypassing kernel/sched.c:
set_kernel_trace_flag_all_tasks()).

It would help to have benchmarks for your setup with:

- LTTng compiled out
- LTTng compiled-in, all markers disabled
- LTTng compiled-in, default markers enabled
- LTTng compiled-in, markers disabled, a single trace is created, but not
  active. (lttctl -n name -c)
  This will call set_kernel_trace_flag_all_tasks() to activate syscall
  tracing for all threads.
- LTTng compiled-in, default markers enabled, taking an active trace in
  normal mode (dumping to disk).
- Same as above, but LTTng running in flight recorder mode, with default
  size buffers.
- Same as above, but with LTTng running with tiny buffers (~64kB each).

If it's easy enough, doing these tests on UP and SMP kernels would be
good. Knowing all that, we will be able to see if we can start improving
performance.

Thanks,

Mathieu

* Jiaying Zhang (jiayingz at google.com) wrote:
> Hi Mathieu,
> 
> I found lttng sometimes has very poor performance on multiple cpu systems
> and it seems the more processors the system has, the more performance
> overhead I saw with lttng enabled.
> Here are the results I collected with tbench benchmark
> (http://samba.org/ftp/tridge/dbench/).
> 
> single processor:
>    lttng disabled: 236.07 MB/sec
>    lttng enabled:  210.569 MB/sec
> 
> 16 processors:
>    lttng disabled:  4173.83 MB/sec
>    lttng enabled:  1766.77 MB/sec
> 
> After playing with the code for a while and asking around, I found this
> performance issue is caused by the cpu contention while updating the
> ltt_nesting variable used in ltt/ltt-serialize.c:ltt_vtrace:
>   ltt_nesting[smp_processor_id()]++;
>   ...
>   ltt_nesting[smp_processor_id()]--;
> 
> In the attachment is the patch that changes ltt_nesting into a per_cpu
> variable. With the patch applied, the tbench performance with lttng
> enabled gets to about 3600 MB/sec on the 16 processor system.
> 
> Jiaying
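
For reference, the relative overhead implied by the figures quoted above
can be worked out as follows. This is only a sketch: the helper name
overhead_pct is hypothetical and is not part of tbench or LTTng.

```c
/*
 * Relative throughput loss, in percent, of a traced run versus a baseline.
 * overhead_pct is a hypothetical helper, not an LTTng or tbench function.
 */
static double overhead_pct(double baseline_mbps, double traced_mbps)
{
	return 100.0 * (baseline_mbps - traced_mbps) / baseline_mbps;
}

/*
 * Applied to the figures in the report:
 *   overhead_pct(236.07, 210.569)    single processor: roughly 10.8 %
 *   overhead_pct(4173.83, 1766.77)   16 processors: roughly 57.7 %
 *   overhead_pct(4173.83, 3600.0)    16 processors, patched: roughly 13.7 %
 */
```

So with the patch, the 16-processor overhead is close to the
single-processor one, which is why the remaining configurations listed
above are still worth measuring.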

> Index: linux-2.6.26/include/linux/ltt-core.h
> ===================================================================
> --- linux-2.6.26.orig/include/linux/ltt-core.h	2008-09-02 21:36:27.000000000 -0700
> +++ linux-2.6.26/include/linux/ltt-core.h	2008-09-02 21:36:52.000000000 -0700
> @@ -25,7 +25,7 @@
>  
>  
>  /* Keep track of trap nesting inside LTT */
> -extern unsigned int ltt_nesting[];
> +extern unsigned int per_cpu_var(ltt_nesting);
>  
>  typedef int (*ltt_run_filter_functor)(void *trace, uint16_t eID);
>  extern ltt_run_filter_functor ltt_run_filter;
> Index: linux-2.6.26/ltt/ltt-relay.c
> ===================================================================
> --- linux-2.6.26.orig/ltt/ltt-relay.c	2008-09-02 21:36:27.000000000 -0700
> +++ linux-2.6.26/ltt/ltt-relay.c	2008-09-02 21:36:52.000000000 -0700
> @@ -1013,7 +1013,7 @@
>  	/*
>  	 * Perform retryable operations.
>  	 */
> -	if (ltt_nesting[smp_processor_id()] > 4) {
> +	if (per_cpu(ltt_nesting, smp_processor_id()) > 4) {
>  		local_inc(&ltt_buf->events_lost);
>  		return NULL;
>  	}
> @@ -1212,7 +1212,7 @@
>  	"buffer full : event lost in blocking "
>  	"mode. Increase LTT_RESERVE_CRITICAL.\n");
>  	printk(KERN_ERR "LTT nesting level is %u.\n",
> -		ltt_nesting[cpu]);
> +		per_cpu(ltt_nesting, cpu));
>  	printk(KERN_ERR "LTT avail size %lu.\n",
>  		dbg->avail_size);
>  	printk(KERN_ERR "avai write : %lu, read : %lu\n",
> Index: linux-2.6.26/ltt/ltt-serialize.c
> ===================================================================
> --- linux-2.6.26.orig/ltt/ltt-serialize.c	2008-09-02 21:36:27.000000000 -0700
> +++ linux-2.6.26/ltt/ltt-serialize.c	2008-09-02 21:36:52.000000000 -0700
> @@ -647,7 +647,7 @@
>  		cpu = smp_processor_id();
>  	else
>  		cpu = private_data->cpu;
> -	ltt_nesting[smp_processor_id()]++;
> +	per_cpu(ltt_nesting, smp_processor_id())++;
>  
>  	if (unlikely(private_data && private_data->trace))
>  		dest_trace = private_data->trace;
> @@ -703,7 +703,7 @@
>  		va_end(args_copy);
>  		ltt_commit_slot(channel, &transport_data, buffer, slot_size);
>  	}
> -	ltt_nesting[smp_processor_id()]--;
> +	per_cpu(ltt_nesting, smp_processor_id())--;
>  	preempt_enable();
>  }
>  EXPORT_SYMBOL_GPL(ltt_vtrace);
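
The report attributes the slowdown to contention on ltt_nesting. One
plausible mechanism (an assumption, not something measured in this thread)
is false sharing: in a plain array indexed by CPU number, neighbouring
counters sit in the same cache line, so every increment on one CPU
invalidates the line on the others. The userspace sketch below only
compares the two memory layouts. It assumes 64-byte cache lines, and the
names nesting_packed, nesting_padded, packed_lines and padded_lines are
illustrative, not LTTng symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumption: 64-byte cache lines */
#define NR_CPUS    16	/* same CPU count as the 16-processor test machine */

/* Layout 1: plain array indexed by CPU id, like the old ltt_nesting[]. */
static _Alignas(CACHE_LINE) unsigned int nesting_packed[NR_CPUS];

/* Layout 2: one cache line per CPU, a userspace stand-in for a per-CPU variable. */
struct padded_counter {
	_Alignas(CACHE_LINE) unsigned int value;
};
static struct padded_counter nesting_padded[NR_CPUS];

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
	      "each padded counter should occupy exactly one cache line");

/* Count the distinct cache lines touched by n slots laid out stride bytes apart. */
static size_t distinct_lines(const void *first, size_t stride, size_t n)
{
	size_t count = 0;
	uintptr_t prev = 0;

	for (size_t i = 0; i < n; i++) {
		uintptr_t line = ((uintptr_t)first + i * stride) / CACHE_LINE;

		/* Addresses only grow, so a new line number means a new line. */
		if (i == 0 || line != prev)
			count++;
		prev = line;
	}
	return count;
}

/* With 4-byte counters, all 16 of them fit in a single 64-byte line. */
static size_t packed_lines(void)
{
	return distinct_lines(nesting_packed, sizeof(nesting_packed[0]), NR_CPUS);
}

/* With padding, each CPU's counter gets a cache line of its own. */
static size_t padded_lines(void)
{
	return distinct_lines(nesting_padded, sizeof(nesting_padded[0]), NR_CPUS);
}
```

Under these assumptions packed_lines() reports one shared line for all 16
counters, while padded_lines() reports 16 separate lines. The kernel's
per-CPU variables achieve a similar separation by placing each CPU's copy
in that CPU's own per-CPU data area.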


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68