[ltt-dev] performance results I measured with the latest lttng patches

Mathieu Desnoyers compudj at krystal.dyndns.org
Thu Sep 25 16:20:11 EDT 2008


* Jiaying Zhang (jiayingz at google.com) wrote:
> Hi Mathieu,
> 
> On Thu, Sep 25, 2008 at 12:50 PM, Mathieu Desnoyers <
> compudj at krystal.dyndns.org> wrote:
> 
> > * Jiaying Zhang (jiayingz at google.com) wrote:
> > > Hi Jan,
> > >
> > > >
> > > > > lttng-compiled-in, all markers disabled: Throughput 743.931 MB/sec
> > > > > lttng-compiled-in, default markers enabled, taking an active trace in
> > > > normal
> > > > > mode: Throughput 500.24 MB/sec
> > > >
> > > > Isn't lttng-compiled-out missing here? Or did you run the same base
> > kernel?
> > >
> > >
> > > I used the same base kernel here.
> > >
> > > Jiaying
> >
> > To give an idea of the order of magnitude of the kernel's effect on tbench
> > on my dual quad-core Intel box :
> >
> > stock kernel 2.6.27-rc6 : 2024MB/s
> > stock kernel 2.6.27-rc7 : 1989MB/s
> >
> > 1.7% slowdown, with exactly the same config. And this is between
> > rc releases where the diff consists mostly of _bugfixes_.
> 
> 
> So do both kernels have LTTng compiled in but disabled?
> It is possible to see variance between different kernel versions.
> To focus on the performance effects of LTTng, I guess we should
> stick with the same kernel version with different LTTng configurations.
> 

The patchset was unapplied. It was a vanilla mainline kernel.

> 
> > I am starting to doubt tbench represents "real" workloads well, given
> > it mostly does kernel interaction and almost nothing in userspace.
> 
> 
> There are many kinds of "real" workloads. I think we may want to pay
> special attention to those benchmarks that stress LTTng.
> 

Sure, as long as we agree that a 5% performance impact on a very
specific workload designed especially to stress the OS does not have the
same meaning as a 5% performance impact on a real workload using a wide
variety of the kernel/userspace infrastructure. Such specific workloads
are very good at exaggerating the performance impact, and thus good at
exposing the patches which cause the most performance impact (through
bisection), but I don't think they should be thought of as
representative of tracer impact on a normal workload.

A well-designed application will likely not generate that many small I/O
operations and interrupts; this is why we have buffered reads/writes.
And typically, the application involved will repeatedly evict kernel
code and data from the CPU caches. Having a very small amount of code to
execute tends to leave all the code (user and kernel) cache-hot. The
side-effect of this is that if the workload comes close to running
completely within the cache, each cache line "eaten" by a code or data
layout modification in the kernel will have an impact on performance,
since it means more memory accesses.

However, normal workloads usually don't fit in the CPU caches, and
therefore the impact of such a small layout modification almost
disappears.

Mathieu


> Jiaying
> 
> 
> >
> > Mathieu
> >
> >
> > --
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> >

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



