[lttng-dev] 答复: UST performance

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue Apr 28 22:25:12 EDT 2015


----- Original Message -----

> Hi,Mathieu

> > Is your sched_getcpu() system call a vdso too

> I use VSYSCALL_ADDR(__NR_vgetcpu) instead sched_getcpu so there is no getcpu
> system calls.

> > make sure you #define _LGPL_SOURCE in your benchmark program

> I do it as you suggested but no difference.

> > Then, you can try using "perf" to profile the runtime execution

> $~/renzhyu/perf record ./sample
> WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
> check /proc/sys/kernel/kptr_restrict.

> Samples in kernel functions may not be resolved if a suitable vmlinux
> file is not found in the buildid cache or in the vmlinux path.

> Samples in kernel modules won't be resolved at all.

> If some relocation was applied (e.g. kexec) symbols may be misresolved
> even with a suitable vmlinux or kallsyms file.

> using 5720094251 ns
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.221 MB perf.data (~9658 samples) ]

> $~/renzhyu/perf report
> Events: 5K cycles
> 16.82% sample liblttng-ust.so.0.0.0 [.] lttng_event_commit
> 10.55% sample liblttng-ust.so.0.0.0 [.] lttng_event_strcpy
> 10.45% sample [kernel.kallsyms] [k] 0xffffffff8103548a
> 8.16% sample liblttng-ust.so.0.0.0 [.] lttng_event_reserve
> 7.57% sample sample [.] __event_get_size__sample_component___message
> 6.42% sample sample [.] __event_probe__sample_component___message
> 6.24% sample liblttng-ust.so.0.0.0 [.] lttng_write_event_header
> 5.40% sample sample [.] _rcu_read_lock_update
> 5.35% sample sample [.] _rcu_read_unlock_bp
> 5.17% sample liburcu-bp.so.2.0.0 [.] rcu_read_unlock_bp
> 4.14% sample ld-2.5.so [.] __tls_get_addr
> 3.38% sample [vdso] [.] 0x7fff8f72e6be
> 3.35% sample sample [.] main
> 1.88% sample sample [.] lib_ring_buffer_ctx_init
> 1.79% sample sample [.] _rcu_read_lock_bp
> 1.20% sample liburcu-bp.so.2.0.0 [.] rcu_read_lock_bp
> 0.66% sample sample [.] lib_ring_buffer_align_ctx
> 0.38% sample sample [.] cds_list_empty
> 0.37% sample sample [.] lib_ring_buffer_align
> 0.35% sample liburcu-bp.so.2.0.0 [.] __tls_get_addr at plt
> 0.28% sample sample [.] __event_get_align__sample_component___message
> 0.09% sample ld-2.5.so [.] do_lookup_

Did you compile your program and the probe provider 
(tp.c) with -O2 ? 

Did you define _LGPL_SOURCE at the top of both files ? 
(the code and the probe provider) 

I tried your sample.c program here: 

CPU: 
Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz 
This is within a virtual machine. 

And I get 302 ns/event (total: 3019987815ns exec time) 

I see that in the numbers you provided in your earlier email, 
you talk about "us" (microseconds), did you actually mean 
"ns" (nanoseconds) ? 

I enabled tracing this way: 

lttng create --snapshot 
lttng enable-event -u -a 
lttng start 

Thoughts ? 

Thanks, 

Mathieu 

> The symbol 0xffffffff8103548a is in native_write_msr_safe,I don't know what
> the function is and why that kernel function can be TOP 3.
> ffffffff81035480 <native_write_msr_safe>:
> ffffffff81035480: 55 push %rbp
> ffffffff81035481: 89 f9 mov %edi,%ecx
> ffffffff81035483: 89 f0 mov %esi,%eax
> ffffffff81035485: 48 89 e5 mov %rsp,%rbp
> ffffffff81035488: 0f 30 wrmsr
> ffffffff8103548a: 31 c0 xor %eax,%eax
> ffffffff8103548c: c9 leaveq
> ffffffff8103548d: c3 retq
> ffffffff8103548e: 66 90 xchg %ax,%ax

> PS:my sample.c

> #include <time.h>
> #include <unistd.h>
> #include <stdio.h>
> #define _LGPL_SOURCE
> #define TRACEPOINT_DEFINE
> #include "sample_component_provider.h"

> #define NUM 10000000
> int main(int argc, char **argv)
> {
> int i = 0;
> struct timespec bef,aft;
> long diff;
> clock_gettime(CLOCK_MONOTONIC,&bef);
> for (i = 0; i < NUM; i++) {
> tracepoint(sample_component, message, "HelloWorld");
> }
> clock_gettime(CLOCK_MONOTONIC,&aft);
> diff = (aft.tv_sec-bef.tv_sec)*1000000000+(aft.tv_nsec-bef.tv_nsec);
> printf("using %ld ns\n", diff);
> return 0;
> }

> Thanks
> zhenyu.ren

> > ------------------------------------------------------------------
> 
> > 发件人:Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> 
> > 发送时间:2015年4月28日(星期二) 23:47
> 
> > 收件人:zhenyu.ren <zhenyu.ren at aliyun.com>
> 
> > 抄 送:lttng-dev <lttng-dev at lists.lttng.org>
> 
> > 主 题:Re: [lttng-dev] UST performance
> 

> > Hi,
> 

> > Is your sched_getcpu() system call a vdso too ? This may explain the
> > difference.
> 

> > Moreover, make sure you #define _LGPL_SOURCE in your benchmark program
> 
> > before including any lttng ust header. It will inline many functions
> > associated with
> 
> > tracepoints.
> 

> > Then, you can try using "perf" to profile the runtime execution of your
> > program
> 
> > traced by lttng-ust ? This should give us the information we need to
> > investigate the
> 
> > performance profile.
> 

> > Thanks,
> 

> > Mathieu
> 

> > ----- Original Message -----
> 

> > > Hi,dev
> > 
> 

> > > About a month ago,I asked a question about how to do time recording for
> > > ust
> > > event when vdso_clock_gettime not availble on my platform(Glibc too
> > > old).Recently I implemented a lib to analyze vdso in memory and lookuped
> > > the
> > > symbol vdso_clock_gettime successfully.With this vdso_clock_gettime
> > > availble,I get a 16% improvement in time spending per ust event. I.E. I
> > > measured the time taked by a event with 10 bytes string :
> > 
> 

> > > no vdso_clock_gettime : 0.642us
> > 
> 
> > > with vdso_clock_gettime:0.554us
> > 
> 

> > > However,Both the numbers are much higher than the following 0.28us(But
> > > closed
> > > to 0.5us --W/O opt). So what's the difference between UST(0.28us) and UST
> > > W/O OPT(0.5us)??
> > 
> 

> > > PS:My hardware
> > 
> 
> > > 2 x Xeon E5-2430 2.20GHz 100MHz FSB (24 cores) 94.6GB / 96GB 1333MHz DDR3
> > > ==
> > > 6 x 16GB, 6 x empty
> > 
> 

> > > Thanks
> > 
> 
> > > zhenyu.ren
> > 
> 

> > > _______________________________________________
> > 
> 
> > > lttng-dev mailing list
> > 
> 
> > > lttng-dev at lists.lttng.org
> > 
> 
> > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> > 
> 

> > --
> 
> > Mathieu Desnoyers
> 
> > EfficiOS Inc.
> 
> > http://www.efficios.com
> 

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20150429/a5cdc5a0/attachment-0001.html>


More information about the lttng-dev mailing list