[lttng-dev] 答复:答复: UST performance

zhenyu.ren zhenyu.ren at aliyun.com
Tue Apr 28 23:39:08 EDT 2015


Oh,My mistake. I am not using -O2 (I copied the Makefile from lttng-ust-2.6.0/doc/examples/easy-ust/Makefile :) ). When compiled with -O2, I get a 0.454us/event(10M events total 4548921074 ns).>Did you define _LGPL_SOURCE at the top of both files ? (the code and the probe provider)  Yes,I did define it all of the files(provider(.h/.c) and sample.c)
>I see that in the numbers you provided in your earlier email, you talk about "us" (microseconds), did you actually mean "ns" (nanoseconds) ?  Yes,I use "us", say  0.554us , 0.28us, 0.5us and the latest 0.454us . If use "ns",they are 554ns, 280ns,500ns,454ns ,it seems no mistake here。My CPU is 2 x Xeon E5-2420  1.90GHz , OS: RHEL6,2.6.32-220 Maybe the fact my ust event slower than yours is due to the CPU ability,I am not sure about that. Does you perf report show something like  native_write_msr_safe?Thankszhenyu.ren------------------------------------------------------------------发件人:Mathieu Desnoyers <mathieu.desnoyers at efficios.com>发送时间:2015年4月29日(星期三) 10:25收件人:zhenyu.ren <zhenyu.ren at aliyun.com>抄 送:lttng-dev <lttng-dev at lists.lttng.org>主 题:Re: 答复:[lttng-dev] UST performance
Hi,Mathieu
> Is your sched_getcpu() system call a vdso too
   I use VSYSCALL_ADDR(__NR_vgetcpu) instead sched_getcpu so there is no getcpu system calls.
> make sure you #define _LGPL_SOURCE in your benchmark program
   I do it as you suggested but no difference.

> Then, you can try using "perf" to profile the runtime execution 
$~/renzhyu/perf record ./sample
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.


using 5720094251 ns
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.221 MB perf.data (~9658 samples) ]


$~/renzhyu/perf report
Events: 5K cycles
16.82% sample liblttng-ust.so.0.0.0 [.] lttng_event_commit10.55% sample liblttng-ust.so.0.0.0 [.] lttng_event_strcpy10.45% sample [kernel.kallsyms] [k] 0xffffffff8103548a8.16% sample liblttng-ust.so.0.0.0 [.] lttng_event_reserve7.57% sample sample [.] __event_get_size__sample_component___message6.42% sample sample [.] __event_probe__sample_component___message6.24% sample liblttng-ust.so.0.0.0 [.] lttng_write_event_header5.40% sample sample [.] _rcu_read_lock_update5.35% sample sample [.] _rcu_read_unlock_bp5.17% sample liburcu-bp.so.2.0.0 [.] rcu_read_unlock_bp4.14% sample ld-2.5.so [.] __tls_get_addr3.38% sample [vdso] [.] 0x7fff8f72e6be3.35% sample sample [.] main1.88% sample sample [.] lib_ring_buffer_ctx_init1.79% sample sample [.] _rcu_read_lock_bp1.20% sample liburcu-bp.so.2.0.0 [.] rcu_read_lock_bp0.66% sample sample [.] lib_ring_buffer_align_ctx0.38% sample sample [.] cds_list_empty0.37% sample sample [.] lib_ring_buffer_align0.35% sample liburcu-bp.so.2.0.0 [.] __tls_get_addr at plt0.28% sample sample [.] __event_get_align__sample_component___message0.09% sample ld-2.5.so [.] do_lookup_Did you compile your program and the probe provider
(tp.c) with -O2 ?

Did you define _LGPL_SOURCE at the top of both files ?
(the code and the probe provider)

I tried your sample.c program here:

CPU:
Intel(R) Core(TM) i7-4600U CPU @ 2.10GHzThis is within a virtual machine.

And I get 302 ns/event (total: 3019987815ns exec time)

I see that in the numbers you provided in your earlier email,
you talk about "us" (microseconds), did you actually mean"ns" (nanoseconds) ?

I enabled tracing this way:

lttng create --snapshot
lttng enable-event -u -a
lttng start

Thoughts ?

Thanks,

Mathieu
The symbol  0xffffffff8103548a is in native_write_msr_safe,I don't know what the function is and why that kernel function can be TOP 3.ffffffff81035480 <native_write_msr_safe>:
ffffffff81035480: 55 push %rbp
ffffffff81035481: 89 f9 mov %edi,%ecx
ffffffff81035483: 89 f0 mov %esi,%eax
ffffffff81035485: 48 89 e5 mov %rsp,%rbp
ffffffff81035488: 0f 30 wrmsr
ffffffff8103548a: 31 c0 xor %eax,%eax
ffffffff8103548c: c9 leaveq
ffffffff8103548d: c3 retq
ffffffff8103548e: 66 90 xchg %ax,%ax
PS:my sample.c
#include <time.h>
#include <unistd.h>
#include <stdio.h>
#define _LGPL_SOURCE
#define TRACEPOINT_DEFINE
#include "sample_component_provider.h"

#define NUM 10000000
int main(int argc, char **argv)
{
 int i = 0;
 struct timespec bef,aft;
 long diff;
 clock_gettime(CLOCK_MONOTONIC,&bef);
 for (i = 0; i < NUM; i++) {
tracepoint(sample_component, message, "HelloWorld");
 }
 clock_gettime(CLOCK_MONOTONIC,&aft);
 diff = (aft.tv_sec-bef.tv_sec)*1000000000+(aft.tv_nsec-bef.tv_nsec);
 printf("using %ld ns\n", diff);
 return 0;
}

Thankszhenyu.ren------------------------------------------------------------------发件人:Mathieu Desnoyers <mathieu.desnoyers at efficios.com>发送时间:2015年4月28日(星期二) 23:47收件人:zhenyu.ren <zhenyu.ren at aliyun.com>抄 送:lttng-dev <lttng-dev at lists.lttng.org>主 题:Re: [lttng-dev] UST performance
Hi,
Is your sched_getcpu() system call a vdso too ? This may explain the difference.

Moreover, make sure you #define _LGPL_SOURCE in your benchmark program
before including any lttng ust header. It will inline many functions associated with
tracepoints.

Then, you can try using "perf" to profile the runtime execution of your program
traced by lttng-ust ? This should give us the information we need to investigate the
performance profile.

Thanks,

Mathieu


Hi,dev
    About a month ago,I asked a question about how to do time recording for ust event when vdso_clock_gettime not availble on my platform(Glibc too old).Recently I implemented a lib to analyze vdso in memory and lookuped the symbol vdso_clock_gettime successfully.With this vdso_clock_gettime availble,I get a 16% improvement in time spending per ust event. I.E. I measured the time taked by a event with 10 bytes string :
 no   vdso_clock_gettime : 0.642us with vdso_clock_gettime:0.554us
However,Both the numbers are much higher than the following 0.28us(But closed to 0.5us --W/O opt). So what's the difference between UST(0.28us) and UST W/O OPT(0.5us)?? 



PS:My hardware2 x Xeon E5-2430  2.20GHz 100MHz FSB (24 cores) 94.6GB / 96GB 1333MHz DDR3 == 6 x 16GB, 6 x empty
Thankszhenyu.ren



_______________________________________________
lttng-dev mailing list
lttng-dev at lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com



-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20150429/4e24ceec/attachment-0001.html>


More information about the lttng-dev mailing list