<html><body><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><hr id="zwchr"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"Venkatesh ChitlurSrinivasa" <Venkatesh.Babu@netapp.com><br><b>To: </b>"mathieu desnoyers" <mathieu.desnoyers@efficios.com><br><b>Cc: </b>lttng-dev@lists.lttng.org<br><b>Sent: </b>Friday, September 5, 2014 2:53:51 PM<br><b>Subject: </b>What is the cost of user-space tracepoint() ?<br><div><br></div><style style="display:none"><!--P{margin-top:0;margin-bottom:0;} .ms-cui-menu {background-color:#ffffff;border:1px rgb(171, 171, 171) solid;font-family:'Segoe UI WPC', 'Segoe UI', Tahoma, 'Microsoft Sans Serif', Verdana, sans-serif;font-size:11pt;color:rgb(51, 51, 51);} .ms-cui-menusection-title {display:none;} .ms-cui-ctl {vertical-align:text-top;text-decoration:none;color:rgb(51, 51, 51);} .ms-cui-ctl-on {background-color:rgb(223, 237, 250);opacity: 0.8;} .ms-cui-img-cont-float {display:inline-block;margin-top:2px} .ms-cui-smenu-inner {padding-top:0px;} .ms-owa-paste-option-icon {margin: 2px 4px 0px 4px;vertical-align:sub;padding-bottom: 2px;display:inline-block;} .ms-rtePasteFlyout-option:hover {background-color:rgb(223, 237, 250) !important;opacity:1 !important;} .ms-rtePasteFlyout-option {padding:8px 4px 8px 4px;outline:none;} .ms-cui-menusection {float:left; width:85px;height:24px;overflow:hidden}.wf {speak:none; font-weight:normal; font-variant:normal; text-transform:none; -webkit-font-smoothing:antialiased; vertical-align:middle; display:inline-block;}.wf-family-owa {font-family:'o365Icons'}@font-face { font-family:'o365IconsIE8'; src:; font-weight:normal; font-style:normal;}.wf-family-owa {font-family:'o365IconsMouse'}.ie8 .wf-family-owa {font-family:'o365IconsIE8'}.ie8 .wf-owa-play-large:before {content:'\e254';}.notIE8 
.wf-owa-play-large:before {content:'\e054';}.ie8 .wf-owa-play-large {color:#FFFFFF;}.notIE8 .wf-owa-play-large {border-color:#FFFFFF; width:1.4em; height:1.4em; border-width:.1em; border-style:solid; border-radius:.8em; text-align:center; box-sizing:border-box; -moz-box-sizing:border-box; padding:0.1em; color:#FFFFFF;}.ie8 .wf-size-play-large {width:40px; height:40px; font-size:30px}.notIE8 .wf-size-play-large {width:40px; height:40px; font-size:30px}
--></style><div style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;"><style style="">
</style><div style="background-color: rgb(255, 255, 255);"><p style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;"><span style="color:rgb(40,40,40); font-family:'Segoe UI WPC','Segoe UI',Tahoma,'Microsoft Sans Serif',Verdana,sans-serif; font-size:13px">Mathieu,</span></p><p><br><span style="color: #282828; font-family: Segoe UI WPC,Segoe UI,Tahoma,Microsoft Sans Serif,Verdana,sans-serif; font-size: small;" data-mce-style="color: #282828; font-family: Segoe UI WPC,Segoe UI,Tahoma,Microsoft Sans Serif,Verdana,sans-serif; font-size: small;" color="#282828" face="Segoe UI WPC, Segoe UI, Tahoma, Microsoft Sans Serif, Verdana, sans-serif" size="2">I tried to send this email to lttng-dev@ but I didn't get any response. So I am sending this directly to you. I greatly appreciate your response.</span></p></div></div></blockquote><div>Hi Venkatesh,<br></div><div><br></div><div>Sorry, I was on vacation and just recently returned. I must admit I did not<br></div><div>have time to fully deal with my email backlog.<br></div><div><br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;"><div style="background-color: rgb(255, 255, 255);"><p style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;"><span style="color:rgb(40,40,40); font-family:'Segoe UI WPC','Segoe UI',Tahoma,'Microsoft Sans Serif',Verdana,sans-serif; font-size:13px"><br>
</span></p><p style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;"><span style="color:rgb(40,40,40); font-family:'Segoe UI WPC','Segoe UI',Tahoma,'Microsoft Sans Serif',Verdana,sans-serif; font-size:13px">We are planning to use LTTng UST as it supports a lot of interesting features, but we have some performance concerns (as compared
with our in-house tracing tool). Please point me to the latest benchmark tests and performance results. <span style="color: rgb(40, 40, 40); font-family: 'Segoe UI WPC', 'Segoe UI', Tahoma, 'Microsoft Sans Serif', Verdana, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255);">On CPU Intel
Xeon E5-2680 v2 @ 2.80GHz, running Linux 3.6.11 and lttng 2.4.1, I am getting about 927 cycles per tracepoint() call (9270692144 cycles for 10000000 iterations). This seems to be a lot higher than the documented results. In the paper </span><a href="https://lttng.org/files/papers/desnoyers.pdf" target="_blank" style="font-family: 'Segoe UI WPC', 'Segoe UI', Tahoma, 'Microsoft Sans Serif', Verdana, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255);">https://lttng.org/files/papers/desnoyers.pdf</a><span style="color: rgb(40, 40, 40); font-family: 'Segoe UI WPC', 'Segoe UI', Tahoma, 'Microsoft Sans Serif', Verdana, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255);"> the
average cost of tracepoint() with the older ltt-usertrace-fast tracepoint is 297 cycles. Another link </span><a href="http://lttng.org/files/thesis/desnoyers-thesis-defense-2009-12-e1.pdf" target="_blank" style="font-family: 'Segoe UI WPC', 'Segoe UI', Tahoma, 'Microsoft Sans Serif', Verdana, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255);">http://lttng.org/files/thesis/desnoyers-thesis-defense-2009-12-e1.pdf</a><span style="color: rgb(40, 40, 40); font-family: 'Segoe UI WPC', 'Segoe UI', Tahoma, 'Microsoft Sans Serif', Verdana, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255);"> says
the cache-hot tracepoint() cost is 238 cycles.</span><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 16px; background-color: rgb(255, 255, 255);"></span></span></p></div></div></blockquote><div>Indeed, this is surprising. I would expect a performance figure in the area of 500 cycles per<br></div><div>UST tracepoint on modern Intel processors with lttng-ust 2.x, using the Linux kernel<br></div><div>monotonic clock.<br></div><div><br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;"><div style="background-color: rgb(255, 255, 255);"><p style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;"><br style="color:rgb(40,40,40); font-family:'Segoe UI WPC','Segoe UI',Tahoma,'Microsoft Sans Serif',Verdana,sans-serif; font-size:13px; background-color:rgb(255,255,255)"><span style="color:rgb(40,40,40); font-family:'Segoe UI WPC','Segoe UI',Tahoma,'Microsoft Sans Serif',Verdana,sans-serif; font-size:13px; background-color:rgb(255,255,255)">I noticed that the tracepoint is making clock_gettime() and sched_getcpu() system calls.
With Linux kernel v3.6.11 and libc v2.13-38+deb7u3, I see that these system calls are not going through the vDSO and hence cost more. I tried adding wrapper functions for these system calls that call __vdso_clock_gettime and __vdso_getcpu, as upgrading libc was
not an option. With this change, the cost of tracepoint() recording one integer dropped to 795 cycles (= 284 ns on an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz). Still, this number seems to be higher than the earlier published numbers. </span></p></div></div></blockquote><div>Indeed, going through a syscall for gettime and getcpu can be a large<br></div><div>cause of performance degradation. Upgrading the libc is really recommended<br></div><div>here.<br></div><div><br></div><div>The latest published benchmarks I remember are actually a bit old (UST 0.11):<br></div><div><br></div><div>https://sourceware.org/ml/systemtap/2011-q1/msg00244.html</div><div><br></div><div>This gave 211 ns/event on an Intel Xeon E5404 CPU at 2.0GHz, or<br></div><div>422 cycles per event. I remember having seen benchmarks of more<br></div><div>recent lttng-ust 2.x around these numbers (or perhaps more around</div><div>275 ns/event).<br></div><div><br></div><div>With LTTng 2.x, we reworked the ring buffer and added features, but<br></div><div>indeed the figure of 795 cycles/event (just below 400 ns/event if scaled to 2.0GHz)</div><div>is higher than expected.<br></div><div><br></div><div>One very important question: what payload are you tracing exactly? Can<br></div><div>you create a small package with the simple benchmark program you use<br></div><div>so we can build it ourselves and try it out?<br></div><div><br></div><div>Another thing to consider is that performance is likely limited by the<br></div><div>cache throughput, memory barrier execution and so on. Therefore,</div><div>just because your CPU is running at 2.8GHz does not mean we can<br></div><div>trace faster than with a CPU at 2.0GHz. 
It might thus be better<br></div><div>to measure in ns per event rather than cycles per event.<br></div><div><br></div><div>Moreover, I'd be interested to see the results of perf profiling of the<br></div><div>benchmark.<br></div><div><br></div><div>Thanks,<br></div><div><br></div><div>Mathieu<br></div><div><br></div><div><br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;"><div style="background-color: rgb(255, 255, 255);"><p style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;"><br style="color:rgb(40,40,40); font-family:'Segoe UI WPC','Segoe UI',Tahoma,'Microsoft Sans Serif',Verdana,sans-serif; font-size:13px; background-color:rgb(255,255,255)"><span style="color:rgb(40,40,40); font-family:'Segoe UI WPC','Segoe UI',Tahoma,'Microsoft Sans Serif',Verdana,sans-serif; font-size:13px; background-color:rgb(255,255,255)">VBabu</span><br></p></div></div></blockquote><div><br><br></div><div><br></div><div>-- <br></div><div><span name="x"></span>Mathieu Desnoyers<br>EfficiOS Inc.<br>http://www.efficios.com<span name="x"></span><br></div></div></body></html>