<html><body><div style="font-family: arial, helvetica, sans-serif; font-size: 12pt; color: #000000"><div><span id="zwchr" data-marker="__DIVIDER__">----- On Jun 28, 2016, at 3:14 AM, Vijay Anand <vjanandr85@gmail.com> wrote:<br></span></div><div data-marker="__QUOTED_TEXT__"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">Hi Sebastien,</span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000"><br></span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">I have written a simple program with a loop 1 million times adding a trace everytime.</span></div></div></blockquote><div><br></div><div>1 million times is not that many events. 
The startup time of your<br data-mce-bogus="1"></div><div>application, and buffer setup time will not be negligible, and should</div><div>be subtracted.<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">Please find the simple program at <a href="https://github.com/vjanandr/sampleC/blob/master/lttng/benchmark/lttng_benchmark.c" target="_blank">https://github.com/vjanandr/sampleC/blob/master/lttng/benchmark/lttng_benchmark.c</a></span><br data-mce-bogus="1"></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000"><br></span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">I first configured the channel and then ran the benchmark program.</span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">More logs @ <a href="https://gist.github.com/vjanandr/1db6a6a9d93e6b3f2ac30a05aadcf06b" target="_blank">https://gist.github.com/vjanandr/1db6a6a9d93e6b3f2ac30a05aadcf06b</a></span><br data-mce-bogus="1"></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000"><br></span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">Here is the 
observation below:</span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000"><br></span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">a. default 4 subbuffers with 128K each takes 1894063 microseconds</span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">b. 16 subbuffers with 1024K each takes 1882392 microseconds</span></div><div class="gmail_default" style=""><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000"><span style="font-family: verdana,sans-serif;" data-mce-style="font-family: verdana,sans-serif;" face="verdana, sans-serif">c. 16 subbuffers with 2M each takes 1884999 microseconds</span></span></div></div></blockquote><div><br></div><div>You vary (at least) 2 variables there: sub-buffer size and overall buffer size.<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Larger sub-buffers mean you reach sub-buffer boundaries less often (faster).<br data-mce-bogus="1"></div><div>Larger buffers mean you trash extra L1/L2/L3 cache on your CPU (slower).<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>So the 2 variables you vary here affect performance in opposite ways.<br data-mce-bogus="1"></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div class="gmail_default" style=""><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000"><br></span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" 
data-mce-style="color: #000000;" color="#000000"><br></span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">On average, the program takes about 1.8 seconds for each run.</span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000"><br></span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">Is this expected? If I have to improve the performance, should I configure anything differently?</span></div></div></blockquote><div><br></div><div>You should detail your system config (kernel version, kernel configuration,<br data-mce-bogus="1"></div><div>architecture, CPU speed, amount of memory) whenever you present a benchmark;<br data-mce-bogus="1"></div><div>otherwise it is meaningless.<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>You should also benchmark separately flight recorder tracing (snapshot)<br data-mce-bogus="1"></div><div>and tracing doing I/O to disk or network (discard mode). You should keep<br data-mce-bogus="1"></div><div>track of discarded event counts in your results.<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>You should also benchmark separately transient state tracing (e.g. 
tracing<br data-mce-bogus="1"></div><div>for the first time into buffers) vs steady-state tracing (after the cache lines<br data-mce-bogus="1"></div><div>and TLB entries are hot).<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000"><br></span></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><span style="color: #000000;" data-mce-style="color: #000000;" color="#000000">Just that I understand this better, I have been trying to understand this from looking at the code. Could you please point me to the code that might be impacting this ?</span></div></div></blockquote><div><br></div><div>You might want to look at those slides for methodology ideas:<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>http://hsdm.dorsal.polymtl.ca/may2016_mgebai</div><div><br data-mce-bogus="1"></div><div>Thanks,<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Mathieu<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr"><span style="color: #0000ff; font-family: verdana,sans-serif;" data-mce-style="color: #0000ff; font-family: verdana,sans-serif;" color="#0000ff" face="verdana, sans-serif">Regards,</span><div><span style="color: #0000ff; font-family: verdana,sans-serif;" data-mce-style="color: #0000ff; font-family: 
verdana,sans-serif;" color="#0000ff" face="verdana, sans-serif">Vijay</span></div></div></div></div><br><div class="gmail_quote">On Mon, Jun 27, 2016 at 8:54 PM, Sebastien Boisvert <span dir="ltr"><<a href="mailto:sboisvert@gydle.com" target="_blank">sboisvert@gydle.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Vijay,<br><span class=""><br>
<br>
On 06/27/2016 06:12 AM, Vijay Anand wrote:<br>
> Hi,<br>
><br>
> I have been trying to understand how the sub-buffer count and size affect performance when tracing a user-space program.<br>
><br>
<br>
</span>The sub-buffer count and size are well documented here:<br><br><a href="http://lttng.org/docs/#doc-channel-subbuf-size-vs-subbuf-count" rel="noreferrer" target="_blank">http://lttng.org/docs/#doc-channel-subbuf-size-vs-subbuf-count</a><br><span class=""><br>
<br>
<br>
> I have a simple program logging 1 million traces<br>
<br>
</span>Do you mean that you recorded 1 million UST events?<br><br>
Or do you mean that you traced your app 1 million times and generated 1 million traces?<br><span class=""><br>
<br>
> and I don't see any appreciable performance difference between the two configurations below.<br>
><br>
> a. default 4 sub-buffers, 128 KB each<br>
> b. 16 sub-buffers, 1024 KB each.<br>
<br>
</span>When you enable your UST events (before starting the LTTng session),<br>
are you enabling them in your custom channel (the one with configuration a. or configuration b.)?<br><br><br>
If not, it could explain the lack of difference in performance between your 2 configurations.<br><br><br>
Example:<br><br>
lttng create<br><br>
lttng enable-channel -u --subbuf-size 4M channel7<br>
lttng enable-event -u -c channel7 gydle_om:Allocator_constructor_default<br><br>
lttng start<br><br>
run-your-app<br><br>
lttng stop<br>
lttng view > trace.txt<br><span class=""><br>
<br>
<br>
><br>
> Moreover, I have been looking at the lttng-ust code to understand the performance impact of the implementation, and I am unable to comprehend what is documented at<br>
><br>
> <a href="http://lttng.org/docs/#doc-channel-subbuf-size-vs-subbuf-count" rel="noreferrer" target="_blank">http://lttng.org/docs/#doc-channel-subbuf-size-vs-subbuf-count</a><br>
><br>
> I see that channel_backend_init and _shm_object_table_alloc_shm seem to allocate one big shared memory chunk, which is then subdivided into sub-buffers, each referenced by its sub-buffer index.<br>
><br>
> Furthermore, lib_ring_buffer_write seems to find the sub-buffer index and write "len" bytes into that sub-buffer.<br>
><br>
> Could anyone please explain what overhead is involved in switching from one sub-buffer to another, as documented?<br>
><br>
<br>
</span>The documentation indicates that the tracer's CPU overhead on a sub-buffer switch is caused by 2 things:<br><br>
1) marking the current sub-buffer as consumable, and<br>
2) switching to an empty sub-buffer.<br><br><br><br><br>
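A rough way to picture those two costs: they are paid once each time a sub-buffer fills up, so the number of switches shrinks as sub-buffers grow. The sketch below is a toy model in Python, not lttng-ust code. The function name and the 32-byte event size are assumptions made for illustration, and the model ignores packet headers, padding, and events discarded when no sub-buffer is free.<br><br>
<pre>
```python
def count_subbuf_switches(n_events, event_size, subbuf_size):
    """Count how often a writer must leave a full sub-buffer for the next one.

    Toy model only: every event has the same size, an event never straddles
    two sub-buffers, and packet headers and padding are ignored.
    """
    events_per_subbuf = subbuf_size // event_size
    if events_per_subbuf == 0:
        raise ValueError("event does not fit in one sub-buffer")
    # Ceiling division: number of sub-buffers the writer fills or partly fills.
    subbufs_touched = -(-n_events // events_per_subbuf)
    # The first sub-buffer is entered for free; each further one costs a switch.
    return max(subbufs_touched - 1, 0)


if __name__ == "__main__":
    n_events = 1_000_000
    event_size = 32  # bytes per event; hypothetical, not measured
    configs = (("128K", 128 * 1024), ("1M", 1024 * 1024), ("2M", 2 * 1024 * 1024))
    for label, subbuf_size in configs:
        print(label, count_subbuf_switches(n_events, event_size, subbuf_size))
```
</pre>
With these assumed numbers, going from 128K to 1M sub-buffers cuts the switch count by roughly a factor of 8, but in both cases the count is tiny next to 1 million events, so switch overhead alone may be hard to see in a short run.<br><br>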
> Regards,<br>
> Vijay<br>
><br>
><br>
> _______________________________________________<br>
> lttng-dev mailing list<br>
> <a href="mailto:lttng-dev@lists.lttng.org" target="_blank">lttng-dev@lists.lttng.org</a><br>
> <a href="https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev" rel="noreferrer" target="_blank">https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev</a><br>
><br></blockquote></div><br></div><br></blockquote></div><div><br></div><div data-marker="__SIG_POST__">-- <br></div><div>Mathieu Desnoyers<br>EfficiOS Inc.<br>http://www.efficios.com</div></div></body></html>