[lttng-dev] subbuffers count and size

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue Jun 28 13:38:33 UTC 2016


----- On Jun 28, 2016, at 3:14 AM, Vijay Anand <vjanandr85 at gmail.com> wrote: 

> Hi Sebastien,

> I have written a simple program that loops 1 million times, adding a trace
> event each time.

1 million events is not that many. The startup time of your 
application and the buffer setup time will not be negligible, and should 
be subtracted. 
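
One rough way to do that, sketched below with illustrative session and channel 
names: run the benchmark once with the session created but not yet started 
(baseline), then once with tracing active, and subtract the two timings. 

# Rough sketch (names illustrative): subtract an untraced baseline run
# from a traced run so that process startup and setup are not counted.
lttng create bench
lttng enable-channel -u --subbuf-size 128k --num-subbuf 4 bench_chan
lttng enable-event -u -a -c bench_chan
time ./lttng_benchmark      # baseline: session exists but is not started
lttng start
time ./lttng_benchmark      # traced run; the difference approximates tracing cost
lttng stop
lttng destroy

Dividing that difference by the number of events recorded gives a per-event cost 
that is easier to compare across configurations. 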

> Please find the simple program at
> https://github.com/vjanandr/sampleC/blob/master/lttng/benchmark/lttng_benchmark.c

> I first configured the channel and then ran the benchmark program.
> More logs @ https://gist.github.com/vjanandr/1db6a6a9d93e6b3f2ac30a05aadcf06b

> Here is the observation below:

> a. default 4 subbuffers with 128K each takes 1894063 microseconds
> b. 16 subbuffers with 1024K each takes 1882392 microseconds
> c. 16 subbuffers with 2M each takes 1884999 microseconds

You vary two variables here (at least): sub-buffer size and overall buffer size. 

Larger sub-buffers mean you reach sub-buffer boundaries less often (faster). 
Larger overall buffers mean you trash more of your CPU's L1/L2/L3 caches (slower). 

So the two variables you vary here affect performance in opposite ways. 
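
To separate the two effects, one option is to hold the total buffer size constant 
and vary only the sub-buffer size. A sketch, with illustrative channel names: 

# Both channels use 2 MiB of buffer in total; only the sub-buffer size differs,
# so only the frequency of sub-buffer boundary crossings changes between runs.
lttng enable-channel -u --subbuf-size 128k --num-subbuf 16 chan_small_subbuf
lttng enable-channel -u --subbuf-size 512k --num-subbuf 4 chan_large_subbuf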

> On average, the program takes about 1.8 seconds per run.

> Is this expected? If I want to improve the performance, should I configure
> anything differently?

You should detail your system configuration (kernel version, kernel configuration, 
architecture, CPU speed, amount of memory) whenever you present a benchmark; 
otherwise it is meaningless. 
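
For instance, a few standard Linux commands capture most of it (nothing 
LTTng-specific here): 

uname -rm                      # kernel version and architecture
lscpu                          # CPU model, speed, cache sizes
free -h                        # amount of memory
cat /boot/config-$(uname -r)   # kernel configuration, where the distro ships it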

You should also benchmark flight recorder tracing (snapshot mode) separately from 
tracing that does I/O to disk or the network (discard mode), and keep track of 
discarded event counts in your results. 
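
A sketch of the two setups (session and channel names illustrative); discarded 
event counts show up, for example, as warnings when the trace is read with 
babeltrace: 

# Flight recorder: snapshot session, ring buffers overwrite their oldest events.
lttng create fr_session --snapshot
lttng enable-channel -u --overwrite fr_chan

# Disk I/O: regular session writing to disk; discard-mode channels (the default)
# drop new events when the buffers are full instead of overwriting old ones.
lttng create disk_session --output=/tmp/disk-trace
lttng enable-channel -u --discard disk_chan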

You should also benchmark transient-state tracing (e.g. tracing into the buffers 
for the first time) separately from steady-state tracing (after the cache lines 
and TLB entries are hot). 

> So that I understand this better, I have been trying to follow the code. Could
> you please point me to the code that might be impacting this?

You might want to look at those slides for methodology ideas: 

http://hsdm.dorsal.polymtl.ca/may2016_mgebai 

Thanks, 

Mathieu 

> Regards,
> Vijay

> On Mon, Jun 27, 2016 at 8:54 PM, Sebastien Boisvert < sboisvert at gydle.com >
> wrote:

>> Hi Vijay,

>> On 06/27/2016 06:12 AM, Vijay Anand wrote:
>> > Hi,

>>> I have been trying to understand the performance impact of the sub-buffer
>>> count and size when logging from a user-space program.


>> The sub-buffer count and size are well documented here:

>> http://lttng.org/docs/#doc-channel-subbuf-size-vs-subbuf-count

>> > I have a simple program logging 1 million traces

>> Do you mean that you recorded 1 million UST events ?

>> Or do you mean that you traced your app 1 million times and you generated 1
>> million traces ?

>>> and I don't seem to see any appreciable performance difference between the
>>> two configurations below.

>>> a. default: 4 sub-buffers of 128 KB each
>>> b. 16 sub-buffers of 1024 KB each.

>> When you enable your UST events (before starting the LTTng session), are you
>> enabling them in your custom channel (the one with configuration a. or b.)?

>> If not, it could explain the lack of difference in performance between your 2
>> configurations.

>> Example:

>> lttng create

>> lttng enable-channel -u --subbuf-size 4M channel7
>> lttng enable-event -u -c channel7 gydle_om:Allocator_constructor_default

>> lttng start

>> run-your-app

>> lttng stop
>> lttng view > trace.txt


>>> Moreover, I have been looking at the lttng-ust code to understand the
>>> performance impact of the implementation, and I am unable to relate it to
>>> what is documented at

>> > http://lttng.org/docs/#doc-channel-subbuf-size-vs-subbuf-count

>>> I see that channel_backend_init and _shm_object_table_alloc_shm seem to
>>> allocate one big shared memory chunk, which is then subdivided into
>>> sub-buffers, each referenced by its sub-buffer index.

>>> Furthermore, lib_ring_buffer_write seems to find the sub-buffer index and
>>> write "len" bytes into that sub-buffer.

>>> Could anyone please explain the overhead involved in switching from one
>>> sub-buffer to another, as documented?


>> The documentation indicates that the tracer's CPU overhead is caused by 2
>> things:

>> 1) marking the current sub-buffer as consumable, and
>> 2) switching to an empty sub-buffer.

>> > Regards,
>> > Vijay



-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 