[lttng-dev] snapshot data corrupted after the output buffer is filled

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Mon Nov 17 06:45:14 EST 2014


Hi Anders, 

Yes, please file a feature request against Babeltrace on 
the bug tracker pointing the this email thread and 
describing the feature. 

Thanks! 

Mathieu 

----- Original Message -----

> From: "Anders Wallin" <wallinux at gmail.com>
> To: "Mathieu Desnoyers" <mathieu.desnoyers at efficios.com>
> Cc: lttng-dev at lists.lttng.org
> Sent: Monday, November 17, 2014 10:45:37 AM
> Subject: Re: [lttng-dev] snapshot data corrupted after the output buffer is
> filled

> Thx Mathieu,

> if I'm running my test on one cpu (taskset -c 1 ./test.run) it works as
> expected and described by you.
> The babeltrace flag would be a good solution! Do you want me to file a
> bug/enhancement on this?

> Regards

> Anders Wallin

> On Sat, Nov 15, 2014 at 6:59 PM, Mathieu Desnoyers <
> mathieu.desnoyers at efficios.com > wrote:

> > > From: "Anders Wallin" < wallinux at gmail.com >
> > 
> 
> > > To: lttng-dev at lists.lttng.org
> > 
> 
> > > Sent: Friday, November 14, 2014 8:28:18 AM
> > 
> 
> > > Subject: [lttng-dev] snapshot data corrupted after the output buffer is
> > > filled
> > 
> 

> > > Hi,
> > 
> 

> > > it looks like snapshot events became faulty after the output max size is
> > > reached.
> > 
> 
> > > The next snapshot get mixed events from different buffers. I have tested
> > > it
> > > on 2.3, 2.4 and 2.5
> > 
> 
> > > and it's in all of them, but it hits earlier in 2.5 due to the problem
> > > reported in
> > 
> 
> > > http://lists.lttng.org/pipermail/lttng-dev/2014-November/023773.html
> > 
> 

> > > There should never be more then 2 HUNKS in a diff between 2 consecutive
> > > snapshots, here I got
> > 
> 
> > > 11 as the worst case.
> > 
> 

> > Hi Anders,
> 

> > This is not a bug, rather just per-cpu buffering at work.It does have some
> 
> > rather non-obvious consequences in terms of snapshot event flow.
> 

> > To understand this, we need to keep in mind that LTTng does everything
> 
> > in per-cpu buffers, and that overwrite of oldest events in buffer full
> > conditions
> 
> > happens per-cpu.
> 

> > The use-case you show below really makes sense when you start to consider
> 
> > on which CPU (and thus per-cpu buffer) the application is running.
> > Initially,
> 
> > the first app runs on CPU 6. As long as the following applications run on
> 
> > other cpus, we keep the same first event. Then when enough events are
> 
> > written in cpu 6 buffer to overwrite the first event, we see a first event
> > change.
> 
> > Then, depending on the scheduling of applications, older parts of per-cpu
> 
> > buffers are overwritten, and others are not (if there are less events in
> > those
> 
> > buffer). This means some per-cpu buffers will have events going further
> > back
> 
> > in time compared to other higher-throughput CPUs.
> 

> > This is why you see more than 2 hunks in your diff. Babeltrace merges those
> 
> > per-cpu buffers back into a single text output, and the fact that some
> > events
> 
> > have been overwritten for a time-range for high-throughput buffers, but not
> 
> > for low-throughput buffers, make it appear as if events are missing in the
> 
> > stream if your model is a single system-wide buffer, but the missing events
> 
> > make sense if you see it as a merge of per-cpu buffers.
> 

> > Our current behavior is to gather all available data when taking a
> > snapshot,
> 
> > and babeltrace behavior is to show all events available in the streams. We
> 
> > could perhaps consider a different babeltrace behavior (a flag) that would
> 
> > allow it to only show the events for the intersection of time range of all
> 
> > streams, based on the packet header timestamp begin/end. This would
> 
> > take care of hiding events that happen in low-throughput streams when
> 
> > there is no intersection with snapshotted high-throughput streams.
> 

> > And perhaps this should be the default ?
> 

> > Thanks!
> 

> > Mathieu
> 

> > > TESTCASE;
> > 
> 

> > > tracetest code:
> > 
> 
> > > ---------------------------------------------------------------
> > 
> 
> > > #include <stdio.h>
> > 
> 
> > > #include "tracetest-tp.h"
> > 
> 

> > > int main(int argc, char* argv[])
> > 
> 
> > > {
> > 
> 
> > > int i;
> > 
> 
> > > for (i = 0; i < 1000; i++) {
> > 
> 
> > > tracepoint(tracetest, first_tp, i, "test");
> > 
> 
> > > }
> > 
> 
> > > return 0;
> > 
> 
> > > }
> > 
> 
> > > ---------------------------------------------------------------
> > 
> 

> > > test script:
> > 
> 
> > > ---------------------------------------------------------------
> > 
> 
> > > #!/bin/bash
> > 
> 
> > > : ${NO_OF_SUBBUF:=6}
> > 
> 
> > > : ${SUBBUF_SIZE:=4096}
> > 
> 
> > > : ${KERNEL_TRACE:=NO}
> > 
> 
> > > : ${LOOPS:=10}
> > 
> 

> > > SESSION=test
> > 
> 
> > > CHANNEL1=ch1
> > 
> 
> > > CHANNEL2=ch2
> > 
> 

> > > CH2_SUBBUF_SIZE=$(($SUBBUF_SIZE*$NO_OF_SUBBUF))
> > 
> 

> > > LTTNG='lttng -n'
> > 
> 
> > > killall lttng-consumerd > /dev/null 2>&1
> > 
> 
> > > killall lttng-sessiond > /dev/null 2>&1
> > 
> 
> > > if [ -z $DEBUG ]; then
> > 
> 
> > > lttng-sessiond --no-kernel -d
> > 
> 
> > > else
> > 
> 
> > > lttng-sessiond -v --no-kernel --verbose-consumer -b
> > 
> 
> > > fi
> > 
> 
> > > sleep 1
> > 
> 

> > > if [ ! -z $DEBUG ]; then
> > 
> 
> > > export LTTNG_UST_DEBUG=1
> > 
> 
> > > fi
> > 
> 

> > > rm -rf $HOME/lttng-traces
> > 
> 
> > > rm snapshot/*.bt
> > 
> 

> > > $LTTNG --version
> > 
> 

> > > $LTTNG create $SESSION --snapshot
> > 
> 
> > > $LTTNG enable-channel -u --subbuf-size $SUBBUF_SIZE --num-subbuf
> > > $NO_OF_SUBBUF --overwrite $CHANNEL1
> > 
> 
> > > $LTTNG enable-channel -u --subbuf-size $CH2_SUBBUF_SIZE --num-subbuf 2
> > > --overwrite $CHANNEL2
> > 
> 
> > > $LTTNG enable-event -u -c $CHANNEL1 "*"
> > 
> 
> > > $LTTNG start
> > 
> 
> > > $LTTNG list $SESSION
> > 
> 

> > > mkdir -p snapshot/
> > 
> 

> > > h=1
> > 
> 
> > > for i in $(seq 1 $LOOPS); do
> > 
> 
> > > $APP
> > 
> 
> > > $LTTNG snapshot record
> > 
> 
> > > # print no of events and first and last event
> > 
> 
> > > last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1)
> > 
> 
> > > babeltrace $last > snapshot/$ i.bt
> > 
> 
> > > cat snapshot/$ i.bt | wc -l
> > 
> 
> > > cat snapshot/$ i.bt | head -1
> > 
> 
> > > cat snapshot/$ i.bt | tail -1
> > 
> 
> > > diff -u snapshot/$ i.bt snapshot/$ h.bt > snapshot/$i_$h.diff
> > 
> 
> > > cat snapshot/$i_$h.diff | grep "@@" | wc -l | xargs echo "NUMBER OF DIFF
> > > HUNKS = "
> > 
> 
> > > h=$i
> > 
> 
> > > done
> > 
> 

> > > $LTTNG stop
> > 
> 
> > > $LTTNG destroy $SESSION
> > 
> 

> > > ---------------------------------------------------------------
> > 
> 

> > > result from run (removed some output lines for readability)
> > 
> 
> > > ---------------------------------------------------------------
> > 
> 
> > > lttng (LTTng Trace Control) 2.4.2 - Époque Opaque
> > 
> 
> > > Snapshot mode set. Every channel enabled for that session will be set in
> > > overwrite mode and mmap output.
> > 
> 
> > > Warning: The number of subbuffers (6) is rounded to the next power of 2
> > > (8)
> > 
> 
> > > UST channel ch1 enabled for session test
> > 
> 
> > > Warning: The subbuf size (24576) is rounded to the next power of 2
> > > (32768)
> > 
> 
> > > UST channel ch2 enabled for session test
> > 
> 
> > > UST event * created in channel ch1
> > 
> 
> > > Tracing started for session test
> > 
> 
> > > Tracing session test: [active snapshot]
> > 
> 
> > > Trace path:
> > 
> 

> > > === Domain: UST global ===
> > 
> 

> > > Buffer type: per UID
> > 
> 

> > > Channels:
> > 
> 
> > > -------------
> > 
> 
> > > - ch1: [enabled]
> > 
> 

> > > Attributes:
> > 
> 
> > > overwrite mode: 1
> > 
> 
> > > subbufers size: 4096
> > 
> 
> > > number of subbufers: 8
> > 
> 
> > > switch timer interval: 0
> > 
> 
> > > read timer interval: 0
> > 
> 
> > > output: mmap()
> > 
> 

> > > Events:
> > 
> 
> > > * (type: tracepoint) [enabled]
> > 
> 
> > > - ch2: [enabled]
> > 
> 

> > > Attributes:
> > 
> 
> > > overwrite mode: 1
> > 
> 
> > > subbufers size: 32768
> > 
> 
> > > number of subbufers: 2
> > 
> 
> > > switch timer interval: 0
> > 
> 
> > > read timer interval: 0
> > 
> 
> > > output: mmap()
> > 
> 

> > > Events:
> > 
> 
> > > None
> > 
> 

> > > Snapshot recorded successfully for session test
> > 
> 
> > > 1011
> > 
> 
> > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:43.279975100] (+0.000000975) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 0
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 2022
> > 
> 
> > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:43.337703508] (+0.000001238) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 4 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 1
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 3033
> > 
> 
> > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:43.432827051] (+0.000000308) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 1
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 4044
> > 
> 
> > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:43.475784825] (+0.000000501) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 1
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 4735
> > 
> 
> > > [07:09:43.279031320] (+?.?????????) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 5 }, { my_string_field = "test", my_integer_field = 0 }
> > 
> 
> > > [07:09:43.549945106] (+0.000000517) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 12
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 4437
> > 
> 
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:43.654229030] (+0.000000436) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 3
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 4746
> > 
> 
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:43.747509349] (+0.000000502) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 11
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 5757
> > 
> 
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:43.814938128] (+0.000001371) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 4 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 1
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 6066
> > 
> 
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:43.919112586] (+0.000000990) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 11
> > 
> 
> > > Snapshot recorded successfully for session test
> > 
> 
> > > 6077
> > 
> 
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> > 
> 
> > > [07:09:44.004234933] (+0.000001213) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > 
> 
> > > NUMBER OF DIFF HUNKS = 2
> > 
> 
> > > Waiting for data availability
> > 
> 
> > > Tracing stopped for session test
> > 
> 
> > > Session test destroyed
> > 
> 
> > > ---------------------------------------------------------------
> > 
> 

> > > Regards
> > 
> 

> > > Anders Wallin
> > 
> 

> > > _______________________________________________
> > 
> 
> > > lttng-dev mailing list
> > 
> 
> > > lttng-dev at lists.lttng.org
> > 
> 
> > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> > 
> 

> > --
> 
> > Mathieu Desnoyers
> 
> > EfficiOS Inc.
> 
> > http://www.efficios.com
> 

-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20141117/400eb655/attachment-0001.html>


More information about the lttng-dev mailing list