[lttng-dev] snapshot data corrupted after the output buffer is filled
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Mon Nov 17 06:45:14 EST 2014
Hi Anders,
Yes, please file a feature request against Babeltrace on
the bug tracker pointing the this email thread and
describing the feature.
Thanks!
Mathieu
----- Original Message -----
> From: "Anders Wallin" <wallinux at gmail.com>
> To: "Mathieu Desnoyers" <mathieu.desnoyers at efficios.com>
> Cc: lttng-dev at lists.lttng.org
> Sent: Monday, November 17, 2014 10:45:37 AM
> Subject: Re: [lttng-dev] snapshot data corrupted after the output buffer is
> filled
> Thx Mathieu,
> if I'm running my test on one cpu (taskset -c 1 ./test.run) it works as
> expected and described by you.
> The babeltrace flag would be a good solution! Do you want me to file a
> bug/enhancement on this?
> Regards
> Anders Wallin
> On Sat, Nov 15, 2014 at 6:59 PM, Mathieu Desnoyers <
> mathieu.desnoyers at efficios.com > wrote:
> > > From: "Anders Wallin" < wallinux at gmail.com >
> >
>
> > > To: lttng-dev at lists.lttng.org
> >
>
> > > Sent: Friday, November 14, 2014 8:28:18 AM
> >
>
> > > Subject: [lttng-dev] snapshot data corrupted after the output buffer is
> > > filled
> >
>
> > > Hi,
> >
>
> > > it looks like snapshot events became faulty after the output max size is
> > > reached.
> >
>
> > > The next snapshot get mixed events from different buffers. I have tested
> > > it
> > > on 2.3, 2.4 and 2.5
> >
>
> > > and it's in all of them, but it hits earlier in 2.5 due to the problem
> > > reported in
> >
>
> > > http://lists.lttng.org/pipermail/lttng-dev/2014-November/023773.html
> >
>
> > > There should never be more then 2 HUNKS in a diff between 2 consecutive
> > > snapshots, here I got
> >
>
> > > 11 as the worst case.
> >
>
> > Hi Anders,
>
> > This is not a bug, rather just per-cpu buffering at work.It does have some
>
> > rather non-obvious consequences in terms of snapshot event flow.
>
> > To understand this, we need to keep in mind that LTTng does everything
>
> > in per-cpu buffers, and that overwrite of oldest events in buffer full
> > conditions
>
> > happens per-cpu.
>
> > The use-case you show below really makes sense when you start to consider
>
> > on which CPU (and thus per-cpu buffer) the application is running.
> > Initially,
>
> > the first app runs on CPU 6. As long as the following applications run on
>
> > other cpus, we keep the same first event. Then when enough events are
>
> > written in cpu 6 buffer to overwrite the first event, we see a first event
> > change.
>
> > Then, depending on the scheduling of applications, older parts of per-cpu
>
> > buffers are overwritten, and others are not (if there are less events in
> > those
>
> > buffer). This means some per-cpu buffers will have events going further
> > back
>
> > in time compared to other higher-throughput CPUs.
>
> > This is why you see more than 2 hunks in your diff. Babeltrace merges those
>
> > per-cpu buffers back into a single text output, and the fact that some
> > events
>
> > have been overwritten for a time-range for high-throughput buffers, but not
>
> > for low-throughput buffers, make it appear as if events are missing in the
>
> > stream if your model is a single system-wide buffer, but the missing events
>
> > make sense if you see it as a merge of per-cpu buffers.
>
> > Our current behavior is to gather all available data when taking a
> > snapshot,
>
> > and babeltrace behavior is to show all events available in the streams. We
>
> > could perhaps consider a different babeltrace behavior (a flag) that would
>
> > allow it to only show the events for the intersection of time range of all
>
> > streams, based on the packet header timestamp begin/end. This would
>
> > take care of hiding events that happen in low-throughput streams when
>
> > there is no intersection with snapshotted high-throughput streams.
>
> > And perhaps this should be the default ?
>
> > Thanks!
>
> > Mathieu
>
> > > TESTCASE;
> >
>
> > > tracetest code:
> >
>
> > > ---------------------------------------------------------------
> >
>
> > > #include <stdio.h>
> >
>
> > > #include "tracetest-tp.h"
> >
>
> > > int main(int argc, char* argv[])
> >
>
> > > {
> >
>
> > > int i;
> >
>
> > > for (i = 0; i < 1000; i++) {
> >
>
> > > tracepoint(tracetest, first_tp, i, "test");
> >
>
> > > }
> >
>
> > > return 0;
> >
>
> > > }
> >
>
> > > ---------------------------------------------------------------
> >
>
> > > test script:
> >
>
> > > ---------------------------------------------------------------
> >
>
> > > #!/bin/bash
> >
>
> > > : ${NO_OF_SUBBUF:=6}
> >
>
> > > : ${SUBBUF_SIZE:=4096}
> >
>
> > > : ${KERNEL_TRACE:=NO}
> >
>
> > > : ${LOOPS:=10}
> >
>
> > > SESSION=test
> >
>
> > > CHANNEL1=ch1
> >
>
> > > CHANNEL2=ch2
> >
>
> > > CH2_SUBBUF_SIZE=$(($SUBBUF_SIZE*$NO_OF_SUBBUF))
> >
>
> > > LTTNG='lttng -n'
> >
>
> > > killall lttng-consumerd > /dev/null 2>&1
> >
>
> > > killall lttng-sessiond > /dev/null 2>&1
> >
>
> > > if [ -z $DEBUG ]; then
> >
>
> > > lttng-sessiond --no-kernel -d
> >
>
> > > else
> >
>
> > > lttng-sessiond -v --no-kernel --verbose-consumer -b
> >
>
> > > fi
> >
>
> > > sleep 1
> >
>
> > > if [ ! -z $DEBUG ]; then
> >
>
> > > export LTTNG_UST_DEBUG=1
> >
>
> > > fi
> >
>
> > > rm -rf $HOME/lttng-traces
> >
>
> > > rm snapshot/*.bt
> >
>
> > > $LTTNG --version
> >
>
> > > $LTTNG create $SESSION --snapshot
> >
>
> > > $LTTNG enable-channel -u --subbuf-size $SUBBUF_SIZE --num-subbuf
> > > $NO_OF_SUBBUF --overwrite $CHANNEL1
> >
>
> > > $LTTNG enable-channel -u --subbuf-size $CH2_SUBBUF_SIZE --num-subbuf 2
> > > --overwrite $CHANNEL2
> >
>
> > > $LTTNG enable-event -u -c $CHANNEL1 "*"
> >
>
> > > $LTTNG start
> >
>
> > > $LTTNG list $SESSION
> >
>
> > > mkdir -p snapshot/
> >
>
> > > h=1
> >
>
> > > for i in $(seq 1 $LOOPS); do
> >
>
> > > $APP
> >
>
> > > $LTTNG snapshot record
> >
>
> > > # print no of events and first and last event
> >
>
> > > last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1)
> >
>
> > > babeltrace $last > snapshot/$ i.bt
> >
>
> > > cat snapshot/$ i.bt | wc -l
> >
>
> > > cat snapshot/$ i.bt | head -1
> >
>
> > > cat snapshot/$ i.bt | tail -1
> >
>
> > > diff -u snapshot/$ i.bt snapshot/$ h.bt > snapshot/$i_$h.diff
> >
>
> > > cat snapshot/$i_$h.diff | grep "@@" | wc -l | xargs echo "NUMBER OF DIFF
> > > HUNKS = "
> >
>
> > > h=$i
> >
>
> > > done
> >
>
> > > $LTTNG stop
> >
>
> > > $LTTNG destroy $SESSION
> >
>
> > > ---------------------------------------------------------------
> >
>
> > > result from run (removed some output lines for readability)
> >
>
> > > ---------------------------------------------------------------
> >
>
> > > lttng (LTTng Trace Control) 2.4.2 - Époque Opaque
> >
>
> > > Snapshot mode set. Every channel enabled for that session will be set in
> > > overwrite mode and mmap output.
> >
>
> > > Warning: The number of subbuffers (6) is rounded to the next power of 2
> > > (8)
> >
>
> > > UST channel ch1 enabled for session test
> >
>
> > > Warning: The subbuf size (24576) is rounded to the next power of 2
> > > (32768)
> >
>
> > > UST channel ch2 enabled for session test
> >
>
> > > UST event * created in channel ch1
> >
>
> > > Tracing started for session test
> >
>
> > > Tracing session test: [active snapshot]
> >
>
> > > Trace path:
> >
>
> > > === Domain: UST global ===
> >
>
> > > Buffer type: per UID
> >
>
> > > Channels:
> >
>
> > > -------------
> >
>
> > > - ch1: [enabled]
> >
>
> > > Attributes:
> >
>
> > > overwrite mode: 1
> >
>
> > > subbufers size: 4096
> >
>
> > > number of subbufers: 8
> >
>
> > > switch timer interval: 0
> >
>
> > > read timer interval: 0
> >
>
> > > output: mmap()
> >
>
> > > Events:
> >
>
> > > * (type: tracepoint) [enabled]
> >
>
> > > - ch2: [enabled]
> >
>
> > > Attributes:
> >
>
> > > overwrite mode: 1
> >
>
> > > subbufers size: 32768
> >
>
> > > number of subbufers: 2
> >
>
> > > switch timer interval: 0
> >
>
> > > read timer interval: 0
> >
>
> > > output: mmap()
> >
>
> > > Events:
> >
>
> > > None
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 1011
> >
>
> > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:43.279975100] (+0.000000975) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 5 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 0
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 2022
> >
>
> > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:43.337703508] (+0.000001238) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 4 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 1
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 3033
> >
>
> > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:43.432827051] (+0.000000308) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 1
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 4044
> >
>
> > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:43.475784825] (+0.000000501) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 5 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 1
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 4735
> >
>
> > > [07:09:43.279031320] (+?.?????????) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 5 }, { my_string_field = "test", my_integer_field = 0 }
> >
>
> > > [07:09:43.549945106] (+0.000000517) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 12
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 4437
> >
>
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:43.654229030] (+0.000000436) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 5 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 3
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 4746
> >
>
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:43.747509349] (+0.000000502) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 11
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 5757
> >
>
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:43.814938128] (+0.000001371) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 4 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 1
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 6066
> >
>
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:43.919112586] (+0.000000990) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 11
> >
>
> > > Snapshot recorded successfully for session test
> >
>
> > > 6077
> >
>
> > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f
> > > ust_baddr_statedump:soinfo:
> > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0,
> > > mtime
> > > = -1 }
> >
>
> > > [07:09:44.004234933] (+0.000001213) 07f8cf84a38f tracetest:first_tp: {
> > > cpu_id
> > > = 6 }, { my_string_field = "test", my_integer_field = 999 }
> >
>
> > > NUMBER OF DIFF HUNKS = 2
> >
>
> > > Waiting for data availability
> >
>
> > > Tracing stopped for session test
> >
>
> > > Session test destroyed
> >
>
> > > ---------------------------------------------------------------
> >
>
> > > Regards
> >
>
> > > Anders Wallin
> >
>
> > > _______________________________________________
> >
>
> > > lttng-dev mailing list
> >
>
> > > lttng-dev at lists.lttng.org
> >
>
> > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> >
>
> > --
>
> > Mathieu Desnoyers
>
> > EfficiOS Inc.
>
> > http://www.efficios.com
>
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20141117/400eb655/attachment-0001.html>
More information about the lttng-dev
mailing list