[lttng-dev] snapshot issue with 2.5
Julien Desfossez
jdesfossez at efficios.com
Fri Nov 14 14:58:18 EST 2014
Indeed, it's a problem, would you mind filling a bug report on
https://bugs.lttng.org/projects/lttng-tools so we can keep track of this
issue ?
Thanks,
Julien
On 14-11-13 04:40 PM, Anders Wallin wrote:
> Looks like the patch creates a design bug.
> If you have one channel enabled the max snapshot size will be equal to
> one buffer times the number of channels (seems to be 8??)
> So max AND min size will always be the same and always much smaller then
> the content of the buffers
> The size argument passed to snapshot record is never used (or if it's to
> small it will complain)
>
> A Q&D workaround to get "all" data out is to create a 2nd non-used
> channel like this;
>
> : ${NO_OF_SUBBUF:=6}
> : ${SUBBUF_SIZE:=4096}
>
> SESSION=test
> CHANNEL1=ch1
> CHANNEL2=ch2
>
> CH2_SUBBUF_SIZE=$(($SUBBUF_SIZE*$NO_OF_SUBBUF))
> .....
>
> $LTTNG create $SESSION --snapshot
> $LTTNG enable-channel -u --subbuf-size $SUBBUF_SIZE --num-subbuf
> $NO_OF_SUBBUF --overwrite $CHANNEL1
> $LTTNG enable-channel -u --subbuf-size $CH2_SUBBUF_SIZE --num-subbuf 2
> --overwrite $CHANNEL2
> $LTTNG enable-event -u -c $CHANNEL1 "*"
> ....
>
> Then I found another issue about the content of the traces, but I will
> file this under another topic
>
>
> Anders Wallin
>
> On Thu, Nov 13, 2014 at 6:41 PM, Anders Wallin <wallinux at gmail.com
> <mailto:wallinux at gmail.com>> wrote:
>
> No, I get the same problem with or w/o setting the size
>
> Anders Wallin
>
> On Thu, Nov 13, 2014 at 5:45 PM, Julien Desfossez
> <jdesfossez at efficios.com <mailto:jdesfossez at efficios.com>> wrote:
>
> Ok, so if you don't limit the size of the snapshot (remove the
> -m 1G) it
> works as expected ?
>
> Julien
>
> On 14-11-13 10:33 AM, Anders Wallin wrote:
> > bug is introduced in 68808f4e0bbb3adccff72ff9dab6ec9f3a9e6866
> >
> > Regards
> >
> > Anders
> >
> > Anders Wallin
> >
> > On Thu, Nov 13, 2014 at 10:39 AM, Anders Wallin <wallinux at gmail.com <mailto:wallinux at gmail.com>
> > <mailto:wallinux at gmail.com <mailto:wallinux at gmail.com>>> wrote:
> >
> >
> >
> > Anders Wallin
> >
> > On Thu, Nov 13, 2014 at 8:43 AM, Anders Wallin <wallinux at gmail.com <mailto:wallinux at gmail.com>
> > <mailto:wallinux at gmail.com <mailto:wallinux at gmail.com>>> wrote:
> >
> > On Thu, Nov 13, 2014 at 12:11 AM, Julien Desfossez
> > <jdesfossez at efficios.com <mailto:jdesfossez at efficios.com>
> <mailto:jdesfossez at efficios.com
> <mailto:jdesfossez at efficios.com>>> wrote:
> >
> > Hi,
> >
> > On 14-11-12 05:30 PM, Jérémie Galarneau wrote:
> > > On Wed, Nov 12, 2014 at 7:39 AM, Anders Wallin
> > <wallinux at gmail.com <mailto:wallinux at gmail.com>
> <mailto:wallinux at gmail.com <mailto:wallinux at gmail.com>>> wrote:
> > >> Hi,
> > >>
> > >> using userspace snapshot mode in 2.5 seems to
> broken.
> > >>
> > >> I have small stupid application(tracetest)
> generating
> > 1000 events
> > >>
> > >> Running the script (the below is simplified
> version of
> > the script)
> > >>
> >
> ----------------------------------------------------------------------------------
> > >> #!/bin/bash
> > >> LTTNG='lttng -n'
> > >> lttng-sessiond --no-kernel &
> > >
> > > You should use the "-d" flag to launch the
> sessiond as a
> > daemon. Using
> > > "&" might cause the "create" command to launch a
> second
> > session daemon
> > > since the first one would not have had the time
> to open
> > create its
> > > command sockets.
> > >
> > >>
> > >> $LTTNG create test --snapshot
> > >> $LTTNG enable-channel -u --subbuf-size 64k
> --num-subbuf
> > 16 --overwrite ch1
> > >> $LTTNG enable-event -u -c ch1 "*"
> > >> $LTTNG start
> > >> $LTTNG list ch1
> > >>
> > >> mkdir -p snapshot/
> > >>
> > >> for i in $(seq 1 $LOOPS); do
> > >> ./tracetest
> > >> $LTTNG snapshot record -m 1G
> > >>
> > >> # print no of events and first and last event
> > >> last=$(ls -1drt
> $HOME/lttng-traces/$SESSION*/* | tail -1)
> > >> babeltrace $last > snapshot/$i.bt
> <http://i.bt> <http://i.bt>
> > >> cat snapshot/$i.bt <http://i.bt>
> <http://i.bt> | wc -l
> > >> cat snapshot/$i.bt <http://i.bt>
> <http://i.bt> | head -1
> > >> cat snapshot/$i.bt <http://i.bt>
> <http://i.bt> | tail -1
> > >> done
> > >>
> > >> $LTTNG stop
> > >> $LTTNG destroy $SESSION
> > >>
> >
> ----------------------------------------------------------------------------------
> > >>
> > >> The result of the run looks like this;
> > >> ./test.run
> > >> Error: consumer err socket second poll error
> > >> Error: Health error occurred in
> thread_manage_consumer
> > >> PERROR - 13:36:59.685167 [23772/23781]: bind inet:
> > Address already in use
> > >> (in lttcomm_bind_inet_sock() at inet.c:109)
> > >> Warning: Another session daemon is using this
> JUL port.
> > JUL support will be
> > >> deactivated to prevent interfering with the
> tracing.
> > >>
> > >
> > > Is there a reason why you run two session
> daemons? I'm
> > guessing this
> > > is related to my "& vs -d" comment above.
> > >
> > >> lttng (LTTng Trace Control) 2.5.2 - Fumisterie
> > >> BabelTrace Trace Viewer and Converter 1.2.4
> > >> linux-vdso.so.1 => (0x00007fffba966000)
> > >> liblttng-ctl.so.0 =>
> > /usr/lib/x86_64-linux-gnu/liblttng-ctl.so.0
> > >> (0x00007fbfe4b1d000)
> > >> librt.so.1 =>
> /lib/x86_64-linux-gnu/librt.so.1
> > (0x00007fbfe4915000)
> > >> libxml2.so.2 =>
> > /usr/lib/x86_64-linux-gnu/libxml2.so.2
> > >> (0x00007fbfe45ae000)
> > >> libpopt.so.0 =>
> /lib/x86_64-linux-gnu/libpopt.so.0
> > >> (0x00007fbfe43a2000)
> > >> libpthread.so.0 =>
> > /lib/x86_64-linux-gnu/libpthread.so.0
> > >> (0x00007fbfe4184000)
> > >> libc.so.6 =>
> /lib/x86_64-linux-gnu/libc.so.6
> > (0x00007fbfe3dbd000)
> > >> /lib64/ld-linux-x86-64.so.2
> (0x00007fbfe4fac000)
> > >> liburcu.so.2 =>
> > /usr/lib/x86_64-linux-gnu/liburcu.so.2
> > >> (0x00007fbfe3bb6000)
> > >> libdl.so.2 =>
> /lib/x86_64-linux-gnu/libdl.so.2
> > (0x00007fbfe39b2000)
> > >> libz.so.1 =>
> /lib/x86_64-linux-gnu/libz.so.1
> > (0x00007fbfe3798000)
> > >> liblzma.so.5 =>
> /lib/x86_64-linux-gnu/liblzma.so.5
> > >> (0x00007fbfe3576000)
> > >> libm.so.6 =>
> /lib/x86_64-linux-gnu/libm.so.6
> > (0x00007fbfe3270000)
> > >>
> > >> Session test created.
> > >> Default snapshot output set to:
> > >> /home/awallin/lttng-traces/test-20141112-133700
> > >> Snapshot mode set. Every channel enabled for
> that session
> > will be set in
> > >> overwrite mode and mmap output.
> > >> UST channel ch1 enabled for session test
> > >> UST event * created in channel ch1
> > >> Tracing started for session test
> > >> Tracing session test: [active snapshot]
> > >> Trace path:
> > >> Live timer interval (usec): 4294967295
> > >>
> > >> === Domain: UST global ===
> > >>
> > >> Channels:
> > >> -------------
> > >> - ch1: [enabled]
> > >>
> > >> Attributes:
> > >> overwrite mode: 1
> > >> subbufers size: 4096
> > >> number of subbufers: 16
> > >> switch timer interval: 0
> > >> read timer interval: 0
> > >> trace file count: 0
> > >> trace file size (bytes): 0
> > >> output: mmap()
> > >>
> > >> Events:
> > >> * (type: tracepoint) [enabled]
> > >>
> > >> Snapshot recorded successfully for session test
> > >> 84
> > >> [13:37:00.852615950] (+?.?????????)
> arn-awallin-mint-l3
> > >> ust_baddr_statedump:soinfo: { cpu_id = 4 }, {
> baddr =
> > 0x7FFFC6BC7000, sopath
> > >> = "[vdso]", size = 0, mtime = -1 }
> > >> [13:37:00.853798949] (+0.000000805)
> arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 6 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> > >> 157
> > >> [13:37:00.853735195] (+?.?????????)
> arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 6 }, { my_string_field = "test",
> > my_integer_field = 927 }
> > >> [13:37:00.894839116] (+0.000000299)
> arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 5 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> > >> 168
> > >> [13:37:00.853735195] (+?.?????????)
> arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 6 }, { my_string_field = "test",
> > my_integer_field = 927 }
> > >> [13:37:00.929447330] (+0.000000305)
> arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 5 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> >
> > Something is wrong from here:
> > >> 168
> > >> [13:37:00.894411117] (+?.?????????)
> arn-awallin-mint-l3
> > >> ust_baddr_statedump:soinfo: { cpu_id = 4 }, {
> baddr = 0x7FFF6BF81000, sopath
> > >> = "[vdso]", size = 0, mtime = -1 }
> > >> [13:37:00.972035546] (+0.000000597)
> arn-awallin-mint-l3 tracetest:first_tp:
> > >> { cpu_id = 6 }, { my_string_field = "test",
> my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> > >> 168
> > >> [13:37:00.894411117] (+?.?????????)
> arn-awallin-mint-l3
> > >> ust_baddr_statedump:soinfo: { cpu_id = 4 }, {
> baddr = 0x7FFF6BF81000, sopath
> > >> = "[vdso]", size = 0, mtime = -1 }
> > >> [13:37:01.014060940 <tel:014060940>
> <tel:014060940 <tel:014060940>>] (+0.000000341)
> > arn-awallin-mint-l3 tracetest:first_tp:
> > >> { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> > >> 168
> > >> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3
> > >> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath
> > >> = "[vdso]", size = 0, mtime = -1 }
> > >> [13:37:01.052555877 <tel:052555877>
> <tel:052555877 <tel:052555877>>] (+0.000000872)
> > arn-awallin-mint-l3 tracetest:first_tp:
> > >> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > ... to here.
> >
> > The beginning timestamp is the same, the number of events is
> > the same,
> > but the end timestamp moves. We see also this pattern below.
> > I would be really interested to see the whole snapshots when
> > it happens.
> > It seems like we are not be overwriting the beginning of the
> > ring buffer
> > in some conditions. Could you send us your tracetest program
> > and the
> > script so we can try to reproduce the problem ?
> >
> >
> > I will append it including a run with debug turned on and all
> > the snapshot files created
> >
> >
> >
> > >> Snapshot recorded successfully for session test
> > >> 168
> > >> [13:37:01.014035993 <tel:014035993>
> <tel:014035993 <tel:014035993>>] (+?.?????????)
> > arn-awallin-mint-l3 tracetest:first_tp:
> > >> { cpu_id = 7 }, { my_string_field = "test",
> > my_integer_field = 927 }
> > >> [13:37:01.098620808] (+0.000001404) arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 4 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> > >> 168
> > >> [13:37:01.097087388 <tel:097087388>
> <tel:097087388 <tel:097087388>>] (+?.?????????)
> > arn-awallin-mint-l3
> > >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr =
> > 0x7FFF4BFFE000, sopath
> > >> = "[vdso]", size = 0, mtime = -1 }
> > >> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 7 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> > >> 168
> > >> [13:37:01.097087388 <tel:097087388>
> <tel:097087388 <tel:097087388>>] (+?.?????????)
> > arn-awallin-mint-l3
> > >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr =
> > 0x7FFF4BFFE000, sopath
> > >> = "[vdso]", size = 0, mtime = -1 }
> > >> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 7 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> > >> 168
> > >> [13:37:01.097087388 <tel:097087388>
> <tel:097087388 <tel:097087388>>] (+?.?????????)
> > arn-awallin-mint-l3
> > >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, {
> baddr =
> > 0x7FFF4BFFE000, sopath
> > >> = "[vdso]", size = 0, mtime = -1 }
> > >> [13:37:01.172829105] (+0.000000575)
> arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 5 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > >> Snapshot recorded successfully for session test
> > >> 168
> > >> [13:37:01.172021557] (+?.?????????)
> arn-awallin-mint-l3
> > >> ust_baddr_statedump:soinfo: { cpu_id = 7 }, {
> baddr =
> > 0x7FFF653FE000, sopath
> > >> = "[vdso]", size = 0, mtime = -1 }
> > >> [13:37:01.220074165] (+0.000000553)
> arn-awallin-mint-l3
> > tracetest:first_tp:
> > >> { cpu_id = 6 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > >> Waiting for data availability
> > >> Tracing stopped for session test
> > >> Session test destroyed
> > >>
> > >> the result from the run on older versions of lttng
> > creates bigger and bigger
> > >> event traces until reaching max size.
> > We see this pattern here and the max size seems to
> be 168
> > events, was it
> > similar in your previous tests ?
> >
> > If I'm running the same test on 2.4 or 2.3 the max
> number of
> > events is much bigger, like ~10000
> > (I don't have the exact number right now, I will
> rebuild and run
> > the test again and get back
> > with traces from that run)
> >
> > For 2.4.2 the log looks like this:
> > ./test.run ./tracetest
> >
> > lttng (LTTng Trace Control) 2.4.2 - Époque Opaque
> > BabelTrace Trace Viewer and Converter 1.2.1
> > linux-vdso.so.1 => (0x00007fff775fe000)
> > liblttng-ctl.so.0 => /usr/local/lib/liblttng-ctl.so.0
> > (0x00002b2b016e4000)
> > libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0
> (0x00002b2b0190f000)
> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> > (0x00002b2b01b1b000)
> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
> (0x00002b2b01d39000)
> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
> (0x00002b2b020ff000)
> > liburcu.so.3 => /usr/local/lib/liburcu.so.3
> (0x00002b2b02307000)
> > /lib64/ld-linux-x86-64.so.2 (0x00002b2b014bf000)
> >
> > Session test created.
> > Default snapshot output set to:
> //lttng-traces/test-20141113-093611
> > Snapshot mode set. Every channel enabled for that session
> will be
> > set in overwrite mode and mmap output.
> > UST channel ch1 enabled for session test
> > UST event * created in channel ch1
> > Tracing started for session test
> > Tracing session test: [active snapshot]
> > Trace path:
> >
> > === Domain: UST global ===
> >
> > Buffer type: per UID
> >
> > Channels:
> > -------------
> > - ch1: [enabled]
> >
> > Attributes:
> > overwrite mode: 1
> > subbufers size: 4096
> > number of subbufers: 16
> > switch timer interval: 0
> > read timer interval: 0
> > output: mmap()
> >
> > Events:
> > * (type: tracepoint) [enabled]
> >
> > Snapshot recorded successfully for session test
> > 1000
> > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:11.985857254] (+0.000000501) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 2000
> > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:12.033066130 <tel:033066130> <tel:033066130
> <tel:033066130>>] (+0.000001663) 07f8cf84a38f
> > tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test",
> > my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 3000
> > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
> > [09:36:12.086888121 <tel:086888121> <tel:086888121
> <tel:086888121>>] (+0.000001273) 07f8cf84a38f
> > tracetest:first_tp: { cpu_id = 5 }, { my_string_field =
> "test",
> > my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 4000
> > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:12.147378251] (+0.000001257) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 5000
> > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:12.225584696] (+0.000000528) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 6000
> > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:12.288311172] (+0.000000571) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 7 }, { my_string_field = "test",
> my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 7000
> > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:12.375789307] (+0.000001238) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 7000
> > [09:36:12.032314084] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 5 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:12.460756779] (+0.000001248) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 4 }, { my_string_field = "test",
> my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 8000
> > [09:36:12.032314084] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 5 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:12.540464090] (+0.000001292) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 7 }, { my_string_field = "test",
> my_integer_field = 999 }
> > Snapshot recorded successfully for session test
> > 9000
> > [09:36:12.032314084] (+?.?????????) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 5 }, { my_string_field = "test",
> my_integer_field = 0 }
> > [09:36:12.621992931] (+0.000001298) 07f8cf84a38f
> tracetest:first_tp:
> > { cpu_id = 5 }, { my_string_field = "test",
> my_integer_field = 999 }
> > Waiting for data availability
> > Tracing stopped for session test
> > Session test destroyed
> >
> >
> >
> > >
> > > Unless I'm mistaken, the behavior of the
> snapshot mode has changed
> > > between 2.4 and 2.5.
> > >
> > > In the 2.4 series, consecutive snapshots could
> overlap; that is,
> > > events could be present in multiple snapshots.
> As of 2.5, an event is
> > > guaranteed to only appear in one snapshot as the
> events are considered
> > > as having been "consumed" at that point.
> >
> > Hum, no the behavior is still the same, consecutive
> > snapshots can overlap.
> >
> > Could you tell us on which architecture you are
> experiencing
> > the problem
> > and the number of CPUs ?
> >
> > The traces appended is from my laptop running Ubuntu
> 14.04 with
> > "8" cpu's
> > model name : Intel(R) Core(TM) i7-2760QM CPU @
> 2.40GHz
> > I have also run this on ARM with 16 cores and a powerpc
> >
> >
> > Thanks,
> >
> > Julien
> >
> >
> >
> >
>
>
>
More information about the lttng-dev
mailing list