[lttng-dev] snapshot issue with 2.5

Julien Desfossez jdesfossez at efficios.com
Thu Nov 13 11:45:28 EST 2014


Ok, so if you don't limit the size of the snapshot (remove the -m 1G) it
works as expected ?

Julien

On 14-11-13 10:33 AM, Anders Wallin wrote:
> bug is introduced in  68808f4e0bbb3adccff72ff9dab6ec9f3a9e6866
> 
> Regards
> 
> Anders
> 
> Anders Wallin
> 
> On Thu, Nov 13, 2014 at 10:39 AM, Anders Wallin <wallinux at gmail.com
> <mailto:wallinux at gmail.com>> wrote:
> 
> 
> 
>     Anders Wallin
> 
>     On Thu, Nov 13, 2014 at 8:43 AM, Anders Wallin <wallinux at gmail.com
>     <mailto:wallinux at gmail.com>> wrote:
> 
>         On Thu, Nov 13, 2014 at 12:11 AM, Julien Desfossez
>         <jdesfossez at efficios.com <mailto:jdesfossez at efficios.com>> wrote:
> 
>             Hi,
> 
>             On 14-11-12 05:30 PM, Jérémie Galarneau wrote:
>             > On Wed, Nov 12, 2014 at 7:39 AM, Anders Wallin
>             <wallinux at gmail.com <mailto:wallinux at gmail.com>> wrote:
>             >> Hi,
>             >>
>             >> using userspace snapshot mode in 2.5 seems to broken.
>             >>
>             >> I have small stupid application(tracetest) generating
>             1000 events
>             >>
>             >> Running the script (the below is simplified version of
>             the script)
>             >>
>             ----------------------------------------------------------------------------------
>             >> #!/bin/bash
>             >> LTTNG='lttng -n'
>             >> lttng-sessiond --no-kernel &
>             >
>             > You should use the "-d" flag to launch the sessiond as a
>             daemon. Using
>             > "&" might cause the "create" command to launch a second
>             session daemon
>             > since the first one would not have had the time to open
>             create its
>             > command sockets.
>             >
>             >>
>             >> $LTTNG create test --snapshot
>             >> $LTTNG enable-channel -u --subbuf-size 64k --num-subbuf
>             16 --overwrite ch1
>             >> $LTTNG enable-event -u -c ch1 "*"
>             >> $LTTNG start
>             >> $LTTNG list ch1
>             >>
>             >> mkdir -p snapshot/
>             >>
>             >> for i in $(seq 1 $LOOPS); do
>             >>     ./tracetest
>             >>     $LTTNG snapshot record -m 1G
>             >>
>             >>     # print no of events and first and last event
>             >>     last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1)
>             >>     babeltrace $last > snapshot/$i.bt <http://i.bt>
>             >>     cat snapshot/$i.bt <http://i.bt> | wc -l
>             >>     cat snapshot/$i.bt <http://i.bt> | head -1
>             >>     cat snapshot/$i.bt <http://i.bt> | tail -1
>             >> done
>             >>
>             >> $LTTNG stop
>             >> $LTTNG destroy $SESSION
>             >>
>             ----------------------------------------------------------------------------------
>             >>
>             >> The result of the run looks like this;
>             >> ./test.run
>             >> Error: consumer err socket second poll error
>             >> Error: Health error occurred in thread_manage_consumer
>             >> PERROR - 13:36:59.685167 [23772/23781]: bind inet:
>             Address already in use
>             >> (in lttcomm_bind_inet_sock() at inet.c:109)
>             >> Warning: Another session daemon is using this JUL port.
>             JUL support will be
>             >> deactivated to prevent interfering with the tracing.
>             >>
>             >
>             > Is there a reason why you run two session daemons? I'm
>             guessing this
>             > is related to my "& vs -d" comment above.
>             >
>             >> lttng (LTTng Trace Control) 2.5.2 - Fumisterie
>             >> BabelTrace Trace Viewer and Converter 1.2.4
>             >>         linux-vdso.so.1 =>  (0x00007fffba966000)
>             >>         liblttng-ctl.so.0 =>
>             /usr/lib/x86_64-linux-gnu/liblttng-ctl.so.0
>             >> (0x00007fbfe4b1d000)
>             >>         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
>             (0x00007fbfe4915000)
>             >>         libxml2.so.2 =>
>             /usr/lib/x86_64-linux-gnu/libxml2.so.2
>             >> (0x00007fbfe45ae000)
>             >>         libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0
>             >> (0x00007fbfe43a2000)
>             >>         libpthread.so.0 =>
>             /lib/x86_64-linux-gnu/libpthread.so.0
>             >> (0x00007fbfe4184000)
>             >>         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
>             (0x00007fbfe3dbd000)
>             >>         /lib64/ld-linux-x86-64.so.2 (0x00007fbfe4fac000)
>             >>         liburcu.so.2 =>
>             /usr/lib/x86_64-linux-gnu/liburcu.so.2
>             >> (0x00007fbfe3bb6000)
>             >>         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
>             (0x00007fbfe39b2000)
>             >>         libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1
>             (0x00007fbfe3798000)
>             >>         liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5
>             >> (0x00007fbfe3576000)
>             >>         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
>             (0x00007fbfe3270000)
>             >>
>             >> Session test created.
>             >> Default snapshot output set to:
>             >> /home/awallin/lttng-traces/test-20141112-133700
>             >> Snapshot mode set. Every channel enabled for that session
>             will be set in
>             >> overwrite mode and mmap output.
>             >> UST channel ch1 enabled for session test
>             >> UST event * created in channel ch1
>             >> Tracing started for session test
>             >> Tracing session test: [active snapshot]
>             >>     Trace path:
>             >>     Live timer interval (usec): 4294967295
>             >>
>             >> === Domain: UST global ===
>             >>
>             >> Channels:
>             >> -------------
>             >> - ch1: [enabled]
>             >>
>             >>     Attributes:
>             >>       overwrite mode: 1
>             >>       subbufers size: 4096
>             >>       number of subbufers: 16
>             >>       switch timer interval: 0
>             >>       read timer interval: 0
>             >>       trace file count: 0
>             >>       trace file size (bytes): 0
>             >>       output: mmap()
>             >>
>             >>     Events:
>             >>       * (type: tracepoint) [enabled]
>             >>
>             >> Snapshot recorded successfully for session test
>             >> 84
>             >> [13:37:00.852615950] (+?.?????????) arn-awallin-mint-l3
>             >> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr =
>             0x7FFFC6BC7000, sopath
>             >> = "[vdso]", size = 0, mtime = -1 }
>             >> [13:37:00.853798949] (+0.000000805) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 6 }, { my_string_field = "test",
>             my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
>             >> 157
>             >> [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 6 }, { my_string_field = "test",
>             my_integer_field = 927 }
>             >> [13:37:00.894839116] (+0.000000299) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 5 }, { my_string_field = "test",
>             my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
>             >> 168
>             >> [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 6 }, { my_string_field = "test",
>             my_integer_field = 927 }
>             >> [13:37:00.929447330] (+0.000000305) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 5 }, { my_string_field = "test",
>             my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
> 
>             Something is wrong from here:
>             >> 168
>             >> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3
>             >> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath
>             >> = "[vdso]", size = 0, mtime = -1 }
>             >> [13:37:00.972035546] (+0.000000597) arn-awallin-mint-l3 tracetest:first_tp:
>             >> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
>             >> 168
>             >> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3
>             >> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath
>             >> = "[vdso]", size = 0, mtime = -1 }
>             >> [13:37:01.014060940 <tel:014060940>] (+0.000000341)
>             arn-awallin-mint-l3 tracetest:first_tp:
>             >> { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
>             >> 168
>             >> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3
>             >> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath
>             >> = "[vdso]", size = 0, mtime = -1 }
>             >> [13:37:01.052555877 <tel:052555877>] (+0.000000872)
>             arn-awallin-mint-l3 tracetest:first_tp:
>             >> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>             ... to here.
> 
>             The beginning timestamp is the same, the number of events is
>             the same,
>             but the end timestamp moves. We see also this pattern below.
>             I would be really interested to see the whole snapshots when
>             it happens.
>             It seems like we are not be overwriting the beginning of the
>             ring buffer
>             in some conditions. Could you send us your tracetest program
>             and the
>             script so we can try to reproduce the problem ?
> 
>          
>         I will append it including a run with debug turned on and all
>         the snapshot files created
>          
> 
> 
>             >> Snapshot recorded successfully for session test
>             >> 168
>             >> [13:37:01.014035993 <tel:014035993>] (+?.?????????)
>             arn-awallin-mint-l3 tracetest:first_tp:
>             >> { cpu_id = 7 }, { my_string_field = "test",
>             my_integer_field = 927 }
>             >> [13:37:01.098620808] (+0.000001404) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 4 }, { my_string_field = "test",
>             my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
>             >> 168
>             >> [13:37:01.097087388 <tel:097087388>] (+?.?????????)
>             arn-awallin-mint-l3
>             >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr =
>             0x7FFF4BFFE000, sopath
>             >> = "[vdso]", size = 0, mtime = -1 }
>             >> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 7 }, { my_string_field = "test",
>             my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
>             >> 168
>             >> [13:37:01.097087388 <tel:097087388>] (+?.?????????)
>             arn-awallin-mint-l3
>             >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr =
>             0x7FFF4BFFE000, sopath
>             >> = "[vdso]", size = 0, mtime = -1 }
>             >> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 7 }, { my_string_field = "test",
>             my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
>             >> 168
>             >> [13:37:01.097087388 <tel:097087388>] (+?.?????????)
>             arn-awallin-mint-l3
>             >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr =
>             0x7FFF4BFFE000, sopath
>             >> = "[vdso]", size = 0, mtime = -1 }
>             >> [13:37:01.172829105] (+0.000000575) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 5 }, { my_string_field = "test",
>             my_integer_field = 999 }
>             >> Snapshot recorded successfully for session test
>             >> 168
>             >> [13:37:01.172021557] (+?.?????????) arn-awallin-mint-l3
>             >> ust_baddr_statedump:soinfo: { cpu_id = 7 }, { baddr =
>             0x7FFF653FE000, sopath
>             >> = "[vdso]", size = 0, mtime = -1 }
>             >> [13:37:01.220074165] (+0.000000553) arn-awallin-mint-l3
>             tracetest:first_tp:
>             >> { cpu_id = 6 }, { my_string_field = "test",
>             my_integer_field = 999 }
>             >> Waiting for data availability
>             >> Tracing stopped for session test
>             >> Session test destroyed
>             >>
>             >> the result from the run on older versions of lttng
>             creates bigger and bigger
>             >> event traces until reaching max size.
>             We see this pattern here and the max size seems to be 168
>             events, was it
>             similar in your previous tests ?
> 
>         If I'm running the same test on 2.4 or 2.3 the max number of
>         events is much bigger, like ~10000 
>         (I don't have the exact number right now, I will rebuild and run
>         the test again and get back 
>         with traces from that run) 
> 
>     For 2.4.2 the log looks like this:
>     ./test.run ./tracetest
> 
>     lttng (LTTng Trace Control) 2.4.2 - Époque Opaque
>     BabelTrace Trace Viewer and Converter 1.2.1
>     linux-vdso.so.1 =>  (0x00007fff775fe000)
>     liblttng-ctl.so.0 => /usr/local/lib/liblttng-ctl.so.0
>     (0x00002b2b016e4000)
>     libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0 (0x00002b2b0190f000)
>     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
>     (0x00002b2b01b1b000)
>     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b2b01d39000)
>     librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00002b2b020ff000)
>     liburcu.so.3 => /usr/local/lib/liburcu.so.3 (0x00002b2b02307000)
>     /lib64/ld-linux-x86-64.so.2 (0x00002b2b014bf000)
> 
>     Session test created.
>     Default snapshot output set to: //lttng-traces/test-20141113-093611
>     Snapshot mode set. Every channel enabled for that session will be
>     set in overwrite mode and mmap output.
>     UST channel ch1 enabled for session test
>     UST event * created in channel ch1
>     Tracing started for session test
>     Tracing session test: [active snapshot]
>         Trace path: 
> 
>     === Domain: UST global ===
> 
>     Buffer type: per UID
> 
>     Channels:
>     -------------
>     - ch1: [enabled]
> 
>         Attributes:
>           overwrite mode: 1
>           subbufers size: 4096
>           number of subbufers: 16
>           switch timer interval: 0
>           read timer interval: 0
>           output: mmap()
> 
>         Events:
>           * (type: tracepoint) [enabled]
> 
>     Snapshot recorded successfully for session test
>     1000
>     [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:11.985857254] (+0.000000501) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     2000
>     [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.033066130 <tel:033066130>] (+0.000001663) 07f8cf84a38f
>     tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test",
>     my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     3000
>     [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.086888121 <tel:086888121>] (+0.000001273) 07f8cf84a38f
>     tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test",
>     my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     4000
>     [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.147378251] (+0.000001257) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     5000
>     [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.225584696] (+0.000000528) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     6000
>     [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.288311172] (+0.000000571) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     7000
>     [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.375789307] (+0.000001238) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     7000
>     [09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.460756779] (+0.000001248) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     8000
>     [09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.540464090] (+0.000001292) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>     Snapshot recorded successfully for session test
>     9000
>     [09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
>     [09:36:12.621992931] (+0.000001298) 07f8cf84a38f tracetest:first_tp:
>     { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
>     Waiting for data availability
>     Tracing stopped for session test
>     Session test destroyed
>      
> 
> 
>             >
>             > Unless I'm mistaken, the behavior of the snapshot mode has changed
>             > between 2.4 and 2.5.
>             >
>             > In the 2.4 series, consecutive snapshots could overlap; that is,
>             > events could be present in multiple snapshots. As of 2.5, an event is
>             > guaranteed to only appear in one snapshot as the events are considered
>             > as having been "consumed" at that point.
> 
>             Hum, no the behavior is still the same, consecutive
>             snapshots can overlap.
> 
>             Could you tell us on which architecture you are experiencing
>             the problem
>             and the number of CPUs ?
> 
>         The traces appended is from my laptop running Ubuntu 14.04 with
>         "8" cpu's
>         model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
>         I have also run this on ARM with 16 cores and a powerpc
> 
> 
>             Thanks,
> 
>             Julien
> 
> 
> 
> 



More information about the lttng-dev mailing list