[lttng-dev] snapshot issue with 2.5

Jérémie Galarneau jeremie.galarneau at efficios.com
Wed Nov 12 17:30:33 EST 2014


On Wed, Nov 12, 2014 at 7:39 AM, Anders Wallin <wallinux at gmail.com> wrote:
> Hi,
>
> using userspace snapshot mode in 2.5 seems to broken.
>
> I have small stupid application(tracetest) generating 1000 events
>
> Running the script (the below is simplified version of the script)
> ----------------------------------------------------------------------------------
> #!/bin/bash
> LTTNG='lttng -n'
> lttng-sessiond --no-kernel &

You should use the "-d" flag to launch the sessiond as a daemon. Using
"&" might cause the "create" command to launch a second session daemon
since the first one would not have had the time to open create its
command sockets.

>
> $LTTNG create test --snapshot
> $LTTNG enable-channel -u --subbuf-size 64k --num-subbuf 16 --overwrite ch1
> $LTTNG enable-event -u -c ch1 "*"
> $LTTNG start
> $LTTNG list ch1
>
> mkdir -p snapshot/
>
> for i in $(seq 1 $LOOPS); do
>     ./tracetest
>     $LTTNG snapshot record -m 1G
>
>     # print no of events and first and last event
>     last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1)
>     babeltrace $last > snapshot/$i.bt
>     cat snapshot/$i.bt | wc -l
>     cat snapshot/$i.bt | head -1
>     cat snapshot/$i.bt | tail -1
> done
>
> $LTTNG stop
> $LTTNG destroy $SESSION
> ----------------------------------------------------------------------------------
>
> The result of the run looks like this;
> ./test.run
> Error: consumer err socket second poll error
> Error: Health error occurred in thread_manage_consumer
> PERROR - 13:36:59.685167 [23772/23781]: bind inet: Address already in use
> (in lttcomm_bind_inet_sock() at inet.c:109)
> Warning: Another session daemon is using this JUL port. JUL support will be
> deactivated to prevent interfering with the tracing.
>

Is there a reason why you run two session daemons? I'm guessing this
is related to my "& vs -d" comment above.

> lttng (LTTng Trace Control) 2.5.2 - Fumisterie
> BabelTrace Trace Viewer and Converter 1.2.4
>         linux-vdso.so.1 =>  (0x00007fffba966000)
>         liblttng-ctl.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ctl.so.0
> (0x00007fbfe4b1d000)
>         librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbfe4915000)
>         libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2
> (0x00007fbfe45ae000)
>         libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0
> (0x00007fbfe43a2000)
>         libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> (0x00007fbfe4184000)
>         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbfe3dbd000)
>         /lib64/ld-linux-x86-64.so.2 (0x00007fbfe4fac000)
>         liburcu.so.2 => /usr/lib/x86_64-linux-gnu/liburcu.so.2
> (0x00007fbfe3bb6000)
>         libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbfe39b2000)
>         libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fbfe3798000)
>         liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5
> (0x00007fbfe3576000)
>         libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbfe3270000)
>
> Session test created.
> Default snapshot output set to:
> /home/awallin/lttng-traces/test-20141112-133700
> Snapshot mode set. Every channel enabled for that session will be set in
> overwrite mode and mmap output.
> UST channel ch1 enabled for session test
> UST event * created in channel ch1
> Tracing started for session test
> Tracing session test: [active snapshot]
>     Trace path:
>     Live timer interval (usec): 4294967295
>
> === Domain: UST global ===
>
> Channels:
> -------------
> - ch1: [enabled]
>
>     Attributes:
>       overwrite mode: 1
>       subbufers size: 4096
>       number of subbufers: 16
>       switch timer interval: 0
>       read timer interval: 0
>       trace file count: 0
>       trace file size (bytes): 0
>       output: mmap()
>
>     Events:
>       * (type: tracepoint) [enabled]
>
> Snapshot recorded successfully for session test
> 84
> [13:37:00.852615950] (+?.?????????) arn-awallin-mint-l3
> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFFC6BC7000, sopath
> = "[vdso]", size = 0, mtime = -1 }
> [13:37:00.853798949] (+0.000000805) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 157
> [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 927 }
> [13:37:00.894839116] (+0.000000299) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 927 }
> [13:37:00.929447330] (+0.000000305) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3
> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath
> = "[vdso]", size = 0, mtime = -1 }
> [13:37:00.972035546] (+0.000000597) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3
> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath
> = "[vdso]", size = 0, mtime = -1 }
> [13:37:01.014060940] (+0.000000341) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3
> ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath
> = "[vdso]", size = 0, mtime = -1 }
> [13:37:01.052555877] (+0.000000872) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:01.014035993] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 927 }
> [13:37:01.098620808] (+0.000001404) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3
> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath
> = "[vdso]", size = 0, mtime = -1 }
> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3
> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath
> = "[vdso]", size = 0, mtime = -1 }
> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3
> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath
> = "[vdso]", size = 0, mtime = -1 }
> [13:37:01.172829105] (+0.000000575) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> Snapshot recorded successfully for session test
> 168
> [13:37:01.172021557] (+?.?????????) arn-awallin-mint-l3
> ust_baddr_statedump:soinfo: { cpu_id = 7 }, { baddr = 0x7FFF653FE000, sopath
> = "[vdso]", size = 0, mtime = -1 }
> [13:37:01.220074165] (+0.000000553) arn-awallin-mint-l3 tracetest:first_tp:
> { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> Waiting for data availability
> Tracing stopped for session test
> Session test destroyed
>
> the result from the run on older versions of lttng creates bigger and bigger
> event traces until reaching max size.
>

Unless I'm mistaken, the behavior of the snapshot mode has changed
between 2.4 and 2.5.

In the 2.4 series, consecutive snapshots could overlap; that is,
events could be present in multiple snapshots. As of 2.5, an event is
guaranteed to only appear in one snapshot as the events are considered
as having been "consumed" at that point.

Julien (cc'ed) might be able to confirm this.

Regards,
Jérémie

> I have tested this on different architectures, with different buffer sizes
> and number of buffer
>
> Regards
> Anders Wallin
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>



-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com



More information about the lttng-dev mailing list