[lttng-dev] Getting SIGBUS in babeltrace - occasionally
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Wed Jul 24 12:14:09 EDT 2013
* Amit Margalit (AMITM at il.ibm.com) wrote:
> Hi,
>
> Access beyond end of file seems to be the issue IIUC.
>
> Let me try to give as much context as I can and answer your questions one
> by one:
>
> Versions
>
> lttng-ust 2.2.0, lttng-tools 2.2.0, babeltrace - I used git hash
> cc26a15ac355d886bb507955fdaec9b218024713 with a patch on babeltrace.i.in.
>
> [I cannot supply the patch as this requires my employer's (IBM) approval -
> I am in the process of obtaining a permanent contribution approval, but
> until then...]
>
> However, I can tell you exactly what it does:
> 1. Adds a typedef for 'ssize_t'
> 2. Added a call to _bt_ctf_get_decl_from_def() around the argument
> supplied to _bt_ctf_get_int_len()
> 3. Changed all use of {} in string ".format()" calls to add numbers (e.g.
> {0} instead of {}) - as this doesn't work for Python 2.6.
>
> I am waiting for the contribution approval and then I'll submit a patch.
>
> To provide a trace sample, I'll need a special approval, which could take
> very long.
>
> System Setup
>
> Processor: Xeon-family 6-cores (12 if you count HT)
> RAM: 24GB of RAM, page size is 4KB.
> OS: Linux 2.6.32.12-211_11_4_0 based on an MCP (IBM's SuSE derivative)
> build.
>
> Tracing Setup
>
> Same system used to generate, collect and read the traces.
> 100% user-space.
>
> Triggering
>
> This is not readily reproducible. Some context:
>
> We have an in-house trace mechanism including a code instrumentor and a
> curses-based viewer (actually urwid+python).
> My current project is to embed LTTng logging and trace capabilites into
> several open-source projects, because we couldn't use our in-house tracer
> without having GPL issues.
> I have written a back-end that enables our in-house viewer to display
> Lttng traces, by linking with libbabeltrace (allowed by its LGPL license).
>
> As far as I can tell SIGBUS happens to me when I was doing a search. This
> literally reads the events in sequence and tries to compare some fields
> against some criteria.
>
> And here's the main reason why I think this is happening - I was adding a
> trace (using add_traces_recursive(trace_dir,'ctf')) while the session is
> active (i.e. not stopped).
>
> I can see 2 problematic scenarios here:
>
> Scenario 1
>
> When I add the trace to the context babeltrace performs mmap() according
> to the size of the file(s) at that moment.
> The files keep growing.
> When I try to access new packets they may lie beyond the end of the
> mmap'ed region. I think this should lead to SIGSEGV and not to SIGBUS.
babeltrace only mmap packet by packet, never an entire file.
>
> Scenario 2
>
> Due to buffering issues, I could be having a race with the session daemon,
> where my code reads the beginning of a packet that was already written to
> the file, and try to access the contents, which were not yet written.
> Again - could happen around the end of the file, with the session still
> active.
I think the reason is close to this scenario: I think that babeltrace
try to read a packet as it is being written to disk. So it gets the
packet header (including the expected size of the packet), and attempts
to read beyond the end of the partially-copied packet. Live streaming
feature (we're working on this for the next release of lttng) will
handle thi use-cases.
Thanks,
Mathieu
>
> Amit
>
> Amit Margalit
> IBM XIV - Storage Reinvented
> XIV-NAS Development Team
> Tel. 03-689-7774
> Fax. 03-689-7230
>
>
>
> From: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> To: Amit Margalit/Israel/IBM at IBMIL
> Cc: lttng-dev at lists.lttng.org
> Date: 07/24/2013 04:20 PM
> Subject: Re: [lttng-dev] Getting SIGBUS in babeltrace -
> occasionally
>
>
>
> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> > * Amit Margalit (AMITM at il.ibm.com) wrote:
> > > Hi,
> > >
> > > I am getting occasional SIGBUS inside _aligned_integer_read() in
> > > ./formats/ctf/types/integer.c
> > >
> > > Here is the backtrace, maybe someone could take a look and suggest
> > > something:
> >
> > It looks like you're hitting the internal checks for overflow on the
> > buffer boundaries.
>
> Please forget about my previous email, I'm utterly wrong. That should
> teach me to reply before my first morning coffee ;)
>
> We indeed have internal checks for overflows in babeltrace, but those
> are not triggering SIGBUS ever, they return failure gracefully.
>
> What seems to happen in your case is documented here:
>
> mmap(2) manpage:
>
> SIGBUS Attempted access to a portion of the buffer that does not
> corre‐
> spond to the file (for example, beyond the end of the
> file,
> including the case where another process has truncated
> the
> file).
>
> I'm curious to learn which versions of babeltrace/ust/modules you are
> using, if you modified them (and how), and if you can get us a sample
> trace that triggers the issue.
>
> Also, a bit of context about your setup: is this 32-bit or 64-bit
> user-space, which OS.
>
> Hrm.
>
> By any chance, is it possible that you record your trace on a platform
> with 4kB pages, and run babeltrace on it on a platform having 64kB
> pages? Everything points me to include/babeltrace/mmap-align.h
> mmap_align_addr(). We might be mmaping beyond the end of file in this
> case.
>
> Thanks!
>
> Mathieu
>
> >
> > On which babeltrace version can you reproduce it ? Did you do any
> > modification to babeltrace on your own on top ?
> >
> > Are you getting this with a lttng-ust or lttng-modules trace ? If yes,
> > which version of the tool has generated the trace ?
> >
> > Can you provide a sample trace that triggers this issue ?
> >
> > Can you give us the detailed sequence of steps you use to reproduce ?
> >
> > Thanks,
> >
> > Mathieu
> >
> > > #0 _aligned_integer_read (definition=0x954320, ppos=0x106e028) at
> > > integer.c:81
> > > #1 ctf_integer_read (ppos=0x106e028, definition=0x954320) at
> > > integer.c:224
> > > #2 0x00007ffff070e36e in generic_rw (definition=<optimized out>,
> > > pos=0x106e028) at ../include/babeltrace/types.h:133
> > > #3 bt_struct_rw (ppos=0x106e028, definition=0x9542e0) at struct.c:56
> > > #4 0x00007ffff13f7fda in generic_rw (definition=<optimized out>,
> > > pos=0x106e028) at ../../include/babeltrace/types.h:133
> > > #5 ctf_packet_seek (stream_pos=0x106e028, index=<optimized out>,
> > > whence=<optimized out>) at ctf.c:860
> > > #6 0x00007ffff070a3f1 in seek_file_stream_by_timestamp
> > > (cfs=cfs at entry=0x106cf90,
> timestamp=timestamp at entry=1374636821498972926)
> > > at iterator.c:141
> > > #7 0x00007ffff070aa96 in seek_ctf_trace_by_timestamp
> > > (stream_heap=0x176d310, timestamp=1374636821498972926, tin=<optimized
> > > out>) at iterator.c:188
> > > #8 bt_iter_set_pos (iter=iter at entry=0xfdafa0, iter_pos=0x2bf59f0) at
> > > iterator.c:439
> > > #9 0x00007ffff1624f01 in _wrap__bt_iter_set_pos (self=<optimized
> out>,
> > > args=<optimized out>) at babeltrace_wrap.c:3805
> > > #10 0x00007ffff7b28c0c in call_function (oparg=<optimized out>,
> > > pp_stack=<optimized out>) at Python/ceval.c:3679
> > > #11 PyEval_EvalFrameEx (f=0x1c243c0, throwflag=<optimized out>) at
> > > Python/ceval.c:2370
> > > #12 0x00007ffff7b2e0d2 in PyEval_EvalCodeEx (co=0xa8e5d0,
> > > globals=<optimized out>, locals=<optimized out>, args=0x2088cc8,
> > > argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
> > > at Python/ceval.c:2942
> > > .
> > > .
> > > .
> > >
> > > Any suggestions are welcome.
> > >
> > > Amit Margalit
> > > IBM XIV - Storage Reinvented
> > > XIV-NAS Development Team
> > > Tel. 03-689-7774
> > > Fax. 03-689-7230
> > > _______________________________________________
> > > lttng-dev mailing list
> > > lttng-dev at lists.lttng.org
> > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> >
> >
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>
>
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list