<font size=2 face="sans-serif">Well, the live-streaming feature is more
than we need.</font>
<br>
<br><font size=2 face="sans-serif">We don't require the ability to sit
and watch events as they appear in the trace file.</font>
<br>
<br><font size=2 face="sans-serif">Is there a way to determine the "latest
timestamp that is safe to access"?</font>
<br>
<br><font size=2 face="sans-serif">Currently, I call get_timestamp_end()
on the trace handle and try to read the events at that timestamp. If the
read fails, I try progressively earlier timestamps until I succeed in reading an event.</font>
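<br>
<br><font size=2 face="sans-serif">The back-off described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: can_read_at is a hypothetical callable supplied by the caller
(it is not part of the babeltrace bindings) that returns True when an event
can be read at the given timestamp, and the doubling step is an arbitrary
choice.</font>
<br>
<pre>
```python
def latest_readable_timestamp(ts_begin, ts_end, can_read_at, first_step=1000):
    """Walk back from ts_end until can_read_at(ts) succeeds.

    can_read_at is a hypothetical probe provided by the caller: it must
    return True when an event can be read at ts, and False otherwise.
    The step doubles after every failed probe, so a long unreadable tail
    is skipped quickly. Returns None if no probe succeeds.
    """
    ts = ts_end
    step = first_step
    while ts > ts_begin:
        if can_read_at(ts):
            return ts
        # Probe failed: move further back, never before the trace start.
        ts = max(ts_begin, ts - step)
        step *= 2
    # Last resort: the very beginning of the trace.
    return ts_begin if can_read_at(ts_begin) else None
```
</pre>
<font size=2 face="sans-serif">Because the step grows, the value returned
is a readable timestamp but not necessarily the latest one. A caller that
needs a tighter bound could bisect between the last failed probe and the
first successful one.</font>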
<br>
<br><font size=2 face="sans-serif">Thanks,</font>
<br>
<br><font size=2 color=#000080 face="sans-serif">Amit Margalit</font>
<br><font size=2 color=#808000 face="sans-serif">IBM XIV </font><font size=2 face="sans-serif">-
<i>Storage Reinvented</i></font>
<br><font size=2 face="sans-serif">XIV-NAS Development Team</font>
<br><font size=2 face="sans-serif">Tel. 03</font><font size=2 face="Arial">-689-7774</font>
<br><font size=2 face="Arial">Fax. 03-689-7230</font>
<br>
<br>
<br>
<br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">Mathieu Desnoyers <mathieu.desnoyers@efficios.com></font>
<br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">Amit Margalit/Israel/IBM@IBMIL</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Cc:
</font><font size=1 face="sans-serif">lttng-dev@lists.lttng.org</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">07/24/2013 07:14 PM</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: [lttng-dev]
Getting SIGBUS in babeltrace - occasionally</font>
<br>
<hr noshade>
<br>
<br>
<br><tt><font size=2>* Amit Margalit (AMITM@il.ibm.com) wrote:<br>
> Hi,<br>
> <br>
> Access beyond end of file seems to be the issue IIUC.<br>
> <br>
> Let me try to give as much context as I can and answer your questions
one <br>
> by one:<br>
> <br>
> Versions<br>
> <br>
> lttng-ust 2.2.0, lttng-tools 2.2.0, babeltrace - I used git hash <br>
> cc26a15ac355d886bb507955fdaec9b218024713 with a patch on babeltrace.i.in.<br>
> <br>
> [I cannot supply the patch as this requires my employer's (IBM) approval
- <br>
> I am in the process of obtaining a permanent contribution approval,
but <br>
> until then...]<br>
> <br>
> However, I can tell you exactly what it does:<br>
> 1. Adds a typedef for 'ssize_t'<br>
> 2. Added a call to _bt_ctf_get_decl_from_def() around the argument
<br>
> supplied to _bt_ctf_get_int_len()<br>
> 3. Changed all uses of {} in string ".format()" calls to
add numbers (e.g. <br>
> {0} instead of {}) - as this doesn't work for Python 2.6.<br>
> <br>
> I am waiting for the contribution approval and then I'll submit a
patch.<br>
> <br>
> To provide a trace sample, I'll need a special approval, which could
take <br>
> very long.<br>
> <br>
> System Setup<br>
> <br>
> Processor: Xeon-family 6-cores (12 if you count HT)<br>
> RAM: 24GB of RAM, page size is 4KB.<br>
> OS: Linux 2.6.32.12-211_11_4_0 based on an MCP (IBM's SuSE derivative)
<br>
> build.<br>
> <br>
> Tracing Setup<br>
> <br>
> Same system used to generate, collect and read the traces.<br>
> 100% user-space.<br>
> <br>
> Triggering<br>
> <br>
> This is not readily reproducible. Some context:<br>
> <br>
> We have an in-house trace mechanism including a code instrumentor
and a <br>
> curses-based viewer (actually urwid+python).<br>
> My current project is to embed LTTng logging and trace capabilities
into <br>
> several open-source projects, because we couldn't use our in-house
tracer <br>
> without having GPL issues.<br>
> I have written a back-end that enables our in-house viewer to display
<br>
> LTTng traces, by linking with libbabeltrace (allowed by its LGPL license).<br>
> <br>
> As far as I can tell, the SIGBUS happens while I am doing a search.
This <br>
> literally reads the events in sequence and tries to compare some fields
<br>
> against some criteria.<br>
> <br>
> And here's the main reason why I think this is happening - I was adding
a <br>
> trace (using add_traces_recursive(trace_dir,'ctf')) while the session
is <br>
> active (i.e. not stopped).<br>
> <br>
> I can see 2 problematic scenarios here:<br>
> <br>
> Scenario 1<br>
> <br>
> When I add the trace to the context babeltrace performs mmap() according
<br>
> to the size of the file(s) at that moment.<br>
> The files keep growing.<br>
> When I try to access new packets they may lie beyond the end of the
<br>
> mmap'ed region. I think this should lead to SIGSEGV and not to SIGBUS.<br>
<br>
babeltrace only mmaps packet by packet, never an entire file.<br>
<br>
> <br>
> Scenario 2<br>
> <br>
> Due to buffering issues, I could be having a race with the session
daemon, <br>
> where my code reads the beginning of a packet that was already written
to <br>
> the file, and tries to access the contents, which were not yet written.
<br>
> Again - could happen around the end of the file, with the session
still <br>
> active.<br>
<br>
I think the reason is close to this scenario: I think that babeltrace<br>
tries to read a packet as it is being written to disk. So it gets the<br>
packet header (including the expected size of the packet), and attempts<br>
to read beyond the end of the partially-copied packet. The live streaming<br>
feature (we're working on this for the next release of lttng) will<br>
handle this use case.<br>
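<br>
To illustrate the race (a sketch only, not what babeltrace actually does):<br>
before mapping a packet, a reader could check that the stream file is<br>
already long enough to hold it. The function name and its parameters are<br>
made up for this example. The offset and the size announced in the packet<br>
header are assumed to have been obtained elsewhere. CTF expresses packet<br>
sizes in bits, hence the conversion.<br>
<pre>
```python
import os


def packet_is_complete(stream_path, packet_offset, packet_size_bits):
    """Return True if the file is long enough to contain the whole packet.

    stream_path      -- path of the CTF stream file being written
    packet_offset    -- byte offset of the packet within that file
    packet_size_bits -- packet size announced in the packet header (bits)

    All three values are assumptions of this sketch; it does not parse
    the trace itself.
    """
    # Round the announced size up to whole bytes.
    packet_size_bytes = (packet_size_bits + 7) // 8
    file_size = os.stat(stream_path).st_size
    return file_size >= packet_offset + packet_size_bytes
```
</pre>
This check is necessary but may not be sufficient: a file that has reached<br>
the expected length can still have data in flight, so it narrows the<br>
window rather than closing it.<br>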
<br>
Thanks,<br>
<br>
Mathieu<br>
<br>
<br>
> <br>
> Amit<br>
> <br>
> Amit Margalit<br>
> IBM XIV - Storage Reinvented<br>
> XIV-NAS Development Team<br>
> Tel. 03-689-7774<br>
> Fax. 03-689-7230<br>
> <br>
> <br>
> <br>
> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com><br>
> To: Amit Margalit/Israel/IBM@IBMIL<br>
> Cc: lttng-dev@lists.lttng.org<br>
> Date: 07/24/2013 04:20 PM<br>
> Subject: Re: [lttng-dev] Getting SIGBUS
in babeltrace - <br>
> occasionally<br>
> <br>
> <br>
> <br>
> * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:<br>
> > * Amit Margalit (AMITM@il.ibm.com) wrote:<br>
> > > Hi,<br>
> > > <br>
> > > I am getting occasional SIGBUS inside _aligned_integer_read()
in <br>
> > > ./formats/ctf/types/integer.c<br>
> > > <br>
> > > Here is the backtrace, maybe someone could take a look and
suggest <br>
> > > something:<br>
> > <br>
> > It looks like you're hitting the internal checks for overflow
on the<br>
> > buffer boundaries.<br>
> <br>
> Please forget about my previous email, I'm utterly wrong. That should<br>
> teach me to reply before my first morning coffee ;)<br>
> <br>
> We indeed have internal checks for overflows in babeltrace, but those<br>
> are not triggering SIGBUS ever, they return failure gracefully.<br>
> <br>
> What seems to happen in your case is documented here:<br>
> <br>
> mmap(2) manpage:<br>
> <br>
> SIGBUS Attempted access to a portion of the buffer that does not<br>
> correspond to the file (for example, beyond the end of the file,<br>
> including the case where another process has truncated the file).<br>
> <br>
> I'm curious to learn which versions of babeltrace/ust/modules you
are<br>
> using, if you modified them (and how), and if you can get us a sample<br>
> trace that triggers the issue.<br>
> <br>
> Also, a bit of context about your setup: is this 32-bit or 64-bit<br>
> user-space, which OS.<br>
> <br>
> Hrm.<br>
> <br>
> By any chance, is it possible that you record your trace on a platform<br>
> with 4kB pages, and run babeltrace on it on a platform having 64kB<br>
> pages? Everything points me to include/babeltrace/mmap-align.h<br>
> mmap_align_addr(). We might be mmaping beyond the end of file in this<br>
> case.<br>
> <br>
> Thanks!<br>
> <br>
> Mathieu<br>
> <br>
> > <br>
> > On which babeltrace version can you reproduce it ? Did you do
any<br>
> > modification to babeltrace on your own on top ?<br>
> > <br>
> > Are you getting this with a lttng-ust or lttng-modules trace
? If yes,<br>
> > which version of the tool has generated the trace ?<br>
> > <br>
> > Can you provide a sample trace that triggers this issue ?<br>
> > <br>
> > Can you give us the detailed sequence of steps you use to reproduce
?<br>
> > <br>
> > Thanks,<br>
> > <br>
> > Mathieu<br>
> > <br>
> > > #0 _aligned_integer_read (definition=0x954320, ppos=0x106e028)
at <br>
> > > integer.c:81<br>
> > > #1 ctf_integer_read (ppos=0x106e028, definition=0x954320)
at <br>
> > > integer.c:224<br>
> > > #2 0x00007ffff070e36e in generic_rw (definition=<optimized
out>, <br>
> > > pos=0x106e028) at ../include/babeltrace/types.h:133<br>
> > > #3 bt_struct_rw (ppos=0x106e028, definition=0x9542e0)
at struct.c:56<br>
> > > #4 0x00007ffff13f7fda in generic_rw (definition=<optimized
out>, <br>
> > > pos=0x106e028) at ../../include/babeltrace/types.h:133<br>
> > > #5 ctf_packet_seek (stream_pos=0x106e028, index=<optimized
out>, <br>
> > > whence=<optimized out>) at ctf.c:860<br>
> > > #6 0x00007ffff070a3f1 in seek_file_stream_by_timestamp
<br>
> > > (cfs=cfs@entry=0x106cf90, <br>
> timestamp=timestamp@entry=1374636821498972926) <br>
> > > at iterator.c:141<br>
> > > #7 0x00007ffff070aa96 in seek_ctf_trace_by_timestamp
<br>
> > > (stream_heap=0x176d310, timestamp=1374636821498972926, tin=<optimized
<br>
> > > out>) at iterator.c:188<br>
> > > #8 bt_iter_set_pos (iter=iter@entry=0xfdafa0, iter_pos=0x2bf59f0)
at <br>
> > > iterator.c:439<br>
> > > #9 0x00007ffff1624f01 in _wrap__bt_iter_set_pos (self=<optimized
<br>
> out>, <br>
> > > args=<optimized out>) at babeltrace_wrap.c:3805<br>
> > > #10 0x00007ffff7b28c0c in call_function (oparg=<optimized
out>, <br>
> > > pp_stack=<optimized out>) at Python/ceval.c:3679<br>
> > > #11 PyEval_EvalFrameEx (f=0x1c243c0, throwflag=<optimized
out>) at <br>
> > > Python/ceval.c:2370<br>
> > > #12 0x00007ffff7b2e0d2 in PyEval_EvalCodeEx (co=0xa8e5d0,
<br>
> > > globals=<optimized out>, locals=<optimized out>,
args=0x2088cc8, <br>
> > > argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)<br>
> > > at Python/ceval.c:2942<br>
> > > .<br>
> > > .<br>
> > > .<br>
> > > <br>
> > > Any suggestions are welcome.<br>
> > > <br>
> > > Amit Margalit<br>
> > > IBM XIV - Storage Reinvented<br>
> > > XIV-NAS Development Team<br>
> > > Tel. 03-689-7774<br>
> > > Fax. 03-689-7230<br>
> > > _______________________________________________<br>
> > > lttng-dev mailing list<br>
> > > lttng-dev@lists.lttng.org<br>
> > > </font></tt><a href="http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev"><tt><font size=2>http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev</font></tt></a><tt><font size=2><br>
> > <br>
> > <br>
> > -- <br>
> > Mathieu Desnoyers<br>
> > EfficiOS Inc.<br>
> > </font></tt><a href=http://www.efficios.com/><tt><font size=2>http://www.efficios.com</font></tt></a><tt><font size=2><br>
> <br>
> -- <br>
> Mathieu Desnoyers<br>
> EfficiOS Inc.<br>
> </font></tt><a href=http://www.efficios.com/><tt><font size=2>http://www.efficios.com</font></tt></a><tt><font size=2><br>
> <br>
> <br>
<br>
-- <br>
Mathieu Desnoyers<br>
EfficiOS Inc.<br>
</font></tt><a href=http://www.efficios.com/><tt><font size=2>http://www.efficios.com</font></tt></a><tt><font size=2><br>
<br>
</font></tt>
<br>