[lttng-dev] Getting SIGBUS in babeltrace - occasionally

Amit Margalit AMITM at il.ibm.com
Wed Jul 24 10:48:17 EDT 2013


Access beyond end of file seems to be the issue IIUC.

Let me try to give as much context as I can and answer your questions one 
by one:


lttng-ust 2.2.0, lttng-tools 2.2.0, babeltrace - I used git hash 
cc26a15ac355d886bb507955fdaec9b218024713 with a patch on babeltrace.i.in.

[I cannot supply the patch as this requires my employer's (IBM) approval - 
I am in the process of obtaining a permanent contribution approval, but 
until then...]

However, I can tell you exactly what it does:
1. Adds a typedef for 'ssize_t'.
2. Adds a call to _bt_ctf_get_decl_from_def() around the argument 
supplied to _bt_ctf_get_int_len().
3. Changes every {} in string ".format()" calls to a numbered field (e.g. 
{0} instead of {}), because the unnumbered form doesn't work on Python 2.6.
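
To illustrate item 3 only (this is not the patch itself; the function name 
and values below are made up for the example), the change has this shape:

```python
def describe(name, size):
    # Explicit positional indices ({0}, {1}) work on Python 2.6 and later.
    # Auto-numbered "{}" fields only work from Python 2.7 / 3.1 onwards.
    return "{0}: {1} bytes".format(name, size)

print(describe("stream", 4096))
```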

I am waiting for the contribution approval and then I'll submit a patch.

To provide a trace sample, I'll need a special approval, which could take 
very long.

System Setup

Processor: Xeon-family, 6 cores (12 if you count HT)
RAM: 24GB, page size is 4KB.
OS: Linux based on an MCP (IBM's SuSE derivative) 

Tracing Setup

Same system used to generate, collect and read the traces.
100% user-space.


This is not readily reproducible. Some context:

We have an in-house trace mechanism including a code instrumentor and a 
curses-based viewer (actually urwid+python).
My current project is to embed LTTng logging and trace capabilities into 
several open-source projects, because we couldn't use our in-house tracer 
without having GPL issues.
I have written a back-end that enables our in-house viewer to display 
LTTng traces, by linking with libbabeltrace (allowed by its LGPL license).

As far as I can tell, the SIGBUS happens while I am doing a search. The 
search literally reads the events in sequence and compares some fields 
against some criteria.

And here's the main reason why I think this is happening - I was adding a 
trace (using add_traces_recursive(trace_dir,'ctf')) while the session was 
still active (i.e. not stopped).

I can see 2 problematic scenarios here:

Scenario 1

When I add the trace to the context babeltrace performs mmap() according 
to the size of the file(s) at that moment.
The files keep growing.
When I try to access new packets they may lie beyond the end of the 
mmap'ed region. I think this should lead to SIGSEGV and not to SIGBUS.
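
The premise of this scenario can be shown with a small standalone sketch. 
It uses Python's own mmap module rather than babeltrace, the helper name is 
made up, and it only demonstrates that a mapping keeps the length it had 
when it was created; it does not try to reproduce the signal.

```python
import mmap
import os
import tempfile

PAGE = 4096  # the 4KB page size mentioned in the system setup

def mapped_vs_file_size():
    """Map a file, let it grow, and return (mapping length, file size)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * PAGE)   # file is one page long at map time
        # Length 0 means "map the whole file as it is right now".
        mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        os.write(fd, b"\0" * PAGE)   # a writer appends a second page
        sizes = (len(mm), os.fstat(fd).st_size)
        mm.close()
        return sizes
    finally:
        os.close(fd)
        os.unlink(path)

print(mapped_vs_file_size())
```

On Linux this prints (4096, 8192): the file has doubled in size, but the 
mapping still covers only the first page.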

Scenario 2

Due to buffering issues, I could be having a race with the session daemon, 
where my code reads the beginning of a packet that was already written to 
the file, and then tries to access the contents, which were not yet written. 
Again - could happen around the end of the file, with the session still 


