[lttng-dev] RFC: Fix crash in dlerror()

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Sat Feb 8 17:22:28 EST 2014


----- Original Message -----
> From: "Stefan Seefeld" <stefan_seefeld at mentor.com>
> To: "Mathieu Desnoyers" <mathieu.desnoyers at efficios.com>
> Cc: lttng-dev at lists.lttng.org
> Sent: Saturday, February 8, 2014 11:53:44 AM
> Subject: Re: [lttng-dev] RFC: Fix crash in dlerror()
> 
> On 02/08/2014 11:06 AM, Mathieu Desnoyers wrote:
> > 
> > 
> > ----- Original Message -----
> >> From: "Stefan Seefeld" <stefan_seefeld at mentor.com>
> >> To: lttng-dev at lists.lttng.org
> >> Sent: Friday, February 7, 2014 4:50:09 PM
> >> Subject: [lttng-dev] RFC: Fix crash in dlerror()
> >>
> >> I have been looking into an issue with the malloc wrapper, where an
> >> unsuccessful call to dlopen(), followed by a call to dlerror(), would
> >> result in a segmentation fault when the malloc wrapper is being used.
> >>
> >> The problem is the following:
> >>
> >> The functions dlopen() and dlsym() make use of a global (though
> >> thread-local) "result" structure to hold the error state, to allow a
> >> subsequent call to dlerror() to report it.
> >>
> >> As it turns out, dlerror() itself may implicitly call realloc(), which,
> >> if it hasn't been used before, triggers our wrapper to call dlsym(). So,
> >> while dlerror() inspects said result structure, dlsym() re-initializes
> >> it, causing the crash...
> >>
> >> This is arguably a bug in the dlfcn functions. The attached patch
> >> attempts to fix this by moving the initialization of the realloc()
> >> wrapper (i.e., the loading of the symbol) into the constructor. This
> >> fixes the crash that I'm observing, but since none of these dependencies
> >> are specified or documented, this change may cause other issues elsewhere.
> >>
> >> Are there any objections to this approach ? If not, I'll submit a formal
> >> patch for this.
> > 
> > The issue I see with the approach you propose is if dlsym() is used
> > within another constructor (from anoter lib for instance). Given that
> > the order of constructor execution should be assumed to be random, we can
> > run into the same error scenario you describe here.
> 
> Yes, I was worried about the same. In general there really isn't
> anything we can do short of submitting a patch to glibc, as the only way
> to address this correctly is to record whether we are inside a dlerror()
> call. So, anything I can think of is merely heuristic.
> 
> Here is a slightly different approach: We know that the calloc symbol is
> looked up very early (see the trick we use with the temporary static
> calloc function to avoid the infinite recursion), and we may safely
> assume that the calloc lookup won't fail (if it does, the user has more
> important things to worry about). Thus, right after the calloc lookup
> completed seems a good time to force the realloc lookup.
> 
> That's what this follow-up patch does (which still works for my test).
> 
> How does that look ?

Interesting approach. Then I wonder if we couldn't simply lookup every symbol
we're interested in whenever any of the overridden function is called and
we notice a NULL pointer, and provide a simplistic "static" allocator for
every function overridden.

Please let me know if something like the attached patch works for you.
It seems more solid to cover all the cases, so that if the dlsym() implementation
ever chooses to use other functions from the memory allocator, it will still work.

Thoughts ?

Thanks,

Mathieu



> 
> 	Stefan
> 
> --
> Stefan Seefeld
> CodeSourcery / Mentor Graphics
> http://www.mentor.com/embedded-software/
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-ust-malloc.patch
Type: text/x-patch
Size: 8641 bytes
Desc: not available
URL: <http://lists.lttng.org/pipermail/lttng-dev/attachments/20140208/3a05c089/attachment-0001.bin>


More information about the lttng-dev mailing list