[lttng-dev] RFC: Fix crash in dlerror()

Sat Feb 8 11:53:44 EST 2014

On 02/08/2014 11:06 AM, Mathieu Desnoyers wrote:
> 
> 
> ----- Original Message -----
>> From: "Stefan Seefeld" <stefan_seefeld at mentor.com>
>> To: lttng-dev at lists.lttng.org
>> Sent: Friday, February 7, 2014 4:50:09 PM
>> Subject: [lttng-dev] RFC: Fix crash in dlerror()
>>
>> I have been looking into an issue with the malloc wrapper, where an
>> unsuccessful call to dlopen(), followed by a call to dlerror(), would
>> result in a segmentation fault when the malloc wrapper is being used.
>>
>> The problem is the following:
>>
>> The functions dlopen() and dlsym() make use of a global (though
>> thread-local) "result" structure to hold the error state, to allow a
>> subsequent call to dlerror() to report it.
>>
>> As it turns out, dlerror() itself may implicitly call realloc(), which,
>> if it hasn't been used before, triggers our wrapper to call dlsym(). So,
>> while dlerror() inspects said result structure, dlsym() re-initializes
>> it, causing the crash...
>>
>> This is arguably a bug in the dlfcn functions. The attached patch
>> attempts to fix this by moving the initialization of the realloc()
>> wrapper (i.e., the loading of the symbol) into the constructor. This
>> fixes the crash that I'm observing, but since none of these dependencies
>> are specified or documented, this change may cause other issues elsewhere.
>>
>> Are there any objections to this approach ? If not, I'll submit a formal
>> patch for this.
> 
> The issue I see with the approach you propose is if dlsym() is used
> within another constructor (from anoter lib for instance). Given that
> the order of constructor execution should be assumed to be random, we can
> run into the same error scenario you describe here.

Yes, I was worried about the same. In general there really isn't
anything we can do short of submitting a patch to glibc, as the only way
to address this correctly is to record whether we are inside a dlerror()
call. So, anything I can think of is merely heuristic.

Here is a slightly different approach: We know that the calloc symbol is
looked up very early (see the trick we use with the temporary static
calloc function to avoid the infinite recursion), and we may safely
assume that the calloc lookup won't fail (if it does, the user has more
important things to worry about). Thus, right after the calloc lookup
completed seems a good time to force the realloc lookup.

That's what this follow-up patch does (which still works for my test).

How does that look ?

	Stefan

-- 
Stefan Seefeld
CodeSourcery / Mentor Graphics
http://www.mentor.com/embedded-software/
-------------- next part --------------

diff --git a/liblttng-ust-libc-wrapper/lttng-ust-malloc.c b/liblttng-ust-libc-wrapper/lttng-ust-malloc.c
index 33ed18b..8413f31 100644
--- a/liblttng-ust-libc-wrapper/lttng-ust-malloc.c
+++ b/liblttng-ust-libc-wrapper/lttng-ust-malloc.c
@@ -54,6 +54,8 @@ static void *static_calloc(size_t nmemb, size_t size)
 	return &static_calloc_buf[prev_offset];
 }
 
+static void *(*plibc_realloc)(void *ptr, size_t size);
+
 void *malloc(size_t size)
 {
 	static void *(*plibc_malloc)(size_t size);
@@ -111,6 +113,16 @@ void *calloc(size_t nmemb, size_t size)
 			fprintf(stderr, "callocwrap: unable to find calloc\n");
 			return NULL;
 		}
+		/*
+		 * Now seems a good time to also initialize the realloc symbol.
+		 * (Doing that just-in-time won't work as it would corrupt some
+		 * dlfcn-internal state.)
+		 */
+		plibc_realloc = dlsym(RTLD_NEXT, "realloc");
+		if (plibc_calloc == NULL) {
+			fprintf(stderr, "callocwrap: unable to find realloc\n");
+			return NULL;
+		}
 	}
 	retval = plibc_calloc(nmemb, size);
 	tracepoint(ust_libc, calloc, nmemb, size, retval);
@@ -119,7 +131,6 @@ void *calloc(size_t nmemb, size_t size)
 
 void *realloc(void *ptr, size_t size)
 {
-	static void *(*plibc_realloc)(void *ptr, size_t size);
 	void *retval;
 
 	if (plibc_realloc == NULL) {