[lttng-dev] Deadlock in liblttng-ust

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue Sep 18 00:54:49 EDT 2012


* changz (zheng.chang at emc.com) wrote:
> On 9/17/2012 21:33 PM, Mathieu Desnoyers wrote:
>> * changz (zheng.chang at emc.com) wrote:
>>> ......
>>>
>>> The child process calls _fini when it calls exit(). It hangs there
>>> while the parent is waiting for it to terminate.
>>> I think the whole life cycle of the process should be considered.
>>> Having the parent wait inside the critical region is dangerous.
>>> Is it possible to make the critical region finer-grained?
>>>
>>> What do you think?
>> Hrm, yes you're right. I'm looking into it.
>>
>> The main issue is that get_wait_shm() bypasses the fork() wrapper (with
>> lttng_ust_nest_count), which is responsible for holding the UST mutex
>> across fork(). Therefore, when exiting the context of the child process,
>> we execute the destructor, which tries to grab the UST mutex, which might
>> be in pretty much any state.
>>
>> Given that we don't want this process to try to register to
>> lttng-sessiond (because this is internal to lttng-ust), we might want to
>> let it skip the destructor execution. This would actually be the easiest
>> way out.
>>
>> Does the following patch fix the issue for you?
>>
>> diff --git a/liblttng-ust/lttng-ust-comm.c b/liblttng-ust/lttng-ust-comm.c
>> index be64acd..596fd7d 100644
>> --- a/liblttng-ust/lttng-ust-comm.c
>> +++ b/liblttng-ust/lttng-ust-comm.c
>> @@ -616,9 +616,9 @@ int get_wait_shm(struct sock_info *sock_info, size_t mmap_size)
>>   			ret = ftruncate(wait_shm_fd, mmap_size);
>>   			if (ret) {
>>   				PERROR("ftruncate");
>> -				exit(EXIT_FAILURE);
>> +				_exit(EXIT_FAILURE);
>>   			}
>> -			exit(EXIT_SUCCESS);
>> +			_exit(EXIT_SUCCESS);
>>   		}
>>   		/*
>>   		 * For local shm, we need to have rw access to accept
> Yes, it works.
> Just a reminder: there are four calls to exit() in the child's path in
> my git repository.

Indeed! Thanks for the reminder!

Here is the fix:

commit 5d3bc5ed74a4c9f557a75d7de82ed7056adb812e
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date:   Tue Sep 18 00:52:10 2012 -0400

    Fix: get_wait_shm() ust mutex deadlock (add 2 missing exit calls)
    
    Reported-by: changz <zheng.chang at emc.com>
    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>

backported to stable-2.0.
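
To illustrate why switching the child from exit() to _exit() sidesteps the
destructor, here is a minimal standalone sketch. It is not taken from
lttng-ust: fake_destructor(), report_fd and handler_ran_in_child() are
hypothetical names, and an atexit() handler stands in for the library
destructor. The handler is registered in the parent and inherited across
fork(); the child reports through a pipe whether the handler ran.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the report pipe; only ever set in the forked child. */
static int report_fd = -1;

/* Hypothetical stand-in for the library destructor. */
static void fake_destructor(void)
{
	if (report_fd >= 0) {
		ssize_t n = write(report_fd, "x", 1);
		(void) n;
	}
}

/*
 * Fork a child that terminates with _exit() (use_underscore_exit != 0)
 * or with exit(). Returns 1 if the handler ran in the child, 0 if it
 * did not, and -1 on error.
 */
static int handler_ran_in_child(int use_underscore_exit)
{
	static int registered;
	int fds[2];
	pid_t pid;
	char c;
	ssize_t len;

	if (!registered) {
		if (atexit(fake_destructor))
			return -1;
		registered = 1;
	}
	if (pipe(fds))
		return -1;
	fflush(NULL);	/* do not duplicate pending stdio output in the child */
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: the atexit() handler was inherited from the parent. */
		close(fds[0]);
		report_fd = fds[1];
		if (use_underscore_exit)
			_exit(EXIT_SUCCESS);	/* skips atexit() handlers */
		exit(EXIT_SUCCESS);		/* runs atexit() handlers */
	}
	/* Parent: one byte means the handler ran, EOF means it did not. */
	close(fds[1]);
	len = read(fds[0], &c, 1);
	close(fds[0]);
	if (waitpid(pid, NULL, 0) < 0)
		return -1;
	return len == 1 ? 1 : 0;
}
```

Calling handler_ran_in_child(0) exercises the exit() path, where the
inherited handler runs in the child; handler_ran_in_child(1) exercises the
_exit() path, where it is skipped. In the real library the skipped code is
the destructor that tries to grab the UST mutex.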

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


