[lttng-dev] Deadlock in liblttng-ust
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Tue Sep 18 00:54:49 EDT 2012
* changz (zheng.chang at emc.com) wrote:
> On 9/17/2012 21:33 PM, Mathieu Desnoyers wrote:
>> * changz (zheng.chang at emc.com) wrote:
>>> ......
>>>
>>> The child process runs _fini when it calls exit(). It hangs there while
>>> the parent is waiting for it to terminate.
>>> I think the whole life-cycle of the process should be considered. Having
>>> the parent wait inside a critical region is dangerous.
>>> Is it possible to narrow the critical region to a finer granularity?
>>>
>>> What do you think?
>> Hrm, yes you're right. I'm looking into it.
>>
>> The main issue is that get_wait_shm() bypasses the fork() wrapper (with
>> lttng_ust_nest_count), which is responsible for holding the UST mutex
>> across fork(). Therefore, when the child process exits, it executes the
>> destructor, which tries to grab the UST mutex, which might be in pretty
>> much any state.
>>
>> Given that we don't want this process to try to register to
>> lttng-sessiond (because this is internal to lttng-ust), we might want to
>> let it skip the destructor execution. This would actually be the easiest
>> way out.
>>
>> Does the following patch fix the issue for you?
>>
>> diff --git a/liblttng-ust/lttng-ust-comm.c b/liblttng-ust/lttng-ust-comm.c
>> index be64acd..596fd7d 100644
>> --- a/liblttng-ust/lttng-ust-comm.c
>> +++ b/liblttng-ust/lttng-ust-comm.c
>> @@ -616,9 +616,9 @@ int get_wait_shm(struct sock_info *sock_info, size_t mmap_size)
>> ret = ftruncate(wait_shm_fd, mmap_size);
>> if (ret) {
>> PERROR("ftruncate");
>> - exit(EXIT_FAILURE);
>> + _exit(EXIT_FAILURE);
>> }
>> - exit(EXIT_SUCCESS);
>> + _exit(EXIT_SUCCESS);
>> }
>> /*
>> * For local shm, we need to have rw access to accept
> Yes, it works.
> Just a reminder, there are four calls to exit in the child's path in my
> git repository.
Indeed! Thanks for the reminder!
Here is the fix:
commit 5d3bc5ed74a4c9f557a75d7de82ed7056adb812e
Author: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
Date: Tue Sep 18 00:52:10 2012 -0400
Fix: get_wait_shm() ust mutex deadlock (add 2 missing exit calls)
Reported-by: changz <zheng.chang at emc.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
backported to stable-2.0.
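
For anyone who wants to see the difference outside of lttng-ust, here is a minimal sketch of why the child has to leave through _exit() rather than exit(). It is not lttng-ust code: fake_destructor(), notify_fd and handler_runs_in_child() are hypothetical names made up for this illustration, and an atexit() handler stands in for the library destructor. The handler only writes one byte to a pipe so the parent can tell whether it ran in the child.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe. Only set in the child, stays -1 in the parent. */
static int notify_fd = -1;

/*
 * Hypothetical stand-in for the library destructor. The real one would
 * try to take the UST mutex; this one only reports that it was called.
 */
static void fake_destructor(void)
{
	char c = 'D';

	if (notify_fd >= 0 && write(notify_fd, &c, 1) != 1) {
		/* Nothing useful to do on failure in this sketch. */
	}
}

/*
 * Fork a child that leaves through exit() or _exit().
 * Returns 1 if the handler ran in the child, 0 if it did not, -1 on error.
 */
int handler_runs_in_child(int use_underscore_exit)
{
	static int registered;
	int fds[2];
	pid_t pid;
	ssize_t n;
	char c;

	if (!registered) {
		if (atexit(fake_destructor) != 0)
			return -1;
		registered = 1;
	}
	if (pipe(fds) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: exit handlers registered before fork() are inherited. */
		close(fds[0]);
		notify_fd = fds[1];
		if (use_underscore_exit)
			_exit(EXIT_SUCCESS);	/* skips atexit handlers */
		exit(EXIT_SUCCESS);		/* runs atexit handlers */
	}
	/* Parent: read() returns 0 (EOF) if the child exits without writing. */
	close(fds[1]);
	n = read(fds[0], &c, 1);
	close(fds[0]);
	if (waitpid(pid, NULL, 0) < 0 || n < 0)
		return -1;
	return n == 1;
}
```

Calling handler_runs_in_child(0) should return 1 (the handler ran in the child), while handler_runs_in_child(1) should return 0 (the handler was skipped), which is the behaviour the fix relies on.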
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com