[lttng-dev] lttng create freezes sometimes

David Goulet dgoulet at efficios.com
Fri Jan 27 12:35:44 EST 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On 12-01-27 12:25 PM, Sébastien Barthélémy wrote:
> Hello,
> 
> 2012/1/27 David Goulet <dgoulet at efficios.com>:
>> On 12-01-26 08:19 AM, Sébastien Barthélémy wrote:
>>> It works, thanks! I can run the following without any failure:
>>>
>>> $ for i in {1..100}; do killall lttng-sessiond; sleep 1; echo "$i";
>>> lttng create;done
>>>
>>> However, if I remove the sleep, I get 7 other failures.
>>> Likely race conditions between the kill and the subsequent start,
>>> which is an unlikely use pattern.
>>
>> Hmmmm... yes, looks like a race. I can't get any errors on my side... I even
>> bumped it up to 1000 and still nothing...
>>
>> It seems your system is quite a challenge for us! :)
> 
> It is a 500MHz AMD Geode processor. I'm pretty sure your workstation has more
> horsepower ;)
> 

Yeah, I develop on an i7 :P ... It's difficult for me to hit race conditions
that only show up with slow CPU/RAM access, and this is why open source is so
fantastic! Crowdsourcing the corner cases :P :P

> 
>> This is most irregular since chmod() is *always* done after a mkdir, a socket
>> creation or shm open....
> 
> I imagine the scheduler might not agree with our definition of after ;).
> 
> Wouldn't the following be possible?
> 
> lttng-sessiond1 is running. .lttng exists
> lttng-sessiond2 creates .lttng (as it already exists, this is a no-op)
> lttng-sessiond1 gets killed. It deletes .lttng
> lttng-sessiond2 tries to chmod .lttng, which does not exist anymore

You are WAY more awake than me! Actually, I'm talking with Mathieu on this issue
right now. It is probably exactly what's happening!
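
The interleaving described above can be replayed sequentially as a minimal
shell sketch. It assumes that `mkdir -p` and `rm -rf` stand in for what each
session daemon does to its run directory. The scratch path, the variable names
and the 0770 mode are hypothetical, chosen only for illustration; this is not
the actual lttng-sessiond code.

```shell
#!/bin/sh
# Sequential replay of the suspected race (illustration only: the real
# daemons run concurrently and the scheduler decides the ordering).
base=$(mktemp -d)          # hypothetical scratch dir standing in for $HOME
rundir="$base/.lttng"

mkdir -p "$rundir"         # sessiond1 is running: .lttng exists
mkdir -p "$rundir"         # sessiond2 "creates" .lttng: a no-op, it already exists
rm -rf "$rundir"           # sessiond1 gets killed and deletes .lttng

# sessiond2 now tries to chmod a directory that no longer exists.
# The 0770 mode is an arbitrary choice for this sketch.
if chmod 0770 "$rundir" 2>/dev/null; then
    chmod_status=0
else
    chmod_status=$?
fi
echo "chmod exit status: $chmod_status"

rm -rf "$base"             # clean up the scratch dir
```

Run as-is, the final chmod reports a non-zero exit status because the
directory is gone by the time it is called, which would be consistent with the
"no such file or directory" errors mentioned in this thread.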

We'll try to come up with a fix today! I'll keep you informed.

Thanks a lot!
David

> 
>>> 60
>>> Spawning a session daemon
>>> rm: cannot remove `/home/nao/.lttng': Directory not empty
>>> Session auto-20120126-122706 created.
>>> Traces will be written in /home/nao/lttng-traces/auto-20120126-122706
>>
>> Again here, we use "rm -rf" for now ... how can you get this kind of error
>> message with "-rf" ....?
> 
> I dunno...
> 
>> Looking at the rest, you have a bind() error, Bad file descriptor and more "no
>> such file or dir...". Is your open files limit very low? Maybe there are a
>> couple of places where we don't handle the maximum open files error well.
> 
> The limit is 1024. Quite usual no?
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAEBAgAGBQJPIuBwAAoJEELoaioR9I02piYH/i4E58ATABiv4cWdOCnXg8yN
hECtzD/l3oQ64BRv+HmMKfac06jpK3cG3XmPq2Ncjdm+RE9IGgczgvcRG2hL21DS
f0pR+R+uqe/yqnb3xEYio6IiIov0o25qmSYEuAjLzY5nOqG7A1h5fnbgPL+FCoN3
uIbnMQx+eCP6GnOvuPaX7yB6Egwsz4bmUD/tfUqTVKB+q+9BfPuIDliWxINjj2iK
0lEaTYcgF9yax2tUbknQXpS0S72Q9wZ+JLDkLGqDcL0Qfp+dhEgIYQTfCAYZg7kK
HEoO8yIIpjyLJl+Dc0VuSgXVjsvSK/lEaf2r6HPApO/S1GWJh6CM7L0xXUk/6FU=
=05MT
-----END PGP SIGNATURE-----


