[lttng-dev] Lttng Soft Hang issue

Jérémie Galarneau jeremie.galarneau at efficios.com
Mon Jul 6 11:23:38 EDT 2015


Hi,

What do you observe when this happens? Does the session daemon become
unresponsive?

Jérémie

On Wed, May 27, 2015 at 6:19 AM, Aravind HT <aravind.ht at gmail.com> wrote:

> Could someone kindly help with this? I am blocked at this point and unable
> to continue, as any application crash leaves lttng not working.
>
> Thanks,
> Aravind.
>
> On Thu, May 21, 2015 at 12:18 AM, Aravind HT <aravind.ht at gmail.com> wrote:
>
>> Hi,
>>
>> I have recently started trying LTTng 2.6 with a few applications on my
>> Ubuntu 12.04 machine, and I noticed that the health checks on sessiond and
>> consumerd failed soon after starting the session.
>>
>> On investigating, I found that *thread_manage_consumer()* had exited,
>> causing the overall health check to fail.
>>
>> Here is the sequence of steps that I found leads to
>> *thread_manage_consumer()* exiting:
>>
>> 1. In *thread_manage_apps():1558*, *ust_app_unregister(pollfd)* is
>> called. This happens when *revents = LTTNG_POLL_GETEV(&events, i)*
>> reports an error condition.
>>
>> My initial guess is that one of my apps crashed, so that *epoll()*
>> reported *LPOLLERR | LPOLLHUP | LPOLLRDHUP* on its socket, which caused
>> *ust_app_unregister()* to be called for that app.
>>
>> 2. In *ust_app_unregister():3154*, *close_metadata(registry,
>> ua_sess->consumer)* is called, which sets *registry->metadata_closed = 1*.
>>
>> (2.a) Note: *close_metadata()* also calls *consumer_close_metadata()*,
>> which sends *LTTNG_CONSUMER_CLOSE_METADATA* and the *metadata_key* to
>> consumerd so that it stops dealing with the concerned app. Somehow this
>> doesn't seem to help.
>>
>> 3. Next, I see that *thread_manage_consumer():1353* has, for some reason,
>> not taken 2.a into account and still handles a metadata request/reply for
>> that app by calling *ust_consumer_metadata_request():491 ->
>> ust_app_push_metadata(ust_reg, socket, 1)*, which at line 460 checks
>> *registry->metadata_closed* and returns *-EPIPE*.
>>
>> 4. This *-EPIPE* error propagates all the way back up to
>> *thread_manage_consumer():1353*, at which point *thread_manage_consumer()*
>> exits, causing *health_check()* to fail.
>>
>> So it looks like, in some scenarios, an application crash can make the
>> session daemon's consumer management thread exit and stop LTTng from
>> working.
>>
>> I think a possible fix for this scenario is, instead of exiting as in
>> step 4, to send an *ERROR* message back to consumerd. This could be done
>> from the *ust_consumer_metadata_request()* call. Can someone please let me
>> know whether this is correct and shed more light on the issue?
>>
>> Please forgive any omissions of the posting guidelines on my part; this
>> is my first post here.
>>
>> Regards,
>>
>> Aravind.
>>
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
>
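
To illustrate step 1 of the report, here is a small, self-contained C sketch (Linux-only, plain epoll, not lttng-tools code). It shows that when one end of a UNIX stream socket pair is closed, as happens when a traced application crashes, epoll reports hang-up flags on the surviving end. LTTng's LPOLL* flags are assumed here to correspond to the EPOLL* ones; peer_gone() and simulate_app_crash() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: true if the event mask says the peer is gone
 * (error, hang-up, or the peer shut down its writing side). */
static bool peer_gone(uint32_t revents)
{
    return (revents & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) != 0;
}

/* Simulates an application crash: the "app" end of a UNIX stream
 * socket pair is closed, then epoll is polled on the "daemon" end.
 * Returns the observed event mask, or 0 on setup failure or timeout. */
static uint32_t simulate_app_crash(void)
{
    int sv[2];
    uint32_t revents = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;

    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd >= 0) {
        struct epoll_event ev = {
            .events = EPOLLIN | EPOLLRDHUP,
            .data.fd = sv[0],
        };

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0) {
            struct epoll_event out;

            close(sv[1]);              /* the application "crashes" */
            sv[1] = -1;
            if (epoll_wait(epfd, &out, 1, 1000) == 1)
                revents = out.events;
        }
        close(epfd);
    }

    close(sv[0]);
    if (sv[1] >= 0)
        close(sv[1]);
    return revents;
}
```

On Linux, peer_gone(simulate_app_crash()) is expected to be true, because the kernel marks the surviving end of a stream socket pair as fully shut down once its peer is closed.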
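
Steps 2 to 4 and the proposed fix can be modelled roughly as below. This is a simplified, hypothetical sketch, not lttng-tools code: struct registry, model_close_metadata(), model_push_metadata() and model_handle_request() are invented names that only mirror the behaviour described in the report (a closed flag set on unregister, a push that returns -EPIPE once that flag is set, and a loop that either exits or replies with an error).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-session registry; only the fields
 * relevant to this report are modelled. */
struct registry {
    pthread_mutex_t lock;
    bool metadata_closed;
};

/* Step 2 (model): the app is gone, so its metadata is marked closed. */
static void model_close_metadata(struct registry *reg)
{
    pthread_mutex_lock(&reg->lock);
    reg->metadata_closed = true;
    pthread_mutex_unlock(&reg->lock);
}

/* Step 3 (model): a metadata push arriving after the close is refused. */
static int model_push_metadata(struct registry *reg)
{
    int ret = 0;

    pthread_mutex_lock(&reg->lock);
    if (reg->metadata_closed)
        ret = -EPIPE;
    /* ...otherwise pending metadata would be sent to consumerd here... */
    pthread_mutex_unlock(&reg->lock);
    return ret;
}

enum outcome {
    OUTCOME_REPLIED_OK,    /* metadata pushed, loop keeps running */
    OUTCOME_REPLIED_ERROR, /* error reply sent, loop keeps running */
    OUTCOME_THREAD_EXIT,   /* loop stops, health check fails */
};

/* Step 4 vs. the proposed fix (model). With exit_on_error set, any
 * failure stops the loop, as reported; otherwise -EPIPE is turned into
 * an error reply to consumerd and the loop continues. Other errors
 * still stop the loop in both modes. */
static enum outcome model_handle_request(struct registry *reg,
                                         bool exit_on_error)
{
    int ret = model_push_metadata(reg);

    if (ret == 0)
        return OUTCOME_REPLIED_OK;
    if (ret == -EPIPE && !exit_on_error)
        return OUTCOME_REPLIED_ERROR;
    return OUTCOME_THREAD_EXIT;
}
```

With exit_on_error set to true, a request arriving after model_close_metadata() yields OUTCOME_THREAD_EXIT, matching the reported failure; with it set to false, the same request yields OUTCOME_REPLIED_ERROR and the loop keeps serving other applications. Whether consumerd copes gracefully with such an error reply is not established by this sketch.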


-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com