[lttng-dev] Lttng Soft Hang issue

Aravind HT aravind.ht at gmail.com
Wed May 27 06:19:51 EDT 2015


Could someone please help with this? I am blocked at this point: any
application crash leaves lttng in a non-working state.

Thanks,
Aravind.

On Thu, May 21, 2015 at 12:18 AM, Aravind HT <aravind.ht at gmail.com> wrote:

> Hi,
>
>
>
> I have recently started trying lttng 2.6 for a few applications on my
> Ubuntu 12.04 and I noticed that the health check on sessiond and consumerd
> failed soon after starting the session.
>
>
> On investigating, I found that thread_manage_consumer() had exited causing
> an overall health check failure.
>
>
> Here is the sequence of steps that I found led to
> *thread_manage_consumer()* exiting.
>
>
> 1.       In *thread_manage_apps():1558*, *ust_app_unregister(pollfd)* is
> called. This happens when an error is detected by *revents =
> LTTNG_POLL_GETEV(&events, i)*.
>
> My initial guess is that one of my apps crashed, which made *epoll()*
> report *LPOLLERR | LPOLLHUP | LPOLLRDHUP* and caused
> *ust_app_unregister()* to be called for that app.
>
>
>
> 2.       In *ust_app_unregister():3154*, *close_metadata(registry,
> ua_sess->consumer)* is called, which sets *registry->metadata_closed = 1*.
>
>
>
> (2.a) Note: *close_metadata()* also calls *consumer_close_metadata()*, which
> sends *LTTNG_CONSUMER_CLOSE_METADATA* and the *metadata_key* to consumerd
> to stop it from dealing any further with the app concerned. Somehow this
> doesn't seem to help.
>
>
>
> 3.       Next, I see that *thread_manage_consumer():1353*, for some
> reason, ignores 2.a above and still handles a request/reply for that app
> by calling *ust_consumer_metadata_request():491 ->
> ust_app_push_metadata(ust_reg, socket, 1)*, which at line 460 checks
> *registry->metadata_closed* and returns *-EPIPE*.
>
>
>
> 4.       This *-EPIPE* error propagates all the way back up to
> *thread_manage_consumer():1353*, at which point *thread_manage_consumer()*
> exits, causing health_check() to fail.
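
The guess in step 1 (a crashed application shows up as an error or hangup event on its descriptor) can be checked in isolation. The following is a minimal sketch using plain Linux epoll rather than the LTTNG_POLL_* wrappers, assuming that LPOLLERR, LPOLLHUP and LPOLLRDHUP correspond to EPOLLERR, EPOLLHUP and EPOLLRDHUP. The helpers peer_is_gone() and simulate_app_crash() are hypothetical names used only for illustration; they are not part of lttng-tools.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Event bits treated as "the peer is gone", as in the check from step 1. */
#define PEER_GONE_MASK (EPOLLERR | EPOLLHUP | EPOLLRDHUP)

/*
 * Hypothetical helper: poll `fd` once without blocking and report whether
 * the kernel flagged an error or hangup on it.
 * Returns 1 if the peer is gone, 0 if it is not, -1 if epoll itself failed.
 */
int peer_is_gone(int fd)
{
	struct epoll_event ev, out;
	int epfd, n, gone = 0;

	assert(fd >= 0);
	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN | EPOLLRDHUP;
	ev.data.fd = fd;

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}
	n = epoll_wait(epfd, &out, 1, 0);	/* timeout 0: do not block */
	if (n < 0)
		gone = -1;
	else if (n == 1 && (out.events & PEER_GONE_MASK))
		gone = 1;
	close(epfd);
	return gone;
}

/*
 * Simulate an application crash: create a connected socket pair, close the
 * "application" end, and check the "daemon" end before and after.
 * Returns 1 if the hangup was seen only after the close, 0 otherwise.
 */
int simulate_app_crash(void)
{
	int sv[2], before, after;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 0;
	before = peer_is_gone(sv[0]);	/* peer still alive */
	close(sv[1]);			/* the "application" dies */
	after = peer_is_gone(sv[0]);	/* peer closed its end */
	close(sv[0]);
	return before == 0 && after == 1;
}
```

Calling simulate_app_crash() from a small test program is enough to see whether a closed peer produces one of the three event bits on a given system.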
>
>
>
>
>
> So it looks like, in some scenarios, an application crash can leave
> lttng in a state where the health check fails.
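
To make the chain in steps 2 to 4 concrete, here is a toy model of it. This is only a sketch of the control flow as I understand it, not lttng-tools code: struct toy_registry and every toy_* function are hypothetical stand-ins, and the real functions take different arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-application metadata registry. */
struct toy_registry {
	int metadata_closed;	/* set to 1 once the app is unregistered */
};

/* Step 2: unregistering the app marks its metadata as closed. */
void toy_unregister(struct toy_registry *reg)
{
	assert(reg != NULL);
	reg->metadata_closed = 1;
}

/* Step 3: pushing metadata for a closed registry fails with -EPIPE. */
int toy_push_metadata(const struct toy_registry *reg)
{
	assert(reg != NULL);
	return reg->metadata_closed ? -EPIPE : 0;
}

/*
 * Step 4: the management loop treats any negative return value as fatal.
 * regs[i] is the registry targeted by the i-th incoming metadata request.
 * Returns the number of requests served before the loop exited.
 */
size_t toy_manage_consumer(struct toy_registry *const *regs, size_t nr_requests)
{
	size_t served = 0;
	size_t i;

	for (i = 0; i < nr_requests; i++) {
		if (toy_push_metadata(regs[i]) < 0)
			break;	/* the thread exits; later requests are never served */
		served++;
	}
	return served;
}
```

In this model, one request for the unregistered application stops the loop, so requests for healthy applications queued behind it are never served. That matches the symptom of the whole health check failing after a single crash.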
>
>
>
> I think a possible fix for this scenario is, instead of step 4, to send an
> *ERROR* message back to consumerd. This could be done from the
> *ust_consumer_metadata_request()* call. Can someone please let me know if
> this is correct and shed more light on the issue?
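
The idea can be sketched on a toy model. Again, this is an illustration under assumptions and not a patch: enum toy_reply, struct toy_registry and the toy_* functions are hypothetical, and whether answering with an error is actually safe for consumerd is exactly the open question.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-application metadata registry. */
struct toy_registry {
	int metadata_closed;
};

/* Hypothetical reply codes sent back to the consumer daemon. */
enum toy_reply {
	TOY_REPLY_OK = 0,
	TOY_REPLY_ERROR = 1
};

/* Same check as in step 3: a closed registry yields -EPIPE. */
int toy_push_metadata(const struct toy_registry *reg)
{
	assert(reg != NULL);
	return reg->metadata_closed ? -EPIPE : 0;
}

/*
 * Proposed behaviour: turn -EPIPE into an error reply for this one request
 * and return 0, so the caller keeps running. Any other negative value is
 * still propagated as a fatal error.
 */
int toy_metadata_request(const struct toy_registry *reg, enum toy_reply *reply)
{
	int ret = toy_push_metadata(reg);

	if (ret == -EPIPE) {
		*reply = TOY_REPLY_ERROR;
		return 0;
	}
	if (ret < 0)
		return ret;
	*reply = TOY_REPLY_OK;
	return 0;
}

/* Management loop: exits only on errors that the request handler propagates. */
size_t toy_manage_consumer(struct toy_registry *const *regs,
		enum toy_reply *replies, size_t nr_requests)
{
	size_t served = 0;
	size_t i;

	for (i = 0; i < nr_requests; i++) {
		if (toy_metadata_request(regs[i], &replies[i]) < 0)
			break;
		served++;
	}
	return served;
}
```

With this change the request for the dead application gets an error reply, and the loop goes on to serve the requests that follow it.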
>
>
>
>
>
> Please forgive me if I have missed any of the posting guidelines.
> This is my first post.
>
>
> Regards,
>
> Aravind.
>