[lttng-dev] lttng soft hang issue

Aravind HT aravind.ht at gmail.com
Tue May 19 07:06:06 EDT 2015


Hi,



I recently started trying lttng 2.6 with a few applications on Ubuntu
12.04, and I noticed that the health check on sessiond and consumerd
fails soon after the session is started.


On investigating, I found that *thread_manage_consumer()* had exited,
causing an overall health check failure.


Here is the sequence of steps that I found led to
*thread_manage_consumer()* exiting.


1.       In *thread_manage_apps():1558*, *ust_app_unregister(pollfd)* is
being called. This happens when an error is detected in *revents =
LTTNG_POLL_GETEV(&events, i)*.

My initial guess is that one of my apps crashed, causing *epoll()* to
report *LPOLLERR | LPOLLHUP | LPOLLRDHUP*, which in turn causes
*ust_app_unregister()* to be called for that app.



2.       In *ust_app_unregister():3154*, *close_metadata(registry,
ua_sess->consumer)* is called, which sets *registry->metadata_closed = 1*.



(2.a) Note: *close_metadata()* also calls *consumer_close_metadata()*,
which sends *LTTNG_CONSUMER_CLOSE_METADATA* and the *metadata_key* to the
consumerd to stop it from dealing further with the app concerned. Somehow
this doesn't seem to help.



3.       Next, I see that *thread_manage_consumer():1353* has for some
reason ignored 2.a above and goes on to handle a request/reply for that
app by calling *ust_consumer_metadata_request():491 ->
ust_app_push_metadata(ust_reg, socket, 1)*, which at line 460 checks
*registry->metadata_closed* and returns *-EPIPE*.



4.       This *-EPIPE* error propagates all the way back up to
*thread_manage_consumer():1353*, at which point *thread_manage_consumer()*
decides to exit, causing health_check() to fail.
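
To make the chain in steps 2 to 4 easier to follow, here is a minimal,
self-contained model of it. This is only a sketch and not the lttng-tools
code: every name with the model_ prefix is hypothetical, and the only
detail taken from the real code is that a closed metadata flag leads to
-EPIPE, which the consumer management loop treats as fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-app registry; only the flag matters. */
struct model_registry {
    bool metadata_closed;
};

/* Step 2: unregistering the app marks its metadata as closed. */
static void model_app_unregister(struct model_registry *reg)
{
    assert(reg != NULL);
    reg->metadata_closed = true;
}

/* Step 3: a metadata push after the close fails with -EPIPE. */
static int model_push_metadata(const struct model_registry *reg)
{
    assert(reg != NULL);
    return reg->metadata_closed ? -EPIPE : 0;
}

/*
 * Step 4: the management loop treats any negative return as fatal and
 * stops. Returns how many requests were served before the loop ended.
 */
static int model_manage_consumer(const struct model_registry *reg,
                                 int pending_requests)
{
    int served = 0;

    while (served < pending_requests) {
        if (model_push_metadata(reg) < 0) {
            break; /* thread exits here; the health check would then fail */
        }
        served++;
    }
    return served;
}
```

In this model, a metadata request that arrives after
model_app_unregister() makes the loop stop without serving anything,
which matches the behaviour I am seeing.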





So it looks like, in some scenarios, an application crash can cause
*thread_manage_consumer()* to exit and the lttng health check to fail.



I think a possible fix for this scenario is, instead of exiting as in
step 4, to send an *ERROR* message back to the consumerd. This could be
done from the *ust_consumer_metadata_request()* call. Can someone please
let me know whether this is correct and shed more light on the issue?
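
Roughly, the idea could look like the sketch below. Again this is a
hypothetical model and not a patch against lttng-tools: the fix_ names
and the reply enum are made up for illustration, and I am assuming that
the consumerd can be answered with an error status for a single request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registry model; only the closed flag is represented. */
struct fix_registry {
    bool metadata_closed;
};

/* Hypothetical reply status sent back to the consumerd. */
enum fix_reply {
    FIX_REPLY_OK,
    FIX_REPLY_ERROR
};

/* Same behaviour as in step 3: closed metadata yields -EPIPE. */
static int fix_push_metadata(const struct fix_registry *reg)
{
    return reg->metadata_closed ? -EPIPE : 0;
}

/*
 * Handle one metadata request. A closed registry is answered with an
 * error reply and reported to the caller as handled (0), so the
 * management thread keeps running. Other errors are still propagated.
 */
static int fix_metadata_request(const struct fix_registry *reg,
                                enum fix_reply *reply)
{
    int ret;

    assert(reg != NULL && reply != NULL);

    ret = fix_push_metadata(reg);
    if (ret == -EPIPE) {
        *reply = FIX_REPLY_ERROR; /* tell the consumerd, do not exit */
        return 0;
    }
    if (ret < 0) {
        return ret; /* genuine failure, unchanged behaviour */
    }
    *reply = FIX_REPLY_OK;
    return 0;
}
```

The point of the design is that only the -EPIPE case from a closed
registry is turned into a reply; any other negative return still reaches
the caller as before.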





Please forgive me if I have missed any of the guidelines for posting here.
This is my first post.


Regards,

Aravind.