<div dir="ltr"><p class="" style="margin-left:0in">I have gone through a few scenarios. My analysis of the problem is below, and my proposed solution is at the bottom.</p><p class="" style="margin-left:0in">Please let me know if this is correct.</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">The brief analysis is:</p><p class="" style="margin-left:0in">1.       Whenever a new stream is added:</p><p class="" style="margin-left:0in"><span class="" style="white-space:pre">     </span>a.      A stream object containing path_name, channel_name, recv_list and other fields is allocated and populated.</p><p class="" style="margin-left:0in"><span class="" style="white-space:pre">        </span>b.      The address of recv_list, which is a member of the stream object, is added to the conn->recv_head list. conn represents a connection that is made up of multiple streams.</p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in">2.       In one scenario, when a stream is closed (RELAYD_CLOSE_STREAM), the memory allocated in 1.a is freed through a call to ctf_trace_destroy() -> stream_delete() and stream_destroy(). However, the entry in conn->recv_head is not removed, so it is now a dangling pointer. A stream object can be closed in multiple cases, and each will cause the same dangling pointer problem. The attached .xps document shows quite a few code paths in which this can happen, which adds to the complexity.</p><p class="" style="margin-left:0in">3.       Among the code paths that access the conn->recv_head list are set_viewer_ready_flag() and the addition of a new stream through relay_add_stream(). Both iterate over the list and try to modify it. There may be other cases as well.</p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in">In short, there is a linked list whose nodes live in memory regions that get freed, but the nodes are not removed from the list at that point. 
Any subsequent access to this list can then hit freed memory.</p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">Fixes attempted so far:</p><p class="" style="margin-left:0in"><a href="https://github.com/lttng/lttng-tools/commit/cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80">https://github.com/lttng/lttng-tools/commit/cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80</a></p><p class="" style="margin-left:0in"><a href="https://github.com/lttng/lttng-tools/commit/1dc0526df43f2b5f86ef451e4c0331445346b15f">https://github.com/lttng/lttng-tools/commit/1dc0526df43f2b5f86ef451e4c0331445346b15f</a></p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">The above fixes remove the node from conn->recv_head using cds_list_del() and attempt to do so from multiple places:</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">1. relay_close_stream()  </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">There is nothing wrong with removing it here using cds_list_del().</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">2. connection_destroy()</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">This is where the corruption problem is not solved. 
This is because it gets called from destroy_connection()</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">static void destroy_connection(struct lttng_ht *relay_connections_ht, struct relay_connection *conn)</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">{</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                assert(relay_connections_ht);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                assert(conn);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                connection_delete(relay_connections_ht, conn);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                /* For the control socket, we try to destroy the session. 
*/</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                if (conn->type == RELAY_CONTROL && conn->session) {</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                                destroy_session(conn->session, conn->sessions_ht);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                }</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                connection_destroy(conn);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">}</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">But, destroy_session() above calls ctf_trace_destroy() which frees up the memory allocated for the streams, thus a later attempt by connection_destroy() to iterate over the conn->recv_head list will be a problem as it now contains dangling pointers.</p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in">3. 
From my side, I thought that just calling cds_list_del() from relay_close_stream() should fix it, however, when set_viewer_ready_flag() is called</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">static void set_viewer_ready_flag(struct relay_connection *conn)</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">{</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                struct relay_stream *stream, *tmp_stream;</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                pthread_mutex_lock(&conn->session->viewer_ready_lock);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                cds_list_for_each_entry_safe(stream, tmp_stream, &conn->recv_head,</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                                                recv_list) {</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                                stream->viewer_ready = 1;</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                                cds_list_del(&stream->recv_list);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                }</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                pthread_mutex_unlock(&conn->session->viewer_ready_lock);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                return;</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">}</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">It iterates over the entire conn->recv_head and removes all the entries. 
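</p><p class="" style="margin-left:0in">Because the loop above already unlinks every stream, any later cds_list_del() on the same node would be a second removal. Below is a minimal sketch of one way to make a repeated removal detectable, assuming a plain circular doubly linked list. The names list_node, list_del_reset(), list_is_linked() and list_del_once() are hypothetical stand-ins written for this illustration; they are not the liburcu cds_list API and not the relayd code.</p><pre>
```c
/* Hypothetical stand-in for a circular doubly linked list node,
 * loosely modelled on struct cds_list_head (not the liburcu code). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert node right after head. */
static void list_add(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink the node, then clear its pointers so a second call is detectable. */
static void list_del_reset(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = 0;
	node->prev = 0;
}

/* Returns 1 if the node is still linked, 0 if it was already removed. */
static int list_is_linked(const struct list_node *node)
{
	return node->next != 0 && node->prev != 0;
}

/* Guarded removal: returns 1 if it unlinked the node, 0 if nothing to do. */
static int list_del_once(struct list_node *node)
{
	if (!list_is_linked(node))
		return 0;
	list_del_reset(node);
	return 1;
}

/* Walks the scenario; returns 0 when every step behaves as expected. */
static int demo_double_removal(void)
{
	struct list_node head, a, b;

	list_init(&head);
	list_add(&a, &head);
	list_add(&b, &head);

	if (list_del_once(&a) != 1)
		return 1;	/* first removal must unlink the node */
	if (list_del_once(&a) != 0)
		return 2;	/* second removal must be a no-op */
	if (head.next != &b || head.prev != &b)
		return 3;	/* the remaining node must be intact */
	if (list_del_once(&b) != 1)
		return 4;
	if (head.next != &head || head.prev != &head)
		return 5;	/* the list must be empty again */
	return 0;
}
```
</pre><p class="" style="margin-left:0in">In this sketch demo_double_removal() returns 0 when the second removal of the same node is ignored and the rest of the list stays consistent. Note that the guard only works if every removal path clears the pointers, and it does nothing to protect against concurrent access.</p><p class="" style="margin-left:0in">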
So my previous solution would try to remove an entry that has already been removed from the list, causing further corruption (removal from the list using cds_list_del() does not have any safety checks).</p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in">The solution that I am thinking of is below:</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">1.  Wherever cds_list_del(&stream->recv_list) gets called, reset the pointers to NULL afterwards:</p><p class="" style="margin-left:0in">    stream->recv_list.next = stream->recv_list.prev = NULL;</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">2. Remove the entry from the conn->recv_head list only in stream_delete(), after checking for the above NULLs and the stream->viewer_ready flag.</p><p class="" style="margin-left:0in">   The NULL set/check may not be needed, as we could manage with the viewer_ready flag alone, but in case we have missed some other scenario, the NULLs would help us identify and fix it faster.</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in">/---------------------------------------------------------------------/</p><p class="" style="margin-left:0in">The patch with the fix:</p><p class="" style="margin-left:0in">/---------------------------------------------------------------------/</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">diff --git a/src/bin/lttng-relayd/main.c b/src/bin/lttng-relayd/main.c</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">index cc68410..da1b2e2 100644</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">--- a/src/bin/lttng-relayd/main.c</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+++ b/src/bin/lttng-relayd/main.c</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">@@ 
-1153,6 +1153,8 @@ void set_viewer_ready_flag(struct relay_connection *conn)</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                        recv_list) {</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                stream->viewer_ready = 1;</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">                cds_list_del(&stream->recv_list);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+               /* Reset the removed node memory, we shall check in other places for this */</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+               memset(&stream->recv_list, 0, sizeof(struct cds_list_head));</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">        }</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">        pthread_mutex_unlock(&conn->session->viewer_ready_lock);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">        return;</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">diff --git a/src/bin/lttng-relayd/stream.c b/src/bin/lttng-relayd/stream.c</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">index 410fae8..7043797 100644</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">--- a/src/bin/lttng-relayd/stream.c</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+++ b/src/bin/lttng-relayd/stream.c</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">@@ -144,6 +144,13 @@ void stream_delete(struct lttng_ht *ht, struct relay_stream *stream)</p><p class="" 
style="margin-left:0in"><br></p><p class="" style="margin-left:0in">        assert(!ret);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> </p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">        cds_list_del(&stream->trace_list);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+/* Before we attempt to remove the stream from conn->recv_head list, need to check if it was already removed from set_viewer_ready_flag() */</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+       if(stream->viewer_ready == 0) {</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+               cds_list_del(&stream->recv_list);</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+               /* Reset the removed node memory, we shall check in other places for this */</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+               memset(&stream->recv_list, 0, sizeof(struct cds_list_head));</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+       }</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in">+</p><p class="" style="margin-left:0in"><br></p><p class="" style="margin-left:0in"> }</p><p class="" style="margin-left:0in"><br></p></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 14, 2015 at 12:59 AM, Aravind HT <span dir="ltr"><<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="color:rgb(0,134,179);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)">cds_list_del</span><span 
style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)">(&stream->recv_list) definitely needs to be called from relay_close_stream() else </span><span style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)">conn->recv_head will continue to have a stale pointer(for streams that got closed) until </span></div><div><span style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)">the entire connection is destroyed, and conversely, connection_destruction() means destroying the conn->recv_head list which means iterating/removing entries of streams ( some of them could be stale,</span></div><div><span style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)">if relay_close_stream() does not do cds_list_del() ) and in which case, </span><span style="color:rgb(0,134,179);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)">cds_list_del</span><span style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)">(&stream->recv_list) needs to be called also.</span></div><div><span style="color:rgb(51,51,51);font-family:Consolas,'Liberation Mono',Menlo,Courier,monospace;font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)"><br></span></div><div><font color="#333333" 
face="Consolas, Liberation Mono, Menlo, Courier, monospace"><span style="font-size:12px;line-height:16.7999992370605px;white-space:pre-wrap;background-color:rgb(255,236,236)">So removing either of them, I think, is not good, and in this case </span></font><a href="https://github.com/lttng/lttng-tools/commit/1dc0526df43f2b5f86ef451e4c0331445346b15f" target="_blank">https://github.com/lttng/lttng-tools/commit/1dc0526df43f2b5f86ef451e4c0331445346b15f</a> could be reverted.</div><div><br></div><div><br></div><div> *** However *** , </div><div>there could also be a synchronization issue if relay_close_stream() and connection_destroy() are not serialized and can run concurrently (stream->recv_list would then be accessed from two contexts). This needs to be verified before reverting <a href="https://github.com/lttng/lttng-tools/commit/1dc0526df43f2b5f86ef451e4c0331445346b15f" target="_blank">https://github.com/lttng/lttng-tools/commit/1dc0526df43f2b5f86ef451e4c0331445346b15f</a>  </div><div><br></div><div>I also do not know what issues were seen that prompted the reversal of <a href="http://git.lttng.org/?p=lttng-tools.git;a=commitdiff;h=cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80" target="_blank">http://git.lttng.org/?p=lttng-tools.git;a=commitdiff;h=cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80</a></div><div><br></div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 14, 2015 at 12:17 AM, Aravind HT <span dir="ltr"><<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The specific change in <a href="http://git.lttng.org/?p=lttng-tools.git;a=commitdiff;h=cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80" target="_blank">http://git.lttng.org/?p=lttng-tools.git;a=commitdiff;h=cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80</a><div>that I need is for 
<span style="color:rgb(0,0,0);font-family:Consolas,'Bitstream Vera Sans Mono',monospace;font-size:13px;line-height:18.2000007629395px;white-space:pre-wrap;background-color:rgb(221,255,221)">cds_list_del(&stream->recv_list); to be called. </span></div><div><span style="color:rgb(0,0,0);font-family:Consolas,'Bitstream Vera Sans Mono',monospace;font-size:13px;line-height:18.2000007629395px;white-space:pre-wrap;background-color:rgb(221,255,221)">Below is the patch extract.</span></div><div><span style="font-size:13px;color:rgb(0,0,0);font-family:Consolas,'Bitstream Vera Sans Mono',monospace;line-height:18.2000007629395px;white-space:pre-wrap;background-color:rgb(234,242,245)"><br></span></div><div><span style="font-size:13px;color:rgb(0,0,0);font-family:Consolas,'Bitstream Vera Sans Mono',monospace;line-height:18.2000007629395px;white-space:pre-wrap;background-color:rgb(234,242,245)">diff --git </span><a href="http://git.lttng.org/?p=lttng-tools.git;a=blob;f=src/bin/lttng-relayd/main.c;h=a93151ac47f560875f07c9cbf113225f9dd892cc" style="font-size:13px;margin:0px;padding:0px;border:0px;outline:none;vertical-align:baseline;color:rgb(65,131,196);text-decoration:none;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;line-height:18.2000007629395px;white-space:pre-wrap;background:transparent" target="_blank">a/src/bin/lttng-relayd/main.c</a><span style="font-size:13px;color:rgb(0,0,0);font-family:Consolas,'Bitstream Vera Sans Mono',monospace;line-height:18.2000007629395px;white-space:pre-wrap;background-color:rgb(234,242,245)"> </span><a href="http://git.lttng.org/?p=lttng-tools.git;a=blob;f=src/bin/lttng-relayd/main.c;h=d82a3412d7b8578be8f5b39abf0a9fd949ff07bc;hb=cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80" style="font-size:13px;margin:0px;padding:0px;border:0px;outline:none;vertical-align:baseline;color:rgb(65,131,196);text-decoration:none;font-family:Consolas,'Bitstream Vera Sans 
Mono',monospace;line-height:18.2000007629395px;white-space:pre-wrap;background:transparent" target="_blank">b/src/bin/lttng-relayd/main.c</a></div><div><br></div><div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;width:1480.5625px;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)">index <a href="http://git.lttng.org/?p=lttng-tools.git;a=blob;f=src/bin/lttng-relayd/main.c;h=a93151ac47f560875f07c9cbf113225f9dd892cc" style="margin:0px;padding:0px;border:0px;outline:none;vertical-align:baseline;color:rgb(65,131,196);text-decoration:none;background:transparent" target="_blank">a93151a</a>..<a href="http://git.lttng.org/?p=lttng-tools.git;a=blob;f=src/bin/lttng-relayd/main.c;h=d82a3412d7b8578be8f5b39abf0a9fd949ff07bc;hb=cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80" style="margin:0px;padding:0px;border:0px;outline:none;vertical-align:baseline;color:rgb(65,131,196);text-decoration:none;background:transparent" target="_blank">d82a341</a> 100644<span style="margin:0px;padding:0px;border:0px;outline:0px;vertical-align:baseline;background:transparent"> (file)</span><br>
</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(170,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)">--- a/<a href="http://git.lttng.org/?p=lttng-tools.git;a=blob;f=src/bin/lttng-relayd/main.c;h=a93151ac47f560875f07c9cbf113225f9dd892cc" style="margin:0px;padding:0px;border:0px;outline:none;vertical-align:baseline;color:rgb(170,0,0);text-decoration:none;background:transparent" target="_blank">src/bin/lttng-relayd/main.c</a></div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,112,0);line-height:18.2000007629395px;background:rgb(248,248,248)">+++ b/<a href="http://git.lttng.org/?p=lttng-tools.git;a=blob;f=src/bin/lttng-relayd/main.c;h=d82a3412d7b8578be8f5b39abf0a9fd949ff07bc;hb=cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80" style="margin:0px;padding:0px;border:0px;outline:none;vertical-align:baseline;color:rgb(0,112,0);text-decoration:none;background:transparent" target="_blank">src/bin/lttng-relayd/main.c</a></div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;color:rgb(153,153,153);white-space:pre-wrap;line-height:18.2000007629395px;background:rgb(234,242,245)"><span style="margin:0px;padding:0px;border:0px;outline:0px;vertical-align:baseline;background:transparent">@@ <a href="http://git.lttng.org/?p=lttng-tools.git;a=blob;f=src/bin/lttng-relayd/main.c;h=a93151ac47f560875f07c9cbf113225f9dd892cc#l1340" style="margin:0px;padding:0px;border:0px;outline:none;vertical-align:baseline;color:rgb(65,131,196);text-decoration:none;background:transparent" target="_blank">-1340,6</a> <a 
href="http://git.lttng.org/?p=lttng-tools.git;a=blob;f=src/bin/lttng-relayd/main.c;h=d82a3412d7b8578be8f5b39abf0a9fd949ff07bc;hb=cd2ef1ef1d54ced9e4d0d03b865bb7fc6a905f80#l1340" style="margin:0px;padding:0px;border:0px;outline:none;vertical-align:baseline;color:rgb(65,131,196);text-decoration:none;background:transparent" target="_blank">+1340,18</a> @@</span><span style="margin:0px;padding:0px;border:0px;outline:0px;vertical-align:baseline;background:transparent"> int relay_close_stream(struct lttcomm_relayd_hdr *recv_hdr,</span></div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)">        session->stream_count--;</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)">        assert(session->stream_count >= 0);</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)"> </div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+       /*</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        * Remove the stream from the connection recv list since we are about 
to</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        * flag it invalid and thus might be freed. This has to be done here since</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        * only the control thread can do actions on that list.</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        *</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        * Note that this stream might NOT be in the list but we have to try to</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        * remove it here else this can race with the stream destruction freeing</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        * the object and the connection destroy doing a use after free when</div><div 
style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        * deleting the remaining nodes in this list.</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+        */</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+       cds_list_del(&stream->recv_list);</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(221,255,221)">+</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)">        /* Check if we can close it or else the data will do it. 
*/</div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)">        try_close_stream(session, stream);</div></div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)"><br></div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)"><br></div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)"><br></div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)"><br></div><div style="margin:0px;padding:0px;border:0px;outline:0px;font-size:13px;vertical-align:baseline;font-family:Consolas,'Bitstream Vera Sans Mono',monospace;white-space:pre-wrap;color:rgb(0,0,0);line-height:18.2000007629395px;background:rgb(248,248,248)"><br></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 13, 2015 at 9:04 PM, Jérémie Galarneau <span dir="ltr"><<a href="mailto:jeremie.galarneau@efficios.com" target="_blank">jeremie.galarneau@efficios.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div 
dir="ltr">Do you have a specific changeset in mind when you say you applied the changes mentioned in that thread?<div><br></div><div>Do you simply revert the following commit?</div><div><a href="https://github.com/lttng/lttng-tools/commit/1dc0526df43f2b5f86ef451e4c0331445346b15f" target="_blank">https://github.com/lttng/lttng-tools/commit/1dc0526df43f2b5f86ef451e4c0331445346b15f</a></div><div><br></div><div>Thanks,</div><div>Jérémie<div><div><br><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Jul 11, 2015 at 3:26 AM, Aravind HT <span dir="ltr"><<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">To try this fix, I also needed the changes talked about in <a href="http://lists.lttng.org/pipermail/lttng-dev/2015-July/024689.html" target="_blank">http://lists.lttng.org/pipermail/lttng-dev/2015-July/024689.html</a> , otherwise, I would see relayd coring.<div>Once I took them I could see that the soft hang issue was no longer reproducible. 
</div><div><br></div><div>Thanks for helping.</div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 8, 2015 at 9:49 AM, Aravind HT <span dir="ltr"><<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Sure, I will try the fix and update.</div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 6, 2015 at 10:12 PM, Jérémie Galarneau <span dir="ltr"><<a href="mailto:jeremie.galarneau@efficios.com" target="_blank">jeremie.galarneau@efficios.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Would you mind testing Mathieu's patch?<div><br></div><div>I have rebased it on stable-2.6:</div><div><div>git clone <a href="https://github.com/jgalar/lttng-tools.git" target="_blank">https://github.com/jgalar/lttng-tools.git</a> -b hang-fix<br></div></div><div><br></div><div><div>Thanks,</div><div>Jérémie</div></div><div><br></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 6, 2015 at 11:23 AM, Jérémie Galarneau <span dir="ltr"><<a href="mailto:jeremie.galarneau@efficios.com" target="_blank">jeremie.galarneau@efficios.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>What do you observe when this happens? 
Does the session daemon become unresponsive?</div><div><br></div><div>Jérémie</div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div>On Wed, May 27, 2015 at 6:19 AM, Aravind HT <span dir="ltr"><<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div><div dir="ltr">Could someone kindly help with this? I am blocked at this point and unable to continue, because any application crash leaves lttng not working.<div><br></div><div>Thanks,</div><div>Aravind.</div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 21, 2015 at 12:18 AM, Aravind HT <span dir="ltr"><<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><p class="MsoNormal" style="font-size:12.8000001907349px">Hi,</p><p class="MsoNormal" style="font-size:12.8000001907349px"> </p><p class="MsoNormal" style="font-size:12.8000001907349px">I recently started trying lttng 2.6 with a few applications on my Ubuntu 12.04 machine, and I noticed that the health check on sessiond and consumerd failed soon after starting the session.</p><p class="MsoNormal" style="font-size:12.8000001907349px"><br></p><p class="MsoNormal" style="font-size:12.8000001907349px">On investigating, I found that thread_manage_consumer() had exited, causing an overall health check failure.</p><p class="MsoNormal" style="font-size:12.8000001907349px"><br></p><p class="MsoNormal" style="font-size:12.8000001907349px">Here is the sequence of steps that I found led to <b>thread_manage_consumer()</b> exiting.</p><p class="MsoNormal" 
style="font-size:12.8000001907349px"></p><p class="MsoNormal" style="font-size:12.8000001907349px"><br></p><p style="font-size:12.8000001907349px">1.<span style="font-stretch:normal;font-size:7pt;font-family:'Times New Roman'">       </span>At <b>thread_manage_apps():1558</b>, <b>ust_app_unregister(pollfd)</b> is called. This happens when an error is detected by <b>revents = LTTNG_POLL_GETEV(&events, i)</b>.</p><p style="font-size:12.8000001907349px">My initial guess is that one of my apps crashed, causing <b>epoll()</b> to report <b>LPOLLERR | LPOLLHUP | LPOLLRDHUP</b>, which in turn caused <b>ust_app_unregister()</b> to be called for that app.</p><p style="font-size:12.8000001907349px"> </p><p style="font-size:12.8000001907349px">2.<span style="font-stretch:normal;font-size:7pt;font-family:'Times New Roman'">       </span>At <b>ust_app_unregister():3154</b>, <b>close_metadata(registry, ua_sess->consumer)</b> is called, which sets <b>registry->metadata_closed = 1</b>.</p><p style="font-size:12.8000001907349px;text-indent:0.5in"> </p><p style="font-size:12.8000001907349px;text-indent:0.5in">(2.a) Note: <b>close_metadata()</b> also calls <b>consumer_close_metadata()</b>, which sends <b>LTTNG_CONSUMER_CLOSE_METADATA</b> and <b>metadata_key</b> to consumerd to stop it from dealing any further with the affected app. 
Somehow this doesn’t seem to help<b>.</b></p><p style="font-size:12.8000001907349px"><b> </b></p><p style="font-size:12.8000001907349px">3.<span style="font-stretch:normal;font-size:7pt;font-family:'Times New Roman'">       </span>Next, I see that <b>thread_manage_consumer():1353</b> has for some reason ignored 2.a above and still handles a request/reply for that app by calling <b>ust_consumer_metadata_request():491 -> ust_app_push_metadata(ust_reg, socket, 1)</b>, which at line 460 checks <b>registry->metadata_closed</b> and returns <b>-EPIPE</b>.</p><p class="MsoNormal" style="font-size:12.8000001907349px"> </p><p style="font-size:12.8000001907349px">4.<span style="font-stretch:normal;font-size:7pt;font-family:'Times New Roman'">       </span>This <b>-EPIPE</b> error propagates all the way back up to <b>thread_manage_consumer():1353</b>, at which point <b>thread_manage_consumer()</b> decides to exit, causing health_check() to fail.</p><p style="font-size:12.8000001907349px"> </p><p class="MsoNormal" style="font-size:12.8000001907349px"> </p><p class="MsoNormal" style="font-size:12.8000001907349px">So it looks like, in some scenarios, an application crash can cause problems for lttng.</p><p class="MsoNormal" style="font-size:12.8000001907349px"> </p><p class="MsoNormal" style="font-size:12.8000001907349px">I think a possible fix for this scenario is, instead of exiting as in step 4, to send an <b>ERROR</b> message back to <b>consumerd</b>. This could be done from within <b>ust_consumer_metadata_request()</b>. Can someone please let me know if this is correct and shed more light on the issue?</p><p class="MsoNormal" style="font-size:12.8000001907349px"> </p><p class="MsoNormal" style="font-size:12.8000001907349px"> </p><p class="MsoNormal" style="font-size:12.8000001907349px">Please forgive any posting-guideline omissions on my part. 
This is my first post.</p><p class="MsoNormal" style="font-size:12.8000001907349px"><br></p><p class="MsoNormal" style="font-size:12.8000001907349px">Regards,</p><p class="MsoNormal" style="font-size:12.8000001907349px">Aravind.</p></div>
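<p style="font-size:12.8000001907349px">The fix proposed above can be sketched roughly as follows. This is only an illustration of the idea, not lttng-tools code: every type, constant and function name in it is hypothetical and merely stands in for the roles of registry->metadata_closed, the metadata push and the request handler. It assumes that a closed registry can be told apart from a genuinely broken consumer socket, so that only the latter stops the thread.</p>
<pre>
```c
/* Hypothetical sketch: none of these names exist in lttng-tools. */

/* Stand-in for EPIPE from errno.h, kept local so the sketch is standalone. */
enum { SKETCH_EPIPE = 32 };

/* What the request handler asks the managing thread to do next. */
enum sketch_action {
	SKETCH_REPLY_OK,     /* metadata pushed, reply normally */
	SKETCH_REPLY_ERROR,  /* reply with an error, keep the thread running */
	SKETCH_EXIT_THREAD   /* consumer socket is unusable, thread must stop */
};

/* Stand-in for the registry; only the flag discussed above is modelled. */
struct sketch_registry {
	int metadata_closed; /* 1 once the application has been unregistered */
};

/* Stand-in for the push step: refuses to push once metadata is closed. */
static int sketch_push_metadata(const struct sketch_registry *reg)
{
	if (reg->metadata_closed) {
		return -SKETCH_EPIPE;
	}
	return 0;
}

/*
 * Stand-in for the metadata request handler. A closed registry is
 * reported back as an error reply instead of being propagated as a
 * fatal error; only a broken socket ends the thread.
 */
static enum sketch_action sketch_handle_metadata_request(
		const struct sketch_registry *reg, int socket_broken)
{
	int ret;

	if (socket_broken) {
		return SKETCH_EXIT_THREAD;
	}

	ret = sketch_push_metadata(reg);
	if (ret == -SKETCH_EPIPE) {
		/* The app went away: tell consumerd, but stay alive. */
		return SKETCH_REPLY_ERROR;
	}
	return SKETCH_REPLY_OK;
}
```
</pre>
<p style="font-size:12.8000001907349px">With this shape, a request for an application whose metadata was already closed yields SKETCH_REPLY_ERROR, so the managing thread would keep serving the other applications and the health check would not trip. Whether consumerd can handle such an error reply is an open question that this sketch does not answer.</p>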
</blockquote></div><br></div>
</div></div><br></div></div>_______________________________________________<br>
lttng-dev mailing list<br>
<a href="mailto:lttng-dev@lists.lttng.org" target="_blank">lttng-dev@lists.lttng.org</a><br>
<a href="http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev" rel="noreferrer" target="_blank">http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev</a><br>
<br></blockquote></div><span><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div>Jérémie Galarneau<br>EfficiOS Inc.<br><a href="http://www.efficios.com" target="_blank">http://www.efficios.com</a></div>
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Jérémie Galarneau<br>EfficiOS Inc.<br><a href="http://www.efficios.com" target="_blank">http://www.efficios.com</a></div>
</div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Jérémie Galarneau<br>EfficiOS Inc.<br><a href="http://www.efficios.com" target="_blank">http://www.efficios.com</a></div>
</div></div></div></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>