[lttng-dev] [RFC PATCH 1/2] Fix: unregister app notify socket on sessiond tear down

Jonathan Rajotte jonathan.rajotte-julien at efficios.com
Thu Aug 24 20:15:06 UTC 2017


A race between the sessiond tear down and applications initialization
can lead to a deadlock.

Applications try to communicate via the notify sockets while sessiond
does not listen anymore on these sockets since the thread responsible
for reception/response is terminated (ust_thread_manage_notify). These
sockets are never closed hence an application could hang on
communication.

Sessiond hang happen during call to cmd_destroy_session during
sessiond_cleanup. Sessiond is trying to communicate with the app while
the app is waiting for a response on the app notification socket.

To prevent this situation a call to ust_app_notify_sock_unregister is
performed on all entry of the ust_app_ht_by_notify_sock hash table at
the time of termination. This ensure that any pending communication
initiated by the application will be terminated since all sockets will
be closed at the end of the grace period via call_rcu inside
ust_app_notify_sock_unregister. The use of ust_app_ht_by_notify_sock
instead of the ust_app_ht prevent a double call_rcu since entries are
removed from ust_app_ht_by_notify_sock during ust_app_notify_sock_unregister.

RFC: ust_app_notify_sock_unregister might never call
call_rcu(close_notify_sock_rcu) when no memory is available. Should we
just issue a direct call to close() on app->notify_scoket  when
ust_thread_manage_notify quit?

This can be reproduced using the sessiond_teardown_active_session
scenario provided by [1].

[1] https://github.com/PSRCode/lttng-stress

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
---
 src/bin/lttng-sessiond/ust-thread.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/bin/lttng-sessiond/ust-thread.c b/src/bin/lttng-sessiond/ust-thread.c
index 7fb18a78..e605703a 100644
--- a/src/bin/lttng-sessiond/ust-thread.c
+++ b/src/bin/lttng-sessiond/ust-thread.c
@@ -27,6 +27,19 @@
 #include "health-sessiond.h"
 #include "testpoint.h"
 
+
+static
+void notify_sock_unregister_all()
+{
+	struct lttng_ht_iter iter;
+	struct ust_app *app;
+	rcu_read_lock();
+	cds_lfht_for_each_entry(ust_app_ht_by_notify_sock->ht, &iter.iter, app, notify_sock_n.node) {
+		ust_app_notify_sock_unregister(app->notify_sock);
+	}
+	rcu_read_unlock();
+}
+
 /*
  * This thread manage application notify communication.
  */
@@ -179,6 +192,7 @@ restart:
 
 exit:
 error:
+	notify_sock_unregister_all();
 	lttng_poll_clean(&events);
 error_poll_create:
 error_testpoint:
-- 
2.11.0



More information about the lttng-dev mailing list