[lttng-dev] [RFC PATCH 1/2] Fix: unregister app notify socket on sessiond tear down

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Fri Aug 25 18:50:28 UTC 2017


----- On Aug 24, 2017, at 1:15 PM, Jonathan Rajotte jonathan.rajotte-julien at efficios.com wrote:

> A race between the sessiond tear down and applications initialization
> can lead to a deadlock.
> 
> Applications try to communicate via the notify sockets while sessiond
> does not listen anymore on these sockets since the thread responsible
> for reception/response is terminated (ust_thread_manage_notify). These
> sockets are never closed hence an application could hang on
> communication.
> 
> Sessiond hang happen during call to cmd_destroy_session during
> sessiond_cleanup. Sessiond is trying to communicate with the app while
> the app is waiting for a response on the app notification socket.
> 
> To prevent this situation a call to ust_app_notify_sock_unregister is
> performed on all entry of the ust_app_ht_by_notify_sock hash table at
> the time of termination. This ensure that any pending communication
> initiated by the application will be terminated since all sockets will
> be closed at the end of the grace period via call_rcu inside
> ust_app_notify_sock_unregister. The use of ust_app_ht_by_notify_sock
> instead of the ust_app_ht prevent a double call_rcu since entries are
> removed from ust_app_ht_by_notify_sock during ust_app_notify_sock_unregister.
> 
> RFC: ust_app_notify_sock_unregister might never call
> call_rcu(close_notify_sock_rcu) when no memory is available. Should we
> just issue a direct call to close() on app->notify_scoket  when
> ust_thread_manage_notify quit?

I'm tempted to keep using ust_app_notify_sock_unregister similarly to
the rest. If the ENOMEM handling is an issue here, it's an issue
for other ust_app_notify_sock_unregister callers anyway, and would
need to be fixed through a larger refactoring (e.g. not allocating
objects in unregister functions).

> 
> This can be reproduced using the sessiond_teardown_active_session
> scenario provided by [1].
> 
> [1] https://github.com/PSRCode/lttng-stress
> 
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
> src/bin/lttng-sessiond/ust-thread.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
> 
> diff --git a/src/bin/lttng-sessiond/ust-thread.c
> b/src/bin/lttng-sessiond/ust-thread.c
> index 7fb18a78..e605703a 100644
> --- a/src/bin/lttng-sessiond/ust-thread.c
> +++ b/src/bin/lttng-sessiond/ust-thread.c
> @@ -27,6 +27,19 @@
> #include "health-sessiond.h"
> #include "testpoint.h"
> 
> +
> +static
> +void notify_sock_unregister_all()

missing (void) in function prototypes throughout this patch.

Unlike C++, () means (...) (var args) in C.


> +{
> +	struct lttng_ht_iter iter;
> +	struct ust_app *app;

missing newline.

Thanks,

Mathieu

> +	rcu_read_lock();
> +	cds_lfht_for_each_entry(ust_app_ht_by_notify_sock->ht, &iter.iter, app,
> notify_sock_n.node) {
> +		ust_app_notify_sock_unregister(app->notify_sock);
> +	}
> +	rcu_read_unlock();
> +}
> +
> /*
>  * This thread manage application notify communication.
>  */
> @@ -179,6 +192,7 @@ restart:
> 
> exit:
> error:
> +	notify_sock_unregister_all();
> 	lttng_poll_clean(&events);
> error_poll_create:
> error_testpoint:
> --
> 2.11.0
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


More information about the lttng-dev mailing list