[lttng-dev] [RFC PATCH 1/2] Fix: unregister app notify socket on sessiond tear down
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Fri Aug 25 18:50:28 UTC 2017
----- On Aug 24, 2017, at 1:15 PM, Jonathan Rajotte jonathan.rajotte-julien at efficios.com wrote:
> A race between the sessiond tear down and applications initialization
> can lead to a deadlock.
>
> Applications try to communicate via the notify sockets while sessiond
> does not listen anymore on these sockets since the thread responsible
> for reception/response is terminated (ust_thread_manage_notify). These
> sockets are never closed hence an application could hang on
> communication.
>
> Sessiond hang happen during call to cmd_destroy_session during
> sessiond_cleanup. Sessiond is trying to communicate with the app while
> the app is waiting for a response on the app notification socket.
>
> To prevent this situation a call to ust_app_notify_sock_unregister is
> performed on all entry of the ust_app_ht_by_notify_sock hash table at
> the time of termination. This ensure that any pending communication
> initiated by the application will be terminated since all sockets will
> be closed at the end of the grace period via call_rcu inside
> ust_app_notify_sock_unregister. The use of ust_app_ht_by_notify_sock
> instead of the ust_app_ht prevent a double call_rcu since entries are
> removed from ust_app_ht_by_notify_sock during ust_app_notify_sock_unregister.
>
> RFC: ust_app_notify_sock_unregister might never call
> call_rcu(close_notify_sock_rcu) when no memory is available. Should we
> just issue a direct call to close() on app->notify_scoket when
> ust_thread_manage_notify quit?
I'm tempted to keep using ust_app_notify_sock_unregister similarly to
the rest. If the ENOMEM handling is an issue here, it's an issue
for other ust_app_notify_sock_unregister callers anyway, and would
need to be fixed through a larger refactoring (e.g. not allocating
objects in unregister functions).
>
> This can be reproduced using the sessiond_teardown_active_session
> scenario provided by [1].
>
> [1] https://github.com/PSRCode/lttng-stress
>
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
> src/bin/lttng-sessiond/ust-thread.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/src/bin/lttng-sessiond/ust-thread.c
> b/src/bin/lttng-sessiond/ust-thread.c
> index 7fb18a78..e605703a 100644
> --- a/src/bin/lttng-sessiond/ust-thread.c
> +++ b/src/bin/lttng-sessiond/ust-thread.c
> @@ -27,6 +27,19 @@
> #include "health-sessiond.h"
> #include "testpoint.h"
>
> +
> +static
> +void notify_sock_unregister_all()
missing (void) in function prototypes throughout this patch.
Unlike C++, () means (...) (var args) in C.
> +{
> + struct lttng_ht_iter iter;
> + struct ust_app *app;
missing newline.
Thanks,
Mathieu
> + rcu_read_lock();
> + cds_lfht_for_each_entry(ust_app_ht_by_notify_sock->ht, &iter.iter, app,
> notify_sock_n.node) {
> + ust_app_notify_sock_unregister(app->notify_sock);
> + }
> + rcu_read_unlock();
> +}
> +
> /*
> * This thread manage application notify communication.
> */
> @@ -179,6 +192,7 @@ restart:
>
> exit:
> error:
> + notify_sock_unregister_all();
> lttng_poll_clean(&events);
> error_poll_create:
> error_testpoint:
> --
> 2.11.0
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list