[lttng-dev] [RFC PATCH v2 10/13] Teardown apps_notify_thread before thread_manage_apps

Jérémie Galarneau jeremie.galarneau at efficios.com
Thu Dec 14 01:57:24 UTC 2017


Sorry, I fat-finger-sent an incomplete reply, disregard it. This e-mail
contains all the comments.

On 18 September 2017 at 18:52, Jonathan Rajotte
<jonathan.rajotte-julien at efficios.com> wrote:
> RFC:
> This is necessary since we use the ust_app_ht_by_notify_sock to
> perform the cleanup operation. Since both ust_app_unregister and
> ust_app_ust_app_notify_unregister perform a remove of the app on the
> ust_ust_app_by_notify_sock but only ust_app_notify_unregister actually

There are some extra ust_ and ust_app_ on the last two lines ;)

> end up performing a close(call_rcu) on the socket.

a close(call_rcu) ?

>
> Other way to do fix this problem?
>

Let me walk through the problem as I understand it and
we'll see if/where I am going off-course.

An ust_app has two sockets:
  - command (through which the sessiond sends commands to the app)
  - notify (through which the application notifies the sessiond of
    various conditions such as new events being available, etc.)

These two sockets are received from applications by the
reg_apps_thread [1] which posts them to the dispatch_thread [2].
Once the dispatch thread is aware of both sockets for a given app,
the two sockets are "dispatched" to their respective management
threads.

* The "command" socket is sent to the "apps_thread" [3]
* The "notify" socket is sent to the "apps_notify_thread" [4]

Let's look at what happens when an application dies.

The "apps_thread" (handles application command sockets):
- wakes up from its poll() and notices an error/hang-up has
  occurred on an application's command socket
  - calls ust_app_unregister() on with that socket's FD
    - retrieves the ust_app from the socket's FD through
      ust_app_ht_by_sock
      - flushes that application's buffers and metadata
      - removes the app from ust_app_ht_by_sock
      - removes the app from ust_app_ht_by_notify_sock
        (it's okay for this to fail)
      - removes the app from ust_app_ht (pid <-> ust_app)
      - enqueues a call_rcu() job to close the command socket

The "apps_notify_thread" (handles application notification sockets):
- wakes up from its poll() and notices an error/hand-up has
  occurred on an application's notification socket
  - calls ust_app_notify_sock_unregister()
    - removes the app from ust_app_ht_by_notify_sock
      (it's okay for this to fail since both threads are trying
       to clean this up)
    - call_rcu() to close the notify socket


Now, provided that I understand the problem correctly
(as stated in my reply to patch #06), you want to clean-up
the command and notify sockets in the same way that they
would be if their associated apps had died. That's fair.

However, as seen above, cleaning up the "apps_thread" will cause
it to empty all the ust_app data structures:
  * ust_app_ht_by_sock
  * ust_app_ht_by_notify_sock
  * ust_app_ht

However, the "apps_notify_thread" needs at least one of those
data structures to be present to iterate on all of the
applications and perform its clean-up.

Hence, you want to make sure that the "apps_notify_thread" can
complete its shutdown before the "apps_thread" starts its own
clean-up. Correct?

I would be okay with reordering the threads' teardown to
provide that guarantee. My only gripe is that it needs to
be documented _extensively_ as it is not obvious at all that
such a dependency exists between those threads.

On the other hand, it seems to me that it would be simpler
to _not_ perform the clean-up you added in the "apps_thread"
and leave that to the sessiond_cleanup(), once all threads
have been torn down.

Not performing the clean-up in the "apps_thread" allows the
clean-up (that you added) to occur in the "apps_notify_thread"
as the data structures (such as ust_app_ht) are still in place.

As a result of the "apps_notify_thread" clean-up, the notify
sockets would eventually be closed by through the call_rcu()'s
during the next grace period. This will then unblock the
applications that are stuck waiting on their notify socket.

Would that solve the problem or am I missing something?

Jérémie

[1] https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L2005
[2] https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L1749
[3] https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/main.c#L1450
[4] https://github.com/lttng/lttng-tools/blob/master/src/bin/lttng-sessiond/ust-thread.c#L33


> Could we simply not remove it on a ust_app_unregister? And always defer
> to the apps_notify_thread for cleanup?
>
> Update the value in the hash table to -1 and emit a close and remove
> from the hash table if the value is -1?
>
> We could also keep a local list of fd in apps_notify_thread and use it for
> cleanup instead of relying on ust_ust_app_by_notify_sock.
>
> I'm not sure what is the best/elegant solution here. I am not a fan of
> the current solution but it working.
>
> Obviously this commit will be reworded and modified accordingly.
>
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
>  src/bin/lttng-sessiond/main.c | 55 ++++++++++++++++++++++++-------------------
>  1 file changed, 31 insertions(+), 24 deletions(-)
>
> diff --git a/src/bin/lttng-sessiond/main.c b/src/bin/lttng-sessiond/main.c
> index 4a2a661f..216d0da6 100644
> --- a/src/bin/lttng-sessiond/main.c
> +++ b/src/bin/lttng-sessiond/main.c
> @@ -6209,16 +6209,6 @@ int main(int argc, char **argv)
>         }
>         notification_thread_running = true;
>
> -       /* Create thread to manage application notify socket */
> -       ret = pthread_create(&apps_notify_thread, default_pthread_attr(),
> -                       ust_thread_manage_notify, (void *) NULL);
> -       if (ret) {
> -               errno = ret;
> -               PERROR("pthread_create notify");
> -               retval = -1;
> -               stop_threads();
> -               goto exit_apps_notify;
> -       }
>
>         /* Create thread to manage application socket */
>         ret = pthread_create(&apps_thread, default_pthread_attr(),
> @@ -6231,6 +6221,17 @@ int main(int argc, char **argv)
>                 goto exit_apps;
>         }
>
> +       /* Create thread to manage application notify socket */
> +       ret = pthread_create(&apps_notify_thread, default_pthread_attr(),
> +                       ust_thread_manage_notify, (void *) NULL);
> +       if (ret) {
> +               errno = ret;
> +               PERROR("pthread_create notify");
> +               retval = -1;
> +               stop_threads();
> +               goto exit_apps_notify;
> +       }
> +
>         /* Create thread to dispatch registration */
>         ret = pthread_create(&dispatch_thread, default_pthread_attr(),
>                         thread_dispatch_ust_registration, (void *) NULL);
> @@ -6358,20 +6359,6 @@ exit_reg_apps:
>         }
>
>  exit_dispatch:
> -       /* Instruct the apps thread to quit */
> -       ret = notify_thread_pipe(thread_apps_teardown_trigger_pipe[1]);
> -       if (ret < 0) {
> -               ERR("write error on thread quit pipe");
> -       }
> -
> -       ret = pthread_join(apps_thread, &status);
> -       if (ret) {
> -               errno = ret;
> -               PERROR("pthread_join apps");
> -               retval = -1;
> -       }
> -
> -exit_apps:
>         /* Instruct the apps_notify thread to quit */
>         ret = notify_thread_pipe(thread_apps_notify_teardown_trigger_pipe[1]);
>         if (ret < 0) {
> @@ -6386,6 +6373,26 @@ exit_apps:
>         }
>
>  exit_apps_notify:
> +       /*
> +        * The barrier ensure that all previous resources, notify sockets in
> +        * particular, are freed/closed.
> +        */
> +       rcu_barrier();
> +
> +       /* Instruct the apps thread to quit */
> +       ret = notify_thread_pipe(thread_apps_teardown_trigger_pipe[1]);
> +       if (ret < 0) {
> +               ERR("write error on thread quit pipe");
> +       }
> +
> +       ret = pthread_join(apps_thread, &status);
> +       if (ret) {
> +               errno = ret;
> +               PERROR("pthread_join apps");
> +               retval = -1;
> +       }
> +
> +exit_apps:
>  exit_notification:
>  exit_health:
>  exit_init_data:
> --
> 2.11.0
>



-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com


More information about the lttng-dev mailing list