[lttng-dev] [PATCH lttng-tools] Fix: check validity of a stream before invoking ust flush command

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Wed Aug 28 16:09:45 EDT 2019


----- On Aug 28, 2019, at 3:26 PM, Jonathan Rajotte jonathan.rajotte-julien at efficios.com wrote:

> At the time ustctl_flush_buffer is called the ustream object might have
> already been freed on lttng-ust side.
> 
> This can happen following a lttng_consumer_cleanup_relayd and concurrent
> consumer flush command (lttng stop).
> 
> The train of events goes as follows.

train -> sequence

> 
> An error on communication with lttng-relayd occurs.
> lttng_consumer_cleanup_relayd flags the streams for deletion
> (CONSUMER_ENDPOINT_INACTIVE). validate_endpoint_status_data_stream calls
> consumer_del_stream.
> 
> At the same time the hash table of streams is iterated over in the
> flush_channel function following a stop command. The loop is iterating on
> a given stream. The current thread is unscheduled before taking the stream
> lock.
> 
> In the initial thread, the same stream is the current iteration of
> cds_lfht_for_each_entry in validate_endpoint_status_data_stream.
> 
> consumer_del_stream is called on it. The stream lock is acquired, and
> destroy_close_stream is called. lttng_ustconsumer_del_stream is eventually
> called and at this point the ustream is freed.
> 
> Going back to the iteration in flush_channel. The current stream is still
> valid from the point of view of the iteration, ustctl_flush_buffer is then
> called on a freed ustream object.
> 
> This can lead to unknown behaviour since there is no validation on

..on the..

> lttng-ust side. The underlying memory of the ustream object is garbage at
> this point.
> 
> To prevent such scenario, we check for the presence of the node in the
> hash table via cds_lfht_is_node_deleted

...while holding the stream lock.

. This is valid because the node is
> removed from the hash table before deleting the ustream object on
> lttng-ust side.

Rephrase to:

This is valid because the stream destruction both removes the node from
the hash table and frees the ustream object with the stream lock held.

> The removal from the hash table also requires the stream
> lock ensuring the validity of cds_lfht_is_node_deleted return value.

This last sentence can therefore be removed (becomes redundant).

With those commit message changes:

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>

Thanks!

Mathieu

> 
> This duplicate similar "validation" check of the stream object. [1][2]
> 
> [1] src/common/consumer/consumer.c:consumer_close_channel_streams
> [2] src/common/ust-consumer/ust-consumer.c:close_metadata
> 
> This issue can be reproduced by the following scenario:
> 
>    Modify flush_channel to sleep (i.e 10s) before acquiring the lock on
>    a stream.
> 
>    Modify lttng-ust ustctl_destroy_stream to set the
>    ring_buffer_clock_read callback to NULL.
>      Note: An assert on !cds_lfht_is_node_deleted in flush channel
>      after acquiring the lock can provide the same information. We are
>      modifying the callback to simulate the original backtrace from our
>      customer.
> 
>    lttng-relayd
>    lttng-sessiond
>    lttng create --live
>    lttng enable-event -u -a
>    lttng start
>    Start some applications to generate data.
>    lttng stop
>      The stop command force a flush of the channel/streams.
>    pkill -9 lttng-relayd
>    Expect assert or segfault
> 
> The original customer backtrace:
> 
>  0  lib_ring_buffer_try_switch_slow (handle=<optimized out>, tsc=<synthetic
>  pointer>, offsets=0x3fffa9b76c80, chan=0x3fff98006e90, buf=<optimized out>,
>     mode=<optimized out>) at
>     /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/ring_buffer_frontend.c:1834
>  1  lib_ring_buffer_switch_slow (buf=0x3fff98016b40, mode=<optimized out>,
>  handle=0x3fff98017670)
>     at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/ring_buffer_frontend.c:1952
>  2  0x00003fffac680940 in ustctl_flush_buffer (stream=<optimized out>,
>  producer_active=<optimized out>)
>     at /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1568
>  3  0x0000000010031bc8 in flush_channel (chan_key=<optimized out>) at
>  ust-consumer.c:772
>  4  lttng_ustconsumer_recv_cmd (ctx=<optimized out>, sock=<optimized out>,
>  consumer_sockpoll=<optimized out>) at ust-consumer.c:1651
>  5  0x000000001000de50 in lttng_consumer_recv_cmd (ctx=<optimized out>,
>  sock=<optimized out>, consumer_sockpoll=<optimized out>) at consumer.c:2011
>  6  0x0000000010014208 in consumer_thread_sessiond_poll (data=0x10079430) at
>  consumer.c:3192
>  7  0x00003fffac608b30 in start_thread (arg=0x3fffa9b7bdb0) at
>  pthread_create.c:462
>  8  0x00003fffac530d0c in .__clone () at
>  ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
> 
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
> src/common/ust-consumer/ust-consumer.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
> 
> diff --git a/src/common/ust-consumer/ust-consumer.c
> b/src/common/ust-consumer/ust-consumer.c
> index 94b761cb8..f2e62d0ee 100644
> --- a/src/common/ust-consumer/ust-consumer.c
> +++ b/src/common/ust-consumer/ust-consumer.c
> @@ -764,10 +764,19 @@ static int flush_channel(uint64_t chan_key)
> 		health_code_update();
> 
> 		pthread_mutex_lock(&stream->lock);
> +
> +		/*
> +		 * Protect against concurrent teardown of a stream.
> +		 */
> +		if (cds_lfht_is_node_deleted(&stream->node.node)) {
> +			goto next;
> +		}
> +
> 		if (!stream->quiescent) {
> 			ustctl_flush_buffer(stream->ustream, 0);
> 			stream->quiescent = true;
> 		}
> +next:
> 		pthread_mutex_unlock(&stream->lock);
> 	}
> error:
> --
> 2.17.1
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


More information about the lttng-dev mailing list