[lttng-dev] [PATCH lttng-tools v2] Fix: check validity of a stream before invoking ust flush command

Jérémie Galarneau jeremie.galarneau at efficios.com
Thu Sep 5 17:29:13 EDT 2019


Merged in master, stable-2.11, stable-2.10, and stable-2.9.

Thanks!
Jérémie

On Wed, Aug 28, 2019 at 04:36:03PM -0400, Jonathan Rajotte wrote:
> At the time ustctl_flush_buffer is called, the ustream object might have
> already been freed on the lttng-ust side.
> 
> This can happen when lttng_consumer_cleanup_relayd runs concurrently with
> a consumer flush command (lttng stop).
> 
> The chain of events goes as follows.
> 
> A communication error with lttng-relayd occurs.
> lttng_consumer_cleanup_relayd flags the streams for deletion
> (CONSUMER_ENDPOINT_INACTIVE), and validate_endpoint_status_data_stream
> then calls consumer_del_stream.
> 
> At the same time, the stream hash table is iterated over in the
> flush_channel function following a stop command. The loop reaches a given
> stream, and the current thread is unscheduled before taking the stream
> lock.
> 
> In the initial thread, that same stream is the current element of the
> cds_lfht_for_each_entry iteration in validate_endpoint_status_data_stream.
> 
> consumer_del_stream is called on it: the stream lock is acquired, and
> destroy_close_stream is called. lttng_ustconsumer_del_stream is eventually
> reached, and at this point the ustream is freed.
> 
> Going back to the iteration in flush_channel: the current stream is still
> valid from the point of view of the iteration, so ustctl_flush_buffer ends
> up being called on a freed ustream object.
> 
> This can lead to undefined behaviour, since there is no validation on the
> lttng-ust side: the underlying memory of the ustream object is garbage at
> this point.
> 
> To prevent such a scenario, we check for the presence of the node in the
> hash table via cds_lfht_is_node_deleted while holding the stream lock.
> This is valid because the stream destruction removes the node from
> the hash table and frees the ustream object with the stream lock held.
> 
> This duplicates the similar "validation" checks of the stream object. [1][2]
> 
> [1] src/common/consumer/consumer.c:consumer_close_channel_streams
> [2] src/common/ust-consumer/ust-consumer.c:close_metadata
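> 
> For reference, the pattern can be reduced to the stand-alone sketch below.
> The names (my_stream, teardown_stream, flush_all_streams) and the plain
> buffers pointer are illustrative, not actual lttng-tools code; the point
> is only that the teardown side unlinks the node and frees the payload
> while holding the per-stream lock, so the flush side can safely test
> cds_lfht_is_node_deleted under that same lock:
> 
>   #include <pthread.h>
>   #include <stdbool.h>
>   #include <stdlib.h>
>   #include <urcu.h>            /* rcu_read_lock()/rcu_read_unlock() */
>   #include <urcu/rculfhash.h>  /* cds_lfht_* */
> 
>   struct my_stream {
>       struct cds_lfht_node node;  /* hash table linkage */
>       pthread_mutex_t lock;       /* serializes flush vs. teardown */
>       bool quiescent;
>       void *buffers;              /* stands in for the ustream object */
>   };
> 
>   /* Teardown side: unlink the node and free the buffers, lock held. */
>   static void teardown_stream(struct cds_lfht *ht, struct my_stream *stream)
>   {
>       pthread_mutex_lock(&stream->lock);
>       rcu_read_lock();
>       (void) cds_lfht_del(ht, &stream->node);
>       rcu_read_unlock();
>       free(stream->buffers);      /* freed immediately, like the ustream */
>       stream->buffers = NULL;
>       pthread_mutex_unlock(&stream->lock);
>       /*
>        * The my_stream structure itself must only be reclaimed after an
>        * RCU grace period so that concurrent iterations remain safe.
>        */
>   }
> 
>   /* Flush side: skip any stream whose node was already unlinked. */
>   static void flush_all_streams(struct cds_lfht *ht)
>   {
>       struct cds_lfht_iter iter;
>       struct my_stream *stream;
> 
>       rcu_read_lock();
>       cds_lfht_for_each_entry(ht, &iter, stream, node) {
>           pthread_mutex_lock(&stream->lock);
>           if (cds_lfht_is_node_deleted(&stream->node)) {
>               /* Torn down concurrently: the buffers are gone. */
>               goto next;
>           }
>           if (!stream->quiescent) {
>               /* ... flush stream->buffers here ... */
>               stream->quiescent = true;
>           }
>   next:
>           pthread_mutex_unlock(&stream->lock);
>       }
>       rcu_read_unlock();
>   }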
> 
> This issue can be reproduced by the following scenario:
> 
>     Modify flush_channel to sleep (e.g. 10 s) before acquiring the lock
>     on a stream (see the sketch after these steps).
> 
>     Modify lttng-ust ustctl_destroy_stream to set the
>     ring_buffer_clock_read callback to NULL.
>       Note: An assert on !cds_lfht_is_node_deleted in flush_channel
>       after acquiring the lock can provide the same information. We are
>       modifying the callback to simulate the original backtrace from our
>       customer.
> 
>     lttng-relayd
>     lttng-sessiond
>     lttng create --live
>     lttng enable-event -u -a
>     lttng start
>     Start some applications to generate data.
>     lttng stop
>       The stop command forces a flush of the channel/streams.
>     pkill -9 lttng-relayd
>     Expect an assert or a segfault.
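> 
> For the first step, the sleep can be added right before the lock is taken
> in the flush_channel loop, roughly as follows (illustrative pre-fix code
> taken from the context of the patch below; sleep() assumes <unistd.h> is
> already included):
> 
>     health_code_update();
> 
>     /* Reproduction aid: widen the race window before taking the lock. */
>     sleep(10);
> 
>     pthread_mutex_lock(&stream->lock);
>     if (!stream->quiescent) {
>         ustctl_flush_buffer(stream->ustream, 0);
>         stream->quiescent = true;
>     }
>     pthread_mutex_unlock(&stream->lock);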
> 
> The original customer backtrace:
> 
>   0  lib_ring_buffer_try_switch_slow (handle=<optimized out>, tsc=<synthetic pointer>, offsets=0x3fffa9b76c80, chan=0x3fff98006e90, buf=<optimized out>,
>      mode=<optimized out>) at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/ring_buffer_frontend.c:1834
>   1  lib_ring_buffer_switch_slow (buf=0x3fff98016b40, mode=<optimized out>, handle=0x3fff98017670)
>      at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/ring_buffer_frontend.c:1952
>   2  0x00003fffac680940 in ustctl_flush_buffer (stream=<optimized out>, producer_active=<optimized out>)
>      at /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1568
>   3  0x0000000010031bc8 in flush_channel (chan_key=<optimized out>) at ust-consumer.c:772
>   4  lttng_ustconsumer_recv_cmd (ctx=<optimized out>, sock=<optimized out>, consumer_sockpoll=<optimized out>) at ust-consumer.c:1651
>   5  0x000000001000de50 in lttng_consumer_recv_cmd (ctx=<optimized out>, sock=<optimized out>, consumer_sockpoll=<optimized out>) at consumer.c:2011
>   6  0x0000000010014208 in consumer_thread_sessiond_poll (data=0x10079430) at consumer.c:3192
>   7  0x00003fffac608b30 in start_thread (arg=0x3fffa9b7bdb0) at pthread_create.c:462
>   8  0x00003fffac530d0c in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien at efficios.com>
> ---
>  src/common/ust-consumer/ust-consumer.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c
> index 94b761cb8..f2e62d0ee 100644
> --- a/src/common/ust-consumer/ust-consumer.c
> +++ b/src/common/ust-consumer/ust-consumer.c
> @@ -764,10 +764,19 @@ static int flush_channel(uint64_t chan_key)
>  		health_code_update();
>  
>  		pthread_mutex_lock(&stream->lock);
> +
> +		/*
> +		 * Protect against concurrent teardown of a stream.
> +		 */
> +		if (cds_lfht_is_node_deleted(&stream->node.node)) {
> +			goto next;
> +		}
> +
>  		if (!stream->quiescent) {
>  			ustctl_flush_buffer(stream->ustream, 0);
>  			stream->quiescent = true;
>  		}
> +next:
>  		pthread_mutex_unlock(&stream->lock);
>  	}
>  error:
> -- 
> 2.17.1
> 

