[lttng-dev] [PATCH lttng-tools] Fix: consumer-stream: use-after-free of metadata bucket
Jérémie Galarneau
jeremie.galarneau at efficios.com
Mon Mar 7 12:37:49 EST 2022
Hi Vincent,
I had a chance to look into this and came up with the following fix:
https://review.lttng.org/c/lttng-tools/+/7478/4
Would you have a chance to try it on your end before I merge it?
Thanks for the great bug report!
Jérémie
----- Original Message -----
> From: "Vincent Whitchurch" <vincent.whitchurch at axis.com>
> To: "Jeremie Galarneau" <jeremie.galarneau at efficios.com>
> Cc: "lttng-dev" <lttng-dev at lists.lttng.org>, "kernel" <kernel at axis.com>
> Sent: Wednesday, March 2, 2022 4:27:30 AM
> Subject: Re: [lttng-dev] [PATCH lttng-tools] Fix: consumer-stream: use-after-free of metadata bucket
> On Tue, Mar 01, 2022 at 06:19:23PM +0100, Jérémie Galarneau wrote:
>> Thanks a lot for reporting the problem. If I understand the ASAN
>> report correctly, the stream itself will also be double free'd, so
>> I don't think this is the complete fix.
>
> Yeah, it looked odd that consumer_stream_destroy() is called recursively
> on the same stream but AFAICS the code's been like this for a while so I
> assumed it was on purpose, and only the metadata bucket stuff was
> relatively new. ASAN doesn't detect any double frees of the stream
> itself, but I guess calling call_rcu(..., free_stream_rcu) twice on the
> same stream is not expected behaviour and could lead to other problems.
>
>> There definitely seems to be a problem with regards to the ownership
>> of the metadata channel vs stream. Let me look into it.
>
> Great, thank you!
>
>> I see that you fall into a case where the metadata setup fails,
>> can you share more info about how this can be reproduced?
>
> In the core dump I received (on v2.12.4), consumer_stream_destroy() was
> called from the error label in setup_metadata and ret was set to
> LTTCOMM_CONSUMERD_ERROR_METADATA. So consumer_send_relayd_stream() had
> returned an error. I only had the core dump and no other logs, so I did
> not know which of the paths inside consumer_send_relayd_stream() had
> failed, but since I was primarily interested in fixing the crash itself
> I simply forced this code path to be taken:
>
> diff --git a/src/common/ust-consumer/ust-consumer.c
> b/src/common/ust-consumer/ust-consumer.c
> index fa1c71299..97ed59632 100644
> --- a/src/common/ust-consumer/ust-consumer.c
> +++ b/src/common/ust-consumer/ust-consumer.c
> @@ -908,8 +908,7 @@ static int setup_metadata(struct lttng_consumer_local_data
> *ctx, uint64_t key)
>
> /* Send metadata stream to relayd if needed. */
> if (metadata->metadata_stream->net_seq_idx != (uint64_t) -1ULL) {
> - ret = consumer_send_relayd_stream(metadata->metadata_stream,
> - metadata->pathname);
> + ret = -1;
> if (ret < 0) {
> ret = LTTCOMM_CONSUMERD_ERROR_METADATA;
> goto error;
>
> With the above patch, I could easily reproduce the use-after-free using
> the following steps on the latest release v2.13.4, and it was clear that
> this use-after-free was the cause of the original core dump on the older
> release too.
>
> Build with ASAN:
>
> lttng-tools$ LDFLAGS=-fsanitize=address CFLAGS=-fsanitize=address ./configure
>
> Shell #1:
>
> lttng-ust$ tests/compile/api0/hello/hello 10000
>
> Shell #2:
>
> lttng-tools$ ASAN_OPTIONS=detect_odr_violation=0
> ./src/bin/lttng-sessiond/lttng-sessiond
>
> Shell #3:
>
> lttng-tools$ export ASAN_OPTIONS=detect_odr_violation=0
> lttng-tools$ ./src/bin/lttng/lttng create --live && ./src/bin/lttng/lttng
> enable-event --userspace 1 && ./src/bin/lttng/lttng start && sleep 1 &&
> ./src/bin/lttng/lttng stop
>
> The ASAN splat should be seen in shell #2. Note that you may have to
> run the command in shell #3 a couple of times since
> LTTNG_CONSUMER_SETUP_METADATA doesn't seem to be sent every time.
More information about the lttng-dev
mailing list