[lttng-dev] [PATCH lttng-tools] Fix: cleanup stream on snapshot failure

Jonathan Rajotte-Julien jonathan.rajotte-julien at efficios.com
Tue May 31 09:11:24 EDT 2022


Hi Marcel,

This is exactly the kind of reproducer we are looking for.

Thanks for providing it. I'll try it out and check if we need anything more in that patch.

Cheers

----- Original Message -----
> From: "Marcel Hamer" <marcel.hamer at windriver.com>
> To: "jonathan rajotte-julien" <jonathan.rajotte-julien at efficios.com>
> Cc: "lttng-dev" <lttng-dev at lists.lttng.org>
> Sent: Tuesday, May 31, 2022 7:28:55 AM
> Subject: Re: [lttng-dev] [PATCH lttng-tools] Fix: cleanup stream on snapshot failure

> Hello Jonathan,
> 
> On Mon, May 30, 2022 at 11:27:55AM -0400, Jonathan Rajotte-Julien wrote:
>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>> 
>> Hi Marcel,
>> 
>> Thanks for sending this patch.
>> 
>> Looks sensible to me, still do you have a reproducer for it? I went back to bug
>> 1352 and even with https://bugs.lttng.org/attachments/546 was unable to force
>> the assert failure.
> 
> I can only reproduce it when running lttng-consumerd in a debugger
> environment, in my case gdb. My reproduction scenario is:
> 
> 1. Setting a breakpoint on snapshot_channel() inside
>   src/common/ust-consumer/ust-consumer.c
> 2. When the breakpoint hits, remove the the complete lttng directory
>   containing the session data.
> 3. Continue the lttng_consumerd process from gdb.
> 4. In that case you see a negative return value -1 from
>   consumer_stream_create_output_files() inside snapshot_channel().
> 5. Take another snapshot and you will see lttng_consumerd crash because
>   of the assert(!stream->trace_chunk); inside snapshot_channel(). This
>   last action does not require any breakpoint intervention.
> 
> The scenario seems to be very timing sensitive to reproduce. I do not
> have a clear command sequence to achieve the same error.
> 
> The proposed patch prevents lttng_consumerd from crashing in step 5.
> 
> Kind regards,
> 
> Marcel
> 
>> 
>> Cheers
>> 
>> ----- Original Message -----
>> > From: "Marcel Hamer via lttng-dev" <lttng-dev at lists.lttng.org>
>> > To: "lttng-dev" <lttng-dev at lists.lttng.org>
>> > Sent: Monday, 30 May, 2022 10:10:21
>> > Subject: [lttng-dev] [PATCH lttng-tools] Fix: cleanup stream on snapshot failure
>> 
>> > When a channel snapshot creation fails the stream should be cleaned up
>> > properly. If the stream is not closed and cleaned properly on a failure,
>> > the next time a snapshot is created an assert is triggered for:
>> >
>> >       assert(!stream->trace_chunk);
>> >
>> > inside the snapshot_channel function. Since the stream->trace_chunk was
>> > not reset to NULL. The reset to NULL happens inside the
>> > consumer_stream_close function.
>> >
>> > Fixes #1352
>> >
>> > Signed-off-by: Marcel Hamer <marcel.hamer at windriver.com>
>> > ---
>> > src/common/ust-consumer/ust-consumer.c | 10 +++++-----
>> > 1 file changed, 5 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/src/common/ust-consumer/ust-consumer.c
>> > b/src/common/ust-consumer/ust-consumer.c
>> > index f176ca40a..f43216829 100644
>> > --- a/src/common/ust-consumer/ust-consumer.c
>> > +++ b/src/common/ust-consumer/ust-consumer.c
>> > @@ -1147,13 +1147,13 @@ static int snapshot_channel(struct
>> > lttng_consumer_channel *channel,
>> >               if (use_relayd) {
>> >                       ret = consumer_send_relayd_stream(stream, path);
>> >                       if (ret < 0) {
>> > -                             goto error_unlock;
>> > +                             goto error_close_stream;
>> >                       }
>> >               } else {
>> >                       ret = consumer_stream_create_output_files(stream,
>> >                                       false);
>> >                       if (ret < 0) {
>> > -                             goto error_unlock;
>> > +                             goto error_close_stream;
>> >                       }
>> >                       DBG("UST consumer snapshot stream (%" PRIu64 ")",
>> >                                       stream->key);
>> > @@ -1170,19 +1170,19 @@ static int snapshot_channel(struct
>> > lttng_consumer_channel *channel,
>> >               ret = lttng_ustconsumer_take_snapshot(stream);
>> >               if (ret < 0) {
>> >                       ERR("Taking UST snapshot");
>> > -                     goto error_unlock;
>> > +                     goto error_close_stream;
>> >               }
>> >
>> >               ret = lttng_ustconsumer_get_produced_snapshot(stream, &produced_pos);
>> >               if (ret < 0) {
>> >                       ERR("Produced UST snapshot position");
>> > -                     goto error_unlock;
>> > +                     goto error_close_stream;
>> >               }
>> >
>> >               ret = lttng_ustconsumer_get_consumed_snapshot(stream, &consumed_pos);
>> >               if (ret < 0) {
>> >                       ERR("Consumerd UST snapshot position");
>> > -                     goto error_unlock;
>> > +                     goto error_close_stream;
>> >               }
>> >
>> >               /*
>> > --
>> > 2.25.1
>> >
>> > _______________________________________________
>> > lttng-dev mailing list
>> > lttng-dev at lists.lttng.org
> > > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


More information about the lttng-dev mailing list