From bruce.ashfield at gmail.com  Thu May 19 11:02:57 2022
From: bruce.ashfield at gmail.com (bruce.ashfield at gmail.com)
Date: Thu, 19 May 2022 11:02:57 -0400
Subject: [lttng-dev] [PATCH] sched/tracing: fix __trace_sched_switch_state (5.18-rc7+)
Message-ID: <20220519150257.26136-1-bruce.ashfield@gmail.com>

From: Bruce Ashfield

The commit [fix: sched/tracing: Don't re-read p->state when emitting
sched_switch event (v5.18)] was correct, but the kernel changed their
mind with the following commit:

  commit 9c2136be0878c88c53dea26943ce40bb03ad8d8d
  Author: Delyan Kratunov
  Date:   Wed May 11 18:28:36 2022 +0000

      sched/tracing: Append prev_state to tp args instead

      Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
      sched_switch event, 2022-01-20) added a new prev_state argument to the
      sched_switch tracepoint, before the prev task_struct pointer.

      This reordering of arguments broke BPF programs that use the raw
      tracepoint (e.g. tp_btf programs). The type of the second argument has
      changed and existing programs that assume a task_struct* argument
      (e.g. for bpf_task_storage access) will now fail to verify.

      If we instead append the new argument to the end, all existing programs
      would continue to work and can conditionally extract the prev_state
      argument on supported kernel versions.

      Fixes: fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
      sched_switch event, 2022-01-20)
      Signed-off-by: Delyan Kratunov
      Signed-off-by: Peter Zijlstra (Intel)
      Acked-by: Steven Rostedt (Google)
      Link: https://lkml.kernel.org/r/c8a6930dfdd58a4a5755fc01732675472979732b.camel at fb.com

By reordering the parameters (again) we can get back up and building.

Signed-off-by: Bruce Ashfield
---

Hi all,

This is more than likely NOT a correct fix, but while working on the
yocto -dev reference kernel, I ran into this build failure against
5.18-rc7.

I didn't see any sign of another fix on the mailing list, so I wanted
to send this in case anyone else runs into the failure and to check
to see if there's a better fix in progress.

Cheers,

Bruce

 include/instrumentation/events/sched.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/instrumentation/events/sched.h b/include/instrumentation/events/sched.h
index 339bec9..f9e9c38 100644
--- a/include/instrumentation/events/sched.h
+++ b/include/instrumentation/events/sched.h
@@ -23,8 +23,9 @@
 #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,18,0))
 
 static inline long __trace_sched_switch_state(bool preempt,
-		unsigned int prev_state,
-		struct task_struct *p)
+		struct task_struct *p,
+		struct task_struct *n,
+		unsigned int prev_state )
 {
 	unsigned int state;
 
@@ -356,20 +357,20 @@ LTTNG_TRACEPOINT_EVENT_INSTANCE(sched_wakeup_template, sched_wakeup_new,
 LTTNG_TRACEPOINT_EVENT(sched_switch,
 
 	TP_PROTO(bool preempt,
-		unsigned int prev_state,
 		struct task_struct *prev,
-		struct task_struct *next),
+		struct task_struct *next,
+		unsigned int prev_state),
 
-	TP_ARGS(preempt, prev_state, prev, next),
+	TP_ARGS(preempt, prev, next, prev_state),
 
 	TP_FIELDS(
 		ctf_array_text(char, prev_comm, prev->comm, TASK_COMM_LEN)
 		ctf_integer(pid_t, prev_tid, prev->pid)
 		ctf_integer(int, prev_prio, prev->prio - MAX_RT_PRIO)
 #ifdef CONFIG_LTTNG_EXPERIMENTAL_BITWISE_ENUM
-		ctf_enum(task_state, long, prev_state, __trace_sched_switch_state(preempt, prev_state, prev))
+		ctf_enum(task_state, long, prev_state, __trace_sched_switch_state(preempt, prev, next, prev_state))
 #else
-		ctf_integer(long, prev_state, __trace_sched_switch_state(preempt, prev_state, prev))
+		ctf_integer(long, prev_state, __trace_sched_switch_state(preempt, prev, next, prev_state))
 #endif
 		ctf_array_text(char, next_comm, next->comm, TASK_COMM_LEN)
 		ctf_integer(pid_t, next_tid, next->pid)
-- 
2.19.1

From mathieu.desnoyers at efficios.com  Thu May 19 11:31:02 2022
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Thu, 19 May 2022 11:31:02 -0400 (EDT)
Subject: [lttng-dev] [PATCH] sched/tracing: fix __trace_sched_switch_state (5.18-rc7+)
In-Reply-To: <20220519150257.26136-1-bruce.ashfield@gmail.com>
References: <20220519150257.26136-1-bruce.ashfield@gmail.com>
Message-ID: <1908357724.62379.1652974262072.JavaMail.zimbra@efficios.com>

----- On May 19, 2022, at 11:02 AM, Bruce Ashfield via lttng-dev lttng-dev at lists.lttng.org wrote:

> This is more than likely NOT a correct fix, but while working on the
> yocto -dev reference kernel, I ran into this build failure against
> 5.18-rc7.
>
> I didn't see any sign of another fix on the mailing list, so I wanted
> to send this in case anyone else runs into the failure and to check
> to see if there's a better fix in progress.

Hi Bruce,

Please see: https://review.lttng.org/c/lttng-modules/+/8045

I was planning to merge it today, unless you have objections.

I notice that in your own patch, you modify the arguments passed to
__trace_sched_switch_state(), although I don't see them changing in
the upstream kernel. Am I missing something?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

From bruce.ashfield at gmail.com  Thu May 19 12:01:41 2022
From: bruce.ashfield at gmail.com (Bruce Ashfield)
Date: Thu, 19 May 2022 12:01:41 -0400
Subject: [lttng-dev] [PATCH] sched/tracing: fix __trace_sched_switch_state (5.18-rc7+)
In-Reply-To: <1908357724.62379.1652974262072.JavaMail.zimbra@efficios.com>
References: <20220519150257.26136-1-bruce.ashfield@gmail.com> <1908357724.62379.1652974262072.JavaMail.zimbra@efficios.com>
Message-ID:

On Thu, May 19, 2022 at 11:31 AM Mathieu Desnoyers wrote:
>
> Hi Bruce,
>
> Please see: https://review.lttng.org/c/lttng-modules/+/8045
>
> I was planning to merge it today, unless you have objections.
>
> I notice that in your own patch, you modify the arguments passed to
> __trace_sched_switch_state(), although I don't see them changing in
> the upstream kernel. Am I missing something?

No objections here. That is obviously better than mine :)

I swear I saw the order of the arguments changing when I looked at the
kernel, and had a secondary compile error in those calls, but I very
easily could have been mistaken.
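
For readers following the question about the helper's arguments, here is a minimal sketch of the distinction, under stated assumptions: `demo_task`, `demo_switch_state`, `demo_probe_sched_switch` and `DEMO_STATE_PREEMPTED` are hypothetical stand-ins, not the kernel or lttng-modules definitions. It only shows that the probe has to accept its arguments in the order the tracepoint passes them (prev_state last, as in kernel commit 9c2136be0878), while a helper called from the probe can keep its own (preempt, prev_state, p) order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct; not the kernel type. */
struct demo_task {
	int pid;
};

/* Hypothetical marker for "preempted while runnable"; the value is illustrative. */
#define DEMO_STATE_PREEMPTED 0x100L

/* Helper with its own parameter order: (preempt, prev_state, p). */
static long demo_switch_state(bool preempt, unsigned int prev_state,
		const struct demo_task *p)
{
	assert(p != NULL);
	if (preempt)
		return DEMO_STATE_PREEMPTED;
	return (long) prev_state;
}

/*
 * Probe whose parameters follow the tracepoint prototype, with
 * prev_state appended last. It forwards to the helper in the
 * helper's own order, so the helper does not need to change.
 */
static long demo_probe_sched_switch(bool preempt,
		const struct demo_task *prev,
		const struct demo_task *next,
		unsigned int prev_state)
{
	(void) next;
	return demo_switch_state(preempt, prev_state, prev);
}
```

In this sketch, moving prev_state within the probe's parameter list requires no change to the helper's signature or to the order of arguments at its call site.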
Bruce

> Thanks,
>
> Mathieu

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end
- "Use the force Harry" - Gandalf, Star Trek II

From marcel.hamer at windriver.com  Mon May 30 10:10:21 2022
From: marcel.hamer at windriver.com (Marcel Hamer)
Date: Mon, 30 May 2022 16:10:21 +0200
Subject: [lttng-dev] [PATCH lttng-tools] Fix: cleanup stream on snapshot failure
Message-ID: <20220530141021.267219-1-marcel.hamer@windriver.com>

When the creation of a channel snapshot fails, the stream should be
cleaned up properly. If the stream is not closed and cleaned up on
failure, the next snapshot triggers the following assert inside the
snapshot_channel function:

  assert(!stream->trace_chunk);

This happens because stream->trace_chunk was not reset to NULL. The
reset to NULL happens inside the consumer_stream_close function.
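
The failure mode described above can be sketched in isolation. This is a simplified model under stated assumptions, not the lttng-tools code: `demo_chunk`, `demo_stream`, `demo_stream_close` and `demo_snapshot_stream` are hypothetical stand-ins, and the `fail_output` flag simulates an error while creating the output files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the lttng-tools structures. */
struct demo_chunk {
	int refs;
};

struct demo_stream {
	struct demo_chunk *trace_chunk;
};

/* Drops the chunk reference and resets the pointer to NULL. */
static void demo_stream_close(struct demo_stream *stream)
{
	if (stream->trace_chunk) {
		stream->trace_chunk->refs--;
		stream->trace_chunk = NULL;
	}
}

/*
 * Takes one snapshot of the stream. A non-zero fail_output simulates
 * a failure while creating the output files.
 */
static int demo_snapshot_stream(struct demo_stream *stream,
		struct demo_chunk *chunk, int fail_output)
{
	int ret = 0;

	/* Same precondition as the assert quoted in the commit message. */
	assert(!stream->trace_chunk);

	stream->trace_chunk = chunk;
	chunk->refs++;

	if (fail_output) {
		ret = -1;
		goto error_close_stream;
	}

	/* ... the snapshot data would be consumed here ... */
	demo_stream_close(stream);
	return 0;

error_close_stream:
	/* Without this call, the next snapshot would trip the assert. */
	demo_stream_close(stream);
	return ret;
}
```

If the error path returned without calling `demo_stream_close()`, the pointer would stay set and a second call to `demo_snapshot_stream()` on the same stream would abort on the precondition, which mirrors the crash reported in bug 1352.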

Fixes #1352

Signed-off-by: Marcel Hamer
---
 src/common/ust-consumer/ust-consumer.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c
index f176ca40a..f43216829 100644
--- a/src/common/ust-consumer/ust-consumer.c
+++ b/src/common/ust-consumer/ust-consumer.c
@@ -1147,13 +1147,13 @@ static int snapshot_channel(struct lttng_consumer_channel *channel,
 		if (use_relayd) {
 			ret = consumer_send_relayd_stream(stream, path);
 			if (ret < 0) {
-				goto error_unlock;
+				goto error_close_stream;
 			}
 		} else {
 			ret = consumer_stream_create_output_files(stream,
 					false);
 			if (ret < 0) {
-				goto error_unlock;
+				goto error_close_stream;
 			}
 			DBG("UST consumer snapshot stream (%" PRIu64 ")",
 					stream->key);
@@ -1170,19 +1170,19 @@ static int snapshot_channel(struct lttng_consumer_channel *channel,
 		ret = lttng_ustconsumer_take_snapshot(stream);
 		if (ret < 0) {
 			ERR("Taking UST snapshot");
-			goto error_unlock;
+			goto error_close_stream;
 		}
 
 		ret = lttng_ustconsumer_get_produced_snapshot(stream, &produced_pos);
 		if (ret < 0) {
 			ERR("Produced UST snapshot position");
-			goto error_unlock;
+			goto error_close_stream;
 		}
 
 		ret = lttng_ustconsumer_get_consumed_snapshot(stream, &consumed_pos);
 		if (ret < 0) {
 			ERR("Consumerd UST snapshot position");
-			goto error_unlock;
+			goto error_close_stream;
 		}
 
 		/*
-- 
2.25.1

From jonathan.rajotte-julien at efficios.com  Mon May 30 11:27:55 2022
From: jonathan.rajotte-julien at efficios.com (Jonathan Rajotte-Julien)
Date: Mon, 30 May 2022 11:27:55 -0400 (EDT)
Subject: [lttng-dev] [PATCH lttng-tools] Fix: cleanup stream on snapshot failure
In-Reply-To: <20220530141021.267219-1-marcel.hamer@windriver.com>
References: <20220530141021.267219-1-marcel.hamer@windriver.com>
Message-ID: <769020238.11656.1653924475516.JavaMail.zimbra@efficios.com>

Hi Marcel,

Thanks for sending this patch.

Looks sensible to me. Still, do you have a reproducer for it? I went
back to bug 1352 and, even with https://bugs.lttng.org/attachments/546,
was unable to force the assert failure.

Cheers

From marcel.hamer at windriver.com  Tue May 31 07:28:55 2022
From: marcel.hamer at windriver.com (Marcel Hamer)
Date: Tue, 31 May 2022 13:28:55 +0200
Subject: [lttng-dev] [PATCH lttng-tools] Fix: cleanup stream on snapshot failure
In-Reply-To: <769020238.11656.1653924475516.JavaMail.zimbra@efficios.com>
References: <20220530141021.267219-1-marcel.hamer@windriver.com> <769020238.11656.1653924475516.JavaMail.zimbra@efficios.com>
Message-ID: <20220531112855.GA856582@windriver.com>

Hello Jonathan,

On Mon, May 30, 2022 at 11:27:55AM -0400, Jonathan Rajotte-Julien wrote:
> Hi Marcel,
>
> Thanks for sending this patch.
>
> Looks sensible to me. Still, do you have a reproducer for it? I went
> back to bug 1352 and, even with https://bugs.lttng.org/attachments/546,
> was unable to force the assert failure.

I can only reproduce it when running lttng-consumerd in a debugger
environment, in my case gdb. My reproduction scenario is:

1. Set a breakpoint on snapshot_channel() inside
   src/common/ust-consumer/ust-consumer.c.
2. When the breakpoint hits, remove the complete lttng directory
   containing the session data.
3. Continue the lttng-consumerd process from gdb.
4. In that case you see a negative return value -1 from
   consumer_stream_create_output_files() inside snapshot_channel().
5. Take another snapshot and you will see lttng-consumerd crash because
   of the assert(!stream->trace_chunk); inside snapshot_channel(). This
   last action does not require any breakpoint intervention.
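
The effect of step 2 on step 4 can be illustrated outside of LTTng with a small sketch. It assumes, without the thread confirming it, that the -1 in step 4 comes from the output directory no longer existing; `demo_create_after_rmdir` and the `/tmp/snapshot-demo-XXXXXX` path are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Creates a temporary directory, removes it, then tries to create a
 * file inside it. Returns the errno of the failed open(), 0 if the
 * open unexpectedly succeeded, or -1 if the setup itself failed.
 */
static int demo_create_after_rmdir(void)
{
	char dir[] = "/tmp/snapshot-demo-XXXXXX";
	char path[64];
	int fd, n;

	if (!mkdtemp(dir))
		return -1;

	n = snprintf(path, sizeof(path), "%s/stream_0", dir);
	assert(n > 0 && (size_t) n < sizeof(path));

	/* Stand-in for step 2: the directory disappears before the write. */
	if (rmdir(dir) != 0)
		return -1;

	/* Stand-in for step 4: creating the output file now fails. */
	fd = open(path, O_WRONLY | O_CREAT, 0600);
	if (fd >= 0) {
		close(fd);
		unlink(path);
		return 0;
	}
	return errno;
}
```

On a POSIX system with a writable /tmp, the open() is expected to fail with ENOENT because a path component no longer exists. Whether lttng-consumerd fails for exactly this reason in step 4 is an assumption of the sketch.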

The scenario seems to be very timing-sensitive to reproduce. I do not
have a clear command sequence to achieve the same error.

The proposed patch prevents lttng-consumerd from crashing in step 5.

Kind regards,

Marcel

From jonathan.rajotte-julien at efficios.com  Tue May 31 09:11:24 2022
From: jonathan.rajotte-julien at efficios.com (Jonathan Rajotte-Julien)
Date: Tue, 31 May 2022 09:11:24 -0400 (EDT)
Subject: [lttng-dev] [PATCH lttng-tools] Fix: cleanup stream on snapshot failure
In-Reply-To: <20220531112855.GA856582@windriver.com>
References: <20220530141021.267219-1-marcel.hamer@windriver.com> <769020238.11656.1653924475516.JavaMail.zimbra@efficios.com> <20220531112855.GA856582@windriver.com>
Message-ID: <1268831549.13571.1654002684841.JavaMail.zimbra@efficios.com>

Hi Marcel,

This is exactly the kind of reproducer we are looking for. Thanks for
providing it.

I'll try it out and check if we need anything more in that patch.

Cheers