[lttng-dev] [PATCH lttng-tools] Fix: test code assumes that child process is schedule to run before parent
Jonathan Rajotte-Julien
jonathan.rajotte-julien at efficios.com
Thu Apr 1 09:45:27 EDT 2021
On Thu, Apr 01, 2021 at 03:21:10AM +0200, Anders Wallin wrote:
> Hi Jonathan :)
>
> It's very unlikely that the race could occur, BUT it can happen.
>
> OK run
> 1. test_event_tracker starts gen-ust-events
> 2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
> 3. gen-ust-event create first event and create AFTER_FIRST_PATH
> 4. gen-ust-event continue and create 99 events
> 5. gen-ust-event wait for BEFORE_LAST_PATH
> 6. test_event_track start collecting events (lttng start .....)
> 7. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> 8. test_event_tracker touches BEFORE_LAST_PATH
> 9. gen-ust-events creates the last event
> 10. test_event finishes and since it tracks the faulty pid the result will
> be 0 events == OK
The sequence of event for the `test_event_tracker` function is (as of master):
create
enable event
start
track
launch app
wait for app return
stop
destroy
The sequence you are describing here is:
lauch app
start
track
...
I'm pretty sure we are not talking about the same things. Please specify the
test case and all functions involved and make sure to use the proper name for
each of them.
I suspect you are talking about test_event_pid_tracker. Still let's make sure of
it. If it is, I do agree that it seems to have a window where we can gather
event for.
You might want to look if there is a real reason for this sequence instead of
mimicking test_event_tracker
Current code:
enable_"$domain"_lttng_event_ok $SESSION_NAME "$wildcard" "$channel"
prepare_"$domain"_app
start_lttng_tracing_ok
if [ "$expect_event" -eq 1 ]; then
lttng_track_"$domain"_ok "--vpid ${CHILD_PID}"
else
lttng_track_"$domain"_ok "--vpid $((CHILD_PID+1))"
fi
trace_"$domain"_app
stop_lttng_tracing_ok
destroy_lttng_session_ok $SESSION_NAME
We might simply want to move the track command before the start considering have
all the information to do so.
enable_"$domain"_lttng_event_ok $SESSION_NAME "$wildcard" "$channel"
prepare_"$domain"_app
if [ "$expect_event" -eq 1 ]; then
lttng_track_"$domain"_ok "--vpid ${CHILD_PID}"
else
lttng_track_"$domain"_ok "--vpid $((CHILD_PID+1))"
fi
start_lttng_tracing_ok
trace_"$domain"_app
stop_lttng_tracing_ok
destroy_lttng_session_ok $SESSION_NAME
After testing this, seems like the validate_trace_empty does not handle the
case were simply stream allocation did not take place since no application was
'valid' at the moment of the trace start.
Well we got a good one here. We will wait for your updated patch and go from
there.
Cheers
>
> Faulty run
> 1. test_event_tracker starts gen-ust-events
> 2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
> 3. gen-ust-event create first event and create AFTER_FIRST_PATH
> 4. gen-ust-event gets rescheduled before it has created 99 events, e.g
> after 9 events
> 5. test_event_track start collecting events (lttng start .....)
> 6. gen-ust-event is rescheduled and starts generating the remaining events.
> 90 events
> 7. lttng collects these 90 events since we have not setup "tracking" of PID
> yet
> 8. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> 9. test_event_tracker touches BEFORE_LAST_PATH
> 10. gen-ust-events creates the last event
> 11. test_event finishes and since it tracks the faulty pid the result
> should be 0 events, but we got 90 == FAULTY
>
> We can solve this by;
> A: using NR_ITER=2
> or
> B: by adding a flag to gen-ust-events to wait before sending the first event
> 1. test_event_tracker starts gen-ust-events
> 2. test_event_tracker waits for gen-ust-events waits for BEFORE_FIRST_PATH
> 3. test_event_track start collecting events (lttng start .....)
> 4. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> 5. test_event_track creates BEFORE_FIRST_PATH
> 6. gen_ust_event creates 100 events
> 7. test_event_tracker waits for gen_event_ust to end
> 8. test_event finishes and since it tracked the faulty pid the result will
> be 0 events == OK
>
> This is in principle how gen-kernel-test-events works (but with different
> arguments)
> I would suggest to use B since that will be bulletproof
>
> Anders Wallin
>
>
> On Wed, Mar 31, 2021 at 11:33 PM Jonathan Rajotte-Julien <
> jonathan.rajotte-julien at efficios.com> wrote:
>
> > Hi,
> >
> > On Wed, Mar 31, 2021 at 11:09:42PM +0200, Anders Wallin wrote:
> > > Hi Julian,
> >
> > You can use Jonathan. ;)
> >
> > >
> > > Neither mine "sleep 0.1" or your version with "while [! -f ............"
> > > are race condition free.
> >
> > I might be missing something here but as far as I understand the race you
> > are
> > trying to mitigate is that the test script execute/continue before the
> > `backgrounded`
> > command (background test app) had time to execute, right?
> >
> > If so at least waiting for the app to create a file is necessary. Now
> > gen_kernel_test_events does not have this functionality. The
> > PATH_WAIT_FILE is
> > used to control when the testapp can continue. Hence the script still
> > cannot
> > know if the app have been scheduled.
> >
> > Now based on the test case you might need more synchronization for the test
> > cases.
> >
> > Note that in the ust cases, the trace_ust_app uses `touch
> > "$BEFORE_LAST_PATH"`
> > that effectively unblock the app and allows it to perform the last
> > tracepoint
> > hit and the we `wait` on the background process.
> >
> > Note: some tests case are a bit clever and uses "trace_"$domain"_app"
> > instead of
> > calling trace_ust_app directly.
> >
> > For these tests case it seems that we are only expecting at least a single
> > event
> > matching the event name under test. Here the last tracepoint hit should
> > satisfy
> > this criteria.
> >
> > Am I missing a race?
> >
> > Cheers
> >
> >
> > > I suggest that we add an option to gen-ust-events to wait before the
> > first
> > > event is generated.
> > > gen_kernel_test_events already have this functionality to wait before the
> > > first event.
> > >
> > > Something like this
> > > static struct option long_options[] =
> > > {
> > > /* These options set a flag. */
> > > {"iter", required_argument, 0, 'i'},
> > > {"wait", required_argument, 0, 'w'},
> > > {"sync-after-first-event", required_argument, 0, 'a'},
> > > {"sync-before-last-event", required_argument, 0, 'b'},
> > > {"sync-before-last-event-touch", required_argument, 0, 'c'},
> > > {"sync-before-exit", required_argument, 0, 'd'},
> > > {"sync-before-exit-touch", required_argument, 0, 'e'},
> > > *+ {"sync-before-first-event", required_argument, 0, 'f'},*
> > >
> > > {0, 0, 0, 0}
> > > };
> > > ....
> > >
> > > I will create one or more patches for this tomorrow
> > >
> > > Anders Wallin
> > >
> > >
> > > On Wed, Mar 31, 2021 at 9:25 PM Jonathan Rajotte-Julien <
> > > jonathan.rajotte-julien at efficios.com> wrote:
> > >
> > > > > #
> > > > > # SPDX-License-Identifier: GPL-2.0-only
> > > > >
> > > > > -TEST_DESC="LTTng - Event traker test"
> > > > > +TEST_DESC="LTTng - Event tracker test"
> > > > >
> > > > > CURDIR=$(dirname "$0")/
> > > > > TESTDIR="$CURDIR/../../.."
> > > > > @@ -42,6 +42,8 @@ function prepare_ust_app
> > > > >
> > > > > $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a
> > "$AFTER_FIRST_PATH" -b
> > > > > "$BEFORE_LAST_PATH" &
> > > > > CHILD_PID=$!
> > > > > + # voluntary context switch to start $TESTAPP_BIN
> > > > > + sleep 0.1
> > > >
> > > > A wait on the $AFTER_FIRST_PATH file would be probably more
> > deterministic
> > > > than a sleep here.
> > > >
> > > > while [ ! -f "${AFTER_FIRST_PATH}" ]; do
> > > > sleep 0.1
> > > > done
> > > >
> > > > I would also expect something similar for the `prepare_kernel_app`
> > > > function considering the same race is most probably present and simply
> > not
> > > > triggered by a chance of luck.
> > > > Seems like gen-kernel-test-events does not expose the same sync
> > > > capabilities here, please use gen-ust-events as an example of how it is
> > > > done.
> > > >
> > > > Let us know how your testing goes.
> > > >
> > > > Thanks
> > > >
> >
> > --
> > Jonathan Rajotte-Julien
> > EfficiOS
> >
--
Jonathan Rajotte-Julien
EfficiOS
More information about the lttng-dev
mailing list