[lttng-dev] [PATCH lttng-tools] Fix: test code assumes that child process is schedule to run before parent

Anders Wallin wallinux at gmail.com
Thu Apr 1 11:02:09 EDT 2021


Hi,

I think you understood what I meant. The issue I have is for this 4 test
cases in ./regression/tools/tracker/test_event_tracker
- test_event_vpid_tracker ust 0 "${EVENT_NAME}"
- test_event_vpid_track_untrack ust 0 "${EVENT_NAME}"
- test_event_pid_tracker ust 0 "${EVENT_NAME}"
- test_event_pid_track_untrack ust 0 "${EVENT_NAME}"

I have 2 patches ready, but I want to run some more tests first before
posting them

Anders Wallin


On Thu, Apr 1, 2021 at 3:45 PM Jonathan Rajotte-Julien <
jonathan.rajotte-julien at efficios.com> wrote:

> On Thu, Apr 01, 2021 at 03:21:10AM +0200, Anders Wallin wrote:
> > Hi Jonathan :)
> >
> > It's very unlikely that the race could occur, BUT it can happen.
> >
> > OK run
> > 1. test_event_tracker starts gen-ust-events
> > 2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
> > 3. gen-ust-event create first event and create AFTER_FIRST_PATH
> > 4. gen-ust-event continue and create 99 events
> > 5. gen-ust-event wait for  BEFORE_LAST_PATH
> > 6. test_event_track start collecting events (lttng start .....)
>
> > 7. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> > 8. test_event_tracker touches BEFORE_LAST_PATH
> > 9. gen-ust-events creates the last event
> > 10. test_event finishes and since it tracks the faulty pid the result
> will
> > be 0 events == OK
>
> The sequence of event for the `test_event_tracker` function is (as of
> master):
>
> create
> enable event
> start
> track
>
> launch app
> wait for app return
> stop
> destroy
>
>
> The sequence you are describing here is:
>
> lauch app
> start
> track
> ...
>
> I'm pretty sure we are not talking about the same things. Please specify
> the
> test case and all functions involved and make sure to use the proper name
> for
> each of them.
>
> I suspect you are talking about test_event_pid_tracker. Still let's make
> sure of
> it. If it is, I do agree that it seems to have a window where we can gather
> event for.
>
> You might want to look if there is a real reason for this sequence instead
> of
> mimicking test_event_tracker
>
> Current code:
>  enable_"$domain"_lttng_event_ok $SESSION_NAME "$wildcard" "$channel"
>  prepare_"$domain"_app
>  start_lttng_tracing_ok
>
>  if [ "$expect_event" -eq 1 ]; then
>          lttng_track_"$domain"_ok "--vpid ${CHILD_PID}"
>  else
>          lttng_track_"$domain"_ok "--vpid $((CHILD_PID+1))"
>  fi
>  trace_"$domain"_app
>  stop_lttng_tracing_ok
>  destroy_lttng_session_ok $SESSION_NAME
>
>
> We might simply want to move the track command before the start
> considering have
> all the information to do so.
>
>  enable_"$domain"_lttng_event_ok $SESSION_NAME "$wildcard" "$channel"
>
>  prepare_"$domain"_app
>
>  if [ "$expect_event" -eq 1 ]; then
>          lttng_track_"$domain"_ok "--vpid ${CHILD_PID}"
>  else
>          lttng_track_"$domain"_ok "--vpid $((CHILD_PID+1))"
>  fi
>
>  start_lttng_tracing_ok
>  trace_"$domain"_app
>
>  stop_lttng_tracing_ok
>  destroy_lttng_session_ok $SESSION_NAME
>
> After testing this, seems like the validate_trace_empty does not handle the
> case were simply stream allocation did not take place since no application
> was
> 'valid' at the moment of the trace start.
>
> Well we got a good one here. We will wait for your updated patch and go
> from
> there.
>
> Cheers
>
> >
> > Faulty run
> > 1. test_event_tracker starts gen-ust-events
> > 2. test_event_tracker waits for gen-ust-events to create AFTER_FIRST_PATH
> > 3. gen-ust-event create first event and create AFTER_FIRST_PATH
> > 4. gen-ust-event gets rescheduled before it has created 99 events, e.g
> > after 9 events
> > 5. test_event_track start collecting events (lttng start .....)
> > 6. gen-ust-event is rescheduled and starts generating the remaining
> events.
> > 90 events
> > 7. lttng collects these 90 events since we have not setup "tracking" of
> PID
> > yet
> > 8. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> > 9. test_event_tracker touches BEFORE_LAST_PATH
> > 10. gen-ust-events creates the last event
> > 11. test_event finishes and since it tracks the faulty pid the result
> > should  be 0 events, but we got 90 == FAULTY
> >
> > We can solve this by;
> > A: using NR_ITER=2
> > or
> > B: by adding a flag to gen-ust-events to wait before sending the first
> event
> > 1. test_event_tracker starts gen-ust-events
> > 2. test_event_tracker waits for gen-ust-events waits for
> BEFORE_FIRST_PATH
> > 3. test_event_track start collecting events (lttng start .....)
> > 4. test_event_tracker calls "lttng track ... -pid "Faulty PID"
> > 5. test_event_track creates BEFORE_FIRST_PATH
> > 6. gen_ust_event creates 100 events
> > 7. test_event_tracker waits for gen_event_ust to end
> > 8. test_event finishes and since it tracked the faulty pid the result
> will
> > be 0 events == OK
> >
> > This is in principle how gen-kernel-test-events works (but with different
> > arguments)
> > I would suggest to use B since that will be bulletproof
> >
> > Anders Wallin
> >
> >
> > On Wed, Mar 31, 2021 at 11:33 PM Jonathan Rajotte-Julien <
> > jonathan.rajotte-julien at efficios.com> wrote:
> >
> > > Hi,
> > >
> > > On Wed, Mar 31, 2021 at 11:09:42PM +0200, Anders Wallin wrote:
> > > > Hi Julian,
> > >
> > > You can use Jonathan. ;)
> > >
> > > >
> > > > Neither mine "sleep 0.1" or your version with "while [! -f
> ............"
> > > > are race condition free.
> > >
> > > I might be missing something here but as far as I understand the race
> you
> > > are
> > > trying to mitigate is that the test script execute/continue before the
> > > `backgrounded`
> > > command (background test app) had time to execute, right?
> > >
> > > If so at least waiting for the app to create a file is necessary. Now
> > > gen_kernel_test_events does not have this functionality. The
> > > PATH_WAIT_FILE is
> > > used to control when the testapp can continue. Hence the script still
> > > cannot
> > > know if the app have been scheduled.
> > >
> > > Now based on the test case you might need more synchronization for the
> test
> > > cases.
> > >
> > > Note that in the ust cases, the trace_ust_app uses `touch
> > > "$BEFORE_LAST_PATH"`
> > > that effectively unblock the app and allows it to perform the last
> > > tracepoint
> > > hit and the we `wait` on the background process.
> > >
> > > Note: some tests case are a bit clever and uses "trace_"$domain"_app"
> > > instead of
> > > calling trace_ust_app directly.
> > >
> > > For these tests case it seems that we are only expecting at least a
> single
> > > event
> > > matching the event name under test. Here the last tracepoint hit should
> > > satisfy
> > > this criteria.
> > >
> > > Am I missing a race?
> > >
> > > Cheers
> > >
> > >
> > > > I suggest that we add an option to gen-ust-events to wait before the
> > > first
> > > > event is generated.
> > > > gen_kernel_test_events already have this functionality to wait
> before the
> > > > first event.
> > > >
> > > > Something like this
> > > > static struct option long_options[] =
> > > > {
> > > > /* These options set a flag. */
> > > > {"iter", required_argument, 0, 'i'},
> > > > {"wait", required_argument, 0, 'w'},
> > > > {"sync-after-first-event", required_argument, 0, 'a'},
> > > > {"sync-before-last-event", required_argument, 0, 'b'},
> > > > {"sync-before-last-event-touch", required_argument, 0, 'c'},
> > > > {"sync-before-exit", required_argument, 0, 'd'},
> > > > {"sync-before-exit-touch", required_argument, 0, 'e'},
> > > > *+ {"sync-before-first-event", required_argument, 0, 'f'},*
> > > >
> > > > {0, 0, 0, 0}
> > > > };
> > > > ....
> > > >
> > > > I will create one or more patches for this tomorrow
> > > >
> > > > Anders Wallin
> > > >
> > > >
> > > > On Wed, Mar 31, 2021 at 9:25 PM Jonathan Rajotte-Julien <
> > > > jonathan.rajotte-julien at efficios.com> wrote:
> > > >
> > > > > > #
> > > > > > # SPDX-License-Identifier: GPL-2.0-only
> > > > > >
> > > > > > -TEST_DESC="LTTng - Event traker test"
> > > > > > +TEST_DESC="LTTng - Event tracker test"
> > > > > >
> > > > > > CURDIR=$(dirname "$0")/
> > > > > > TESTDIR="$CURDIR/../../.."
> > > > > > @@ -42,6 +42,8 @@ function prepare_ust_app
> > > > > >
> > > > > >       $TESTAPP_BIN -i $NR_ITER -w $NR_USEC_WAIT -a
> > > "$AFTER_FIRST_PATH" -b
> > > > > >       "$BEFORE_LAST_PATH" &
> > > > > >       CHILD_PID=$!
> > > > > > +     # voluntary context switch to start $TESTAPP_BIN
> > > > > > +     sleep 0.1
> > > > >
> > > > > A wait on the $AFTER_FIRST_PATH file would be probably more
> > > deterministic
> > > > > than a sleep here.
> > > > >
> > > > >   while [ ! -f "${AFTER_FIRST_PATH}" ]; do
> > > > >           sleep 0.1
> > > > >   done
> > > > >
> > > > > I would also expect something similar for the `prepare_kernel_app`
> > > > > function considering the same race is most probably present and
> simply
> > > not
> > > > > triggered by a chance of luck.
> > > > > Seems like gen-kernel-test-events does not expose the same sync
> > > > > capabilities here, please use gen-ust-events as an example of how
> it is
> > > > > done.
> > > > >
> > > > > Let us know how your testing goes.
> > > > >
> > > > > Thanks
> > > > >
> > >
> > > --
> > > Jonathan Rajotte-Julien
> > > EfficiOS
> > >
>
> --
> Jonathan Rajotte-Julien
> EfficiOS
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.lttng.org/pipermail/lttng-dev/attachments/20210401/72dbb60d/attachment-0001.htm>


More information about the lttng-dev mailing list