[lttng-dev] CTF stress tests

Jérémie Galarneau jeremie.galarneau at efficios.com
Thu Nov 20 14:27:58 EST 2014


CC-ing lttng-dev since this applies to Babeltrace.

On Tue, Nov 18, 2014 at 9:50 AM, Matthew Khouzam
<matthew.khouzam at ericsson.com> wrote:
> Hi,
> I was looking into the CTF stress tests.
> They are good, but I don't know if we want them in our standard test
> cases. In most cases, they basically check the scalability of the
> machine they are run on, and in all cases, reading 12 GB of files

I respectfully disagree.

In most tests, there is definitely a smart way to deal with the trace.
That is not to say that all implementations are expected to support
every imaginable test case. However, they should strive to, at the
very least, handle failures to read traces gracefully and document
their limitations, since the spec imposes very few.

While I agree that some of these tests are far-fetched (you're not likely
to see a CTF trace with a 32,768-level-deep structure any time soon),
traces with enough streams to exhaust the max fd count are not
far-fetched at all. In fact, tracing any decently-sized cluster will bust
that limit in no time.
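
To be clear about what "graceful" means here: a reader could, for
example, retry an open that fails with EMFILE after evicting a cached
descriptor. This is only a sketch; the fd-cache callback and names are
invented, not something Babeltrace does today:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Sketch: open a stream file, and when the process is out of
     * descriptors (EMFILE/ENFILE), ask a caller-provided cache to
     * close its least-recently-used stream fd before retrying,
     * instead of aborting or silently dropping the stream.
     */
    static int open_stream_file(const char *path, int (*evict_lru_fd)(void))
    {
        for (;;) {
            int fd = open(path, O_RDONLY);

            if (fd >= 0)
                return fd;
            if ((errno == EMFILE || errno == ENFILE) && evict_lru_fd() == 0)
                continue;   /* a descriptor was freed; retry */
            fprintf(stderr, "cannot open stream %s: %s\n", path,
                    strerror(errno));
            return -1;      /* controlled, reported failure */
        }
    }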

Handling a multi-megabyte sequence (~100 MB), something that
Babeltrace can't do at the moment, may seem unreasonable at first.
It quickly becomes very pertinent when people start talking of core
dumping to a CTF trace.

> is rather prohibitive. That being said, maybe having this as a weekly
> build could be an interesting idea. Also, I don't think that busting
> the heap size is a good indication of a test failure. I can always
> allocate more heap at the problem. :)

Unfortunately, my motherboard's DIMM slots are already full. ;-)

>
> Now for a breakdown of the tests:
> ├── gen.sh
> ├── metadata
> │   └── pass
> │       ├── large-metadata - interesting test, I think something like
> this should be used to improve robustness
> │       ├── long-identifier - this should be our approach here, I think
> http://stackoverflow.com/questions/6007568/what-is-max-length-for-an-c-c-identifier-on-common-build-systems

If you mean that an implementation should document its limitations,
I agree.

> │       ├── many-callsites - works here until we OOM, but it's slow to
> test and we don't have a real application to validate the data yet. So
> it's good, but we may wish to put efforts elsewhere for now.
> │       ├── many-stream-class - This test is looking at the max size of
> an array...

Not sure what you mean here, although I agree that reading a trace with
16 million stream classes is not a "realistic" test.

That's not the point of the CTF stress test suite, though. This test
suite is meant to expose errors that are not handled gracefully.
Implementations are expected to fail, but to do so in a controlled and
documented way, not silently or by triggering the OOM killer.
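
As a sketch of what "controlled and documented" could look like (the
limit and names below are hypothetical, not actual Babeltrace code):

    #include <stdio.h>

    /* Hypothetical documented limit; the exact value matters less
     * than publishing it and enforcing it. */
    #define MAX_STREAM_CLASSES (1UL << 20)

    static int add_stream_class(size_t current_count)
    {
        if (current_count >= MAX_STREAM_CLASSES) {
            /* Refuse loudly with a diagnostic instead of letting an
             * unbounded allocation summon the OOM killer. */
            fprintf(stderr,
                    "more than %lu stream classes; exceeds our documented limit\n",
                    (unsigned long) MAX_STREAM_CLASSES);
            return -1;
        }
        return 0;
    }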

> │       ├── many-typealias - interesting test, the parser will suffer.
> │       └── many-typedef - ditto
> └── stream
>     └── pass
>         ├── array-large - Testing the max size of RAM and arrays.

There is no need to load the entire array into RAM. I personally
intend to implement a slow fallback to disk when arrays exceed a
given size.

This will not work in live mode, and that's completely okay, as long
as it doesn't outright crash.
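
I haven't written it yet, so take this as a rough sketch of the idea;
the threshold and structure names are made up:

    #include <stdio.h>
    #include <stdlib.h>

    /* Invented threshold above which array contents spill to disk
     * instead of living in memory. */
    #define IN_MEMORY_LIMIT (64UL * 1024 * 1024)

    struct array_backing {
        void *mem;     /* fast path: array fits in memory */
        FILE *spill;   /* slow path: unlinked temporary file */
    };

    static int array_backing_init(struct array_backing *b, size_t bytes)
    {
        b->mem = NULL;
        b->spill = NULL;
        if (bytes <= IN_MEMORY_LIMIT) {
            b->mem = malloc(bytes);
            return b->mem ? 0 : -1;
        }
        /* Decoded elements are written here and re-read on demand.
         * Useless in live mode, but it beats crashing on a 100 MB
         * sequence. */
        b->spill = tmpfile();
        return b->spill ? 0 : -1;
    }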

There is also a security aspect to this: an implementation shouldn't
trust the relay daemon to only send packets it can handle safely.
This doesn't really apply to arrays, as their length is statically
known, but unchecked sequence lengths are definitely a
security/stability concern.
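
For instance, a reader could bound a declared sequence length by what
the packet can physically contain before sizing any allocation from
it. The check below is only an illustration:

    #include <stdint.h>

    /*
     * Illustration: a sequence claiming more elements than the
     * remaining packet payload can physically hold is malformed;
     * reject it before sizing any allocation from the untrusted
     * length.
     */
    static int seq_len_is_sane(uint64_t declared_len,
                               uint64_t min_elem_size_bits,
                               uint64_t bits_left_in_packet)
    {
        if (min_elem_size_bits == 0)
            return 0;  /* cannot bound it; treat as malformed */
        return declared_len <= bits_left_in_packet / min_elem_size_bits;
    }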

>         ├── many-events - This can be an interesting tool for profiling.
> But it's not really a test for the reader...

Not sure why. Tracing multiple UST applications over a long time is
bound to stress this at some point.

Just imagine a snapshot session running for a year, through multiple
application updates. The number of UST tracepoints can quickly
become humongous.

>         ├── many-packets - This will be good to test distributed indexes
>         ├── many-streams - Good, but even better is...
>         ├── many-traces - interesting and a real problem
>         ├── packet-large - this is actually easier than many-events

Speak for yourself! Babeltrace mmaps entire packets! ;-)
We have our own share of embarrassing limitations :-P
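
For what it's worth, a fix could map a bounded window of the packet
and slide it as decoding progresses. A hypothetical sketch, with the
window size and helper name invented:

    #include <sys/mman.h>
    #include <sys/types.h>

    /* Invented tunable: map at most this much of a packet at once. */
    #define MAP_WINDOW (8UL * 1024 * 1024)

    static void *map_packet_window(int fd, off_t pkt_offset, size_t pkt_size)
    {
        size_t len = pkt_size < MAP_WINDOW ? pkt_size : MAP_WINDOW;

        /* pkt_offset is assumed page-aligned by the caller; remap as
         * the decoder advances past the window. */
        return mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, pkt_offset);
    }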

>         ├── sequence-large - see array-large
>         ├── string-large - see array-large
>         ├── struct-many-fields - see array-large
>         ├── struct-nest-n-deep - I like this one, it highlights one of
> our optimisations

Hmm, interesting! Care to explain the gist of it?

>         └── variant-many-tags - see array-large

Again, this is something I can easily see being produced by a
dynamically-typed language. Imagine a VM implementation that
would trace every function call for every possible argument type
permutation.

>
> Why no deep variants? just curious.

Good point, we should add that!
We should also have nested sequences, nested arrays, and mixes
of variants, sequences, and arrays.

Our assumption is that the nesting-handling code is shared
between various types; the nested-structures test should ideally
trigger it.
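
To make that assumption concrete, here is roughly the shape of a
shared, depth-limited walk; the types and limit are invented for
illustration:

    #include <stddef.h>

    /* Hypothetical documented nesting limit. */
    #define MAX_NESTING_DEPTH 64

    enum type_kind { TYPE_INT, TYPE_STRUCT, TYPE_ARRAY, TYPE_SEQUENCE, TYPE_VARIANT };

    struct type {
        enum type_kind kind;
        size_t n_children;        /* 0 for scalar types */
        struct type **children;   /* fields, element type, or variant options */
    };

    /*
     * One shared walk for every compound kind: if structs, arrays,
     * sequences and variants all funnel through this function, a
     * single deeply-nested test case exercises the depth handling
     * for all of them.
     */
    static int visit_type(const struct type *t, unsigned int depth)
    {
        size_t i;

        if (depth > MAX_NESTING_DEPTH)
            return -1;  /* controlled, documented failure, not a stack overflow */
        for (i = 0; i < t->n_children; i++) {
            if (visit_type(t->children[i], depth + 1))
                return -1;
        }
        return 0;
    }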

Perhaps we should not presume such implementation details...
The test suite is still a work in progress; any weird test case
you can think of is welcome!

>
> so, tl;dr the tests are good at making a system suffer and qualifying
> its boundaries. I think this is great for profiling, but for actual
> testing, it should not be part of the per-patch system we have set up.
>
> Any thoughts?

I think the real problem is assuming that a failing test is necessarily
a deal-breaker.

I see this test suite as CTF's equivalent of the Acid tests [1]. Most
web browsers knowingly fail them; that doesn't make them any less
relevant.

Jérémie

[1] http://www.acidtests.org/

-- 
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com


