[ltt-dev] [PATCH] Poll : introduce poll_wait_exclusive() new function

Mathieu Desnoyers compudj at krystal.dyndns.org
Wed Nov 26 17:27:15 EST 2008

* Andrew McDermott (andrew.mcdermott at windriver.com) wrote:
> Mathieu Desnoyers <compudj at krystal.dyndns.org> writes:
> [...]
> >> > Mathieu Desnoyers explained that it causes the following problem for LTTng.
> >> > 
> >> >    In LTTng, all lttd readers are polling all the available debugfs files
> >> >    for data. This is principally because the number of reader threads is
> >> >    user-defined and there are typical workloads where a single CPU is
> >> >    producing most of the tracing data and all other CPUs are idle,
> >> >    available to consume data. It therefore makes sense not to tie those
> >> >    threads to specific buffers. However, when the number of threads grows,
> >> >    we face a "thundering herd" problem where many threads can be woken up
> >> >    and put back to sleep, leaving only a single thread doing useful work.
> >> 
> >> Why do you need to have so many threads banging a single device/file?
> >> Have one (or some other very small number of) puller thread(s) that
> >> hands chunks of pulled data to the other processing threads. That
> >> way there's no need for a new wakeup abstraction.
> >> 
> >> 
> >> 
> >> - Davide
> >
> > One of the key design rules of LTTng is not to depend on such
> > system-wide data structures or entities (e.g. a single manager thread).
> > Everything is per-CPU, and it scales very well.
> >
> > I wonder how badly the approach you propose can scale on large NUMA
> > systems, where having to synchronize everything through a single thread
> > might become an important point of contention, just due to the cacheline
> > bouncing and extra scheduler activity involved.
> But at the end of the day these threads end up writing to the (possibly)
> single spindle.  Isn't that the biggest bottleneck here?

Not if those threads are either

- analysing the data on-the-fly without exporting it to disk
- sending the data through more than one network card
- writing data to multiple disks

Scalability can therefore be improved by adding more data output paths.
I don't want the inner design to be what limits scalability: if someone
has the resources to export the information at high speed in a scalable
way, they should be able to.
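The thundering-herd effect described in the quoted patch text can be illustrated in user space. The sketch below is only an analogy, not the kernel patch: `pthread_cond_broadcast()` stands in for a non-exclusive wakeup that rouses every waiting reader, and `pthread_cond_signal()` stands in for an exclusive wakeup (presumably what `poll_wait_exclusive()` is meant to enable for poll waiters) that wakes one. All names (`struct herd`, `run_herd()`, `HERD_DEMO`, `MAX_READERS`) are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_READERS 64

/* Hypothetical stand-in for one trace buffer polled by every reader. */
struct herd {
	pthread_mutex_t lock;
	pthread_cond_t ready;
	int pending;   /* items published but not yet consumed */
	int remaining; /* items still to consume; 0 tells readers to exit */
	int consumed;  /* items consumed by readers */
	int wakeups;   /* returns from pthread_cond_wait() */
	int useless;   /* wakeups that found no data to consume */
};

struct herd_stats { int consumed, wakeups, useless; };

static void *reader(void *arg)
{
	struct herd *h = arg;

	pthread_mutex_lock(&h->lock);
	for (;;) {
		while (h->pending == 0 && h->remaining > 0) {
			pthread_cond_wait(&h->ready, &h->lock);
			h->wakeups++;
			if (h->pending == 0 && h->remaining > 0)
				h->useless++; /* another reader got there first */
		}
		if (h->remaining == 0)
			break; /* all items consumed: shut down */
		h->pending--;
		h->remaining--;
		h->consumed++;
		if (h->remaining == 0)
			pthread_cond_broadcast(&h->ready); /* release idle readers */
	}
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

/* Publish nitems one at a time to nreaders threads, waking one waiter
 * (exclusive != 0) or all waiters (exclusive == 0) per item.
 * Returns 0 on success, -1 on bad arguments or thread-creation failure. */
int run_herd(int nreaders, int nitems, int exclusive, struct herd_stats *out)
{
	struct herd h = { .remaining = nitems };
	pthread_t tids[MAX_READERS];
	int started = 0;

	if (nreaders < 1 || nreaders > MAX_READERS || nitems < 0)
		return -1;
	pthread_mutex_init(&h.lock, NULL);
	pthread_cond_init(&h.ready, NULL);
	while (started < nreaders &&
	       pthread_create(&tids[started], NULL, reader, &h) == 0)
		started++;

	for (int i = 0; i < nitems; i++) {
		pthread_mutex_lock(&h.lock);
		h.pending++;
		if (exclusive)
			pthread_cond_signal(&h.ready);    /* wake one reader */
		else
			pthread_cond_broadcast(&h.ready); /* wake the whole herd */
		pthread_mutex_unlock(&h.lock);
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	/* Once any reader ran, every published item has been consumed. */
	assert(started == 0 || h.pending == 0);
	out->consumed = h.consumed;
	out->wakeups = h.wakeups;
	out->useless = h.useless;
	pthread_cond_destroy(&h.ready);
	pthread_mutex_destroy(&h.lock);
	return started == nreaders ? 0 : -1;
}

#ifdef HERD_DEMO
int main(void)
{
	struct herd_stats s;

	for (int exclusive = 0; exclusive <= 1; exclusive++) {
		if (run_herd(16, 10000, exclusive, &s) != 0)
			return 1;
		printf("%s: consumed=%d wakeups=%d useless=%d\n",
		       exclusive ? "signal" : "broadcast",
		       s.consumed, s.wakeups, s.useless);
	}
	return 0;
}
#endif
```

Built with `cc -pthread -DHERD_DEMO herd.c`, the demo prints the wakeup counts for both modes; the numbers vary between runs and machines, so none are quoted here. POSIX allows `pthread_cond_signal()` to unblock more than one thread, so the exclusive mode reduces useless wakeups rather than strictly eliminating them.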


Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

More information about the lttng-dev mailing list