[ltt-dev] [PATCH] Poll : introduce poll_wait_exclusive() new function

Mathieu Desnoyers compudj at krystal.dyndns.org
Wed Nov 26 06:15:11 EST 2008


* Davide Libenzi (davidel at xmailserver.org) wrote:
> On Tue, 25 Nov 2008, KOSAKI Motohiro wrote:
> 
> > 
> > patch against: tip/tracing/marker
> > 
> > ==========
> > Currently, the behavior of wake_up() depends on which function was
> > used to add the task to the wait queue.
> > 
> > 
> >                               wake_up()          wake_up_all()
> > ---------------------------------------------------------------
> > add_wait_queue()              wake up all        wake up all
> > add_wait_queue_exclusive()    wake up one task   wake up all
> > 
> > 
> > Unfortunately, poll_wait() always uses add_wait_queue().
> > This means there is no way to wake up only one of the polling processes:
> > wake_up() wakes up all sleeping processes, not just one.
> > 
> > 
> > Mathieu Desnoyers explained that this causes the following problem for LTTng.
> > 
> >    In LTTng, all lttd readers are polling all the available debugfs files
> >    for data. This is principally because the number of reader threads is
> >    user-defined and there are typical workloads where a single CPU is
> >    producing most of the tracing data and all other CPUs are idle,
> >    available to consume data. It therefore makes sense not to tie those
> >    threads to specific buffers. However, when the number of threads grows,
> >    we face a "thundering herd" problem where many threads can be woken up
> >    and put back to sleep, leaving only a single thread doing useful work.
> 
> Why do you need to have so many threads banging a single device/file?
> Have one (or some other very small number of) puller thread(s) that
> activate the other processing threads with chunks of pulled data. That
> way there's no need for a new wakeup abstraction.
> 
> 
> 
> - Davide

One of the key design rules of LTTng is not to depend on such
system-wide data structures or entities (e.g. a single manager thread).
Everything is per-cpu, and it scales very well.

I wonder how badly the approach you propose can scale on large NUMA
systems, where having to synchronize everything through a single thread
might become an important point of contention, just due to the cacheline
bouncing and extra scheduler activity involved.
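
For reference, the wake-up rules in the table quoted above can be
modelled in user space roughly as follows. This is only a sketch under
stated assumptions: struct wq, struct waiter, wq_add(), wq_wake() and
herd_size() are made-up names for illustration, not kernel API, and
"waking" a task is reduced to setting a flag. It mirrors the two points
that matter here: exclusive waiters are queued at the tail, and the wake
loop stops after the requested number of exclusive waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WQ_MAX 16   /* arbitrary capacity for this toy model */

/* Hypothetical stand-ins for a wait queue entry and a wait queue head. */
struct waiter {
    bool exclusive; /* queued the add_wait_queue_exclusive() way */
    bool awake;     /* set once the waiter has been "woken" */
};

struct wq {
    struct waiter *w[WQ_MAX];
    size_t n;
};

/* Non-exclusive waiters go to the head, exclusive ones to the tail. */
void wq_add(struct wq *q, struct waiter *w, bool exclusive)
{
    assert(q->n < WQ_MAX);
    w->exclusive = exclusive;
    w->awake = false;
    if (exclusive) {
        q->w[q->n] = w;
    } else {
        for (size_t i = q->n; i > 0; i--)
            q->w[i] = q->w[i - 1];
        q->w[0] = w;
    }
    q->n++;
}

/*
 * Wake waiters in queue order. nr_exclusive == 1 models wake_up(),
 * nr_exclusive == 0 means "no limit" and models wake_up_all().
 * Returns how many waiters were woken.
 */
size_t wq_wake(struct wq *q, size_t nr_exclusive)
{
    size_t woken = 0;

    for (size_t i = 0; i < q->n; i++) {
        struct waiter *w = q->w[i];

        w->awake = true;
        woken++;
        /* Only exclusive waiters count against the limit. */
        if (w->exclusive && nr_exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}

/* Queue nwaiters identical waiters, issue one wake-up, count the herd. */
size_t herd_size(size_t nwaiters, bool exclusive, size_t nr_exclusive)
{
    struct waiter pool[WQ_MAX];
    struct wq q = { .n = 0 };

    assert(nwaiters <= WQ_MAX);
    for (size_t i = 0; i < nwaiters; i++)
        wq_add(&q, &pool[i], exclusive);
    return wq_wake(&q, nr_exclusive);
}
```

In this model, eight readers queued non-exclusively are all woken by a
single wake_up()-style call (herd_size(8, false, 1) returns 8), which is
the situation poll_wait() puts lttd in today, whereas the same readers
queued exclusively yield a single wake-up (herd_size(8, true, 1)
returns 1).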

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



