[lttng-dev] Tracing peak size of the running queue

Wed Nov 30 10:19:29 EST 2011

Hi all,

I am currently using lttng 0.226, and lttv 0.12.38, trying to understand 
the behavior of a system with many concurrent threads, which sometimes 
gets to CPU saturation and timeouts. This could be due to a problem 
either with the scheduler or with the way tasks interact with each 
other, I believe.

I think it might be useful if I could get some representation 
(graphical or otherwise) of the number of RUNNable tasks at any given 
time, to get an idea of how much contention there is for the CPU.

So I guess something like the following information:
- peak measurement of the running queue length
- max/mean task latency (i.e. the time from when a task enters the 
running queue to when it actually gets the CPU)
- cpu distribution among tasks
when measured over a given time window, might help a lot in understanding 
these problematic scenarios.
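
As a rough sketch of what I mean, assuming the scheduling events could be 
exported as time-sorted (time, kind, tid) tuples: the tuple layout, the 
event kinds ("wake", "run", "preempt", "block") and the function name 
run_queue_stats below are all made up for illustration and are not 
lttng/lttv's actual trace format or API. It also ignores which CPU a task 
ran on and any task that was already runnable before the window started.

```python
from collections import defaultdict

def run_queue_stats(events, t_start, t_end):
    """Summarise scheduling activity inside [t_start, t_end).

    events: time-sorted (time, kind, tid) tuples (hypothetical format):
      "wake"    task becomes runnable and starts waiting for a CPU
      "run"     task is given a CPU
      "preempt" task loses the CPU but stays runnable
      "block"   task leaves the CPU and is no longer runnable
    """
    waiting = {}                    # tid -> time it started waiting
    running = {}                    # tid -> time it got the CPU
    latencies = []                  # wait times of tasks that got the CPU
    cpu_time = defaultdict(float)   # tid -> accumulated CPU time
    peak = 0                        # peak number of runnable tasks

    for t, kind, tid in events:
        if not (t_start <= t < t_end):
            continue
        if kind == "wake":
            waiting.setdefault(tid, t)
        elif kind == "run":
            if tid in waiting:
                latencies.append(t - waiting.pop(tid))
            running[tid] = t
        elif kind in ("preempt", "block"):
            if tid in running:
                cpu_time[tid] += t - running.pop(tid)
            if kind == "preempt":
                waiting[tid] = t    # still runnable, back in the queue
        # Runnable = waiting for a CPU + currently on a CPU.
        peak = max(peak, len(waiting) + len(running))

    # Charge tasks still on a CPU up to the end of the window.
    for tid, since in running.items():
        cpu_time[tid] += t_end - since

    total = sum(cpu_time.values())
    return {
        "peak_runnable": peak,
        "max_latency": max(latencies, default=None),
        "mean_latency": sum(latencies) / len(latencies) if latencies else None,
        "cpu_share": {tid: c / total for tid, c in cpu_time.items()} if total else {},
    }

# Made-up events for three tasks, just to exercise the function.
events = [
    (0, "wake", 1), (1, "run", 1),
    (2, "wake", 2), (3, "wake", 3),
    (5, "block", 1), (5, "run", 2),
    (7, "preempt", 2), (7, "run", 3),
]
stats = run_queue_stats(events, 0, 10)
print(stats)
```

Calling it repeatedly with a sliding (t_start, t_end) would give the 
per-window view rather than a single figure for the whole trace.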

I noticed that you can get some rough information from guihistogram (but I 
understand that it only graphs the density of events in the trace) and some 
overall statistics through libguistatistics, but those only cover the 
trace as a whole, if I understand it correctly.

I think the above information could be so useful that I am surprised 
lttng does not provide it out of the box. After all, ALL scheduling 
events are traced so it should trivially be a matter of displaying this 
data.

My question is then: is the information I'm looking for complete 
nonsense, or could it easily be obtained somehow (and I just can't find 
the right button)?

Thank you!
Gerlando