[lttng-dev] [diamon-discuss] My experience on perf, CTF and TraceCompass, and some suggection.

Sat Feb 7 00:14:51 EST 2015

On 2015-02-06 10:13 PM, Wang Nan wrote:
>> I think such an approach could help in all 3 use cases you have presented. What do you think?
>> >
> Good to see you are looking at this problem.
>
> "Frequency analysis" you mentioned is a good viewpoint for finding outliers. However, it should not be the only one we consider. Could you please explain how "frequency analysis" can solve my first problem, "finding the reason why most of CPUs are idle by matching syscalls events"?
>
> Thank you!
>

I was thinking of automatic event matching, for example matching
syscall_entry_* events with their corresponding syscall_exit_* events.
This is a prerequisite for doing frequency analysis, but it could be
useful on its own. (Re-reading my earlier message, I notice I didn't
mention event matching at all, my bad!)

If I understand your first problem correctly, it boils down to wanting 
to identify the system call that is ongoing when a CPU is idle. And this 
is not straightforward, because the Select Next/Previous Event buttons 
in the time graph views will stop at every state transition, like CPU 
idle or IRQs, which are "in the way" of the system call entry event. 
Correct?

Now, what if we had a view that simply lists all the system calls in the
trace? Each row would contain the complete information of a system call:
its start time (timestamp of the syscall_entry event), end time
(timestamp of the syscall_exit event), duration, name/id, arguments,
return value, etc.
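
To make the idea concrete, here is a minimal sketch of how matching
syscall_entry_* with syscall_exit_* events could produce the rows of
such a view. This is not Trace Compass code. The event layout (dicts
with "name", "tid", "timestamp" and "fields" keys) and the names
SyscallRow and match_syscalls are assumptions made up for illustration,
and it assumes events arrive sorted by timestamp and that a thread has
at most one system call in flight at a time.

```python
from dataclasses import dataclass
from typing import Optional

ENTRY_PREFIX = "syscall_entry_"
EXIT_PREFIX = "syscall_exit_"


@dataclass
class SyscallRow:
    """One row of the hypothetical system call view."""
    name: str
    tid: int
    start: int           # timestamp of the entry event
    end: int             # timestamp of the matching exit event
    args: dict           # payload of the entry event
    ret: Optional[int]   # "ret" field of the exit event, if present

    @property
    def duration(self) -> int:
        return self.end - self.start


def match_syscalls(events):
    """Pair each entry event with the next exit event on the same thread."""
    pending = {}  # tid -> entry event still waiting for its exit
    rows = []
    for ev in events:
        name = ev["name"]
        if name.startswith(ENTRY_PREFIX):
            pending[ev["tid"]] = ev
        elif name.startswith(EXIT_PREFIX):
            entry = pending.pop(ev["tid"], None)
            if entry is None:
                continue  # exit without entry, e.g. the trace started mid-call
            syscall = name[len(EXIT_PREFIX):]
            if entry["name"][len(ENTRY_PREFIX):] != syscall:
                continue  # names disagree (lost events?), drop the pair
            rows.append(SyscallRow(
                name=syscall,
                tid=ev["tid"],
                start=entry["timestamp"],
                end=ev["timestamp"],
                args=entry["fields"],
                ret=ev["fields"].get("ret"),
            ))
        # any other event (sched_switch, IRQs, ...) is simply skipped
    return rows
```

Events that sit "in the way" between an entry and its exit do not
matter here, since only the two syscall events of a thread are paired.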

This view could be synchronized with the other ones: clicking on a row
would bring you to the corresponding syscall_entry event. Conversely,
clicking on any timestamp would bring this view to the row corresponding
to the system call that was active at that time, if there is one. I
believe this could make it much faster to identify the running system
call at any arbitrary point in the trace.
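
The "timestamp to row" direction could be sketched as a binary search
over the matched intervals. Again, this is only an illustration under
assumptions: rows are plain (tid, start, end, name) tuples, the lookup
is done per thread (system calls of different threads overlap in time),
and build_index and syscall_at are hypothetical names.

```python
from bisect import bisect_right


def build_index(rows):
    """Group rows by thread and sort each group by start time.

    rows: iterable of (tid, start, end, name) tuples.
    Returns a dict: tid -> (list of start times, list of rows).
    """
    by_tid = {}
    for row in rows:
        by_tid.setdefault(row[0], []).append(row)
    index = {}
    for tid, group in by_tid.items():
        group.sort(key=lambda r: r[1])
        index[tid] = ([r[1] for r in group], group)
    return index


def syscall_at(index, tid, timestamp):
    """Return the row active on `tid` at `timestamp`, or None."""
    entry = index.get(tid)
    if entry is None:
        return None
    starts, group = entry
    # Last interval whose start is <= timestamp.
    i = bisect_right(starts, timestamp) - 1
    if i < 0:
        return None
    row = group[i]
    # Only a hit if the system call had not yet returned.
    return row if timestamp <= row[2] else None
```

With the index built once, each click costs a logarithmic lookup
instead of stepping through every state transition.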

Also, that view could be reused for other types of intervals, like
IRQs, IO operations, and so on. And if the user sorts by the Duration
column, bam, they have a list of the worst offenders: the longest
system calls, IRQs, etc.
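
As a rough sketch of that reuse, a single generic interval type could
cover system calls, IRQs and IO operations alike, with the "sort by
Duration" step reduced to one sort call. The Interval type, its "kind"
labels and worst_offenders are assumptions made up for this example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    kind: str    # e.g. "syscall", "irq", "io" (hypothetical labels)
    label: str   # e.g. system call name or IRQ number
    start: int
    end: int


def worst_offenders(intervals, kind=None, limit=10):
    """Return the `limit` longest intervals, optionally of one kind."""
    selected = [iv for iv in intervals if kind is None or iv.kind == kind]
    selected.sort(key=lambda iv: iv.end - iv.start, reverse=True)
    return selected[:limit]
```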

Would this be helpful?

Cheers,
Alexandre