[lttng-dev] [linuxtools-dev] View for virtual machine monitoring

Mohamad Gebai mohamad.gebai at polymtl.ca
Tue Jul 9 16:28:10 EDT 2013

> Hello Mohamad!
> Your work looks very interesting.  I have been forced to be away from it for
> a couple months now due to other work priorities, but I have been building
> something quite similar myself.

Thank you for your detailed answer!

> ----- Original Message -----
> > Hello,
> > We are currently working on a new view in Eclipse's TMF plugin (Tracing and
> > Monitoring Framework) specific to virtual machine analysis. This view
> > requires kernel traces from the host and from each guest with a set of
> > specific tracepoints activated. The traces are then merged together and
> > analysed in a way that the real state of each system can be rebuilt, while
> > taking into account all the interactions between the different systems.
> I assume you are using LTTng for Linux, are you using it for KVM as well?

Yes, I am using the KVM tracepoints that are already in the kernel, and
generating the traces with LTTng. Before taking the screenshots, I had actually
disabled one feature of this view: it shows when the CPU is in VMX root mode to
handle a VMEXIT. That information is built from KVM's tracepoints and shows the
overhead caused by virtualization.
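For reference, those KVM tracepoints can be enabled with the standard LTTng
command line. A minimal host-side session could look like the following (a
sketch assuming lttng-tools and lttng-modules are installed; tracepoint names
are as found in recent x86 kernels):

```shell
# Create a kernel tracing session on the host.
lttng create vm-analysis

# Enable the KVM tracepoints used to detect VMX root/non-root transitions.
lttng enable-event --kernel kvm_entry,kvm_exit

# Also record scheduling events so the vCPU threads can be followed.
lttng enable-event --kernel sched_switch

lttng start
# ... run the workload in the guest ...
lttng stop
lttng destroy
```

The resulting CTF trace can then be loaded into TMF alongside the guest traces.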

> I assume then that you are using CTF formatted traces?

Yes, the traces generated by LTTng are in CTF.

> Are you using TMF's CTF parser?

Yes, I am using the CTF parser from TMF.

> > The main purpose of this view is to easily point out latency problems due
> > to resource sharing. For now, we only consider CPU time, but more
> > resources (such as memory allocation, disks...) will be added.
> >
> > Two screenshots are attached. The first one shows the virtual machines and
> > the state of their respective virtual CPUs. The second screenshot gives
> > in-depth information about one of the virtual CPUs, showing only the
> > threads that interacted with this vCPU and their state during the time of
> > the trace. We think that this approach of showing information across the
> > layers (OS, KVM, guest OS, and eventually JVM...) can be helpful to
> > investigate latency-related problems specific to virtual machines.
> I agree!
> I am interested to know how you set up your view under the hood.  Did you
> build from the code base that was already there with the ControlFlowView
> (which is what I did), and then use the TMF state system infrastructure to
> model the state of the various elements you wish to display?

Actually I used the ResourceView as a base for my view since I have different
data types in the same view (VMs, vCPUs, threads and eventually interrupts).

> If you look through the history on this list you will see some links that I
> posted to the prototype that I was working with on github as well as some
> screenshots.

Yes I have seen your previous work in TMF.

> I went with an approach of trying to make the view a generic
> display of hierarchical state of objects vs time, and then pluggable code
> that understands the event schema, iterating the events and updating the
> view.  I like the idea of having a view that can be data driven and so it is
> then fairly straightforward to plug in any sort of state vs. time in context
> display.  My work is incomplete, it still lacks a number of features that I
> intend to add, including the ability to have multiple instances of the view
> open at the same time, all synchronized, and/or a single view that aggregates
> the contents of many different traces.  It looks as though you are already
> doing that, though I can't help but wonder how you defined the hierarchy when
> different levels in the hierarchy have different traces.

How it works for now is that we have to create an Experiment of a predefined
type (the VM analysis type) and add to it all of the traces we want to analyse.
At this step, we also have to specify which traces were recorded on the host
and which ones were recorded on the guests.
Thanks to Genevieve Bastien's experimental work, we were able to merge all of
the traces together as one single trace owned by the Experiment. The events are
then handled one by one and, depending on whether they came from the host or
from a guest, the state system is modified accordingly.
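The per-origin dispatch described above can be sketched roughly as follows.
This is a simplified, hypothetical illustration, not TMF's actual API (the
real implementation goes through TMF's state system infrastructure, and the
attribute paths, event names, and class names below are made up for the
example): host-side KVM events update the vCPU's mode, while guest-side
scheduling events update which thread occupies the vCPU.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how merged host/guest events could drive a state
// system. A plain map stands in for TMF's attribute tree; all names here
// are illustrative only.
public class VmStateSketch {

    enum Origin { HOST, GUEST }

    // Stand-in for the state system: attribute path -> current value.
    private final Map<String, String> state = new HashMap<>();

    void handleEvent(Origin origin, String vcpu, String eventName, String payload) {
        if (origin == Origin.HOST) {
            // Host events (kvm_exit/kvm_entry) tell us whether the vCPU is
            // in VMX root mode (hypervisor overhead) or running guest code.
            if (eventName.equals("kvm_exit")) {
                state.put("VMs/vm0/" + vcpu + "/mode", "VMX_ROOT");
            } else if (eventName.equals("kvm_entry")) {
                state.put("VMs/vm0/" + vcpu + "/mode", "VMX_NON_ROOT");
            }
        } else {
            // Guest events (e.g. sched_switch) tell us which thread
            // currently occupies the vCPU inside the guest.
            if (eventName.equals("sched_switch")) {
                state.put("VMs/vm0/" + vcpu + "/thread", payload);
            }
        }
    }

    String query(String path) {
        return state.get(path);
    }

    public static void main(String[] args) {
        VmStateSketch ss = new VmStateSketch();
        ss.handleEvent(Origin.HOST, "vcpu0", "kvm_exit", null);
        ss.handleEvent(Origin.GUEST, "vcpu0", "sched_switch", "mysqld");
        System.out.println(ss.query("VMs/vm0/vcpu0/mode"));
        System.out.println(ss.query("VMs/vm0/vcpu0/thread"));
    }
}
```

In the real view, the state system additionally records intervals over time,
so the history of each attribute can be queried for any timestamp in the trace.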
I will post a link to my experimental branch for this view very soon, if you
want to take a closer look at the code underneath.

Thanks again for your response,
