[lttng-dev] introduction & usage of LTTNG in machinekit

Michael Haberler mail17 at mah.priv.at
Mon Apr 27 16:41:05 EDT 2015


> Am 27.04.2015 um 14:30 schrieb Michel Dagenais <michel.dagenais at polymtl.ca>:
> 
> 
>> That said, the folks who have timing problems have not caught on yet... it
>> looks like some more guidance, examples, and a bit of a machinekit-specific
>> writeup are needed, competing with the other 247 priority projects on my desk
>> ;)
> 
> If we can easily reproduce the problems (bad latency on Linux Preempt-RT in a specific configuration), we can take care of tracing and diagnosing them. Otherwise, the users will have to activate the tracing themselves on their setup and we can guide them through the diagnosis.

The problem at hand has these characteristics:

- it happens only on low-end ARM platforms
- the code is part of the trajectory planner, which involves floating-point and transcendental functions
- the code is executed from a periodic Xenomai thread, and once in a while its execution time exceeds the cycle-time budget
- no interrupts, system calls, or Xenomai API calls are involved, so kernel or ipipe tracing will not help narrow it down

My idea was to insert tracepoints in the trajectory-planner code, plus one triggered on 'time window exceeded', so one could look backwards in history to determine which span of code used excessive time; does this sound reasonable?
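As a rough sketch of what I have in mind with LTTng-UST (the provider name, span names, and the now_ns()/compute_*() helpers below are made up for illustration, not actual machinekit code):

/* tp_tracepoints.h -- hypothetical tracepoint provider for the trajectory planner */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER machinekit_tp

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./tp_tracepoints.h"

#if !defined(_TP_TRACEPOINTS_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _TP_TRACEPOINTS_H

#include <lttng/tracepoint.h>

/* one event per instrumented span, recording its duration */
TRACEPOINT_EVENT(
    machinekit_tp, span,
    TP_ARGS(const char *, name, long, duration_ns),
    TP_FIELDS(
        ctf_string(name, name)
        ctf_integer(long, duration_ns, duration_ns)
    )
)

/* the 'time window exceeded' trigger event */
TRACEPOINT_EVENT(
    machinekit_tp, budget_exceeded,
    TP_ARGS(long, cycle_ns, long, budget_ns),
    TP_FIELDS(
        ctf_integer(long, cycle_ns, cycle_ns)
        ctf_integer(long, budget_ns, budget_ns)
    )
)

#endif /* _TP_TRACEPOINTS_H */

#include <lttng/tracepoint-event.h>

and in the periodic function, something along these lines:

#include "tp_tracepoints.h"

void tp_run_cycle(long budget_ns)
{
    long t_start = now_ns();             /* hypothetical monotonic-clock helper */
    long t0;

    t0 = now_ns();
    compute_blend_arc();                 /* hypothetical fp/transcendental-heavy span */
    tracepoint(machinekit_tp, span, "blend_arc", now_ns() - t0);

    t0 = now_ns();
    compute_velocity_profile();          /* another hypothetical span */
    tracepoint(machinekit_tp, span, "velocity_profile", now_ns() - t0);

    long cycle_ns = now_ns() - t_start;
    if (cycle_ns > budget_ns)
        /* the marker one would search for, then walk backwards over
           the preceding span events of the same cycle */
        tracepoint(machinekit_tp, budget_exceeded, cycle_ns, budget_ns);
}

The session would presumably run in snapshot ("flight recorder") mode, so the per-cycle events only accumulate in an in-memory ring buffer and a snapshot is recorded once the budget_exceeded event shows up.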

Alex (in cc) has a test case which reproducibly triggers the problem.

> 
>> Let me add a question out of the blue: I've read the literature on observing
>> time and distributed systems and am aware of the theoretical issues, as well
>> as clock synchronisation. Still - has any thought been given to merging traces
>> of several systems based on synchronisation points (known causality),
>> well-synchronised clocks, or other forms of hardware synchronisation?
> 
> I suppose that you mean something like this which we offer in TraceCompass:
> 
> http://archive.eclipse.org/tracecompass/doc/org.eclipse.tracecompass.doc.user/Trace-synchronization.html
> 
> We already support two use cases: distributed systems exchanging packets through TCP, and physical/virtual machines communicating through vmenter/vmexit with tracepoints on both sides of those transitions. This framework can easily be extended to other cases. We could in the future make it available declaratively (define event types and unique identifiers that should be used to match causality-related events).

That's pretty impressive. I'll try to reproduce an example and think through how this could fit into what we're doing.

The background is that we're getting more into supporting fieldbus-connected components, some of which might run our code, so distributed tracing would become an option.

With CANbus in particular, an open question is whether driving a remote motion controller is within the range of bus throughput; ballpark - no, but there are some drives with quite intelligent features and it might work. If not, it would help to know where we lose out, and up to which rates it would still work.
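Back-of-envelope (all figures below are my assumptions, not measurements): classic CAN 2.0 at 1 Mbit/s, roughly 130 bits on the wire for an 8-byte frame, two frames (command + feedback) per axis per servo cycle:

/* rough CAN throughput check; every constant here is an assumption */
#include <stdio.h>

int main(void)
{
    const double bitrate         = 1e6;   /* bit/s, classic CAN 2.0 maximum */
    const double bits_per_frame  = 130;   /* ~8-byte frame incl. overhead and stuffing */
    const double frames_per_axis = 2;     /* command + feedback, assumed */
    const double axes            = 4;
    const double servo_hz        = 1000;  /* assumed servo-thread rate */

    double needed    = frames_per_axis * axes * servo_hz;  /* 8000 frames/s */
    double available = bitrate / bits_per_frame;            /* ~7700 frames/s */

    printf("need %.0f frames/s, bus limit ~%.0f frames/s (%.0f%% load)\n",
           needed, available, 100.0 * needed / available);
    return 0;
}

In other words, a plain 1 kHz command/feedback loop over a handful of axes already saturates the bus, which is why only the drives with intelligent features (interpolation on the drive, lower update rates on the wire) might make it workable.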

It would be good to have such a tool at hand, and the developers aware of it and of how to use it. I could actually see a use case for mixing software tracing with hardware-visible events, like from a logic analyzer.

The other area where I could see use for such a capability is UI/server interaction (could it be there's much wider interest in this than just mine?).

Those are more 'researchy' applications than the one above, though. That one we need to nail.

- Michael



