[lttng-dev] Golang agent for LTTng-ust

Tue May 29 09:47:33 EDT 2018

> I agree that integrating C code into a Go codebase is somewhat inelegant.

Not only that, but it's not sustainable. It is more a hack than a feature.

> However, I'm not sure what you mean by "implementation issues that are
> specific to the language itself".

I mean that if you put static calls to C tracepoints in a Go program,
you always pay for a function call (and its ~50ns overhead) each time
you hit the tracepoint, whether or not the tracepoint is actually
enabled. So you can't count on the compiler (this part is specific to
the language) to do clever branch prediction for you, which reduces
the benefit of instrumenting your code.

I like the third solution that you propose. I think that the first one
is definitely not ideal and that the second one is too much work to
develop and maintain. How much time do you estimate is necessary to
develop the third solution?

By the way, I am currently working on instrumenting the Go runtime to
capture information about goroutines. I am using dyntrace
(https://github.com/charpercyr/dyntrace) for that, which kind of works
but is really hacky.

Jérémie Galarneau <jeremie.galarneau at efficios.com> wrote:

> On 28 May 2018 at 10:30, Loïc Gelle <loic.gelle at polymtl.ca> wrote:
>
>> Hi Jeremie,
>>
>> Thanks for your answer. I roughly estimated the overhead of calling an
>> empty C function (passing two integer arguments) from Go at 50ns per call.
>> Maybe not a big deal for a lot of use cases, but more problematic if you
>> want to trace performance-critical parts of Go like its runtime itself. The
>> overhead could even be bigger when it involves passing strings or arrays
>> that have different memory layouts in Golang and C. What was the overhead
>> that you observed for Python and Java?
>>
>
> 50ns per call doesn't sound too bad honestly.
>
> You have to ask yourself if you could get within 50ns of lttng-ust's
> performance with a custom ring buffer implemented in Go.
>
> To use some very rough numbers, lttng-ust takes around 250ns per event
> for that payload. With Mathieu's work on restartable sequences, that number
> will be reduced quite a bit (by half, if I remember correctly), and I'm
> not sure you'll be able to use that kind of mechanism from Go code.
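A rough break-even check using only the figures quoted in this thread: about 250ns per lttng-ust event, about 50ns per cgo call, and the "by half" estimate for restartable sequences, which is taken here as an assumption. A native Go tracer only comes out ahead if its per-event cost stays below the combined cost of lttng-ust plus the cgo crossing.

```latex
\begin{aligned}
t_{\text{Go}} &< t_{\text{ust}} + t_{\text{cgo}} \\
\text{today:}\quad t_{\text{ust}} + t_{\text{cgo}} &\approx 250\,\text{ns} + 50\,\text{ns} = 300\,\text{ns} \\
\text{with rseq (assumed } t_{\text{ust}}/2\text{):}\quad t_{\text{ust}} + t_{\text{cgo}} &\approx 125\,\text{ns} + 50\,\text{ns} = 175\,\text{ns}
\end{aligned}
```

The budget a Go ring buffer has to beat therefore shrinks as lttng-ust itself gets faster.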
>
> I don't have numbers on hand for Python and Java. In both cases, we are
> hooking into logging frameworks so the overhead of calling into C code
> probably pales in comparison to the time spent formatting strings.
> That's another problem in using the current "agent" mechanism; it really
> only accommodates a very specific tracepoint signature that takes a string
> payload.
>
>
>>
>> From what I understand, it will always be a problem to have agents for
>> languages other than C, especially if you want to keep relying on
>> existing C code. Even if the sessiond part is independent from the agent
>> itself, there are tons of implementation issues that are specific to the
>> language itself. The problem with Go is that calling C functions is really
>> a hack that does not integrate well with the build system that was designed
>> for Go.
>>
>
>
> The solutions I see:
>
> 1) Replicate the current "agent" scheme and serialize all Go events to
> strings
>
> Not ideal as you lose the events' typing, you have to serialize to strings
> on the fast path, and you can hardly filter on event payloads.
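A minimal sketch of what option 1 looks like from the Go side, assuming a hypothetical string-only tracepoint (`logTracepoint`) in place of a real agent binding, and a hypothetical `requestEvent` type. It shows where the typing is lost: every field is flattened to text on the fast path, so filtering on a numeric field would later require parsing the string back.

```go
package main

import (
	"fmt"
	"strconv"
)

// requestEvent is a hypothetical typed event a Go program might record.
type requestEvent struct {
	ID        uint64
	LatencyNs int64
	Path      string
}

// recorded collects payloads so the sketch can run without lttng-ust.
var recorded []string

// logTracepoint stands in for an agent-style tracepoint whose only
// payload field is a string (hypothetical, not a real lttng-ust symbol).
func logTracepoint(msg string) {
	recorded = append(recorded, msg)
}

// traceRequest serializes the typed fields to a string on the fast path.
// After this, the trace only contains text: a filter such as
// "latency_ns > 1000" cannot be evaluated without re-parsing.
func traceRequest(e requestEvent) {
	msg := "id=" + strconv.FormatUint(e.ID, 10) +
		" latency_ns=" + strconv.FormatInt(e.LatencyNs, 10) +
		" path=" + e.Path
	logTracepoint(msg)
}

func main() {
	traceRequest(requestEvent{ID: 42, LatencyNs: 1500, Path: "/api"})
	fmt.Println(recorded[0]) // prints id=42 latency_ns=1500 path=/api
}
```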
>
> 2) Write a native Go ring-buffer that can be consumed by LTTng
>
> In essence, all the tracing would happen in Go. Events would be serialized
> by Go code and the Go "agent" would produce the CTF metadata that describes
> their layout.
>
> From an integration standpoint, that's probably the most elegant solution
> as you have no hard dependency on native code in your Go projects. However,
> it's a _lot_ of work.
>
> First, you have to re-implement a ring-buffer that needs to perform within
> 50ns of lttng-ust's ring-buffer to be useful. You also need to port the
> event filtering bytecode interpreter to Go.
> Then, we need to find a way to consume that ring-buffer's content from a
> form of consumer daemon within lttng-tools.
>
> 3) Add an lttng-ust API to allow dynamic event declaration
>
> This is something we have been considering for a while.
>
> Basically, we would like to introduce an API that allows applications to
> dynamically declare tracepoints.
> Then, those events would be serialized from Go, but the ring-buffer logic
> would remain in C.
>
> On each event, we would:
>   - Obtain a memory area from lttng-ust (reserve phase, C code called from Go)
>   - Write the event's content to that area (from Go code)
>   - Commit the event (C code called from Go)
>
> With this, you don't have to manually declare tracepoints and integrate
> them into a build system to generate providers; the Go application just
> needs to link to lttng-ust at runtime.
> It's not a perfect solution, but it seems like an interesting compromise.
>
>
> What do you think?
>
> Jérémie
>
>
>
>>
>> Did I provide more context?
>>
>> Cheers,
>> Loïc.
>>
>> Jérémie Galarneau <jeremie.galarneau at efficios.com> wrote:
>>
>>> On 4 May 2018 at 06:03, Loïc Gelle <loic.gelle at polymtl.ca> wrote:
>>>
>>>> Hi,
>>>>
>>>> There has been a previous discussion on the mailing list about porting
>>>> LTTng to Golang, about a year ago:
>>>> https://lists.lttng.org/pipermail/lttng-dev/2017-June/027203.html .
>>>> This new topic is to discuss implementation possibilities in more detail.
>>>>
>>>> Currently, one has to use the C UST agent from LTTng in order to
>>>> instrument Golang programs, and to compile the whole thing using custom
>>>> Makefiles and cgo. Here is a recent example that I wrote:
>>>> https://github.com/loicgelle/jaeger-go-lttng-instr
>>>>
>>>> As you can guess, there are a lot of drawbacks to that approach. It is
>>>> actually a hack and cannot be integrated into more complex Golang programs
>>>> that use a more complex build process (e.g. the Golang runtime itself),
>>>> because of the compiler directives that you have to include at the top
>>>> of the Golang files. There is also a big concern about the performance
>>>> of this solution, as calling a C function from Go requires a full stack
>>>> switch, because the calling conventions in C and Golang are different.
>>>>
>>>> I think a more integrated and performant solution is needed. We can’t
>>>> really ignore a language such as Golang that is now widely adopted for
>>>> cloud applications. LTTng is really the best solution out there in terms
>>>> of overhead per tracepoint, and could benefit from being made available
>>>> to such a large community. My question to the experts on this mailing
>>>> list: how much work would it take to write a Golang agent for LTTng?
>>>>
>>>>
>>>
>>> Hi Loïc,
>>>
>>> Without having performed any measurements myself, it does seem like
>>> calling C from Go is very expensive. In that context, I can see that LTTng
>>> would probably lose its performance advantage over any native Go solution.
>>> However, it wouldn't hurt to measure the impact and see if it really is a
>>> deal breaker.
>>>
>>> We faced the same dilemma when implementing the Java and Python support in
>>> lttng-ust. In those cases, we ended up calling C code, with the performance
>>> penalties it implies. The correlation with other applications' and the
>>> kernel's events, along with the rest of LTTng's features, provided enough
>>> value to make that solution worthwhile.
>>>
>>> There aren't a ton of solutions if we can't call existing C code. We
>>> basically have to reimplement a ring-buffer and the setup/communication
>>> infrastructure to interact with the lttng-sessiond. The communication with
>>> the session daemon is not a big concern as the protocol is fairly
>>> straightforward.
>>>
>>> The "hairy" part is that lttng-ust and lttng-consumerd use a shared memory
>>> map to produce and consume the tracing buffers. This means that all changes
>>> to that memory layout would need to be replicated in the Go tracer, making
>>> future evolution more difficult. Also, I don't know how easy it would be to
>>> synchronize C and Go applications interacting in a shared memory map given
>>> those languages have different memory models. My knowledge of Go doesn't
>>> go that far.
>>>
>>> A more viable solution could be to introduce a Go-native consumer daemon
>>> implementing its own synchronization with Go applications. This way, that
>>> implementation could evolve on its own and could also start with a simpler
>>> ring buffer than lttng-ust's.
>>>
>>> Still, it is not a small undertaking and it basically means maintaining a
>>> third tracer implementation.
>>>
>>>
>>> What do you think?
>>>
>>> Thanks!
>>> Jérémie
>>>
>>>
>>>> Cheers,
>>>> Loïc.
>>>>
>>>> _______________________________________________
>>>> lttng-dev mailing list
>>>> lttng-dev at lists.lttng.org
>>>> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>>>
>>> --
>>> Jérémie Galarneau
>>> EfficiOS Inc.
>>> http://www.efficios.com
>>>
>>
>
>
> --
> Jérémie Galarneau
> EfficiOS Inc.
> http://www.efficios.com