[lttng-dev] A question on protocols

Mon Nov 19 09:47:21 EST 2018

On Fri, Nov 16, 2018 at 6:16 PM Jonathan Rajotte-Julien <
jonathan.rajotte-julien at efficios.com> wrote:

> Hi Patrik,
>
> On Fri, Nov 16, 2018 at 09:49:52AM +0100, mrx wrote:
> > Hi,
> >
> > I need to collect LTTng live traces from systems with very limited RAM
> > and flash resources. This tracing will be running continuously for
> > months, monitoring our systems. The only way for me to transport those
> > CTF records somewhere else is via an HTTP proxy. LTTng doesn't seem to
> > have support for sending over proxies at all. So I think I really have a
> > challenge ahead of me, if this is at all possible.
>
> VPN through the HTTP proxy. A better alternative would be to speak with
> your sysadmin and see what you can do. Keep in mind that the protocol
> between relayd and the consumer is in no way "secure".
>

The security between consumerd and relayd isn't an issue as we'd be forced
to keep both on the device. This is because we have no way of transporting
the consumerd <-> relayd communication over the HTTP proxy, which is our
only choice. We have already talked to the team in charge of the network
paths.

>
> >
> > The plan is to write my own relayd from which I can then stream the
> > received CTF records + metadata to where I can analyze them. For this to
> > work I need documentation on the protocol between consumerd and relayd.
> > I cannot find the documentation for this, where can I find it?
>
> The source code.
>
> >
> > Do you think this is a viable solution?
>
> Doubt it. But we never know.
>
> >
> > Once I receive the data somewhere I can analyze it, I'm not sure whether
> > I'm required to write everything to the file system to be able to
> > analyze it. How would I then rotate the logs on disc so I can clean up?
>
> The 2.11 release will include a new feature for session rotation.
>
> See this presentation [1] from Jérémie Galarneau explaining how session
> rotation can be used.
>
> [1]
> https://events.linuxfoundation.org/wp-content/uploads/2017/12/Fine-grained-Distributed-Application-Monitoring-Using-LTTng-J%C3%A9r%C3%A9mie-Galarneau-EfficiOS.pdf

I'll for sure look into this.

>
>
>
> > I think the best for me would be not to go via disc at all.
> >
> > Are there any others working on a similar solution? If so, how are they
> > solving this? How would you recommend I solve this?
>
> I would go for session rotation via relayd (not in live mode) with a daemon
> watching for ready-to-consume chunks. You can adjust the granularity you
> need at the target level.
>
> This could be done close to the target, which would then compress the
> trace chunks and send them over http(s) to the monitoring pipeline.
>
> >
> > The reason the current relayd doesn't work for me is two-fold:
> > 1. I cannot get relayd to not write down the trace to disc. Can you
> > control this at all for live tracing?
>
> What is the real reason for not writing to disk on the relayd side?
>

The relayd side has to be on the device. The device has a very limited
amount of free flash. Even if there were flash available, writing traces to
it would wear it down faster than we'd like.

>
> > 2. I cannot find the documentation for the relayd <-> viewer protocol.
> > Where can I find it?
>
> Source code. The initial design proposal is under doc/ in the lttng-tools
> tree.
>

Thanks!

>
> >
> > It might be that storing the traces on disc is a prerequisite for
> > serving a viewer properly. Perhaps it's just something required by relayd
> > based on how it works internally. I don't know.
>
> You can look at relayd as a specialized ftp server. The user is
> responsible for managing the lifetime of the traces generated. The live
> protocol simply allows a viewer to see the trace as it gets
> received/stored.
>

That helps in visualizing the different components for me, thanks for that!

It also clarifies some of the problems we face with using relayd as a
solution to our problem.

>
> Nothing prevents you from outputting to a tmpfs (ramdisk) if hitting the
> disk is such a problem.
>

A RAM-based disc might have been a solution if we had any meaningful amount
of RAM to spare for this. Unfortunately we don't.

>
> Keep in mind that trace production is normally much quicker than trace
> reading/analysis. A buffering scheme is almost always necessary.
>

Very valid point, I'll for sure keep that in mind.

This speaks strongly against replacing relayd with something homebrewed, as
was the original plan. We'd probably not be able to shuffle the data fast
enough over the network to keep up with the pace.
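
To put a number on that, here is a back-of-the-envelope sketch of how long a
buffer lasts when tracing outpaces the uplink. The function name and the
figures in the usage note are hypothetical, not measurements from our
devices.

```python
def seconds_until_overflow(buffer_bytes: float,
                           produce_bps: float,
                           drain_bps: float) -> float:
    """Time until a buffer fills when data arrives faster than it drains.

    produce_bps and drain_bps are average rates in bytes per second.
    """
    surplus = produce_bps - drain_bps
    if surplus <= 0:
        # The drain keeps up with production, so the buffer never fills.
        return float("inf")
    return buffer_bytes / surplus
```

For example, with a made-up 4 MiB buffer, 1 MiB/s of trace data and an
uplink that drains 0.5 MiB/s, the buffer is full after 8 seconds.
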

>
> You can also use the --tracefile-size and --tracefile-count options to
> limit the disk footprint of each live session. Make sure to have enough
> buffer for live reading if still using live.
>

This isn't an option as our disc is flash based and we'd like to limit the
wear due to collecting metrics.
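
A rough way to see why capping the trace files doesn't help with wear: the
cap bounds the space used, but the amount written per day still follows the
trace production rate. The sketch below assumes the size and count limits
apply per stream; the function names and all numbers in the usage note are
hypothetical.

```python
SECONDS_PER_DAY = 86_400


def disk_footprint_bytes(tracefile_size: int,
                         tracefile_count: int,
                         streams: int) -> int:
    """Upper bound on space used when each stream keeps a ring of files.

    Assumption: the size and count limits apply to every stream separately.
    """
    return tracefile_size * tracefile_count * streams


def daily_write_bytes(produce_bps: int) -> int:
    """Bytes written per day at a constant production rate (bytes/second).

    This does not depend on the file size or count limits at all.
    """
    return produce_bps * SECONDS_PER_DAY
```

With made-up values of 1 MiB files, 4 files per stream and 2 streams, the
footprint is capped at 8 MiB, yet a steady 100 KiB/s of trace data still
writes about 8.8 GB to flash every day.
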

Thanks for your insights, much appreciated.

// Patrik