[lttng-dev] Trigger snapshots on a watchdog

Damien Berget damien.berget at flyzipline.com
Thu Sep 12 12:14:17 EDT 2024


Thanks for the quick response Kienan,
Your proposal is exactly how we were thinking the monitor application could
work, so we'll go with that for now.
Reacting to the absence of an event (a watchdog) would be a good
complement to the existing trigger types.
It would be a really useful feature for a flight recorder in embedded,
medium real-time applications. Is the team open to feature requests?
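
For reference, a minimal sketch of the monitor application approach
(the watchdog exec'ing `lttng snapshot record`), written in Python. The
names SnapshotWatchdog, record_snapshot and "my-session" are made up for
illustration; the sketch assumes the `lttng` CLI is on the PATH and that
a snapshot-mode session with that name already exists.

```python
import subprocess
import threading
import time


class SnapshotWatchdog:
    """Hypothetical helper: runs `on_timeout` once per missed deadline."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._last_pat = time.monotonic()
        self._fired = False
        self._lock = threading.Lock()

    def pat(self):
        """Called by the monitored code each time it meets its deadline."""
        with self._lock:
            self._last_pat = time.monotonic()
            self._fired = False  # re-arm after a healthy pat

    def check(self, now=None):
        """Return True if the deadline was missed and the callback ran."""
        now = time.monotonic() if now is None else now
        with self._lock:
            expired = (now - self._last_pat) > self._timeout_s
            if not expired or self._fired:
                return False
            self._fired = True  # avoid one snapshot per poll while stalled
        self._on_timeout()
        return True


def record_snapshot(session_name, run=subprocess.run):
    """Exec `lttng snapshot record` for an existing snapshot session.

    `run` is injectable so the sketch can be exercised without LTTng.
    """
    cmd = ["lttng", "snapshot", "record", f"--session={session_name}"]
    return run(cmd, check=False)
```

The monitor would build the watchdog with something like
SnapshotWatchdog(0.5, lambda: record_snapshot("my-session")), call
check() from a periodic loop, and have the monitored task call pat()
through whatever IPC suits the system. The timeout value and the polling
period are assumptions to be tuned per application.
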
Cheers
Damien

On Thu, Sep 12, 2024 at 12:57 AM Kienan Stewart <kstewart at efficios.com>
wrote:

> Hi Damien,
>
> On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
> > Good day,
> > We are trying to see what is the best way to monitor some applications
> > not hitting a deadline. Ideally something like a watchdog that needs
> > to be patted regularly and, if the timeout is reached, triggers a snapshot.
> >
> > Before we reinvent the wheel and code some userland applications, is
> > there a canonical way in LTTng to do it? I found this
> > <https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously
> > close maybe?
> >
> I don't think the proposed changes you linked to are useful or
> related to what you hope to achieve. The patch series is a concept about
> how some types of UST ring buffer stalls might be addressed by the
> session daemon. After a quick glance, the monitoring seems to be more
> closely related to the 'monitor timer', which is used to sample
> statistical information about channels[1].
>
>
> There is a concept of triggers[2]; however triggers react to the
> presence of events rather than the absence thereof.
>
>
> I think a small user space application that monitors the state of other
> applications is more the direction to head in. There are at least a
> couple of ways that a snapshot on unhealthy state could be achieved:
>
>
> * Use liblttng-ctl to trigger a snapshot from your watchdog
> application[3][4].
>
> * Have the watchdog application exec `lttng snapshot record`[5].
>
> * Have the watchdog application emit some sort of "health state" events
> with some data (e.g. health_okay, health_bad, ...) per your usage
> requirements, and configure a trigger[2] to take a snapshot on the
> "health state" events that have the non-okay state.
>
>
> Depending on your tracing configuration (channel overwrite/discard
> mode[6], buffer sizes, blocking mode, and number of events), it is
> possible that events may not be recorded. I would favor using
> liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger
> guarantee that your watchdog will cause a snapshot to be taken.
>
>
> I would love to hear if there are other ideas. Regardless, hope this helps!
>
>
> thanks,
>
> kienan
>
>
> [1]: https://lttng.org/docs/v2.13/#doc-channel-timers
>
> [2]: https://lttng.org/docs/v2.13/#doc-trigger
>
> [3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng
>
> [4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl
>
> [5]: https://lttng.org/man/1/lttng-snapshot/v2.13/
>
> [6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode
>
>
> > Thanks,
> > Cheers
> >
> > --
> > *Damien Berget*
> > Embedded Platform Lead
> > damien.berget at flyzipline.com
> >
>


-- 
*Damien Berget*