[lttng-dev] Trigger snapshots on a watchdog
Kienan Stewart
kstewart at efficios.com
Fri Sep 13 05:51:02 EDT 2024
Hi Damien,
I've added a very brief feature request issue here[1], referring to
this discussion. If you would like to elaborate or add other details,
that would be most excellent.
thanks,
kienan
[1]: https://bugs.lttng.org/issues/1416
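
For anyone finding this thread later, here is a minimal sketch of the
"watchdog application execs `lttng snapshot record`" option from the
quoted discussion below. It is only an illustration, not something shipped
with LTTng: the class name, the `pat()`/`poll()` methods, and the session
name are all hypothetical, and it assumes a session that was created in
snapshot mode (e.g. with `lttng create my-session --snapshot`).

```python
import subprocess
import time


class SnapshotWatchdog:
    """Record an LTTng snapshot when pat() was not called within timeout_s.

    Hypothetical helper for illustration; not part of LTTng.
    """

    def __init__(self, session, timeout_s,
                 clock=time.monotonic, run=subprocess.run):
        self.session = session
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._run = run      # injectable so tests do not need LTTng installed
        self._last_pat = clock()
        self._fired = False

    def pat(self):
        """Signal liveness and re-arm the watchdog."""
        self._last_pat = self._clock()
        self._fired = False

    def snapshot_command(self):
        """Command line used to record a snapshot of the session."""
        return ["lttng", "snapshot", "record", f"--session={self.session}"]

    def poll(self):
        """Check the deadline; return True if this call recorded a snapshot."""
        expired = self._clock() - self._last_pat > self.timeout_s
        if not expired or self._fired:
            return False
        # check=True raises if the lttng command fails; in that case _fired
        # stays False and the next poll() tries again.
        self._run(self.snapshot_command(), check=True)
        self._fired = True  # at most one snapshot per missed deadline
        return True
```

The monitor would call `poll()` periodically (for example every 100 ms from
its main loop) and call `pat()` whenever the monitored application reports
in. How the application reports in (socket, shared memory, signal, ...) is
left open here since it depends on your setup.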
On 2024-09-12 12:14, Damien Berget wrote:
> Thanks for the quick response Kienan,
> Your proposal is exactly how we were thinking the monitor application
> could work, so we'll go with that for now.
> Reacting to the absence of an event (a watchdog) would really be a good
> complement to the existing trigger types.
> It's a really useful feature for a flight recorder in embedded medium
> real-time applications. Is the team open to feature requests?
> Cheers
> Damien
>
> On Thu, Sep 12, 2024 at 12:57 AM Kienan Stewart
> <kstewart at efficios.com> wrote:
>
> Hi Damien,
>
> On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
> > Good day,
> > We are trying to see what is the best way to monitor some applications
> > not hitting a deadline. Ideally something like a watchdog that needs
> > to be patted regularly and, if the timeout is reached, triggers the
> > snapshot.
> >
> > Before we reinvent the wheel and code some userland applications, is
> > there a canonical way in LTTng to do it? I found this
> > <https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously
> > close maybe?
> >
> I don't think the proposed changes you linked to are useful or
> related to what you hope to achieve. The patch series is a concept about
> how some types of UST ring buffer stalls might be addressed by the
> session daemon. After a quick glance, the monitoring seems to be more
> closely related to the 'monitor timer', which is used to sample
> statistical information about channels[1].
>
>
> There is a concept of triggers[2]; however triggers react to the
> presence of events rather than the absence thereof.
>
>
> I think a small user space application that monitors the state of other
> applications is more the direction to head in. There are at least a
> couple of ways that a snapshot on unhealthy state could be achieved:
>
>
> * Use liblttng-ctl to trigger a snapshot from your watchdog
> application[3][4].
>
> * Have the watchdog application exec `lttng snapshot record`[5].
>
> * Have the watchdog application emit some sort of "health state" events
> with some data (e.g. health_okay, health_bad, ...) per your usage
> requirements, and configure a trigger[2] to take a snapshot on the
> "health state" events that have the non-okay state.
>
>
> Depending on your tracing configuration (channel overwrite/discard
> mode[6], buffer sizes, blocking mode, and number of events), it is
> possible that events may not be recorded. I would favor using
> liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger
> guarantee that your watchdog will cause a snapshot to be taken.
>
>
> I would love to hear if there are other ideas. Regardless, hope
> this helps!
>
>
> thanks,
>
> kienan
>
>
> [1]: https://lttng.org/docs/v2.13/#doc-channel-timers
>
> [2]: https://lttng.org/docs/v2.13/#doc-trigger
>
> [3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng
>
> [4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl
>
> [5]: https://lttng.org/man/1/lttng-snapshot/v2.13/
>
> [6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode
>
>
> > Thanks,
> > Cheers
> >
> > --
> > *Damien Berget*
> > Embedded Platform Lead
> > damien.berget at flyzipline.com
> >
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
>
>
> --
> *Damien Berget*