[lttng-dev] Trigger snapshots on a watchdog
Kienan Stewart
kstewart at efficios.com
Thu Sep 12 03:57:33 EDT 2024
Hi Damien,
On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
> Good day,
> We are trying to see what is the best way to monitor some applications
> not hitting a deadline. Ideally something like a watchdog that needs
> to be pat regularly and if timeout is reached triggers the snapshot.
>
> Before we reinvent the wheel and code some userland applications, is
> there a canonical way in LTTng to do it? I found this
> <https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously
> close maybe?
>
I don't think the proposed changes you linked to are useful or
related to what you hope to achieve. The patch series is a concept about
how some types of UST ring buffer stalls might be addressed by the
session daemon. After a quick glance, the monitoring seems to be more
closely related to the 'monitor timer', which is used to sample
statistical information about channels[1].
There is a concept of triggers[2]; however triggers react to the
presence of events rather than the absence thereof.
I think a small user space application that monitors the state of other
applications is more the direction to head in. There are at least a
couple of ways that a snapshot on unhealthy state could be achieved:
* Use liblttng-ctl to trigger a snapshot from your watchdog
  application[3][4].
* Have the watchdog application exec `lttng snapshot record`[5].
* Have the watchdog application emit some sort of "health state" events
  with some data (e.g. health_okay, health_bad, ...) per your usage
  requirements, and configure a trigger[2] to take a snapshot on the
  "health state" events that have the non-okay state.
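As a rough illustration of the second option, here is a minimal sketch in
Python of a watchdog that execs `lttng snapshot record` once its deadline is
missed. The names `Watchdog`, `supervise` and `snapshot_command` are
hypothetical, as is the polling design; how the monitored application
reaches `pat()` (another thread, a socket, a pipe, ...) is left open. It
assumes the session was created in snapshot mode (`lttng create --snapshot`)
and that `lttng` is on the PATH.

```python
import subprocess
import time


def snapshot_command(session_name):
    """Build the argv for `lttng snapshot record` on one session."""
    return ["lttng", "snapshot", "record", f"--session={session_name}"]


class Watchdog:
    """Hypothetical deadline tracker: pat() must be called at least
    once every timeout_s seconds, otherwise expired() becomes True."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_pat = clock()

    def pat(self):
        # Called on behalf of the monitored application to signal liveness.
        self._last_pat = self._clock()

    def expired(self):
        return self._clock() - self._last_pat > self._timeout_s


def supervise(watchdog, session_name, poll_s=0.1,
              run=subprocess.run, sleep=time.sleep):
    """Poll until the watchdog expires, then record one snapshot.

    `run` and `sleep` are injectable so the loop can be exercised
    without LTTng installed and without real delays.
    """
    while not watchdog.expired():
        sleep(poll_s)
    # check=False: a failed snapshot should not crash the watchdog itself.
    return run(snapshot_command(session_name), check=False)
```

In a real deployment, `supervise(Watchdog(2.0), "my-session")` would run in
its own thread or process while something else calls `pat()`; the session
name and the 2 second timeout are placeholders.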
Depending on your tracing configuration (channel overwrite/discard
mode[6], buffer sizes, blocking mode, and number of events), it is
possible that some events will not be recorded. I would favor using
liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger
guarantee that your watchdog will cause a snapshot to be taken.
I would love to hear if there are other ideas. Regardless, hope this helps!
thanks,
kienan
[1]: https://lttng.org/docs/v2.13/#doc-channel-timers
[2]: https://lttng.org/docs/v2.13/#doc-trigger
[3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng
[4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl
[5]: https://lttng.org/man/1/lttng-snapshot/v2.13/
[6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode
> Thanks,
> Cheers
>
> --
> *Damien Berget*
> Embedded Platform Lead
> damien.berget at flyzipline.com
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev