[lttng-dev] Trigger snapshots on a watchdog

Kienan Stewart kstewart at efficios.com
Thu Sep 12 03:57:33 EDT 2024


Hi Damien,

On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
> Good day,
> We are trying to see what is the best way to monitor some applications 
> not hitting a deadline. Ideally something like a watchdog that needs 
> to be patted regularly and, if the timeout is reached, triggers the snapshot.
>
> Before we reinvent the wheel and code some userland applications, is 
> there a canonical way in LTTng to do it? I found this 
> <https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously 
> close maybe?
>
I don't think the proposed changes you linked to are useful or 
related to what you hope to achieve. The patch series is a concept about 
how some types of UST ring buffer stalls might be addressed by the 
session daemon. After a quick glance, the monitoring seems to be more 
closely related to the 'monitor timer', which is used to sample 
statistical information about channels[1].


There is a concept of triggers[2]; however triggers react to the 
presence of events rather than the absence thereof.


I think a small user space application that monitors the state of other 
applications is more the direction to head in. There are at least a 
couple of ways that a snapshot on unhealthy state could be achieved:


* Use liblttng-ctl to trigger a snapshot from your watchdog 
application[3][4].

* Have the watchdog application exec `lttng snapshot record`[5].

* Have the watchdog application emit some sort of "health state" events 
with some data (e.g. health_okay, health_bad, ...) per your usage 
requirements, and configure a trigger[2] to take a snapshot on the 
"health state" events that have the non-okay state.
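As a rough sketch of the third option, the snippet below only assembles an `lttng add-trigger` command line and prints it; it does not run anything. The session name, event name, `status` field, and trigger name are all hypothetical, and the option spelling is an assumption based on the 2.13 `lttng-add-trigger(1)` man page, so check it against your installed version before relying on it.

```python
import shlex


def build_add_trigger_argv(session, event_name, filter_expr, trigger_name):
    """Build an `lttng add-trigger` command line; this does not run it."""
    return [
        "lttng", "add-trigger",
        f"--name={trigger_name}",          # name of the trigger itself
        "--condition=event-rule-matches",
        "--type=user",                     # user space tracepoint
        f"--name={event_name}",            # event rule: event name
        f"--filter={filter_expr}",         # event rule: only non-okay states
        "--action=snapshot-session", session,
    ]


# All names below are placeholders for illustration.
argv = build_add_trigger_argv(
    session="my-session",
    event_name="watchdog:health_state",
    filter_expr="status != 0",
    trigger_name="snapshot-on-bad-health",
)
print(shlex.join(argv))
```

The filter assumes the health event carries an integer `status` field where 0 means healthy; adapt it to whatever payload the watchdog actually emits.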


Depending on your tracing configuration (channel overwrite/discard 
mode[6], buffer sizes, blocking mode, and number of events), it is 
possible that events may not be recorded. I would favour using 
liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger 
guarantee that your watchdog will cause a snapshot to be taken.
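For the exec approach, a minimal watchdog sketch might look like the following. The `SnapshotWatchdog` class, `record_snapshot` helper, and the timeout handling are illustrative assumptions and not part of LTTng; the only LTTng piece is the `lttng snapshot record --session=NAME` command[5].

```python
import subprocess
import threading
import time


def record_snapshot(session_name, runner=subprocess.run):
    """Exec `lttng snapshot record` for a session; returns the exit code."""
    argv = ["lttng", "snapshot", "record", f"--session={session_name}"]
    return runner(argv, check=False).returncode


class SnapshotWatchdog:
    """Calls on_timeout() whenever pat() is not called within timeout_s."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._deadline = time.monotonic() + timeout_s
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.pat()  # arm the deadline relative to the start time
        self._thread.start()

    def pat(self):
        """Push the deadline back by one full timeout period."""
        with self._lock:
            self._deadline = time.monotonic() + self._timeout_s

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            with self._lock:
                remaining = self._deadline - time.monotonic()
            if remaining <= 0:
                self._on_timeout()
                self.pat()  # re-arm so a stuck application is reported again
            else:
                # Sleep until the current deadline, or until stop() is called.
                self._stop.wait(remaining)
```

The monitored application would create `SnapshotWatchdog(2.0, lambda: record_snapshot("my-session"))`, call `start()`, and then call `pat()` each time it meets its deadline; the session name and the 2 second timeout here are placeholders.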


I would love to hear if there are other ideas. Regardless, hope this helps!


thanks,

kienan


[1]: https://lttng.org/docs/v2.13/#doc-channel-timers

[2]: https://lttng.org/docs/v2.13/#doc-trigger

[3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng

[4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl

[5]: https://lttng.org/man/1/lttng-snapshot/v2.13/

[6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode


> Thanks,
> Cheers
>
> -- 
> *Damien Berget*
> Embedded Platform Lead
> damien.berget at flyzipline.com
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

