[lttng-dev] Capturing snapshot on kernel panic
Kienan Stewart
kstewart at efficios.com
Thu May 16 09:37:33 EDT 2024
Hi Damien,
On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
> Good day,
> we have been using LTTng successfully to capture snapshots on user
> defined tracepoints and it has proved invaluable for debugging our issues.
> Thanks to all the contributors of this project!
>
> We'd like to know if it would be possible to trigger on a kernel panic?
> It might be only dubiously possible, as the file system would still need
> to be working to write the results, but I should ask.
>
For userspace tracing, I think the recommendation is usually to use a
dax/pmem device and have the buffers for the session mapped there. After
a panic, the contents of the buffers can be restored using lttng-crash[1].
Note that dax/pmem isn't supported by the kernel space tracer at this time.
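As a rough sketch of the pmem approach for userspace tracing: the session name and the shm path below are assumptions, and the path should point at a directory on whatever DAX-mounted pmem file system your system provides. It uses the `--shm-path` option of `lttng create` and the `--extract` option of `lttng-crash`.

```shell
# Assumed values, not from the original setup: adjust both to your system.
SESSION_NAME="crash-session"
SHM_PATH="/mnt/pmem/lttng-shm"   # directory on a DAX-mounted pmem file system

# Before a crash: create a session whose user space ring buffers live on pmem.
setup_pmem_session() {
    lttng create "$SESSION_NAME" --shm-path="$SHM_PATH" &&
    lttng enable-event --userspace --all &&
    lttng start
}

# After the reboot: turn the preserved buffers into a regular trace directory.
recover_pmem_trace() {
    output_dir=$1
    lttng-crash --extract="$output_dir" "$SHM_PATH"
}
```

Call `setup_pmem_session` while the system is healthy, then `recover_pmem_trace /some/output/dir` after the machine comes back up; the extracted directory can be read with the usual trace viewers.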
If I recall, there are other ways to do things in the panic sequence (that
aren't LTTng-specific), but I'm personally not as familiar with the
details of that stage of Linux.
> Looking at available kernel syscall, the "reboot" one seems like a good
> candidate, however I was not able to capture a snapshot on it. I have
> tested the setup below with "--name=chdir" syscall and it works, "cd" to
> a directory will create a trace. But no dice with reboot.
>
The details of how this works will depend on your system. For example, my
installations tend to use systemd as PID 1. The broad strokes seem to
be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
believe then kicks off reboot.service. PID 1 is then swapped to
/usr/lib/systemd/systemd-shutdown, which sends SIGTERM and then SIGKILL to
all processes, unmounts, syncs, and finally calls the reboot system call [2,3].
As both the SIGTERM and the unmounts are done before the syscall,
lttng-sessiond and the consumers will have already shut down by the time
the syscall is entered, so nothing is left to record the snapshot.
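To check what `reboot` resolves to on a given system, something like the following can help. The helper name `resolve_command` is only illustrative, and the result will differ between distributions and init systems.

```shell
# Print the real binary behind a command name by following symlinks.
resolve_command() {
    cmd_path=$(command -v "$1") || return 1
    readlink -f "$cmd_path"
}

# On a systemd-based system this is expected to point at systemctl.
resolve_command reboot || echo "reboot not found in PATH"
```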
While this doesn't necessarily help your original question of panics, if
you want to snapshot before shutdown or reboot and are using systemd,
it's possible to leave a script or binary in a known directory so that
it's invoked prior to the rest of the shutdown sequence[4].
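A minimal sketch of such a script, assuming the session name from the setup quoted in this thread. Where to install it and how it gets invoked depend on your system; whichever hook is used, it has to run while lttng-sessiond is still alive, that is, before the SIGTERM step described above.

```shell
#!/bin/sh
# Session name taken from the setup quoted in this thread.
SESSION_NAME="snapshot-trace-session"

# Record a snapshot of the session, then flush file system buffers to disk.
record_shutdown_snapshot() {
    lttng snapshot record --session="$SESSION_NAME" || return 1
    sync
}
```

The script would end with a call to `record_shutdown_snapshot`; a non-zero return indicates the snapshot could not be recorded, for example because the session daemon was already gone.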
[1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
[2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
[3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
[4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/
hope this helps,
kienan
> Would you have any suggestions?
> Thanks for your help,
> Cheers
> Damien
>
> ============================
>
> # Prep output dir
> mkdir /application/trace/
> rm -rf /application/trace/*
>
> # Create session
> sudo lttng destroy snapshot-trace-session
> sudo lttng create snapshot-trace-session --snapshot \
> --output="/application/trace/"
> sudo lttng enable-channel --kernel --num-subbuf=8 channelk
> sudo lttng enable-channel --userspace --num-subbuf=8 channelu
>
> # Configure session
> sudo lttng enable-event --kernel --syscall --all --channel channelk
> sudo lttng enable-event --kernel --tracepoint "sched*" --channel channelk
> sudo lttng enable-event --userspace --all --channel channelu
> sudo lttng add-context -u -t vtid -t procname
> sudo lttng remove-trigger trig_reboot
> sudo lttng add-trigger --name=trig_reboot \
> --condition=event-rule-matches --type=kernel:syscall:entry \
> --name=reboot \
> --action=snapshot-session snapshot-trace-session \
> --rate-policy=once-after:1
>
> # start & list info
> sudo lttng start
> sudo lttng list snapshot-trace-session
> sudo lttng list-triggers
>
> #======== test it...
> sudo reboot
>
> #======= reconnect and Nothing :(
> $ ls -alu /application/trace/
> drwxr-xr-x 2 u u 4096 May 15 2024 .
> drwxr-xr-x 10 u u 4096 May 15 2024 ..
>
>