[lttng-dev] Capturing snapshot on kernel panic

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Thu May 16 15:56:28 EDT 2024


Hi Damien,

If kexec is not an option on your system, you might still be able to
access the pmem+dax filesystem after a warm reboot, but that depends
entirely on whether your BIOS clears memory on a warm reboot.
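
For what it's worth, the boot-time flow Kienan outlines below could be
sketched roughly like this. It is an untested sketch, not a recipe: the
function names, the session name "crash-session" and every path are
placeholders I made up, it assumes the daxfs is already mounted at the
directory you pass in, and it only covers user space buffers.

```shell
#!/bin/sh
# Sketch of a boot-time hook. All names and paths are hypothetical.

# Print "unclean" if SHM_DIR holds leftover entries from a previous boot,
# "clean" if the directory is missing or empty.
boot_state() {
    shm_dir=$1
    if [ -d "$shm_dir" ] && [ -n "$(ls -A "$shm_dir" 2>/dev/null)" ]; then
        echo unclean
    else
        echo clean
    fi
}

# Extract leftover buffers into OUT_DIR on persistent storage, then remove
# the non-hidden entries of SHM_DIR so the next boot starts empty.
# OUT_DIR is assumed to be a fresh path; only its parent is created here.
recover_buffers() {
    shm_dir=$1
    out_dir=$2
    mkdir -p "$(dirname "$out_dir")" &&
        lttng-crash --extract="$out_dir" "$shm_dir" &&
        rm -rf "${shm_dir:?}"/*
}

# Start a snapshot session whose user space buffers live under SHM_DIR.
start_session() {
    shm_dir=$1
    lttng create crash-session --snapshot --shm-path="$shm_dir" &&
        lttng enable-event --userspace --all &&
        lttng start
}
```

A hook run early at boot would check boot_state on the daxfs mount point,
call recover_buffers if it reports "unclean", and then call start_session
in both cases.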

Cheers,

Mathieu

On 2024-05-16 14:22, Damien Berget via lttng-dev wrote:
> Thanks Kienan for these quick suggestions.
> We'll investigate the pmem route (I was not aware of the lttng-crash
> utility, it's pretty nice). I'm not sure how fast it would burn
> through our SSD, but it might still be worth trying.
> As for kexec-tools, it's not officially supported on our embedded modules
> unfortunately, so we might be SOL there. We may have to try adding our
> own tracepoint in the kernel to use as a trigger.
> Cheers
> Damien
> 
> On Thu, May 16, 2024 at 8:12 AM Kienan Stewart <kstewart at efficios.com> wrote:
> 
>     Hi Damien,
> 
>     I want to expand on one of the options that could work for your case.
> 
>     On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:
>      > Hi Damien,
>      >
>      >
>      > On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
>      >> Good day,
>      >> we have been using LTTng successfully to capture snapshots on
>      >> user-defined tracepoints and it has proved invaluable for debugging
>      >> our issues.
>      >> Thanks to all the contributors of this project!
>      >>
>      >> We'd like to know if it would be possible to trigger on a kernel
>      >> panic? It might be dubiously possible, as you would still need to
>      >> have the file system working to write the results, but I should ask.
>      >>
>      >
>      > For userspace tracing, I think the recommendation is usually to use a
>      > dax/pmem device and have the buffers for the session mapped there.
>      > After a panic, the contents of the buffers can be recovered using
>      > lttng-crash[1].
>      >
>      > Note that dax/pmem isn't supported by the kernel space tracer at
>      > this time.
>      >
>      > If I recall, there are other ways to do things in the panic sequence
>      > (that aren't lttng specific), but I'm personally not as familiar with
>      > the details of that stage of linux.
>      >
> 
>     It's possible to use kexec-tools to load a new kernel post-panic[1]. If
>     your system uses kexec, the contents of RAM aren't necessarily flushed,
>     and if both the initial kernel and the post-panic kernel started by
>     kexec have the same configuration for an emulated PMEM device using the
>     memmap parameter [2,3], that region of memory can have a daxfs created
>     in it after a clean boot.
> 
>     Note: some systems may not flush the memory during a warm reboot, but
>     this is dependent on the BIOS.
> 
>     When your system boots you could do something like the following:
> 
>        * If it's a clean boot, create the daxfs
>        * If it's an "unclean" boot (e.g. the daxfs already exists, or a
>          kernel parameter informs you that it started post-panic), then
>          copy or move the buffers to persistent storage, or use lttng-crash
>          to extract them there, for analysis
>        * Start tracing using a snapshot session and the userspace
>          buffers on the daxfs.
> 
>     In this type of situation the "snapshot" command is never invoked
>     directly, but the recovery of the buffers to create a snapshot is
>     possible.
> 
>     [1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
>     [2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
>     [3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap
> 
>     thanks,
>     kienan
> 
>      >> Looking at the available kernel syscalls, the "reboot" one seems like
>      >> a good candidate, however I was not able to capture a snapshot on it.
>      >> I have tested the setup below with the "--name=chdir" syscall and it
>      >> works: "cd" to a directory will create a trace. But no dice with
>      >> reboot.
>      >>
>      >
>      > The details of how this works will depend on your system. For example,
>      > my installations tend to use systemd as PID 1. The broad strokes seem
>      > to be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
>      > believe then kicks off the reboot.service; PID 1 is swapped to
>      > /usr/lib/systemd/systemd-shutdown, which sends SIGTERM then SIGKILL
>      > to all processes, unmounts, syncs, and calls the reboot system
>      > call [2,3].
>      >
>      > As both the SIGTERM and the unmounts are done before the syscall,
>      > lttng-sessiond and the consumers will have already shut down by the
>      > time it is entered.
>      >
>      > While this doesn't necessarily help with your original question about
>      > panics, if you want to snapshot before shutdown or reboot and are
>      > using systemd, it's possible to leave a script or binary in a known
>      > directory so that it's invoked prior to the rest of the shutdown
>      > sequence[4].
>      >
>      > [1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
>      > [2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
>      > [3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
>      > [4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/
>      >
>      > hope this helps,
>      > kienan
>      >
>      >> Would you have any suggestions?
>      >> Thanks for your help,
>      >> Cheers
>      >> Damien
>      >>
>      >> ============================
>      >>
>      >> # Prep output dir
>      >> mkdir -p /application/trace/
>      >> rm -rf /application/trace/*
>      >>
>      >> # Create session
>      >> sudo lttng destroy snapshot-trace-session
>      >> sudo lttng create snapshot-trace-session --snapshot \
>      >>          --output="/application/trace/"
>      >> sudo lttng enable-channel --kernel --num-subbuf=8 channelk
>      >> sudo lttng enable-channel --userspace --num-subbuf=8 channelu
>      >>
>      >> # Configure session
>      >> sudo lttng enable-event --kernel --syscall --all --channel channelk
>      >> sudo lttng enable-event --kernel --tracepoint "sched*" \
>      >>          --channel channelk
>      >> sudo lttng enable-event --userspace --all --channel channelu
>      >> sudo lttng add-context -u -t vtid -t procname
>      >> sudo lttng remove-trigger trig_reboot
>      >> sudo lttng add-trigger --name=trig_reboot \
>      >>          --condition=event-rule-matches --type=kernel:syscall:entry \
>      >>          --name=reboot \
>      >>          --action=snapshot-session snapshot-trace-session \
>      >>          --rate-policy=once-after:1
>      >>
>      >> # start & list info
>      >> sudo lttng start
>      >> sudo lttng list snapshot-trace-session
>      >> sudo lttng list-triggers
>      >>
>      >> #======== test it...
>      >> sudo reboot
>      >>
>      >> #======= reconnect and Nothing :(
>      >> $ ls -alu /application/trace/
>      >> drwxr-xr-x    2 u  u       4096 May 15  2024 .
>      >> drwxr-xr-x   10 u  u       4096 May 15  2024 ..
>      >>
>      >>
>      >> _______________________________________________
>      >> lttng-dev mailing list
>      >> lttng-dev at lists.lttng.org
>      >> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> 
> 
> 
> -- 
> *Damien Berget*
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


