[ltt-dev] Mini Design and Roadmap of LTT-Kdump

Mathieu Desnoyers mathieu.desnoyers at polymtl.ca
Wed Feb 18 22:58:07 EST 2009


* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> 
> 
> Mini Design and Roadmap of LTT-Kdump
> 

Hi Lai,

This sounds very interesting and useful !

> -----------
> 	People in enterprise need to be able to diagnose why the system
> failed. Failing once is acceptable from a customer perspective, but
> failing again isn't. In this case, being able to extract the last events
> before the crash can be very valuable and help solve the problems
> before they happen again.
> 
> 	Create tools to simplify extraction of traces from a crashed kernel.
> The core file of the crashed kernel is provided by kdump.
> -----------
> 
> 	We will implement it. This tool includes two parts.
> 
> Part1: Core-file analyser.
> 	Analyse (needs kernel-debuginfo) the core-file and read the
> ltt-relay files.
> 	This part will eventually use elf-libs for analysing, but we will use
> crash(8) at first. crash(8) simplifies this work and can operate
> on a compressed core-file (http://sourceforge.net/projects/makedumpfile/).
> 	When we use crash(8), we will write a gdb script for the analysis.
> crash(8) loads this gdb script, the core-file and kernel-debuginfo, then
> listens on a pipe and does the work.
> 	crash(8) works very well when pages are vmap()ed into a contiguous
> memory region. But ltt-relay's pages are not vmap()ed, so it will be very
> slow. We may therefore use elf-libs or enhance crash(8) in the end.

Why would it be so slow ? We don't have a contiguous memory mapping, but
we have a linked list of pages we can walk. I am not very familiar with
the crash(8) internals though, so there could be a limitation I do not
foresee...
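
To make the page-walking idea concrete, here is a minimal sketch in Python. It
is only a toy model: the dump is a dict mapping a base address to bytes, and
the node layout (an 8-byte "next" pointer followed by an 8-byte page address,
NULL-terminated) is an assumption made for illustration. It is not the actual
ltt-relay structure and not how crash(8) reads memory. The names read_mem and
collect_buffer are hypothetical.

```python
import struct

PAGE_SIZE = 4096
# Hypothetical list node: little-endian (next_node_addr, page_data_addr).
NODE = struct.Struct("<QQ")


def read_mem(dump, addr, size):
    """Read `size` bytes at `addr` from a toy dump (dict: base address -> bytes)."""
    for base, blob in dump.items():
        if base <= addr and addr + size <= base + len(blob):
            offset = addr - base
            return blob[offset:offset + size]
    raise KeyError(hex(addr))


def collect_buffer(dump, head_addr, max_pages=1 << 20):
    """Follow the page list from head_addr and concatenate the page contents."""
    out = bytearray()
    seen = set()
    addr = head_addr
    while addr != 0:  # assumed NULL-terminated list
        # Guard against loops or runaway lists in a corrupted dump.
        if addr in seen or len(seen) >= max_pages:
            raise ValueError("corrupt page list")
        seen.add(addr)
        next_addr, page_addr = NODE.unpack(read_mem(dump, addr, NODE.size))
        out += read_mem(dump, page_addr, PAGE_SIZE)
        addr = next_addr
    return bytes(out)
```

The cost is one small read per node plus one page-sized read per page, so the
walk itself is linear in the number of pages. Whether each of those reads is
cheap inside crash(8) is the open question.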

> 
> Part2: ltt-relay extractor
> 	The extractor calls the analyser's API to traverse the debugfs tree in the
> core-file, and copies all ltt-relay files to the disk. The written files are
> in exactly the same format as the files that lttd writes, so we can use lttv
> or other lttng tools to read the events.

Yes, and we should think about giving the ability to copy the tracefiles
created from the crash buffers into an existing trace. So we get the
following scenario :

Tracing to disk: the trace gets written partially. The crash may happen
while the trace is being written.

The tool should detect these incomplete trace subbuffers on the disk and
truncate them. It should then add the information extracted from the
crash. This information can be added in separate tracefiles if
necessary, e.g. :

trace/sched_0 (normal tracefile for scheduler activity)
trace/crash/sched_0 (scheduler activity extracted from crash dump)
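
A rough sketch of that merge step, in Python, could look like the following.
It rests on a strong simplifying assumption: every complete subbuffer occupies
exactly subbuf_size bytes on disk, so an incomplete one shows up as a trailing
remainder. A real tool would more likely have to inspect the subbuffer headers.
The function names and the subbuf_size parameter are hypothetical.

```python
import os
import shutil


def truncate_partial_subbuffer(path, subbuf_size):
    """Drop a trailing partial subbuffer; return the resulting file size."""
    size = os.path.getsize(path)
    keep = size - (size % subbuf_size)
    if keep != size:
        os.truncate(path, keep)
    return keep


def merge_crash_trace(trace_dir, crash_files, subbuf_size):
    """Clean the on-disk tracefiles, then add crash tracefiles under crash/."""
    # Step 1: truncate incomplete subbuffers in the existing tracefiles.
    for name in sorted(os.listdir(trace_dir)):
        path = os.path.join(trace_dir, name)
        if os.path.isfile(path):
            truncate_partial_subbuffer(path, subbuf_size)

    # Step 2: copy the files extracted from the crash dump into trace/crash/,
    # keeping their names so that sched_0 pairs with crash/sched_0.
    crash_dir = os.path.join(trace_dir, "crash")
    os.makedirs(crash_dir, exist_ok=True)
    for src in crash_files:
        shutil.copyfile(src, os.path.join(crash_dir, os.path.basename(src)))
    return crash_dir
```

Keeping the crash data in its own subdirectory leaves the normal tracefiles
untouched apart from the truncation, which makes it easy to tell the two
sources apart later.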

Best regards,

Mathieu




> 
> 
> Any comments and ideas are welcome!
> 
> Thanks, Lai
> 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68



