<div dir="ltr">Hi,<div style>Just getting back into lttng after quite a few years</div><div style>I need to set up a continuous trace on a system.</div><div style><br></div><div style>Every 10 seconds I stop one session and start another to give me individual files for each 10 second segment.</div>
<div style>I originally intended to start the next session before stopping the previous but that was causing the sessond to hang.</div><div style><br></div><div style>Ever so often I can start the trace and the trace file is generated but ongoing commands to lttng are ignored. The new trace file then grows until all the disk space is used. sessiond seems to be hung</div>
<div style><br></div><div style><br></div><div style>I have the kernel dmseg log attached and I also have the trace file if needed</div><div style><br></div><div style><br></div><div style> [ 345.223564] LTT state dump thread start</div>
<div>[ 345.231033] LTT state dump end</div><div>[ 353.586011] ring buffer relay-metadata: 3604 records written, 0 records overrun</div><div>[ 353.590023] ring buffer relay-discard, cpu 0: 408474 records written, 0 records overrun</div>
<div>[ 353.590237] ring buffer relay-discard, cpu 1: 390793 records written, 0 records overrun</div><div>[ 353.590237] ring buffer relay-discard, cpu 2: 363560 records written, 0 records overrun</div><div>[ 353.590685] ring buffer relay-discard, cpu 3: 518867 records written, 0 records overrun</div>
<div>[ 355.720753] LTTng: state dump begin</div><div>[ 355.720753] LTT state dump thread start</div><div>[ 355.732034] LTT state dump end</div><div>[ 364.111388] ring buffer relay-metadata: 3604 records written, 0 records overrun</div>
<div>[ 364.120334] ring buffer relay-discard, cpu 0: 409693 records written, 0 records overrun</div><div>[ 364.120334] ring buffer relay-discard, cpu 1: 401971 records written, 0 records overrun</div><div>[ 364.121438] ring buffer relay-discard, cpu 2: 380074 records written, 0 records overrun</div>
<div>[ 364.122187] ring buffer relay-discard, cpu 3: 524538 records written, 0 records overrun</div><div>[ 366.259690] LTTng: state dump begin</div><div>[ 366.259690] LTT state dump thread start</div><div>[ 366.266253] LTT state dump end</div>
<div>[ 374.633764] ring buffer relay-metadata: 3604 records written, 0 records overrun</div><div>[ 374.640024] ring buffer relay-discard, cpu 0: 430919 records written, 0 records overrun</div><div>[ 374.640024] ring buffer relay-discard, cpu 1: 387021 records written, 0 records overrun</div>
<div>[ 374.641033] ring buffer relay-discard, cpu 2: 370007 records written, 0 records overrun</div><div>[ 374.641033] ring buffer relay-discard, cpu 3: 528606 records written, 0 records overrun</div><div>[ 376.795384] LTTng: state dump begin</div>
<div>[ 376.795384] LTT state dump thread start</div><div>[ 601.060274] INFO: task lttng-sessiond:2885 blocked for more than 120 seconds.</div><div>[ 601.069038] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.</div>
<div>[ 601.077626] lttng-sessiond D ffff880074456b18 0 2885 1 0x10000000</div><div>[ 601.085210] ffff88006c0679f8 0000000000000082 0000000000000001 ffff88007b0a4140</div><div>[ 601.094031] ffff88006c0679a8 ffff880074456840 ffff88006c067fd8 ffff88006c067fd8</div>
<div>[ 601.105026] 0000000000011c80 ffff880074456840 0000000000000000 ffff88006c067a68</div><div>[ 601.112170] Call Trace:</div><div>[ 601.116027] [<ffffffff812dd423>] schedule+0x5b/0x5d</div><div>[ 601.121025] [<ffffffffa01b46cb>] lttng_statedump_start+0x5aa/0x623 [lttng_statedump]</div>
<div>[ 601.129035] [<ffffffff810543c7>] ? abort_exclusive_wait+0x8f/0x8f</div><div>[ 601.136030] [<ffffffffa01c03bb>] lttng_session_enable+0x52d/0x55b [lttng_tracer]</div><div>[ 601.144034] [<ffffffff810543c7>] ? abort_exclusive_wait+0x8f/0x8f</div>
<div>[ 601.150039] [<ffffffffa01c0de8>] lttng_session_ioctl+0x15f/0x225 [lttng_tracer]</div><div>[ 601.158042] [<ffffffff8123a1c3>] ? sock_recvmsg+0xa4/0xb9</div><div>[ 601.163185] [<ffffffff810f37b7>] ? lookup_page_cgroup+0x36/0x4d</div>
<div>[ 601.169453] [<ffffffff81186cf9>] ? cpumask_any_but+0x29/0x38</div><div>[ 601.175508] [<ffffffff8109984b>] ? trace_preempt_on+0x12/0x2f</div><div>[ 601.182031] [<ffffffff812e15eb>] ? sub_preempt_count+0x97/0xc0</div>
<div>[ 601.187735] [<ffffffff812de9a1>] ? _raw_spin_unlock+0x2a/0x35</div><div>[ 601.193922] [<ffffffff810c9767>] ? spin_unlock+0xe/0x10</div><div>[ 601.200027] [<ffffffff810caa44>] ? do_wp_page+0x26f/0x563</div>
<div>[ 601.206028] [<ffffffff8109987a>] ? trace_preempt_off+0x12/0x30</div><div>[ 601.212031] [<ffffffff812e16c8>] ? add_preempt_count+0xb4/0xcf</div><div>[ 601.218043] [<ffffffff810cc320>] ? handle_pte_fault+0x6c6/0x76c</div>
<div>[ 601.223737] [<ffffffff812443a7>] ? verify_iovec+0x52/0xa3</div><div>[ 601.229426] [<ffffffff810cc6cc>] ? handle_mm_fault+0x199/0x1a9</div><div>[ 601.236033] [<ffffffff812e14d3>] ? do_page_fault+0x2f6/0x377</div>
<div>[ 601.242034] [<ffffffffa01c4f8d>] ? __event_probe__sys_mprotect+0xf5/0xf5 [lttng_tracer]</div><div>[ 601.251031] [<ffffffff8110332a>] vfs_ioctl+0x22/0x3a</div><div>[ 601.256029] [<ffffffff81103bb7>] do_vfs_ioctl+0x3c4/0x407</div>
<div>[ 601.261188] [<ffffffff810f6e8e>] ? fget_light+0x91/0x9d</div><div>[ 601.267026] [<ffffffff81103c41>] sys_ioctl+0x47/0x6a</div><div>[ 601.272050] [<ffffffff812e377f>] tracesys+0xd9/0xde</div>
<div><br></div><div style><br></div><div><div>cat /proc/cpuinfo </div><div><br></div><div>processor<span class="" style="white-space:pre"> </span>: 0</div><div>vendor_id<span class="" style="white-space:pre"> </span>: GenuineIntel</div>
<div>cpu family<span class="" style="white-space:pre"> </span>: 6</div><div>model<span class="" style="white-space:pre"> </span>: 28</div><div>model name<span class="" style="white-space:pre"> </span>: Intel(R) Atom(TM) CPU D525 @ 1.80GHz</div>
<div>stepping<span class="" style="white-space:pre"> </span>: 10</div><div>microcode<span class="" style="white-space:pre"> </span>: 0x107</div><div>cpu MHz<span class="" style="white-space:pre"> </span>: 1799.823</div><div>
cache size<span class="" style="white-space:pre"> </span>: 512 KB</div><div>physical id<span class="" style="white-space:pre"> </span>: 0</div><div>siblings<span class="" style="white-space:pre"> </span>: 4</div><div>core id<span class="" style="white-space:pre"> </span>: 0</div>
<div>cpu cores<span class="" style="white-space:pre"> </span>: 2</div><div>apicid<span class="" style="white-space:pre"> </span>: 0</div><div>initial apicid<span class="" style="white-space:pre"> </span>: 0</div><div>fpu<span class="" style="white-space:pre"> </span>: yes</div>
<div>fpu_exception<span class="" style="white-space:pre"> </span>: yes</div><div>cpuid level<span class="" style="white-space:pre"> </span>: 10</div><div>wp<span class="" style="white-space:pre"> </span>: yes</div><div>flags<span class="" style="white-space:pre"> </span>: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dts</div>
<div>bogomips<span class="" style="white-space:pre"> </span>: 1775.61</div><div>clflush size<span class="" style="white-space:pre"> </span>: 64</div><div>cache_alignment<span class="" style="white-space:pre"> </span>: 64</div>
<div>address sizes<span class="" style="white-space:pre"> </span>: 36 bits physical, 48 bits virtual</div><div>power management:</div><div><br></div></div><div style>(repeated for 3 other cpus )</div><div style><div><br></div>
<div> uname -a</div><div>Linux 3.2.21-x510v1 #1 SMP PREEMPT Thu Dec 20 11:24:36 EST 2012 x86_64 GNU/Linux</div><div><br></div><div style>Script used to run trace</div><div style><div>=================================================</div>
<div>#!/bin/sh</div><div><br></div><div>xx=`/bin/date +"%s"`</div><div>yy="yy"</div><div>zz="zz"</div><div>log_xx=/home/pswuser/trace/log/psw-${xx}.log</div><div>log_yy=/home/pswuser/trace/log/psw-${yy}.log</div>
<div>log_zz=/home/pswuser/trace/log/psw-${zz}.log</div><div>#depmod</div><div>mkdir -p /home/pswuser/trace</div><div>mount -t tmpfs -o size=500M tmpfs /home/pswuser/trace</div><div>mkdir -p /home/pswuser/trace/log</div><div>
touch /home/pswuser/trace/run_trace</div><div>sleep 20<br></div><div>if [ -f /home/pswuser/trace/run_trace ] ; then </div><div><br></div><div> echo "lttng create" > ${log_xx}</div><div> lttng create psw-${xx} \</div>
<div> -o /home/pswuser/trace/psw-${xx} >> ${log_xx} 2>&1</div><div> echo "lttng enable-event" >> ${log_xx}</div><div> sleep 1</div><div> lttng enable-event -a -k -s psw-${xx} >> ${log_xx} 2>&1</div>
<div> sleep 1</div><div> echo "lttng start" >> ${log_xx}</div><div> lttng start psw-${xx} >> ${log_xx} 2>&1</div><div> sleep 1</div><div><br></div><div>fi</div><div><br></div><div>while [ 1 -gt 0 ] ; do</div>
<div> sleep 7</div><div> yy=`/bin/date +"%s"`</div><div> log_yy=/home/pswuser/trace/log/psw-${yy}.log</div><div><br></div><div><br></div><div>#stop xx</div><div> if [ -d /home/pswuser/trace/psw-${xx} ] ; then </div>
<div> echo "lttng stop" >> ${log_xx}</div><div> lttng stop psw-${xx} >> ${log_xx} 2>&1</div><div> sleep 1</div><div> echo "lttng destroy" >> ${log_xx}</div>
<div> lttng destroy psw-${xx} >> ${log_xx} 2>&1</div><div> echo "lttng done" >> ${log_xx}</div><div> fi</div><div><br></div><div># start yy after stopping xx </div><div> if [ -f /home/pswuser/trace/run_trace ] ; then </div>
<div> echo "lttng create ${yy} after closing ${xx}" > ${log_yy}</div><div> lttng create psw-${yy} \</div><div> -o /home/pswuser/trace/psw-${yy} >> ${log_yy} 2>&1</div><div> sleep 1</div>
<div> echo "lttng enable_event" >> ${log_yy}</div><div> lttng enable-event -a -k -s psw-${yy} >> ${log_yy} 2>&1</div><div> sleep 1</div><div> echo "lttng start" >> ${log_yy}</div>
<div> lttng start psw-${yy} >> ${log_yy} 2>&1</div><div> fi</div><div># check zz</div><div># delete unless detected or detected2 is present</div><div> if [ ! -f /home/pswuser/bin/delay-detected ] &&</div>
<div> [ ! -f /home/pswuser/bin/delay-detected2 ] ; then</div><div> echo "lttng delete files" >> ${log_zz}</div><div> rm -rf /home/pswuser/trace/psw-${zz}* >> ${log_zz} 2>&1</div>
<div> #if [ $? -eq 0 ] ; then rm ${log_zz} ; fi</div><div> fi</div><div># delete second snapshot</div><div> if [ -f /home/pswuser/bin/delay-detected2 ] ; then</div><div> rm -f /home/pswuser/bin/delay-detected2</div>
<div><br></div><div># no more tracing once event detected </div><div> rm -f /home/pswuser/trace/run_trace</div><div> fi<br></div><div># trigger second snapshot delete first snapshot</div><div> if [ -f /home/pswuser/bin/delay-detected ] ; then</div>
<div> rm -f /home/pswuser/bin/delay-detected</div><div> touch /home/pswuser/bin/delay-detected2</div><div> fi</div><div><br></div><div># ls -l /lttng-traces/psw-*</div><div> # zz (delete candidate) is old xx </div>
<div> echo "xx ${xx} yy ${yy} zz ${zz}"</div><div> zz=${xx}</div><div> log_zz=$log_xx</div><div> </div><div> # xx is now the new one</div><div> xx=${yy}</div><div> log_xx=$log_yy</div><div><br></div><div style>
# check disk use</div><div> if [ -d /home/pswuser/trace ] ; then </div><div> du -h /home/pswuser/trace</div><div> fi</div><div><br></div><div><br></div><div>done</div><div>----------------------------------------</div>
</div><div style><br></div><div style>Any clues are welcome.</div><div style><br></div><div style>Another question.</div><div style><br></div><div style>Would it be possible to do this by setting up a ring buffer that will wrap around after a certain time. I can easily timestamp the event I am looking for and then capture the buffer containing the event.</div>
<div style><br></div><div style> </div><div style>Regards</div><div style> Phil Wilshire</div><div style><br></div></div></div>