[ltt-dev] Unable to stop trace when started with -a option

Dany Madden dany.r.madden at gmail.com
Fri Jul 24 16:17:08 EDT 2009


Hi Mathieu,

I tested your change and trace destruction still hang as you said. I
looked at lttd and and it is handling error from splice. But the
problem is that splice is not returning for lttd to handle the return
code, right? So, I am not sure how lttd's error handling is relavant
here.

Maybe I am not understanding what you are saying.

Dany

On 7/23/09, Mathieu Desnoyers <compudj at krystal.dyndns.org> wrote:
> * Mathieu Desnoyers (compudj at krystal.dyndns.org) wrote:
>> * Dany Madden (dany.r.madden at gmail.com) wrote:
>> [...]
>>
>
> Can you have a look at ltt-control 0.69 ? I fixed the O_APPEND issue by
> using lseek SEEK_END instead. I did not modify the error handling, so
> other splice errors could possibly still cause trace destruction to hang.
>
> Thanks,
>
> Mathieu
>
>> OK, part of the problem is lttd which should not use O_APPEND, because
>> it is not supported by splice(). The other part of the problem is that
>> lttd seems to exit from the splice error leaving a reference to the
>> subbuffers. e.g.
>>
>> ltt-armall
>>
>> #First trace
>> lttctl -C trace
>> lttd -t /tmp/trace -c /mnt/debugfs/ltt/trace -v &
>> #works ok
>> sleep 5
>> lttctl -D trace
>>
>> #Second trace
>> lttctl -C trace
>> lttd -t /tmp/trace -c /mnt/debugfs/ltt/trace -v -a
>>
>> Gives:
>>
>> Appending to file /tmp/trace/metadata_1 as requested
>> Channel file : /mnt/debugfs/ltt/trace/metadata_0
>> Opening file.
>> Appending to file /tmp/trace/metadata_0 as requested
>> Data received
>> Normal read on fd 5
>> cookie : 0
>> splice chan to pipe offset 0
>> splice chan to pipe ret 65536
>> splice pipe to file -1
>> Error in file splice: Invalid argument
>> Data received
>> Normal read on fd 325
>> cookie : 0
>> splice chan to pipe offset 0
>>
>> Which matches:
>>
>> fs/splice.c:
>>
>> static long do_splice_from(struct pipe_inode_info *pipe, struct file *out,
>>                            loff_t *ppos, size_t len, unsigned int flags)
>> ...
>>         if (unlikely(out->f_flags & O_APPEND))
>>                 return -EINVAL;
>>
>> Can you have a closer look at the lttd error handling for splice errors?
>> Once we fix this issue to handle it gracefully, we'll have to find
>> another way to support file append without O_APPEND.
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>> > Dany
>> >
>> > On Tue, Jul 21, 2009 at 5:07 PM, Mathieu Desnoyers <
>> > compudj at krystal.dyndns.org> wrote:
>> >
>> > > * Dany Madden (dany.r.madden at gmail.com) wrote:
>> > > > Hello,
>> > > >
>> > > > I wanted to follow up on my post from last week. Is this a bug in
>> > > > lttctl
>> > > or
>> > > > is there something wrong with the way I am calling lttctl? Any
>> > > suggestions
>> > > > on how to get to the bottom of this issue is much appreciated.
>> > > >
>> > >
>> > > Hrm, this problem still applies to the current LTTng.
>> > >
>> > > I'd bet splice() has problems with O_APPEND. Can you have a look into
>> > > this ? It could be caused by the splice actor implementation found in
>> > > LTTng.
>> > >
>> > > Thanks,
>> > >
>> > > Mathieu
>> > >
>> > > > Thanks,
>> > > > Dany
>> > > >
>> > > > On Thu, Jul 16, 2009 at 8:25 AM, Dany Madden
>> > > > <dany.r.madden at gmail.com
>> > > >wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > When I started a trace with the -a option, destroying the trace
>> > > > > later
>> > > would
>> > > > > hang. These are the ltt versions that I am using:
>> > > > >
>> > > > > lttv: 0.12.7
>> > > > > lttng: 0.72 (patched in sles11 2.6.27.19 kernel)
>> > > > > lttcontrol: 0.64
>> > > > >
>> > > > > It appears that lttctl is waiting on lttd to die, but lttd is
>> > > > > waiting
>> > > for
>> > > > > something else to finish. Please see sysrq-t output below.
>> > > > >
>> > > > > Restart tracing with append option -a
>> > > > >
>> > > > > ------------------------------------------------------
>> > > > >
>> > > > > punetest3:~ # lttctl -C -w /tmp/trace1 trace2 -a
>> > > > > Linux Trace Toolkit Trace Control 0.64-14012009
>> > > > >
>> > > > > Controlling trace : trace2
>> > > > >
>> > > > > lttctl: Creating trace
>> > > > >
>> > > > > lttctl: Forking lttd
>> > > > > Linux Trace Toolkit Trace Daemon 0.64-14012009
>> > > > >
>> > > > > Reading from debugfs directory : /sys/kernel/debug/ltt/trace2
>> > > > > Writing to trace directory : /tmp/trace1
>> > > > >
>> > > > > lttctl: Starting trace
>> > > > > punetest3:~ # echo $?
>> > > > >
>> > > > > 0
>> > > > >
>> > > > > Stop tracing hangs
>> > > > > ----------------------------------------
>> > > > > punetest3:~ # lttctl -D trace1
>> > > > > Linux Trace Toolkit Trace Control 0.64-14012009
>> > > > >
>> > > > > Controlling trace : trace1
>> > > > >
>> > > > > lttctl: Pausing trace
>> > > > >
>> > > > > lttctl: Destroying trace
>> > > > > ^C
>> > > > >
>> > > > > ---> Ran it thru gdb
>> > > > >
>> > > > > Program received signal SIGINT, Interrupt.
>> > > > > 0xffffe430 in __kernel_vsyscall ()
>> > > > > (gdb) bt
>> > > > > #0  0xffffe430 in __kernel_vsyscall ()
>> > > > > #1  0xb7f34bb3 in write () from /lib/libc.so.6
>> > > > >
>> > > > > #2  0xb7fd013c in lttctl_sendop (fname=0xbfffe5cc
>> > > > > "/sys/kernel/debug/ltt/destroy_trace", op=0xbffff843 "trace3") at
>> > > > > liblttctl.c:120
>> > > > > #3  0xb7fd0ecb in lttctl_destroy_trace (name=0xbffff843 "trace3")
>> > > > > at
>> > > > >
>> > > > > liblttctl.c:284
>> > > > > #4  0x080498d6 in main (argc=1667330676, argv=0x4c003365) at
>> > > lttctl.c:823
>> > > > >
>> > > > > --> It hangs in lttctl_sendop() while doing write.
>> > > > >
>> > > > > lttctl_sendop (fname=0xbfffe5bc
>> > > > > "/sys/kernel/debug/ltt/destroy_trace",
>> > > > >
>> > > > > op=0xbffff83c "trace7")
>> > > > >     at liblttctl.c:108
>> > > > > 108             if (!fname) {
>> > > > > (gdb)
>> > > > > 105     {
>> > > > > (gdb)
>> > > > > 108             if (!fname) {
>> > > > > (gdb)
>> > > > > 113             fd = open(fname, O_WRONLY);
>> > > > > (gdb)
>> > > > >
>> > > > > 114             if (fd == -1) {
>> > > > > (gdb)
>> > > > > 113             fd = open(fname, O_WRONLY);
>> > > > > (gdb)
>> > > > > 114             if (fd == -1) {
>> > > > > (gdb)
>> > > > > 120             if (write(fd, op, strlen(op)) == -1) {
>> > > > >  (gdb) p fd
>> > > > > $3 = <value optimized out>
>> > > > >
>> > > > > (gdb) p op
>> > > > > $4 = 0xbffff83c "trace7"
>> > > > > (gdb) p strlen(op)
>> > > > > $5 = 6
>> > > > > (gdb) c
>> > > > > Continuing.
>> > > > >
>> > > > > ----> Output from sysrq-t
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel: lttd          S 00000000     0
>> > > > > 3979
>> > >    1
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        f6373e20 00000086 0000000c
>> > > 00000000
>> > > > > 00000000 00013e36 c05bd67c c05c0380
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        c05c0380 c05c0380 c05c0380
>> > > f785f2c0
>> > > > > f785f534 c2225380 00000002 c017f8b0
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        b7e8b000 b7e8bbd8 f6373ea8
>> > > f785f534
>> > > > > ffffef5d c05521d8 00000001 00000000
>> > > > > Jul 15 17:47:27 elm3b36 kernel: Call Trace:
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c0147cd9>]
>> > > > > futex_wait+0x207/0x3d0
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c01490e5>] do_futex+0x7a/0x8c9
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c0149a33>] sys_futex+0xff/0x112
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c0104de2>] syscall_call+0x7/0xb
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<ffffe430>] 0xffffe430
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  =======================
>> > > > > Jul 15 17:47:27 elm3b36 kernel: lttd          S c1000000     0
>> > > > > 3980
>> > >    1
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        f6375dc0 00000086 0007f4a0
>> > > c1000000
>> > > > >
>> > > > > b7ff1c40 00011668 c05bd67c c05c0380
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        c05c0380 c05c0380 c05c0380
>> > > f7bb62b0
>> > > > > f7bb6524 c221b380 00000001 00000000
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        f6f60848 f6354b7c 00000000
>> > > f7bb6524
>> > > > >
>> > > > > 000005b7 00000004 00000004 00000000
>> > > > > Jul 15 17:47:27 elm3b36 kernel: Call Trace:
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c01989e3>] pipe_wait+0x50/0x6e
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c01afe80>]
>> > > splice_to_pipe+0x164/0x1e9
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c023db08>]
>> > > subbuf_splice_actor+0x167/0x172
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c023db40>]
>> > > > > ltt_relay_file_splice_read+0x2d/0x6b
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c01af28c>]
>> > > > > do_splice_to+0x56/0x64
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c01af419>]
>> > > > > sys_splice+0x17f/0x1de
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c0104ceb>]
>> > > sysenter_do_call+0x12/0x2f
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<ffffe430>] 0xffffe430
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  =======================
>> > > > > Jul 15 17:47:27 elm3b36 kernel: lttctl        S 00000086     0
>> > > > > 3981
>> > > 3869
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        f6c03d10 00000082 00000001
>> > > 00000086
>> > > > > f6375c7c 0001f77c c05bd67c c05c0380
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        c05c0380 c05c0380 c05c0380
>> > > f74dc3b0
>> > > > > f74dc624 c222f380 00000003 f624f790
>> > > > > Jul 15 17:47:27 elm3b36 kernel:        00000282 00000003 f624f790
>> > > f74dc624
>> > > > > 000005b3 c05ce964 f79b4c00 00000000
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel: Call Trace:
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c023bbdc>]
>> > > __ltt_trace_destroy+0xdc/0xfb
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c023be4d>]
>> > > ltt_trace_destroy+0x2d/0x77
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<f89cd070>]
>> > > destroy_trace_write+0x70/0x13b
>> > > > >
>> > > > > [ltt_trace_control]
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c019337c>] vfs_write+0x8b/0x13d
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c0193526>] sys_write+0x45/0x94
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<c0104de2>] syscall_call+0x7/0xb
>> > > > >
>> > > > > Jul 15 17:47:27 elm3b36 kernel:  [<ffffe430>] 0xffffe430
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > -----
>> > > > > ~ Dany Madden
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > -----
>> > > > ~ Dany Madden
>> > >
>> > > > _______________________________________________
>> > > > ltt-dev mailing list
>> > > > ltt-dev at lists.casi.polymtl.ca
>> > > > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
>> > >
>> > >
>> > > --
>> > > Mathieu Desnoyers
>> > > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE
>> > > 9A68
>> > >
>> >
>> >
>> >
>> > --
>> > -----
>> > ~ Dany Madden
>>
>> --
>> Mathieu Desnoyers
>> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE
>> 9A68
>>
>> _______________________________________________
>> ltt-dev mailing list
>> ltt-dev at lists.casi.polymtl.ca
>> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
>>
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
>


-- 
-----
~ Dany Madden




More information about the lttng-dev mailing list