<html><body><div style="font-family: arial, helvetica, sans-serif; font-size: 12pt; color: #000000"><div>Hi,<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Can you try:<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>- Reproducing with a more recent LTTng-UST/LTTng-Tools ? (e.g. 2.13). LTTng 2.7 is not supported anymore.<br data-mce-bogus="1"></div><div>- Try to reproduce with "per-pid" UST buffers rather than "per-uid",<br data-mce-bogus="1"></div><div>- Try to reproduce without the "tracefile rotation" mode (without --tracefile-count and --tracefile-size options to the channel).<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Also, do you happen to have traced applications which are frequently killed asynchronously<br data-mce-bogus="1"></div><div>while some of their threads are actively tracing by any chance ?<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Thanks,<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Mathieu<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div><span id="zwchr" data-marker="__DIVIDER__">----- On Dec 7, 2021, at 4:03 AM, lttng-dev <lttng-dev@lists.lttng.org> wrote:<br></span></div><div data-marker="__QUOTED_TEXT__"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div class="__aliyun_email_body_block"><div><span style="font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;color:#000000">Hi, Lttng-dev:</span></div><div><span style="font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;color:#000000"><br></span></div><div><span style="font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;color:#000000">  I found a strange problem related to lttng-consumed and ctf files recently. The ctf files belongs to some CPUs have been stopped rotating but other ctf files(belong to other CPUs) keeped working as usual. I am very sure all CPUs were producing spans all the time.</span></div><br><div>#date; ls -ltrh channel0_0_* |tail -n 3; date;ls -ltrh channel0_1_* |tail -n 3<br>Tue Dec  7 16:25:45 CST 2021<br>-rw-rw---- 1 root root 1.8M Dec  7 16:20 channel0_0_17<br>-rw-rw---- 1 root root 2.0M Dec  7 16:23 channel0_0_18<br>-rw-rw---- 1 root root 916K Dec  7 16:24 channel0_0_19<br>Tue Dec  7 16:25:45 CST 2021<br>-rw-rw---- 1 root root 1.7M Dec  6 00:30 channel0_1_8<br>-rw-rw---- 1 root root 1.9M Dec  6 00:31 channel0_1_9<br>-rw-rw---- 1 root root 388K Dec  6 00:32 channel0_1_10</div><br><div>    Notice that the ctf files with CPU0 (channel0_0_19) was modified at the time "now", but the ctf files with CPU1(channeo0_1_10) has been stopped working at Dec 6.</div><div>    I gdb lttng-consumerd(break at lib_ring_buffer_channel_do_read() and  lib_ring_buffer_poll_deliver()--I configured a read timer on channels). See the followings (<span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline">lib_ring_buffer_poll_deliver() )</span></div><br><div>   0x00007f0f2418a445 <+213>: mov    0x0(%r13),%rcx                rcx:0x7f0f0402e610(handle->table)<br>   0x00007f0f2418a449 <+217>: mov    0x98(%rsi),%r9                rsi:0x7f0efb86f000(buf) r9:0x86c400000(consumed_old = buf->consumed)<br>   0x00007f0f2418a450 <+224>: lea    0x150(%rsi),%rdi              rdi:0x7f0efda75150<br>   0x00007f0f2418a457 <+231>: mov    0x150(%rsi),%rax              rax:3(cpu?)<br>   0x00007f0f2418a45e <+238>: mov    0x78(%rbp),%rdx               rbp:chan      rdx:0x1000000(chan->backend.buf_size)<br>   0x00007f0f2418a462 <+242>: mov    0x88(%rbp),%r8d               r8d:0x14(20)(chan->backend.buf_size_order)<br>   0x00007f0f2418a469 <+249>: cmp    %rax,0x8(%rcx)<br>   0x00007f0f2418a46d <+253>: jbe    0x7f0f2418a80c <lib_ring_buffer_channel_do_read+1180><br>   0x00007f0f2418a473 <+259>: shl    $0x6,%rax                     rax:192<br>   0x00007f0f2418a477 <+263>: sub    $0x1,%rdx                     rdx:0xffffff 16777215 (chan->backend.buf_size - 1)<br>   0x00007f0f2418a47b <+267>: lea    0x10(%rax,%rcx,1),%r10        rax:192 rcx:0x7f0f0402e610(handle->table) r10:0x7f0f0402e6e0<br>   0x00007f0f2418a480 <+272>: and    %r9,%rdx                      r9:0x8fdc00000 rdx:0xc00000 12582912  buf_trunc(consume_old, chan->backend.buf_size - 1)<br>   0x00007f0f2418a483 <+275>: mov    0x8(%rdi),%rax                rdi:0x7f0efda75150 rax:2688 ref_offset = (size_t) ref->offset<br>   0x00007f0f2418a487 <+279>: mov    %r8d,%ecx                     ecx:0x14(20)<br>   0x00007f0f2418a48a <+282>: shr    %cl,%rdx               (....) <br>   0x00007f0f2418a48d <+285>: shl    $0x7,%rdx<br>   0x00007f0f2418a491 <+289>: add    %rax,%rdx<br>   0x00007f0f2418a494 <+292>: lea    0x80(%rdx),%rax<br>   0x00007f0f2418a49b <+299>: cmp    0x28(%r10),%rax<br>   0x00007f0f2418a49f <+303>: ja     0x7f0f2418a80c <lib_ring_buffer_channel_do_read+1180><br>   0x00007f0f2418a4a5 <+309>: mov    0x20(%r10),%rax               r10:0x7f0f0402e6e0 rax:0x7f0efb86f000(buf)<br>   0x00007f0f2418a4a9 <+313>: add    %rdx,%rax                     rdx:3200  rax:0x7f0efb86fc80<br>   0x00007f0f2418a4ac <+316>: mov    (%rax),%rax                   rax:0x86c00000(commit_count!!!!!!Incremented _once_ at sb switch cc_sb)<br>   0x00007f0f2418a4af <+319>: mov    0x80(%rbp),%r8                r8:0x100000(chan->backend.subbuf_size)<br>   0x00007f0f2418a4b6 <+326>: mov    0x78(%rbp),%rdx               rdx:0x1000000(chan->backend.buf_size)<br>   0x00007f0f2418a4ba <+330>: mov    0x8c(%rbp),%ecx               ecx:4<br>   0x00007f0f2418a4c0 <+336>: mov    0x80(%rsi),%rdi               rdi:0x86d400000<br>   0x00007f0f2418a4c7 <+343>: sub    %r8,%rax                      rax:0x86b00000(commit_count - chan->backend.subbuf_size)<br>   0x00007f0f2418a4ca <+346>: and    0x8(%rbp),%rax                rax & chan->commit_count_mask = rax:0x86b00000<br>   0x00007f0f2418a4ce <+350>: neg    %rdx                          rdx:0x1000000(16M chan->backend.buf_size)  --> 0xffffffffff000000(-16777216)<br>   0x00007f0f2418a4d1 <+353>: and    %r9,%rdx                      r9:0x86c400000(consume_old)   rdx:0x86c000000<br>   0x00007f0f2418a4d4 <+356>: shr    %cl,%rdx                      cl:4                          rdx:0x86c00000<br>   0x00007f0f2418a4d7 <+359>: cmp    %rdx,%rax                     rax:0x86b0000 rdx:0x86c00000<br>   0x00007f0f2418a4da <+362>: jne    0x7f0f2418a411 <lib_ring_buffer_channel_do_read+161><br>   0x00007f0f2418a4e0 <+368>: mov    %r8,%rax</div><div>   </div><div>209 static inline<br>210 int lib_ring_buffer_poll_deliver(const struct lttng_ust_lib_ring_buffer_config *config,<br>211                  struct lttng_ust_lib_ring_buffer *buf,<br>212                      struct channel *chan,<br>213                  struct lttng_ust_shm_handle *handle)<br>214 {<!-- --><br>215     unsigned long consumed_old, consumed_idx, commit_count, write_offset;<br>216 <br>217     consumed_old = uatomic_read(&buf->consumed);<br>218     consumed_idx = subbuf_index(consumed_old, chan);<br>219     commit_count = v_read(config, &shmp_index(handle, buf->commit_cold, consumed_idx)->cc_sb);<br>220     /*<br>221      * No memory barrier here, since we are only interested<br>222      * in a statistically correct polling result. The next poll will<br>223      * get the data is we are racing. The mb() that ensures correct<br>224      * memory order is in get_subbuf.<br>225      */<br>226     write_offset = v_read(config, &buf->offset);<br>227 <br>228     /*<br>229      * Check that the subbuffer we are trying to consume has been<br>230      * already fully committed.<br>231      */<br>232 <br>233     if (((commit_count - chan->backend.subbuf_size)<br>234          & chan->commit_count_mask)<br>235         - (buf_trunc(consumed_old, chan)<br>236            >> chan->backend.num_subbuf_order)<br>237         != 0)<br>238         return 0;<br>239 <br>240     /*<br>241      * Check that we are not about to read the same subbuffer in<br>242      * which the writer head is.<br>243      */<br>244     if (subbuf_trunc(write_offset, chan) - subbuf_trunc(consumed_old, chan)<br>245         == 0)<br>246         return 0;<br>247 <br>248     return 1;<br>249 <br>250 }</div><br><div>      It seems that lib_ring_buffer_channel_do_read()-><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline">lib_ring_buffer_poll_deliver() returned 0 at line 238 ($consume_old:0x86c400000, $commint_count:0x86c00000, $write_offset:0x86d400000).</span></div><div><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline"><br></span></div><div><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline">Buffer type: per UID<br>Channels:<br>-------------<br>- channel0: [enabled]<br><br>    Attributes:<br>      overwrite mode: 0<br>      subbufers size: 1048576<br>      number of subbufers: 16<br>      switch timer interval: 20000000<br>      read timer interval: 60000000<br>      trace file count: 80<br>      trace file size (bytes): 2097152<br>      output: mmap()</span></div><div><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline"><br></span></div><div><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline">     PS: Lttng version is 2.7(I know it is an very old version:) )</span></div><div><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline">     Do you have any ideas about this problem? Thanks in advance</span></div><div><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline"><br></span></div><div><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline">Thanks a lot</span></div><div><span style="color:#000000;font-family:'tahoma' , 'arial' , 'stheiti' , 'simsun';font-size:14px;font-style:normal;font-weight:400;text-indent:0px;text-transform:none;background-color:#ffffff;float:none;display:inline">zhenyu.ren</span></div></div><br>_______________________________________________<br>lttng-dev mailing list<br>lttng-dev@lists.lttng.org<br>https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev<br></blockquote></div><div><br></div><div data-marker="__SIG_POST__">-- <br></div><div>Mathieu Desnoyers<br>EfficiOS Inc.<br>http://www.efficios.com</div></div></body></html>