[lttng-dev] Fast way to clear trace logs and continue tracing?

Jonathan Rajotte Julien Jonathan.rajotte-julien at efficios.com
Wed Dec 14 23:30:13 UTC 2016


Hi Douglas,

On 2016-12-14 02:47 PM, Douglas Graham wrote:
> Thanks for your reply Jonathan,
>
> I don't remember ever having this problem before.  I posted my question
> to lttng-dev without being subscribed, so I did not get your reply in an email.
> I'm pasting it from the archives into an email this time, but I've just sent a
> subscription request, so hopefully next time I won't have this problem.

Probably my error; Thunderbird has a "Reply to mailing list" button.

>
>>> Suppose I have configured and started a session and have collected
>>> a bunch of traces in a trace file.  Now I want to clear those
>>> traces but then continue saving new traces using the same session.
>>> Is there a reasonably fast way to do that?
>> Normally the typical use case is the other way around, you work in
>> flight-recorder mode (not touching disk or network) and take "snapshot"
>> when needed.
>>
>> Could you explain what you want to achieve? What is the bigger picture?
> We are using lttng as the backend to a legacy logging system that
> previously ran on an RTOS. Application code generates logs by calling
> routines that now map to lttng tracepoints. The commands that have
> historically been used to view and clear these logs, and which we want
> to remain the same, are:
>
>     log read
>     log clear
>
> It's that "log clear" command that is problematic with lttng.
> It gets used a lot in our automated testing where we want to view
> only the logs generated by a particular test, e.g.:
>
>   while have more tests to run
>     run test
>     log read  # the test engine captures the output from this command
>     log clear
>
> If we can't clear the logs between tests, then we have to dump
> the entire huge logfile after each test, and that can be slow and
> can cause other issues as well.

Seems like a fair use case.

>
>>> What we currently do is "lttng save --output-path=x.cfg; lttng stop;
>>>         lttng destroy; rm -rf <tracedir>; lttng load --input-path=x.cfg
>> Are you saving/loading multiple sessions? Probably not, since you use
>> "lttng stop" and "lttng destroy" directly.
> All of these operations are performed on only one session,
> "xcs_trace". The session name is passed to the save, stop, and
> destroy commands.  I left it off to try to keep things brief (which
> might have been a bad idea).

It's alright, I just needed to make sure of it.
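
So, with the session name spelled out, I assume the full sequence you are 
timing is roughly the following (untested; <tracedir> stands for whatever 
trace output directory you use):

    lttng save --output-path=x.cfg xcs_trace
    lttng stop xcs_trace
    lttng destroy xcs_trace
    rm -rf <tracedir>
    lttng load --input-path=x.cfg xcs_trace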

>   The contents of the xcs_trace.lttng
> file generated by the save command are at the end of this email.
>
>> You systematically remove the traces ? When does this get executed
>> (cron, user action, trace size)?
>> When does the trace data become important? When do you want to keep the
>> trace data?
> Hopefully, I've answered this above.  Our test engine uses "log read" and "log clear"
> a lot, but these are also frequently used manually.  For example, a developer might
> boot up the system, which generates a lot of logs, but then clear those bootup logs
> so that any subsequent logs generated by manual testing are easier to get to without
> scrolling through thousands of lines of bootup logs.
>
>> Is it a requirement that the new session have the same name and/or the
>> same output path?
> Hmmm. I think the only real requirement is that any newly generated
> logs become visible via subsequent "log read" commands.  If we
> can do that with a different session name and/or output path,
> that would probably be sufficient.

Since the culprit really seems to be the load command, the solution I had 
in mind does not apply here.

>
>>> But this is very slow in some cases.  The lttng load in particular is
>>> extremely slow when Linux is running inside a virtual machine.
>> I will test that. Not sure what would cause the load command to be slow
>> inside a vm particularly.
>> The " lttng stop" and "rm -rf" are most probably taking more time than
>> the load command.
> It's definitely the load command that is the killer in our
> environment.  I'd send you output from the time command, except that
> time inside our VM is pretty much meaningless because it's an event
> driven simulation where time runs fast when there is little to do
> and slow when there is lots to do.  This is an ARM instruction set
> simulator that does run fairly slowly on our x86_64 machines, but the
> "log clear" command stands out as running considerably more slowly
> than other commands, especially given that conceptually at least,
> it's not doing very much.  Overall, our existing implementation of
> "log clear" takes about 25 seconds, and 20 seconds of that (give
> or take) is from the lttng load command.

Big problem here. I'll take a look at how things get done in the load 
command.

Did you try replicating the session setup with individual lttng commands, 
without using the load command?

I expect the performance to end up pretty much the same, but it would 
give us a baseline for comparison.
It could also help identify the culprit, if any, inside the load command 
(file access, XML parsing, internal lttng communication, etc.) or the 
setup action responsible (create, enable-event, etc.).
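
Based on the xcs_trace.lttng you included below, the manual equivalent 
would presumably look something like this (untested sketch; I am assuming 
the numeric loglevels 6 and 14 from the XML correspond to TRACE_INFO and 
TRACE_DEBUG):

    # create the session in snapshot mode, as in the saved configuration
    lttng create xcs_trace --snapshot
    # per-UID UST channel with the same sub-buffer geometry and overwrite mode
    lttng enable-channel --userspace --session=xcs_trace --buffers-uid \
        --overwrite --subbuf-size=131072 --num-subbuf=16 ch1
    # the two event rules from the configuration
    lttng enable-event --userspace --session=xcs_trace --channel=ch1 \
        --loglevel=TRACE_INFO '*'
    lttng enable-event --userspace --session=xcs_trace --channel=ch1 \
        --loglevel=TRACE_DEBUG 'com_ericsson_tri*'
    # snapshot output matching the one in the configuration
    lttng snapshot add-output --session=xcs_trace --name=snapshot-1 \
        /pramfs/log/lttng/snapshot
    lttng start xcs_trace

Timing each step separately should show where the 20 seconds go.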

>
>> Could you give us more information regarding the configuration you are
>> loading? Number of channels, enabled events, type of session (live, to
>> disk, streaming).
> I hope the answers are all contained within the xcs_trace.lttng file included below.
>
>> So far I do not see any dumb questions. :P
> That's a relief :-)
>
> Here's the contents of xcs_trace.lttng, the file saved by the "lttng save"
> command:
>
> ======== begin xcs_trace.lttng ========
> <?xml version="1.0" encoding="UTF-8"?>
> <sessions>
>      <session>
>          <name>xcs_trace</name>
>          <shared_memory_path>/var/log/lttng/shm/current/xcs_trace-20000101-000000</shared_memory_path>
>          <domains>
>              <domain>
>                  <type>UST</type>
>                  <buffer_type>PER_UID</buffer_type>
>                  <channels>
>                      <channel>
>                          <name>ch1</name>
>                          <enabled>true</enabled>
>                          <overwrite_mode>OVERWRITE</overwrite_mode>
>                          <subbuffer_size>131072</subbuffer_size>
>                          <subbuffer_count>16</subbuffer_count>
>                          <switch_timer_interval>0</switch_timer_interval>
>                          <read_timer_interval>0</read_timer_interval>
>                          <output_type>MMAP</output_type>
>                          <tracefile_size>0</tracefile_size>
>                          <tracefile_count>0</tracefile_count>
>                          <live_timer_interval>0</live_timer_interval>
>                          <events>
>                              <event>
>                                  <name>*</name>
>                                  <enabled>true</enabled>
>                                  <type>TRACEPOINT</type>
>                                  <loglevel_type>RANGE</loglevel_type>
>                                  <loglevel>6</loglevel>
>                              </event>
>                              <event>
>                                  <name>com_ericsson_tri*</name>
>                                  <enabled>true</enabled>
>                                  <type>TRACEPOINT</type>
>                                  <loglevel_type>RANGE</loglevel_type>
>                                  <loglevel>14</loglevel>
>                              </event>
>                          </events>
>                          <contexts/>
>                      </channel>
>                  </channels>
>                  <trackers/>
>              </domain>
>              <domain>
>                  <type>JUL</type>
>                  <buffer_type>PER_UID</buffer_type>
>                  <channels/>
>              </domain>
>              <domain>
>                  <type>LOG4J</type>
>                  <buffer_type>PER_UID</buffer_type>
>                  <channels/>
>              </domain>
>              <domain>
>                  <type>PYTHON</type>
>                  <buffer_type>PER_UID</buffer_type>
>                  <channels/>
>              </domain>
>          </domains>
>          <started>true</started>
>          <attributes>
>              <snapshot_mode>true</snapshot_mode>
>          </attributes>
>          <output>
>              <snapshot_outputs>
>                  <output>
>                      <name>snapshot-1</name>
>                      <max_size>0</max_size>
>                      <consumer_output>
>                          <enabled>true</enabled>
>                          <destination>
>                              <path>/pramfs/log/lttng/snapshot</path>
>                          </destination>
>                      </consumer_output>
>                  </output>
>              </snapshot_outputs>
>          </output>
>      </session>
> </sessions>
> ========== end xcs_trace.lttng ==============

Since you are using a session in snapshot mode, a "clear buffer" action 
could help your use case.

Allowing manual flushing of buffers for snapshot sessions might be 
something worth investigating.
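
Right now a snapshot session gives you the "read" half but not the 
"clear" half; assuming your "log read" maps to something like

    lttng snapshot record --session=xcs_trace

the ring buffers keep their contents afterwards, so there is no 
equivalent for "log clear" yet.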

Still, the idea you had earlier about triggering the trace file rotation 
feature manually would fit better in the end.
You would be able to use a regular session to achieve the same result.
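
For reference, the trace file rotation that exists today is automatic and 
size-based, configured on the channel of a regular session, e.g. (the 
1 MiB size and count of 4 are arbitrary example values):

    lttng enable-channel --userspace --session=xcs_trace \
        --tracefile-size=1048576 --tracefile-count=4 ch1
    # once a per-stream file reaches the size limit, tracing wraps around
    # to the oldest of the 4 files; triggering that rotation on demand
    # (e.g. from "log clear") would be the new part
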

Cheers

>
> Regards,
> Doug

-- 
Jonathan R. Julien
Efficios


