[lttng-dev] Relayd trace drops
Aravind HT
aravind.ht at gmail.com
Wed Dec 9 04:34:59 EST 2015
Sorry about that, not sure how it got missed.
Here it is.
# This needs a two-node setup (1. local current node, 2. remote node).
# relayd runs on the current node, where traces are captured from the remote node.
# The remote node runs the test applications which generate the traces.
# launch_script_RN is executed on the current node and uses ssh to execute commands on the remote node, so this part may not work in every case and may prompt for a password.
# If experiencing problems with ssh, kindly check http://serverfault.com/questions/241588/how-to-automate-ssh-login-with-password
# ==================== To Run =============================
launch_script_RN.py self_profile -c /tmp/configFile.txt
# configFile.txt is the file with the configuration params that the launch script needs to configure lttng sessions. Below is an explanation of the different options.
# =================== configFile.txt =============================
[section1]
# final output file path
OutputFile = /tmp/Final_report.txt
# The remote node hostname on which the test applications run and the test sessions will be created; this should be something that can be used with ssh. Traces will be transported from this node to the lttng_relayd running on the current node.
Node = MY_REMOTE_NODE
# Sub-buffer size to start with
SubBufSize = 16k
# Sub-buffer count
SubBufCount = 4
# per-UID buffers
BufferScheme = --buffers-uid
# yes
EnableTracing = yes
# Bit rate of the test applications. Comma-separated; for example "1, 3, 3, 50" means 4 test applications producing 1, 3, 3, and 50 kb/s of traces.
# So with the below, we just start with 1 test application producing 10 kb/s
TestApps = 10
# Session lifetime in seconds
TestTime = 10
# Max number of successive sessions to configure. If n, then n-1 sessions are run; e.g. MaxRun = 2 will run 1 session.
MaxRun = 100
# ==================== Place the following files under ===============
# /tmp on the remote node
clean_RemNode_apps.sh
report_lttng_script.sh
# Place the rest of the scripts under /usr/sbin on the current local node on which lttng_relayd runs.
# Define a tracepoint MY_TRACE that takes a single string arg, with LOG_TEST_APP_PROFILING as the provider; compile the test app lttng_testApp and place it under /usr/sbin of the remote host.
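# For reference, such a provider header could look roughly like the sketch below.
# This is only a minimal sketch built from the names mentioned above (provider
# LOG_TEST_APP_PROFILING, event MY_TRACE, one string argument); the file name
# tp.h and the field name "message" are assumptions, not the contents of the
# attached sources.

/* tp.h - hypothetical tracepoint provider header (sketch, not the attached code) */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER LOG_TEST_APP_PROFILING

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./tp.h"

#if !defined(_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _TP_H

#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
    LOG_TEST_APP_PROFILING,
    MY_TRACE,
    TP_ARGS(const char *, msg),
    TP_FIELDS(
        ctf_string(message, msg)
    )
)

#endif /* _TP_H */

#include <lttng/tracepoint-event.h>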
# In launch_script_RN.py, change currentNodeIP to the IP address on which relayd is receiving; default ports are used.
# lttng_relayd is started as
/usr/bin/lttng-relayd -o /var/log/lttng-traces -d
# lttng_sessiond is started as
/usr/bin/lttng-sessiond --consumerd32-path /usr/lib/lttng/libexec/lttng-consumerd \
    --consumerd32-libdir /usr/lib/ \
    --consumerd64-path /usr/lib64/lttng/libexec/lttng-consumerd \
    --consumerd64-libdir /usr/lib64/ -b --no-kernel
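# Finally, for context, a test application along the lines described in this
# thread could look roughly like the sketch below. The real lttng_testApp is in
# the attached archive, so the rate-pacing logic and the message format
# (modelled on the log lines quoted later in this thread) are assumptions, not
# its actual code.

/* app.c - hypothetical single-threaded test app (sketch only)
 * build: gcc -o lttng_testApp app.c -llttng-ust -ldl */
#define _GNU_SOURCE
#define TRACEPOINT_CREATE_PROBES
#define TRACEPOINT_DEFINE
#include "tp.h"            /* the provider header sketched above */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(int argc, char **argv)
{
    /* Target rate in bytes/s; defaults to roughly the 10 kb/s starting point above. */
    unsigned long rate_bps = (argc > 1) ? strtoul(argv[1], NULL, 10) : 10 * 1024;
    unsigned long trace_num = 0;
    char msg[256];

    if (!rate_bps)
        rate_bps = 1;

    for (;;) {
        int len = snprintf(msg, sizeof(msg),
                "TraceApp PID - %d THID - %ld @threadRate - %lu b/s "
                "threadTraceNum - %lu",
                getpid(), syscall(SYS_gettid), rate_bps, ++trace_num);

        /* Emit the string through the MY_TRACE tracepoint. */
        tracepoint(LOG_TEST_APP_PROFILING, MY_TRACE, msg);

        /* Pace the loop so len bytes per iteration average out to rate_bps. */
        usleep((useconds_t)((1000000UL * (unsigned long)len) / rate_bps));
    }
    return 0;
}

# On the session side, the event can then be enabled with something like
# "lttng enable-event -u 'LOG_TEST_APP_PROFILING:*'" before "lttng start".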
Regards,
Aravind.
On Wed, Dec 9, 2015 at 12:57 AM, Jonathan Rajotte Julien <
Jonathan.rajotte-julien at efficios.com> wrote:
> Hi Aravind,
>
> There is no README in the archive you sent.
>
> Cheers
>
> On 2015-12-08 07:51 AM, Aravind HT wrote:
>
>> Hi,
>>
>> I am trying to upgrade in parallel, but this issue may still be present
>> after I upgrade or may be temporarily masked. So I need to find the root
>> cause for this and then see if a fix is available in the latest before
>> committing to upgrade.
>>
>> There is another issue I'm hitting: the lttng list command hangs after
>> lttng destroy session when running the profiling.
>>
>> I found that consumerd64 goes into an infinite loop waiting to flush
>> metadata in lttng_ustconsumer_recv_metadata() :: while
>> (consumer_metadata_cache_flushed(channel, offset + len, timer)).
>> In consumer_metadata_cache, channel->metadata_stream->endpoint_status is
>> CONSUMER_ENDPOINT_ACTIVE, metadata_stream->ust_metadata_pushed is 0 with
>> offset having some value. This call always returns 1 from the last else {}
>> block, resulting in an infinite loop. Upon searching the forum, I found the
>> same issue being reported here:
>> https://www.mail-archive.com/lttng-dev@lists.lttng.org/msg07982.html
>>
>> Regards,
>> Aravind.
>>
>>
>> On Tue, Dec 8, 2015 at 12:43 AM, Jonathan Rajotte <
>> jonathan.r.julien at gmail.com> wrote:
>>
>> Hi Aravind,
>>
>> Did you have the chance to upgrade to 2.6.1? If so, were you able
>> to reproduce?
>>
>> Cheers
>>
>> On Mon, Dec 7, 2015 at 1:07 PM, Aravind HT <aravind.ht at gmail.com>
>> wrote:
>>
>> Hi,
>>
>> I have attached the complete profiling scripts here; it's a bit
>> shabby, I'm new to Python.
>>
>> There is a README which has the details on how to execute it.
>> I'm using Yocto 1.6 on x86_64 platforms on both nodes.
>>
>>
>> Running this script when there are other sessions running
>> seems to reproduce this problem easily.
>> Please try it and let me know if you have any issues
>> reproducing the problem.
>>
>> Regards,
>> Aravind.
>>
>> On Sat, Dec 5, 2015 at 5:23 PM, Jérémie Galarneau
>> <jeremie.galarneau at efficios.com> wrote:
>>
>> On Fri, Dec 4, 2015 at 11:06 PM, Aravind HT
>> <aravind.ht at gmail.com> wrote:
>> > I am using 2.6.0. I will try to share the code that I'm
>> using here in some
>> > time. If there are any specific fixes that are relevant
>> to this issue, see
>> > if you can provide a link to them. I would ideally like
>> to try them out
>> > before trying a full upgrade to the latest versions.
>>
>> Hi,
>>
>> Can you provide more information on the system? Which distribution,
>> architecture, kernel version?
>>
>> The verbose sessiond logs might help pinpoint any unexpected behaviour
>> here (are all applications registering as expected?).
>>
>> Jérémie
>>
>> >
>> > On Fri, Dec 4, 2015 at 6:11 PM, Jérémie Galarneau
>> > <jeremie.galarneau at efficios.com> wrote:
>> >>
>> >> Hi Aravind,
>> >>
>> >> Can't say I have looked at everything you sent yet, but as a
>> >> preemptive question, which version are we talking about here?
>> >> 2.6.0 or 2.6.1? 2.6.1 contains a lot of relay daemon fixes.
>> >>
>> >> Thanks,
>> >> Jérémie
>> >>
>> >> On Thu, Dec 3, 2015 at 7:01 AM, Aravind HT
>> <aravind.ht at gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I am trying to obtain the performance characteristics of lttng with
>> >> > the use of test applications. Traces are being produced on a local
>> >> > node and delivered to relayd that is running on a separate node for
>> >> > storage.
>> >> >
>> >> > An lttng session with the test applications producing an initial bit
>> >> > rate of 10 kb/s is started and run for about 30 seconds. The starting
>> >> > sub-buffer size is kept at 128 kb and the sub-buffer count at 4. The
>> >> > session is then stopped and destroyed and the traces are analyzed to
>> >> > see if there are any drops. This is being done in a loop, with every
>> >> > subsequent session having an increment of 2 kb/s as long as there are
>> >> > no drops. If there are drops, I increase the buffer size by a factor
>> >> > of x2 without incrementing the bit rate.
>> >> >
>> >> > I see trace drops happening consistently with test apps producing
>> >> > traces at less than 40 kb/s; it doesn't seem to help even if I start
>> >> > with 1 MB x 4 sub-buffers.
>> >> >
>> >> > Analysis:
>> >> >
>> >> > I have attached the lttng_relayd and lttng_consumerd_64 logs and the
>> >> > entire trace directory; hope you will be able to view it.
>> >> > I have modified the lttng_relayd code to dump the traces being
>> >> > captured in the lttng_relayd logs along with debug info.
>> >> >
>> >> > Each test app is producing logs of the form:
>> >> > "TraceApp PID - 31940 THID - 31970 @threadRate - 1032 b/s appRate -
>> >> > 2079 b/s threadTraceNum - 9 appTraceNum - 18 sleepTime - 192120"
>> >> >
>> >> > The test application PID, test application thread id, thread bit
>> >> > rate, test app bit rate, thread trace number and application trace
>> >> > number are part of the trace. So in the above trace, the thread is
>> >> > producing at 1 kb/s and the whole test app is producing at 2 kb/s.
>> >> >
>> >> > If we look at the babeltrace output, we see that the trace with
>> >> > TraceApp PID - 31940 appTraceNum 2 is missing, with 1, 3, 4, 5 and so
>> >> > on being successfully captured.
>> >> > I looked at the lttng_relayd logs and found that the trace for
>> >> > "appTraceNum 2" is not delivered/generated by the consumerd to the
>> >> > relayd in sequence with the other traces. To rule out a test
>> >> > application problem, you can look at lttng_relayd log line 12778 and
>> >> > see that traces from appTraceNum - 1 to appTraceNum - 18, including
>> >> > appTraceNum 2, are "re-delivered" by the consumerd to the relayd.
>> >> > Essentially, I see appTraceNum 1 through appTraceNum 18 being
>> >> > delivered twice, once individually where appTraceNum 2 is missing and
>> >> > once as a group at line 12778 where it is present.
>> >> >
>> >> >
>> >> > Request help with:
>> >> > 1. why traces are delivered twice - is it by design or a genuine
>> >> > problem?
>> >> > 2. how to avoid traces being dropped even though the buffers are
>> >> > sufficiently large?
>> >> >
>> >> >
>> >> > Regards,
>> >> > Aravind.
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jérémie Galarneau
>> >> EfficiOS Inc.
>> >> http://www.efficios.com
>> >
>> >
>>
>>
>>
>> --
>> Jérémie Galarneau
>> EfficiOS Inc.
>> http://www.efficios.com
>>
>>
>>
>>
>>
>>
>>
>> -- Jonathan Rajotte Julien
>>
>>
>>
>>
>>
>
> --
> Jonathan R. Julien
> Efficios
>
>