<div dir="ltr">Hi,<div><br></div><div>Did you get a chance to reproduce the problem with the scripts ? Let me know if you need any help running it.</div><div><br></div><div>Regards,</div><div>Aravind.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 9, 2015 at 3:04 PM, Aravind HT <span dir="ltr"><<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Sorry about that, not sure how it got missed.<div>Here it is.</div><div><br></div><div><br></div><div><div># This needs a two node set up (1. local current node 2. remote node )</div><div># relayd runs on the current node where traces are captured from the remote node</div><div># remote node runs test applications which generate traces.</div><div># the launch_script_RN is executed on the current node and uses ssh to execute commands on the remote node. So this part may not work in every case and may prompt for a password.</div><div># if experiencing problems with ssh , kindly check <a href="http://serverfault.com/questions/241588/how-to-automate-ssh-login-with-password" target="_blank">http://serverfault.com/questions/241588/how-to-automate-ssh-login-with-password</a></div><div><br></div><div># ==================== To Run =============================</div><div>launch_script_RN.py self_profile -c /tmp/configFile.txt</div><div><br></div><div><br></div><div><br></div><div># configFile.txt is the file which has configuration params that launchScript</div><div># needs to configure lttng sessions. 
Below is an explanation of the different options.</div><div># =================== configFile.txt =============================</div><div><br></div><div>[section1]</div><div># final output file path</div><div>OutputFile = /tmp/Final_report.txt</div><div># the remote node hostname on which test applications run and the test sessions will be created; this should be something that could be used with ssh. Traces will be transported from this node to the lttng_relayd running on the current node.</div><div>Node = MY_REMOTE_NODE</div><div># Sub-buffer size to start with</div><div>SubBufSize = 16k</div><div># Sub-buffer count</div><div>SubBufCount = 4</div><div># per uid buffer</div><div>BufferScheme = --buffers-uid</div><div># yes</div><div>EnableTracing = yes</div><div># Bit rate of the test applications. Comma-separated; for example, "1, 3, 3, 50" means 4 test applications producing 1, 3, 3, and 50 kb/s traces.</div><div># So with the below, we just start with 1 test application producing 10 kb/s</div><div>TestApps = 10</div><div># session lifetime in seconds</div><div>TestTime = 10</div><div># Max number of successive sessions to configure. 
If n, then n-1 sessions are run; e.g. MaxRun = 2 will run 1 session.</div><div>MaxRun = 100</div><div><br></div><div><br></div><div># ==================== Place the following files under ===============</div><div><br></div><div># /tmp on the remote node</div><div>clean_RemNode_apps.sh</div><div>report_lttng_script.sh</div><div><br></div><div># rest of the scripts under /usr/sbin on the current local node on which lttng_relayd runs</div><div># Define a trace point MY_TRACE to take a single string arg with LOG_TEST_APP_PROFILING as the provider, compile test lttng_testApp and place it under /usr/sbin of the remote host</div><div><br></div><div># in launch_script_RN.py change currentNodeIP to the IP address on which relayd is receiving, default ports are used.<br></div><div><br></div><div># lttng_relayd is started as</div><div>/usr/bin/lttng-relayd -o /var/log/lttng-traces -d</div><div><br></div><div># lttng_sessiond is started as</div><div>/usr/bin/lttng-sessiond --consumerd32-path /usr/lib/lttng/libexec/lttng-consumerd --consumerd32-libdir /usr/lib/ --consumerd64-path /usr/lib64/lttng/libexec/lttng-consumerd --consumerd64-libdir /usr/lib64/ -b --no-kernel</div><div><br></div><div><br></div></div><div><br></div><div><br></div><div>Regards,</div><div>Aravind.</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 9, 2015 at 12:57 AM, Jonathan Rajotte Julien <span dir="ltr"><<a href="mailto:Jonathan.rajotte-julien@efficios.com" target="_blank">Jonathan.rajotte-julien@efficios.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Aravind,<br>
<br>
There is no README in the archive you sent.<br>
<br>
Cheers<span><br>
<br>
On 2015-12-08 07:51 AM, Aravind HT wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<br><div><div>
I am trying to upgrade in parallel, but this issue may still be present after I upgrade or may be temporarily masked. So I need to find the root cause for this and then see if it's available on the latest before committing to upgrade.<br>
<br>
There is another issue I'm hitting: the lttng list command hangs after lttng destroy session when running the profiling.<br>
<br>
I found that consumerd 64 goes into an infinite loop waiting to flush metadata in lttng_ustconsumer_recv_metadata() :: while (consumer_metadata_cache_flushed(channel, offset + len, timer)) .<br>
In consumer_metadata_cache, channel->metadata_stream->endpoint_status is CONSUMER_ENDPOINT_ACTIVE, metadata_stream->ust_metadata_pushed is 0 with offset having some value. This call always returns a 1 from the last else{} block resulting in an infinite loop. Upon searching the forum, I found the same issue being reported here :<br>
<a href="https://www.mail-archive.com/lttng-dev@lists.lttng.org/msg07982.html" rel="noreferrer" target="_blank">https://www.mail-archive.com/lttng-dev@lists.lttng.org/msg07982.html</a><br>
<br>
Regards,<br>
Aravind.<br>
<br>
<br>
On Tue, Dec 8, 2015 at 12:43 AM, Jonathan Rajotte <<a href="mailto:jonathan.r.julien@gmail.com" target="_blank">jonathan.r.julien@gmail.com</a>> wrote:<br>
<br>
Hi Aravind,<br>
<br>
Did you have the chance to upgrade to 2.6.1? If so, were you able<br>
to reproduce?<br>
<br>
Cheers<br>
<br>
On Mon, Dec 7, 2015 at 1:07 PM, Aravind HT <<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>><br>
wrote:<br>
<br>
Hi,<br>
<br>
I have attached the complete profiling scripts here; it's a bit<br>
shabby, I'm new to Python.<br>
<br>
There is a README which has the details on how to execute it.<br>
I'm using Yocto 1.6 on x86_64 platforms on both the nodes.<br>
<br>
<br>
Running this script when there are other sessions running<br>
seems to reproduce this problem easily.<br>
Please try it and let me know if you have any issues<br>
reproducing the problem.<br>
<br>
Regards,<br>
Aravind.<br>
<br>
On Sat, Dec 5, 2015 at 5:23 PM, Jérémie Galarneau<br>
<<a href="mailto:jeremie.galarneau@efficios.com" target="_blank">jeremie.galarneau@efficios.com</a><br></div></div><span>
<mailto:<a href="mailto:jeremie.galarneau@efficios.com" target="_blank">jeremie.galarneau@efficios.com</a>>> wrote:<br>
<br>
On Fri, Dec 4, 2015 at 11:06 PM, Aravind HT<br></span><span>
<<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a> <mailto:<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>>> wrote:<br>
> I am using 2.6.0. I will try to share the code that I'm<br>
using here in some<br>
> time. If there are any specific fixes that are relevant<br>
to this issue, see<br>
> if you can provide a link to them. I would ideally like<br>
to try them out<br>
> before trying a full upgrade to the latest versions.<br>
<br>
Hi,<br>
<br>
Can you provide more information on the system? Which<br>
distribution,<br>
architecture, kernel version?<br>
<br>
The verbose sessiond logs might help pinpoint any<br>
unexpected behaviour<br>
here (are all applications registering as expected?).<br>
<br>
Jérémie<br>
<br>
><br>
> On Fri, Dec 4, 2015 at 6:11 PM, Jérémie Galarneau<br>
> <<a href="mailto:jeremie.galarneau@efficios.com" target="_blank">jeremie.galarneau@efficios.com</a><br></span><div><div>
<mailto:<a href="mailto:jeremie.galarneau@efficios.com" target="_blank">jeremie.galarneau@efficios.com</a>>> wrote:<br>
>><br>
>> Hi Aravind,<br>
>><br>
>> Can't say I have looked at everything you sent yet, but<br>
as a<br>
>> preemptive question, which version are we talking about<br>
here? 2.6.0 or<br>
>> 2.6.1? 2.6.1 contains a lot of relay daemon fixes.<br>
>><br>
>> Thanks,<br>
>> Jérémie<br>
>><br>
>> On Thu, Dec 3, 2015 at 7:01 AM, Aravind HT<br>
<<a href="mailto:aravind.ht@gmail.com" target="_blank">aravind.ht@gmail.com</a>> wrote:<br>
>> > Hi,<br>
>> ><br>
>> > I am trying to obtain the performance characteristics<br>
of lttng with the<br>
>> > use<br>
>> > of test applications. Traces are being produced on a<br>
local node and<br>
>> > delivered to relayd that is running on a separate<br>
node for storage.<br>
>> ><br>
>> > An lttng session with the test applications producing<br>
an initial bit<br>
>> > rate of<br>
>> > 10 kb/s is started and run for about 30 seconds. The<br>
starting sub-buffer<br>
>> > size is kept at 128 kb and sub-buf count at 4. The<br>
session is then<br>
>> > stopped<br>
>> > and destroyed and traces are analyzed to see if there<br>
are any drops.<br>
>> > This is<br>
>> > being done in a loop with every subsequent session<br>
having an increment<br>
>> > of 2<br>
>> > kb/s as long as there are no drops. If there are<br>
drops, I increase the<br>
>> > buffer size by a factor of x2 without incrementing<br>
the bit rate.<br>
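The loop described above can be sketched roughly as follows. This is only an illustration, not the attached launch script: `run_session` is a hypothetical callable that would start one session at the given bit rate and sub-buffer size and report whether any traces were dropped, and the parameter names and the run limit are assumptions.<br>

```python
def profile_buffer_sizes(run_session, start_rate_kb=10, start_subbuf_kb=128,
                         rate_step_kb=2, max_runs=100):
    """Run successive sessions, adapting bit rate and sub-buffer size.

    run_session(rate_kb, subbuf_kb) is a hypothetical callable that runs one
    session and returns True if trace drops were detected.
    """
    rate_kb = start_rate_kb
    subbuf_kb = start_subbuf_kb
    history = []
    for _ in range(max_runs):
        dropped = run_session(rate_kb, subbuf_kb)
        history.append((rate_kb, subbuf_kb, dropped))
        if dropped:
            # Drops seen: double the sub-buffer size, keep the same bit rate.
            subbuf_kb *= 2
        else:
            # No drops: raise the bit rate for the next session.
            rate_kb += rate_step_kb
    return history
```

Passing a stub for `run_session` makes it possible to check the stepping logic without a tracing setup.<br>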
>> ><br>
>> > I see trace drops happening consistently with test<br>
apps producing traces<br>
>> > at<br>
>> > less than 40 kb/s, it doesn't seem to help even if I<br>
started with 1mb x 4<br>
>> > sub-buffers.<br>
>> ><br>
>> > Analysis :<br>
>> ><br>
>> > I have attached the lttng_relayd , lttng_consumerd_64<br>
logs and the<br>
>> > entire<br>
>> > trace directory, hope you will be able to view it.<br>
>> > I have modified lttng_relayd code to dump the traces<br>
being captured in<br>
>> > the<br>
>> > lttng_relayd logs along with debug info.<br>
>> ><br>
>> > Each test app is producing logs in the form of :<br>
>> > "TraceApp PID - 31940 THID - 31970 @threadRate - 1032<br>
b/s appRate - 2079<br>
>> > b/s<br>
>> > threadTraceNum - 9 appTraceNum - 18 sleepTime - 192120"<br>
>> ><br>
>> > The test application PID, test application thread id,<br>
thread bit rate,<br>
>> > test<br>
>> > app bit rate, thread trace number and application<br>
trace numbers are<br>
>> > part of<br>
>> > the trace. So in the above trace, the thread is<br>
producing at 1 kb/s and<br>
>> > the<br>
>> > whole test app is producing at 2 kb/s.<br>
>> ><br>
>> > If we look at the babeltrace output, we see that the<br>
Trace with<br>
>> > TraceApp<br>
>> > PID - 31940 appTraceNum 2 is missing , with 1, 3, 4,<br>
5 and so on being<br>
>> > successfully captured.<br>
>> > I looked at the lttng_relayd logs and found that<br>
trace of "appTraceNum<br>
>> > 2" is<br>
>> > not delivered/generated by the consumerd to the<br>
relayd in sequence with<br>
>> > other traces. To rule out a test<br>
application problem,<br>
>> > you<br>
>> > can look at line 12778 of the lttng_relayd log and see<br>
traces from<br>
>> > appTraceNum -<br>
>> > 1 to appTraceNum - 18 including the appTraceNum 2 are<br>
"re-delivered" by<br>
>> > the<br>
>> > consumerd to the relayd.<br>
>> > Essentially, I see appTraceNum 1 through appTraceNum<br>
18 being delivered<br>
>> > twice, once individually where appTraceNum 2 is<br>
missing and once as a<br>
>> > group<br>
>> > at line 12778 where it's present.<br>
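A gap check like the one described above could be automated along these lines. This is a minimal sketch under assumptions: each event's payload appears on a single line of the babeltrace text output in the "TraceApp PID - ... appTraceNum - ..." form quoted earlier, and appTraceNum starts at 1 for each PID. The function name is illustrative.<br>

```python
import re
from collections import defaultdict

# Matches the payload format quoted in this thread, e.g.
# "TraceApp PID - 31940 THID - 31970 ... appTraceNum - 18 sleepTime - 192120"
TRACE_RE = re.compile(r"TraceApp PID - (\d+) .*?appTraceNum - (\d+)")


def find_missing_trace_nums(lines):
    """Return {pid: [missing appTraceNum values]} for PIDs with gaps."""
    seen = defaultdict(set)
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            seen[int(match.group(1))].add(int(match.group(2)))
    missing = {}
    for pid, nums in seen.items():
        # Assumes numbering starts at 1 and runs up to the highest value seen.
        gaps = sorted(set(range(1, max(nums) + 1)) - nums)
        if gaps:
            missing[pid] = gaps
    return missing
```

Duplicated deliveries do not affect the result, since trace numbers are collected into a set per PID.<br>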
>> ><br>
>> ><br>
>> > Request help with<br>
>> > 1. why traces are delivered twice, is it by design or<br>
a genuine problem<br>
>> > ?<br>
>> > 2. how to avoid traces being dropped even though<br>
buffers are<br>
>> > sufficiently<br>
>> > large ?<br>
>> ><br>
>> ><br>
>> > Regards,<br>
>> > Aravind.<br>
>> ><br>
>> > _______________________________________________<br>
>> > lttng-dev mailing list<br>
>> > <a href="mailto:lttng-dev@lists.lttng.org" target="_blank">lttng-dev@lists.lttng.org</a><br>
>> > <a href="http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev" rel="noreferrer" target="_blank">http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev</a><br>
>> ><br>
>><br>
>><br>
>><br>
>> --<br>
>> Jérémie Galarneau<br>
>> EfficiOS Inc.<br>
>> <a href="http://www.efficios.com" rel="noreferrer" target="_blank">http://www.efficios.com</a><br>
><br>
><br>
<br>
<br>
<br>
--<br>
Jérémie Galarneau<br>
EfficiOS Inc.<br>
<a href="http://www.efficios.com" rel="noreferrer" target="_blank">http://www.efficios.com</a><br>
<br>
<br>
<br>
_______________________________________________<br>
lttng-dev mailing list<br></div></div>
<a href="mailto:lttng-dev@lists.lttng.org" target="_blank">lttng-dev@lists.lttng.org</a> <mailto:<a href="mailto:lttng-dev@lists.lttng.org" target="_blank">lttng-dev@lists.lttng.org</a>><span><br>
<a href="http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev" rel="noreferrer" target="_blank">http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev</a><br>
<br>
<br>
<br>
<br>
-- Jonathan Rajotte Julien<br>
<br>
<br>
<br>
<br>
_______________________________________________<br>
lttng-dev mailing list<br>
<a href="mailto:lttng-dev@lists.lttng.org" target="_blank">lttng-dev@lists.lttng.org</a><br>
<a href="http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev" rel="noreferrer" target="_blank">http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev</a><br>
</span></blockquote><span><font color="#888888">
<br>
-- <br>
Jonathan R. Julien<br>
Efficios<br>
<br>
</font></span></blockquote></div><br></div>
</div></div></blockquote></div><br></div>