[ltt-dev] problems with ust

Pierre-Marc Fournier pierre-marc.fournier at polymtl.ca
Wed Apr 14 12:02:15 EDT 2010


On 04/14/2010 05:26 AM, Sylvain Geneves wrote:
>>
>>>>> Here's what i see in the resulting trace directory:
>>>>>
>>>>> http://pastebin.com/raw.php?i=TqJWarkA
>>>>>
>>>>> it seems that all traces aren't recorded (some metadata and ust are
>>>>> zero
>>>>> size), i can't understand why, i must be missing something here... ?
>>>>
>>>> -f should fix this.
>>>>
>>>>>
>>>>> also, when using lttv, it says "Cannot open trace : maybe you should
>>>>> enter in the trace directory to select it ?"
>>>>> i note that lttv can open a subdirectory (like
>>>>> /root/.usttraces/californium-20100329172748911948191/8494_5454077942969735172
>>>>>
>>>>>
>>>>> in my example), but obviously all i can see is a subset of what really
>>>>> happenned...
>>
>> I just looked at this more closely.
>>
> thanks for your help
>
>
>> If your program does not fork (nor clone without the CLONE_VM flag),
>> it's still possible you are tracing several processes. This would be
>> caused by the fact that the command you are passing to usttrace is not
>> an elf executable but rather a shell script that does many things like
>> run commands, which starts new processes. For example, a shell script
>> that uses find, sed, ls etc. will start them in their distinct process.
>> The tracer will trace everything. To find out what these processes are,
>> you could run the same command with strace -f and see what is going on.
>>
>
> the command i launch is actually an executable, not a script. What it
> does is quite simple :
> - the application registers some events to be processed
> - it creates one thread per core on the machine
> - each thread executes a loop to execute events
> after all events has been processed, each created thread calls
> pthread_exit, and the application exits normally.

Interesting, I have a very similar program I use for benchmarking ust 
and it works well.

>
> the output of strace -f shows that there is no call to fork in there,
> but clone is called multiple times.
>
> that leads me to a question about how to correctly enable the CLONE_VM
> flag for ust : is enabling it through the shell environment (export
> CLONE_VM=1) sufficient ?
> is there another way to do it ?

CLONE_VM is a flag you pass to the clone() call. If you don't pass it, 
clone() acts more like fork() in the sense it creates a new address 
space for the new task. If you do pass it, the same address space is 
reused, effectively creating a thread. If you are using pthreads, then 
pthread is passing CLONE_VM, you can actually see it with strace.

The reason I was suggesting you use strace is to find out what the tons 
of processes we are seeing are. Are you using the -l option to usttrace? 
This could be another cause. If so, try to stop using it. It's going to 
simplify things.

>
>
>> The fact that some of the files are empty means that the daemon was
>> unable to connect to the processes at some point. Indeed, you are
>> getting printed errors about this. But it is initially able to because
>> we see there is at least one file that is not empty. This is unlikely
>> due to the fact the process stops existing too quickly. This could
>> happen because of an exec() or an exit(), but there is a keep alive
>> mechanism (triggered by ustctl -f in the case of the exec) that induces
>> a delay in these calls when the ustd did not have the time to connect to
>> all buffers yet.
>>
>> Either there is a bug in this mechanism, or something crashes. Could be
>> a bug in your program or in ust. When the traced program crashes, only
>> the buffers that were already connected can be recovered. Enabling core
>> dumps (ulimit -c) and checking if there are core files in the directory
>> after execution could help finding if you have segfaults or similar
>> crashes.
>
> That's a good point.
> nothing crashes explicitely (to be sure i ran my program after "ulimit
> -c unlimited" and they were no resulting core dump).
>
>>
>> Did you have a look at ustd.log and app.log in the trace directory? They
>> could give additional information.
>
> i did not looked at it enough, since there is some interesting lines
> near the end of the files :
>
> in ustd.log, when grepping with Error i get multiple lines like this :
>
>
> ustd[17643/17973]: Error: unable to parse response to get_pidunique (in
> connect_buffer() at ustd.c:268)
> ustd[17643/17973]: Error: failed to connect to buffer (in
> consumer_thread() at ustd.c:581)
> libustcomm[17643/17995]: Error: connect (path=/tmp/ust-app-socks/17992):
> Connection refused (in ustcomm_connect_path() at ustcomm.c:581)

The "unable to parse response to get_pidunique" is not supposed to 
happen. This would certainly result in empty files. Perhaps add some 
code to print the string that is received (near ustd.c:268). It would 
help debug.

>
>
> when grepping with die there are a lot of :
>
> ustd[17643/17710]: application died while putting subbuffer (in
> consumer_loop() at ustd.c:513)
> ustd[17643/17687]: app died while being traced (in get_subbuffer() at
> ustd.c:75)
> ustd[17643/17745]: For buffer metadata_2, the trace was not found. This
> llibustcomm[17643/17747]: sent message "put_subbuffer ust_0 0" (in
> send_messagustd[17643/17745]: application died while putting subbuffer
> (in consumer_loop() at ustd.c:513)
> ustd[17643/17995]: Warning: unable to connect to process, it probably
> died before we were able to connect (in connect_buffer() at ustd.c:250)
>
>
> in app.log, i can see some lines like this :
>
> libust[17924/17928]: Cannot find trace. It was likely destroyed by the
> user. (in do_cmd_put_subbuffer() at tracectl.c:717)
>
>
>
> sorry not to have seen this before...
> from what i understand, it seems ust is seeing my application dying
> unexpectedly. It seems weird to me because the application appears to be
> exiting gracefully (no segfault, nor else). What could i do to check
> that more thoroughly ?
> Are there some ust calls to add in my application to notify ust we will
> stop tracing, or the thread will exit ?


Most of these messages you pasted are actually not problems. Anything 
that does not start with "Error" or "Warning" is being displayed only 
because you are in debug mode and is being handled gracefully. Yes, ust 
finds the program dies "unexpectedly" because the trace is still 
running. There is an ltt_trace_stop() function you could call to stop 
the trace, but I do not recommend it. When the daemon detects the 
process ended, the buffer recovery kicks in and that's ok. Anyway, the 
problem seems not to be there but rather at buffer connection time.

Concerning the "Warning: unable to connect to process, it probably died 
before we were able to connect", that is really problematic because it 
means the buffers files will be empty. To understand what is going on 
there it would probably help to have the full log files.

Are you able to trace a small program with just one marker like the 
example in the manual?

Did you update to the latest git to get the -fno-strict-aliasing?

pmf




More information about the lttng-dev mailing list