[ltt-dev] problems with ust

Sylvain Geneves sylvain.geneves at inrialpes.fr
Thu Apr 15 11:43:58 EDT 2010


On 04/14/2010 06:02 PM, Pierre-Marc Fournier wrote:
> On 04/14/2010 05:26 AM, Sylvain Geneves wrote:
>>>
>>>>>> Here's what I see in the resulting trace directory:
>>>>>>
>>>>>> http://pastebin.com/raw.php?i=TqJWarkA
>>>>>>
>>>>>> It seems that not all traces are recorded (some metadata and ust
>>>>>> files are zero size); I can't understand why. I must be missing
>>>>>> something here...?
>>>>>
>>>>> -f should fix this.
>>>>>
>>>>>>
>>>>>> also, when using lttv, it says "Cannot open trace : maybe you should
>>>>>> enter in the trace directory to select it ?"
>>>>>> I note that lttv can open a subdirectory (like
>>>>>> /root/.usttraces/californium-20100329172748911948191/8494_5454077942969735172
>>>>>> in my example), but obviously all I can see is a subset of what
>>>>>> really happened...
>>>
>>> I just looked at this more closely.
>>>
>> Thanks for your help.
>>
>>
>>> If your program does not fork (nor clone without the CLONE_VM flag),
>>> it's still possible you are tracing several processes. This would
>>> happen if the command you are passing to usttrace is not an ELF
>>> executable but rather a shell script that runs other commands, each of
>>> which starts a new process. For example, a shell script that uses
>>> find, sed, ls, etc. will start each of them in its own process. The
>>> tracer will trace all of them. To find out what these processes are,
>>> you could run the same command under strace -f and see what is going on.
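>>> For example (just an illustration, adjust to your own command):
>>>
>>> $strace -f -o strace.log ./yourprog
>>>
>>> The -f flag makes strace follow children, so every clone(), fork() and
>>> execve() done by descendant processes ends up in strace.log too.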
>>>
>>
>> The command I launch is actually an executable, not a script. What it
>> does is quite simple:
>> - the application registers some events to be processed
>> - it creates one thread per core on the machine
>> - each thread executes a loop to process events
>> After all events have been processed, each created thread calls
>> pthread_exit, and the application exits normally.
>
> Interesting, I have a very similar program I use for benchmarking ust
> and it works well.
>
>>
>> The output of strace -f shows that there is no call to fork in there,
>> but clone is called multiple times.
>>
>> That leads me to a question about how to correctly enable the CLONE_VM
>> flag for ust: is enabling it through the shell environment (export
>> CLONE_VM=1) sufficient?
>> Is there another way to do it?
>
> CLONE_VM is a flag you pass to the clone() call. If you don't pass it,
> clone() acts more like fork() in the sense that it creates a new address
> space for the new task. If you do pass it, the same address space is
> reused, effectively creating a thread. If you are using pthreads, then
> pthread is passing CLONE_VM; you can actually see it with strace.

Thanks for your explanation. I only use pthread_create and don't call
clone() directly, so it shouldn't be a problem.
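
For reference, the structure of my application is roughly the following
(a simplified sketch, not the exact code; the real event-processing loop
is elided):

#include <pthread.h>
#include <unistd.h>

#define MAX_THREADS 64

static void *worker(void *arg)
{
	(void)arg;
	/* each thread loops over the registered events and processes them */
	/* ... event-processing loop ... */
	pthread_exit(NULL);
}

int main(void)
{
	long i, ncores = sysconf(_SC_NPROCESSORS_ONLN);
	pthread_t threads[MAX_THREADS];

	if (ncores > MAX_THREADS)
		ncores = MAX_THREADS;
	/* one thread per core; pthread_create() calls clone() with CLONE_VM
	   internally, so all threads share a single address space */
	for (i = 0; i < ncores; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (i = 0; i < ncores; i++)
		pthread_join(threads[i], NULL);
	return 0;
}

In the strace -f output, each pthread_create shows up as one of those
clone() calls.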

>
> The reason I was suggesting strace is to find out what all those
> processes we are seeing are. Are you using the -l option of usttrace?
> This could be another cause. If so, try running without it; it will
> simplify things.

Indeed, I was using the -l flag with usttrace; removing it simplifies
the output a lot!
Here's what I get when I list the trace directory without the -l option:

$ls ~/.usttraces/californium-20100415164805348661147/
14989_5460376113077649648  app.log  ustd.log


The only error message I see now is the following:

sh: symbol lookup error: /usr/local/lib/libustfork.so: undefined symbol: 
ust_before_fork

The symlink is valid and points to the library.

$readelf -a /usr/local/lib/libustfork.so.0.0.0 | grep ust_before_fork
00001ba4  00000607 R_386_JUMP_SLOT   00000000   ust_before_fork
      6: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND ust_before_fork
     64: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND ust_before_fork


It seems that the function is referenced there, but not defined?
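
If I understand the readelf output correctly, UND only means that
libustfork.so references the symbol and expects another library to
provide it at load time. Assuming ust_before_fork is supposed to be
defined in libust.so (my guess), something like this should show a
defined symbol:

$nm -D /usr/local/lib/libust.so | grep ust_before_fork

If it shows up as undefined there too, that would explain the symbol
lookup error.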

>
>>
>>
>>> The fact that some of the files are empty means that the daemon was
>>> unable to connect to the processes at some point. Indeed, you are
>>> getting printed errors about this. But it is initially able to, because
>>> we see there is at least one file that is not empty. This is likely
>>> due to the process ceasing to exist too quickly. That could happen
>>> because of an exec() or an exit(), but there is a keep-alive mechanism
>>> (triggered by ustctl -f in the case of the exec) that delays these
>>> calls when ustd has not yet had time to connect to all the buffers.
>>>
>>> Either there is a bug in this mechanism, or something crashes. It could
>>> be a bug in your program or in ust. When the traced program crashes,
>>> only the buffers that were already connected can be recovered. Enabling
>>> core dumps (ulimit -c) and checking whether there are core files in the
>>> directory after execution could help you find out if you have segfaults
>>> or similar crashes.
>>
>> That's a good point.
>> Nothing crashes explicitly (to be sure, I ran my program after "ulimit
>> -c unlimited" and there was no resulting core dump).
>>
>>>
>>> Did you have a look at ustd.log and app.log in the trace directory? They
>>> could give additional information.
>>
>> I had not looked at them closely enough; there are some interesting
>> lines near the end of the files:
>>
>> In ustd.log, when grepping for "Error", I get multiple lines like this:
>>
>> ustd[17643/17973]: Error: unable to parse response to get_pidunique (in
>> connect_buffer() at ustd.c:268)
>> ustd[17643/17973]: Error: failed to connect to buffer (in
>> consumer_thread() at ustd.c:581)
>> libustcomm[17643/17995]: Error: connect (path=/tmp/ust-app-socks/17992):
>> Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
>
> The "unable to parse response to get_pidunique" is not supposed to
> happen. This would certainly result in empty files. Perhaps add some
> code to print the string that is received (near ustd.c:268). It would
> help debug.
>
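
For reference, I take this suggestion to mean something like the
following near ustd.c:268 (a rough sketch; "response" is my guess at the
name of the variable holding the received string):

	/* hypothetical debug print: dump the raw get_pidunique response
	   before parsing it, to see what ustd actually received */
	fprintf(stderr, "get_pidunique raw response: \"%s\"\n", response);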

Apparently, I can't reproduce it... here are the only errors left when
using the -l flag with the latest version of ust:

libustcomm[19070/19396]: Error: connect (path=/tmp/ust-app-socks/19391): 
Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
ustd[19070/19398]: Error: failed to connect to buffer (in 
consumer_thread() at ustd.c:582)
ustd[19070/19396]: Error: failed to connect to buffer (in 
consumer_thread() at ustd.c:582)
ustd[19070/19397]: Error: problem in ustcomm_send_request(get_shmid) (in 
connect_buffer() at ustd.c:280)
ustd[19070/19397]: Error: failed to connect to buffer (in 
consumer_thread() at ustd.c:582)


>>
>>
>> When grepping for "die", there are a lot of these:
>>
>> ustd[17643/17710]: application died while putting subbuffer (in
>> consumer_loop() at ustd.c:513)
>> ustd[17643/17687]: app died while being traced (in get_subbuffer() at
>> ustd.c:75)
>> ustd[17643/17745]: For buffer metadata_2, the trace was not found. This l...
>> libustcomm[17643/17747]: sent message "put_subbuffer ust_0 0" (in send_messag...
>> ustd[17643/17745]: application died while putting subbuffer (in consumer_loop() at ustd.c:513)
>> ustd[17643/17995]: Warning: unable to connect to process, it probably
>> died before we were able to connect (in connect_buffer() at ustd.c:250)
>>
>>
>> In app.log, I can see some lines like this:
>>
>> libust[17924/17928]: Cannot find trace. It was likely destroyed by the
>> user. (in do_cmd_put_subbuffer() at tracectl.c:717)
>>
>>
>>
>> Sorry not to have seen this before...
>> From what I understand, it seems ust is seeing my application die
>> unexpectedly. That seems weird to me, because the application appears
>> to be exiting gracefully (no segfault or anything else). What could I
>> do to check that more thoroughly?
>> Are there some ust calls to add to my application to notify ust that we
>> will stop tracing, or that a thread will exit?
>
>
> Most of these messages you pasted are actually not problems. Anything
> that does not start with "Error" or "Warning" is being displayed only
> because you are in debug mode and is being handled gracefully. Yes, ust
> finds the program dies "unexpectedly" because the trace is still
> running. There is an ltt_trace_stop() function you could call to stop
> the trace, but I do not recommend it. When the daemon detects the
> process ended, the buffer recovery kicks in and that's ok. Anyway, the
> problem seems not to be there but rather at buffer connection time.
>
> Concerning the "Warning: unable to connect to process, it probably died
> before we were able to connect", that is really problematic because it
> means the buffer files will be empty. To understand what is going on
> there, it would probably help to have the full log files.
>
> Are you able to trace a small program with just one marker like the
> example in the manual?

I should have tried that in the first place; I will do it as soon as I
can.
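
If I remember the manual correctly, that example is essentially a single
marker in an otherwise empty program, something like the sketch below
(written from memory, so the exact header and macro names may differ
between ust versions; built with -lust and run under usttrace):

#include <ust/marker.h>

int main(void)
{
	int v = 42;

	/* single user-space marker; "ust" is the channel and "myevent"
	   the event name, as in the manual's example (from memory) */
	trace_mark(ust, myevent, "value %d", v);
	return 0;
}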

>
> Did you update to the latest git to get the -fno-strict-aliasing fix?
>

I'm sorry, I didn't have time to do this until now; everything compiles
fine now :)


thanks,
Sylvain

> pmf
>
> _______________________________________________
> ltt-dev mailing list
> ltt-dev at lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev




