[ltt-dev] LTT UserSpace Tracer, broken?

jpaul at gdrs.com jpaul at gdrs.com
Mon Jun 7 15:31:31 EDT 2010


Greetings:

Thought I'd provide an update here on efforts to get the user space tracer to work. Based on this web site:

http://www.kernel.org/doc/man-pages/online/pages/man2/getcpu.2.html

I do see that getcpu() can accept up to three arguments (this detail is somewhat hidden when using syscall), depending on the kernel version. Per my modified code below, I've tried both of the following:

  int r = syscall(SYS_getcpu,&cpu,NULL);

or

  int r = syscall(SYS_getcpu,&cpu,NULL,NULL);

Either of the above calls still results in a segmentation fault. FYI: I'm working with a 2.6.18 RedHat base kernel that has glibc 2.5-49 installed. Also installed on this same machine are a couple of 2.6.33 kernels (RT and non-RT), both of which I have also tried. There is no glibc update from RedHat that goes beyond what I have listed (per a recent note from tech support). I have put in a request for an update that would specifically include sched_getcpu() ... I don't expect an update anytime soon.

I spent a good part of a day trying to manually update glibc to v2.11. No luck .. my attempts usually ended with having to restore /usr/local due to corrupted system calls. I've also tried installing glibc into a different location and building the ust library (by either changing arguments to configure or hand-editing Makefiles) so that it looks only at that new location ... this has not been successful either. There might still be some work that could be done to allow this type of build.

The above web site that I listed seems to indicate that getcpu was added in glibc 2.6 and appeared in kernels starting at 2.6.19 ... not really sure how we are seeing some indications of that working in the versions that I listed above.
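
One way to pin down which kernel and which glibc a given test binary actually runs against (as opposed to the headers it was built with) is to print both at run time. This is only a minimal sketch, assuming a glibc-based system; describe_runtime() is a hypothetical helper name and is not part of UST:

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>
#include <gnu/libc-version.h>   /* glibc-specific: gnu_get_libc_version() */

/* Hypothetical helper (not part of UST): writes a one-line summary of the
 * running kernel release, the glibc loaded at run time, and the glibc
 * version of the headers used at compile time. Returns 0 on success,
 * -1 if uname() fails or the buffer is too small. */
static int describe_runtime(char *buf, size_t len)
{
    struct utsname u;
    int n;

    if (uname(&u) != 0)
        return -1;

    n = snprintf(buf, len, "kernel %s, runtime glibc %s, built against glibc %d.%d",
                 u.release, gnu_get_libc_version(), __GLIBC__, __GLIBC_MINOR__);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling describe_runtime() with a 256-byte buffer from the test application and printing the result would show whether a binary built against a glibc installed in a separate location is really loading that copy when it runs.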

Given these details, and unless anyone has other suggestions, I'm currently assuming that the user space tracer has a hard dependence on glibc 2.6.

JP

-----Original Message-----
From: Pierre-Marc Fournier [mailto:pierre-marc.fournier at polymtl.ca]
Sent: Fri 5/28/2010 12:51 PM
To: John P. Paul
Cc: ltt-dev at lists.casi.polymtl.ca; Kenneth R. Macfarlane
Subject: Re: [ltt-dev] LTT UserSpace Tracer, broken?
 
On 05/24/2010 01:11 PM, jpaul at gdrs.com wrote:
> Thanks Pierre-Marc.  That will teach me to post something to a public board without double checking the interface first. The "3" below was a cut/paste issue from some glibc code (sched_getcpu.c) and I've replaced that coding line with:
>
>    int r = syscall(SYS_getcpu,&cpu);
>
> I've verified the proper operation of the above call in a separate test program. I've rebuilt everything after making that change. Unfortunately, that does not get rid of the segmentation fault with usttrace:
>
> # usttrace ./ustTest
> /usr/local/bin/usttrace: line 156: 20724 Segmentation fault      $CMD 2>&1
> Waiting for ustd to shutdown...
> Trace was output in:  /root/.usttraces/machineName-20100524100514656225139
>
> Nor does this resolve the issue with the application seg-faulting with ustd:
>
> # export UST_AUTOPROBE=1
> # gcc -o ustTest ustTest.c -lust
> # mkdir /tmp/trace    <- ust-app-socks already present
> # ustd&
> # ./ustTest&
>
> # ustctl --create-trace 20798
> # ustctl --start-trace 20798
>
> libustcomm[20795/20812]: Error: connect (path=/tmp/ust-app-socks/20798): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
> ustd[20795/20812]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
> ustd[20795/20812]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> libustcomm[20795/20813]: Error: connect (path=/tmp/ust-app-socks/20798): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
> ustd[20795/20813]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
> ustd[20795/20813]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> libustcomm[20795/20814]: Error: connect (path=/tmp/ust-app-socks/20798): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
> ustd[20795/20814]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
> ustd[20795/20814]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> libustcomm[20795/20815]: Error: connect (path=/tmp/ust-app-socks/20798): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
> ustd[20795/20815]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
> ustd[20795/20815]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> ustd[20795/20810]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> ustd[20795/20811]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> [6]+  Segmentation fault      ./ustTest
>
> # ls /tmp/ust-app-socks/
> 20798  ustd
>
> I'm guessing that ustd is complaining as my test application dumped and is no longer active. Looking at a core dump of my test app, it appears that the seg fault occurred at the following line of _rcu_read_unlock():
>
>    _STORE_SHARED(rcu_reader->ctr, rcu_reader->ctr - RCU_GP_COUNT);
>
> Which was called from ltt_vtrace(). But that only seems to fail when the syscall(getcpu) returns with a -1. I actually changed ltt_vtrace() code as follows:
>
> {
> //	cpu = ust_get_cpu();
>    int r = syscall(SYS_getcpu,&cpu);
>    if (r == -1)
>      cpu = r;
>    if (cpu == -1)
>      printf(".. invalid cpu %s (%d)\n", strerror(errno), errno);
> }
>
> And had the following print out:
>
> .. invalid cpu Bad address (14)
>
> So ... it appears that something isn't working correctly to make that
> syscall here. Not really sure why this is failing .. maybe a thread
> related issue? It doesn't fail every time. Maybe best to upgrade to the
> latest glibc and try again with the inline methods? It is important
> to note that the following code comes directly from glibc-2.11 and
> sched_getcpu() can return a -1 upon a failed INLINE_SYSCALL. Would
> suggest that ltt_vtrace() be changed to properly handle a -1 cpu
> value:
>

I believe the above code is failing some of the time with an "invalid 
address" because some pointers are missing in the call. You have only 1 
argument and you need 3.
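
For reference, a minimal sketch of the raw call with all three arguments supplied. raw_getcpu() is a hypothetical helper name, not a UST function. The third argument (the cache pointer) is passed as NULL here on the assumption, based on the getcpu man page, that the kernel accepts a NULL cache:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() prototype */
#endif
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper (not part of UST): asks the kernel for the current
 * CPU via the raw getcpu system call, passing all three arguments.
 * Returns the CPU number, or -1 with errno set if the call fails. */
static int raw_getcpu(void)
{
    unsigned int cpu = 0;
    unsigned int node = 0;

    /* Arguments: cpu pointer, node pointer, cache pointer (NULL here). */
    if (syscall(SYS_getcpu, &cpu, &node, NULL) != 0)
        return -1;

    return (int)cpu;
}
```

Because syscall() is variadic, leaving arguments out does not produce a compile error; the kernel still reads three argument slots, so explicit values (even NULL) are safer than omitting them.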

I am not too enthusiastic about the idea of adding error checking for the 
getcpu call. The call should never fail and this is in the critical path 
of the tracer.

I would consider a patch with some preprocessor logic that chooses the 
right call based on the one available on the system. However, this patch 
must take into account the latest kernels which provide getcpu as a vdso.
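
A rough sketch of what such preprocessor logic could look like, offered only as a starting point and not as the actual UST code. It assumes glibc's __GLIBC_PREREQ macro is available and that glibc 2.6 or later provides sched_getcpu(), which on recent systems is expected to use the vdso where the kernel offers one. example_get_cpu() and HAVE_SCHED_GETCPU are hypothetical names:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for sched_getcpu() and syscall() */
#endif
#include <stddef.h>
#include <sched.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Nested on purpose: __GLIBC_PREREQ(2, 6) must not be expanded on a libc
 * that does not define the macro. */
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 6)
#  define HAVE_SCHED_GETCPU 1   /* hypothetical feature macro */
# endif
#endif

/* Hypothetical wrapper (not the UST implementation). */
static inline int example_get_cpu(void)
{
#if defined(HAVE_SCHED_GETCPU)
    /* Newer glibc: let the library pick the fastest path it knows. */
    return sched_getcpu();
#elif defined(SYS_getcpu)
    /* Older glibc: real system call, with all three arguments passed. */
    unsigned int cpu = 0;

    if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0)
        return -1;
    return (int)cpu;
#else
    /* No way to ask: report CPU 0 (an assumption made for this sketch). */
    return 0;
#endif
}
```

The choice is made entirely at compile time, so the fast path carries no extra branch; only the fallback contains a check on the system call's return value.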

By the way, you will get a considerable performance penalty with this old 
libc. UST tries very hard not to make system calls in the tracing 
critical path because they are slow. The recent kernels/glibc's provide 
getcpu/sched_getcpu as a vdso, which helps a lot. If you are doing a 
real system call in the tracing path, this will result in a penalty.

pmf








