[ltt-dev] LTT UserSpace Tracer, broken?

jpaul at gdrs.com jpaul at gdrs.com
Mon May 24 13:11:10 EDT 2010


Thanks Pierre-Marc.  That will teach me to post something to a public board without double checking the interface first. The "3" below was a cut/paste issue from some glibc code (sched_getcpu.c) and I've replaced that coding line with:

  int r = syscall(SYS_getcpu, &cpu);

I've verified the proper operation of the above call in a separate test program. I've rebuild everything after making that change. Unfortunately, that does not get rid of the segmentation fault with usttrace:

# usttrace ./ustTest
/usr/local/bin/usttrace: line 156: 20724 Segmentation fault      $CMD 2>&1
Waiting for ustd to shutdown...
Trace was output in:  /root/.usttraces/machineName-20100524100514656225139

Nor does this resolve the issue with the application seg-faulting with ustd:

# export UST_AUTOPROBE=1
# gcc -o ustTest ustTest.c -lust
# mkdir /tmp/trace                 <- ust-app-socks already present
# ustd &
# ./ustTest &

# ustctl --create-trace 20798
# ustctl --start-trace 20798

libustcomm[20795/20812]: Error: connect (path=/tmp/ust-app-socks/20798): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
ustd[20795/20812]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
ustd[20795/20812]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
libustcomm[20795/20813]: Error: connect (path=/tmp/ust-app-socks/20798): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
ustd[20795/20813]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
ustd[20795/20813]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
libustcomm[20795/20814]: Error: connect (path=/tmp/ust-app-socks/20798): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
ustd[20795/20814]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
ustd[20795/20814]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
libustcomm[20795/20815]: Error: connect (path=/tmp/ust-app-socks/20798): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
ustd[20795/20815]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
ustd[20795/20815]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
ustd[20795/20810]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
ustd[20795/20811]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
[6]+  Segmentation fault      ./ustTest

# ls /tmp/ust-app-socks/
20798  ustd

I'm guessing that ustd is complaining as my test application dumped and is no longer active. Looking at a core dump of my test app, it appears that the seg fault occurred at the following line of _rcu_read_unlock():

  _STORE_SHARED(rcu_reader->ctr, rcu_reader->ctr - RCU_GP_COUNT);

Which was called from ltt_vtrace(). But that only seems to fail when the syscall(getcpu) returns with a -1. I actually changed ltt_vtrace() code as follows:

{
//	cpu = ust_get_cpu();
  int r = syscall(SYS_getcpu, &cpu);
  if (r == -1)
    cpu = r;
  if (cpu == -1)
    printf(".. invalid cpu %s (%d)\n", strerror(errno), errno);
}

And had the following print out:

.. invalid cpu Bad address (14)

So ... it appears that something isn't working correctly to make that syscall here. Not really sure why this is failing .. maybe a thread related issue? It doesn't fail every time. Maybe best to upgrade the latest glibc and try again with the inline methods? It is important to note that the following code comes directly from glibc-2.11 and sched_getcpu() can return a -1 upon a failed INLINE_SYSCALL. Would suggest that ltt_vtrace() be changed to properly handle a -1 cpu value:

int
sched_getcpu (void)
{
#ifdef __NR_getcpu
  unsigned int cpu;
  int r = INLINE_SYSCALL (getcpu, 3, &cpu, NULL, NULL);

  return r == -1 ? r : cpu;
#else
  __set_errno (ENOSYS);
  return -1;
#endif
}

Other suggestions?

Thanks

JP

-----Original Message-----
From: Pierre-Marc Fournier [mailto:pierre-marc.fournier at polymtl.ca]
Sent: Fri 5/21/2010 2:49 PM
To: John P. Paul
Cc: ltt-dev at lists.casi.polymtl.ca; Kenneth R. Macfarlane; Mathieu Desnoyers
Subject: Re: [ltt-dev] LTT UserSpace Tracer, broken?
 
jpaul at gdrs.com wrote:
> So, I'm switching modes and attempting to get the "user space tracer" up and running. I'm following the instructions provided here:
> 
> http://lttng.org/files/ust/manual/ust.html
> 
> I've downloaded and built the following packages:
> 
>     libkcompat-0.2.tar.gz
>     ust-0.4.tar.gz
>     userspace-rcu-0.4.3.tar.bz2
>     lttv-0.12.31-04072010.tar.gz
> 
> First issue: ust-0.4 will not compile due to an undefined reference to sched_getcpu(). I'm using the latest version of Redhat Enterprise 5.5 and my installed glibc doesn't have that method defined (gcc version=4.3.4, glibc version=2.5-49 .. believe this method came into being around glibc-2.6?). Knowing that upgrading glibc is not a trivial process (as I've actually taken down a machine by doing so incorrectly), I'm not confident that most users may not want to go that route. I've downloaded glib2-2.11 and pulled some code out of that library to place the following in serialize.c:
> 
> // Added to avoid upgrading glibc
> int
> sched_getcpu (void)
> {
> #ifdef __NR_getcpu
>   unsigned int cpu;
> //  int r = INLINE_SYSCALL (getcpu, 3, &cpu, NULL, NULL);
>   int r = syscall(SYS_getcpu, 3, &cpu, NULL, NULL);
> 
>   return r == -1 ? r : cpu;
> #else
>   __set_errno (ENOSYS);
>   return -1;
> #endif
> }

Yes you need glibc 2.6 to have sched_getcpu. However, on my system, it 
is a vsyscall, therefore you cannot call it with syscall().

> 
> That may or may not be the cause of some follow-on issues. Maybe glibc "really" needs to be upgraded to effectively use the userspace tracer .. please comment if so and I go that route next. Also, note that I'm doing all this work at "root", so there should be no permissions issues.
> 
> Next, I attempted to build and run the "quick start" code as detailed in the instructions (copying that code into ustTest.c):
> 
> gcc -o ustTest -g -Wall ustTest.c -lust
> 
> # usttrace ./ustTest
> /usr/local/bin/usttrace: line 156: 23606 Segmentation fault      $CMD 2>&1
> Waiting for ustd to shutdown...
> Trace was output in:  /root/.usttraces/machineName-20100520172047665580280

Hmmm... what is the use of the "3" you are passing as argument to 
syscall()? This appears to be an invalid pointer, which results in -1 
being returned by the syscall and consequently a segfault.

> 
> # lttv -m textDump -t  /root/.usttraces/machineName-20100520172047665580280
> ** (process:23613): WARNING **: Trace /root/.usttraces/machineName-20100520172047665580280 has no metadata tracefile
> ** (process:23613): CRITICAL **: cannot open trace /root/.usttraces/machineName-20100520172047665580280
> /usr/local/bin/lttv: line 15: 23613 Segmentation fault      $0.real $*
> 
> So, next I gave up on the command line attempts and tried to get things up and running manually (still using the provided instructions).
> 
> mkdir /tmp/ustsocks   <- not sure this is correct anymore, may have changed to: /tmp/ust-app-socks

Thanks, fixed in the manual.

> mkdir /tmp/trace
> ustd &
> ./ustTest &
> [7] 23652
> 
> My ustTest was changed as follows before running that executable:
> 
> int main(int argc, char **argv)
> {
>     int v = 23;
>     char *st = "test23";
> 
>     for(;;)
>     {
>     trace_mark(main, 23, "firstarg %d secondarg %s", v, st);
> 
>     /* another marker without arguments: */
>     trace_mark(main, 24, MARK_NOARGS);
>     sleep(30);
>     }
> 
>     return 0;
> }
> 
> # ustctl --list-markers 23652
> {PID: 23652, channel/marker: metadata/core_marker_id, state: 1, fmt: "channel %s name %s event_id %hu int #1u%zu long #1u%zu pointer #1u%zu size_t #1u%zu alignment #1u%u" 0x3301d0}
> {PID: 23652, channel/marker: metadata/core_marker_format, state: 1, fmt: "channel %s name %s format %s" 0x3301e3}
> {PID: 23652, channel/marker: metadata/core_marker_format, state: 1, fmt: "channel %s name %s format %s" 0x330d63}
> {PID: 23652, channel/marker: metadata/core_marker_id, state: 1, fmt: "channel %s name %s event_id %hu int #1u%zu long #1u%zu pointer #1u%zu size_t #1u%zu alignment #1u%u" 0x331483}
> {PID: 23652, channel/marker: metadata/core_marker_format, state: 1, fmt: "channel %s name %s format %s" 0x332b35}
> {PID: 23652, channel/marker: metadata/core_marker_id, state: 1, fmt: "channel %s name %s event_id %hu int #1u%zu long #1u%zu pointer #1u%zu size_t #1u%zu alignment #1u%u" 0x332be6}
> {PID: 23652, channel/marker: ust/potential_exec, state: 0, fmt: " " 0x33d9b6}
> {PID: 23652, channel/marker: main/23, state: 0, fmt: "firstarg %d secondarg %s" 0x8048763}
> {PID: 23652, channel/marker: main/24, state: 0, fmt: " " 0x80487aa}
> {PID: 23652, channel/marker: ust/dummymark, state: 0, fmt: " " 0x8048893}
> 
> ustctl --enable-marker main/23 23652
> 
> # ustctl --create-trace 23652
> # ustctl --start-trace 23652
> ustd[23630/23663]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> libustcomm[23630/23666]: Error: connect (path=/tmp/ust-app-socks/23652): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
> ustd[23630/23666]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
> ustd[23630/23666]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> ustd[23630/23664]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> libustcomm[23630/23665]: Error: connect (path=/tmp/ust-app-socks/23652): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
> ustd[23630/23665]: Warning: unable to connect to process, it probably died before we were able to connect (in connect_buffer() at ustd.c:250)
> ustd[23630/23665]: Error: failed to connect to buffer (in consumer_thread() at ustd.c:581)
> [7]+  Segmentation fault      ./ustTest
> 
> # ustctl --stop-trace 23652
> libustcomm[23668/23668]: Error: connect (path=/tmp/ust-app-socks/23652): Connection refused (in ustcomm_connect_path() at ustcomm.c:581)
> ustctl[23668/23668]: Error: could not connect to PID 23652 (in ustcmd_send_cmd() at ../libustcmd/ustcmd.c:439)
> ustctl[23668/23668]: Error: error while trying to stop trace for PID 23652
>  (in main() at ustctl.c:230)
> 
> Machine info (non-RT kernel): Intel Core2 Duo
> # uname -a
> Linux machineName 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25 EDT 2010 i686 i686 i386 GNU/Linux
> 
> So ... could the above be related to my glibc patch (or not having a later version of glibc) or are things more broken as they appear?  Maybe I'm missing additional steps that are not documented?
> 
> Also a suggestion: for large programs, of many developers and lots of code, it might be somewhat of a hassle to have to enable each and every marker. I would suggest adding a "--enable-all-markers" capability to ustctl.

Good idea. In the meantime, you can start your program with the 
UST_AUTOPROBE=1 environment var, which will have the same effect. When 
you use usttrace, this is done automatically.

> 
> Thanks
> 
> JP
> 
> 
> 
> 
> --
> This is an e-mail from General Dynamics Robotic Systems. It is for the intended recipient only and may contain confidential and privileged information. No one else may read, print, store, copy, forward or act in reliance on it or its attachments. If you are not the intended recipient, please return this message to the sender and delete the message and any attachments from your computer. Your cooperation is appreciated.
> 
> 
> _______________________________________________
> ltt-dev mailing list
> ltt-dev at lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev






--
This is an e-mail from General Dynamics Robotic Systems. It is for the intended recipient only and may contain confidential and privileged information. No one else may read, print, store, copy, forward or act in reliance on it or its attachments. If you are not the intended recipient, please return this message to the sender and delete the message and any attachments from your computer. Your cooperation is appreciated.





More information about the lttng-dev mailing list