[lttng-dev] [rp] [RFC] Userspace RCU library internal error handling

Thu Jun 21 15:28:24 EDT 2012

On Thu, Jun 21, 2012 at 03:03:06PM -0400, Mathieu Desnoyers wrote:
> * Josh Triplett (josh at joshtriplett.org) wrote:
> > On Thu, Jun 21, 2012 at 12:41:13PM -0400, Mathieu Desnoyers wrote:
> > > Currently, liburcu calls "exit(-1)" upon internal consistency error.
> > > This is not pretty, and usually frowned upon in libraries.
> > 
> > Agreed.
> > 
> > > One example of failure path where we use this is if pthread_mutex_lock()
> > > would happen to fail within synchronize_rcu(). Clearly, this should
> > > _never_ happen: it would typically be triggered only by memory
> > > corruption (or other terrible things like that). That being said, we
> > > clearly don't want to make "synchronize_rcu()" return errors like that
> > > to the application, because it would needlessly complicate the
> > > application's error handling.
> > 
> > I think you can safely ignore any error conditions you know you can't
> > trigger.  pthread_mutex_lock can only return an error under two
> > conditions: an uninitialized mutex, or an error-checking mutex already
> > locked by the current thread.  Neither of those can happen in this case.
> > Given that, I'd suggest either calling pthread_mutex_lock and ignoring
> > any possibility of error, or adding an assert.
> > 
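
To illustrate the second condition, here is a minimal sketch. The helper
name relock_errorcheck() is made up for this example and is not part of
liburcu. It assumes a mutex of type PTHREAD_MUTEX_ERRORCHECK, for which
POSIX specifies EDEADLK when the owning thread tries to lock it again:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper: lock an error-checking mutex twice from the same
 * thread. Returns the result of the second pthread_mutex_lock() call,
 * or -1 if the setup itself failed.
 */
static int relock_errorcheck(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t lock;
	int ret;

	if (pthread_mutexattr_init(&attr))
		return -1;
	if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) ||
	    pthread_mutex_init(&lock, &attr)) {
		pthread_mutexattr_destroy(&attr);
		return -1;
	}
	pthread_mutexattr_destroy(&attr);

	if (pthread_mutex_lock(&lock)) {
		pthread_mutex_destroy(&lock);
		return -1;
	}
	/* Second lock by the owner: an error-checking mutex reports EDEADLK. */
	ret = pthread_mutex_lock(&lock);

	pthread_mutex_unlock(&lock);
	pthread_mutex_destroy(&lock);
	return ret;
}
```

A default (non-error-checking) mutex gives no such guarantee: relocking
it from the owning thread may simply deadlock.
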
> > > So instead of calling exit(-1), one possibility would be to do something
> > > like this:
> > > 
> > > #include <signal.h>
> > > #include <pthread.h>
> > > #include <stdio.h>
> > > 
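> > > /* Print a message, then raise SIGBUS on the calling thread.
> > >  * Note: ##__VA_ARGS__ is a GNU C extension. */
> > > #define urcu_die(fmt, ...)                                      \
> > >         do {                                                    \
> > >                 fprintf(stderr, fmt, ##__VA_ARGS__);            \
> > >                 (void) pthread_kill(pthread_self(), SIGBUS);    \
> > >         } while (0)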
> > > 
> > > and call urcu_die(); in those "unrecoverable error" cases, instead of
> > > calling exit(-1). Therefore, if an application chooses to trap those
> > > signals, it can, which is otherwise not possible with a direct call to
> > > exit().
> > 
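
For reference, trapping that signal on the application side could look
roughly like the sketch below. The names urcu_fatal_seen,
on_fatal_signal() and trap_sigbus() are invented for illustration; the
handler only sets a flag, since only async-signal-safe work is allowed
there:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t can be written safely from a handler. */
static volatile sig_atomic_t urcu_fatal_seen;

/* Hypothetical application handler: record that the signal arrived. */
static void on_fatal_signal(int signo)
{
	(void) signo;
	urcu_fatal_seen = 1;
}

/* Install the handler for SIGBUS. Returns 0 on success, -1 on failure. */
static int trap_sigbus(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_fatal_signal;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

Note that if such a handler returns, execution resumes right after the
pthread_kill() call, so the macro as posted would fall through into the
code following urcu_die() instead of terminating.
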
> > It looks like you want to use signals as a kind of exception mechanism,
> > to allow the application to clean up (though not to recover).  assert
> > seems much clearer to me for "this can't happen" cases, and assert also
> > generates a signal that the application can catch and clean up.
> 
> In discussions with other LTTng developers, we considered using
> assert(), but the thought that this case might be silently ignored on
> production systems (which compile with assertions disabled) makes me
> uncomfortable. This is why I would prefer a SIGBUS to an assertion.
> 
> Using assert() would be similar to turning the Linux kernel BUG_ON()
> mechanism into no-ops on production systems because "it should never
> happen" (tm) ;-)

Just don't define NDEBUG then. :)
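
One way to do that from inside the library, independent of the flags the
build passes, is to undefine NDEBUG before including <assert.h> in the
source files concerned. A minimal sketch, where lock_or_abort() and
assertions_enabled() are hypothetical names rather than liburcu
functions:

```c
/* Force assertions on for this file even if the build passes -DNDEBUG. */
#undef NDEBUG
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper: a lock failure here is a "cannot happen" case. */
static void lock_or_abort(pthread_mutex_t *lock)
{
	int ret = pthread_mutex_lock(lock);

	/* On failure, assert() calls abort(), which raises SIGABRT. */
	assert(ret == 0);
}

/* Returns 1 if assert() evaluates its argument in this translation unit. */
static int assertions_enabled(void)
{
	int on = 0;

	assert((on = 1));
	return on;
}
```

The application can still catch SIGABRT to clean up, same as it could
with SIGBUS.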

- Josh Triplett