[lttng-dev] [PATCH liburcu] Fix pthread_atfork() behaviour
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Thu Apr 17 09:12:39 EDT 2014
----- Original Message -----
> From: "Mathieu Desnoyers" <mathieu.desnoyers at efficios.com>
> To: "Keir Fraser" <keir at cohodata.com>
> Cc: lttng-dev at lists.lttng.org, "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> Sent: Thursday, April 17, 2014 8:43:19 AM
> Subject: Re: [lttng-dev] [PATCH liburcu] Fix pthread_atfork() behaviour
>
> ----- Original Message -----
> > From: "Mathieu Desnoyers" <mathieu.desnoyers at efficios.com>
> > To: "Keir Fraser" <keir at cohodata.com>
> > Cc: lttng-dev at lists.lttng.org, "Paul E. McKenney"
> > <paulmck at linux.vnet.ibm.com>
> > Sent: Thursday, April 17, 2014 8:20:00 AM
> > Subject: Re: [lttng-dev] [PATCH liburcu] Fix pthread_atfork() behaviour
> >
> > ----- Original Message -----
> > > From: "Keir Fraser" <keir at cohodata.com>
> > > To: lttng-dev at lists.lttng.org
> > > Sent: Monday, April 14, 2014 9:31:57 AM
> > > Subject: [lttng-dev] [PATCH liburcu] Fix pthread_atfork() behaviour
> > >
> > > In the process of integrating liburcu into a multi-threaded codebase
> > > with fork()s I found a couple of problems with liburcu that I could not
> > > work around without fixing the library. Hence I present the two required
> > > fixes here (as attachments, sorry!) with some background info about them.
> >
> > This is interesting!
> >
> > An initial question: Which URCU flavor are you using ?
> >
> > >
> > > After fork() the child process has no pthreads but the one that called
> > > fork(). Unfortunately call_rcu_after_fork_child() does not update URCU's
> > > thread registry to reflect this -- if fork() is called with any threads
> > > registered with URCU then the child process will inherit a corrupted
> > > registry containing a linked list through per-thread TLS state which is
> > > no longer valid allocated memory.
> >
> > You appear to be using the mb/memb URCU flavor.
> >
> > If we look at URCU README file:
> >
> > Interaction with fork()
> >
> > Special care must be taken for applications performing fork()
> > without
> > any following exec(). This is caused by the fact that Linux only
> > clones
> > the thread calling fork(), and thus never replicates any of the
> > other
> > parent thread into the child process. Most liburcu implementations
> > require that all registrations (as reader, defer_rcu and call_rcu
> > threads) should be released before a fork() is performed, except
> > for
> > the
> > rather common scenario where fork() is immediately followed by
> > exec()
> > in
> > the child process. The only implementation not subject to that rule
> > is
> > liburcu-bp, which is designed to handle fork() by calling
> > rcu_bp_before_fork, rcu_bp_after_fork_parent and
> > rcu_bp_after_fork_child.
> >
> > Applications that use call_rcu() and that fork() without
> > doing an immediate exec() must take special action. The parent
> > must invoke call_rcu_before_fork() before the fork() and
> > call_rcu_after_fork_parent() after the fork(). The child
> > process must invoke call_rcu_after_fork_child().
> > Even though these three APIs are suitable for passing to
> > pthread_atfork(), use of pthread_atfork() is *STRONGLY
> > DISCOURAGED* for programs calling the glibc memory allocator
> > (malloc(), calloc(), free(), ...) within call_rcu callbacks.
> > This is due to limitations in the way glibc memory allocator
> > handles calls to the memory allocator from concurrent threads
> > while the pthread_atfork() handlers are executing.
> > Combining e.g.:
> > * call to free() from callbacks executed within call_rcu worker
> > threads,
> > * executing call_rcu atfork handlers within the glibc pthread
> > atfork mechanism,
> > will sometimes trigger interesting process hangs. This usually
> > hangs on a memory allocator lock within glibc.
> >
> >
> > > Crash or hang soon after is the
> > > result. Patch 1 therefore simply clears the registry list in the child
> > > process. Caveats here are that (a) the calling thread cannot be
> > > registered (it must unregister/re-register itself in the atfork
> > > handlers); and (b) some flavours of URCU may have more complex
> > > registries than a simple linked list and so this patch may not be
> > > sufficient for those. I only tested the memb/mb flavour myself.
> >
> > Firstly, your use-case seems to be only supported by the urcu-bp
> > flavor, and explicitly unsupported by other flavors. If you want
> > to do this with other urcu flavors, you need to explicitly
> > unregister all your other RCU threads before doing the fork().
> >
> > Also, if we want to eventually support this in the other flavors,
> > we would need to implement the flavor-specific teardown within each
> > flavor implementation rather than in the call_rcu implementation.
> >
> > Moreover, as stated in the README file, call_rcu() pre/post fork
> > handlers don't behave well with glibc's pthread_atfork(), due to
> > assumptions done within the glibc memory allocator.
> >
> > >
> > > A second problem is that although call_rcu threads are paused across
> > > fork(), the handshaking PAUSED flag is not cleared when their execution
> > > resumes. Hence a second fork() invocation in the original parent process
> > > will not spin-wait for call_rcu threads to quiesce (as the atfork
> > > handler will observe all PAUSED flags already set). Patch 2 fixes this
> > > with the appropriate clearing handshake on resume, post-fork.
> >
> > I'd be very interested to see a test-case for this problem against
> > the urcu-bp flavor. I think you have indeed caught a real bug here.
> >
> > >
> > > Please feel free to modify or rewrite these patches, or solve the
> > > described problems in a different way, as you see fit! Cc me on replies
> > > as I am not a subscriber to this list.
> >
> > Could you provide a small test-case that we could add to the call_rcu
> > regression (double fork) under tests/regression to make sure this behavior
> > is covered in the future ? A small refactoring of test_urcu_fork.c to
> > extend it should do the trick.
>
> I'm currently doing the test_urcu_fork.c refactoring, and testing your
> patch for call_rcu and double-fork. (letting you know so we don't
> duplicate the effort)
Your fix for the call_rcu handling of double-fork is merged into master,
stable-0.7, stable-0.8 branches.
I modified test_urcu_fork.c in master so it tests a sequence of 10 forks,
3 children deep (recursively). It seems to be solid with your patch. Without,
it hangs very quickly.
I opened this bug tracker entry for this bug:
http://bugs.lttng.org/issues/786
Thanks!
Mathieu
>
> Thanks,
>
> Mathieu
>
> >
> > Thanks!
> >
> > Mathieu
> >
> > >
> > > Regards,
> > > Keir Fraser
> > >
> > > _______________________________________________
> > > lttng-dev mailing list
> > > lttng-dev at lists.lttng.org
> > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> > >
> >
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com
> >
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> >
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
More information about the lttng-dev
mailing list