[lttng-dev] Race condition between lttng destroy and lttng load commands
douglas.graham at ericsson.com
Mon Jun 11 15:50:49 EDT 2018
it may not be obvious that the reason he is bringing that up is that we (Ericsson)
need that functionality to implement the clear log functionality that we have
mentioned to you before. To clear the logs, we currently do:
$ lttng save <save_file>
$ lttng stop
$ lttng destroy
$ lttng load <save_file>
The lttng destroy is supposed to shut down the lttng session and clear all the
existing logs. Then the lttng load restores the session as it was before the
destroy. This works most of the time but it seems that there is a race.
"lttng destroy" isn't guaranteed to have finished shutting down and destroying
the session by the time control returns to the shell. We occasionally see the
"lttng load" fail because it is trying to recreate a directory tree that is still
in the process of being removed by "lttng destroy".
The problem seems to be that the directory tree is removed with a call to
call_rcu(&regp->node.head, rcu_free_buffer_reg_uid). The function
rcu_free_buffer_reg_uid() is run asynchronously and is not guaranteed to have
finished by the time call_rcu() returns. I think this should probably be
considered a deficiency in the implementation of "lttng destroy". Is this
something that can be easily fixed or should we be looking for a workaround?
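
If a workaround does turn out to be needed, one option might be to retry the
load a few times instead of failing on the first attempt. The following is only
a rough sketch: retry() is a hypothetical helper, not part of lttng, and it
assumes that "lttng load" exits with a non-zero status when it hits the race and
that rerunning it after a short delay is harmless. It narrows the window but
does not remove the race itself.

```shell
#!/bin/sh
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of lttng): run COMMAND until it exits with
# status 0 or until MAX_TRIES attempts have been made.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    # Run the command in the current shell; loop while it keeps failing.
    while ! "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            return 1    # give up: every attempt failed
        fi
        try=$((try + 1))
        sleep "$delay"  # give the asynchronous cleanup time to finish
    done
    return 0
}
```

With that helper defined, the last step of the sequence would become something
like "retry 5 1 lttng load <save_file>"; the values 5 and 1 are arbitrary.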