[lttng-dev] High memory consumption issue on RCU side

Sat Sep 24 19:22:07 UTC 2016

On Sat, Sep 24, 2016 at 03:34:47PM +0000, Mathieu Desnoyers wrote:
> ----- On Sep 24, 2016, at 11:22 AM, Paul E. McKenney paulmck at linux.vnet.ibm.com wrote:
> 
> > On Sat, Sep 24, 2016 at 10:42:24AM +0300, Evgeniy Ivanov wrote:
> >> Hi Mathieu,
> >> 
> >> On Sat, Sep 24, 2016 at 12:59 AM, Mathieu Desnoyers
> >> <mathieu.desnoyers at efficios.com> wrote:
> >> > ----- On Sep 22, 2016, at 3:14 PM, Evgeniy Ivanov lolkaantimat at gmail.com wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> I'm investigating high memory usage of my program: RSS varies between
> >> >> executions in the range of 20-50 GB, though it should be deterministic.
> >> >> I've found that all the memory is allocated in this stack:
> >> >>
> >> >> Allocated 17673781248 bytes in 556 allocations
> >> >>        cds_lfht_alloc_bucket_table3     from liburcu-cds.so.2.0.0
> >> >>        _do_cds_lfht_resize      from liburcu-cds.so.2.0.0
> >> >>        do_resize_cb             from liburcu-cds.so.2.0.0
> >> >>        call_rcu_thread          from liburcu-qsbr.so.2.0.0
> >> >>        start_thread             from libpthread-2.12.so
> >> >>        clone                    from libc-2.12.so
> >> >>
> >> >> According to pstack, it should be in a quiescent state.  The call_rcu
> >> >> thread is waiting in a syscall:
> >> >> syscall
> >> >> call_rcu_thread
> >> >> start_thread
> >> >> clone
> >> >>
> >> >> We use urcu-0.8.7, only rculfhash (QSBR). Is this some kind of leak in
> >> >> RCU, or is there a chance I am misusing it? What would you recommend to
> >> >> troubleshoot the situation?
> >> >
> >> > urcu-qsbr is the fastest flavor of urcu, but it is rather tricky to use well.
> >> > Make sure that:
> >> >
> >> > - Each registered thread periodically reaches a quiescent state, by:
> >> >   - Invoking rcu_quiescent_state() periodically, and
> >> >   - Surrounding any blocking that lasts a relatively long time
> >> >     with rcu_thread_offline()/rcu_thread_online().
> >> >
> >> > In urcu-qsbr, the "default" state of threads is to be within an RCU
> >> > read-side critical section. Therefore, if you omit either of the two
> >> > pieces of advice above, you end up in a situation where grace periods
> >> > never complete, and therefore no call_rcu() callbacks can be processed.
> >> > This effectively acts like a big memory leak.
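
To make the failure mode above concrete, here is a minimal single-threaded
sketch of the bookkeeping idea. It is a toy model, not liburcu code: every
name in it (struct model, model_call_rcu(), and so on) is hypothetical, and
it only mimics the rule that a grace period cannot complete while an online
reader has not yet announced a quiescent state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_READERS 4

/* Hypothetical toy model of QSBR bookkeeping; this is NOT liburcu code. */
struct model_reader {
	bool online;           /* false while "offline", cf. rcu_thread_offline() */
	unsigned long seen_gp; /* last grace period this reader acknowledged */
};

struct model {
	struct model_reader readers[MODEL_MAX_READERS];
	size_t nr_readers;
	unsigned long gp;       /* current grace-period number */
	size_t queued_bytes;    /* deferred frees with no grace period started yet */
	size_t in_flight_bytes; /* deferred frees waiting on the current grace period */
};

/* Register a reader; like a QSBR thread, it starts out online. */
size_t model_register(struct model *m)
{
	assert(m->nr_readers < MODEL_MAX_READERS);
	size_t id = m->nr_readers++;
	m->readers[id].online = true;
	m->readers[id].seen_gp = m->gp;
	return id;
}

/* Stand-in for rcu_quiescent_state(): acknowledge the current grace period. */
void model_quiescent_state(struct model *m, size_t id)
{
	assert(id < m->nr_readers);
	m->readers[id].seen_gp = m->gp;
}

/* Stand-in for rcu_thread_offline(): the reader no longer blocks grace periods. */
void model_offline(struct model *m, size_t id)
{
	assert(id < m->nr_readers);
	m->readers[id].online = false;
}

/* Stand-in for rcu_thread_online(): the reader is tracked again from now on. */
void model_online(struct model *m, size_t id)
{
	assert(id < m->nr_readers);
	m->readers[id].online = true;
	m->readers[id].seen_gp = m->gp;
}

/* Stand-in for call_rcu(): defer freeing `bytes` until a grace period ends. */
void model_call_rcu(struct model *m, size_t bytes)
{
	m->queued_bytes += bytes;
}

/* Start a new grace period covering everything queued so far. */
void model_start_gp(struct model *m)
{
	m->gp++;
	m->in_flight_bytes += m->queued_bytes;
	m->queued_bytes = 0;
}

/*
 * The grace period completes only if every online reader has acknowledged
 * it. On success the deferred memory is released; otherwise it stays pinned.
 */
bool model_try_complete_gp(struct model *m)
{
	for (size_t i = 0; i < m->nr_readers; i++) {
		const struct model_reader *r = &m->readers[i];
		if (r->online && r->seen_gp != m->gp)
			return false;
	}
	m->in_flight_bytes = 0;
	return true;
}
```

In this model, a reader that stays online and never calls
model_quiescent_state() keeps model_try_complete_gp() returning false, so
in_flight_bytes only grows as more work is queued. Taking that reader
offline, or having it announce a quiescent state, lets the pending memory
be released.
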
> >> 
> >> That was my original assumption, but in the memory stacks I don't see such
> >> allocations for my data. Instead, huge allocations happen right in
> >> call_rcu_thread. The memory footprint of my app is about 20 GB and erasing
> >> RCU data is a rare operation, so almost 20 GB in the RCU thread looks
> >> suspicious. I'll try not to erase any RCU-protected data and reproduce
> >> the issue (the complication is that it happens less often under the
> >> memory tracer).
> > 
> > Interesting.  Trying to figure out why your call_rcu_thread() would
> > ever allocate memory.
> > 
> > Ah!  Do your RCU callbacks allocate memory?
> 
> In this case yes: urculfhash allocates memory within a call_rcu worker
> thread when a hash table resize is performed.

Is this then expected behavior?

Though I must admit that 20GB sounds like some serious resizing...

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> >							Thanx, Paul
> > 
> >> > Hoping this helps,
> >> >
> >> > Thanks,
> >> >
> >> > Mathieu
> >> >
> >> >
> >> > --
> >> > Mathieu Desnoyers
> >> > EfficiOS Inc.
> >> > http://www.efficios.com
> >> 
> >> 
> >> 
> >> --
> >> Cheers,
> >> Evgeniy