[ltt-dev] [PATCH 06/10 round10] add LFHT_MEMORY_CHUNK low level memory management
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Wed Nov 23 04:21:05 EST 2011
* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> On 11/22/2011 06:20 PM, Mathieu Desnoyers wrote:
> > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> >> The old memory management is named LFHT_MEMORY_ORDER.
> >>
> >> Signed-off-by: Lai Jiangshan <laijs at cn.fujitsu.com>
> >> ---
> >> rculfhash.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >> 1 files changed, 111 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/rculfhash.c b/rculfhash.c
> >> index b7be893..ae1177f 100644
> >> --- a/rculfhash.c
> >> +++ b/rculfhash.c
> >> @@ -172,6 +172,10 @@
> >> #define dbg_printf(fmt, args...)
> >> #endif
> >>
> >> +#if !defined(LFHT_MEMORY_ORDER) && !defined(LFHT_MEMORY_CHUNK)
> >> +#define LFHT_MEMORY_ORDER
> >> +#endif
> >> +
> >> /*
> >> * Split-counters lazily update the global counter each 1024
> >> * addition/removal. It automatically keeps track of resize required.
> >> @@ -195,6 +199,8 @@
> >> #define MAX_TABLE_ORDER 64
> >> #endif
> >>
> >> +#define MAX_CHUNK_TABLE (1UL << 10)
> >> +
> >> /*
> >> * Minimum number of bucket nodes to touch per thread to parallelize grow/shrink.
> >> */
> >> @@ -247,6 +253,7 @@ struct rcu_table {
> >> unsigned long resize_target;
> >> int resize_initiated;
> >>
> >> +#ifdef LFHT_MEMORY_ORDER
> >> /*
> >> * Contains the per order-index-level bucket node table. The size
> >> * of each bucket node table is half the number of hashes contained
> >> @@ -255,6 +262,7 @@ struct rcu_table {
> >> * levels to improve cache locality for small index orders.
> >> */
> >> struct cds_lfht_node *tbl[MAX_TABLE_ORDER];
> >> +#endif
> >> };
> >>
> >> /*
> >> @@ -289,6 +297,16 @@ struct cds_lfht {
> >> pthread_attr_t *resize_attr; /* Resize threads attributes */
> >> long count; /* global approximate item count */
> >> struct ht_items_count *split_count; /* split item count */
> >> +
> >> +#ifdef LFHT_MEMORY_CHUNK
> >> + /*
> >> + * Contains the bucket node chunks. The size of each bucket node
> >> + * chunk is ->min_alloc_size (this avoids allocating chunks of
> >> + * different sizes). Chunks improve cache locality for small index
> >> + * orders and speed up lookup_bucket() on archs without hardware fls().
> >> + */
> >> + struct cds_lfht_node *tbl[0];
> >> +#endif
> >> };
> >>
> >> /*
> >> @@ -753,6 +771,93 @@ unsigned long _uatomic_xchg_monotonic_increase(unsigned long *ptr,
> >> return old2;
> >> }
> >>
> >> +#ifdef LFHT_MEMORY_CHUNK
> >> +static
> >> +struct cds_lfht *_cds_lfht_alloc(unsigned long min_alloc_size,
> >> + unsigned long max_size)
> >> +{
> >> + struct cds_lfht *ht;
> >> +
> >> + min_alloc_size = max(min_alloc_size, max_size / MAX_CHUNK_TABLE);
> >> + ht = calloc(1, sizeof(struct cds_lfht) +
> >> + sizeof(ht->tbl[0]) * (max_size / min_alloc_size));
> >
> > Hrm, this first level of indirection quickly becomes much larger than
> > the order-based table. Have you measured gains with this approach ?
> >
> > Can you detail the use-case for this approach ?
>
> I guess that sometimes max_bucket_size/normal_bucket_size is not very high
> (I also guess this is quite common in practice), so users can choose a
> bigger min_alloc_size (bigger chunk size) and get a smaller chunk pointer
> table.
>
> (see "History" of memory management)
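The sizing trick described above can be sketched roughly as follows. All names here are illustrative stand-ins, not the actual rculfhash API: the first-level chunk pointer table is capped at MAX_CHUNK_TABLE entries by raising the chunk size, and a bucket lookup is then plain division/modulo.

```c
#include <assert.h>

#define MAX_CHUNK_TABLE	(1UL << 10)

/* Illustrative stand-in for the chunk-based layout; names are made up. */
struct chunk_table {
	unsigned long chunk_size;	/* buckets per chunk (min_alloc_size) */
	long **tbl;			/* first-level chunk pointer table */
};

/* Cap the chunk pointer table at MAX_CHUNK_TABLE entries by raising
 * the chunk size, mirroring the patch's min_alloc_size adjustment. */
static unsigned long effective_chunk_size(unsigned long min_alloc_size,
					  unsigned long max_size)
{
	unsigned long floor = max_size / MAX_CHUNK_TABLE;

	return min_alloc_size > floor ? min_alloc_size : floor;
}

/* Chunk-based bucket lookup: plain division/modulo, no fls() needed. */
static long *bucket_at(struct chunk_table *t, unsigned long index)
{
	return &t->tbl[index / t->chunk_size][index % t->chunk_size];
}
```

With a 2^20-bucket maximum, for example, the chunk size is raised to at least 1024 so the pointer table never exceeds 1024 entries.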
>
> >
> > Have you measured the overhead of software fls() compared to the
> > overhead of increases cache footprint of the larger table for the
> > chunk-based approach ?
>
> Sorry, I was also guessing. That is why I only said this management scheme
> is simpler and better when there is no hardware fls(), not that it has
> "better performance"; let the users make that decision.
>
> It seems very hard to measure.
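For reference, the trade-off being discussed can be sketched like this: the order-based scheme's lookup depends on fls(), so on an arch without the instruction a software loop runs on every lookup, whereas the chunk scheme replaces it with a single division. The code below is only an illustration of that trade-off (simplified layout, made-up names), not the real rculfhash code.

```c
#include <assert.h>

/* Naive software fls(): position of the most significant set bit,
 * 1-based; 0 for x == 0. An arch without hardware fls() pays a loop
 * like this on every order-based bucket lookup. */
static int soft_fls(unsigned long x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Order-based lookup (simplified): tbl[order] holds 2^(order - 1)
 * buckets for order >= 1, and tbl[0] holds bucket 0. */
static long *order_bucket_at(long **tbl, unsigned long index)
{
	int order;

	if (!index)
		return &tbl[0][0];
	order = soft_fls(index);
	return &tbl[order][index & ((1UL << (order - 1)) - 1)];
}
```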
>
> "History" of memory management:
>
> I was thinking about what would be problematic if this rculfhash were
> merged into the Linux kernel. Larger higher-order memory allocations
> would be an obstacle: we cannot allocate big contiguous memory in the
> kernel, so we have to allocate small pieces (pages), and we have two
> choices:
> 1) (v)map the pages as contiguous memory.
> 2) use discontiguous pages as chunks.
>
> 2) ===========> this gave me LFHT_MEMORY_CHUNK.
>
> 1) ===========> since we need to do mapping anyway, why not map it all
>                 contiguously, and commit pages on demand? This gave me
>                 LFHT_MEMORY_RESERVED. (It introduces a new obstacle: we
>                 require a big address space, but 64-bit provides that
>                 easily.)
Ah, I see. So chunk on 32-bit (with page allocation), and reserved on
64-bit. Yep, this makes sense. So having these as plugins would be
really interesting.
Thanks,
Mathieu
>
> The scheme in the paper by Ori Shalev and Nir Shavit is actually a special case of LFHT_MEMORY_RESERVED.
>
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com