<html><body><div style="font-family: arial, helvetica, sans-serif; font-size: 12pt; color: #000000"><div><br></div><div><br></div><span id="zwchr" data-marker="__DIVIDER__">----- On Mar 27, 2018, at 5:55 AM, Denis Yevgenyevich <denis.yevgenyevich@gmail.com> wrote:<br></span><div data-marker="__QUOTED_TEXT__"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div><div><div>> The "accounting" option is meant to be enabled, otherwise you're using the hash table only with</div>> the bucket-length resize threshold, which has the imperfections you noticed<br></div>Well, given that my software has to maintain precise counts of items inserted into some of hashtables,<br>I'd like to be able to avoid any performance overhead due to accounting.<br></div><div><br><div><div>> only for small tables. Beyond a certain size, the numbers of elements in the hash table are used<br><div>> as a criterion for resize.<br></div><div>> Why is it unexpected that the table stops growing beyond 4096 after 900 insertions ?<br></div><div>Ah, thanks. Turns out that the table stopped growing because of the check<br> if (count >= (1UL << (COUNT_COMMIT_ORDER + split_count_order)))</div></div></div></div></div></blockquote><div><br></div><div>If both (CDS_LFHT_ACCOUNTING | CDS_LFHT_AUTO_RESIZE) flags are set, the table will<br data-mce-bogus="1"></div><div>still grow after it has gone beyond this condition.<br></div><div><br data-mce-bogus="1"></div><div>When both flags are set, check_resize() is only for the bucket-local chain length algorithm that deals</div><div>with very small tables. Then, when the table size goes above that threshold, split-counters are used to</div><div>adjust the required table size. 
We don't use the split-counters approximation of the number of elements</div><div>in the table for small tables because it is not accurate enough at those sizes. It's a tradeoff</div><div>between accuracy and accounting overhead here.<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Notice that ht_count_add() and ht_count_del() both invoke cds_lfht_resize_lazy_count(). It's that<br data-mce-bogus="1"></div><div>function which is responsible for the split-counter based resize algorithm.<br data-mce-bogus="1"></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div><div><div><div><br></div><div><br>I am a bit disappointed about that. Why can't we have tables that resize<br>1. automatically<br></div><div>2. from 10s to 100,000s of buckets<br></div><div>3. without becoming too large too soon?</div></div></div></div></div></blockquote><div>The only thing here is that the bucket-length algorithm used for small tables can grow the</div><div>table more quickly than you would expect if you have hash collisions. But it should not<br></div><div>matter that much because it only applies to small tables. 
Do you experience "too aggressive"</div><div>hash table growth even beyond the threshold selecting the split-counters resize algorithm ?<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>That threshold is calculated as:<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>(1UL << (COUNT_COMMIT_ORDER + split_count_order))<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Where COUNT_COMMIT_ORDER = 10<br data-mce-bogus="1"></div><div>and split_count_order = log2(nr_cpus rounded up to the next power of 2)<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Assuming a 64-core machine:</div><div><br data-mce-bogus="1"></div><div>log2(64) = 6<br data-mce-bogus="1"></div><div>2 ^ (10 + 6) = 65536<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Considering that you created your hash table with an initial size of 16 buckets, specified a minimum<br data-mce-bogus="1"></div><div>number of buckets of 16, and a maximum number of buckets of 1 << 16 (65536), you'll always be</div><div>using the "hash-chain length" algorithm for resize on a 64-core machine. 
The split-counter algorithm<br></div><div>would never kick in.<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>You should therefore try tweaking the @max_nr_buckets parameter to a larger value when</div><div>calling cds_lfht_new().<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>How many cores do you have ?<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Let me know how it goes,<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Thanks,<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><div>Mathieu<br data-mce-bogus="1"></div><div><br data-mce-bogus="1"></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><div dir="ltr"><div><div><div><div><br></div><br></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 26 March 2018 at 18:32, Mathieu Desnoyers <span dir="ltr"><<a href="mailto:mathieu.desnoyers@efficios.com" target="_blank">mathieu.desnoyers@efficios.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:arial,helvetica,sans-serif;font-size:12pt;color:#000000"><div><span id="m_-6782962235010934370zwchr">----- On Mar 26, 2018, at 11:01 AM, Denis Yevgenyevich <<a href="mailto:denis.yevgenyevich@gmail.com" target="_blank">denis.yevgenyevich@gmail.com</a>> wrote:<br></span></div><div><blockquote style="border-left:2px solid #1010ff;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div><div><div><div>Hello,<br><br>I think rculfhash autoresizing (as enabled by CDS_LFHT_AUTO_RESIZE) does not behave well. 
Each time _cds_lfht_add finds a hash bucket with more than 2 items, it basically tries to quadruple the hash table size. But even with a perfect hash function hash collisions are not that rare, given a large enough table. That is, every now and then the algorithm will come across a 3-long chain and will grow the table unnecessarily.<br><br>I created a table with cds_lfht_new(16, 16, 1<<16, CDS_LFHT_AUTO_RESIZE, NULL) and began adding an item each 10ms. Each item had a unique key and the hash function was very good (full 64-bit values of Siphash). Judging by messages I added to rculfhash.c, this was happening:<br></div>after about 100 insertions, the table was resized to 512<br>after about 500 insertions, the table was resized to 4096<br></div>after about 1300 insertions, the table was resized to 16384 and immediately to 32768<br><br></div><div>With CDS_LFHT_AUTO_RESIZE|CDS_LFHT_ACCOUNTING the behaviour was different but still unacceptable: the table got resized to 4096 after 900 insertions and then stopped growing!</div></div></div></blockquote><br><div>The "accounting" option is meant to be enabled, otherwise you're using the hash table only with</div><div>the bucket-length resize threshold, which has the imperfections you noticed.<br></div><br><div>With accounting enabled, the bucket length is used as an approximation for when resize is needed<br></div><div>only for small tables. Beyond a certain size, the numbers of elements in the hash table are used<br></div><div>as a criterion for resize.<br></div><br><div>Why is it unexpected that the table stops growing beyond 4096 after 900 insertions ? 
How many<br></div><div>more insertions did you do beyond 900 ?<br></div><br><div>Thanks,<br></div><br><div>Mathieu</div><blockquote style="border-left:2px solid #1010ff;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><div dir="ltr"><div><div><br><br></div>Could someone please implement saner resize conditions, or better yet, add a customisable hook for decisions as to when and how much should a table grow?<br><br></div>Cheers, Denis.<br></div><br>_______________________________________________<br>lttng-dev mailing list<br><a href="mailto:lttng-dev@lists.lttng.org" target="_blank">lttng-dev@lists.lttng.org</a><br><a href="https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev" target="_blank">https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev</a><span class="HOEnZb"><span data-mce-style="color: #888888;" style="color: #888888;" color="#888888"><br></span></span></blockquote></div><span class="HOEnZb"><span data-mce-style="color: #888888;" style="color: #888888;" color="#888888"><br><div>-- <br></div><div>Mathieu Desnoyers<br>EfficiOS Inc.<br><a href="http://www.efficios.com" target="_blank">http://www.efficios.com</a><br data-mce-bogus="1"></div></span></span></div></div></blockquote></div></div><br></blockquote></div><br><div data-marker="__SIG_POST__">-- <br></div><div>Mathieu Desnoyers<br>EfficiOS Inc.<br>http://www.efficios.com</div></div></body></html>