[lttng-dev] urcu/rculist.h clarifications - for implementing LRU

Ondřej Surý ondrej at sury.org
Sat Mar 11 01:04:26 EST 2023


Hey,

so, we are integrating userspace-rcu to BIND 9 (yay!) and as experiment,
I am rewriting the internal address database (keeps the infrastructure
information about names and addresses).

There's a hashtable to keep the entries and there's associated LRU list.

For the hashtable the cds_lfht seems to work well, but I am kind of struggling
with cds_list (the urcu/rculist.h variant).

The names and entries works in pretty much similar way, so I am going to
describe just one.

The workhorse is get_attached_and_locked_name() function (I am going to
skip the parts where we create keys, checks if LRU needs to be updated, etc.)

So this is part with the hashtable lookup which seems to work well:

        rcu_read_lock();

        struct cds_lfht_iter iter;
        struct cds_lfht_node *ht_node;

        cds_lfht_lookup(adb->names_ht, hashval, names_match, &key, &iter);

        ht_node = cds_lfht_iter_get_node(&iter);

        bool unlink = false;

        if (ht_node == NULL) {
                /* Allocate a new name and add it to the hash table. */
                adbname = new_adbname(adb, name, start_at_zone);

                ht_node = cds_lfht_add_unique(adb->names_ht, hashval,
                                              names_match, &key,
                                              &adbname->ht_node);
                if (ht_node != &adbname->ht_node) {
                        /* ISC_R_EXISTS */
                        destroy_adbname(adbname);
                        adbname = NULL;
                }
        }
        if (adbname == NULL) {
                INSIST(ht_node != NULL);
                adbname = caa_container_of(ht_node, dns_adbname_t, ht_node);
                unlink = true;
        }

        dns_adbname_ref(adbname);

        rcu_read_unlock();

and here's the part where LRU gets updated:

        LOCK(&adbname->lock); /* Must be unlocked by the caller */
        if (NAME_DEAD(adbname)) {
                UNLOCK(&adbname->lock);
                dns_adbname_detach(&adbname);
                goto again;
        }

        if (adbname->last_used + ADB_CACHE_MINIMUM <= last_update) {
                adbname->last_used = now;

                LOCK(&adb->names_lru_lock);
                if (unlink) {
                        cds_list_del_rcu(&adbname->list_node);
                }
                cds_list_add_tail_rcu(&adbname->list_node, &adb->names_lru);
                UNLOCK(&adb->names_lru_lock);
        }

The NAME_DEAD gets updated under the adbname->lock in expire_name():

        if (!NAME_DEAD(adbname)) {
                adbname->flags |= NAME_IS_DEAD;

                /* Remove the adbname from the hashtable... */
                (void)cds_lfht_del(adb->names_ht, &adbname->ht_node);

                /* ... and LRU list */
                LOCK(&adb->names_lru_lock);
                cds_list_del_rcu(&adbname->list_node);
                UNLOCK(&adb->names_lru_lock);
        }

So, now the problem is that sometimes I get a crash under load:

(gdb) bt
#0  0x00007fae87a34c96 in cds_list_del_rcu (elem=0x7fae37e78880) at /usr/include/x86_64-linux-gnu/urcu/rculist.h:71
#1  get_attached_and_locked_name (adb=adb at entry=0x7fae830142a0, name=name at entry=0x7fae804fc9b0, start_at_zone=true, now=<optimized out>) at adb.c:1446
#2  0x00007fae87a392bf in dns_adb_createfind (adb=0x7fae830142a0, loop=0x7fae842c3a20, cb=cb at entry=0x7fae87b28d9f <fctx_finddone>, cbarg=0x7fae7c679000, name=name at entry=0x7fae804fc9b0, qname=0x7fae7c679010, qtype=1, options=63, now=<optimized out>, target=0x0, port=53, depth=1, qc=0x7fae7c651060, findp=0x7fae804fc698) at adb.c:2149

(gdb) frame 0
#0  0x00007fae87a34c96 in cds_list_del_rcu (elem=0x7fae37e78880) at /usr/include/x86_64-linux-gnu/urcu/rculist.h:71
71		elem->next->prev = elem->prev;
(gdb) print elem->next
$1 = (struct cds_list_head *) 0x0
(gdb) print elem
$2 = (struct cds_list_head *) 0x7fae37e78880

So, I suspect, I am doing something wrong when updating the position of the the name in the LRU list.

There are couple of places where we iterate through the LRU list (overmem cleaning can kick-in, the user initiated cleaning can start, shutdown can be happening...)

Is there perhaps already some LRU implementation using Userspace-RCU that I can take look at?

Thank you!
Ondrej
--
Ondřej Surý (He/Him)
ondrej at sury.org



More information about the lttng-dev mailing list