Changes between Version 1 and Version 2 of Internals/LockFreeHashTables


Ignore:
Timestamp:
Sep 21, 2008, 1:31:01 PM (11 years ago)
Author:
gz
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Internals/LockFreeHashTables

    v1 v2  
    1919
    2020 ||State      ||Key               ||Value
    21  ||DELETED1   ||object            ||free-marker ||
    22  ||DELETED2   ||deleted-marker    ||free-marker ||
     21 ||DELETED1   ||object            ||`free-marker` ||
     22 ||DELETED2   ||`deleted-marker`  ||`free-marker` ||
    2323 ||IN-USE     ||object            ||object ||
    24  ||FREE       ||free-marker       ||free-marker ||
    25  ||REHASHING  ||object            ||rehashing-marker ||
    26  ||REHASHING  ||free-marker       ||rehashing-marker ||
    27  ||REHASHING  ||deleted-marker    ||rehashing-marker ||
     24 ||FREE       ||`free-marker`     ||`free-marker` ||
     25 ||REHASHING  ||object            ||`rehashing-marker` ||
     26 ||REHASHING  ||`free-marker`     ||`rehashing-marker` ||
     27 ||REHASHING  ||`deleted-marker`  ||`rehashing-marker` ||
    2828
    2929
    3030No other states are allowed - at no point in time can a hash table entry be in any
    3131other state.   In addition, the only transitions allowed on the key slot are
    32 free-marker -> object/deleted-marker -> deleted-marker.  Once a key slot
    33 is claimed, it must never change to free or another key value (even after the hash
     32`free-marker` -> object/`deleted-marker` -> `deleted-marker`.  Once a key slot
     33is claimed, it must never change to free or another key value, and once it's marked
     34deleted, it cannot be reused.  All this is true even after the hash
    3435vector has been discarded after rehashing -- because some process might still
    35 be looking at it).[[br]]
    36 In particular, rehashing in place is not an option.  All rehashing creates a new
    37 vector and copies into it.  This means it's kinda risky to use lock-free hash
    38 tables with address-based keys, because they will thrash in low-memory situations,
    39 but we don't disallow it because a particular use might not have this problem.
     36be looking at it.  In particular this means rehashing in place is not an option.
     37All rehashing creates a new vector and copies into it.  This means it's kinda risky to use lock-free hash tables with address-based keys, because they will thrash in low-memory situations, but we don't disallow it because a particular use might not have this problem.
     38
     39In addition to the state, a hash table maintains a count of available entries, i.e. number of new entries that can be added to the table.  This is initialized each time the table is resized, based on the size of the table and the the rehash threshold specified by the user, and then decremented as entries are added.  The count maintained is always less than or equal to the actual number of available entries - it doesn't account for deletions, and it might be decremented for an addition that doesn't actually happen.  This means the table might be rehashed before strictly necessary, but it will never overflow.
    4040
    4141The following operations may take place:
    4242
    43  * gethash: find matching key - if no match, return not found.  Else fetch value, if value is rehashing-marker then maybe-rehash and try again; if value is free-marker, return not found, else return found value.
     43 * gethash: find matching key - if no match, return not found.  Else fetch value, if value is `rehashing-marker` then maybe-rehash and try again; if value is `free-marker`, return not found, else return found value.
    4444
    4545 * puthash: find matching key or FREE slot.
    46   * If found key, fetch value. if value is rehashing-marker then maybe-rehash and try again; else store-conditional the value -> new value, if fails try again.
    47   * Else have FREE slot, store-key-conditional free-marker -> key, and if that succeeds, store-conditional free-marker -> new value, if either fails, maybe-rehash and try again.
     46  * If found key, fetch value. if value is `rehashing-marker` then maybe-rehash and try again; else store-conditional the value -> new value, if fails try again.
     47  * Else have FREE slot, atomically decrement available count, if negative then maybe-rehash, else store-key-conditional `free-marker` -> key, and if that succeeds, store-conditional `free-marker` -> new value, if either fails, maybe-rehash and try again.
    4848
    49  * remhash: find matching key - if no match, done.  Else fetch value, if value is rehashing-marker then maybe-rehash and try again; else store-conditional the value -> free-marker, if fails try again.
     49 * remhash: find matching key - if no match, done.  Else fetch value, if value is `rehashing-marker` then maybe-rehash and try again; else store-conditional the value -> `free-marker`, if fails try again.
    5050
    51  * rehash: grab a lock, estimate number of entries, make a new vector.  loop over old vector, at each entry fetch the old value with atomic swap of rehashing-marker.  This blocks any further state changes involving the value.  It doesn't block state changes involving the key, but the only ones that can happen is FREE -> DELETED, and DELETED1 <-> DELETED2, all of which are equivalent from the point of view of rehashing.  Anyway, if the old value was rehashing-marker then bug (because we have a lock).  If the old value is free-marker then do nothing, else get the entry key and rehash into the new vector -- if no more room, start over.  When done, store the new vector in the hash table and release lock.
     51 * maybe-rehash: grab a lock (which will wait for any concurrent rehashing to complete), check whether the table still needs rehashing (either the available count is less than 0 or some address-based keys may have moved), if not, release the lock and return.  Else rehash: estimate number of entries, make a new vector.  Loop over the old vector, at each entry fetch the old value with atomic swap of `rehashing-marker`.  This blocks any further state changes involving the value.  It doesn't block state changes involving the key, but the only ones that can happen is FREE -> DELETED, and DELETED1 <-> DELETED2, all of which are equivalent from the point of view of rehashing.  Anyway, if the old value was `rehashing-marker` then bug (because we have a lock).  If the old value is `free-marker` then do nothing, else get the entry key and rehash into the new vector -- if no more room, start over.  When done, store the new available count and then store the new vector in the hash table and release lock.
    5252
    53  * gc: for weak tables, gc may convert IN-USE states to DELETED2 states.  For all lock-free hash tables (weak or not) gc converts DELETED1 states to DELETED2 - gc can do that because the entire gc takes place atomically wrt the lisp runtime.  REMHASH cannot do that without violating the invariants.
     53 * gc: for weak tables, gc may convert IN-USE states to DELETED2 states.  For all lock-free hash tables (weak or not) gc converts DELETED1 states to DELETED2 - gc can do that because the entire gc takes place atomically wrt the lisp runtime.  REMHASH cannot do it without violating the invariants.
    5454
     55It's left as an exercise to the reader to prove that this results in GETHASH always getting the correct value, no matter what other threads are doing.
    5556