Opened 9 years ago

Closed 9 years ago

#809 closed enhancement (fixed)

non-lock-free hash table performance

Reported by: rme Owned by:
Priority: normal Milestone:
Component: Performance Version: trunk
Keywords: Cc:

Description (last modified by rme)

[reported on #ccl]

Before actually growing the hash table, %grow-hash-table checks to see whether there are any deleted elements in the table -- any at all, i.e. deleted-count > 0. If so, it does an in-place rehash instead, without resizing. This is presumably about linear time. However, it only frees up deleted-count slots in the table.

So suppose we have a table whose capacity is 100. If we insert 90 elements in the table, then alternate adding and removing new (distinct) elements, then we do a rehash every 10 or so of these adds. If we insert 99 elements and then begin the alternation, we're doing a rehash every time. Here are some benchmarks:

(defun run-with-percent-full (n)
  (let ((tab (make-hash-table :test #'eq :size 100 :lock-free nil)))
    (loop for i from 1 to n do
          (setf (gethash (cons i i) tab) i))
    (time (loop for i from 1 to 100000 do
                (let ((cons (cons i i)))
                  (setf (gethash cons tab) i)
                  (remhash cons tab))))))

(run-with-percent-full 0)   ;; 0.121 due to remhash slowness
(run-with-percent-full 1)   ;; 0.063
(run-with-percent-full 10)  ;; 0.058
(run-with-percent-full 20)  ;; 0.062
(run-with-percent-full 30)  ;; 0.059
(run-with-percent-full 40)  ;; 0.066
(run-with-percent-full 50)  ;; 0.068
(run-with-percent-full 60)  ;; 0.069
(run-with-percent-full 70)  ;; 0.070
(run-with-percent-full 80)  ;; 0.085
(run-with-percent-full 85)  ;; 0.095
(run-with-percent-full 90)  ;; 0.114
(run-with-percent-full 95)  ;; 0.171
(run-with-percent-full 96)  ;; 0.208
(run-with-percent-full 97)  ;; 0.262
(run-with-percent-full 98)  ;; 0.384
(run-with-percent-full 99)  ;; 0.728 !!
(run-with-percent-full 100) ;; 0.069 (resized at first alternation, presumably.)

It seems with bigger tables the slowdown is about the same at a given percentage, but it can get much worse with, say, all but 1 slot filled.

It probably would make more sense to grow the table somewhat even if there are a few deleted elements, unless there are so many that it doesn't make sense to grow it. Probably what makes sense is to grow the table to (# of non-deleted elements * rehash-size) unless that's less than or equal(ish) to the current size. Or maybe there's some threshold at which the target size is so close to the current size that it's not worth growing instead of rehashing.

Anyway, probably for now we should have the fal-ht lock-free, since it doesn't seem to have these problems. Actually, it seems like lock-free tables go even further than my suggestion: I think their grow-hash-table will actually shrink a table if it has enough deleted elements (haven't looked at the code, though.)

Change History (3)

comment:2 Changed 9 years ago by rme

  • Description modified (diff)

comment:3 Changed 9 years ago by gb

  • Resolution set to fixed
  • Status changed from new to closed

(In [14625]) REMHASH: it's not our job to effectively rehash the table when the last key's removed. (That should happen in grow/rehash code, anyway.)

%GROW-HASH-TABLE now calls new function %GROW-HASH-TABLE-IN-PLACE-P to decide whether or not to grow the table or do an in-place rehash. We might want to make this customizable; the default implementation suggests an in-place rehash if there are at least half as many deleted entries, and that seems to give good results for the test in ticket:809.

Fixes ticket:809, AFAICT.

Note: See TracTickets for help on using tickets.