Ticket #717 (closed defect: fixed)

Opened 4 years ago

Last modified 22 months ago

gethash causes infinite rehash loop

Reported by: wws Owned by: gz
Priority: normal Milestone:
Component: Runtime (threads, GC) Version: unspecific
Keywords: Cc: svspire@…

Description (last modified by rme) (diff)

Our busy server (6 processes pulling and parsing XML feeds plus Hunchentoot serving web pages), got an infinite loop while interning a string.

We've seen this only once after a week of running this code 24/7, so it's likely hard to reproduce.

(lisp-implementation-version)
"Version 1.5-r13695M  (LinuxX8664)"

Relevant code:

(defun make-weak-hash-table (&rest rest)
  (declare (dynamic-extent rest))
  #-ccl                                                                         
  (apply #'make-hash-table rest)
  #+ccl
  (apply #'make-hash-table :weak :value rest))

(defvar *equal-object-hash*
  (make-weak-hash-table :test 'equal))

(defun equal-object-p (object)
  (or (stringp object)
      (and (numberp object) (not (fixnump object)))))

(defun intern-equal-object (object)
  (cond ((equal-object-p object)
         (or (gethash object *equal-object-hash*)
             (setf (gethash object *equal-object-hash*) object)))
        (t object)))

Error: Stack overflow: "CCL has entered the kernel debugger, due to an exception 11"

Backtrace:

(#x00007F7502647CA0) #x00003000000B2294 : #<Function LOCK-FREE-COUNT-ENTRIES #x00003000000B220F> + 133
(#x00007F7502647CE8) #x00003000000B2F9C : #<Function %LOCK-FREE-REHASH #x00003000000B2F0F> + 141
(#x00007F7502647D40) #x00003000000B29F4 : #<Function LOCK-FREE-REHASH #x00003000000B291F> + 213
(#x00007F7502647D80) #x00003000000B2294 : #<Function LOCK-FREE-COUNT-ENTRIES #x00003000000B220F> + 133
(#x00007F7502647DC8) #x00003000000B2F9C : #<Function %LOCK-FREE-REHASH #x00003000000B2F0F> + 141
(#x00007F7502647E20) #x00003000000B29F4 : #<Function LOCK-FREE-REHASH #x00003000000B291F> + 213
(#x00007F7502647E60) #x00003000000B2294 : #<Function LOCK-FREE-COUNT-ENTRIES #x00003000000B220F> + 133
(#x00007F7502647EA8) #x00003000000B2F9C : #<Function %LOCK-FREE-REHASH #x00003000000B2F0F> + 141
(#x00007F7502647F00) #x00003000000B29F4 : #<Function LOCK-FREE-REHASH #x00003000000B291F> + 213
(#x00007F7502647F40) #x00003000000B2294 : #<Function LOCK-FREE-COUNT-ENTRIES #x00003000000B220F> + 133
[...]
(#x00007F7502655680) #x00003000000960B4 : #<Function GETHASH #x0000300000095E4F> + 613
(#x00007F75026556D0) #x000030200297F85C : #<Function INTERN-EQUAL-OBJECT #x000030200297F7CF> + 141
(#x00007F75026556E8) #x0000302002B0659C : #<Function (:INTERNAL BUILD-FN12015 BWIN-XML-PARSER) #x0000302002B0654F> + 77
(#x00007F7502655748) #x0000302002A09714 : #<Function MAYBE-BUILD-GAME-INTERNAL #x0000302002A096AF> + 101
(#x00007F7502655780) #x0000302002B07974 : #<Function (:INTERNAL ***-XML-PARSER) #x0000302002B0750F> + 1125
(#x00007F7502655828) #x0000302002A8D9FC : #<Function DOM-TRAVERSAL-HELPER #x0000302002A8D80F> + 493
(#x00007F7502655888) #x0000302002A8E1AC : #<Function DOM-TRAVERSAL-HELPER #x0000302002A8D80F> + 2461
(#x00007F7502655948) #x0000302002A8E1AC : #<Function DOM-TRAVERSAL-HELPER #x0000302002A8D80F> + 2461
(#x00007F7502655A08) #x0000302002A8E1AC : #<Function DOM-TRAVERSAL-HELPER #x0000302002A8D80F> + 2461
(#x00007F7502655AC8) #x0000302002A8C9FC : #<Function DOM-TRAVERSAL #x0000302002A8C82F> + 461
(#x00007F7502655B60) #x0000302002A04A04 : #<Function PROCESS-FEED-INTERNAL #x0000302002A043BF> + 1605
(#x00007F7502655C30) #x0000302002A05D04 : #<Function PROCESS-FEED #x0000302002A05A6F> + 661
(#x00007F7502655CB0) #x00003020029FC43C : #<Function (:INTERNAL PROCESS-NEXT-FEED) #x00003020029FB7AF> + 3213
(#x00007F7502655D50) #x0000302002414C24 : #<Function (:INTERNAL MAKE-TASK) #x000030200241471F> + 1285
(#x00007F7502655DE8) #x000030200240BE3C : #<Function (:INTERNAL NEW-WORKER-THREAD) #x000030200240BADF> + 861
(#x00007F7502655E68) #x0000302000F1EFDC : #<Function (:INTERNAL BINDING-DEFAULT-SPECIALS) #x0000302000F1ECBF> + 797
(#x00007F7502655EB0) #x00003000004718E4 : #<Function RUN-PROCESS-INITIAL-FORM #x000030000047160F> + 725
(#x00007F7502655F48) #x0000300000472204 : #<Function (:INTERNAL (%PROCESS-PRESET-INTERNAL (PROCESS))) #x000030000047207F> + 389
(#x00007F7502655F98) #x000030000044F66C : #<Function (:INTERNAL THREAD-MAKE-STARTUP-FUNCTION) #x000030000044F53F> + 301

Change History

comment:1 Changed 4 years ago by wws

Here's the backtrace with line breaks:

(#x00007F7502647CA0) #x00003000000B2294 : #<Function LOCK-FREE-COUNT-ENTRIES #x00003000000B220F> + 133
(#x00007F7502647CE8) #x00003000000B2F9C : #<Function %LOCK-FREE-REHASH #x00003000000B2F0F> + 141
(#x00007F7502647D40) #x00003000000B29F4 : #<Function LOCK-FREE-REHASH #x00003000000B291F> + 213
(#x00007F7502647D80) #x00003000000B2294 : #<Function LOCK-FREE-COUNT-ENTRIES #x00003000000B220F> + 133
(#x00007F7502647DC8) #x00003000000B2F9C : #<Function %LOCK-FREE-REHASH #x00003000000B2F0F> + 141
(#x00007F7502647E20) #x00003000000B29F4 : #<Function LOCK-FREE-REHASH #x00003000000B291F> + 213
(#x00007F7502647E60) #x00003000000B2294 : #<Function LOCK-FREE-COUNT-ENTRIES #x00003000000B220F> + 133
(#x00007F7502647EA8) #x00003000000B2F9C : #<Function %LOCK-FREE-REHASH #x00003000000B2F0F> + 141
(#x00007F7502647F00) #x00003000000B29F4 : #<Function LOCK-FREE-REHASH #x00003000000B291F> + 213
(#x00007F7502647F40) #x00003000000B2294 : #<Function LOCK-FREE-COUNT-ENTRIES #x00003000000B220F> + 133
[...]
(#x00007F7502655680) #x00003000000960B4 : #<Function GETHASH #x0000300000095E4F> + 613
(#x00007F75026556D0) #x000030200297F85C : #<Function INTERN-EQUAL-OBJECT #x000030200297F7CF> + 141
(#x00007F75026556E8) #x0000302002B0659C : #<Function (:INTERNAL BUILD-FN12015 BWIN-XML-PARSER) #x0000302002B0654F> + 77
(#x00007F7502655748) #x0000302002A09714 : #<Function MAYBE-BUILD-GAME-INTERNAL #x0000302002A096AF> + 101
(#x00007F7502655780) #x0000302002B07974 : #<Function (:INTERNAL ***-XML-PARSER) #x0000302002B0750F> + 1125
(#x00007F7502655828) #x0000302002A8D9FC : #<Function DOM-TRAVERSAL-HELPER #x0000302002A8D80F> + 493
(#x00007F7502655888) #x0000302002A8E1AC : #<Function DOM-TRAVERSAL-HELPER #x0000302002A8D80F> + 2461
(#x00007F7502655948) #x0000302002A8E1AC : #<Function DOM-TRAVERSAL-HELPER #x0000302002A8D80F> + 2461
(#x00007F7502655A08) #x0000302002A8E1AC : #<Function DOM-TRAVERSAL-HELPER #x0000302002A8D80F> + 2461
(#x00007F7502655AC8) #x0000302002A8C9FC : #<Function DOM-TRAVERSAL #x0000302002A8C82F> + 461
(#x00007F7502655B60) #x0000302002A04A04 : #<Function PROCESS-FEED-INTERNAL #x0000302002A043BF> + 1605
(#x00007F7502655C30) #x0000302002A05D04 : #<Function PROCESS-FEED #x0000302002A05A6F> + 661
(#x00007F7502655CB0) #x00003020029FC43C : #<Function (:INTERNAL PROCESS-NEXT-FEED) #x00003020029FB7AF> + 3213
(#x00007F7502655D50) #x0000302002414C24 : #<Function (:INTERNAL MAKE-TASK) #x000030200241471F> + 1285
(#x00007F7502655DE8) #x000030200240BE3C : #<Function (:INTERNAL NEW-WORKER-THREAD) #x000030200240BADF> + 861
(#x00007F7502655E68) #x0000302000F1EFDC : #<Function (:INTERNAL BINDING-DEFAULT-SPECIALS) #x0000302000F1ECBF> + 797
(#x00007F7502655EB0) #x00003000004718E4 : #<Function RUN-PROCESS-INITIAL-FORM #x000030000047160F> + 725
(#x00007F7502655F48) #x0000300000472204 : #<Function (:INTERNAL (%PROCESS-PRESET-INTERNAL (PROCESS))) #x000030000047207F> + 389
(#x00007F7502655F98) #x000030000044F66C : #<Function (:INTERNAL THREAD-MAKE-STARTUP-FUNCTION) #x000030000044F53F> + 301

comment:2 Changed 4 years ago by wws

  • Owner set to gz

l0-hash.lisp hasn't changed since @13279, so the fact that I'm a little behind on my updates shouldn't matter. It hasn't even changed on trunk, except for a one-line change in @14119 adding arm-target to a read-time conditional.

Gail will probably grok this more quickly than I, since she wrote the non-locking hash table code, so I'm assigning it to her.

comment:3 Changed 4 years ago by gz

Do you happen to still have the full backtrace? I'm interested in just how the GETHASH transitions into the infinite loop. It's very curious that GETHASH is even on the stack at all, as the first thing it does is tail-call LOCK-FREE-GETHASH...

comment:4 Changed 4 years ago by gz

FWIW, GETHASH is on the stack because the call to LOCK-FREE-GETHASH is inlined.

comment:5 Changed 3 years ago by gz

  • Component changed from IDE to Runtime (threads, GC)

From Gary, the root cause of above:

? (setq ccl::*gc-event-status-bits* 4)
4
? (let* ((ht (make-hash-table :weak :value)))
   (dotimes (i 8)
    (process-run-function (format nil "test~d" i)
                          (lambda ()
                            (dotimes (i 10000000) (setf (gethash i ht) (cons nil nil)))
                            (format t "~&~s" *current-process*)))))
                            
NIL
;;; After about 1 minute ...
? unmarked object being forwarded!

? for help
[32590] Clozure CL kernel debugger: 

comment:6 Changed 3 years ago by svspire

  • Cc svspire@… added

comment:7 Changed 3 years ago by rme

  • Description modified (diff)

comment:8 Changed 3 years ago by gb

  • Status changed from new to closed
  • Resolution set to fixed

r14723 tries to fix the symptom (by not processing weak-on-value hash vectors in the EGC.) In the commit message, I erroneously referred to ticket 817 instead of this ticket.

With that change in effect, the test case above runs to completion.

I'll close this since it seems to fix the symptom; I may open another ticket to address the underlying problem.

comment:9 Changed 3 years ago by gb

See ticket:852.

comment:10 Changed 22 months ago by gz

See ticket:993, which has a test case that shows the same symptoms with (non-weak) lock-free hash tables.

Note: See TracTickets for help on using tickets.