Opened 9 years ago

Closed 9 years ago

#696 closed defect (fixed)

ccl and slime don't play nicely together (windows 64bit)

Reported by: laynor
Owned by: gb
Priority: normal
Milestone:
Component: Runtime (threads, GC)
Version: trunk
Keywords:
Cc:

Description

ccl often hangs when I TAB-complete something in the slime repl. I start the swank server like this:

(push #p"path/to/slime/" asdf:*central-registry*) ;in my init file

(require :swank) ; these two from the repl in the cmd window
(swank:create-server)

In the slime repl, I can reliably reproduce the bug by typing "swank" and then hitting Tab and Backspace at a fast pace. The same happens when I run ccl as an inferior lisp in emacs.

In both cases, I run ccl with -K utf-8.

Note that it's not slime that hangs, but ccl. When this bug occurs, I can't type on the repl in the "terminal" anymore. I have to close the terminal or hit Ctrl+Break loads of times to kill it.

For what it's worth, my slime setup follows:

;;;;;;;;;;;;;;;;;;;
;;; Slime stuff ;;;
;;;;;;;;;;;;;;;;;;;


;; load-path for slime
(add-to-list 'load-path (concat *my-site-lisp-directory* "slime/"))
(add-to-list 'load-path (concat *my-site-lisp-directory* "slime/contrib/"))
(require 'slime)
(setq slime-multiprocessing t)

(setq slime-autodoc-use-multiline-p t)
(setq slime-truncate-lines nil)
(set-language-environment "UTF-8")
(setq slime-net-coding-system 'utf-8-unix)  
(slime-setup '(slime-fancy slime-asdf))  

(setq slime-lisp-implementations
      '((clisp ("c:/Program Files (x86)/clisp-2.48/clisp.exe" "-K" "full"))
        (ccl64 ("C:/Users/ale/MySw/ccl/wx86cl64.exe" "-K" "utf-8"))
        (ccl ("C:/Users/ale/MySw/ccl/wx86cl.exe" "-K" "utf-8"))
        (sbcl ("sbcl"))))

Change History (5)

comment:1 Changed 9 years ago by gb

  • Component changed from Slime to Runtime (threads, GC)
  • Owner set to gb
  • Status changed from new to assigned

This almost certainly has nothing to do with SLIME per se; it's much more likely that it's another manifestation of whatever causes the deadlock described in ticket:677. When the GC is entered on some thread, it needs to suspend the execution of all other lisp threads and determine whether each suspended thread was executing lisp code (in which case the thread's stack and registers reference lisp objects that the GC must traverse) or foreign code (in which case the thread's stack and registers can't reference GCable lisp objects.) On Windows, a thread can be suspended while transitioning between these states (when entering or exiting from an exception handler); if the suspending thread detects that the suspended thread is in the process of entering an exception handler, it sets a bit in the target thread's private data, resumes the suspended thread, and waits on a semaphore; when the target thread enters the exception handler, it checks the bit, raises the semaphore, and waits to be told to resume.

For reasons unknown, it seems that it's possible for a thread to appear to be suspended while about to enter an exception handler but to actually be in some other (unknown) state. The suspending (GC) thread sets the bit and waits for the semaphore to be raised, but that never happens and everything deadlocks.
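The handshake described above can be modelled at the lisp level. This is only an illustrative sketch: the real logic lives in the C runtime and is not shown in this ticket, and every name here (TARGET, ENTER-EXCEPTION-HANDLER, SUSPEND-FOR-GC) is hypothetical. It uses CCL's semaphore functions, and it deliberately uses a timed wait so that the failure case returns instead of hanging.

```lisp
;;; Hypothetical model of the suspend handshake; not the runtime's code.

(defstruct target
  (pending-suspend nil)             ; the "bit" in the thread's private data
  (ack (ccl:make-semaphore))        ; raised by the target on handler entry
  (resume (ccl:make-semaphore)))    ; raised by the GC thread when it is done

(defun enter-exception-handler (target)
  "Target side: on handler entry, check the bit, acknowledge, then wait."
  (when (target-pending-suspend target)
    (ccl:signal-semaphore (target-ack target))
    (ccl:wait-on-semaphore (target-resume target))
    (setf (target-pending-suspend target) nil)))

(defun suspend-for-gc (target &key (timeout 2))
  "GC side: set the bit, then wait up to TIMEOUT seconds for the ack.
Returns :SUSPENDED, or :NO-ACK if the target never reached the handler."
  (setf (target-pending-suspend target) t)
  (if (ccl:timed-wait-on-semaphore (target-ack target) timeout)
      :suspended
      :no-ack))
```

In this model, a target that does reach ENTER-EXCEPTION-HANDLER after the bit is set makes SUSPEND-FOR-GC return :SUSPENDED, after which the caller signals the RESUME semaphore. A target that never gets there yields :NO-ACK; with an untimed wait in place of the timed one, that same case would block forever, which is the deadlock described above.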

Using TAB completion in SLIME could well be a good way to trigger this: that code conses substantially (causing the GC to be invoked frequently) and SLIME/SWANK use a few long-lived (and occasional short-lived) threads as part of their implementation, and that combination of factors increases the likelihood that a thread will be suspended while in this unknown state and that that'll result in deadlock.

Unfortunately, I haven't been able to use SLIME to reproduce the bug; I don't type very fast, but what I'd really need to do is just cause things to happen at the right (er, "wrong") time. The symptom's been reported often enough (in ticket:677 and elsewhere) that it's clear that there's a bug there and that the bug likely involves the scenario described above, but I'm not sure how to fix it unless/until I can reproduce it reliably and see what's going on in a debugger. So far, I haven't been able to use the tab/backspace technique to trigger the bug.

A likely workaround - which doesn't really fix anything but decreases the chance of being affected by the bug - is to turn off the ephemeral GC via:

? (ccl:egc nil)
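To apply the workaround to every session rather than typing it at the REPL, the same form could go in the CCL init file. A minimal sketch, assuming the init file is ccl-init.lisp in the home directory; CCL:EGC-ENABLED-P is used only to confirm the setting:

```lisp
;; In ccl-init.lisp (assumed location: the user's home directory).
(ccl:egc nil)           ; turn the ephemeral GC off for this session

;; Returns true while the EGC is enabled, so it should be false here.
(ccl:egc-enabled-p)
```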

comment:2 Changed 9 years ago by gb

Someone on the #ccl IRC channel ('maetbag', I don't know their real name) may have found the cause of this a little while ago; I checked in the changes that they suggested to the trunk.

If you can, could you please do an svn update and see if the problem persists?

comment:3 Changed 9 years ago by laynor

Hi, it seems solved :-) Or at least, I wasn't able to trigger it again like I did before. I'll let you know if I encounter the problem again. Thanks for the fast fix!

comment:4 Changed 9 years ago by millejoh

And as a second (belated) data point, I ran some code that cons'ed fairly vigorously and would reliably trigger the deadlock (even without slime loaded) within a minute or so running on my puny Dell laptop and ... five minutes or so later and I still have yet to see a deadlock.

In case that last run-on is a little difficult to wade through, the executive summary: By golly, it seems to have worked!
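The test code mentioned above was not posted. A stress test in the same spirit might look like the following sketch, where CONS-HARD and STRESS-GC are hypothetical names and the thread count and duration are arbitrary. Several threads allocate short-lived lists so that the GC runs often while other threads are active.

```lisp
(defun cons-hard (seconds)
  "Allocate short-lived lists for SECONDS seconds.
Returns the total number of cons cells allocated."
  (let ((deadline (+ (get-internal-real-time)
                     (* seconds internal-time-units-per-second)))
        (count 0))
    (loop while (< (get-internal-real-time) deadline)
          do (incf count (length (make-list 1000))))
    count))

(defun stress-gc (&key (threads 4) (seconds 60))
  "Run THREADS consing threads for SECONDS seconds each and wait for all.
Returns :FINISHED if every thread completes, i.e. nothing deadlocked."
  (let ((done (ccl:make-semaphore)))
    (dotimes (i threads)
      (ccl:process-run-function (format nil "conser-~D" i)
                                (lambda ()
                                  (cons-hard seconds)
                                  (ccl:signal-semaphore done))))
    ;; Wait once per worker thread.
    (loop repeat threads
          do (ccl:wait-on-semaphore done))
    :finished))
```

Calling (stress-gc) and getting :FINISHED back is weak evidence only; a hang, on the other hand, would look like the symptom reported in this ticket.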

comment:5 Changed 9 years ago by rme

  • Resolution set to fixed
  • Status changed from assigned to closed