Opened 9 years ago

Closed 9 years ago

#669 closed defect (fixed)

Frequent but unpredictable terminating in Win32

Reported by: fusss
Owned by: gb
Priority: normal
Milestone:
Component: Slime
Version: 1.4
Keywords:
Cc:

Description (last modified by gb)

Hi,

CCL terminates inexplicably and randomly in Win32. It leaves no tangible error messages behind, except for:


Process inferior-lisp exited abnormally with code 5

(progn (load "c:/slime-2009-10-15/swank-loader.lisp" :verbose t) (funcall (read-from-string "swank-loader:init")) (funcall (read-from-string "swank:start-server") "c:/DOCUME~1/bob/LOCALS~1/Temp/slime.5340" :coding-system "utf-8-unix"))

*Messages*: Lisp connection closed unexpectedly: connection broken by remote peer


That's all I get. I have upgraded to a later version, and it still keeps happening. The affected versions are

Version 1.4-RC1-r13031 (WindowsX8632) and Version 1.4-r13122 (WindowsX8632)

It happens under various conditions. Not only when I am hacking on elaborate projects with threads and FFI, but also when using the repl as a simple arithmetic calculator.

I am not able to test this without Emacs/Slime: the issue comes up under active use, and I can't use the naked CLI version for more than a minute.

I am willing to build a debug version of CCL and send more info if this issue is being looked at actively.

Regards.

Change History (4)

comment:1 Changed 9 years ago by rme

I've tried a little bit to duplicate this, but I'm not having any luck.

I wonder if you're seeing the problem described in ticket:649. If so, r13431 tries to address that. That change is only in the trunk; could you try a trunk lisp to see if that solves the issue for you?

comment:2 Changed 9 years ago by gb

  • Description modified (diff)

One other Windows thread-related bug that was reported and fixed around the same time had to do with stack-overflow detection; something like:

(defun foo (x) (abs (foo x)))

(foo 0)

would crash (after a few seconds of Windows trying to run an exception handler on an overflowed stack) rather than being detected and reported before the stack is completely exhausted.

That was also fixed in r13431; the source to the fix (some changes in the kernel) seems to have been checked into the 1.4 branch, but it doesn't look like I remembered to check in an updated wx86cl.exe. (We usually do this, since many Windows users don't have toolchains installed, but it looks like I forgot.)

If a simple stack overflow test like FOO above goes undetected for you, then that might be related to the problem. (The next issue would be to try to determine why the stack's overflowing; that'd be easier to do if the lisp catches the overflow and a lisp condition gets signaled.)
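One way to make that check explicit is to wrap the call in a handler, so that a detected overflow is reported as a value instead of dropping into the debugger. This is only a minimal sketch: PROBE-STACK-OVERFLOW is a hypothetical helper name, not something in CCL, and catching SERIOUS-CONDITION is an assumption made because the exact condition type signaled on overflow is not stated here.

```lisp
;; FOO is the runaway recursion from above: the call to FOO is not a
;; tail call (ABS wraps it), so every call consumes stack.
(defun foo (x) (abs (foo x)))

;; Hypothetical helper (not part of CCL). Assumes a detected overflow
;; is signaled as some subtype of SERIOUS-CONDITION.
(defun probe-stack-overflow ()
  "Call FOO; return :DETECTED if the overflow arrives as a Lisp condition."
  (handler-case (foo 0)
    (serious-condition (c)
      ;; Reaching this clause means the lisp caught the overflow itself.
      (format t "Overflow reported as a condition: ~a~%" c)
      :detected)))
```

If (probe-stack-overflow) returns :DETECTED, the overflow was caught by the lisp. If the lisp process dies instead, the overflow went undetected, which would match the behavior described above.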

I don't know for sure if this is the problem, but most other things (random memory accesses, etc.) that can cause an exception either get reported to lisp or cause the kernel debugger to be entered. I don't remember how a hard stack overflow gets reported (whether as "error 5" or something else), but it's one of the few things that I can think of that bypasses CCL's own exception-handling mechanisms.

(If I try to run the released 1.4 kernel under Cygwin, the stack overflow test causes a long pause, then causes the lisp to start up again; I assume that that's some sort of Cygwin nonsense. If I do it in a Windows command prompt, I get a dialog box whose "details" pane shows an "exception code" of c0000005. Hmm.)

comment:3 Changed 9 years ago by rme

I checked up-to-date Windows lisp kernel binaries into the 1.4 release branch.

Doing "svn up" should get these binaries. Then, start up, do (rebuild-ccl :clean t), and you should end up with a 100% up-to-date version of ccl-1.4.

comment:4 Changed 9 years ago by fusss

  • Resolution set to fixed
  • Status changed from new to closed

I have SVN'ed up as suggested by rme and for the entire day, nearly 9 hours, it hasn't crashed.

I can safely say this issue has been resolved for me, and I am both impressed and grateful for the professionalism you guys have shown.

Now my lisp-implementation-version is "Version 1.4 (WindowsX8632)"; I am guessing it was something that was repaired in the recent pass. I will let you guys close this ticket; for now I will just mark it as "fixed".

Thanks rme and gbyers.
