Ticket #464 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

GC problem in standalone Windows executables

Reported by: hrapof Owned by: gb
Priority: critical Milestone:
Component: Runtime (threads, GC) Version: 1.3
Keywords: GC Windows Cc:

Description

With CCL "Version 1.2-r11583M (LinuxPPC32)" my test

(defun test ()
  (format t "Hello, world!"))

(defun make ()
  (save-application "tst.exe" :toplevel-function #'test
                    :error-handler :quiet :prepend-kernel t))

happily prints "Hello, world!" and exits.

But with 32-bit Windows CCL fresh from SVN (updated and rebuilt) it prints

Hello, world!

Error: value NIL is not of the expected type CCL::RECURSIVE-LOCK.
While executing: CCL::RECURSIVE-LOCK-PTR, in process Initial(0).
Type :POP to abort, :R for a list of available restarts.
Error: value NIL is not of the expected type CCL::RECURSIVE-LOCK.
While executing: CCL::RECURSIVE-LOCK-PTR, in process Initial(0).
Type :POP to abort, :R for a list of available restarts.
Error: value NIL is not of the expected type CCL::RECURSIVE-LOCK.

.... ....

and then hangs, eating CPU.

Further tinkering showed that my TEST function could be as simple as

(defun test () (ccl:gc))

Called from the REPL, everything is OK; called as the toplevel function in an exe, the program busy-waits for something.

Also, CCL prints at start-up

Can't get desired heap address at 0x4000000

Could it have something to do with this GC problem? In the REPL everything is OK, though: it successfully finishes an hour-long computation.

Windows: XP SP2. CPU: Intel Core Duo.

Change History

comment:1 follow-up: ↓ 4 Changed 6 years ago by gb

  • Status changed from new to closed
  • Resolution set to fixed

The first part of this (the attempt to lock NIL, repeatedly) was caused by bugs in some functions, including the one used to implement WRITE-CHAR with newline->CRLF translation on shared streams; the functions were erroneously trying to lock the stream's input lock (which was NIL) instead of its output lock. This affected the stream that was bound to *ERROR-OUTPUT*, so attempts to report the error triggered the error. All of this was ordinarily masked by the fact that the default toplevel-function accidentally set the standard streams' line-termination to :UNIX when setting their character encoding. It was triggered in your example by the fact that code called by QUIT does a FRESH-LINE and a FORCE-OUTPUT on the standard output stream just before exiting; the FRESH-LINE (when you had some buffered output) wanted to write a #\newline, and that tried to lock the output stream's input lock (over and over and over ...).
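The locking mistake can be modelled in portable Common Lisp. This is a minimal sketch, not CCL's stream code: MODEL-STREAM, CHECKED-LOCK and WRITE-CHAR-CRLF are hypothetical names, a fresh cons stands in for a real recursive lock, and no thread ever blocks. CHECKED-LOCK plays the role of CCL::RECURSIVE-LOCK-PTR by rejecting anything that isn't a lock, so consulting the (NIL) input lock signals a type error much like the one in the report, while the fixed path takes the output lock and writes CR LF for a newline.

```lisp
;;; Hypothetical model of a shared stream; not CCL's real stream classes.
(defstruct model-stream
  (input-lock nil)            ; NIL, as on the stream bound to *ERROR-OUTPUT*
  (output-lock (list :lock))  ; a fresh cons stands in for a recursive lock
  (chars '()))                ; characters written so far, most recent first

(defun checked-lock (lock)
  "Stand-in for CCL::RECURSIVE-LOCK-PTR: signal a TYPE-ERROR unless LOCK is a lock."
  (check-type lock cons)
  lock)

(defun write-char-crlf (char stream
                        &key (lock-reader #'model-stream-output-lock))
  "Write CHAR to STREAM, translating a newline into CR LF.
LOCK-READER selects which lock is taken; the fixed code uses the output lock,
the buggy code effectively used #'MODEL-STREAM-INPUT-LOCK."
  ;; A real stream would grab the lock here; the model only validates it.
  (checked-lock (funcall lock-reader stream))
  (if (char= char #\Newline)
      (progn (push (code-char 13) (model-stream-chars stream))    ; CR
             (push (code-char 10) (model-stream-chars stream)))   ; LF
      (push char (model-stream-chars stream)))
  char)
```

With the default (output) lock the write succeeds; passing #'MODEL-STREAM-INPUT-LOCK reproduces the "value NIL is not of the expected type" failure before anything is written.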

The "can't get desired heap address" message seems to be unrelated. A few weeks ago, I had convinced myself that :PREPEND-KERNEL didn't work under Windows (because it was difficult to open a running kernel for reading in order to copy it.) When I tried to reproduce your test case, that part of it seemed to work, so I'm a bit confused ...

The lisp does try to map its heap image at a fixed address (0x4000000) in the address space; if the image can't be loaded there (because a shared library or something has already been loaded at a conflicting address), it should try to load the image somewhere else and relocate it. The relocation might take measurable time (a second? Probably less, but it depends on CPU speed and image size.)
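Relocation amounts to sliding every heap pointer by the distance between the desired and the actual base address, which is why its cost grows with image size. A minimal sketch, assuming the image is a vector of integer words and that any word whose value lies inside the old heap range is a pointer; a real loader would identify pointers by their tag bits rather than by value, and RELOCATE-WORDS is a hypothetical name.

```lisp
;;; Hypothetical sketch of heap-image relocation; not CCL's actual loader.
(defun relocate-words (words old-base new-base heap-size)
  "Return a copy of the vector WORDS in which every value that points into
[OLD-BASE, OLD-BASE + HEAP-SIZE) is shifted by NEW-BASE - OLD-BASE."
  (let ((delta (- new-base old-base))
        (limit (+ old-base heap-size)))
    ;; One pass over every word: time is proportional to the image size.
    (map 'vector
         (lambda (word)
           (if (and (<= old-base word) (< word limit))
               (+ word delta)   ; treated as a pointer into the old heap: move it
               word))           ; anything else (immediates, outside refs) is kept
         words)))
```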

The code that tries to reserve memory for the heap tries to allocate about 1GB at that address; failure means that something else (a shared library or something) is mapped somewhere between 0x4000000 and 0x4000000+1GB. If that attempt fails, it then tries 1GB "wherever there's room", then tries smaller sizes until it eventually succeeds. The message is just a diagnostic (leftover debugging code), but it's still a little unusual for anything to be mapped at a conflicting address. I don't think that I've ever seen that, but it may occur under VMware, or may be the result of some security measure which tries to randomize library load addresses. (Randomization is good; cutting into the contiguous address space available to the application is bad.)
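The fallback order can be sketched as follows. TRY-RESERVE is a hypothetical stand-in for the operating-system call (VirtualAlloc on Windows, for example): it receives a requested address, or NIL for "wherever there's room", plus a size, and returns the base address or NIL. RESERVE-HEAP is likewise a hypothetical name, and halving the size on each retry and stopping at a 16MB floor are assumptions; gb's description only says "smaller sizes".

```lisp
;;; Hypothetical sketch of the heap-reservation fallback; not CCL's kernel code.
;;; TRY-RESERVE: (address-or-nil size) -> base address, or NIL on failure.
(defun reserve-heap (try-reserve &key (address #x4000000)
                                      (size (expt 2 30))      ; "about 1GB"
                                      (min-size (expt 2 24))) ; assumed 16MB floor
  "Return (values BASE SIZE) for the first reservation that succeeds, or NIL."
  ;; 1. Preferred: SIZE bytes at the fixed ADDRESS.
  (let ((base (funcall try-reserve address size)))
    (when base
      (return-from reserve-heap (values base size))))
  ;; 2. SIZE bytes anywhere, then 3. successively halved sizes anywhere.
  (loop for s = size then (floor s 2)
        while (>= s min-size)
        do (let ((base (funcall try-reserve nil s)))
             (when base
               (return-from reserve-heap (values base s)))))
  nil)
```

Called with a stub that refuses the fixed address and has only 512MB free elsewhere, it asks for 1GB at 0x4000000, then 1GB anywhere, and settles on a 512MB reservation.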

I suppose that it might also be the result of some confusion related to :PREPEND-KERNEL. I don't know ...

In any case, the attempt to lock NIL should be fixed in both the trunk and 1.3 as of r11981.

comment:2 follow-up: ↓ 3 Changed 6 years ago by gb

  • Status changed from closed to reopened
  • Resolution fixed deleted

I hadn't even tried setting the toplevel function to GC, and there may be a separate issue there.

comment:3 in reply to: ↑ 2 Changed 6 years ago by hrapof

Replying to gb:

I hadn't even tried setting the toplevel function to GC, and there may be a separate issue there.

Thank you for such a fast answer. I confirm that FORMAT works, and GC just hangs the program. I tried to turn GC off, but unfortunately, half a gig of heap is not sufficient for my program. :(

comment:4 in reply to: ↑ 1 ; follow-up: ↓ 5 Changed 6 years ago by hrapof

Replying to gb:

I suppose that it might also be the result of some confusion related to :PREPEND-KERNEL.

Just FYI: it occurs every time I start wx86cl.exe, also.

comment:5 in reply to: ↑ 4 Changed 6 years ago by hrapof

Replying to hrapof:

Just FYI: it occurs every time I start wx86cl.exe, also.

I mean the "Can't get desired heap address at 0x4000000" message.

comment:6 follow-up: ↓ 7 Changed 6 years ago by gb

The GC hang should be fixed in both the trunk and 1.3 as of r11984. There isn't much that we can do if Windows doesn't let us map the heap where we want to, except to try to map it where it'll let us, so the debugging message is suppressed as of r11986.

comment:7 in reply to: ↑ 6 Changed 6 years ago by hrapof

  • Status changed from reopened to closed
  • Resolution set to fixed

Replying to gb:

The GC hang should be fixed in both the trunk and 1.3 as of r11984.

Great! Works like a charm! You rule!
