Opened 12 years ago
Closed 12 years ago
#385 closed defect (fixed)
CCL fails to start on FreeBSD-6.3/i386 in VMware
Reported by: | hans | Owned by: | gb |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Runtime (threads, GC) | Version: | |
Keywords: | | Cc: | |
Description
CCL does not start for me on 32-bit FreeBSD, running in VMware 6.5.1-build126130 hosted on Windows Vista/x64.
dentinox 8_> uname -a
FreeBSD dentinox.huebner.org 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 04:18:52 UTC 2008 root@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386
dentinox 9_> svn info .
Path: .
URL: http://svn.clozure.com/publicsvn/openmcl/trunk/freebsdx86/ccl
Repository Root: http://svn.clozure.com/publicsvn/openmcl
Repository UUID: 8bdb8f1d-59ce-db11-9e41-0016172a54ae
Revision: 11462
Node Kind: directory
Schedule: normal
Last Changed Author: gb
Last Changed Rev: 11404
Last Changed Date: 2008-11-18 21:55:55 +0000 (Tue, 18 Nov 2008)
dentinox 10_> ./fx86cl
Unhandled exception 11 at 0x805613d, context->regs at #xbfbfe410
? for help
[2652] OpenMCL kernel debugger: b
current thread: tcr = 0x48098240, native thread ID = 0x806b000, interrupts disabled
Change History (6)
comment:1 Changed 12 years ago by gb
- Owner changed from rme to gb
- Status changed from new to assigned
comment:2 Changed 12 years ago by hans
Here are the registers:
dentinox 18_> ./fx86cl
Unhandled exception 11 at 0x805613d, context->regs at #xbfbfe410
? for help
[4261] OpenMCL kernel debugger: R
%eax = 0xffffff00
%ecx = 0x4cb59e46
%edx = 0x00000000
%ebx = 0x4c1b013e
%esp = 0x882f2ff4
%ebp = 0x882f2ff8
%esi = 0x4cb59e46
%edi = 0x349a9e46
%eip = 0x0805613d
%eflags = 0x00010246
And here is the complete memory map:
0x12000 0x14000 2 0 0xc4bcaa50 rwx 1 0 0x2180 COW NNC vnode /usr/home/hans/ccl/fx86cl.image
0x15000 0x16000 1 0 0xc4c5039c rwx 1 0 0x2180 NCOW NNC default -
0x8048000 0x8069000 33 0 0xc4addbdc r-x 1 0 0x0 COW NC vnode /usr/home/hans/ccl/fx86cl
0x8069000 0x806a000 1 0 0xc4c2ee70 rw- 2 0 0x2180 NCOW NNC default -
0x806a000 0x8073000 9 0 0xc4c2ee70 rwx 2 0 0x2180 NCOW NNC default -
0x48069000 0x4808b000 31 0 0xc104e210 r-x 71 34 0x4 COW NC vnode /libexec/ld-elf.so.1
0x4808b000 0x4808d000 2 0 0xc4c3edec rw- 1 0 0x2180 COW NNC vnode /libexec/ld-elf.so.1
0x4808d000 0x48092000 4 0 0xc4ae8948 rw- 2 0 0x2180 NCOW NNC default -
0x48092000 0x4809a000 8 0 0xc4ae8948 rwx 2 0 0x2180 NCOW NNC default -
0x4809a000 0x480ae000 20 0 0xc104f8c4 r-x 22 14 0x0 COW NC vnode /lib/libm.so.4
0x480ae000 0x480af000 1 0 0xc4c3c840 r-x 1 0 0x2180 COW NNC vnode /lib/libm.so.4
0x480af000 0x480b0000 1 0 0xc765d528 rwx 1 0 0x2180 COW NNC vnode /lib/libm.so.4
0x480b0000 0x480bf000 15 0 0xc102ea50 r-x 6 4 0x0 COW NC vnode /usr/lib/libthr.so.2
0x480bf000 0x480c0000 1 0 0xc4b0339c r-x 1 0 0x2180 COW NNC vnode /usr/lib/libthr.so.2
0x480c0000 0x480c1000 1 0 0xc4c3ead4 rwx 1 0 0x2180 COW NNC vnode /usr/lib/libthr.so.2
0x480c1000 0x480c2000 1 0 0xc4c2e738 rwx 1 0 0x2180 NCOW NNC default -
0x480c2000 0x4818c000 153 0 0xc104ebdc r-x 105 68 0x4 COW NC vnode /lib/libc.so.6
0x4818c000 0x4818d000 1 0 0xc4c1b318 r-x 1 0 0x2180 COW NNC vnode /lib/libc.so.6
0x4818d000 0x48193000 6 0 0xc1029318 rwx 1 0 0x2180 COW NNC vnode /lib/libc.so.6
0x48193000 0x481aa000 5 0 0xc4c2ec60 rwx 1 0 0x2180 NCOW NNC default -
0x481b0000 0x485ec000 2 0 0xc4c50dec r-x 3 2 0x0 COW NC vnode /usr/home/hans/ccl/fx86cl.image
0x485ec000 0x4c1b0000 0 0 0xc4c3f000 --- 8 0 0x2180 NCOW NNC default -
0x4c1b0000 0x4cb63000 28 0 0xc4bfa948 rwx 1 0 0x2180 COW NNC vnode /usr/home/hans/ccl/fx86cl.image
0x4cb63000 0x4db70000 0 0 0xc4acddec rwx 1 0 0x2000 NCOW NNC default -
0x4db70000 0x861b0000 0 0 0xc4c3f000 --- 8 0 0x2180 NCOW NNC default -
0x861b0000 0x86218000 0 0 0 rwx 0 0 0x0 NCOW NNC none -
0x86218000 0x871b0000 0 0 0xc4c3f000 --- 8 0 0x2180 NCOW NNC default -
0x871b0000 0x87218000 39 0 0xc4c3cb58 rwx 1 0 0x2180 NCOW NNC default -
0x87218000 0x881b0000 0 0 0xc4c3f000 --- 8 0 0x2180 NCOW NNC default -
0x881b0000 0x881c1000 1 0 0xc4c3f000 r-x 8 0 0x2180 NCOW NNC default -
0x881c1000 0x882f4000 1 0 0xc4c3f000 rwx 8 0 0x2180 NCOW NNC default -
0x882f4000 0x88304000 1 0 0xc4c3f000 r-x 8 0 0x2180 NCOW NNC default -
0x88304000 0x88345000 1 0 0xc4c3f000 rwx 8 0 0x2180 NCOW NNC default -
0xbf9ff000 0xbfa00000 0 0 0 --- 0 0 0x0 NCOW NNC none -
0xbfa9b000 0xbfadf000 1 0 0xc4c1f210 rwx 1 0 0x2180 NCOW NNC default -
0xbfbe0000 0xbfc00000 3 0 0xc4c3ec60 rwx 1 0 0x2180 NCOW NNC default -
To me, it seems as if the files got mapped at the wrong address, so there should be some hope.
comment:3 Changed 12 years ago by gb
The lisp expects there to be 1GB free starting at #x30000000, i.e. the whole range #x30000000 through #x6FFFFFFF. The map above indicates that the shared libraries get mapped at #x48069000, inside that range and a little nearer to the middle of the address space than one would like, and the lisp heap got put at #x481b0000. The code that loads the image is supposed to notice that it's not occupying the preferred address range and relocate it (basically, find every pointer that points into the preferred address range and make it point into the actual address range), but this is broken on x8632 (ticket:331).
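The relocation idea described above can be sketched in C, the language of the lisp kernel. This is not CCL's code: `relocate_image` and its view of the image as a flat array of 32-bit words are hypothetical, and the sketch assumes that every word whose value falls in the preferred range is a pointer, whereas a real lisp has to consult tag bits and object headers to tell pointers from immediate data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation pass over a 32-bit image.
 * The image was saved assuming it would live at `preferred` but was
 * mapped at `actual`. Every word pointing into [preferred, preferred + size)
 * is moved by (actual - preferred).
 *
 * Simplification: a real lisp must use tag bits and object headers to tell
 * pointers from immediate data (a fixnum can look like an address); this
 * sketch treats every in-range word as a pointer.
 * Returns the number of words adjusted. */
static size_t relocate_image(uint32_t *words, size_t nwords,
                             uint32_t preferred, uint32_t actual, uint32_t size)
{
    assert(words != NULL || nwords == 0);
    /* Unsigned 32-bit arithmetic wraps, so this works whether the image
     * moved up or down in the address space. */
    uint32_t delta = actual - preferred;
    size_t adjusted = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t w = words[i];
        if (w >= preferred && w - preferred < size) {
            words[i] = w + delta;
            adjusted++;
        }
    }
    return adjusted;
}
```

With this ticket's numbers, `preferred` would be #x30000000 and `actual` #x481b0000. The pass has to visit every word of the image, which is where the extra startup cost of relocation comes from.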
If it tried to map less than 1GB at #x30000000 and the libraries are consistently where they are for you, that might work. You could try calling the lisp with
shell> fx86cl --heap-reserve 384M
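which might get a smaller chunk of address space to wind up at the "right" address. (For scale: #x30000000 + 384MB is #x48000000, just below the #x48069000 where ld-elf.so.1 was mapped.)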
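Whether a smaller reservation fits at the preferred address can be illustrated with POSIX `mmap`. This is a sketch under assumptions, not CCL's startup code: `reserve_at` is a hypothetical helper that passes the address only as a hint (no `MAP_FIXED`, which would silently replace an existing mapping such as a shared library) and accepts the result only if the kernel honored that hint.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Try to reserve `size` bytes of address space starting exactly at `want`.
 * `want` is only a hint, so the kernel may place the region elsewhere.
 * Returns the base address on success, or NULL if the region could not be
 * placed at `want`. */
static void *reserve_at(uintptr_t want, size_t size)
{
    assert(size > 0);
    /* PROT_NONE: reserve address space only; nothing is accessible yet. */
    void *p = mmap((void *)want, size, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uintptr_t)p != want) {
        munmap(p, size);  /* landed somewhere else: give it back */
        return NULL;
    }
    return p;
}
```

Calling `reserve_at(0x30000000, (size_t)1 << 30)` and falling back to `reserve_at(0x30000000, (size_t)384 << 20)` mirrors the suggestion above: in a process laid out like the map in comment:2, the 1GB request overlaps the libraries at #x48069000, while the 384MB one ends just short of them.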
You'd likely wind up with a little over 300MB of heap in that case (64MB is reserved for "pure" objects, and a few percent are used for GC overhead). That's basically an upper bound on heap size; it's what shows up as "reserved for heap expansion" in ROOM's output.
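The estimate above is simple arithmetic. As a sketch, assuming the pure area comes out of the reservation first and GC overhead is a flat fraction of what remains (both simplifications; the 3% used below is a placeholder for "a few percent", not a measured figure):

```c
#include <assert.h>

/* Back-of-the-envelope heap size for a given --heap-reserve.
 * Assumptions: the pure area is carved out of the reservation, and GC
 * overhead is a flat fraction of the remainder. */
static double usable_heap_mb(double reserve_mb, double pure_mb, double gc_overhead)
{
    assert(reserve_mb >= pure_mb);
    assert(gc_overhead >= 0.0 && gc_overhead < 1.0);
    return (reserve_mb - pure_mb) * (1.0 - gc_overhead);
}
```

For `--heap-reserve 384M`, `usable_heap_mb(384.0, 64.0, 0.03)` is 320 × 0.97, about 310MB, consistent with "a little over 300MB".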
If the library load addresses are an artifact of VMware ... well, it'd be good to get them to fix that. Some platforms try to randomize shared library load addresses in the hope that this will make applications less vulnerable to security exploits. That's a noble goal, but unless it's done carefully it limits the application's ability to allocate large contiguous chunks of its own address space, which CCL certainly wants to do.
Even if the relocation code worked (ticket:331), it's undesirable to have to run it: it increases startup time and largely defeats the purpose of file-mapping the image.
comment:4 Changed 12 years ago by hans
shell> fx86cl --heap-reserve 384M
did not work for me, even though the shared libraries have been mapped in the same address range as before.
comment:5 Changed 12 years ago by gb
That suggests that FreeBSD running under the version of VMWare that you're using just refuses to let a process map regions of its address space that it is able to map when running under other versions of VMWare (and on real hardware: I have x86 FreeBSD installed on a real machine here, and the lisp starts up without incident.)
r11468 purports to fix ticket:331, which might work around this: if an image can't be loaded at the address from which it was saved, we can (expensively) walk the image and adjust every pointer that we find.
When you can, please update to r11468 or later, rebuild the 32-bit FreeBSD lisp kernel via:
shell> cd ccl/lisp-kernel/freebsdx8632
shell> make
and see if the lisp starts up. (It's hard to say whether there'd be a visible startup delay while the image relocates itself; it's fairly easy to say that there'd be nothing that we could do about that, if so.)
comment:6 Changed 12 years ago by hans
- Resolution set to fixed
- Status changed from assigned to closed
Awesome, Gary! The fix works fine, and the startup delay is not worth mentioning (for me, that is).
dentinox 13_> time ./fx86cl -n -e '(ccl::quit)'
0.000u 0.371s 0:00.37 100.0% 178+211k 0+0io 0pf+0w
I actually develop and run the 32-bit FreeBSD version on VMWare Fusion (Version 2.0 (116369)).
This isn't the same bug that's affected running under VMWare in the past. My best bet is that the lisp side of this is ticket:331, and the VMWare side of it is "can't map the heap where we want it to go", but that's just a guess. Seeing the value of the registers (R in the kernel debugger) might help to test that hypothesis.
If "VMWare Tools" are installed (which isn't always easy to do under FreeBSD), it'd be interesting to know whether turning them off affects this.
I don't know whether or not this is just random VMWare lossage; if it is, there wouldn't be too much that we could do about it.
If you have the FreeBSD /proc file system mounted (via "sudo mount -t procfs proc /proc"), then at the point where the crash occurs:
2) In another shell, do:
This will show some information about memory regions in the process. The regions that are most interesting should look like:
For example: a region at 0x30000000 mapped to the image file with read and execute permissions, some allocated but unmapped memory, another writable file-mapped region at 0x34000000, some mapped memory, and lots of unmapped memory.
If the file-mapped regions exist but are at different addresses, we might be able to work around that by fixing ticket:331. (If there's a shared library or something occupying 0x30000000, it might be interesting to know what that is.)
If the file-mapped regions exist at the addresses that they're supposed to, then this is more likely to be random VMWare lossage.
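The triage in the last few paragraphs can be sketched as a scan over the lines of a FreeBSD `/proc/<pid>/map` dump. Only the expected base #x30000000 comes from the text; the column layout is inferred from the map in comment:2, and `classify_image`, the enum names and the `fx86cl.image` suffix match are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Where the image is expected to be mapped (from the comment above). */
#define IMAGE_BASE 0x30000000UL

enum image_placement {
    IMAGE_NOT_MAPPED,   /* no file-backed region for the image at all */
    IMAGE_AT_EXPECTED,  /* mapped at IMAGE_BASE: more likely the VM's fault */
    IMAGE_ELSEWHERE     /* mapped, but not at IMAGE_BASE: the ticket:331 case */
};

/* Nonzero if `s` ends with `suffix`. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Assumed columns: start end resident private-resident object prot
 * refcount shadowcount flags cow nc type path */
static enum image_placement classify_image(const char *lines[], size_t n,
                                           const char *image_suffix)
{
    enum image_placement result = IMAGE_NOT_MAPPED;
    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned long start;
        char type[16], path[256];
        /* Keep column 1 (start), skip columns 2-11, keep 12 (type) and 13 (path). */
        if (sscanf(lines[i], "%lx %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %15s %255s",
                   &start, type, path) != 3)
            continue;  /* not a map line */
        if (strcmp(type, "vnode") != 0 || !ends_with(path, image_suffix))
            continue;  /* not backed by the image file */
        if (start == IMAGE_BASE)
            return IMAGE_AT_EXPECTED;
        result = IMAGE_ELSEWHERE;
    }
    return result;
}
```

Fed the `fx86cl.image` rows from the map in comment:2 (at 0x12000, 0x481b0000 and 0x4c1b0000), this reports `IMAGE_ELSEWHERE`, the case that fixing ticket:331 was expected to address.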