Opened 10 years ago

Closed 10 years ago

#635 closed defect (fixed)

CCL enters kernel debugger during recompilation of system

Reported by: sctb
Owned by: gb
Priority: normal
Milestone:
Component: Runtime (threads, GC)
Version: trunk
Keywords: kernel debugger, thread, allocation
Cc:

Description (last modified by gb)

Platform: Mac OS X 10.6.2 (dx86cl64)
Version: 1.4-dev-r12860M-trunk

Error message:

"? Can't determine state of thread 0x147a50, interrupted during memory allocation ? for help [672] Clozure CL kernel debugger: "

I am able to enter the kernel debugger fairly consistently by reloading my current ASDF system (a multi-threaded web server app) and reading ~150 MB of data from disk into a bunch of vectors. This is done within a SLIME session.

This issue did not seem to come up when running CCL (1.4 and trunk) under 10.5.

Output from the debugger:

"[672] Clozure CL kernel debugger: B
current thread: tcr = 0x14a9d0, native thread ID = 0x19ae23, interrupts disabled


Bogus  frame 3217cb0
[672] Clozure CL kernel debugger: T
Current Thread Context Record (tcr) = 0x14a9d0
Control (C) stack area:  low = 0xb15d7000, high = 0xb183b000
Value (lisp) stack area: low = 0x3007000, high = 0x3218000
Exception stack pointer = 0x2bcfb98
[672] Clozure CL kernel debugger: R
%rax = 0x0000000000000536      %r8  = 0x00003000406eac8d
%rcx = 0x0000000000000040      %r9  = 0x0000000000000000
%rdx = 0x0000000000000023      %r10 = 0x00003000406eacf4
%rbx = 0x00003000406ea9de      %r11 = 0x0000000000147a50
%rsp = 0x0000000002bcfb98      %r12 = 0x0000300040717abe
%rbp = 0x0000000002bcfbd8      %r13 = 0x00003000406eacbf
%rsi = 0x0000000002ce1750      %r14 = 0x000030004b61cf7d
%rdi = 0xfffffffffffffff8      %r15 = 0x000030004071632e
%rip = 0x00003000406ead10   %rflags = 0x00000216
[672] Clozure CL kernel debugger: L
%rsi (arg_z) = 5882602
%rdi (arg_y) = -1
%r8  (arg_x) = #(:TYPE :ARG :FRAG :POS)
------
%r13 (fn) = #<Function MAKE-RELOC #x00003000406EACBF>
------
%r15 (save0) = :LONG
%r14 (save1) = #<(#<(CLASS-CELL . #<# @#x00003000401EBBAD>) @#x000030004028621D> #<(CLASS-CELL . #<# @#x00003000401EBBAD>) @#x000030004028742D>) @#x000030004B61CF7D>
%r12 (save2) = :EXPR32
%r11 (save3) = 167754
------
%rbx (temp0) = MAKE-RELOC
%r9  (temp1) = 0
%r10 (temp2) = tagged return address: #<Function MAKE-RELOC #x00003000406EACBF> + 53
------
%rcx (nargs) = 8 (maybe)
[672] Clozure CL kernel debugger: F
f00: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f01: 0x428d2d2d (7.058823e+01), 0x00000000428d2d2d (5.516482e-315)
f02: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f03: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f04: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f05: 0x72657355 (4.544736e+30), 0x63732f7372657355 (1.158477e+171)
f06: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f07: 0xfefa39ef (-1.663039e+38), 0x3fe62e42fefa39ef (6.931472e-01)
f08: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f09: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f10: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f11: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f12: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f13: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f14: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
f15: 0x00000000 (0.000000e+00), 0x0000000000000000 (0.000000e+00)
mxcsr = 0x00001920
[672] Clozure CL kernel debugger: "

Change History (4)

comment:1 Changed 10 years ago by gb

  • Description modified (diff)
  • Status changed from new to assigned

comment:2 Changed 10 years ago by gb

If we take the error message at face value, it means that some thread (the one that the kernel debugger describes) tried to allocate memory and entered the GC; the GC stopped all other threads and found that one of them was also in the middle of the sequence of instructions used to allocate memory, but it didn't recognize the instruction(s) at which that thread had stopped. (All memory allocations have to use a stylized sequence of instructions, and the GC thread knows how these instructions are encoded; this is all necessary so that the stopped thread's state can be changed so that it's either about to start an allocation or has just finished one and isn't manipulating a half-allocated object.)
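As a rough illustration of that failure mode (not the kernel's actual code, which is not Lisp), here is a toy model in Common Lisp. Everything in it is an assumption made up for the example: the step names, the byte values, and the helper functions. It only shows how a stale table of instruction descriptions fails to recognize a sequence whose first instruction has changed.

```lisp
;; Toy model only. The byte values are invented placeholders, not real
;; x86-64 encodings, and the step names are hypothetical.

(defparameter *stale-descriptions*
  '((:load-allocptr #x01 #x02)
    (:subtract-size #x03 #x04)
    (:compare-limit #x05 #x06))
  "What an out-of-date kernel believes the allocation sequence looks like.")

(defparameter *current-descriptions*
  '((:load-allocptr #x11 #x02)   ; thread-local access changed, so this encoding differs
    (:subtract-size #x03 #x04)
    (:compare-limit #x05 #x06))
  "Descriptions that match what the compiler now emits.")

(defun bytes-prefix-p (prefix bytes)
  "True if the list PREFIX matches the start of the list BYTES."
  (let ((position (mismatch prefix bytes)))
    (or (null position) (= position (length prefix)))))

(defun thread-state (descriptions bytes-at-pc)
  "Name the allocation step found at the stopped thread's PC, or report failure."
  (let ((entry (find-if (lambda (entry)
                          (bytes-prefix-p (cdr entry) bytes-at-pc))
                        descriptions)))
    (if entry
        (car entry)
        :cant-determine-state)))
```

Calling `(thread-state *stale-descriptions* '(#x11 #x02 #x03 #x04))` falls through to `:cant-determine-state`, while the same bytes checked against `*current-descriptions*` are recognized as the first step of the sequence.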

A few months ago, we changed the way that threads track thread-local data on x86-64 Darwin (and therefore incidentally changed the instructions used in the canonical allocation sequence); it was another few weeks before anyone noticed that the descriptions of the instructions in the canonical sequence hadn't been updated. (I believe that I noticed that because I got the same sort of error that you did.) The descriptions were updated in r12770; the dx86cl64 binary in svn wasn't updated until r12862.

If you did an svn checkout/update around that time and the lisp kernel hasn't been rebuilt, that would explain why you're getting this error. (Note that rebuilding the kernel on 10.6 can fail unless Xcode's optional 10.4 support is installed.) If you could update to the current trunk and then do a (REBUILD-CCL :FULL T), that would either fix the problem or rule out the possibility that it has already been fixed; at the moment that possibility seems pretty high.
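For reference, the rebuild might look something like the following at the REPL. This is a sketch that assumes `svn update` has already been run from a shell in the ccl directory and that CCL was started from that updated tree; the version check is just a convenient way to see which revision the running image reports.

```lisp
;; Run in a CCL started from the updated source tree.
(lisp-implementation-version)   ; note the revision the running image reports
(ccl:rebuild-ccl :full t)       ; request a full rebuild, which includes the lisp kernel
(ccl:quit)                      ; exit, then restart CCL to pick up the rebuilt kernel and image
```

After restarting, evaluating `(lisp-implementation-version)` again should show whether the new build is the one in use.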

comment:3 Changed 10 years ago by sctb

After rebuilding 1.5-dev-r13246M-trunk, the issue seems to have stayed away through several days of development. Since it was coming up every few minutes previously, I would say that the issue is resolved.

comment:4 Changed 10 years ago by rme

  • Resolution set to fixed
  • Status changed from assigned to closed