Ticket #442 (closed defect: invalid)

Opened 6 years ago

Last modified 6 years ago

Exception in networking code (Linux/AMD64)

Reported by: emarsden
Owned by: gb
Priority: normal
Milestone: 1.3
Component: Runtime (threads, GC)
Version: trunk
Keywords: crash
Cc:

Description

In a long-running server process, Clozure CL crashed in networking code. The server uses Hunchentoot.

Welcome to Clozure Common Lisp Version 1.3-dev-r11784M-trunk (LinuxX8664)!

Exception occurred while executing foreign code
? for help
[12821] Clozure CL kernel debugger: B
current thread: tcr = 0x404c77c0, native thread ID = 0x3217, interrupts enabled
(#x00002AAAAC6D99E8) #x0000300040085F04 : #<Function MALLOC #x0000300040085E6F> + 149
(#x00002AAAAC6D9A00) #x00003000402D540C : #<Function %MAKE-HEAP-IVECTOR #x00003000402D536F> + 157
(#x00002AAAAC6D9A40) #x00003000402D4504 : #<Function MAKE-HEAP-IVECTOR #x00003000402D440F> + 245
(#x00002AAAAC6D9A78) #x0000300040334E94 : #<Function INIT-STREAM-IOBLOCK #x0000300040334ACF> + 965
(#x00002AAAAC6D9BE8) #x000030004033332C : #<Function MAKE-IOBLOCK-STREAM #x00003000403331AF> + 381
(#x00002AAAAC6D9C20) #x00003000403314C4 : #<Function MAKE-FD-STREAM #x00003000403310BF> + 1029
(#x00002AAAAC6D9D38) #x00003000406091D4 : #<Function ACCEPT-SOCKET-CONNECTION #x000030004060905F> + 373
(#x00002AAAAC6D9DA8) #x000030004018F5F4 : #<Function %%CHECK-KEYWORDS #x000030004018F50F> + 229
(#x00002AAAAC6D9E20) #x0000300041A1CCD4 : #<Function (:INTERNAL OPEN-SOCKET-AND-ACCEPT START-UP-SERVER) #x0000300041A1C7CF> + 1285
(#x00002AAAAC6D9EB8) #x00003000404AB97C : #<Function RUN-PROCESS-INITIAL-FORM #x00003000404AB6AF> + 717
(#x00002AAAAC6D9F48) #x00003000404AC584 : #<Function (:INTERNAL (%PROCESS-PRESET-INTERNAL (PROCESS))) #x00003000404AC3FF> + 389
(#x00002AAAAC6D9F98) #x000030004048D10C : #<Function (:INTERNAL THREAD-MAKE-STARTUP-FUNCTION) #x000030004048CFDF> + 301
[12821] Clozure CL kernel debugger: T
Current Thread Context Record (tcr) = 0x404c77c0
Control (C) stack area:  low = 0x40274000, high = 0x404c8000
Value (lisp) stack area: low = 0x2aaaac4c9000, high = 0x2aaaac6da000
Exception stack pointer = 0x404c6ca0
[12821] Clozure CL kernel debugger: R
%rax = 0x0000000000000051      %r8  = 0x0000000000000001
%rcx = 0x00002aaab0000090      %r9  = 0x00002aaab0000048
%rdx = 0x00002aaab0008810      %r10 = 0x00002aaab0000078
%rbx = 0x0000000000000001      %r11 = 0x00000000004204d0
%rsp = 0x00000000404c6ca0      %r12 = 0x0000000000000050
%rbp = 0x00002aaab0005b20      %r13 = 0x0000000000000020
%rsi = 0x0000000000000419      %r14 = 0x00002aaab0000080
%rdi = 0x00002aaab0000020      %r15 = 0x00002aaab0000020
%rip = 0x00002aaaab3dea1b   %rflags = 0x00010202

Change History

comment:1 Changed 6 years ago by gb

Lisp code called malloc (trying to allocate a buffer for a stream socket) and you crashed in some foreign code. It'd be reasonable to assume that that foreign code had something to do with the implementation of malloc, but CCL's kernel debugger doesn't offer much support for debugging C code.

GDB, on the other hand, offers a lot of support for both source-level C debugging and machine-level debugging. http://trac.clozure.com/openmcl/wiki/CclUnderGdb describes some of this; the most useful thing to try in cases like this is to set a breakpoint in GDB at the point where the exception occurred.

  1. Note the value of %rip (the instruction pointer register/program counter) in the kernel debugger's (r) output.
  2. Start gdb (while the lisp is in the kernel debugger) and attach to the ccl process:
shell> gdb /path/to/lisp-kernel
(gdb) source lisp-kernel/linuxx8664/.gdbinit
(gdb) attach <pid>    # pid is printed in brackets in the kernel debugger prompt
(gdb) br *<%rip-value, 0x00002aaaab3dea1b in the case above>
(gdb) continue

Back in the kernel debugger:

[pid] Clozure CL kernel debugger: x

That should immediately break into gdb at the instruction that caused the fault. At that point:

(gdb) x/i $pc   # disassembles the instruction at the pc/%rip
(gdb) bt        # do a C backtrace, which will probably be valid back to the point of the ff-call
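
The attach-and-break steps above take two values from the kernel debugger output: the pid in the prompt and the %rip value in the register dump. As a rough sketch (the helper name gdb_commands is hypothetical and not part of CCL or GDB), a few lines of Python can pull both out of a saved transcript and print the GDB commands to type:

```python
import re


def gdb_commands(transcript):
    """Build the GDB attach/breakpoint commands from kernel debugger output.

    Hypothetical helper for illustration; it only formats text and does
    not talk to GDB or to the lisp kernel.
    """
    # The pid is printed in brackets in the kernel debugger prompt.
    pid_match = re.search(r"\[(\d+)\] Clozure CL kernel debugger", transcript)
    # The faulting address is the %rip value in the register (R) output.
    rip_match = re.search(r"%rip\s*=\s*(0x[0-9a-fA-F]+)", transcript)
    if pid_match is None or rip_match is None:
        raise ValueError("transcript lacks a pid prompt or a %rip line")
    return [
        "attach " + pid_match.group(1),
        "br *" + rip_match.group(1),
        "continue",
    ]


# Excerpt of the transcript from this ticket's description.
sample = """[12821] Clozure CL kernel debugger: R
%rsi = 0x0000000000000419      %r14 = 0x00002aaab0000080
%rip = 0x00002aaaab3dea1b   %rflags = 0x00010202
"""

for command in gdb_commands(sample):
    print(command)
```

The printed lines are meant to be typed at the (gdb) prompt after starting gdb on the lisp kernel and sourcing the .gdbinit as shown above.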

Some Linux distributions provide debugging information and library source for the standard libraries; on Fedora, this information is contained in optional "debuginfo" packages. If it's available, the information is often very useful; if it isn't, ... well, it's generally hard to debug something like this at the machine-code level.
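
To decide which library's debugging information is worth installing, it can help to know which mapped file contains the faulting address. On Linux, /proc/<pid>/maps lists each mapping as an address range followed by permissions, offset, device, inode, and an optional pathname. The following is a minimal sketch, assuming that format; the function name find_mapping and the sample mappings (including the library path) are made up for illustration and are not taken from the crashed process:

```python
def find_mapping(maps_text, address):
    """Return the pathname of the mapping that contains address.

    Returns "[anonymous]" for a mapping with no pathname and None when
    no mapping covers the address. Hypothetical helper for illustration.
    """
    for line in maps_text.splitlines():
        # Fields: range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if not fields:
            continue
        start_text, end_text = fields[0].split("-")
        start, end = int(start_text, 16), int(end_text, 16)
        # Each range is half-open: start is included, end is not.
        if start <= address < end:
            return fields[5].strip() if len(fields) > 5 else "[anonymous]"
    return None


# Invented sample in /proc/<pid>/maps format; a real run would instead
# read the file /proc/<pid>/maps for the attached process.
sample_maps = """00400000-00420000 r-xp 00000000 08:01 1234 /path/to/lisp-kernel
2aaaab36d000-2aaaab4a7000 r-xp 00000000 08:01 5678 /lib64/libexample.so
2aaab0000000-2aaab0021000 rw-p 00000000 00:00 0
"""

print(find_mapping(sample_maps, 0x00002AAAAB3DEA1B))
```

With a real maps file, the pathname returned for the %rip value names the shared object whose debugging information (if the distribution packages it) would be the one to look for.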

Even if it's clear what the symptom is, determining what causes that symptom may be difficult. If this happens reproducibly, then someone might be able to look into the problem and determine that cause.

The lisp is certainly one of many suspects here, and if it turns out to be the guilty party it'd obviously be worth trying to fix whatever the problem might be. We'd need a lot more information than we have in order to say anything less vague than that.

comment:2 Changed 6 years ago by rme

  • Status changed from new to closed
  • Resolution set to invalid

If you happen to come up with a test case or more information, please re-open the ticket.
