|Version 7 (modified by gz, 4 years ago)|
Running CCL under GDB
The lisp wants to handle most of the signals that can be raised to indicate an exception. If the lisp's exception-handling code doesn't know how to handle an exception, it enters the CCL kernel debugger (there's no good and direct way to pass it to another debugger). When an exception occurs in foreign code, the kernel debugger tries to note that fact.
GDB's much more likely to be able to make at least some sense out of the state of things in the exception-in-foreign-code case than the lisp's kernel debugger is.
Loading GDB init file
Before doing anything with lisp in GDB, you need to load (or "source", as a verb) the file ccl/lisp-kernel/linuxx8664/.gdbinit (replace linuxx8664 with whatever OS you're running). This file tells GDB about signals that need to be passed to lisp for handling, and defines some macros (most of which have to do with printing lisp object values).
That file will be sourced automatically if it (or a link to it) is in the same directory as the executable (or, IIRC, in your home directory.) Otherwise, once in GDB, just do:
(gdb) source ccl/lisp-kernel/linuxx8664/.gdbinit
When lisp is in the kernel debugger following an exception in foreign code:
(*) Note the PID, printed in brackets in the kernel debugger prompt; say it's 1234.
(*) Do the R command to display raw (hex) register values and note the value in RIP (the program counter/instruction pointer), say it's 0x12345678.
(*) If GDB is already running, drop into it (via ^C). Otherwise, get a shell and do:
shell> gdb /path/to/ccl/lx86cl64 # location of lisp kernel
(gdb) source lisp-kernel/linuxx8664/.gdbinit
(gdb) attach 1234 # or whatever the PID is
(*) set a breakpoint at the exception:
(gdb) br *0x12345678 # or whatever the RIP value is
The leading asterisk is necessary to prevent GDB from interpreting the integer as a line number.
(*) tell GDB to let lisp run:
(gdb) continue
The kernel debugger will likely still be waiting for input. All other lisp threads should be suspended.
(*) Back in the kernel debugger, use the x command, which exits from the kernel debugger and resumes other threads.
 Clozure CL kernel debugger: x
That should immediately break into gdb at the instruction that caused the fault.
More generally, the next time any thread reaches the address of the breakpoint, GDB will be entered. It's hard to guarantee that the first thread that reaches that point will be the one that got the exception, but it's usually very likely (other threads usually require some time to wake up after being suspended.)
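The attach steps above can also be collected into a single GDB invocation. The following is only a sketch: the helper name gdb_attach_argv is hypothetical, the paths, PID and RIP value are the placeholder values used in the text, and it assumes a GDB that accepts the standard -ex option for running a command at startup.

```python
import shlex


def gdb_attach_argv(kernel_path, gdbinit_path, pid, fault_pc):
    """Build an argv list that starts GDB on the lisp kernel, sources the
    init file, attaches to the lisp process, and sets a breakpoint at the
    faulting instruction. (Hypothetical helper, not part of CCL or GDB.)"""
    return [
        "gdb", kernel_path,
        "-ex", f"source {gdbinit_path}",
        "-ex", f"attach {pid}",
        # The leading asterisk makes GDB treat the number as an address.
        "-ex", f"break *{fault_pc:#x}",
    ]


# Placeholder values taken from the walkthrough above.
argv = gdb_attach_argv(
    "/path/to/ccl/lx86cl64",
    "lisp-kernel/linuxx8664/.gdbinit",
    1234,
    0x12345678,
)
print(shlex.join(argv))
```

The sketch deliberately stops before "continue": letting lisp run and then using x in the kernel debugger are still done by hand, as described above.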
Debugging in GDB
(gdb) bt
will do a C backtrace (at least as far back as the foreign function call from lisp)
(gdb) info registers
will show register values.
(gdb) x/i $pc
disassembles the instruction at the pc/%rip.
If the foreign code has symbolic debugging information and wasn't heavily optimized, you can do a lot more (show argument and local variable values, see argument names and values in backtrace, etc.) at that point. If the problem is in some library code (either in its behavior or in the parameters that lisp is passing it) and it's possible to build the library with debugging enabled and optimization toned down, you'll probably find the problem much more quickly than you would otherwise.
Some Linux distributions provide debugging information and library source for the standard libraries; on Fedora, this information is contained in optional "debuginfo" packages. If it's available, the information is often very useful.
As far as other tips and tricks ... I'm not sure what I could say that'd be meaningful without a long explanation of how the lisp is implemented. The manual actually does explain quite a bit of that. If you want to use GDB to step through/set breakpoints in compiled lisp code it's certainly possible to do that (I do it all the time ...), but explaining the issues and details might take a while. (From GDB's point of view, this is like debugging machine code or debugging C code that you don't have the source to and don't have symbolic information for; it's OK at that and there isn't anything better at it widely available under Linux, but that's not really its primary area of focus.)
Here are some hints for linuxx8664:
To find the address corresponding to a lisp symbol, first tell GDB to call the "find_symbol" function, which walks memory until it finds a symbol with a matching pname and returns the symbol tagged as a vector:
(gdb) call find_symbol("FIND-IF-NOT")
$1 = 52777632305533
You can then look at the slots of the symbol, which are a header followed by the pname, value and function. You have to subtract the tag from the address returned by find_symbol, which is 13 on x8664:
(gdb) x/gx 52777632305533-13 ; subtract fulltag_misc = 13
0x300040069170: 0x0000000000000715 ; header
(gdb)
0x300040069178: 0x00003000000a995d ; pname
(gdb)
0x300040069180: 0x0000000000000012 ; value
(gdb)
0x300040069188: 0x000030004006970f ; function
You can set a breakpoint on entry to the function:
(gdb) br *0x000030004006970f
Note that you don't need to subtract any tags - the code starts right at the address of the function.
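The address arithmetic in the symbol lookup above is easy to get wrong by hand, so here is a small sketch of it. It assumes only what the text states (a tag of 13 and four consecutive 8-byte words: header, pname, value, function); the function and constant names are made up for illustration.

```python
FULLTAG_MISC = 13  # tag value given in the text for x8664
WORD_BYTES = 8     # x/gx displays 8-byte ("giant") words


def symbol_slot_addresses(tagged_symbol):
    """Return the untagged address of each symbol slot shown above.
    (Hypothetical helper; slot order follows the x/gx listing in the text.)"""
    base = tagged_symbol - FULLTAG_MISC
    names = ("header", "pname", "value", "function")
    return {name: base + i * WORD_BYTES for i, name in enumerate(names)}


# Value returned by find_symbol("FIND-IF-NOT") in the example above.
slots = symbol_slot_addresses(52777632305533)
for name, addr in slots.items():
    print(f"{name:8s} {addr:#x}")
```

The addresses it prints should match the left-hand column of the x/gx listing, which is a quick way to check that the right tag was subtracted.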
To enter GDB when lisp is starting up, set a breakpoint at *_SPfuncall, which is called soon after the image is loaded (and is rarely called thereafter, since funcall is inlined).
To cause GC (including the EGC) to run integrity checks on entry, add -DGC_INTEGRITY_CHECKING to the CDEFINES in the kernel Makefile and rebuild the kernel. Alternately you can (setq ccl::*gc-event-status-bits* 4) at any time for the same effect.
If you look at the .gdbinit file, there are a number of useful lisp-related commands defined there. Try them...
The "handle" forms in the .gdbinit file enumerate all of the signals that the lisp handles. The general idea is to say something like:
handle SIGQUIT pass nostop noprint
which tells GDB that if the target process gets a SIGQUIT, it should let the application handle it (GDB should "pass" it to the application) without stopping or printing anything.
A SIGINT by default causes entry to GDB and is not passed to the application. I sometimes find it useful to be able to interrupt the lisp via SIGINT (after entering GDB). Doing something like
handle SIGINT pass stop print
causes GDB to ask for confirmation because "SIGINT is used by the debugger". (It's not used in the same way that breakpoints and single-step exceptions are used, so I usually just sigh and give it the confirmation it craves.)
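To make the shape of these "handle" lines explicit, here is a sketch that builds them from three yes/no choices. The helper name handle_line and its parameters are hypothetical; the keywords (pass/nopass, stop/nostop, print/noprint) are GDB's own, and the only signals used are the two mentioned above.

```python
def handle_line(signal_name, pass_to_program=True, stop=False, announce=False):
    """Format one GDB 'handle' command. (Hypothetical helper.)

    GDB documents that 'stop' implies 'print', so a stopping signal is
    always announced here regardless of the announce argument.
    """
    words = [
        "handle",
        signal_name,
        "pass" if pass_to_program else "nopass",
        "stop" if stop else "nostop",
        "print" if (announce or stop) else "noprint",
    ]
    return " ".join(words)


# The two forms discussed in the text.
quiet_quit = handle_line("SIGQUIT")
loud_int = handle_line("SIGINT", stop=True, announce=True)
print(quiet_quit)
print(loud_int)
```

The defaults match what the .gdbinit file wants for signals the lisp handles itself: pass the signal through, don't stop, don't print.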