wiki:CclUnderGdb

Version 1 (modified by alms, 6 years ago)

--

> Any tips and tricks to share?
>
> What do I do about segfaults?  What about SIG40?

You basically want to tell GDB to load (or "source", as a verb) a file 
that tells it about signals that're handled by the application
and defines some macros (most of which have to do with printing
lisp object values):

(gdb) source ccl/lisp-kernel/linuxx8664/.gdbinit

That file will be sourced automatically if it (or a link to it)
is in the directory you start GDB from (normally the one containing
the executable) or, IIRC, in your home directory.
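
Newer versions of GDB won't automatically source a .gdbinit from the current directory unless that directory has been declared safe. In that case GDB prints a warning at startup instead of loading the file. The fix is to add the directory to GDB's auto-load safe path in the .gdbinit in your home directory, which GDB reads before the local one. The path below is only a placeholder for wherever your ccl directory actually lives:

```gdb
# ~/.gdbinit (sketch; /home/you/ccl is a hypothetical location)
# Allow GDB to auto-load the .gdbinit found in the ccl directory.
add-auto-load-safe-path /home/you/ccl
```

Sourcing the file explicitly with "source", as above, works regardless of this setting.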

I think that the "handle" forms in the .gdbinit file enumerate
all of the signals that the lisp handles; there was a time
last fall when at least one case was missing from the checked-in
.gdbinit file.  The general idea is to say something like:

handle SIGQUIT pass nostop noprint

which tells GDB that if the target process gets a SIGQUIT, it should
let the application handle it (GDB should "pass" it to the application)
without stopping or printing anything.
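
Here is a sketch of what such a group of commands looks like. It only names signals mentioned on this page (SIGSEGV, SIG40 and SIGQUIT). The checked-in .gdbinit for your platform is the authoritative list and may well name others. The "info signals" lines just confirm what GDB will do with each one.

```gdb
# Signals the lisp handles itself: pass them through silently.
handle SIGSEGV pass nostop noprint
handle SIG40 pass nostop noprint
handle SIGQUIT pass nostop noprint

# Show GDB's resulting settings for those signals.
info signals SIGSEGV
info signals SIG40
info signals SIGQUIT
```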

A SIGINT by default causes entry to GDB and is not passed to the
application.  I sometimes find it useful to be able to interrupt
the lisp via SIGINT (after entering GDB).  Doing something like

handle SIGINT pass stop print

causes GDB to ask for confirmation because "SIGINT is used by the
debugger".  (It's not used in the same way that breakpoints and
single-step exceptions are used, so I usually just sigh and give
it the confirmation it craves.)
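
If you'd rather not be asked at the (gdb) prompt, you can turn off GDB's generic "confirm" setting around the "handle" command. This is ordinary GDB, nothing lisp-specific. A sketch:

```gdb
# Temporarily suppress GDB's "are you sure?" queries.
set confirm off
# ^C still stops in GDB; the SIGINT is delivered to the lisp on continue.
handle SIGINT pass stop print
# Restore the default.
set confirm on
```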

As far as other tips and tricks ... I'm not sure what I could
say that'd be meaningful without a long explanation of how the
lisp is implemented.

<http://ccl.clozure.com/manual/chapter16.html#Implementation-Details-of-CCL>

actually does explain quite a bit of that.  If you want to use GDB
to step through/set breakpoints in compiled lisp code it's certainly
possible to do that (I do it all the time ...), but explaining the
issues and details might take a while.  (From GDB's point of view,
this is like debugging machine code or debugging C code that you
don't have the source to and don't have symbolic information for;
it's OK at that and there isn't anything better at it widely available
under Linux, but that's not really its primary area of focus.)
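
Without symbols, stepping in GDB is done one machine instruction at a time. Below is a minimal sketch of the standard commands for that. It assumes GDB has already stopped somewhere in compiled lisp code, and none of it knows anything about lisp objects:

```gdb
# Print the next instruction each time GDB stops.
display/i $pc
# Execute a single machine instruction, following calls.
stepi
# Execute a single machine instruction, treating a call as one step.
nexti
```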

GDB's very good at debugging C code that was compiled with debugging
enabled and for which you have the source code.  (It's even better
if optimization's toned down.)  If you're trying to debug C library
code for which you have the source and for which debugging information
was generated and not stripped, GDB's sort of in its element and
offers lots of useful features.

The lisp wants to handle most of the signals that can be raised to
indicate an exception.  On x86-64 Linux, SIGSEGV can mean lots of
things, and each of those things means something different if it
occurs while you're executing lisp code than it would if you were executing
C ("foreign") code.  If the lisp's exception-handling code doesn't
know how to handle an exception, it enters the kernel debugger (there's
no good and direct way to pass it to another debugger.)  When an
exception occurs in foreign code, the kernel debugger tries to
note that fact; ideally, it would also disable some debugging
commands that only make sense if the exception occurred while
executing lisp code, but it leaves them enabled.  (The "L"
kernel debugger command is very useful for seeing the values
of lisp objects in registers at the point of the exception,
but it will crash or misbehave if those registers don't contain
lisp objects, as they wouldn't if the exception occurred in foreign
code.)

GDB's much more likely to be able to make at least some sense
out of the state of things in the exception-in-foreign-code case
than the lisp's kernel debugger is.  If GDB's already running
(as opposed to having been attached after the fact), you can
do this via the same technique that I described a few weeks ago
(but it's a little easier if you don't have to play "guess
which thread was in the kernel debugger").  The general idea
is:

a) In the kernel debugger do R to display raw (hex) register values
  and note the value in RIP (the program counter/instruction pointer.)
b) Drop into GDB (via ^C) and set a breakpoint at that address.
  If the address is 0x87654321, the GDB command to set that breakpoint
  would be:

(gdb) br *0x87654321

  The leading asterisk is necessary to prevent GDB from interpreting
  the integer as a line number.

c)  Tell GDB to let the interrupted process continue:

(gdb) continue

   The kernel debugger will likely still be waiting for input.
   All other lisp threads should be suspended.

d)  In the kernel debugger, use the "x" command, which exits
   from the kernel debugger and resumes the other threads.
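
Steps (b) and (c) can be combined with GDB's "commands" feature, which attaches a list of commands to the most recently set breakpoint. The backtrace and registers are then printed as soon as that breakpoint is hit. The sketch below reuses the example address. Substitute the RIP value you noted in step (a):

```gdb
# Break at the faulting instruction's address (* means "address").
break *0x87654321
# Run these automatically whenever that breakpoint is hit.
commands
  bt
  info registers
end
continue
```

Step (d) in the kernel debugger is unchanged.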

The next time any thread reaches the address of the breakpoint,
GDB will be entered.  It's hard to guarantee that the first thread
that reaches that point will be the one that got the exception,
but it's usually very likely (other threads usually require some
time to wake up after being suspended.)
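
Since the first thread to reach the breakpoint isn't guaranteed to be the one that got the exception, it can be worth checking which thread GDB actually stopped in. These are standard GDB thread commands. The thread number 2 is only an example:

```gdb
# List all threads; the current one (which hit the breakpoint) is marked with *.
info threads
# Switch to another thread (2 is a placeholder) to inspect it.
thread 2
# Or get a backtrace from every thread at once.
thread apply all bt
```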

In GDB at that point,

(gdb) bt

will print a backtrace (at least as far back as the foreign function
call from lisp).

(gdb) info registers

will show register values.
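
A few other standard GDB commands are often useful at the same point. They only look at the program counter and registers, so they work even without debugging information:

```gdb
# Name the function (if any) that contains the current PC.
info symbol $pc
# Disassemble the faulting instruction and the ones after it.
x/6i $pc
# Show just a few x86-64 registers rather than all of them.
info registers rip rsp rbp
```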

If the foreign code has symbolic debugging information and wasn't
heavily optimized, you can do a lot more (show argument and local
variable values, see argument names and values in backtrace, etc.)
at that point.  If the problem is in some library code (either in
its behavior or in the parameters that lisp is passing it) and
it's possible to build the library with debugging enabled and
optimization toned down, you'll probably find the problem much
more quickly than you would otherwise.
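
When the library does have debugging information, the standard GDB commands for the things mentioned above (argument and local variable values, a more informative backtrace) look like this. The frame number is only an example:

```gdb
# Backtrace that also prints local variables in every frame.
bt full
# Select a frame (1 is a placeholder) and show its arguments and locals.
frame 1
info args
info locals
```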

(If the problem goes away when optimization is disabled, it might be
time to start thinking about replacing the installed GCC at ITA with
something newer. That version of GCC doesn't compile the lisp kernel
correctly at -O2, and it's believable that it might have problems
with other code as well.)