Changes between Initial Version and Version 1 of CclUnderGdb


Timestamp:
May 29, 2008, 10:01:48 PM (12 years ago)
Author:
alms

  • CclUnderGdb

    v1 v1  
     1{{{
     2> Any tips and tricks to share?
     3>
     4> What do I do about segfaults?  What about SIG40?
     5
     6You basically want to tell GDB to load (or "source", as a verb) a file
     7that tells it about signals that're handled by the application
     8and defines some macros (most of which have to do with printing
     9lisp object values)
     10
     11(gdb) source ccl/lisp-kernel/linuxx8664/.gdbinit
     12
     13That file will be sourced automatically if it (or a link to it)
     14is in the same directory as the executable (or, IIRC, in your
     15home directory.)
     16
     17I think that the "handle" forms in the .gdbinit file enumerate
     18all of the signals that the lisp handles; there was a time
     19last fall when at least one case was missing from the checked-in
     20.gdbinit file.  The general idea is to say something like:
     21
     22handle SIGQUIT pass nostop noprint
     23
     24which tells GDB that if the target process gets a SIGQUIT, it should
     25let the application handle it (GDB should "pass" it to the application)
     26without stopping or printing anything.
     27
     28A SIGINT by default causes entry to GDB and is not passed to the
     29application.  I sometimes find it useful to be able to interrupt
     30the lisp via SIGINT (after entering GDB).  Doing something like
     31
     32handle SIGINT pass stop print
     33
     34causes GDB to ask for confirmation because "SIGINT is used by the
     35debugger".  (It's not used in the same way that breakpoints and
     36single-step exceptions are used, so I usually just sigh and give
     37it the confirmation it craves.)
     38
     39As far as other tips and tricks ... I'm not sure what I could
     40say that'd be meaningful without a long explanation of how the
     41lisp is implemented.
     42
     43<http://ccl.clozure.com/manual/chapter16.html#Implementation-Details-of-CCL>
     44
     45actually does explain quite a bit of that.  If you want to use GDB
     46to step through/set breakpoints in compiled lisp code it's certainly
     47possible to do that (I do it all the time ...), but explaining the
     48issues and details might take a while.  (From GDB's point of view,
     49this is like debugging machine code or debugging C code that you
     50don't have the source to and don't have symbolic information for;
     51it's OK at that and there isn't anything better at it widely available
     52under Linux, but that's not really its primary area of focus.)
     53
     54GDB's very good at debugging C code that was compiled with debugging
     55enabled and for which you have the source code.  (It's even better
     56if optimization's toned down.)  If you're trying to debug C library
     57code for which you have the source and for which debugging information
     58was generated and not stripped, GDB's sort of in its element and
     59offers lots of useful features.
     60
     61The lisp wants to handle most of the signals that can be raised to
     62indicate an exception.  On x86-64 Linux, SIGSEGV means lots of
     63things, and those things in turn mean different things when you're
     64executing lisp code when they occur than they would if you were executing
     65C ("foreign") code.  If the lisp's exception-handling code doesn't
     66know how to handle an exception, it enters the kernel debugger (there's
     67no good and direct way to pass it to another debugger.)  When an
     68exception occurs in foreign code, the kernel debugger tries to
     69note that fact; ideally, it would also disable some debugging
     70commands that only make sense if the exception occurred while
     71executing lisp code, but it leaves them enabled.  (The "L"
     72kernel debugger command is very useful for seeing the values
     73of lisp objects in registers at the point of the exception,
     74but it will crash or misbehave if those registers don't contain
     75lisp objects, as they wouldn't if the exception occurred in foreign
     76code.)
     77
     78GDB's much more likely to be able to make at least some sense
     79out of the state of things in the exception-in-foreign-code case
     80than the lisp's kernel debugger is.  If GDB's already running
     81(as opposed to having been attached after the fact), you can
     82do this via the same technique that I described a few weeks ago
     83(but it's a little easier if you don't have to play "guess
     84which thread was in the kernel debugger".)  The general idea
     85is:
     86
     87a) In the kernel debugger do R to display raw (hex) register values
     88  and note the value in RIP (the program counter/instruction pointer.)
     89b) Drop into GDB (via ^C) and set a breakpoint at that address.
     90  If the address is 0x87654321, the GDB command to set that breakpoint
     91  would be:
     92
     93(gdb) br *0x87654321
     94
     95  The leading asterisk is necessary to prevent GDB from interpreting
     96  the integer as a line number.
     97
     98c)  Tell GDB to let the interrupted process continue
     99
     100(gdb) continue
     101
     102   The kernel debugger will likely still be waiting for input.
     103   All other lisp threads should be suspended.
     104
     105d)  In the kernel debugger, use the "x" command, which exits
     106   from the kernel debugger and resumes other threads.
     107
     108The next time any thread reaches the address of the breakpoint,
     109GDB will be entered.  It's hard to guarantee that the first thread
     110that reaches that point will be the one that got the exception,
     111but it's usually very likely (other threads usually require some
     112time to wake up after being suspended.)
     113
     114In GDB at that point,
     115
     116(gdb) bt
     117
     118will do a backtrace (at least as far back as the foreign function
     119call from lisp)
     120
     121(gdb) info registers
     122
     123will show register values.
     124
     125If the foreign code has symbolic debugging information and wasn't
     126heavily optimized, you can do a lot more (show argument and local
     127variable values, see argument names and values in backtrace, etc.)
     128at that point.  If the problem is in some library code (either in
     129its behavior or in the parameters that lisp is passing it) and
     130it's possible to build the library with debugging enabled and
     131optimization toned down, you'll probably find the problem much
     132more quickly than you would otherwise.
     133
     134(If the problem goes away when optimization is disabled, it might be
     135time to start thinking about replacing the installed GCC at ITA with
     136something newer. That version of GCC doesn't compile the lisp kernel
     137correctly at -O2, and it's believable that it might have problems
     138with other code as well.)
     139}}}
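
For quick reference, the commands discussed above can be collected into one rough session sketch. This is only an illustration of the order in which they would be typed at the (gdb) prompt, not a file to source as-is: the address 0x87654321 is the placeholder from the text (use the RIP value reported by the kernel debugger's R command), and the `handle SIGINT` line assumes you answer yes to GDB's confirmation prompt.

```gdb
# Load the signal settings and lisp-printing macros that ship with the lisp kernel.
source ccl/lisp-kernel/linuxx8664/.gdbinit

# Let the lisp handle SIGQUIT itself: pass it through, don't stop, don't print.
handle SIGQUIT pass nostop noprint

# Optional: still stop in GDB on SIGINT, but also deliver it to the lisp.
# GDB asks for confirmation because it uses SIGINT itself.
handle SIGINT pass stop print

# After noting RIP in the kernel debugger, break at that address.
# The leading asterisk marks the number as an address rather than a line number.
break *0x87654321

# Resume the process, then use "x" in the kernel debugger to resume the threads.
continue

# Once the breakpoint is hit again, inspect the foreign-code state.
bt
info registers
```

If the foreign library was built with debugging information, the backtrace at the end should also show argument names and values.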