Timestamp:
Apr 17, 2008, 10:28:07 PM (12 years ago)
Author:
mikel
Message:

additional mechanical and style edits in implementation and modifying

File:
1 edited

  • trunk/source/doc/src/implementation.xml

    r8981 r9176  
    2424    <sect1 id="Threads-and-exceptions">
    2525      <title>Threads and exceptions</title>
     26
    2627      <para>&CCL;'s threads are "native" (meaning that they're
    27       scheduled and controlled by the operating system.)  Most of the
    28       implications of this are discussed elsewhere; this section tries
    29       to describe how threads look from the lisp kernel's perspective
    30       (and especially from the GC's point of view.)</para>
     28        scheduled and controlled by the operating system.)  Most of the
     29        implications of this are discussed elsewhere; this section tries
     30        to describe how threads look from the lisp kernel's perspective
     31        (and especially from the GC's point of view.)</para>
    3132      <para>&CCL;'s runtime system tries to use machine-level
    32       exception mechanisms (conditional traps when available, illegal
    33       instructions, memory access protection in some cases) to detect
    34       and handle ... exceptional situations.  These situations
    35       include some TYPE-ERRORs and PROGRAM-ERRORS (notably
    36       wrong-number-of-args errors), and also include cases like "not
    37       being able to allocate memory without GCing or obtaining more
    38       memory from the OS."  The general idea is that it's usually
    39       faster to pay (very occasional) exception-processing overhead
    40       and figure out what's going on in an exception handler than it
    41       is to maintain enough state and context to handle an exceptional
    42       case via a lighter-weight mechanism when that exceptional case
    43       (by definition) rarely occurs.</para>
     33        exception mechanisms (conditional traps when available,
     34        illegal instructions, memory access protection in some cases)
     35        to detect and handle exceptional situations.  These situations
     36        include some TYPE-ERRORs and PROGRAM-ERRORS (notably
     37        wrong-number-of-args errors), and also include cases like "not
     38        being able to allocate memory without GCing or obtaining more
     39        memory from the OS."  The general idea is that it's usually
     40        faster to pay (very occasional) exception-processing overhead
     41        and figure out what's going on in an exception handler than it
     42        is to maintain enough state and context to handle an
     43        exceptional case via a lighter-weight mechanism when that
     44        exceptional case (by definition) rarely occurs.</para>
    4445      <para>Some emulated execution environments (the Rosetta PPC
    45       emulator on x86 versions of Mac OS X) don't provide accurate
    46       exception information to exception handling functions. &CCL;
    47       can't run in such environments.</para>
     46        emulator on x86 versions of Mac OS X) don't provide accurate
     47        exception information to exception handling functions. &CCL;
     48        can't run in such environments.</para>
    4849
    4950      <sect2 id="The-Thread-Context-Record">
    50         <title>The Thread Context Record</title>
    51 
    52         <para>When a lisp thread is first created (or when a thread
    53         created by foreign code first calls back to lisp), a data
    54         structure called a Thread Context Record (or TCR) is allocated
    55         and initialized.  On modern versions of Linux and FreeBSD, the
    56         allocation actually happens via a set of thread-local-storage
    57         ABI extensions, so a thread's TCR is created when the thread
    58         is created and dies when the thread dies.  (The World's Most
    59         Advanced Operating System - as Apple's marketing literature
    60         refers to Darwin - is not very advanced in this regard, and I
    61         know of no reason to assume that advances will be made in this
    62         area anytime soon.)</para>
     51            <title>The Thread Context Record</title>
     52
     53            <para>When a lisp thread is first created (or when a thread
     54          created by foreign code first calls back to lisp), a data
     55          structure called a Thread Context Record (or TCR) is
     56          allocated and initialized.  On modern versions of Linux and
     57          FreeBSD, the allocation actually happens via a set of
     58          thread-local-storage ABI extensions, so a thread's TCR is
     59          created when the thread is created and dies when the thread
     60          dies.  (The World's Most Advanced Operating System&mdash;as
     61          Apple's marketing literature refers to Darwin&mdash;is not
     62          very advanced in this regard, and I know of no reason to
     63          assume that advances will be made in this area anytime
     64          soon.)</para>
    6365        <para>A TCR contains a few dozen fields (and is therefore a
    64         few hundred bytes in size.)  The fields are mostly
    65         thread-specific information about the thread's stacks'
    66         locations and sizes, information about the underlying (POSIX)
    67         thread, and information about the thread's dynamic binding
    68         history and pending CATCH/UNWIND-PROTECTs.  Some of this
    69         information could be kept in individual machine registers
    70         while the thread is running (and the PPC - which has more
    71         registers available - keeps a few things in registers that the
    72         X86-64 has to access via the TCR), but it's important to
    73         remember that the information is thread-specific and can't
    74         (for instance) be kept in a fixed global memory
    75         location.</para>
     66          few hundred bytes in size.)  The fields are mostly
     67          thread-specific information about the thread's stacks'
     68          locations and sizes, information about the underlying (POSIX)
     69          thread, and information about the thread's dynamic binding
     70          history and pending CATCH/UNWIND-PROTECTs.  Some of this
     71          information could be kept in individual machine registers
     72          while the thread is running (and the PPC - which has more
     73          registers available - keeps a few things in registers that the
     74          X86-64 has to access via the TCR), but it's important to
     75          remember that the information is thread-specific and can't
     76          (for instance) be kept in a fixed global memory
     77          location.</para>
    7678        <para>When lisp code is running, the current thread's TCR is
    77         kept in a register.  On PPC platforms, a general purpose
    78         register is used; on x86-64, an (otherwise nearly useless)
    79         segment register works well (prevents the expenditure of a
    80         more generally useful general- purpose register for this
    81         purpose.)</para>
     79          kept in a register.  On PPC platforms, a general purpose
     80          register is used; on x86-64, an (otherwise nearly useless)
     81          segment register works well (prevents the expenditure of a
     82          more generally useful general-purpose register for this
     83          purpose.)</para>
    8284        <para>The address of a TCR is aligned in memory in such a way
    83         that a FIXNUM can be used to represent it.  The lisp function
    84         CCL::%CURRENT-TCR returns the calling thread's TCR as a
    85         fixnum; actual value of the TCR's address is 4 or 8 times the
    86         value of this fixnum.</para>
     85          that a FIXNUM can be used to represent it.  The lisp function
     86          CCL::%CURRENT-TCR returns the calling thread's TCR as a
     87          fixnum; the actual value of the TCR's address is 4 or 8 times the
     88          value of this fixnum.</para>
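
The relationship between the fixnum returned by CCL::%CURRENT-TCR and the TCR's raw address can be sketched as simple shift arithmetic. This is a minimal illustration, not kernel code: the helper names are hypothetical, and the mapping of the factor 4 to 32-bit platforms and the factor 8 to 64-bit platforms is an assumption that the text above does not state explicitly.

```python
# Assumption: the "4 or 8 times" factor corresponds to 2 or 3 low tag bits,
# on 32-bit and 64-bit platforms respectively. All names are hypothetical.
FIXNUM_SHIFT = {32: 2, 64: 3}


def tcr_address_from_fixnum(fixnum_value: int, word_bits: int) -> int:
    """Recover the raw address from the fixnum form of a TCR pointer."""
    return fixnum_value << FIXNUM_SHIFT[word_bits]


def fixnum_from_tcr_address(address: int, word_bits: int) -> int:
    """Represent a suitably aligned address as a fixnum value."""
    shift = FIXNUM_SHIFT[word_bits]
    if address & ((1 << shift) - 1):
        # Low bits must be zero, otherwise the word would not look like a fixnum.
        raise ValueError("address is not aligned for fixnum representation")
    return address >> shift
```

Because the low bits of an aligned address are zero, the same machine word can be read either as a pointer or as a fixnum without any conversion; the shifts above only describe how the two readings relate numerically.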
    8789        <para>When the lisp kernel initializes a new TCR, it's added
    88         to a global list maintained by the kernel; when a thread
    89         exits, its TCR is removed from this list.</para>
     90          to a global list maintained by the kernel; when a thread
     91          exits, its TCR is removed from this list.</para>
    9092        <para>When a thread calls foreign code, lisp stack pointers
    91         are saved in its TCR, lisp registers (at least those whose
    92         value should be preserved across the call) are saved on the
    93         thread's value stack, and (on x86-64) RSP is switched to the
    94         control stack.  A field in the TCR (tcr.valence) is then set
    95         to indicate that the thread is running foreign code, foreign
    96         argument registers are loaded from a frame on the foreign
    97         stack, and the foreign function is called. (That's a little
    98         oversimplified and possibly inaccurate, but the important
    99         things to note are that the thread "stops following lisp stack
    100         and register usage conventions" and that it advertises the
    101         fact that it's done so.  Similar transitions in a thread's
    102         state ("valence") occur when it enters of exits an exception
    103         handler (which is sort of an OS/hardware-mandated foreign
    104         function call where the OS thoughtfully saves the thread's
    105         register state for it beforehand.)</para>
     93          are saved in its TCR, lisp registers (at least those whose
     94          value should be preserved across the call) are saved on the
     95          thread's value stack, and (on x86-64) RSP is switched to the
     96          control stack.  A field in the TCR (tcr.valence) is then set
     97          to indicate that the thread is running foreign code, foreign
     98          argument registers are loaded from a frame on the foreign
     99          stack, and the foreign function is called. (That's a little
     100          oversimplified and possibly inaccurate, but the important
     101          things to note are that the thread "stops following lisp
     102          stack and register usage conventions" and that it advertises
     103          the fact that it's done so.  Similar transitions in a
     104          thread's state ("valence") occur when it enters or exits an
     105          exception handler (which is sort of an OS/hardware-mandated
     106          foreign function call where the OS thoughtfully saves the
     107          thread's register state for it beforehand.)</para>
    106108      </sect2>
    107109
    108110      <sect2 id="Exception-contexts-comma---and-exception-handling-in-general">
    109         <title>Exception contexts, and exception-handling in general</title>
     111            <title>Exception contexts, and exception-handling in general</title>
    110112        <para>Unix-like OSes tend to refer to exceptions as "signals";
    111         the same general mechanism ("signal handling") is used to
    112         process both asynchronous OS-level events (such as the result
    113         of the keyboard driver noticing that ^C or ^Z has been
    114         pressed) and synchronous hardware-level events (like trying to
    115         execute an illegal instruction or access protected memory.)
    116         It makes some sense to defer ("block") handling of
    117         asynchronous signals so that some critical code sequences
    118         complete without interruption; since it's generally not
    119         possible for a thread to proceed after a synchronous exception
    120         unless and until its state is modified by an exception
    121         handler, it makes no sense to talk about blocking synchronous
    122         signals (though some OSes will let you do so and doing so can
    123         have mysterious effects.)</para>
     113          the same general mechanism ("signal handling") is used to
     114          process both asynchronous OS-level events (such as the result
     115          of the keyboard driver noticing that ^C or ^Z has been
     116          pressed) and synchronous hardware-level events (like trying to
     117          execute an illegal instruction or access protected memory.)
     118          It makes some sense to defer ("block") handling of
     119          asynchronous signals so that some critical code sequences
     120          complete without interruption; since it's generally not
     121          possible for a thread to proceed after a synchronous exception
     122          unless and until its state is modified by an exception
     123          handler, it makes no sense to talk about blocking synchronous
     124          signals (though some OSes will let you do so and doing so can
     125          have mysterious effects.)</para>
    124126        <para>On OSX/Darwin, the POSIX signal handling facilities
    125         coexist with lower-level Mach-based exception handling
    126         facilities.  Unfortunately, the way that this is implemented
    127         interacts poorly with debugging tools: GDB will generally stop
    128         whenever the target program encounters a Mach-level exception
    129         and offers no way to proceed from that point (and let the
    130         program's POSIX signal handler try to handle the exception);
    131         Apple's CrashReporter program has had a similar issue and,
    132         depending on how it's configured, may bombard the user with
    133         alert dialogs which falsely claim that an application has
    134         crashed (when in fact the application in question has
    135         routinely handled a routine exception.)  On Darwin/OSX,
    136         &CCL; uses Mach thread-level exception handling facilities
    137         which run before GDB or CrashReporter get a chance to confuse
    138         themselves; &CCL;'s Mach exception handling tries to force
    139         the thread which received a synchronous exception to invoke a
    140         signal handling function ("as if" signal handling worked more
    141         usefully under Darwin.)  Mach exception handlers run in a
    142         dedicated thread (which basically does nothing but wait for
    143         exception messages from the lisp kernel, obtain and modify
    144         information about the state of threads in which exceptions
    145         have occurred, and reply to the exception messages with an
    146         indication that the exception has been handled.  The reply
    147         from a thread-level exception handler keeps the exception from
    148         being reported to GDB or CrashReporter and avoids the problems
    149         related to those programs.  Since &CCL;'s Mach exception
    150         handler doesn't claim to handle debugging-related exceptions
    151         (from breakpoints or single-step operations), it's possible to
    152         use GDB to debug &CCL;.</para>
     127          coexist with lower-level Mach-based exception handling
     128          facilities.  Unfortunately, the way that this is implemented
     129          interacts poorly with debugging tools: GDB will generally stop
     130          whenever the target program encounters a Mach-level exception
     131          and offers no way to proceed from that point (and let the
     132          program's POSIX signal handler try to handle the exception);
     133          Apple's CrashReporter program has had a similar issue and,
     134          depending on how it's configured, may bombard the user with
     135          alert dialogs which falsely claim that an application has
     136          crashed (when in fact the application in question has
     137          routinely handled a routine exception.)  On Darwin/OSX,
     138          &CCL; uses Mach thread-level exception handling facilities
     139          which run before GDB or CrashReporter get a chance to confuse
     140          themselves; &CCL;'s Mach exception handling tries to force
     141          the thread which received a synchronous exception to invoke a
     142          signal handling function ("as if" signal handling worked more
     143          usefully under Darwin.)  Mach exception handlers run in a
     144          dedicated thread (which basically does nothing but wait for
     145          exception messages from the lisp kernel, obtain and modify
     146          information about the state of threads in which exceptions
     147          have occurred, and reply to the exception messages with an
     148          indication that the exception has been handled.)  The reply
     149          from a thread-level exception handler keeps the exception from
     150          being reported to GDB or CrashReporter and avoids the problems
     151          related to those programs.  Since &CCL;'s Mach exception
     152          handler doesn't claim to handle debugging-related exceptions
     153          (from breakpoints or single-step operations), it's possible to
     154          use GDB to debug &CCL;.</para>
    153155        <para>On platforms where signal handling and debugging don't
    154         get in each other's way, a signal handler is entered with all
    155         signals blocked.  (This behavior is specified in the call to
    156         the sigaction() function which established the signal
    157         handler.)  The signal handler receives three arguments from
    158         the OS kernel; the first is an integer which identifies the
    159         signal, the second is a pointer to an object of type
    160         "siginfo_t", which may or may not contain a few fields that
    161         would help to identify the cause of the exception, and the
    162         third argument is a pointer to a data structure (called a
    163         "ucontext" or something similar) which contains
    164         machine-dependent information about the state of the tread at
    165         the time that the exception/signal occurred.  While
    166         asynchronous signals are blocked, the signal handler stores
    167         the pointer to its third argument (the "signal context") in a
    168         field in the current thread's TCR, sets some bits in another
    169         TCR field to indicate that the thread is now waiting to handle
    170         an exception, unblocks asynchronous signals, and waits for a
    171         global exception lock which serializes exception
    172         processing.</para>
     156          get in each other's way, a signal handler is entered with
     157          all signals blocked.  (This behavior is specified in the
     158          call to the sigaction() function which established the
     159          signal handler.)  The signal handler receives three
     160          arguments from the OS kernel; the first is an integer that
     161          identifies the signal, the second is a pointer to an object
     162          of type "siginfo_t", which may or may not contain a few
     163          fields that would help to identify the cause of the
     164          exception, and the third argument is a pointer to a data
     165          structure (called a "ucontext" or something similar), which
     166          contains machine-dependent information about the state of
     167          the thread at the time that the exception/signal occurred.
     168          While asynchronous signals are blocked, the signal handler
     169          stores the pointer to its third argument (the "signal
     170          context") in a field in the current thread's TCR, sets some
     171          bits in another TCR field to indicate that the thread is now
     172          waiting to handle an exception, unblocks asynchronous
     173          signals, and waits for a global exception lock that
     174          serializes exception processing.</para>
    173175        <para>On Darwin, the Mach exception thread creates a signal
    174         context (and maybe a siginfo_t structure), stores the signal
    175         context in the thread's TCR, sets the TCR field which describes
    176         the thread's state, and arranges that the thread resume
    177         execution at its signal handling function (with a signal
    178         handler, possibly NULL siginfo_t, and signal context as
    179         arguments.  When the thread resumes, it waits for the global
    180         exception lock.</para>
     176          context (and maybe a siginfo_t structure), stores the signal
     177          context in the thread's TCR, sets the TCR field which describes
     178          the thread's state, and arranges that the thread resume
     179          execution at its signal handling function (with a signal
     180          handler, possibly NULL siginfo_t, and signal context as
     181          arguments.)  When the thread resumes, it waits for the global
     182          exception lock.</para>
    181183        <para>On x86-64 platforms where signal handling can be used to
    182         handle synchronous exceptions, there's an additional
    183         complication: the OS kernel ordinarily allocates the signal
    184         context and siginfo structures on the stack of the thread
    185         which received the signal; in practice, that means "wherever
    186         RSP is pointing."  &CCL;'s <xref
    187         linkend="Register-and-stack-usage-conventions"/> require that
    188         the thread's value stack - where RSP is usually pointing while
    189         lisp code is running - contain only "nodes" (properly tagged
    190         lisp objects), and scribbling a signal context all over the
    191         value stack would violate this requirement.  To maintain
    192         consistency, the sigaltstack() mechanism is used to cause the
    193         signal to be delivered on (and the signal context and siginfo
    194         to be allocated on) a special stack area (the last few pages
    195         of the thread's control stack, in practice.  When the signal
    196         handler runs, it (carefully) copies the signal context and
    197         siginfo to the thread's control stack and makes RSP point into
    198         that stack before invoking the "real" signal handler.  (The
    199         effect of this hack is that the "real" signal handler always
    200         runs on the thread's control stack.)</para>
     184          handle synchronous exceptions, there's an additional
     185          complication: the OS kernel ordinarily allocates the signal
     186          context and siginfo structures on the stack of the thread
     187          that received the signal; in practice, that means "wherever
     188          RSP is pointing."  &CCL;'s
     189          <xref linkend="Register-and-stack-usage-conventions"/>
     190          require that the thread's value stack&mdash;where RSP is
     191          usually pointing while lisp code is running&mdash;contain
     192          only "nodes" (properly tagged lisp objects), and scribbling
     193          a signal context all over the value stack would violate this
     194          requirement.  To maintain consistency, the sigaltstack()
     195          mechanism is used to cause the signal to be delivered on
     196          (and the signal context and siginfo to be allocated on) a
     197          special stack area (the last few pages of the thread's
     198          control stack, in practice).  When the signal handler runs,
     199          it (carefully) copies the signal context and siginfo to the
     200          thread's control stack and makes RSP point into that stack
     201          before invoking the "real" signal handler. The effect of
     202          this hack is that the "real" signal handler always runs on
     203          the thread's control stack.</para>
    201204        <para>Once the exception handler has obtained the global
    202         exception lock, it uses the values of the signal number,
    203         siginfo_t, and signal context arguments to determine the
    204         (logical) cause of the exception.  Some exceptions may be
    205         caused by factors that should generate lisp errors or other
    206         serious conditions (stack overflow); if this is the case, the
    207         kernel code may release the global exception lock and call out
    208         to lisp code.  (The lisp code in question may need to repeat
    209         some of the exception decoding process; in particular, it
    210         needs to be able to interpret register values in the signal
    211         context that it receives as an argument.)</para>
     205          exception lock, it uses the values of the signal number,
     206          siginfo_t, and signal context arguments to determine the
     207          (logical) cause of the exception.  Some exceptions may be
     208          caused by factors that should generate lisp errors or other
     209          serious conditions (stack overflow); if this is the case, the
     210          kernel code may release the global exception lock and call out
     211          to lisp code.  (The lisp code in question may need to repeat
     212          some of the exception decoding process; in particular, it
     213          needs to be able to interpret register values in the signal
     214          context that it receives as an argument.)</para>
    212215        <para>In some cases, the lisp kernel exception handler may not
    213         be able to recover from the exception (this is currently true
    214         of some types of memory-access fault and is also true of traps
    215         or illegal instructions that occur during foreign code
    216         execution.  In such cases, the kernel exception handler
    217         reports the exception as "unhandled", and the kernel debugger
    218         is invoked.</para>
     216          be able to recover from the exception (this is currently true
     217          of some types of memory-access fault and is also true of traps
     218          or illegal instructions that occur during foreign code
     219          execution.)  In such cases, the kernel exception handler
     220          reports the exception as "unhandled", and the kernel debugger
     221          is invoked.</para>
    219222        <para>If the kernel exception handler identifies the
    220         exception's cause as being a transient out-of-memory condition
    221         (indicating that the current thread needs more memory to cons
    222         in), it tries to make that memory available.  In some cases,
    223         doing so involves invoking the GC.</para>
     223          exception's cause as being a transient out-of-memory condition
     224          (indicating that the current thread needs more memory to cons
     225          in), it tries to make that memory available.  In some cases,
     226          doing so involves invoking the GC.</para>
    224227      </sect2>
    225228
    226229      <sect2 id="Threads-comma---exceptions-comma---and-the-GC">
    227         <title>Threads, exceptions, and the GC</title>
     230            <title>Threads, exceptions, and the GC</title>
    228231        <para>&CCL;'s GC is not concurrent: when the GC is invoked in
    229         response to an exception in a particular thread, all other
    230         lisp threads must stop until the GC's work is done.  The
    231         thread that triggered the GC iterates over the global TCR
    232         list, sending each other thread a distinguished "suspend"
    233         signal, then iterates over the list again, waiting for a
    234         per-thread semaphore that indicates that the thread has
    235         received the "suspend" signal and responded appropriately.
    236         Once all other threads have acknowledged the request to
    237         suspend themselves, the GC thread can run the GC proper (after
    238         doing any necessary <xref linkend="PC-lusering"/>.)  Once the
    239         GC's completed its work, the thread that invoked the GC
    240         iterates over the global TCR list, raising a per-thread
    241         "resume" semaphore for each other thread.</para>
     232          response to an exception in a particular thread, all other
     233          lisp threads must stop until the GC's work is done.  The
     234          thread that triggered the GC iterates over the global TCR
     235          list, sending each other thread a distinguished "suspend"
     236          signal, then iterates over the list again, waiting for a
     237          per-thread semaphore that indicates that the thread has
     238          received the "suspend" signal and responded appropriately.
     239          Once all other threads have acknowledged the request to
     240          suspend themselves, the GC thread can run the GC proper (after
     241          doing any necessary <xref linkend="PC-lusering"/>.)  Once the
     242          GC's completed its work, the thread that invoked the GC
     243          iterates over the global TCR list, raising a per-thread
     244          "resume" semaphore for each other thread.</para>
    242245        <para>The signal handler for the asynchronous "suspend" signal
    243         is entered with all asynchronous signals blocked.  It saves
    244         its signal-context argument in a TCR slot, raises the tcr's
    245         "suspend" semaphore, then waits on the TCR's "resume"
    246         semaphore.</para>
     246          is entered with all asynchronous signals blocked.  It saves
     247          its signal-context argument in a TCR slot, raises the TCR's
     248          "suspend" semaphore, then waits on the TCR's "resume"
     249          semaphore.</para>
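
The suspend/resume handshake described above can be modelled with ordinary semaphores. The sketch below is a toy model under stated assumptions, not the lisp kernel's implementation: the `TCR` class and all function names are hypothetical, and because Python cannot deliver a signal to a specific thread, a polled `threading.Event` stands in for the asynchronous "suspend" signal.

```python
import threading
import time


class TCR:
    """Hypothetical stand-in for a Thread Context Record."""

    def __init__(self, name):
        self.name = name
        self.suspend_requested = threading.Event()  # stands in for the "suspend" signal
        self.suspend = threading.Semaphore(0)       # raised by the suspended thread
        self.resume = threading.Semaphore(0)        # raised by the GC thread
        self.saved_context = None


def handle_suspend(tcr, context):
    """Model of the "suspend" handler: save context, acknowledge, wait."""
    tcr.saved_context = context
    tcr.suspend.release()
    tcr.resume.acquire()


def mutator(tcr, stop):
    """A worker that polls for the suspend request between units of work."""
    steps = 0
    while not stop.is_set():
        steps += 1
        if tcr.suspend_requested.is_set():
            tcr.suspend_requested.clear()
            handle_suspend(tcr, {"steps": steps})
        time.sleep(0.001)


def stop_the_world_and_collect(all_tcrs, self_tcr, collect):
    """Suspend every other thread, run collect(), then resume them."""
    others = [t for t in all_tcrs if t is not self_tcr]
    for t in others:              # first pass: request suspension
        t.suspend_requested.set()
    for t in others:              # second pass: wait for each acknowledgement
        t.suspend.acquire()
    result = collect(others)      # every other thread is now parked
    for t in others:              # final pass: let them run again
        t.resume.release()
    return result


def demo():
    stop = threading.Event()
    workers = [TCR(f"worker-{i}") for i in range(3)]
    gc_tcr = TCR("gc")
    threads = [threading.Thread(target=mutator, args=(t, stop)) for t in workers]
    for th in threads:
        th.start()
    contexts = stop_the_world_and_collect(
        workers + [gc_tcr], gc_tcr,
        lambda others: [t.saved_context for t in others])
    stop.set()
    for th in threads:
        th.join()
    return contexts
```

Calling `demo()` returns the contexts saved by the three workers at the moment each one parked itself. The two separate passes over the list mirror the text: requests are sent to every thread before any acknowledgement is awaited, so the threads can stop in parallel.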
    247250        <para>The GC thread has access to the signal contexts of all
    248         TCRs (including its own) at the time when the thread received
    249         an exception or acknowledged a request to suspend itself.
    250         This information (and information about stack areas in the TCR
    251         itself) allows the GC to identify the "stack locations and
    252         register contents" that are elements of the GC's root
    253         set.</para>
     251          TCRs (including its own) at the time when the thread received
     252          an exception or acknowledged a request to suspend itself.
     253          This information (and information about stack areas in the TCR
     254          itself) allows the GC to identify the "stack locations and
     255          register contents" that are elements of the GC's root
     256          set.</para>
    254257      </sect2>
    255258
    256259      <sect2 id="PC-lusering">
    257         <title>PC-lusering</title>
     260            <title>PC-lusering</title>
    258261        <para>It's not quite accurate to say that &CCL;'s compiler
    259         and runtime follow precise stack and register usage
    260         conventions at all times; there are a few exceptions:</para>
    261 
    262         <itemizedlist>
     262          and runtime follow precise stack and register usage
     263          conventions at all times; there are a few exceptions:</para>
     264
     265            <itemizedlist>
    263266          <listitem>
    264             <para>On both PPC and x86-64 platforms, consing isn't
    265             fully atomic.  It takes at least a few instructions to
    266             allocate an object in memory (and slap a header on it if
    267             necessary); if a thread is interrupted in the middle of
    268             that instruction sequence, the new object may or may
    269             not have been created or fully initialized at the point in
    270             time that the interrupt occurred.  (There are actually a
    271             few different states of partial initialization.)</para>
    272           </listitem>
    273           <listitem>
    274             <para>On the PPC, the common act of building a lisp
    275             control stack frame involves allocating a four-word frame
    276             and storing three register values into that frame.  (The
    277             fourth word - the back pointer to the previous frame - is
    278             automatically set when the frame is allocated.)  The
    279             previous contents of those three words are unknown (there
    280             might have been a foreign stack frame at the same address a
    281             few instructions earlier), so interrupting a thread that's
    282             in the process of initializing a PPC control stack frame
    283             isn't GC-safe.</para>
    284           </listitem>
     267                <para>On both PPC and x86-64 platforms, consing isn't
     268                  fully atomic.  It takes at least a few instructions to
     269                  allocate an object in memory (and slap a header on it if
     270                  necessary); if a thread is interrupted in the middle of
     271                  that instruction sequence, the new object may or may
     272                  not have been created or fully initialized at the point in
     273                  time that the interrupt occurred.  (There are actually a
     274                  few different states of partial initialization.)</para>
     275              </listitem>
     276              <listitem>
     277                <para>On the PPC, the common act of building a lisp
     278                  control stack frame involves allocating a four-word frame
     279                  and storing three register values into that frame.  (The
     280                  fourth word - the back pointer to the previous frame - is
     281                  automatically set when the frame is allocated.)  The
     282                  previous contents of those three words are unknown (there
     283                  might have been a foreign stack frame at the same address a
     284                  few instructions earlier), so interrupting a thread that's
     285                  in the process of initializing a PPC control stack frame
     286                  isn't GC-safe.</para>
     287              </listitem>
    285288          <listitem>
    286             <para>There are similar problems with the initialization
    287             of temp stackframes on the PPC.  (Allocation and
    288             initialization don't happen atomically, and the newly
    289             allocated stack memory may have undefined contents.)</para>
    290           </listitem>
     289                <para>There are similar problems with the initialization
     290                  of temp stackframes on the PPC.  (Allocation and
     291                  initialization don't happen atomically, and the newly
     292                  allocated stack memory may have undefined contents.)</para>
     293              </listitem>
    291294          <listitem>
    292             <para><xref linkend="The-ephemeral-GC"/>'s write barrier
    293             has to be implemented atomically (i.e., both an
    294             intergenerational store and the update of a
    295             corresponding reference bit have to happen without
    296             interruption, or neither of these events can
    297             happen.)</para>
    298           </listitem>
     295                <para><xref linkend="The-ephemeral-GC"/>'s write barrier
     296                  has to be implemented atomically (i.e., both an
     297                  intergenerational store and the update of a
     298                  corresponding reference bit have to happen without
     299                  interruption, or neither of these events can
     300                  happen.)</para>
     301              </listitem>
    299302          <listitem>
    300             <para>There are a few more similar cases.</para>
    301           </listitem>
     303                <para>There are a few more similar cases.</para>
     304              </listitem>
    302305        </itemizedlist>
    303306
    304307        <para>Fortunately, the number of these non-atomic instruction
    305         sequences is small, and fortunately it's fairly easy for the
    306         interrupting thread to recognize when the interrupted thread
    307         is in the middle of such a sequence.  When this is detected,
    308         the interrupting thread modifies the state of the interrupted
    309         thread (modifying its PC and other registers) so that it is no
    310         longer in the middle of such a sequence (it's either backed
    311         out of it or the remaining instructions are emulated.)</para>
     308          sequences is small, and fortunately it's fairly easy for the
     309          interrupting thread to recognize when the interrupted thread
     310          is in the middle of such a sequence.  When this is detected,
     311          the interrupting thread modifies the state of the interrupted
     312          thread (modifying its PC and other registers) so that it is no
     313          longer in the middle of such a sequence (it's either backed
     314          out of it or the remaining instructions are emulated.)</para>
    312315        <para>This works because (a) many of the troublesome
    313         instruction sequences are PPC-specific and it's relatively
    314         easy to partially disassemble the instructions surrounding the
    315         interrupted thread's PC on the PPC and (b) those instruction
    316         sequences are heavily stylized and intended to be easily
    317         recognized.</para>
     316          instruction sequences are PPC-specific and it's relatively
     317          easy to partially disassemble the instructions surrounding the
     318          interrupted thread's PC on the PPC and (b) those instruction
     319          sequences are heavily stylized and intended to be easily
     320          recognized.</para>
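
The "back out" case above can be sketched as follows. This is only an illustration under loud assumptions: the text says the real recognition works by partially disassembling the instructions around the interrupted PC, whereas this sketch swaps in a hypothetical table of address ranges, and it assumes that restarting a sequence from its first instruction is always safe. The function name and the table layout are invented for the example.

```python
def back_out(pc, sequences):
    """Return the PC at which an interrupted thread should be resumed.

    `sequences` is a hypothetical list of (start, end) address pairs, where
    `start` is the first instruction of a non-atomic sequence and `end` is
    the address of the first instruction after it.
    """
    for start, end in sequences:
        # Strictly inside the range: the sequence has begun but not finished,
        # so rewind to its first instruction.
        if start < pc < end:
            return start
    # At a sequence boundary or outside every sequence: nothing to fix.
    return pc
```

A PC equal to `start` is left alone because no instruction of the sequence has executed yet; a PC equal to `end` is left alone because the sequence has completed.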
    318321      </sect2>
    319322    </sect1>
     
    321324    <sect1 id="Register-usage-and-tagging">
    322325      <title>Register usage and tagging</title>
    323 
     326     
    324327      <sect2 id="Register-usage-and-tagging-overview">
    325         <title>Overview</title>
    326         <para>Regardless of other details of its implementation, a
    327         garbage collector's job is to partition the set of all
    328         heap-allocated lisp objects (CONSes, STRINGs, INSTANCEs, etc.)
    329         into two subsets.  The first subset contains all objects that
    330         are transitively referenced from a small set of "root" objects
    331         (the contents of the stacks and registers of all active
    332         threads at the time the GC occurs and the values of some
    333         global variables.)  The second subset contains everything
    334         else: those lisp objects that are not transitively reachable
    335         from the roots are garbage, and the memory occupied by garbage
    336         objects can be reclaimed (since the GC has just proven that
    337         it's impossible to reference them.)</para>
     328            <title>Overview</title>
     329            <para>Regardless of other details of its implementation, a
     330              garbage collector's job is to partition the set of all
     331              heap-allocated lisp objects (CONSes, STRINGs, INSTANCEs, etc.)
     332              into two subsets.  The first subset contains all objects that
     333              are transitively referenced from a small set of "root" objects
     334              (the contents of the stacks and registers of all active
     335              threads at the time the GC occurs and the values of some
     336              global variables.)  The second subset contains everything
     337              else: those lisp objects that are not transitively reachable
     338              from the roots are garbage, and the memory occupied by garbage
     339              objects can be reclaimed (since the GC has just proven that
     340              it's impossible to reference them.)</para>
    338341        <para>The set of live, reachable lisp objects basically forms
    339         the nodes of a (usually large) graph, with edges from each
    340         node A to any other objects (nodes) that object A
    341         references.</para>
     342          the nodes of a (usually large) graph, with edges from each
     343          node A to any other objects (nodes) that object A
     344          references.</para>
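
The partition into reachable objects and garbage can be sketched as a graph traversal from the roots. This is a minimal Python illustration, not a description of how &CCL;'s GC is implemented: the heap is modelled as a hypothetical dictionary mapping each object's name to the list of things it references, and anything that is not a key of that dictionary is treated as a non-heap value and skipped.

```python
def partition(heap, roots):
    """Split the heap into (live, garbage) sets of object names.

    `heap` maps an object name to the list of values that object refers to;
    `roots` is an iterable of values found in registers, stacks, and globals.
    """
    live = set()
    pending = [r for r in roots if r in heap]  # ignore non-heap root values
    while pending:
        obj = pending.pop()
        if obj in live:
            continue
        live.add(obj)
        # Follow outgoing edges, skipping values that are not heap objects.
        pending.extend(ref for ref in heap[obj] if ref in heap)
    return live, set(heap) - live
```

For example, with `heap = {"a": ["b", 42], "b": ["a"], "c": ["a"], "d": []}` and roots `["a", 7]`, objects `a` and `b` are live, while `c` and `d` are garbage even though `c` refers to a live object.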
    342345        <para>Some nodes in this graph can never have outgoing edges:
    343         an array with a specialized numeric or character type usually
    344         represents its elements in some (possibly more compact)
    345         specialized way.  Some nodes may refer to lisp objects that
    346         are never allocated in memory (FIXNUMs, CHARACTERs,
    347         SINGLE-FLOATs on 64-bit platforms ..)  This latter class of
    348         objects is sometimes called "immediates", but that's a little
    349         confusing because the term "immediate" is sometimes used to
    350         refer to things that can never be part of the big connectivity
    351         graph (e.g., the "raw" bits that make up a floating-point
    352         value, foreign address, or numeric value that needs to be used
    353         - at least fleetingly - in compiled code.)</para>
     346          an array with a specialized numeric or character type usually
     347          represents its elements in some (possibly more compact)
     348          specialized way.  Some nodes may refer to lisp objects that
     349          are never allocated in memory (FIXNUMs, CHARACTERs,
     350          SINGLE-FLOATs on 64-bit platforms ..)  This latter class of
     351          objects is sometimes called "immediates", but that's a little
     352          confusing because the term "immediate" is sometimes used to
     353          refer to things that can never be part of the big connectivity
     354          graph (e.g., the "raw" bits that make up a floating-point
     355          value, foreign address, or numeric value that needs to be used
     356          - at least fleetingly - in compiled code.)</para>
    354357        <para>For the GC to be able to build the connectivity graph
    355         reliably, it's necessary for it to be able to reliably tell
    356         (a) whether or not a "potential root" - the contents of a
    357         machine register or stack location - is in fact a node and (b)
    358         for any node, whether it may have components that refer to
    359         other nodes.</para>
     358          reliably, it's necessary for it to be able to reliably tell
     359          (a) whether or not a "potential root" - the contents of a
     360          machine register or stack location - is in fact a node and (b)
     361          for any node, whether it may have components that refer to
     362          other nodes.</para>
    360363        <para>There's no reliable way to answer the first question on
    361         stock hardware.  (If everything was a node, as might be the
    362         case on specially microcoded "lisp machine" hardware, it
    363         wouldn't even need to be asked.)  Since there's no way to just
    364         look at a machine word (the contents of a machine register or
    365         stack location) and tell whether or not it's a node or just
    366         some random non-node value, we have to either adopt and
    367         enforce strict conventions on register and stack usage or
    368         tolerate ambiguity.</para>
     364          stock hardware.  (If everything was a node, as might be the
     365          case on specially microcoded "lisp machine" hardware, it
     366          wouldn't even need to be asked.)  Since there's no way to just
     367          look at a machine word (the contents of a machine register or
     368          stack location) and tell whether or not it's a node or just
     369          some random non-node value, we have to either adopt and
     370          enforce strict conventions on register and stack usage or
     371          tolerate ambiguity.</para>
    369372        <para>"Tolerating ambiguity" is an approach taken by some
    370         ("conservative") GC schemes; by contrast, &CCL;'s GC is
    371         "precise", which in this case means that it believes that the
    372         contents of certain machine registers and stack locations are
    373         always nodes and that other registers and stack locations are
    374         never nodes and that these conventions are never violated by
    375         the compiler or runtime system.  The fact that threads are
    376         preemptively scheduled means that a GC could occur (because of
    377         activity in some other thread) on any instruction boundary,
    378         which in turn means that the compiler and runtime system must
    379         follow precise <xref
    380         linkend="Register-and-stack-usage-conventions"/> at all
    381         times.</para>
     373          ("conservative") GC schemes; by contrast, &CCL;'s GC is
     374          "precise", which in this case means that it believes that the
     375          contents of certain machine registers and stack locations are
     376          always nodes and that other registers and stack locations are
     377          never nodes and that these conventions are never violated by
     378          the compiler or runtime system.  The fact that threads are
     379          preemptively scheduled means that a GC could occur (because of
     380          activity in some other thread) on any instruction boundary,
     381          which in turn means that the compiler and runtime system must
     382          follow precise <xref
     383                            linkend="Register-and-stack-usage-conventions"/> at all
     384          times.</para>
    382385        <para>Once we've decided that a given machine word is a node,
    383         a <xref linkend="Tagging-scheme"/> describes how the node's
    384         value and type are encoded in that machine word.</para>
    385         <para>Most of this - so far - has discussed things from the
    386         GC's very low-level perspective.  From a much higher point of
    387         view, lisp functions accept nodes as arguments, return nodes
    388         as values, and (usually) perform some operations on those
    389         arguments in order to produce those results.  (In many cases,
    390         the operations in question involve raw non-node values.)
    391         Higher-level parts of the lisp type system (functions like
    392         TYPE-OF and CLASS-OF, etc.) depend on the <xref
    393         linkend="Tagging-scheme"/>.</para>
     386          a <xref linkend="Tagging-scheme"/> describes how the node's
     387          value and type are encoded in that machine word.</para>
     388        <para>Most of this discussion&mdash;so far&mdash;has treated
     389          things from the GC's very low-level perspective. From a much
     390          higher point of view, lisp functions accept nodes as
     391          arguments, return nodes as values, and (usually) perform
     392          some operations on those arguments in order to produce those
     393          results.  (In many cases, the operations in question involve
     394          raw non-node values.)  Higher-level parts of the lisp type
     395          system (functions like TYPE-OF and CLASS-OF, etc.) depend on
     396          the <xref linkend="Tagging-scheme"/>.</para>
    394397      </sect2>
    395398
    396399      <sect2 id="pc-locatives-on-the-PPC">
    397         <title>pc-locatives on the PPC</title>
     400            <title>pc-locatives on the PPC</title>
    398401        <para>On the PPC, there's a third case (besides "node" and
    399         "immediate" values).  As discussed below, a node that denotes
    400         a memory-allocated lisp object is a biased (tagged) pointer
    401         -to- that object; it's not generally possible to point -into-
    402         some composite (multi-element) object (such a pointer would
    403         not be a node, and the GC would have no way to update the
    404         pointer if it were to move the underlying object.)</para>
     402          "immediate" values).  As discussed below, a node that denotes
     403          a memory-allocated lisp object is a biased (tagged) pointer
     404          -to- that object; it's not generally possible to point -into-
     405          some composite (multi-element) object (such a pointer would
     406          not be a node, and the GC would have no way to update the
     407          pointer if it were to move the underlying object.)</para>
    405408        <para>Such a pointer ("into" the interior of a heap-allocated
    406         object) is often called a <emphasis>locative</emphasis>; the
    407         cases where locatives are allowed in &CCL; mostly involve
    408         the behavior of function call and return instructions.  (To be
    409         technically accurate, the other case also arises on x86-64, but
    410         that case isn't as user-visible.)</para>
     409          object) is often called a <emphasis>locative</emphasis>; the
     410          cases where locatives are allowed in &CCL; mostly involve
     411          the behavior of function call and return instructions.  (To be
     412          technically accurate, the other case also arises on x86-64, but
     413          that case isn't as user-visible.)</para>
    411414        <para>On the PowerPC (both PPC32 and PPC64), all machine
    412         instructions are 32 bits wide and all instruction words are
    413         allocated on 32-bit boundaries.  In PPC &CCL;, a CODE-VECTOR
    414         is a specialized type of vector-like object; its elements are
    415         32-bit PPC machine instructions.  A CODE-VECTOR is an
    416         attribute of FUNCTION object; a function call involves
    417         accessing the function's code-vector and jumping to the
    418         address of its first instruction.</para>
     415          instructions are 32 bits wide and all instruction words are
     416          allocated on 32-bit boundaries.  In PPC &CCL;, a CODE-VECTOR
     417          is a specialized type of vector-like object; its elements
     418          are 32-bit PPC machine instructions.  A CODE-VECTOR is an
     419          attribute of a FUNCTION object; a function call involves
     420          accessing the function's code-vector and jumping to the
     421          address of its first instruction.</para>
    419422        <para>As each instruction in the code vector sequentially
    420         executes, the hardware program counter (PC) register advances
    421         to the address of the next instruction (a locative into the
    422         code vector); since PPC instructions are always 32 bits wide
    423         and aligned on 32-bit boundaries, the low two bits of the PC
    424         are always 0.  If the function executes a call (simple call
    425         instructions have the mnemonic "bl" on the PPC, which stands
    426         for "branch and link"), the address of the next instruction
    427         (also a word-aligned locative into a code-vector) is copied
    428         into the special-purpose PPC "link register" (lr); a function
    429         returns to its caller via a "branch to link register" (blr)
    430         instruction.  Some cases of function call and return might
    431         also use the PPC's "count register" (ctr), and if either the
    432         lr or ctr needs to be stored in memory it needs to first be
    433         copied to a general-purpose register.</para>
     423          executes, the hardware program counter (PC) register advances
     424          to the address of the next instruction (a locative into the
     425          code vector); since PPC instructions are always 32 bits wide
     426          and aligned on 32-bit boundaries, the low two bits of the PC
     427          are always 0.  If the function executes a call (simple call
     428          instructions have the mnemonic "bl" on the PPC, which stands
     429          for "branch and link"), the address of the next instruction
     430          (also a word-aligned locative into a code-vector) is copied
     431          into the special-purpose PPC "link register" (lr); a function
     432          returns to its caller via a "branch to link register" (blr)
     433          instruction.  Some cases of function call and return might
     434          also use the PPC's "count register" (ctr), and if either the
     435          lr or ctr needs to be stored in memory it needs to first be
     436          copied to a general-purpose register.</para>
    434437        <para>&CCL;'s GC understands that certain registers contain
    435         these special "pc-locatives" (locatives that point into
    436         CODE-VECTOR objects); it contains special support for finding
    437         the containing CODE-VECTOR object and for adjusting all of
    438         these "pc-locatives" if the containing object is moved in
    439         memory.  The first part of that - finding the containing
    440         object - is possible and practical on the PPC because of
    441         architectural artifacts (fixed-width instructions and arcana
    442         of instruction encoding.)  It's not possible on x86-64, but
    443         fortunately not necessary either (though the second part -
    444         adjusting the PC/RIP when the containing object moves) is both
    445         necessary and simple.</para>
     438          these special "pc-locatives" (locatives that point into
     439          CODE-VECTOR objects); it contains special support for
     440          finding the containing CODE-VECTOR object and for adjusting
     441          all of these "pc-locatives" if the containing object is
     442          moved in memory.  The first part of that
     443          operation&mdash;finding the containing object&mdash;is
     444          possible and practical on the PPC because of architectural
     445          artifacts (fixed-width instructions and arcana of
     446          instruction encoding.)  It's not possible on x86-64, but
     447          fortunately not necessary either (though the second part -
     448          adjusting the PC/RIP when the containing object moves) is
     449          both necessary and simple.</para>
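
The two steps described above, finding the containing code vector and adjusting a pc-locative when that vector moves, can be sketched in a few lines. This is a hedged illustration only: the text says the PPC port finds the containing object by exploiting fixed-width instructions and details of instruction encoding, whereas this sketch swaps in a hypothetical table of (base address, instruction count) pairs. All names here are invented for the example.

```python
INSTRUCTION_BYTES = 4  # PPC instructions are 32 bits wide


def is_word_aligned(pc):
    """A PPC instruction address always has its low two bits clear."""
    return (pc & 0b11) == 0


def find_code_vector(pc, code_vectors):
    """Return the base of the code vector containing `pc`, or None.

    `code_vectors` is a hypothetical list of (base, instruction_count) pairs.
    """
    for base, count in code_vectors:
        if base <= pc < base + count * INSTRUCTION_BYTES:
            return base
    return None


def relocate(pc, old_base, new_base):
    """Keep the locative's offset into the vector when the vector moves."""
    return new_base + (pc - old_base)
```

Relocation preserves word alignment as long as both bases are themselves word-aligned, since the offset into the vector is a multiple of four bytes.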
    446450      </sect2>
    447451
    448452      <sect2 id="Register-and-stack-usage-conventions">
    449453        <title>Register and stack usage conventions</title>
    450 
     454       
    451455        <sect3 id="Stack-conventions">
    452           <title>Stack conventions</title>
     456              <title>Stack conventions</title>
    453457          <para>On both PPC and X86 platforms, each lisp thread uses 3
    454           stacks; the ways in which these stacks are used differ
    455           between the PPC and X86.</para>
     458            stacks; the ways in which these stacks are used differ
     459            between the PPC and X86.</para>
    456460          <para>Each thread has:</para>
    457           <itemizedlist>
     461              <itemizedlist>
    458462            <listitem>
    459               <para>A "control stack".  On both platforms, this is
    460               "the stack" used by foreign code.  On the PPC, it
    461               consists of a linked list of frames where the first word
    462               in each frame points to the first word in the previous
    463               frame (and the outermost frame points to 0.)  Some
    464               frames on a PPC control stack are lisp frames; lisp
    465               frames are always 4 words in size and contain (in
    466               addition to the back pointer to the previous frame) the
    467               calling function (a node), the return address (a
    468               "locative" into the calling function's code-vector), and
    469               the value to which the value-stack pointer (see below)
    470               should be restored on function exit.  On the PPC, the GC
    471               has to look at control-stack frames, identify which of
    472               those frames are lisp frames, and treat the contents of
    473               the saved function slot as a node (and handle the return
    474               address locative specially.)  On x86-64, the control
    475               stack is used for dynamic-extent allocation of immediate
    476               objects.  Since the control stack never contains nodes
    477               on x86-64, the GC ignores it on that platform.
    478               Alignment of the control stack follows the ABI
    479               conventions of the platform (at least at any point in
    480               time where foreign code could run.)  On PPC, the r1
    481               register always points to the top of the current
    482               thread's control stack; on x86-64, the RSP register
    483               points to the top of the current thread's control stack
    484               when the thread is running foreign code and the address
    485               of the top of the control stack is kept in the thread's
    486               TCR (see <xref linkend="The-Thread-Context-Record"/>)
    487               when not running foreign code.  The control stack "grows
    488               down."</para>
    489             </listitem>
     463                  <para>A "control stack".  On both platforms, this is
     464                    "the stack" used by foreign code.  On the PPC, it
     465                    consists of a linked list of frames where the first word
     466                    in each frame points to the first word in the previous
     467                    frame (and the outermost frame points to 0.)  Some
     468                    frames on a PPC control stack are lisp frames; lisp
     469                    frames are always 4 words in size and contain (in
     470                    addition to the back pointer to the previous frame) the
     471                    calling function (a node), the return address (a
     472                    "locative" into the calling function's code-vector), and
     473                    the value to which the value-stack pointer (see below)
     474                    should be restored on function exit.  On the PPC, the GC
     475                    has to look at control-stack frames, identify which of
     476                    those frames are lisp frames, and treat the contents of
     477                    the saved function slot as a node (and handle the return
     478                    address locative specially.)  On x86-64, the control
     479                    stack is used for dynamic-extent allocation of immediate
     480                    objects.  Since the control stack never contains nodes
     481                    on x86-64, the GC ignores it on that platform.
     482                    Alignment of the control stack follows the ABI
     483                    conventions of the platform (at least at any point in
     484                    time where foreign code could run.)  On PPC, the r1
     485                    register always points to the top of the current
     486                    thread's control stack; on x86-64, the RSP register
     487                    points to the top of the current thread's control stack
     488                    when the thread is running foreign code and the address
     489                    of the top of the control stack is kept in the thread's
     490                    TCR (see <xref linkend="The-Thread-Context-Record"/>)
     491                    when not running foreign code.  The control stack "grows
     492                    down."</para>
     493                </listitem>
    490494            <listitem>
    491               <para>A "value stack".  On both platforms, all values on
    492               the value stack are nodes (including "tagged return
    493               addresses" on x86-64.)  The value stack is always
    494               aligned to the native word size; objects are always
    495               pushed on the value stack using atomic instructions
    496               ("stwu"/"stdu" on PPC, "push" on x86-64), so the
    497               contents of the value stack between its bottom and top
    498               are always unambiguously nodes; the compiler usually
    499               tries to pop or discard nodes from the value stack as
    500               soon as possible after their last use (as soon as they
    501               may have become garbage.)  On x86-64, the RSP register
    502               addresses the top of the value stack when running lisp
    503               code; that address is saved in the TCR when running
    504               foreign code.  On the PPC, a dedicated register (VSP,
    505               currently r15) is used to address the top of the value
    506               stack when running lisp code, and the VSP value is saved
    507               in the TCR when running foreign code.  The value stack
    508               grows down.</para>
    509             </listitem>
    510             <listitem>
    511               <para>A "temp stack".  The temp stack consists of a
    512               linked list of frames, each of which points to the
    513               previous temp stack frame.  The number of native machine
    514               words in each temp stack frame is always even, so the
    515               temp stack is aligned on a two-word (64- or 128-bit)
    516               boundary.  The temp stack is used for dynamic-extent
    517               objects on both platforms; on the PPC, it's used for
    518               essentially all such objects (regardless of whether or
    519               not the objects contain nodes); on the x86-64, immediate
    520               dynamic-extent objects (strings, foreign pointers, etc.)
    521               are allocated on the control stack and only
    522               node-containing dynamic-extent objects are allocated on
    523               the temp stack.  Data structures used to implement CATCH
    524               and UNWIND-PROTECT are stored on the temp stack on both
    525               ppc and x86-64.  Temp stack frames are always doublenode
    526               aligned and objects within a temp stack frame are
    527               aligned on doublenode boundaries.  The first word in
    528               each frame contains a back pointer to the previous
    529               frame; on the PPC, the second word is used to indicate
    530               to the GC whether the remaining objects are nodes (if
    531               the second word is 0) or immediate (otherwise.)  On
    532               x86-64, where temp stack frames always contain nodes,
    533               the second word is always 0.  The temp stack grows down.
    534               It usually takes several instructions to allocate and
    535               safely initialize a temp stack frame that's intended to
    536               contain nodes, and the GC has to recognize the case
    537               where a thread is in the process of allocating and
    538               initializing a temp stack frame and take care not to
    539               interpret any uninitialized words in the frame as nodes.
    540               See (someplace).  The PPC keeps the current top of the
    541               temp stack in a dedicated register (TSP, currently r12)
    542               when running lisp code and saves this register's value
    543               in the TCR when running foreign code.  The x86-64 keeps
    544               the address of the top of each thread's temp stack in
    545               the thread's TCR.</para>
    546             </listitem>
     495                  <para>A "value stack".  On both platforms, all values on
     496                    the value stack are nodes (including "tagged return
     497                    addresses" on x86-64.)  The value stack is always
     498                    aligned to the native word size; objects are always
     499                    pushed on the value stack using atomic instructions
     500                    ("stwu"/"stdu" on PPC, "push" on x86-64), so the
     501                    contents of the value stack between its bottom and top
     502                    are always unambiguously nodes; the compiler usually
     503                    tries to pop or discard nodes from the value stack as
     504                    soon as possible after their last use (as soon as they
     505                    may have become garbage.)  On x86-64, the RSP register
     506                    addresses the top of the value stack when running lisp
     507                    code; that address is saved in the TCR when running
     508                    foreign code.  On the PPC, a dedicated register (VSP,
     509                    currently r15) is used to address the top of the value
     510                    stack when running lisp code, and the VSP value is saved
     511                    in the TCR when running foreign code.  The value stack
     512                    grows down.</para>
     513                </listitem>
     514                <listitem>
     515                  <para>A "temp stack".  The temp stack consists of a
     516                    linked list of frames, each of which points to the
     517                    previous temp stack frame.  The number of native
     518                    machine words in each temp stack frame is always even,
     519                    so the temp stack is aligned on a two-word (64- or
     520                    128-bit) boundary.  The temp stack is used for
     521                    dynamic-extent objects on both platforms; on the PPC,
     522                    it's used for essentially all such objects (regardless
     523                    of whether or not the objects contain nodes); on the
     524                    x86-64, immediate dynamic-extent objects (strings,
     525                    foreign pointers, etc.)  are allocated on the control
     526                    stack and only node-containing dynamic-extent objects
     527                    are allocated on the temp stack.  Data structures used
     528                    to implement CATCH and UNWIND-PROTECT are stored on
     529                    the temp stack on both ppc and x86-64.  Temp stack
     530                    frames are always doublenode aligned and objects
     531                    within a temp stack frame are aligned on doublenode
     532                    boundaries.  The first word in each frame contains a
     533                    back pointer to the previous frame; on the PPC, the
     534                    second word is used to indicate to the GC whether the
     535                    remaining objects are nodes (if the second word is 0)
     536                    or immediate (otherwise.)  On x86-64, where temp stack
     537                    frames always contain nodes, the second word is always
     538                    0.  The temp stack grows down.  It usually takes
     539                    several instructions to allocate and safely initialize
     540                    a temp stack frame that's intended to contain nodes,
     541                    and the GC has to recognize the case where a thread is
     542                    in the process of allocating and initializing a temp
     543                    stack frame and take care not to interpret any
     544                    uninitialized words in the frame as nodes. The PPC
     545                    keeps the current top of the temp stack in a dedicated
     546                    register (TSP, currently r12) when running lisp code
     547                    and saves this register's value in the TCR when
     548                    running foreign code.  The x86-64 keeps the address of
     549                    the top of each thread's temp stack in the thread's
     550                    TCR.</para>
     551                </listitem>
    547552          </itemizedlist>
    548553        </sect3>
    549554
    550555        <sect3 id="Register-conventions">
    551           <title>Register conventions</title>
     556              <title>Register conventions</title>
    552557          <para>If there are a "reasonable" (for some value of
    553           "reasonable") number or general-purpose registers and the
    554           instruction set is "reasonably" orthogonal (most
    555           instructions that operate on GPRs can operate on any GPR),
    556           then it's possible to statically partition the GPRs into at
    557           least two sets: "immediate registers" never contain nodes,
    558           and "node registers" always contain nodes.  (On the PPC, a
    559           few registers are members of a third set of "PC locatives",
    560           and on both platforms some registers may have dedicated
    561           roles as stack or heap pointers; the latter class is treated
    562           as immediates by the GC proper but may be used to help
    563           determine the bounds of stack and heap memory areas.)</para>
    564           <para>The ultimate definition of register partitioning is
    565           hardwired into the GC in functions like "mark_xp()" and
    566           "forward_xp()", which process the values of some of the
    567           registers in an exception frame as nodes and may give some
    568           sort of special treatment to other register values they
    569           encounter there.)</para>
     558            "reasonable") number of general-purpose registers and the
     559            instruction set is "reasonably" orthogonal (most
     560            instructions that operate on GPRs can operate on any GPR),
     561            then it's possible to statically partition the GPRs into at
     562            least two sets: "immediate registers" never contain nodes,
     563            and "node registers" always contain nodes.  (On the PPC, a
     564            few registers are members of a third set of "PC locatives",
     565            and on both platforms some registers may have dedicated
     566            roles as stack or heap pointers; the latter class is treated
     567            as immediates by the GC proper but may be used to help
     568            determine the bounds of stack and heap memory areas.)</para>
     569              <para>The ultimate definition of register partitioning is
     570            hardwired into the GC in functions like "mark_xp()" and
     571            "forward_xp()", which process the values of some of the
     572            registers in an exception frame as nodes and may give some
     573            sort of special treatment to other register values they
     574            encounter there.</para>
    570575          <para>On x86-64, the static register partitioning scheme involves:</para>
    571           <itemizedlist>
     576              <itemizedlist>
    572577            <listitem>
    573               <para>(only) three "immediate" registers.</para>
    574               <para>The RAX, RCX, and RDX registers are used as the
    575               implicit operands and results of some extended-precision
    576               multiply and divide instructions which generally involve
    577               non-node values; since their use in these instructions
    578               means that they can't be guaranteed to contain node
    579               values at all times, it's natural to put these registers
    580               in the "immediate" set. RAX is generally given the
    581               symbolic name "imm0", RDX is given the symbolic name
    582               "imm1" and RCX is given the symbolic name "imm2"; you
    583               may see these names in disassembled code, usually in
    584               operations involving type checking, array indexing, and
    585               foreign memory and function access.</para>
    586             </listitem>
     578                  <para>(only) three "immediate" registers.</para>
     579                  <para>The RAX, RCX, and RDX registers are used as the
     580                    implicit operands and results of some extended-precision
     581                    multiply and divide instructions which generally involve
     582                    non-node values; since their use in these instructions
     583                    means that they can't be guaranteed to contain node
     584                    values at all times, it's natural to put these registers
     585                    in the "immediate" set. RAX is generally given the
     586                    symbolic name "imm0", RDX is given the symbolic name
     587                    "imm1" and RCX is given the symbolic name "imm2"; you
     588                    may see these names in disassembled code, usually in
     589                    operations involving type checking, array indexing, and
     590                    foreign memory and function access.</para>
     591                </listitem>
    587592            <listitem>
    588               <para>(only) two "dedicated" registers.</para>
    589               <para>RSP and RBP have
    590               dedicated functionality dictated by the hardware and
    591               calling conventions.</para>
    592             </listitem>
     593                  <para>(only) two "dedicated" registers.</para>
     594                  <para>RSP and RBP have
     595                    dedicated functionality dictated by the hardware and
     596                    calling conventions.</para>
     597                </listitem>
    593598            <listitem>
    594               <para>11 "node" registers.</para>
    595               <para>All other registers (RBX, RSI, RDI, and R8-R15)
    596               are asserted to contain node values at (almost) all
    597               times; legacy "string" operations that implicitly use RSI
    598               and/or RDI are not used.</para>
    599             </listitem>
    600           </itemizedlist>
     599                  <para>11 "node" registers.</para>
     600                  <para>All other registers (RBX, RSI, RDI, and R8-R15)
     601                    are asserted to contain node values at (almost) all
     602                    times; legacy "string" operations that implicitly use RSI
     603                    and/or RDI are not used.</para>
     604                </listitem>
     605              </itemizedlist>
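<para>The x86-64 partitioning listed above can be modeled as a small lookup table. This is only an illustrative sketch: the names IMMEDIATE, DEDICATED, NODE, register_class and node_values are hypothetical, and the code is not the kernel's actual "mark_xp()" logic. It just shows how a fixed register classification lets a collector decide which saved register values to treat as nodes.</para>

```python
# Hypothetical model of the x86-64 static register partitioning.
# All names here are illustrative; none come from the lisp kernel.

# Immediate registers and their symbolic names, as given in the text.
IMMEDIATE = {"rax": "imm0", "rdx": "imm1", "rcx": "imm2"}

# Registers whose role is dictated by the hardware and calling conventions.
DEDICATED = ("rsp", "rbp")

# The sixteen x86-64 general-purpose registers.
ALL_GPRS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    "r%d" % n for n in range(8, 16)
]

# Everything that is neither immediate nor dedicated is a node register.
NODE = [r for r in ALL_GPRS if r not in IMMEDIATE and r not in DEDICATED]


def register_class(name):
    """Classify a register name as 'immediate', 'dedicated' or 'node'."""
    if name in IMMEDIATE:
        return "immediate"
    if name in DEDICATED:
        return "dedicated"
    if name in NODE:
        return "node"
    raise ValueError("not an x86-64 general-purpose register: %r" % name)


def node_values(saved_registers):
    """Keep only the saved register values a collector would treat as nodes."""
    return {
        name: value
        for name, value in saved_registers.items()
        if register_class(name) == "node"
    }
```

<para>With this table, len(NODE) is 11, matching the count given above, and a saved value in RAX or RSP is never mistaken for a lisp object.</para>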
     606
    601607          <para>On the PPC, the static register partitioning scheme
    602           involves:</para>
    603 
    604           <itemizedlist>
     608            involves:</para>
     609              <itemizedlist>
    605610            <listitem>
    606               <para>6 "immediate" registers.</para>
    607               <para>Registers r3-r8 are given
    608               the symbolic names imm0-imm5.  As a RISC architecture
    609               with simpler addressing modes, the PPC probably
    610               uses immediate registers a bit more often than the CISC
    611               x86-64 does, but they're generally used for the same sort
    612               of things (type checking, array indexing, FFI,
    613               etc.)</para>
    614             </listitem>
    615             <listitem>
    616               <para>9 dedicated registers
    617               <itemizedlist>
    618                 <listitem>
    619                   <para>r0 (symbolic name rzero) always contains the
    620                   value 0 when running lisp code.  Its value is
    621                   sometimes read as 0 when it's used as the base
    622                   register in a memory address; keeping the value 0
    623                   there is sometimes convenient and avoids
    624                   asymmetry.</para>
    625                 </listitem>
    626                 <listitem>
    627                   <para>r1 (symbolic name sp) is the control stack
    628                   pointer, by PPC convention.</para>
    629                 </listitem>
    630                 <listitem>
    631                   <para>r2 is used to hold the current thread's TCR on
    632                   ppc64 systems; it's not used on ppc32.</para>
    633                 </listitem>
    634                 <listitem>
    635                   <para>r9 and r10 (symbolic names allocptr and
    636                   allocbase) are used to do per-thread memory
    637                   allocation</para>
    638                 </listitem>
    639                 <listitem>
    640                   <para>r11 (symbolic name nargs) contains the number
    641                   of function arguments on entry and the number of
    642                   return values in multiple-value returning
    643                   constructs.  It's not used more generally as either
    644                   a node or immediate register because of the way that
    645                   certain trap instruction encodings are
    646                   interpreted.</para>
    647                 </listitem>
    648                 <listitem>
    649                   <para>r12 (symbolic name tsp) holds the top of the
    650                   current thread's temp stack.</para>
    651                 </listitem>
    652                 <listitem>
    653                   <para>r13 is used to hold the TCR on PPC32 systems;
    654                   it's not used on PPC64.</para>
    655                 </listitem>
    656                 <listitem>
    657                   <para>r14 (symbolic name loc-pc) is used to copy
    658                   "pc-locative" values between main memory and
    659                   special-purpose PPC registers (LR and CTR) used in
    660                   function-call and return instructions.</para>
    661                 </listitem>
    662                 <listitem>
    663                   <para>r15 (symbolic name vsp) addresses the top of
    664                   the current thread's value stack.</para>
    665                 </listitem>
    666                 <listitem>
    667                   <para>lr and ctr are PPC branch-unit registers used
    668                   in function call and return instructions; they're
    669                   always treated as "pc-locatives", which precludes
    670                   the use of the ctr in some PPC looping
    671                   constructs.</para>
    672                 </listitem>
    673              
    674               </itemizedlist>
    675               </para>
    676             </listitem>
     611                  <para>6 "immediate" registers.</para>
     612                  <para>Registers r3-r8 are given
     613                    the symbolic names imm0-imm5.  As a RISC architecture
     614                    with simpler addressing modes, the PPC probably
     615                    uses immediate registers a bit more often than the CISC
     616                    x86-64 does, but they're generally used for the same sort
     617                    of things (type checking, array indexing, FFI,
     618                    etc.)</para>
     619                </listitem>
     620                <listitem>
     621                  <para>9 dedicated registers
     622                    <itemizedlist>
     623                          <listitem>
     624                            <para>r0 (symbolic name rzero) always contains the
     625                              value 0 when running lisp code.  Its value is
     626                              sometimes read as 0 when it's used as the base
     627                              register in a memory address; keeping the value 0
     628                              there is sometimes convenient and avoids
     629                              asymmetry.</para>
     630                          </listitem>
     631                          <listitem>
     632                            <para>r1 (symbolic name sp) is the control stack
     633                              pointer, by PPC convention.</para>
     634                          </listitem>
     635                  <listitem>
     636                            <para>r2 is used to hold the current thread's TCR on
     637                              ppc64 systems; it's not used on ppc32.</para>
     638                          </listitem>
     639                  <listitem>
     640                            <para>r9 and r10 (symbolic names allocptr and
     641                              allocbase) are used to do per-thread memory
     642                              allocation.</para>
     643                          </listitem>
     644                  <listitem>
     645                            <para>r11 (symbolic name nargs) contains the number
     646                              of function arguments on entry and the number of
     647                              return values in multiple-value returning
     648                              constructs.  It's not used more generally as either
     649                              a node or immediate register because of the way that
     650                              certain trap instruction encodings are
     651                              interpreted.</para>
     652                          </listitem>
     653                  <listitem>
     654                            <para>r12 (symbolic name tsp) holds the top of the
     655                              current thread's temp stack.</para>
     656                          </listitem>
     657                          <listitem>
     658                            <para>r13 is used to hold the TCR on PPC32 systems;
     659                              it's not used on PPC64.</para>
     660                          </listitem>
     661                          <listitem>
     662                            <para>r14 (symbolic name loc-pc) is used to copy
     663                              "pc-locative" values between main memory and
     664                              special-purpose PPC registers (LR and CTR) used in
     665                              function-call and return instructions.</para>
     666                          </listitem>
     667                  <listitem>
     668                            <para>r15 (symbolic name vsp) addresses the top of
     669                              the current thread's value stack.</para>
     670                          </listitem>
     671                          <listitem>
     672                            <para>lr and ctr are PPC branch-unit registers used
     673                              in function call and return instructions; they're
     674                              always treated as "pc-locatives", which precludes
     675                              the use of the ctr in some PPC looping
     676                              constructs.</para>
     677                          </listitem>
     678                 
     679                    </itemizedlist>
     680                  </para>
     681                </listitem>
    677682            <listitem>
    678               <para>17 "node" registers</para>
    679               <para>r15-r31 are always treated as node
    680               registers</para>
    681             </listitem>
    682            
     683                  <para>17 "node" registers</para>
     684                  <para>r15-r31 are always treated as node
     685                    registers</para>
     686                </listitem>
     687               
    683688          </itemizedlist>
    684689        </sect3>
     
    686691
    687692      <sect2 id="Tagging-scheme">
    688         <title>Tagging scheme</title>
     693            <title>Tagging scheme</title>
    689694        <para>&CCL; always allocates lisp objects on double-node
    690         (64-bit for 32-bit platforms, 128-bit for 64-bit platforms)
    691         boundaries; this mean that the low 3 bits (32-bit lisp) or 4
    692         bits (64-bit lisp) are always 0 and are therefore redundant
    693         (we only really need to know the upper 29 or 60 bits in order
    694         to identify the aligned object address.)  The extra bits in a
    695         lisp node can be used to encode at least some information
    696         about the node's type, and the other 29/60 bits represent
    697         either an immediate value or a doublenode-aligned memory
    698         address.  The low 3 or 4 bits of a node are called the node's
    699         "tag bits", and the conventions used to encode type
    700         information in those tag bits are called a "tagging
    701         scheme."</para>
     695          (64-bit for 32-bit platforms, 128-bit for 64-bit platforms)
     696          boundaries; this means that the low 3 bits (32-bit lisp) or 4
     697          bits (64-bit lisp) are always 0 and are therefore redundant
     698          (we only really need to know the upper 29 or 60 bits in order
     699          to identify the aligned object address.)  The extra bits in a
     700          lisp node can be used to encode at least some information
     701          about the node's type, and the other 29/60 bits represent
     702          either an immediate value or a doublenode-aligned memory
     703          address.  The low 3 or 4 bits of a node are called the node's
     704          "tag bits", and the conventions used to encode type
     705          information in those tag bits are called a "tagging
     706          scheme."</para>
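<para>As a rough illustration of the arithmetic in the paragraph above, the following sketch models a node as a plain integer and splits it into its upper bits and its tag bits. The function names tag_bit_count and split_node are hypothetical, and the code uses division and remainder rather than the shifts and masks that real machine code would use.</para>

```python
# Sketch only: a node is modeled as a non-negative Python integer.
# tag_bit_count and split_node are hypothetical helper names.

def tag_bit_count(word_bits):
    """Number of low bits that are always 0 in a doublenode-aligned address."""
    node_bytes = word_bits // 8           # one node occupies one native word
    doublenode_bytes = 2 * node_bytes     # objects are aligned to two nodes
    # The alignment is a power of two; its base-2 logarithm is the bit count.
    return doublenode_bytes.bit_length() - 1


def split_node(node, word_bits):
    """Return (upper_bits, tag) for a node of the given word size."""
    modulus = 2 ** tag_bit_count(word_bits)
    return node // modulus, node % modulus
```

<para>For a 32-bit word this gives 3 tag bits and 29 upper bits; for a 64-bit word it gives 4 tag bits and 60 upper bits, as stated above.</para>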
    702707        <para>It might be possible to use the same tagging scheme on
    703         all platforms (at least on all platforms with the same word
    704         size and/or the same number of available tag bits), but there
    705         are often some strong reasons for not doing so.  These
    706         arguments tend to be very machine-specific: sometimes, there
    707         are fairly obvious machine-dependent tricks that can be
    708         exploited to make common operations on some types of tagged
    709         objects faster; other times, there are architectural
    710         restrictions that make it impractical to use certain tags for
    711         certain types.  (On PPC64, the "ld" (load doubleword) and
    712         "std" (store doubleword) instructions - which load and store a
    713         GPR operand at the effective address formed by adding the
    714         value of another GPR operand and a 16-bit constant operand -
    715         require that the low two bits of that constant operand be 0.
    716         Since such instructions would typically be used to access the
    717         fields of things like CONS cells and structures, it's
    718         desirable that that the tags chosen for CONS cells and
    719         structures allow the use of these instructions as opposed to
    720         more expensive alternatives.)</para>
     708          all platforms (at least on all platforms with the same word
     709          size and/or the same number of available tag bits), but there
     710          are often some strong reasons for not doing so.  These
     711          arguments tend to be very machine-specific: sometimes, there
     712          are fairly obvious machine-dependent tricks that can be
     713          exploited to make common operations on some types of tagged
     714          objects faster; other times, there are architectural
     715          restrictions that make it impractical to use certain tags for
     716          certain types.  (On PPC64, the "ld" (load doubleword) and
     717          "std" (store doubleword) instructions - which load and store a
     718          GPR operand at the effective address formed by adding the
     719          value of another GPR operand and a 16-bit constant operand -
     720          require that the low two bits of that constant operand be 0.
     721          Since such instructions would typically be used to access the
     722          fields of things like CONS cells and structures, it's
     723          desirable that the tags chosen for CONS cells and
     724          structures allow the use of these instructions as opposed to
     725          more expensive alternatives.)</para>
    721726        <para>One architecture-dependent tagging trick that works well
    722         on all architectures is to use a tag of 0 for FIXNUMs: a
    723         fixnum basically encodes its value shifted left a few bits and
    724         keeps those low bits clear. FIXNUM addition, subtraction, and
    725         binary logical operations can operate directly on the node
    726         operands, addition and subtraction can exploit hardware-based
    727         overflow detection, and (in the absence of overflow) the
    728         hardware result of those operations is a node (fixnum).  Some
    729         other slightly-less-common operations may require a few extra
    730         instructions, but arithmetic operations on FIXNUMs should be
    731         as cheap as possible and using a tag of zero for FIXNUMs helps
    732         to ensure that it will be.</para>
    733         <para>If we have N available tag bits (N = 3 for 32-bit
    734         &CCL; and N = 4 for 64-bit &CCL;), this way of
    735         representing fixnums with the low M bits forced to 0 works as
    736         long as M &lt;= N.  The smaller we make M, the larger the
    737         values of MOST-POSITIVE-FIXNUM and MOST-NEGATIVE become; the
    738         larger we make N, the more distinct non-FIXNUM tags become
    739         available.  A reasonable compromise is to choose M = N-1; this
    740         basically yields two distinct FIXNUM tags (one for even
    741         fixnums, one for odd fixnums), gives 30-bit fixnums on 32-bit
    742         platforms and 61-bit fixnums on 64-bit platforms, and leaves
    743         us with 6 or 14 tags to encoded other types.</para>
     727          on all architectures is to use a tag of 0 for FIXNUMs: a
     728          fixnum basically encodes its value shifted left a few bits
     729          and keeps those low bits clear. FIXNUM addition,
     730          subtraction, and binary logical operations can operate
     731          directly on the node operands, addition and subtraction can
     732          exploit hardware-based overflow detection, and (in the
     733          absence of overflow) the hardware result of those operations
     734          is a node (fixnum).  Some other slightly-less-common
     735          operations may require a few extra instructions, but
     736          arithmetic operations on FIXNUMs should be as cheap as
     737          possible and using a tag of zero for FIXNUMs helps to ensure
     738          that it will be.</para>
     739            <para>If we have N available tag bits (N = 3 for 32-bit &CCL;
     740              and N = 4 for 64-bit &CCL;), this way of representing
     741              fixnums with the low M bits forced to 0 works as long as M
     742              &lt;= N.  The smaller we make M, the larger the values of
     743              MOST-POSITIVE-FIXNUM and MOST-NEGATIVE-FIXNUM become; the larger
     744              we make M, the more distinct non-FIXNUM tags become available.
     745              A reasonable compromise is to choose M = N-1; this basically
     746              yields two distinct FIXNUM tags (one for even fixnums, one
     747              for odd fixnums), gives 30-bit fixnums on 32-bit platforms
     748              and 61-bit fixnums on 64-bit platforms, and leaves us with 6
     749              or 14 tags to encode other types.</para>
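<para>The M = N-1 compromise can be checked with a little arithmetic. The sketch below is an assumption-laden model, not kernel code: box_fixnum, unbox_fixnum, is_fixnum, fixnum_range and non_fixnum_tags are hypothetical names, nodes are unbounded Python integers (so hardware overflow detection is not modeled), and multiplication stands in for a left shift.</para>

```python
# Sketch only: models fixnum tagging with the low M = N - 1 bits forced to 0.
# All function names are hypothetical.

def fixnum_shift(tag_bits):
    """M, the number of low bits kept clear in a fixnum node."""
    return tag_bits - 1


def box_fixnum(value, tag_bits):
    """Encode an integer as a fixnum node (value shifted left by M bits)."""
    return value * 2 ** fixnum_shift(tag_bits)


def unbox_fixnum(node, tag_bits):
    """Recover the integer from a fixnum node."""
    return node // 2 ** fixnum_shift(tag_bits)


def is_fixnum(node, tag_bits):
    """A node is a fixnum when its low M bits are all 0."""
    return node % 2 ** fixnum_shift(tag_bits) == 0


def fixnum_range(word_bits, tag_bits):
    """(most negative, most positive) fixnum for the given word size."""
    width = word_bits - fixnum_shift(tag_bits)
    return -(2 ** (width - 1)), 2 ** (width - 1) - 1


def non_fixnum_tags(tag_bits):
    """Tag values left over once the fixnum tags are set aside."""
    fixnum_tag_count = 2 ** (tag_bits - fixnum_shift(tag_bits))
    return 2 ** tag_bits - fixnum_tag_count
```

<para>Adding two boxed fixnums directly yields the boxed sum, for example box_fixnum(7, 3) + box_fixnum(-5, 3) equals box_fixnum(2, 3), which is why no untagging is needed for addition and subtraction. The same model reproduces the 30-bit and 61-bit fixnum widths and the 6 or 14 remaining tags.</para>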
    744750        <para>Once we get past the assignment of FIXNUM tags, things
    745         quickly devolve into machine-dependencies.  We can fairly
    746         easily see that we can't directly encode all other primitive lisp
    747         object types with only 6 or 14 available tag values; the
    748         details of how types are encoded vary between the ppc32,
    749         ppc64, and x86-64 implementations, but there are some general
    750         common principles:</para>
    751 
    752         <itemizedlist>
    753           <listitem>
    754             <para>CONS cells always contain exactly 2 elements and are
    755             usually fairly common.  It therefore makes sense to give
    756             CONS cells their own tag.  Unlike the fixnum case - where a
    757             tag value of 0 had positive implications - there doesn't
    758             seem to be any advantage to using any particular value.
    759             (A long time ago - in the case of 68K MCL - the CONS tag
    760             and the order of CAR and CDR in memory were chosen to allow
    761             smaller, cheaper addressing modes to be used to "cdr down a
    762             list."  That's not a factor on ppc or x86-64, but all
    763             versions of &CCL; still store the CDR of a CONS cell
    764             first in memory.  It doesn't matter, but doing it the way
    765             that the host system did made bootstrapping to a new target
    766             system a little easier.)
    767             </para>
    768           </listitem>
    769           <listitem>
    770             <para>Any way you look at it, NIL is a bit ... unusual. NIL
    771             is both a SYMBOL and a LIST (as well as being a canonical
    772             truth value and probably a few other things.)  Its role as
    773             a LIST is probably much more important to most programs
    774             than its role as a SYMBOL is: LISTP has to be true of NIL
    775             and primitives like CAR and CDR do LISTP implicitly when
    776             safe and want that operation to be fast.  There are several
    777             possible approaches to this; &CCL; uses two of them. On
    778             PPC32 and X86-64, NIL is basically a weird CONS cell that
    779             straddles two doublenodes; the tag of NIL is unique and
    780             congruent modulo 4 (modulo 8 on 64-bit) with the tag used
    781             for CONS cells.  LISTP is therefore true of any node whose
    782             low 2 (or 3) bits contain the appropriate tag value (it's
    783             not otherwise necessary to special-case NIL.)
    784             SYMBOL accessors (SYMBOL-NAME, SYMBOL-VALUE, SYMBOL-PLIST
    785             ..) -do- have to special-case NIL (and access the
    786             components of an internal proxy symbol.) On PPC64 (where
    787             architectural restrictions dictate the set of tags that can
    788             be used to access fixed components of an object),
    789             that approach wasn't practical.  NIL is just a
    790             distinguished SYMBOL,and it just happens to be the case
    791             that its pname slot and values lots are at the same offsets
    792             from a tagged pointer as a CONS cell's CDR and CAR would be.
    793             NIL's pname is set to NIL (SYMBOL-NAME checks for this and
    794             returns the string "NIL"), and LISTP (and therefore safe
    795             CAR and CDR) have to check for (OR NULL CONSP). At least in
    796             the case of CAR and CDR, the fact that the PPC has multiple
    797             condition-code fields keeps that extra test from
    798             being prohibitively expensive.</para>
    799           </listitem>
    800           <listitem>
    801             <para>Some objects are immediate (but not FIXNUMs).This is
    802             true of CHARACTERs and, on 64-bit platforms,
    803             SINGLE-FLOATs.It's also true of some nodes used in the
    804             runtime system (special values used to indicate unbound
    805             variables and slots, for instance.) On 64-bit platforms,
    806             SINGLE-FLOATs have their own unique tag (making them a
    807             little easier to recognize; on all platforms, CHARACTERs
    808             share a tag with other immediate objects (unbound markers)
    809             but are easy to recognize (by looking at several of their
    810             low bits.)  The GC treats any node with an immediate tag
    811             (and any node with a fixnum tag) as a leaf.</para>
    812           </listitem>
     751          quickly devolve into machine-dependencies.  We can fairly
     752          easily see that we can't directly tag all other primitive
     753          lisp object types with only 6 or 14 available tag values;
     754          the details of how types are encoded vary between the ppc32,
     755          ppc64, and x86-64 implementations, but there are some
     756          general common principles:</para>
     757
     758            <itemizedlist>
     759              <listitem>
     760                <para>CONS cells always contain exactly 2 elements and are
     761                  usually fairly common. It therefore makes sense to give
     762                  CONS cells their own tag.  Unlike the fixnum case -
     763                  where a tag value of 0 had positive implications - there
     764                  doesn't seem to be any advantage to using any particular
     765                  value.  (A long time ago - in the case of 68K MCL - the
     766                  CONS tag and the order of CAR and CDR in memory were
     767                  chosen to allow smaller, cheaper addressing modes to be
     768                  used to "cdr down a list."  That's not a factor on ppc
     769                  or x86-64, but all versions of &CCL; still store the CDR
     770                  of a CONS cell first in memory.  It doesn't matter, but
     771                  doing it the way that the host system did made
     772                  bootstrapping to a new target system a little easier.)
     773                </para>
     774              </listitem>
     775              <listitem>
     776                <para>Any way you look at it, NIL is a bit
     777                  ... unusual. NIL is both a SYMBOL and a LIST (as well as
     778                  being a canonical truth value and probably a few other
     779                  things.)  Its role as a LIST is probably much more
     780                  important to most programs than its role as a SYMBOL is:
     781                  LISTP has to be true of NIL and primitives like CAR and
     782                  CDR do LISTP implicitly when safe and want that
     783                  operation to be fast. There are several possible
     784                  approaches to this problem; &CCL; uses two of them. On
     785                  PPC32 and X86-64, NIL is basically a weird CONS cell
     786                  that straddles two doublenodes; the tag of NIL is unique
     787                  and congruent modulo 4 (modulo 8 on 64-bit) with the tag
     788                  used for CONS cells.  LISTP is therefore true of any
     789                  node whose low 2 (or 3) bits contain the appropriate tag
     790                  value (it's not otherwise necessary to special-case
     791                  NIL.)  SYMBOL accessors (SYMBOL-NAME, SYMBOL-VALUE,
     792                  SYMBOL-PLIST ...) -do- have to special-case NIL (and
     793                  access the components of an internal proxy symbol.) On
     794                  PPC64 (where architectural restrictions dictate the set
     795                  of tags that can be used to access fixed components of
     796                  an object), that approach wasn't practical.  NIL is just
     797                  a distinguished SYMBOL, and it just happens to be the
     798                  case that its pname slot and values slot are at the same
     799                  offsets from a tagged pointer as a CONS cell's CDR and
     800                  CAR would be.  NIL's pname is set to NIL (SYMBOL-NAME
     801                  checks for this and returns the string "NIL"), and LISTP
     802                  (and therefore safe CAR and CDR) has to check for (OR
     803                  NULL CONSP). At least in the case of CAR and CDR, the
     804                  fact that the PPC has multiple condition-code fields
     805                  keeps that extra test from being prohibitively
     806                  expensive.</para>
     807              </listitem>
     808              <listitem>
     809                <para>Some objects are immediate (but not FIXNUMs). This
     810                  is true of CHARACTERs and, on 64-bit platforms,
     811                  SINGLE-FLOATs. It's also true of some nodes used in the
     812                  runtime system (special values used to indicate unbound
     813                  variables and slots, for instance.) On 64-bit platforms,
     814                  SINGLE-FLOATs have their own unique tag (making them a
     815                  little easier to recognize); on all platforms, CHARACTERs
     816                  share a tag with other immediate objects (unbound
     817                  markers) but are easy to recognize (by looking at
     818                  several of their low bits.)  The GC treats any node with
     819                  an immediate tag (and any node with a fixnum tag) as a
     820                  leaf.</para>
     821              </listitem>
    813822          <listitem>
    814             <para>There are some advantages to treating everything
    815             else - memory-allocated objects that aren't CONS cells -
    816             uniformly.There are some disadvantages to that uniform
    817             treatment as well, and the treatment of "memory-allocated
    818             non-CONS objects" isn't entirely uniform across all
    819             &CCL; implementations.  Let's first pretend that
    820             the treatment is uniform, then discuss the ways in which it
    821             isn't.The "uniform approach" is to treat all
    822             memory-allocated non-CONS objects as if they were vectors;
    823             this use of the term is a little looser than what's implied
    824             by the CL VECTOR type.  &CCL; actually uses the
    825             term "uvector" to mean "a memory-allocated lisp object
    826             other than a CONS cell,whose first word is a header which
    827             describes the object's type and the number of elements that
    828             it contains."  In this view, a SYMBOL is a UVECTOR, as is a
    829             STRING, a STANDARD-INSTANCE, a CL array or vector,a
    830             FUNCTION, and even a DOUBLE-FLOAT.In the PPC
    831             implementations (where things are a little more
    832             ... uniform),a single tag value is used to denote any
    833             uvector; in order to determine something more specific
    834             about the type of the object in question, it's necessary to
    835             fetch the low byte of the header word from memory.  On
    836             the x86-64 platform, certain types of uvectors - SYMBOLs
    837             and FUNCTIONs -are given their own unique tags.  The good
    838             news about the x86-64 approach is that SYMBOLs and
    839             FUNCTIONs can be recognized without referencing memory; the
    840             slightly bad news is that primitive operations that work on
    841             UVECTOR-tagged objects - like the function CCL:UVREF -
    842             don't work on SYMBOLs or FUNCTIONs on x86-64 (but -do- work
    843             on those types of objects in the PPC ports.) The header word
    844             which precedes a UVECTOR's data in memory contains 8 bits
    845             of type information in the low byte and either 24 or 56
    846             bits of"element-count" information in the rest of the
    847             word.  (This is where the sometimes-limiting value of 2^24
    848             for ARRAY-TOTAL-SIZE-LIMIT on PPC32 platforms comes from.)
    849             The low byte of the header - sometimes called the uvector's
    850             subtag - is itself tagged (which means that the header is
    851             tagged.)  The (3 or 4) tag bits in the subtag are used to
    852             determine whether the uvector's elements are nodes or
    853             immediates.(A UVECTOR whose elements are nodes is called a
    854             GVECTOR; a UVECTOR whose elements are immediates is called
    855             an IVECTOR.  This terminology came from Spice Lisp, which
    856             was a predecessor of CMUCL.)  Even though a uvector header
    857             is tagged, a header is not a node.  There's no (supported)
    858             way to get your hands on one in lisp and doing so could be
    859             dangerous.  (If the value of a header wound up in a lisp
    860             node register and that register wound up getting pushed on
    861             a thread's value stack, the GC might misinterpret that
    862             situation to mean that there was a stack-allocated UVECTOR
    863             on the value stack.)</para>
    864           </listitem>
    865        
    866         </itemizedlist>
     823                <para>There are some advantages to treating everything
     824                  else&mdash;memory-allocated objects that aren't CONS
     825                  cells&mdash;uniformly. There are some disadvantages to
     826                  that uniform treatment as well, and the treatment of
     827                  "memory-allocated non-CONS objects" isn't entirely
     828                  uniform across all &CCL; implementations.  Let's first
     829                  pretend that the treatment is uniform, then discuss the
     830                  ways in which it isn't. The "uniform approach" is to
     831                  treat all memory-allocated non-CONS objects as if they
     832                  were vectors; this use of the term is a little looser
     833                  than what's implied by the CL VECTOR type.  &CCL;
     834                  actually uses the term "uvector" to mean "a
     835                  memory-allocated lisp object other than a CONS cell,
     836                  whose first word is a header that describes the object's
     837                  type and the number of elements that it contains."  In
     838                  this view, a SYMBOL is a UVECTOR, as is a STRING, a
     839                  STANDARD-INSTANCE, a CL array or vector, a FUNCTION, and
     840                  even a DOUBLE-FLOAT. In the PPC implementations (where
     841                  things are a little more ... uniform), a single tag
     842                  value is used to denote any uvector; in order to
     843                  determine something more specific about the type of the
     844                  object in question, it's necessary to fetch the low byte
     845                  of the header word from memory.  On the x86-64 platform,
     846                  certain types of uvectors - SYMBOLs and FUNCTIONs - are
     847                  given their own unique tags.  The good news about the
     848                  x86-64 approach is that SYMBOLs and FUNCTIONs can be
     849                  recognized without referencing memory; the slightly bad
     850                  news is that primitive operations that work on
     851                  UVECTOR-tagged objects&mdash;like the function
     852                  CCL:UVREF&mdash;don't work on SYMBOLs or FUNCTIONs on
     853                  x86-64 (but -do- work on those types of objects in the
     854                  PPC ports.) The header word that precedes a UVECTOR's
     855                  data in memory contains 8 bits of type information in
     856                  the low byte and either 24 or 56 bits of "element-count"
     857                  information in the rest of the word.  (This is where the
     858                  sometimes-limiting value of 2^24 for
     859                  ARRAY-TOTAL-SIZE-LIMIT on PPC32 platforms comes from.)
     860                  The low byte of the header&mdash;sometimes called the
     861                  uvector's subtag&mdash;is itself tagged (which means
     862                  that the header is tagged.)  The (3 or 4) tag bits in
     863                  the subtag are used to determine whether the uvector's
     864                  elements are nodes or immediates. (A UVECTOR whose
     865                  elements are nodes is called a GVECTOR; a UVECTOR whose
     866                  elements are immediates is called an IVECTOR.  This
     867                  terminology came from Spice Lisp, which was a
     868                  predecessor of CMUCL.)  Even though a uvector header is
     869                  tagged, a header is not a node.  There's no (supported)
     870                  way to get your hands on one in lisp and doing so could
     871                  be dangerous.  (If the value of a header wound up in a
     872                  lisp node register and that register wound up getting
     873                  pushed on a thread's value stack, the GC might
     874                  misinterpret that situation to mean that there was a
     875                  stack-allocated UVECTOR on the value stack.)</para>
     876              </listitem>
     877         
     878            </itemizedlist>
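The tagging principles in the list above can be sketched in a few lines of Python. This is only an illustration, not the kernel's code: the CONS tag of 1 is the PPC32 value quoted later in this section, while the NIL tag of 5 is an assumed value chosen to be congruent to it modulo 4, and the names `listp`, `make_header`, and `decode_header` are hypothetical.

```python
# Illustrative PPC32-style constants. Only CONS_TAG comes from the text;
# NIL_TAG is an assumption picked so that it matches CONS_TAG modulo 4.
CONS_TAG = 0b001       # 3-bit tag of a CONS cell on PPC32
NIL_TAG = 0b101        # assumed: unique, but congruent to CONS_TAG modulo 4
LIST_TAG_MASK = 0b11   # the low 2 bits shared by CONS cells and NIL


def listp(node):
    """True for any node whose low 2 bits carry the list tag.

    NIL needs no special case because its tag agrees with the CONS
    tag in the bits that are tested.
    """
    return (node & LIST_TAG_MASK) == (CONS_TAG & LIST_TAG_MASK)


def make_header(subtag, element_count, word_bits=32):
    """Pack an 8-bit subtag and an element count into one header word."""
    count_bits = word_bits - 8  # 24 bits on 32-bit, 56 bits on 64-bit
    if not 0 <= subtag < 256:
        raise ValueError("subtag must fit in the low byte")
    if not 0 <= element_count < (1 << count_bits):
        raise ValueError("element count does not fit in the header")
    return (element_count << 8) | subtag


def decode_header(header):
    """Return (subtag, element_count) for a header word."""
    return header & 0xFF, header >> 8
```

With a 32-bit word only 24 bits remain for the element count, so `make_header` rejects any count of 2^24 or more; that is the same arithmetic that bounds ARRAY-TOTAL-SIZE-LIMIT on PPC32.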
    867879      </sect2>
    868880    </sect1>
     
    870882    <sect1 id="Heap-Allocation">
    871883      <title>Heap Allocation</title> <para>When the &CCL; kernel first
    872       starts up, a large contiguous chunk of the process's address
    873       space is mapped as "anonymous, no access" memory. ("Large" means
    874       different things in different contexts; on LinuxPPC32, it means
    875       "about 1 gigabyte", on DarwinPPC32, it means "about 2
    876       gigabytes", and on current 64-bit platforms it ranges from 128
    877       to 512 gigabytes, depending on OS. These values are both
    878       defaults and upper limits; the --heap-reserve argument can be
    879       used to try to reserve less than the default.)</para>
     884        starts up, a large contiguous chunk of the process's address
     885        space is mapped as "anonymous, no access" memory. ("Large"
     886        means different things in different contexts; on LinuxPPC32,
     887        it means "about 1 gigabyte", on DarwinPPC32, it means "about 2
     888        gigabytes", and on current 64-bit platforms it ranges from 128
     889        to 512 gigabytes, depending on OS. These values are both
     890        defaults and upper limits;
     891        the <literal>--heap-reserve</literal> argument can be used to
     892        try to reserve less than the default.)</para>
    880893      <para>Reserving address space that can't (yet) be read or
    881       written to doesn't cost much; in particular, it doesn't require
    882       that corresponding swap space or physical memory be available.
    883       Marking the address range as being "mapped" helps to ensure that
    884       other things (results from random calls to malloc(), dynamically
    885       loaded shared libraries) won't be allocated in this region that
    886       lisp has reserved for its own heap growth.</para>
     894        written to doesn't cost much; in particular, it doesn't require
     895        that corresponding swap space or physical memory be available.
     896        Marking the address range as being "mapped" helps to ensure that
     897        other things (results from random calls to malloc(), dynamically
     898        loaded shared libraries) won't be allocated in this region that
     899        lisp has reserved for its own heap growth.</para>
    887900      <para>A small portion (around 1/32 on 32-bit platforms and 1/64
    888       on 64-bit platforms) of that large chunk of address space is
    889       reserved for GC data structures.  Memory pages reserved for
    890       these data structures are mapped read-write as pages made
    891       writable in the main portion of the heap.</para>
     901        on 64-bit platforms) of that large chunk of address space is
     902        reserved for GC data structures.  Memory pages reserved for
     903        these data structures are mapped read-write as pages are made
     904        writable in the main portion of the heap.</para>
    892905      <para>The initial heap image is mapped into this reserved
    893       address space and an additional (LISP-HEAP-GC-THRESHOLD) bytes
    894       are mapped read-write.  GC data structures grow to match the
    895       amount of GC-able memory in the initial image + the gc
    896       threshold, and control is transferred to lisp code.  Inevitably,
    897       that code spoils everything and starts consing; there are
    898       basically three layers of memory allocation that can go
    899       on.</para>
     906        address space and an additional (LISP-HEAP-GC-THRESHOLD) bytes
     907        are mapped read-write.  GC data structures grow to match the
     908        amount of GC-able memory in the initial image plus the gc
     909        threshold, and control is transferred to lisp code.
     910        Inevitably, that code spoils everything and starts consing;
     911        there are basically three layers of memory allocation that can
     912        go on.</para>
    900913
    901914      <sect2 id="Per-thread-object-allocation">
    902         <title>Per-thread object allocation</title>
     915            <title>Per-thread object allocation</title>
    903916        <para>Each lisp thread has a private "reserved memory
    904         segment"; when a thread starts up, its reserved memory segment
    905         is empty.  PPC ports maintain the highest unallocated address
    906         and the lowest allocatable address in the current segment in
    907         registers when running lisp code; on x86-664, these values are
    908         maintained in the current threads's TCR.  (An "empty" heap
    909         segment is one whose high pointer and low pointer are equal.)
    910         When a thread is not in the middle of allocating something, the
    911         low 3 or 4 bits of the high and low pointers are clear (the
    912         pointers are doublenode-aligned.)</para>
     917          segment"; when a thread starts up, its reserved memory segment
     918          is empty.  PPC ports maintain the highest unallocated address
     919          and the lowest allocatable address in the current segment in
     920          registers when running lisp code; on x86-64, these values are
     921          maintained in the current thread's TCR.  (An "empty" heap
     922          segment is one whose high pointer and low pointer are equal.)
     923          When a thread is not in the middle of allocating something, the
     924          low 3 or 4 bits of the high and low pointers are clear (the
     925          pointers are doublenode-aligned.)</para>
    913926        <para>A thread tries to allocate an object whose physical size
    914         in bytes is X and whose tag is Y by:</para>
    915         <orderedlist>
    916           <listitem>
    917             <para>decrementing the "high" pointer by (- X Y)</para>
    918           </listitem>
    919           <listitem>
    920             <para>trapping if the high pointer is less than the low
    921             pointer</para>
    922           </listitem>
    923           <listitem>
    924             <para>using the (tagged) high pointer to initialize the
    925             object, if necessary</para>
    926           </listitem>
    927           <listitem>
    928             <para>clearing the low bits of the high pointer</para>
    929           </listitem>
    930         </orderedlist>
     927          in bytes is X and whose tag is Y by:</para>
     928            <orderedlist>
     929              <listitem>
     930                <para>decrementing the "high" pointer by (- X Y)</para>
     931              </listitem>
     932              <listitem>
     933                <para>trapping if the high pointer is less than the low
     934                  pointer</para>
     935              </listitem>
     936              <listitem>
     937                <para>using the (tagged) high pointer to initialize the
     938                  object, if necessary</para>
     939              </listitem>
     940              <listitem>
     941                <para>clearing the low bits of the high pointer</para>
     942              </listitem>
     943            </orderedlist>
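The four steps above can be modelled in Python as a rough sketch. The class and exception names are hypothetical, and a Python exception stands in for the hardware trap; unlike the real instruction sequence, this model leaves the high pointer untouched when the allocation fails.

```python
class SegmentExhausted(Exception):
    """Stands in for the trap taken when the segment cannot satisfy a request."""


class ThreadSegment:
    """Toy model of one thread's reserved memory segment."""

    def __init__(self, low, high, tag_bits=3):
        self.low = low                        # lowest allocatable address
        self.high = high                      # highest unallocated address
        self.tag_mask = (1 << tag_bits) - 1   # 3 tag bits on 32-bit, 4 on 64-bit

    def allocate(self, size, tag):
        """Allocate `size` bytes and return a pointer tagged with `tag`."""
        new_high = self.high - (size - tag)       # step 1: decrement by (- X Y)
        if new_high < self.low:                   # step 2: trap if high < low
            raise SegmentExhausted(size)
        tagged = new_high                         # step 3: tagged pointer to the object
        self.high = new_high & ~self.tag_mask     # step 4: clear the low bits
        return tagged
```

Subtracting (- X Y) instead of X leaves the tag in the low bits of the result, so a single subtraction both reserves the space and produces the tagged pointer. For a PPC32 CONS (size 8, tag 1) the decrement is 7; for an x86-64 CONS (size 16, tag 3) it is 13.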
    931944        <para>On PPC32, where the size of a CONS cell is 8 bytes and
    932         the tag of a CONS cell is 1, machine code which sets the arg_z
    933         register to the result of doing (CONS arg_y arg_z) looks
    934         like:</para>
     945          the tag of a CONS cell is 1, machine code which sets the arg_z
     946          register to the result of doing (CONS arg_y arg_z) looks
     947          like:</para>
    935948        <programlisting>
    936949  (SUBI ALLOCPTR ALLOCPTR 7)    ; decrement the high pointer by (- 8 1)
     
    940953  (MR ARG_Z ALLOCPTR)           ; arg_z is the new CONS cell
    941954  (RLWINM ALLOCPTR ALLOCPTR 0 0 28)     ; clear tag bits
    942         </programlisting>
    943         <para>On x86-64, the idea's similar but the implementation is
    944         different.  The high and low pointers to the current thread's
    945         reserved segment are kept in the TCR, which is addressed by
    946         the gs segment register. An x86-64 CONS cell is 16 bytes wide
    947         and has a tag of 3; we canonically use the temp0 register to
    948         initialize the object</para>
     955            </programlisting>
     956            <para>On x86-64, the idea's similar but the implementation is
     957          different.  The high and low pointers to the current thread's
     958          reserved segment are kept in the TCR, which is addressed by
     959          the gs segment register. An x86-64 CONS cell is 16 bytes wide
     960          and has a tag of 3; we canonically use the temp0 register to
     961          initialize the object:</para>
    949962        <programlisting>
    950963  (subq ($ 13) ((% gs) 216))    ; decrement allocptr
     
    958971  (movq (% arg_z) (-3 (% temp0))); set the cdr
    959972  (movq (% temp0) (% arg_z))    ; return the cons
    960         </programlisting>
     973            </programlisting>
    961974        <para>If we don't take the trap (if allocating 8-16 bytes
    962         doesn't exhaust the thread's reserved memory segment), that's
    963         a fairly short and simple instruction sequence.  If we do take
    964         the trap, we'll have to do some additional work in order to
    965         get a new segment for the current thread.</para>
     975          doesn't exhaust the thread's reserved memory segment), that's
     976          a fairly short and simple instruction sequence.  If we do take
     977          the trap, we'll have to do some additional work in order to
     978          get a new segment for the current thread.</para>
    966979      </sect2>
    967980
    968981      <sect2 id="Allocation-of-reserved-heap-segments">
    969         <title>Allocation of reserved heap segments</title>
     982            <title>Allocation of reserved heap segments</title>
    970983        <para>After the lisp image is first mapped into memory - and after
    971         each full GC - the lisp kernel ensures that
    972         (LISP-HEAP-GC-TRESHOLD) additional bytes beyond the current
    973         end of the heap are mapped read-write.</para>
     984          each full GC - the lisp kernel ensures that
     985          (LISP-HEAP-GC-THRESHOLD) additional bytes beyond the current
     986          end of the heap are mapped read-write.</para>
    974987        <para>If a thread traps while trying to allocate memory, the
    975         thread goes through the usual exception-handling protocol (to
    976         ensure that any other thread that GCs "sees" the state of the
    977         trapping thread and to serialize exception handling.)  When
    978         the exception handler runs, it determines the nature and size
    979         of the failed allocation and tries to complete the allocation
    980         on the thread's behalf (and leave it with a reasonably large
    981         thread-specific memory segment so that the next small
    982         allocation is unlikely to trap.</para>
     988          thread goes through the usual exception-handling protocol (to
     989          ensure that any other thread that GCs "sees" the state of the
     990          trapping thread and to serialize exception handling.)  When
     991          the exception handler runs, it determines the nature and size
     992          of the failed allocation and tries to complete the allocation
     993          on the thread's behalf (and leave it with a reasonably large
     994          thread-specific memory segment so that the next small
     995          allocation is unlikely to trap.)</para>
    983996        <para>Depending on the size of the requested segment
    984         allocation, the number of segment allocations that have
    985         occurred since the last GC, and the EGC and GC thresholds, the
    986         segment allocation trap handler may invoke a full or ephemeral
    987         GC before returning a new segment.  It's worth noting that the
    988         [E]GC is triggered based on the number of and size of these
    989         segments that have been allocated since the last GC; it doesn't
    990         have much to do with how "full" each of those per-thread
    991         segments are.  It's possible for a large number of threads to
    992         do fairly incidental memory allocation and trigger the GC as a
    993         result; avoiding this involves tuning the per-thread
    994         allocation quantum and the GC/EGC thresholds
    995         appropriately.</para>
     997          allocation, the number of segment allocations that have
     998          occurred since the last GC, and the EGC and GC thresholds, the
     999          segment allocation trap handler may invoke a full or ephemeral
     1000          GC before returning a new segment.  It's worth noting that the
     1001          [E]GC is triggered based on the number of and size of these
     1002          segments that have been allocated since the last GC; it doesn't
     1003          have much to do with how "full" each of those per-thread
     1004          segments is.  It's possible for a large number of threads to
     1005          do fairly incidental memory allocation and trigger the GC as a
     1006          result; avoiding this involves tuning the per-thread
     1007          allocation quantum and the GC/EGC thresholds
     1008          appropriately.</para>
    9961009      </sect2>
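The point that the GC is driven by segments handed out, not by how much of each segment is in use, can be illustrated with a deliberately simplified Python model. The policy below (compare the bytes granted since the last GC against a single threshold) is an assumption made for the sketch; the real handler also takes the EGC thresholds and the size of the request into account, and the class name is hypothetical.

```python
class SegmentAccounting:
    """Toy model: GC is triggered by bytes granted as segments, not bytes used."""

    def __init__(self, gc_threshold):
        self.gc_threshold = gc_threshold
        self.segment_bytes_since_gc = 0
        self.gc_count = 0

    def grant_segment(self, segment_size):
        """Grant one segment; return True if a GC had to run first."""
        triggered = False
        if self.segment_bytes_since_gc + segment_size > self.gc_threshold:
            self.gc_count += 1                 # pretend to run the GC
            self.segment_bytes_since_gc = 0    # accounting restarts after a GC
            triggered = True
        # The whole segment is charged, however little the thread conses in it.
        self.segment_bytes_since_gc += segment_size
        return triggered
```

In this model, many threads that each take one segment and allocate only a few bytes in it exhaust the threshold just as fast as a single thread that fills every segment it is given, which is why the per-thread allocation quantum and the thresholds have to be tuned together.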
    9971010
    9981011      <sect2 id="Heap-growth">
    999         <title>Heap growth</title>
     1012            <title>Heap growth</title>
    10001013        <para>All OSes on which &CCL; currently runs use an
    1001         "overcommit" memory allocation strategy by default (though
    1002         some of them provide ways of overriding that default.)  What
    1003         this means in general is that the OS doesn't necessarily
    1004         ensure that backing store is available when asked to map pages
    1005         as read-write; it'll often return a success indicator from the
    1006         mapping attempt (mapping the pages as "zero-fill,
    1007         copy-on-write"), and only try to allocate the backing store
    1008         (swap space and/or physical memory) when non-zero contents are
    1009         written to the pages.</para>
     1014          "overcommit" memory allocation strategy by default (though
     1015          some of them provide ways of overriding that default.)  What
     1016          this means in general is that the OS doesn't necessarily
     1017          ensure that backing store is available when asked to map pages
     1018          as read-write; it'll often return a success indicator from the
     1019          mapping attempt (mapping the pages as "zero-fill,
     1020          copy-on-write"), and only try to allocate the backing store
     1021          (swap space and/or physical memory) when non-zero contents are
     1022          written to the pages.</para>
    10101023        <para>It -sounds- like it'd be better to have the mmap() call
    1011         fail immediately, but it's actually a complicated issue.
    1012         (It's possible that other applications will stop using some
    1013         backing store before lisp code actually touches the pages that
    1014         need it, for instance.)  It's also not guaranteed that lisp
    1015         code would be able to "cleanly" signal an out-of-memory
    1016         condition if lisp is ... out of memory</para>
    1017         <para>I don't know that I've ever seen an abrupt out-of-memory
    1018         failure that wasn't preceded by several minutes of excessive
    1019         paging activity.  The most expedient course in cases like this
    1020         is to either (a) use less memory or (b) get more memory; it's
    1021         generally hard to use memory that you don't have.</para>
     1024          fail immediately, but it's actually a complicated issue.
     1025          (It's possible that other applications will stop using some
     1026          backing store before lisp code actually touches the pages that
     1027          need it, for instance.)  It's also not guaranteed that lisp
     1028          code would be able to "cleanly" signal an out-of-memory
     1029          condition if lisp is ... out of memory.</para>
     1030            <para>I don't know that I've ever seen an abrupt out-of-memory
     1031              failure that wasn't preceded by several minutes of excessive
     1032              paging activity.  The most expedient course in cases like this
     1033              is to either (a) use less memory or (b) get more memory; it's
     1034              generally hard to use memory that you don't have.</para>
    10221035      </sect2>
    10231036    </sect1>
     
    10261039      <title>GC details</title>
    10271040      <para>The GC uses a Mark/Compact algorithm; its
    1028       execution time is essentially a function of the amount of live
    1029       data in the heap. (The somewhat better-known Mark/Sweep
    1030       algorithms don't compact the live data but instead traverse the
    1031       garbage to rebuild free-lists; their execution time is therefore
    1032       a function of the total heap size.)</para>
     1041        execution time is essentially a function of the amount of live
     1042        data in the heap. (The somewhat better-known Mark/Sweep
     1043        algorithms don't compact the live data but instead traverse the
     1044        garbage to rebuild free-lists; their execution time is therefore
     1045        a function of the total heap size.)</para>
    10331046      <para>As mentioned in <xref linkend="Heap-Allocation"/>, two
    1034       auxiliary data structures (proportional to the size of the lisp
    1035       heap) are maintained. These are</para>
     1047        auxiliary data structures (proportional to the size of the lisp
     1048        heap) are maintained. These are</para>
    10361049      <orderedlist>
    1037         <listitem>
    1038           <para>the markbits bitvector, which contains a bit for
    1039           every doublenode in the dynamic heap (plus a few extra words
    1040           for alignment and so that sub-bitvectors can start on word
    1041           boundaries.)</para>
    1042         </listitem>
    1043         <listitem>
    1044           <para>the relocation table, which contains a native word for
    1045           every 32 or 64 doublenodes in the dynamic heap, plus an
    1046           extra word used to keep track of the end of the heap.</para>
    1047         </listitem>
     1050            <listitem>
     1051              <para>the markbits bitvector, which contains a bit for
     1052                every doublenode in the dynamic heap (plus a few extra words
     1053                for alignment and so that sub-bitvectors can start on word
     1054                boundaries.)</para>
     1055            </listitem>
     1056            <listitem>
     1057              <para>the relocation table, which contains a native word for
     1058                every 32 or 64 doublenodes in the dynamic heap, plus an
     1059                extra word used to keep track of the end of the heap.</para>
     1060            </listitem>
    10481061      </orderedlist>
    10491062      <para>The total GC space overhead is therefore on the order of
    1050       3% (2/64 or 1/32).</para>
     1063        3% (2/64 or 1/32).</para>
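The 3% figure can be checked with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, the few extra alignment words are ignored, and it assumes that "32 or 64 doublenodes" means 32 on a 32-bit platform and 64 on a 64-bit one.

```python
from fractions import Fraction


def gc_overhead(word_bits):
    """Fraction of the heap size used by the markbits vector plus the
    relocation table (extra alignment words are ignored)."""
    doublenode_bits = 2 * word_bits            # a doublenode is two native words
    markbits = Fraction(1, doublenode_bits)    # one mark bit per doublenode
    # Assumption: one native word per `word_bits` doublenodes.
    pagelet_bits = word_bits * doublenode_bits
    relocation = Fraction(word_bits, pagelet_bits)
    return markbits + relocation


print(gc_overhead(32), float(gc_overhead(32)))
```

Under these assumptions the 32-bit case gives 1/64 + 1/64 = 1/32, which matches the figure quoted above; the 64-bit case would come out at half that.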
    10511064      <para>The general algorithm proceeds as follows:</para>
    10521065
    10531066      <sect2 id="Mark-phase">
    1054         <title>Mark phase</title>
     1067            <title>Mark phase</title>
    10551068        <para>Each doublenode in the dynamic heap has a corresponding
    1056         bit in the markbits vector. (For any doublenode in the heap,
    1057         the index of its mark bit is determined by subtracting the
    1058         address of the start of the heap from the address of the
    1059         object and dividing the result by 8 or 16.) The GC knows the
    1060         markbit index of the free pointer, so determining that the
    1061         markbit index of a doubleword address is between the start of
    1062         the heap and the free pointer can be done with a single
    1063         unsigned comparison.</para>
     1069          bit in the markbits vector. (For any doublenode in the heap,
     1070          the index of its mark bit is determined by subtracting the
     1071          address of the start of the heap from the address of the
     1072          object and dividing the result by 8 or 16.) The GC knows the
     1073          markbit index of the free pointer, so determining that the
     1074          markbit index of a doubleword address is between the start of
     1075          the heap and the free pointer can be done with a single
     1076          unsigned comparison.</para>
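As a rough illustration of the index computation and the single unsigned comparison, here is a minimal Python sketch. The function names, the example addresses, and the explicit masking used to imitate unsigned machine arithmetic are assumptions of the sketch, not the kernel's actual code.

```python
def markbit_index(addr, heap_base, doublenode_bytes=8):
    """Index of the mark bit for the doublenode at `addr`."""
    return (addr - heap_base) // doublenode_bytes


def in_dynamic_heap(addr, heap_base, free_ptr, doublenode_bytes=8, word_bits=32):
    """True if `addr` lies between the heap base and the free pointer.

    The subtraction is masked to `word_bits` to imitate unsigned
    arithmetic: an address below the heap base wraps around to a very
    large index, so one comparison covers both bounds.
    """
    mask = (1 << word_bits) - 1
    index = ((addr - heap_base) & mask) // doublenode_bytes
    limit = markbit_index(free_ptr, heap_base, doublenode_bytes)
    return index < limit


print(in_dynamic_heap(0x1018, 0x1000, 0x2000))  # inside the heap
print(in_dynamic_heap(0x0FF8, 0x1000, 0x2000))  # below the heap base
```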
    10641077        <para>The markbits of all doublenodes in the dynamic heap are
    1065         zeroed before the mark phase begins. An object is
    1066         <emphasis>marked</emphasis> if the markbits of all of its
    1067         constituent doublewords are set and unmarked otherwise;
    1068         setting an object's markbits involves setting the corresponding
    1069         markbits of all constituent doublenodes in the object.</para>
     1078          zeroed before the mark phase begins. An object is
     1079          <emphasis>marked</emphasis> if the markbits of all of its
     1080          constituent doublewords are set and unmarked otherwise;
     1081          setting an object's markbits involves setting the corresponding
     1082          markbits of all constituent doublenodes in the object.</para>
    10701083        <para>The mark phase traverses each root. If the tag of the
    1071         value of the root indicates that it's a non-immediate node
    1072         whose address lies in the lisp heap, then:</para>
    1073         <orderedlist>
    1074           <listitem>
    1075             <para>If the object is already marked, do nothing.</para>
    1076           </listitem>
    1077           <listitem>
    1078             <para>Set the object's markbit(s).</para>
    1079           </listitem>
    1080           <listitem>
    1081             <para>If the object is an ivector, do nothing further.</para>
    1082           </listitem>
    1083           <listitem>
    1084             <para>If the object is a cons cell, recursively mark its
    1085             car and cdr.</para>
    1086           </listitem>
    1087           <listitem>
    1088             <para>Otherwise, the object is a gvector. Recursively mark
    1089             its elements.</para>
    1090           </listitem>
    1091         </orderedlist>
     1084          value of the root indicates that it's a non-immediate node
     1085          whose address lies in the lisp heap, then:</para>
     1086            <orderedlist>
     1087              <listitem>
     1088                <para>If the object is already marked, do nothing.</para>
     1089              </listitem>
     1090              <listitem>
     1091                <para>Set the object's markbit(s).</para>
     1092              </listitem>
     1093              <listitem>
     1094                <para>If the object is an ivector, do nothing further.</para>
     1095              </listitem>
     1096              <listitem>
     1097                <para>If the object is a cons cell, recursively mark its
     1098                  car and cdr.</para>
     1099              </listitem>
     1100              <listitem>
     1101                <para>Otherwise, the object is a gvector. Recursively mark
     1102                  its elements.</para>
     1103              </listitem>
     1104            </orderedlist>
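The steps above can be sketched on a toy heap. Everything here is hypothetical scaffolding: the heap is a dictionary from object names to (kind, fields) pairs, a Python set stands in for the markbits vector, and only plain recursion is shown (no link inversion).

```python
def mark(obj, heap, marked):
    """Mark `obj` and everything reachable from it."""
    if obj not in heap:        # immediate value, or pointer outside the heap
        return
    if obj in marked:          # already marked: do nothing
        return
    marked.add(obj)            # set the object's markbit(s)
    kind, fields = heap[obj]
    if kind == "ivector":      # ivectors contain no pointers
        return
    for field in fields:       # cons: car and cdr; gvector: every element
        mark(field, heap, marked)


def mark_from_roots(roots, heap):
    """Run the mark phase over every root and return the marked set."""
    marked = set()
    for root in roots:
        mark(root, heap, marked)
    return marked


toy_heap = {
    "a": ("cons", ["b", 42]),
    "b": ("gvector", ["c", "a"]),   # points back at "a": a cycle
    "c": ("ivector", [1, 2, 3]),
    "d": ("cons", ["c", None]),     # not reachable from the root
}
print(sorted(mark_from_roots(["a"], toy_heap)))
```

The "already marked" check is what keeps the cycle between "a" and "b" from recursing forever.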
    10921105        <para>Marking an object thus involves ensuring that its mark
    1093         bits are set and then recursively marking any pointers
    1094         contained within the object if the object was originally
    1095         unmarked. If this recursive step was implemented in the
    1096         obvious manner, marking an object would take stack space
    1097         proportional to the length of the pointer chain from some root
    1098         to that object. Rather than storing that pointer chain
    1099         implicitly on the stack (in a series of recursive calls to the
    1100         mark subroutine), the &CCL; marker uses a mixture of recursion
    1101         and a technique called <emphasis>link inversion</emphasis> to
    1102         store the pointer chain in the objects themselves.  (Recursion
    1103         tends to be simpler and faster; if a recursive step notes that
    1104         stack space is becoming limited, the link-inversion technique
    1105         is used.)</para>
     1106          bits are set and then recursively marking any pointers
     1107          contained within the object if the object was originally
     1108          unmarked. If this recursive step was implemented in the
     1109          obvious manner, marking an object would take stack space
     1110          proportional to the length of the pointer chain from some root
     1111          to that object. Rather than storing that pointer chain
     1112          implicitly on the stack (in a series of recursive calls to the
     1113          mark subroutine), the &CCL; marker uses a mixture of recursion
     1114          and a technique called <emphasis>link inversion</emphasis> to
     1115          store the pointer chain in the objects themselves.  (Recursion
     1116          tends to be simpler and faster; if a recursive step notes that
     1117          stack space is becoming limited, the link-inversion technique
     1118          is used.)</para>
    11061119        <para>Certain types of objects are treated a little specially:</para>
    1107         <orderedlist>
    1108         <listitem>
    1109           <para>To support a feature called <emphasis>GCTWA
    1110               <footnote>
    1111                 <para>I believe that the acronym comes from MACLISP,
    1112                 where it stood for "Garbage Collection of Truly
    1113                 Worthless Atoms".</para>
    1114               </footnote>
    1115               , </emphasis>the vector which contains the
    1116               internal symbols of the current package is marked on
    1117               entry to the mark phase, but the symbols themselves are
    1118               not marked at this time. Near the end of the mark phase,
    1119               symbols referenced from this vector which are
    1120               not otherwise marked are marked if and only if they're
    1121               somehow distinguishable from newly created symbols (by
    1122               virtue of their having function bindings, value bindings,
    1123               plists, or other attributes.)</para>
    1124         </listitem>
    1125         <listitem>
    1126           <para>Pools have their first element set to NIL before any
    1127           other elements are marked.</para>
    1128         </listitem>
    1129         <listitem>
    1130           <para>All hash tables have certain fields (used to cache
    1131           previous results) invalidated.</para>
    1132         </listitem>
    1133         <listitem>
    1134           <para>Weak Hash Tables and other weak objects are put on a
    1135           linked list as they're encountered; their contents are only
    1136           retained if there are other (non-weak) references to
    1137           them.</para>
    1138         </listitem>
    1139         </orderedlist>
     1120            <orderedlist>
     1121              <listitem>
     1122                <para>To support a feature called <emphasis>GCTWA
     1123                <footnote>
     1124                          <para>I believe that the acronym comes from MACLISP,
     1125                            where it stood for "Garbage Collection of Truly
     1126                            Worthless Atoms".</para>
     1127                </footnote>
     1128                    , </emphasis>the vector that contains the internal
     1129                  symbols of the current package is marked on entry to the
     1130                  mark phase, but the symbols themselves are not marked at
     1131                  this time. Near the end of the mark phase, symbols
     1132                  referenced from this vector which are not otherwise
     1133                  marked are marked if and only if they're somehow
     1134                  distinguishable from newly created symbols (by virtue of
     1135                  their having function bindings, value bindings, plists,
     1136                  or other attributes.)</para>
     1137              </listitem>
     1138              <listitem>
     1139                <para>Pools have their first element set to NIL before any
     1140                  other elements are marked.</para>
     1141              </listitem>
     1142              <listitem>
     1143                <para>All hash tables have certain fields (used to cache
     1144                  previous results) invalidated.</para>
     1145              </listitem>
     1146              <listitem>
     1147                <para>Weak Hash Tables and other weak objects are put on a
     1148                  linked list as they're encountered; their contents are only
     1149                  retained if there are other (non-weak) references to
     1150                  them.</para>
     1151              </listitem>
     1152            </orderedlist>
    11401153        <para>At the end of the mark phase, the markbits of all
    1141         objects which are transitively reachable from the roots are
    1142         set and all other markbits are clear.</para>
     1154          objects that are transitively reachable from the roots are
     1155          set and all other markbits are clear.</para>
    11431156      </sect2>
    11441157
    11451158      <sect2 id="Relocation-phase">
    1146         <title>Relocation phase</title>
    1147         <para>The <emphasis>forwarding address</emphasis> of a
    1148         doublenode in the dynamic heap is (&lt;its current address> -
    1149         (size_of_doublenode * &lt;the number of unmarked markbits that
    1150         precede it>)) or alternately (&lt;the base of the heap> +
    1151         (size_of_doublenode * &lt;the number of marked markbits that
    1152         precede it &gt;)). Rather than count the number of preceding
    1153         markbits each time, the relocation table is used to precompute
    1154         an approximation of the forwarding addresses for all
    1155         doublewords. Given this approximate address and a pointer into
    1156         the markbits vector, it's relatively easy to compute the exact
    1157         forwarding address.</para>
    1158         <para>The relocation table contains the forwarding addresses
    1159         of each <emphasis>pagelet</emphasis>, where a pagelet is 256
    1160         bytes (or 32 doublenodes). The forwarding address of the first
    1161         pagelet is the base of the heap. The forwarding address of the
    1162         second pagelet is the sum of the forwarding address of the
    1163         first and 8 bytes for each mark bit set in the first 32-bit
    1164         word in the markbits table. The last entry in the relocation
    1165         table contains the forwarding address that the freepointer
    1166         would have, i.e., the new value of the freepointer after
    1167         compaction.</para>
    1168         <para>In many programs, old objects rarely become garbage and
    1169         new objects often do. When building the relocation table, the
    1170         relocation phase notes the address of the first unmarked
    1171         object in the dynamic heap. Only the area of the heap between
    1172         the first unmarked object and the freepointer needs to be
    1173         compacted; only pointers to this area will need to be
    1174         forwarded (the forwarding address of all other pointers to the
    1175         dynamic heap is the address of that pointer.)  Often, the
    1176         first unmarked object is much nearer the free pointer than it
    1177         is to the base of the heap.</para>
     1159            <title>Relocation phase</title>
     1160            <para>The <emphasis>forwarding address</emphasis> of a
     1161              doublenode in the dynamic heap is (&lt;its current address> -
     1162              (size_of_doublenode * &lt;the number of unmarked markbits that
     1163              precede it>)) or alternately (&lt;the base of the heap> +
     1164              (size_of_doublenode * &lt;the number of marked markbits that
     1165              precede it &gt;)). Rather than count the number of preceding
     1166              markbits each time, the relocation table is used to precompute
     1167              an approximation of the forwarding addresses for all
     1168              doublewords. Given this approximate address and a pointer into
     1169              the markbits vector, it's relatively easy to compute the exact
     1170              forwarding address.</para>
     1171            <para>The relocation table contains the forwarding addresses
     1172              of each <emphasis>pagelet</emphasis>, where a pagelet is 256
     1173              bytes (or 32 doublenodes). The forwarding address of the first
     1174              pagelet is the base of the heap. The forwarding address of the
     1175              second pagelet is the sum of the forwarding address of the
     1176              first and 8 bytes for each mark bit set in the first 32-bit
     1177              word in the markbits table. The last entry in the relocation
     1178              table contains the forwarding address that the freepointer
     1179              would have, i.e., the new value of the freepointer after
     1180              compaction.</para>
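Building the relocation table amounts to a running sum of set mark bits, one pagelet at a time. The following is a minimal sketch for the 32-bit layout described above (8-byte doublenodes, 32 doublenodes per pagelet). The function name and the list-of-bits representation of the markbits are assumptions made for illustration.

```python
PAGELET_DOUBLENODES = 32   # doublenodes per pagelet (32-bit layout)
DOUBLENODE_BYTES = 8       # bytes per doublenode (32-bit layout)


def build_relocation_table(markbits, heap_base):
    """Return the forwarding address of each pagelet, followed by one
    final entry holding the free pointer's value after compaction.

    `markbits` is a list of 0/1 values, one per doublenode.
    """
    table = [heap_base]    # the first pagelet forwards to the heap base
    for start in range(0, len(markbits), PAGELET_DOUBLENODES):
        live = sum(markbits[start:start + PAGELET_DOUBLENODES])
        table.append(table[-1] + live * DOUBLENODE_BYTES)
    return table


# Two pagelets: 4 live doublenodes in the first, 16 in the second.
bits = [1] * 4 + [0] * 28 + [0] * 16 + [1] * 16
print([hex(a) for a in build_relocation_table(bits, 0x1000)])
```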
     1181            <para>In many programs, old objects rarely become garbage and
     1182              new objects often do. When building the relocation table, the
     1183              relocation phase notes the address of the first unmarked
     1184              object in the dynamic heap. Only the area of the heap between
     1185              the first unmarked object and the freepointer needs to be
     1186              compacted; only pointers to this area will need to be
     1187              forwarded (the forwarding address of all other pointers to the
     1188              dynamic heap is the address of that pointer.)  Often, the
     1189              first unmarked object is much nearer the free pointer than it
     1190              is to the base of the heap.</para>
    11781191      </sect2>
    11791192
    11801193      <sect2 id="Forwarding-phase">
    1181         <title>Forwarding phase</title>
     1194            <title>Forwarding phase</title>
    11821195        <para>The forwarding phase traverses all roots and the "old"
    1183         part of the dynamic heap (the part between the base of the
    1184         heap and the first unmarked object.) All references to objects
    1185         whose address is between the first unmarked object and the
    1186         free pointer are updated to point to the address the object
    1187         will have after compaction by using the relocation table and
    1188         the markbits vector and interpolating.</para>
    1189         <para>The relocation table entry for the pagelet nearest the
    1190         object is found. If the pagelet's address is less than the
    1191         object's address, the number of set markbits that precede the
    1192         object on the pagelet is used to determine the object's
    1193         address; otherwise, the number of set markbits the follow the
    1194         object on the pagelet is used.</para>
     1196          part of the dynamic heap (the part between the base of the
     1197          heap and the first unmarked object.) All references to objects
     1198          whose address is between the first unmarked object and the
     1199          free pointer are updated to point to the address the object
     1200          will have after compaction by using the relocation table and
     1201          the markbits vector and interpolating.</para>
     1202            <para>The relocation table entry for the pagelet nearest the
     1203              object is found. If the pagelet's address is less than the
     1204              object's address, the number of set markbits that precede
     1205              the object on the pagelet is used to determine the object's
     1206              address; otherwise, the number of set markbits that follow
     1207              the object on the pagelet is used.</para>
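The interpolation can be sketched as follows. This is an illustration rather than the kernel's routine: the names are hypothetical, the markbits are a plain list of 0/1 values, the 32-bit layout is assumed, and "nearest pagelet" is taken to mean whichever pagelet boundary is closer to the object.

```python
PAGELET_DOUBLENODES = 32   # doublenodes per pagelet (32-bit layout)
DOUBLENODE_BYTES = 8       # bytes per doublenode (32-bit layout)


def relocation_table(markbits, heap_base):
    """Forwarding address of each pagelet, plus the new free pointer."""
    table = [heap_base]
    for start in range(0, len(markbits), PAGELET_DOUBLENODES):
        live = sum(markbits[start:start + PAGELET_DOUBLENODES])
        table.append(table[-1] + live * DOUBLENODE_BYTES)
    return table


def forwarding_address(addr, heap_base, markbits, table):
    """Address the doublenode at `addr` will have after compaction."""
    index = (addr - heap_base) // DOUBLENODE_BYTES
    pagelet = index // PAGELET_DOUBLENODES
    first = pagelet * PAGELET_DOUBLENODES
    if index - first < PAGELET_DOUBLENODES // 2:
        # Nearer the start of the pagelet: count set bits that precede it.
        preceding = sum(markbits[first:index])
        return table[pagelet] + preceding * DOUBLENODE_BYTES
    # Nearer the next pagelet: subtract set bits from here to its start.
    following = sum(markbits[index:first + PAGELET_DOUBLENODES])
    return table[pagelet + 1] - following * DOUBLENODE_BYTES


bits = [1] * 4 + [0] * 28 + [0] * 16 + [1] * 16
table = relocation_table(bits, 0x1000)
print(hex(forwarding_address(0x1180, 0x1000, bits, table)))
```

Both branches give the same answer as counting every set mark bit from the base of the heap; the table only bounds how many bits have to be examined.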
    11951208        <para>Since forwarding views the heap as a set of doublewords,
    1196         locatives are (mostly) treated like any other pointers. (The
    1197         basic difference is that locatives may appear to be tagged as
    1198         fixnums, in which case they're treated as word-aligned
    1199         pointers into the object.)</para>
     1209          locatives are (mostly) treated like any other pointers. (The
     1210          basic difference is that locatives may appear to be tagged as
     1211          fixnums, in which case they're treated as word-aligned
     1212          pointers into the object.)</para>
    12001213        <para>If the forwarding phase changes the address of any hash
    1201         table key in a hash table that hashes by address (e.g., an EQ
    1202         hash table), it sets a bit in the hash table's header. The
    1203         hash table code will rehash the hash table's contents if it
    1204         tries to do a lookup on a key in such a table.</para>
     1214          table key in a hash table that hashes by address (e.g., an EQ
     1215          hash table), it sets a bit in the hash table's header. The
     1216          hash table code will rehash the hash table's contents if it
     1217          tries to do a lookup on a key in such a table.</para>
    12051218        <para>Profiling reveals that about half of the total time
    1206         spent in the GC is spent in the subroutine which determines a
    1207         pointer's forwarding address. Exploiting GCC-specific idioms,
    1208         hand-coding the routine, and inlining calls to it could all be
    1209         expected to improve GC performance.</para>
     1219          spent in the GC is spent in the subroutine which determines a
     1220          pointer's forwarding address. Exploiting GCC-specific idioms,
     1221          hand-coding the routine, and inlining calls to it could all be
     1222          expected to improve GC performance.</para>
    12101223      </sect2>
    12111224
    12121225      <sect2 id="Compact-phase">
    1213         <title>Compact phase</title>
     1226            <title>Compact phase</title>
    12141227        <para>The compact phase compacts the area between the first
    1215         unmarked object and the freepointer so that it contains only
    1216         marked objects.  While doing so, it forwards any pointers it
    1217         finds in the objects it copies.</para>
     1228          unmarked object and the freepointer so that it contains only
     1229          marked objects.  While doing so, it forwards any pointers it
     1230          finds in the objects it copies.</para>
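A sliding compaction of this kind can be sketched in a few lines. The sketch is deliberately simplified and its names are hypothetical: the heap is a list with one entry per doublenode, the whole list is compacted rather than only the area past the first unmarked object, and pointers inside the copied objects are not forwarded.

```python
def compact(heap, markbits):
    """Slide marked doublenodes toward the base, preserving their order.

    Returns the index of the new free pointer.
    """
    dest = 0
    for src, live in enumerate(markbits):
        if live:
            heap[dest] = heap[src]   # dest <= src, so nothing live is overwritten
            dest += 1
    return dest


cells = list("abcdef")
new_free = compact(cells, [1, 0, 1, 1, 0, 1])
print(cells[:new_free], new_free)
```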
    12181231        <para>When the compact phase is finished, so is the GC (more
    1219         or less): the free pointer and some other data structures are
    1220         updated and control returns to the exception handler that
    1221         invoked the GC. If sufficient memory has been freed to satisfy
    1222         any allocation request that may have triggered the GC, the
    1223         exception handler returns; otherwise, a "seriously low on
    1224         memory" condition is signaled, possibly after releasing a
    1225         small emergency pool of memory.</para>
     1232          or less): the free pointer and some other data structures are
     1233          updated and control returns to the exception handler that
     1234          invoked the GC. If sufficient memory has been freed to satisfy
     1235          any allocation request that may have triggered the GC, the
     1236          exception handler returns; otherwise, a "seriously low on
     1237          memory" condition is signaled, possibly after releasing a
     1238          small emergency pool of memory.</para>
    12261239      </sect2>
    12271240    </sect1>
     
    12301243      <title>The ephemeral GC</title>
    12311244      <para>In the &CCL; memory management scheme, the relative age
    1232       of two objects in the dynamic heap can be determined by their
    1233       addresses: if addresses X and Y are both addresses in the
    1234       dynamic heap, X is younger than Y (X was created more recently
    1235       than Y) if it is nearer to the free pointer (and farther from
    1236       the base of the heap) than Y.</para>
     1245        of two objects in the dynamic heap can be determined by their
     1246        addresses: if addresses X and Y are both addresses in the
     1247        dynamic heap, X is younger than Y (X was created more recently
     1248        than Y) if it is nearer to the free pointer (and farther from
     1249        the base of the heap) than Y.</para>
    12371250      <para>Ephemeral (or generational) garbage collectors attempt to
    1238       exploit the following assumptions:</para>
     1251        exploit the following assumptions:</para>
    12391252      <itemizedlist>
    1240         <listitem>
    1241           <para>most newly created objects become garbage soon after
    1242           they're created.</para>
    1243         </listitem>
    1244         <listitem>
    1245           <para>most objects that have already survived several GCs
    1246           are unlikely to ever become garbage.</para>
    1247         </listitem>
    1248         <listitem>
    1249           <para>old objects can only point to newer objects as the
    1250           result of a destructive modification (e.g., via
    1251           SETF.)</para>
    1252         </listitem>
     1253            <listitem>
     1254              <para>most newly created objects become garbage soon after
     1255                they're created.</para>
     1256            </listitem>
     1257            <listitem>
     1258              <para>most objects that have already survived several GCs
     1259                are unlikely to ever become garbage.</para>
     1260            </listitem>
     1261            <listitem>
     1262              <para>old objects can only point to newer objects as the
     1263                result of a destructive modification (e.g., via
     1264                SETF.)</para>
     1265            </listitem>
    12531266      </itemizedlist>
    12541267
    12551268      <para>By concentrating its efforts on (frequently and quickly)
    1256       reclaiming newly created garbage, an ephemeral collector hopes
    1257       to postpone the more costly full GC as long as possible. It's
    1258       important to note that most programs create some long-lived
    1259       garbage, so an EGC can't typically eliminate the need for full
    1260       GC.</para>
     1269        reclaiming newly created garbage, an ephemeral collector hopes
     1270        to postpone the more costly full GC as long as possible. It's
     1271        important to note that most programs create some long-lived
     1272        garbage, so an EGC can't typically eliminate the need for full
     1273        GC.</para>
    12611274      <para>An EGC views each object in the heap as belonging to
    1262       exactly one <emphasis>generation</emphasis>; generations are
    1263       sets of objects that are related to each other by age: some
    1264       generation is the youngest, some the oldest, and there's an age
    1265       relationship between any intervening generations. Objects are
    1266       typically assigned to the youngest generation when first
    1267       allocated; any object that has survived some number of GCs in
    1268       its current generation is promoted (or
    1269       <emphasis>tenured</emphasis>) into an older generation.</para>
     1275        exactly one <emphasis>generation</emphasis>; generations are
     1276        sets of objects that are related to each other by age: some
     1277        generation is the youngest, some the oldest, and there's an age
     1278        relationship between any intervening generations. Objects are
     1279        typically assigned to the youngest generation when first
     1280        allocated; any object that has survived some number of GCs in
     1281        its current generation is promoted (or
     1282        <emphasis>tenured</emphasis>) into an older generation.</para>
    12701283      <para>When a generation is GCed, the roots consist of the
    1271       stacks, registers, and global variables as always and also of
    1272       any pointers to objects in that generation from other
    1273       generations. To avoid the need to scan those (often large) other
    1274       generations looking for such intergenerational references, the
    1275       runtime system must note all such intergenerational references
    1276       at the point where they're created (via Setf).<footnote><para>This is
    1277       sometimes called "The Write Barrier": all assignments which
    1278       might result in intergenerational references must be noted, as
    1279       if the other generations were write-protected.</para></footnote> The
    1280       set of pointers that may contain intergenerational references is
    1281       sometimes called <emphasis>the remembered set</emphasis>.</para>
     1284        stacks, registers, and global variables as always and also of
     1285        any pointers to objects in that generation from other
     1286        generations. To avoid the need to scan those (often large) other
     1287        generations looking for such intergenerational references, the
     1288        runtime system must note all such intergenerational references
     1289        at the point where they're created (via Setf).<footnote><para>This is
     1290            sometimes called "The Write Barrier": all assignments which
     1291            might result in intergenerational references must be noted, as
     1292            if the other generations were write-protected.</para></footnote> The
     1293        set of pointers that may contain intergenerational references is
     1294        sometimes called <emphasis>the remembered set</emphasis>.</para>
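The idea of noting intergenerational references at the point of assignment can be sketched with a toy heap. All names here are hypothetical, and the representation (slot indices standing in for addresses, a Python set for the remembered set, a single young generation starting at a fixed index) is an assumption of the sketch, not a description of how &CCL; records such references.

```python
class ToyHeap:
    """Slots are indexed by age: higher indices are younger objects."""

    def __init__(self, size, young_start):
        self.slots = [None] * size
        self.young_start = young_start   # first slot of the young generation
        self.remembered = set()          # old slots that may point at young ones

    def store(self, slot, target):
        """Write barrier: every pointer store goes through here."""
        self.slots[slot] = target
        if target is not None and slot < self.young_start <= target:
            self.remembered.add(slot)    # old object now refers to a young one

    def egc_roots(self, ordinary_roots):
        """Roots for collecting the young generation only."""
        roots = {r for r in ordinary_roots if r >= self.young_start}
        for slot in self.remembered:
            target = self.slots[slot]
            if target is not None and target >= self.young_start:
                roots.add(target)
        return roots


heap = ToyHeap(8, young_start=4)
heap.store(1, 6)   # old -> young: remembered
heap.store(5, 7)   # young -> young: not remembered
heap.store(2, 0)   # old -> old: not remembered
print(sorted(heap.remembered), sorted(heap.egc_roots([0, 5])))
```

Because the remembered set only records where such a reference may be, each remembered slot is re-checked when the roots are gathered.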
    12821295      <para>In &CCL;'s EGC, the heap is organized exactly the same
    1283       as otherwise; "generations" are merely structures which contain
    1284       pointers to regions of the heap (which is already ordered by
    1285       age.) When a generation needs to be GCed, any younger generation
    1286       is incorporated into it; all objects which survive a GC of a
    1287       given generation are promoted into the next older
    1288       generation. The only intergenerational references that can exist
    1289       are therefore those where an old object is modified to contain a
    1290       pointer to a new object.</para>
     1296        as otherwise; "generations" are merely structures which contain
     1297        pointers to regions of the heap (which is already ordered by
     1298        age.) When a generation needs to be GCed, any younger generation
     1299        is incorporated into it; all objects which survive a GC of a
     1300        given generation are promoted into the next older
     1301        generation. The only intergenerational references that can exist
     1302        are therefore those where an old object is modified to contain a
     1303        pointer to a new object.</para>
    12911304      <para>The EGC uses exactly the same code as the full GC. When a
    1292       given GC is "ephemeral",</para>
     1305        given GC is "ephemeral",</para>
    12931306      <itemizedlist>
    12941307        <listitem>
    1295           <para>the "base of the heap" used to determine an object's
    1296           markbit address is the base of the generation
    1297           being collected;</para>
    1298         </listitem>
     1308              <para>the "base of the heap" used to determine an object's
     1309                markbit address is the base of the generation
     1310                being collected;</para>
     1311            </listitem>
    12991312        <listitem>
    1300           <para>the markbits vector is actually a pointer into the
    1301           middle of the global markbits table; preceding entries in
    1302           this table are used to note doubleword addresses in older
    1303           generations that (may) contain intergenerational
    1304           references;</para>
    1305         </listitem>
     1313              <para>the markbits vector is actually a pointer into the
     1314                middle of the global markbits table; preceding entries in
     1315                this table are used to note doubleword addresses in older
     1316                generations that (may) contain intergenerational
     1317                references;</para>
     1318            </listitem>
    13061319        <listitem>
    1307           <para>some steps (notably GCTWA and the handling of weak
    1308           objects) are not performed;</para>
    1309         </listitem>
     1320              <para>some steps (notably GCTWA and the handling of weak
     1321                objects) are not performed;</para>
     1322            </listitem>
    13101323        <listitem>
    1311           <para>the intergenerational references table is used to
    1312           find additional roots for the mark and forward phases. If a
    1313           bit is set inthe intergenerational references table, that
    1314           means that the corresponding doubleword (in some "old"
    1315           generation, in some "earlier" part of the heap) may have had
    1316           a pointer to an object in a younger generation stored into
    1317           it.</para>
    1318         </listitem>
    1319      
     1324              <para>the intergenerational references table is used to
     1325                find additional roots for the mark and forward phases. If a
     1326                bit is set in the intergenerational references table, that
     1327                means that the corresponding doubleword (in some "old"
     1328                generation, in some "earlier" part of the heap) may have had
     1329                a pointer to an object in a younger generation stored into
     1330                it.</para>
     1331            </listitem>
     1332       
    13201333      </itemizedlist>
    13211334      <para>With one exception (the implicit setfs that occur on entry
    1322       to and exit from the binding of a special variable), all setfs
    1323       that might introduce an intergenerational reference must be
    1324       memoized.@footnote{Note that the implicit setfs that occur when
    1325       initializing an object - as in the case of a call to cons or
    1326       vector - can't introduce intergenerational references, since the
    1327       newly created object is always younger than the objects used to
    1328       initialize it.} It's always safe to push any cons cell or
    1329       gvector locative onto the memo stack; it's never safe to push
    1330       anything else.
     1335        to and exit from the binding of a special variable), all setfs
     1336        that might introduce an intergenerational reference must be
     1337        memoized.
     1338        <footnote><para>Note that the implicit setfs that occur when
     1339        initializing an object - as in the case of a call to cons or
     1340        vector - can't introduce intergenerational references, since
     1341        the newly created object is always younger than the objects
     1342        used to initialize it.</para></footnote> It's always safe to
     1343        push any cons cell or gvector locative onto the memo stack;
     1344        it's never safe to push anything else.
    13311345      </para>
    13321346
    13331347      <para>Typically, the intergenerational references bitvector is
    1334       sparse: a relatively small number of old locations are stored
    1335       into, although some of them may have been stored into many
    1336       times. The routine that scans the memoization buffer does a lot
    1337       of work and usually does it fairly often; it uses a simple,
    1338       brute-force method but might run faster if it was smarter about
    1339       recognizing addresses that it'd already seen.
     1348        sparse: a relatively small number of old locations are stored
     1349        into, although some of them may have been stored into many
     1350        times. The routine that scans the memoization buffer does a lot
     1351        of work and usually does it fairly often; it uses a simple,
     1352        brute-force method but might run faster if it was smarter about
     1353        recognizing addresses that it'd already seen.
    13401354      </para>
    13411355
    13421356      <para>When the EGC mark and forward phases scan the
    1343       intergenerational reference bits, they can clear any bits that
    1344       denote doublewords that definitely do not contain
    1345       intergenerational references.
     1357        intergenerational reference bits, they can clear any bits that
     1358        denote doublewords that definitely do not contain
     1359        intergenerational references.
    13461360      </para>
    13471361    </sect1>
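The bookkeeping described in this section can be illustrated with a toy model. The sketch below is not CCL's kernel code: the class `ToyHeap`, its method names, and the choice to filter stores by whether the stored pointer is young are all assumptions made for illustration. It models the heap as a list of doublewords ordered by age, keeps one "intergenerational reference" bit per older doubleword, sets that bit from a write barrier, and lets a scan clear bits for doublewords that definitely no longer hold such a reference.

```python
# Toy model only; every name here is hypothetical, not taken from the CCL sources.
# The heap is a list of "doublewords" ordered by age (index 0 is the oldest).
# Each cell holds None or the index of another cell (standing in for a pointer).

class ToyHeap:
    def __init__(self, size, young_base):
        self.cells = [None] * size
        self.young_base = young_base    # cells at or above this index are "young"
        self.refbits = [False] * size   # one bit per doubleword; only old entries get set

    def is_young(self, pointer):
        return pointer is not None and pointer >= self.young_base

    def store(self, index, pointer):
        """Write barrier: perform the store, then note it if it may have
        created a reference from an old cell to a young one."""
        self.cells[index] = pointer
        if index < self.young_base and self.is_young(pointer):
            self.refbits[index] = True

    def extra_roots(self):
        """Scan the noted bits: collect young targets as additional roots and
        clear bits whose cells definitely hold no intergenerational reference."""
        roots = []
        for index in range(self.young_base):
            if not self.refbits[index]:
                continue
            target = self.cells[index]
            if self.is_young(target):
                roots.append(target)
            else:
                self.refbits[index] = False
        return roots


heap = ToyHeap(size=8, young_base=4)
heap.store(1, 6)   # old cell now points at a young cell: noted
heap.store(2, 5)   # noted ...
heap.store(2, 0)   # ... then overwritten with an old pointer; the bit is now stale
heap.store(5, 7)   # young-to-young store: nothing to note
print(heap.extra_roots())
```

A set bit only means the doubleword may hold an old-to-young pointer, which is why the scan re-checks the cell contents and clears the stale bit for cell 2.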
     
    13501364      <title>Fasl files</title>
    13511365      <para>Saving and loading of Fasl files is implemented in
    1352       xdump/faslenv.lisp, level-0/nfasload.lisp, and lib/nfcomp.lisp.
    1353       The information here is only an overview, which might help when
    1354       reading the source.</para>
     1366        xdump/faslenv.lisp, level-0/nfasload.lisp, and lib/nfcomp.lisp.
     1367        The information here is only an overview, which might help when
     1368        reading the source.</para>
    13551369      <para>The &CCL; Fasl format is forked from the old MCL Fasl
    1356       format; there are a few differences, but they are minor.  The
    1357       name "nfasload" comes from the fact that this is the so-called
    1358       "new" Fasl system, which was true in 1986 or so.  </para>
     1370        format; there are a few differences, but they are minor.  The
     1371        name "nfasload" comes from the fact that this is the so-called
     1372        "new" Fasl system, which was true in 1986 or so.  </para>
    13591373      <para>A Fasl file begins with a "file header", which contains
    1360       version information and a count of the following "blocks".
    1361       There's typically only one "block" per Fasl file.  The blocks
    1362       are part of a mechanism for combining multiple logical files
    1363       into a single physical file, in order to simplify the
    1364       distribution of precompiled programs. </para>
     1374        version information and a count of the following "blocks".
     1375        There's typically only one "block" per Fasl file.  The blocks
     1376        are part of a mechanism for combining multiple logical files
     1377        into a single physical file, in order to simplify the
     1378        distribution of precompiled programs. </para>
    13651379      <para>Each block begins with a header for itself, which just
    1366       describes the size of the data that follows.</para>
     1380        describes the size of the data that follows.</para>
    13671381      <para>The data in each block is treated as a simple stream of
    1368       bytes, which define a bytecode program.  The actual bytecodes,
    1369       "fasl operators", are defined in xdump/faslenv.lisp.  The
    1370       descriptions in the source file are terse, but, according to
    1371       Gary, "probably accurate".</para>
     1382        bytes, which define a bytecode program.  The actual bytecodes,
     1383        "fasl operators", are defined in xdump/faslenv.lisp.  The
     1384        descriptions in the source file are terse, but, according to
     1385        Gary, "probably accurate".</para>
    13721386      <para>Some of the operators are used to create a per-block
    1373       "object table", which is a vector used to keep track of
    1374       previously-loaded objects and simplify references to them.  When
    1375       the table is created, an index associated with it is set to
    1376       zero; this is analogous to an array fill-pointer, and allows the
    1377       table to be treated like a stack.</para>
     1387        "object table", which is a vector used to keep track of
     1388        previously-loaded objects and simplify references to them.  When
     1389        the table is created, an index associated with it is set to
     1390        zero; this is analogous to an array fill-pointer, and allows the
     1391        table to be treated like a stack.</para>
    13781392      <para>The low seven bits of each bytecode are used to specify
    1379       the fasl operator; currently, about fifty operators are defined.
    1380       The high bit, when set, indicates that the result of the
    1381       operation should be pushed onto the object table.</para>
     1393        the fasl operator; currently, about fifty operators are defined.
     1394        The high bit, when set, indicates that the result of the
     1395        operation should be pushed onto the object table.</para>
    13821396      <para>Most bytecodes are followed by operands; the operand data
    1383       is byte-aligned.  How many operands there are, and their type,
    1384       depend on the bytecode.  Operands can be indices into the object
    1385       table, immediate values, or some combination of these.</para>
     1397        is byte-aligned.  How many operands there are, and their type,
     1398        depend on the bytecode.  Operands can be indices into the object
     1399        table, immediate values, or some combination of these.</para>
    13861400      <para>An exception is the bytecode #xFF, which has the symbolic
    1387       name ccl::$faslend; it is used to mark the end of the
    1388       block.</para>
     1401        name ccl::$faslend; it is used to mark the end of the
     1402        block.</para>
    13891403    </sect1>
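As a rough illustration of the opcode layout and the object table described in this section, here is a minimal sketch. It is not the loader in level-0/nfasload.lisp: the names `decode_opcode`, `ObjectTable`, `PUSH_FLAG`, and `OPERATOR_MASK` are hypothetical, and treating the top bit of each one-byte opcode as the "push the result" flag is an assumption about the exact bit layout. Only the value #xFF for the end-of-block marker comes from the text.

```python
# Hypothetical names throughout; only the #xFF end marker is taken from the text.
FASL_END = 0xFF        # ccl::$faslend: marks the end of a block
PUSH_FLAG = 0x80       # assumed: top bit means "push the result onto the object table"
OPERATOR_MASK = 0x7F   # low seven bits select the fasl operator


def decode_opcode(byte):
    """Split one bytecode into (operator number, push flag, end-of-block flag)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a bytecode must fit in one byte")
    if byte == FASL_END:
        return None, False, True
    return byte & OPERATOR_MASK, bool(byte & PUSH_FLAG), False


class ObjectTable:
    """Per-block vector with an index that starts at zero, used like a stack."""

    def __init__(self, size):
        self.slots = [None] * size
        self.index = 0   # plays the role of a fill-pointer

    def push(self, obj):
        """Store obj at the current index and return the slot it landed in."""
        slot = self.index
        self.slots[slot] = obj
        self.index += 1
        return slot


table = ObjectTable(size=4)
operator, push_result, at_end = decode_opcode(0x85)
if push_result:
    table.push("result of operator %d" % operator)
print(operator, push_result, at_end, table.index)
```

Later operands can then refer to a previously loaded object by its slot number instead of repeating the object's data.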
    13901404
     
    13951409
    13961410      <sect2 id="How-CCL-Recognizes-Objective-C-Objects">
    1397         <title>How &CCL; Recognizes Objective-C Objects</title>
     1411            <title>How &CCL; Recognizes Objective-C Objects</title>
    13981412        <para>In most cases, pointers to instances of Objective-C
    1399         classes are recognized as such; the recognition is (and
    1400         probably always will be) slightly heuristic. Basically, any
    1401         pointer that passes basic sanity checks and whose first word
    1402         is a pointer to a known ObjC class is considered to be an
    1403         instance of that class; the Objective-C runtime system would
    1404         reach the same conclusion.</para>
     1413          classes are recognized as such; the recognition is (and
     1414          probably always will be) slightly heuristic. Basically, any
     1415          pointer that passes basic sanity checks and whose first word
     1416          is a pointer to a known ObjC class is considered to be an
     1417          instance of that class; the Objective-C runtime system would
     1418          reach the same conclusion.</para>
    14051419        <para>It's certainly possible that a random pointer to an
    1406         arbitrary memory address could look enough like an ObjC
    1407         instance to fool the lisp runtime system, and it's possible
    1408         that pointers could have their contents change so that
    1409         something that had either been a true ObjC instance (or had
    1410         looked a lot like one) is changed (possibly by virtue of
    1411         having been deallocated.)</para>
     1420          arbitrary memory address could look enough like an ObjC
     1421          instance to fool the lisp runtime system, and it's possible
     1422          that pointers could have their contents change so that
     1423          something that had either been a true ObjC instance (or had
     1424          looked a lot like one) is changed (possibly by virtue of
     1425          having been deallocated.)</para>
    14121426        <para>In the first case, we can improve the heuristics
    1413         substantially: we can make stronger assertions that a
    1414         particular pointer is really "of type :ID" when it's a
    1415         parameter to a function declared to take such a pointer as an
    1416         argument or a similarly declared function result; we can be
    1417         more confident of something we obtained via SLOT-VALUE of a
    1418         slot defined to be of type :ID than if we just dug a pointer
    1419         out of memory somewhere.</para>
     1427          substantially: we can make stronger assertions that a
     1428          particular pointer is really "of type :ID" when it's a
     1429          parameter to a function declared to take such a pointer as an
     1430          argument or a similarly declared function result; we can be
     1431          more confident of something we obtained via SLOT-VALUE of a
     1432          slot defined to be of type :ID than if we just dug a pointer
     1433          out of memory somewhere.</para>
    14201434        <para>The second case is a little more subtle: ObjC memory
    1421         management is based on a reference-counting scheme, and it's
    1422         possible for an object to ... cease to be an object while lisp
    1423         is still referencing it.  If we don't want to deal with this
    1424         possibility (and we don't), we'll basically have to ensure
    1425         that the object is not deallocated while lisp is still
    1426         thinking of it as a first-class object. There's some support
    1427         for this in the case of objects created with MAKE-INSTANCE,
    1428         but we may need to give similar treatment to foreign objects
    1429         that are introduced to the lisp runtime in other ways (as
    1430         function arguments, return values, SLOT-VALUE results, etc. as
    1431         well as those instances that are created under lisp
    1432         control.)</para>
     1435          management is based on a reference-counting scheme, and it's
     1436          possible for an object to ... cease to be an object while lisp
     1437          is still referencing it.  If we don't want to deal with this
     1438          possibility (and we don't), we'll basically have to ensure
     1439          that the object is not deallocated while lisp is still
     1440          thinking of it as a first-class object. There's some support
     1441          for this in the case of objects created with MAKE-INSTANCE,
     1442          but we may need to give similar treatment to foreign objects
     1443          that are introduced to the lisp runtime in other ways (as
     1444          function arguments, return values, SLOT-VALUE results, etc. as
     1445          well as those instances that are created under lisp
     1446          control.)</para>
    14331447        <para>This doesn't all work yet (in fact, not much of it works
    1434         yet); in practice, this has not yet been as much of a problem
    1435         as anticipated, but that may be because existing Cocoa code
    1436         deals primarily with relatively long-lived objects such as
    1437         windows, views, menus, etc.</para>
     1448          yet); in practice, this has not yet been as much of a problem
     1449          as anticipated, but that may be because existing Cocoa code
     1450          deals primarily with relatively long-lived objects such as
     1451          windows, views, menus, etc.</para>
    14381452      </sect2>
    14391453
    14401454      <sect2>
    1441         <title>Recommended Reading</title>
    1442 
    1443         <variablelist>
    1444           <varlistentry>
    1445             <term>
    1446               <ulink url="http://developer.apple.com/documentation/Cocoa/">Cocoa Documentation</ulink>
    1447             </term>
    1448            
    1449            <listitem>
    1450              <para>
    1451                This is the top page for all of Apple's documentation on
    1452                Cocoa.  If you are unfamiliar with Cocoa, it is a good
    1453                place to start.
    1454              </para>
    1455            </listitem>
    1456         </varlistentry>
    1457         <varlistentry>
    1458           <term>
    1459             <ulink url="http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/index.html">Foundation Reference for Objective-C</ulink>
    1460           </term>
    1461 
    1462           <listitem>
    1463             <para>
    1464               This is one of the two most important Cocoa references; it
    1465               covers all of the basics, except for GUI programming.  This is
    1466               a reference, not a tutorial.
    1467             </para>
    1468           </listitem>
    1469         </varlistentry>
    1470       </variablelist>
     1455            <title>Recommended Reading</title>
     1456
     1457            <variablelist>
     1458              <varlistentry>
     1459                <term>
     1460                  <ulink url="http://developer.apple.com/documentation/Cocoa/">Cocoa Documentation</ulink>
     1461                </term>
     1462               
     1463                <listitem>
     1464                  <para>
     1465                    This is the top page for all of Apple's documentation on
     1466                    Cocoa.  If you are unfamiliar with Cocoa, it is a good
     1467                    place to start.
     1468                  </para>
     1469                </listitem>
     1470              </varlistentry>
     1471              <varlistentry>
     1472                <term>
     1473                  <ulink url="http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/index.html">Foundation Reference for Objective-C</ulink>
     1474                </term>
     1475
     1476                <listitem>
     1477                  <para>
     1478                    This is one of the two most important Cocoa references; it
     1479                    covers all of the basics, except for GUI programming.  This is
     1480                    a reference, not a tutorial.
     1481                  </para>
     1482                </listitem>
     1483              </varlistentry>
     1484        </variablelist>
    14711485      </sect2>
    14721486    </sect1>