Timestamp:
Apr 1, 2008, 11:15:04 AM (12 years ago)
Author:
gb
Message:

update, fix up a bit

File:
1 edited

  • trunk/source/doc/src/implementation.xml

    r8820 r8976  
    1111  <chapter id="Implementation-Details-of-CCL">
    1212    <title>Implementation Details of &CCL;</title>
    13     <para>This chapter describes many aspects of &CCL;'s
    14     implementation as of (roughly) version 1.1.  Details vary a bit
    15     between the three archutectures (PPC32, PPC64, and X86-64)
     13    <para>This chapter describes many aspects of OpenMCL's
     14    implementation as of (roughly) version 1.1. Details vary a bit
     15    between the three architectures (PPC32, PPC64, and x86-64)
    1616    currently supported and those details change over time, so the
    1717    definitive reference is the source code (especially some files in
    1818    the ccl/compiler/ directory whose names contain the string "arch"
    19     and some files in the ccl/lisp-kernel/ directory whose namee
    20     contain the string "constants".)  Hopefully, this chapter will
    21     make it easier for someone who's interested to read and understand
    22     the contents of those files.</para>
     19    and some files in the ccl/lisp-kernel/ directory whose names
     20    contain the string "constants".) Hopefully, this chapter will make
     21    it easier for someone who's interested to read and understand the
     22    contents of those files.</para>
    2323
    2424    <sect1 id="Threads-and-exceptions">
     
    113113        of the keyboard driver noticing that ^C or ^Z has been
    114114        pressed) and synchronous hardware-level events (like trying to
    115         execute and illegal instruction or access protected memory.)
     115        execute an illegal instruction or access protected memory.)
    116116        It makes some sense to defer ("block") handling of
    117117        asynchronous signals so that some critical code sequences
     
    151151        (from breakpoints or single-step operations), it's possible to
    152152        use GDB to debug &CCL;.</para>
    153         <para>On platforms where signal handling and debugging don't get in each
    154 other's way, a signal handler is entered with all signals blocked.
    155 (This behavior is specified in the call to the sigaction() function
    156 which established the signal handler.)  The signal handler recieves
    157 three arguments from the OS kernel; the first is an intger which
    158 identifies the signal, the second is a pointer to an object of
    159 type "siginfo_t", which may or may not contain a few fields that
    160 would help to identify the cause of the exception, and the third
    161 argument is a pointer to a data structure (called a "ucontext"
    162 or something similar) which contains machine-dependent information
    163 about the state of the tread at the time that the exception/signal
    164 occurred.  While asynchronous signals are blocked, the signal handler
    165 stores the pointer to its third argument (the "signal context") in
    166 a field in the current thread's TCR, sets some bits in another TCR
    167 field to indicate that the thread is now waiting to handle an
    168 exception, unblocks asynchronous signals, and waits for a global
    169 exception lock which serializes exception processing.</para>
     153        <para>On platforms where signal handling and debugging don't
     154        get in each other's way, a signal handler is entered with all
     155        signals blocked.  (This behavior is specified in the call to
     156        the sigaction() function which established the signal
     157        handler.)  The signal handler receives three arguments from
     158        the OS kernel; the first is an integer which identifies the
     159        signal, the second is a pointer to an object of type
     160        "siginfo_t", which may or may not contain a few fields that
     161        would help to identify the cause of the exception, and the
     162        third argument is a pointer to a data structure (called a
     163        "ucontext" or something similar) which contains
     164        machine-dependent information about the state of the thread at
     165        the time that the exception/signal occurred.  While
     166        asynchronous signals are blocked, the signal handler stores
     167        the pointer to its third argument (the "signal context") in a
     168        field in the current thread's TCR, sets some bits in another
     169        TCR field to indicate that the thread is now waiting to handle
     170        an exception, unblocks asynchronous signals, and waits for a
     171        global exception lock which serializes exception
     172        processing.</para>
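The handler entry described above can be sketched in C. This is a minimal illustration, not the lisp kernel's code: `struct fake_tcr`, `record_signal`, and `install_handler` are hypothetical names, and the struct merely stands in for the TCR fields the paragraph mentions. Filling `sa_mask` is what makes the handler start with all signals blocked, and `SA_SIGINFO` selects the three-argument handler form.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the TCR fields mentioned in the text. */
struct fake_tcr {
    void *pending_context;               /* saved "signal context" pointer */
    volatile sig_atomic_t signum;        /* which signal arrived           */
    volatile sig_atomic_t other_blocked; /* was SIGUSR2 blocked on entry?  */
};

static struct fake_tcr current_tcr;

/* Three-argument handler: signal number, siginfo_t, machine context. */
static void record_signal(int signum, siginfo_t *info, void *context)
{
    sigset_t now;
    (void)info; /* may or may not carry useful detail; unused here */

    /* Query (without changing) the mask in effect inside the handler. */
    sigprocmask(SIG_BLOCK, NULL, &now);
    current_tcr.other_blocked = sigismember(&now, SIGUSR2);

    /* Remember the third argument, as the text describes. */
    current_tcr.pending_context = context;
    current_tcr.signum = signum;
}

/* Install record_signal so that it is entered with all signals blocked. */
int install_handler(int signum)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_signal;
    sa.sa_flags = SA_SIGINFO;  /* request the three-argument form      */
    sigfillset(&sa.sa_mask);   /* block every signal while handling    */
    return sigaction(signum, &sa, NULL);
}
```

Calling `install_handler(SIGUSR1)` and then `raise(SIGUSR1)` runs the handler synchronously in a single-threaded program, after which `current_tcr` holds the recorded values. The real handler goes on to unblock asynchronous signals and wait for the exception lock, which this sketch omits.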
    170173        <para>On Darwin, the Mach exception thread creates a signal
    171174        context (and maybe a siginfo_t structure), stores the signal
     
    181184        context and siginfo structures on the stack of the thread
    182185        which received the signal; in practice, that means "wherever
    183         RSP is pointing."  &CCL;'s require that the thread's value
    184         stack - where RSP is usually pointing while lisp code is
    185         running - contain only "nodes" (properly tagged lisp objects),
    186         and scribbling a signal context all over the value stack would
    187         violate this requirement.  To maintain consistency, the
    188         sigaltstack() mechanism is used to cause the signal to be
    189         delivered on (and the signal context and siginfo to be
    190         allocated on) a special stack area (the last few pages of the
    191         thread's cntrol stack, in practice.  When the signal handler
    192         runs, it (carefully) copies the signal context and siginfo to
    193         the thread's control stack and makes RSP point into that stack
    194         before invoking the "real" signal handler.  (The effect of
    195         this hack is that the "real" signal handler always runs on the
    196         thread's control stack.)</para>
     186        RSP is pointing."  &CCL;'s <xref
     187        linkend="Register-and-stack-usage-conventions"/> require that
     188        the thread's value stack - where RSP is usually pointing while
     189        lisp code is running - contain only "nodes" (properly tagged
     190        lisp objects), and scribbling a signal context all over the
     191        value stack would violate this requirement.  To maintain
     192        consistency, the sigaltstack() mechanism is used to cause the
     193        signal to be delivered on (and the signal context and siginfo
     194        to be allocated on) a special stack area (the last few pages
     195        of the thread's control stack, in practice.)  When the signal
     196        handler runs, it (carefully) copies the signal context and
     197        siginfo to the thread's control stack and makes RSP point into
     198        that stack before invoking the "real" signal handler.  (The
     199        effect of this hack is that the "real" signal handler always
     200        runs on the thread's control stack.)</para>
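The sigaltstack() mechanism mentioned above can be illustrated with a short C sketch. It is an assumption-laden example rather than the kernel's implementation: the alternate stack here is a separately allocated buffer of arbitrary size (not the last pages of a control stack), and `probe_stack`, `install_on_alt_stack`, and `ALT_STACK_SIZE` are hypothetical names. It shows only the delivery step; the copying of the signal context and the RSP switch are not modeled.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024) /* arbitrary size for this sketch */

static char *alt_base;
static volatile sig_atomic_t ran_on_alt_stack;

/* The address of a local variable reveals which stack the handler uses. */
static void probe_stack(int signum, siginfo_t *info, void *context)
{
    uintptr_t here = (uintptr_t)&signum;
    uintptr_t lo = (uintptr_t)alt_base;
    (void)info;
    (void)context;
    ran_on_alt_stack = (here >= lo && here < lo + ALT_STACK_SIZE);
}

/* Register an alternate stack, then a handler that is delivered on it. */
int install_on_alt_stack(int signum)
{
    stack_t ss;
    struct sigaction sa;

    alt_base = malloc(ALT_STACK_SIZE);
    if (alt_base == NULL)
        return -1;

    ss.ss_sp = alt_base;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_stack;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; /* deliver on the alternate stack */
    sigfillset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}
```

After `install_on_alt_stack(SIGUSR1)` and `raise(SIGUSR1)`, `ran_on_alt_stack` indicates whether the handler's frame fell inside the alternate buffer. Without `SA_ONSTACK` the kernel would instead build the frame wherever the stack pointer happened to be, which is the situation the paragraph warns about.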
    197201        <para>Once the exception handler has obtained the global
    198202        exception lock, it uses the values of the signal number,
     
    214218        is invoked.</para>
    215219        <para>If the kernel exception handler identifies the
    216         exception' cause as being a transient out-of-memory condition
     220        exception's cause as being a transient out-of-memory condition
    217221        (indicating that the current thread needs more memory to cons
    218222        in), it tries to make that memory available.  In some cases,
     
    222226      <sect2 id="Threads-comma---exceptions-comma---and-the-GC">
    223227        <title>Threads, exceptions, and the GC</title>
    224         <para>&CCL;'s GC is not concurrent: when the GC is invoked
    225         in response to an exception in a particular thread, all other
     228        <para>&CCL;'s GC is not concurrent: when the GC is invoked in
     229        response to an exception in a particular thread, all other
    226230        lisp threads must stop until the GC's work is done.  The
    227231        thread that triggered the GC iterates over the global TCR
     
    232236        Once all other threads have acknowledged the request to
    233237        suspend themselves, the GC thread can run the GC proper (after
    234         doing any necessary .)  Once the GC's completed its work, the
    235         thread that invoked the GC iterates over the global TCR list,
    236         raising a per-thread "resume" semaphore for each other
    237         thread.</para>
     238        doing any necessary <xref linkend="PC-lusering"/>.)  Once the
     239        GC's completed its work, the thread that invoked the GC
     240        iterates over the global TCR list, raising a per-thread
     241        "resume" semaphore for each other thread.</para>
    238242        <para>The signal handler for the asynchronous "suspend" signal
    239243        is entered with all asynchronous signals blocked.  It saves
     
    258262        <itemizedlist>
    259263          <listitem>
    260 <para>On both PPC and x86-64 platforms, consing isn't fully atomic.It takes at least a few instructions to allocate an object in memory(and slap a header on it if necesssary); if a thread is interrupted inthe middle of that instruction sequence, the new object may or may nothave been created or fully initialized at the point in time that theinterrupt occurred.  (There are actually a few different states ofpartial initialization)</para>
    261 </listitem>
     264            <para>On both PPC and x86-64 platforms, consing isn't
     265            fully atomic.  It takes at least a few instructions to
     266            allocate an object in memory (and slap a header on it if
     267            necessary); if a thread is interrupted in the middle of
     268            that instruction sequence, the new object may or may
     269            not have been created or fully initialized at the point in
     270            time that the interrupt occurred.  (There are actually a
     271            few different states of partial initialization.)</para>
     272          </listitem>
     273          <listitem>
     274            <para>On the PPC, the common act of building a lisp
     275            control stack frame involves allocating a four-word frame
     276            and storing three register values into that frame.  (The
     277            fourth word - the back pointer to the previous frame - is
     278            automatically set when the frame is allocated.)  The
     279            previous contents of those three words are unknown (there
     280            might have been a foreign stack frame at the same address a
     281            few instructions earlier), so interrupting a thread that's
     282            in the process of initializing a PPC control stack frame
     283            isn't GC-safe.</para>
     284          </listitem>
    262285          <listitem>
    263 <para>On the PPC, the common act of building a lisp control stack frameinvolves allocating a four-word frame and storing three register valuesinto that frame.  (The fourth word - the back pointer to the previousframe - is automatically set when the frame is allocated.)  The previouscontents of those three words are unknown (there might have been aforeign stack frame at the same address a few instructions earlier),so interrupting a thread that's in the process of initializing aPPC control stack frame isn't GC-safe.</para>
    264 </listitem>
     286            <para>There are similar problems with the initialization
     287            of temp stackframes on the PPC.  (Allocation and
     288            initialization doesn't happen atomically, and the newly
     289            allocated stack memory may have undefined contents.)</para>
     290          </listitem>
    265291          <listitem>
    266 <para>There are similar problems with the initialization of temp stackframes on the PPC.  (Allocation and initialization doesn't happenatomically, and the newly allocated stack memory may have undefinedcontents.)</para>
    267 </listitem>
     292            <para><xref linkend="The-ephemeral-GC"/>'s write barrier
     293            has to be implemented atomically (i.e., both an
     294            intergenerational store and the update of a
     295            corresponding reference bit have to happen without
     296            interruption, or neither of these events can
     297            happen.)</para>
     298          </listitem>
    268299          <listitem>
    269 <para>'s write barrier has to be implemented atomically (i.e.,both an intergenerational store and the update of a correspondingreference bit has to happen without interruption, or neither of theseevents can happen.)</para>
    270 </listitem>
    271           <listitem>
    272 <para>There are a few more similar cases.</para>
    273 </listitem>
    274        
     300            <para>There are a few more similar cases.</para>
     301          </listitem>
    275302        </itemizedlist>
    276303
    277         <para>Fortunately, the number of these non-atomic instruction sequences is
    278 small, and fortunately it's fairly easy for the interrupting thread
    279 to recognize when the interrupted thread is in the middle of such
    280 a sequence.  When this is detected, the interrupting thread modfies
    281 the state of the interrupted thread (modifying its PC and other
    282 registers) so that it is no longer in the middle of such a sequenece
    283 (it's either backed out of it or the remaining instructions are
    284 emulated.)</para>
    285         <para>This works because (a) many of the troublesome instruction sequences
    286 are PPC-specific and it's relatively easy to partially disassemble the
    287 instructions surrounding the interrupted thread's PC on the PPC and
    288 (b) those instruction sequences are heavily stylized and intended to
    289 be easily recognized.</para>
     304        <para>Fortunately, the number of these non-atomic instruction
     305        sequences is small, and fortunately it's fairly easy for the
     306        interrupting thread to recognize when the interrupted thread
     307        is in the middle of such a sequence.  When this is detected,
     308        the interrupting thread modifies the state of the interrupted
     309        thread (modifying its PC and other registers) so that it is no
     310        longer in the middle of such a sequence (it's either backed
     311        out of it or the remaining instructions are emulated.)</para>
     312        <para>This works because (a) many of the troublesome
     313        instruction sequences are PPC-specific and it's relatively
     314        easy to partially disassemble the instructions surrounding the
     315        interrupted thread's PC on the PPC and (b) those instruction
     316        sequences are heavily stylized and intended to be easily
     317        recognized.</para>
    290318      </sect2>
    291319    </sect1>
     
    349377        activity in some other thread) on any instruction boundary,
    350378        which in turn means that the compiler and runtime system must
    351         follow precise at all times.</para>
     379        follow precise <xref
     380        linkend="Register-and-stack-usage-conventions"/> at all
     381        times.</para>
    352382        <para>Once we've decided that a given machine word is a node,
    353         a describes how the node's value and type are encoded in that
    354         machine word.</para>
     383        a <xref linkend="Tagging-scheme"/> describes how the node's
     384        value and type are encoded in that machine word.</para>
    355385        <para>Most of this - so far - has discussed things from the
    356386        GC's very low-level perspective.  From a much higher point of
     
    360390        the operations in question involve raw non-node values.)
    361391        Higher-level parts of the lisp type system (functions like
    362         TYPE-OF and CLASS-OF, etc.) depend on the .</para>
     392        TYPE-OF and CLASS-OF, etc.) depend on the <xref
     393        linkend="Tagging-scheme"/>.</para>
    363394      </sect2>
    364395
     
    400431        also use the PPC's "count register" (ctr), and if either the
    401432        lr or ctr needs to be stored in memory it needs to first be
    402         copied to a general-purpose registers.</para>
     433        copied to a general-purpose register.</para>
    403434        <para>&CCL;'s GC understands that certain registers contain
    404435        these special "pc-locatives" (locatives that point into
     
    439470              should be restored on function exit.  On the PPC, the GC
    440471              has to look at control-stack frames, identify which of
    441               those frames are lisp frames, and treat the contents
    442               ofthe saved function slot as a node (and handle the
    443               return address locative specially.)  On x86-64, the
    444               control stack is used for dynamic-extent allocation of
    445               immediate objects.  Since the control stack never
    446               contains nodes on x86-64, the GC ignores it on that
    447               platform.  Alignment of the control stack follows the
    448               ABI conventions of the platform (at least at any point
    449               in time where foreign code could run.)  On PPC, the r1
     472              those frames are lisp frames, and treat the contents of
     473              the saved function slot as a node (and handle the return
     474              address locative specially.)  On x86-64, the control
     475              stack is used for dynamic-extent allocation of immediate
     476              objects.  Since the control stack never contains nodes
     477              on x86-64, the GC ignores it on that platform.
     478              Alignment of the control stack follows the ABI
     479              conventions of the platform (at least at any point in
     480              time where foreign code could run.)  On PPC, the r1
    450481              register always points to the top of the current
    451482              thread's control stack; on x86-64, the RSP register
     
    453484              when the thread is running foreign code and the address
    454485              of the top of the control stack is kept in the thread's
    455               TCR see when not running foreign code.  The control
    456               stack "grows down."</para>
     486              TCR (see <xref linkend="The-Thread-Context-Record"/>)
     487              when not running foreign code.  The control stack "grows
     488              down."</para>
    457489            </listitem>
    458490            <listitem>
     
    539571          <itemizedlist>
    540572            <listitem>
    541               <para>(only) two "immediate" registers.The RAX and RDX
    542               registers are used as the implicit operands andresults
    543               of some extended-precision multiply and divide
    544               instructionswhich generally involve non-node values;
    545               since their use in theseinstructions means that they
    546               can't be guaranteed to contain nodevalues at all times,
    547               it's natural to put these registers in the"immediate"
    548               set.  RAX is generally given the symbolic name
    549               "imm0",and RDX is given the symbolic name "imm1"; you
    550               may see these namesin disassembled code, usually in
    551               operations involving type checking,array indexing, and
     573              <para>(only) three "immediate" registers.</para>
     574              <para>The RAX, RCX, and RDX registers are used as the
     575              implicit operands and results of some extended-precision
     576              multiply and divide instructions which generally involve
     577              non-node values; since their use in these instructions
     578              means that they can't be guaranteed to contain node
     579              values at all times, it's natural to put these registers
     580              in the "immediate" set. RAX is generally given the
     581              symbolic name "imm0", RDX is given the symbolic name
     582              "imm1" and RCX is given the symbolic name "imm2"; you
     583              may see these names in disassembled code, usually in
     584              operations involving type checking, array indexing, and
    552585              foreign memory and function access.</para>
    553586            </listitem>
    554587            <listitem>
    555               <para>(only) two "dedicated" registers.RSP and RBP have
    556               dedicated functionality dictated by the hardwareand
    557               calling conventions.  (There are a few places where RBP
    558               istemporarily used as an extra immediate
    559               register.)</para>
     588              <para>(only) two "dedicated" registers.</para>
     589              <para>RSP and RBP have
     590              dedicated functionality dictated by the hardware and
     591              calling conventions.</para>
    560592            </listitem>
    561593            <listitem>
    562               <para>12 "node" registers.All other registers (RBX, RCX,
    563               RSI, RDI, and R8-R15) are asserted tocontain node values
    564               at (almost) all times; legacy "string" operationsthat
    565               implicitly use RSI and/or RDI are not used.  Shift and
    566               rotateintructions which shift/rotate by a variable
    567               number of bits arerequired by the architecture to use
    568               the low byte of RCX (the traditionalCL register) as the
    569               implicit shift count; when it's necessary to keepa
    570               non-node shift count in the low byte of RCX, the upper 7
    571               bytes ofthe register are zeroed (so that
    572               misinterpetation of the immediatevalue in RCX as a node
    573               will not have negative GC affects.  (The GCmight briefly
    574               treate it as a node, but since it's not pointing
    575               anywherenear the lisp heap it'll soon lose interest in
    576               it.)Legacy instructions that use RCX (or some portions
    577               of it) as a loopcounter can not be used (since such
    578               instructions might introducenon-node values into
    579               RCX.)</para>
    580 </listitem>
     594              <para>11 "node" registers.</para>
     595              <para>All other registers (RBX, RSI, RDI, and R8-R15)
     596              are asserted to contain node values at (almost) all
     597              times; legacy "string" operations that implicitly use RSI
     598              and/or RDI are not used.</para>
     599            </listitem>
    581600          </itemizedlist>
    582           <para>On the PPC, the static register partitioning scheme involves:</para>
     601          <para>On the PPC, the static register partitioning scheme
     602          involves:</para>
    583603
    584604          <itemizedlist>
    585605            <listitem>
    586               <para>6 "immediate" registersRegisters r3-r8 are given
    587               the symbolic names imm0-imm5.  As a RISCarchitecture
     606              <para>6 "immediate" registers.</para>
     607              <para>Registers r3-r8 are given
     608              the symbolic names imm0-imm5.  As a RISC architecture
    588609              with simpler addressing modes, the PPC probably
    589               usesimmediate registers a bit more often than the CISC
    590               x86-64 does, butthey're generally used for the same sort
    591               of things (type checking,array indexing, FFI,
     610              uses immediate registers a bit more often than the CISC
     611              x86-64 does, but they're generally used for the same sort
     612              of things (type checking, array indexing, FFI,
    592613              etc.)</para>
    593614            </listitem>
     
    598619                  <para>r0 (symbolic name rzero) always contains the
    599620                  value 0 when running lisp code.  Its value is
    600                   sometimes read as 0 when it's used as thebase
     621                  sometimes read as 0 when it's used as the base
    601622                  register in a memory address; keeping the value 0
    602                   there issometimes convenient and avoids
     623                  there is sometimes convenient and avoids
    603624                  asymmetry.</para>
    604625                </listitem>
     
    609630                <listitem>
    610631                  <para>r2 is used to hold the current thread's TCR on
    611                   ppc64 systems; it'snot used on ppc32.</para>
     632                  ppc64 systems; it's not used on ppc32.</para>
    612633                </listitem>
    613634                <listitem>
     
    626647                </listitem>
    627648                <listitem>
    628                   <para>r12 (symbolic name tsp) holds the top of the current thread's temp stack.</para>
     649                  <para>r12 (symbolic name tsp) holds the top of the
     650                  current thread's temp stack.</para>
    629651                </listitem>
    630652                <listitem>
    631                   <para>r13 is used to hold the TCR on PPC32 sytems; it's not used on PPC64.</para>
     653                  <para>r13 is used to hold the TCR on PPC32 systems;
     654                  it's not used on PPC64.</para>
    632655                </listitem>
    633656                <listitem>
     
    653676            </listitem>
    654677            <listitem>
    655               <para>17 "node" registersr15-r31 are always treated as
    656               node registers</para>
     678              <para>17 "node" registers</para>
     679              <para>r15-r31 are always treated as node
     680              registers</para>
    657681            </listitem>
    658682           
     
    730754            <para>CONS cells always contain exactly 2 elements and are
    731755            usually fairly common.  It therefore makes sense to give
    732             CONS cells their own tag.  Unlike thefixnum case - where a
     756            CONS cells their own tag.  Unlike the fixnum case - where a
    733757            tag value of 0 had positive implications - there doesn't
    734758            seem to be any advantage to using any particular value.
     
    744768          </listitem>
    745769          <listitem>
    746             <para>Any way you look at it, NIL is a bit ... unusual.NIL
     770            <para>Any way you look at it, NIL is a bit ... unusual. NIL
    747771            is both a SYMBOL and a LIST (as well as being a canonical
    748772            truth value and probably a few other things.)  Its role as
     
    769793            NIL's pname is set to NIL (SYMBOL-NAME checks for this and
    770794            returns the string "NIL"), and LISTP (and therefore safe
    771             CAR and CDR) have to check for (OR NULL CONSP).At least in
     795            CAR and CDR) have to check for (OR NULL CONSP). At least in
    772796            the case of CAR and CDR, the fact that the PPC has multiple
    773797            condition-code fields keeps that extra test from
     
    775799          </listitem>
    776800          <listitem>
    777             <para>Some objects are immediate.(but not FIXNUMs).This is
     801            <para>Some objects are immediate (but not FIXNUMs). This is
    778802            true of CHARACTERs and, on 64-bit platforms,
    779803            SINGLE-FLOATs. It's also true of some nodes used in the
    780             runtime system (specialvalues used to indicate unbound
    781             variables and slots, for instance.)On 64-bit platforms,
    782             SINGLE-FLOATs have their own unique tag (makingthem a
     804            runtime system (special values used to indicate unbound
     805            variables and slots, for instance.) On 64-bit platforms,
     806            SINGLE-FLOATs have their own unique tag (making them a
    783807            little easier to recognize); on all platforms, CHARACTERs
    784             sharea tag with other immediate objects (unbound markers)
    785             but are easyto recognize (by looking at several of their
    786             low bits.)  The GCtreats any node with an immediate tag
    787             (and any node with a fixnumtag) as a leaf.</para>
     808            share a tag with other immediate objects (unbound markers)
     809            but are easy to recognize (by looking at several of their
     810            low bits.)  The GC treats any node with an immediate tag
     811            (and any node with a fixnum tag) as a leaf.</para>
    788812          </listitem>
    789813          <listitem>
     
    792816            uniformly. There are some disadvantages to that uniform
    793817            treatment as well, and the treatment of "memory-allocated
    794             non-CONS objects" isn't entirely uniformaccross all
     818            non-CONS objects" isn't entirely uniform across all
    795819            &CCL; implementations.  Let's first pretend that
    796820            the treatment is uniform, then discuss the ways in which it
    797821            isn't. The "uniform approach" is to treat all
    798822            memory-allocated non-CONS objects as if they were vectors;
    799             this use of the term is a little looser thanwhat's implied
     823            this use of the term is a little looser than what's implied
    800824            by the CL VECTOR type.  &CCL; actually uses the
    801             term"uvector" to mean "a memory-allocated lisp object
     825            term "uvector" to mean "a memory-allocated lisp object
    802826            other than a CONS cell, whose first word is a header which
    803             describes the object's type andthe number of elements that
    804             it contains."  In this view, a SYMBOL isa UVECTOR, as is a
     827            describes the object's type and the number of elements that
     828            it contains."  In this view, a SYMBOL is a UVECTOR, as is a
    805829            STRING, a STANDARD-INSTANCE, a CL array or vector, a
    806830            FUNCTION, and even a DOUBLE-FLOAT. In the PPC
    807831            implementations (where things are a little more
    808832            ... uniform), a single tag value is used to denote any
    809             uvector; in order to determinesomething more specific
    810             about the type of the object in question, it'snecessary to
     833            uvector; in order to determine something more specific
     834            about the type of the object in question, it's necessary to
    811835            fetch the low byte of the header word from memory.  On
    812             thex86-64 platform, certain types of uvectors - SYMBOLs
     836            the x86-64 platform, certain types of uvectors - SYMBOLs
    813837            and FUNCTIONs - are given their own unique tags.  The good
    814             news about the x86-64 approachis that SYMBOLs and
    815             FUNCTIONs can be recognized without referencingmemory; the
    816             slightly bad news is that primitive operations that workon
     838            news about the x86-64 approach is that SYMBOLs and
     839            FUNCTIONs can be recognized without referencing memory; the
     840            slightly bad news is that primitive operations that work on
    817841            UVECTOR-tagged objects - like the function CCL:UVREF -
    818             don't workon SYMBOLs or FUNCTIONs on x86-64 (but -do- work
    819             on those types of objectsin the PPC ports.)The header word
    820             which precedes a UVECTOR's data in memory contains 8bits
     842            don't work on SYMBOLs or FUNCTIONs on x86-64 (but -do- work
     843            on those types of objects in the PPC ports.) The header word
     844            which precedes a UVECTOR's data in memory contains 8 bits
    821845            of type information in the low byte and either 24 or 56
    822846            bits of "element-count" information in the rest of the
    823             word.  (This is wherethe sometimes-limiting value of 2^24
    824             for ARRAY-TOTAL-SIZE-LIMIT onPPC32 platforms comes from.)
    825             The low byte of the header - sometimescalled the uvector's
    826             subtag - is itself tagged (which means thatthe header is
    827             tagged.)  The (3 or 4) tag bits in the subtag are usedto
     847            word.  (This is where the sometimes-limiting value of 2^24
     848            for ARRAY-TOTAL-SIZE-LIMIT on PPC32 platforms comes from.)
     849            The low byte of the header - sometimes called the uvector's
     850            subtag - is itself tagged (which means that the header is
     851            tagged.)  The (3 or 4) tag bits in the subtag are used to
    828852            determine whether the uvector's elements are nodes or
    829853            immediates.  (A UVECTOR whose elements are nodes is called a
    830             GVECTOR; a UVECTORwhose elements are immediates is called
    831             an IVECTOR.  This terminologycame from Spice Lisp, which
    832             was a predecessor of CMUCL.)Even though a uvector header
    833             is tagged, a header is not a node.  There'sno (supported)
    834             way to get your hands on one in lisp and doing so couldbe
     854            GVECTOR; a UVECTOR whose elements are immediates is called
     855            an IVECTOR.  This terminology came from Spice Lisp, which
     856            was a predecessor of CMUCL.)  Even though a uvector header
     857            is tagged, a header is not a node.  There's no (supported)
     858            way to get your hands on one in lisp and doing so could be
    835859            dangerous.  (If the value of a header wound up in a lisp
    836             noderegister and that register wound up getting pushed on
    837             a thread's valuestack, the GC might misinterpret that
    838             situation to mean that therewas a stack-allocated UVECTOR
     860            node register and that register wound up getting pushed on
     861            a thread's value stack, the GC might misinterpret that
     862            situation to mean that there was a stack-allocated UVECTOR
    839863            on the value stack.)</para>
    840864          </listitem>
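
The header layout described above (an 8-bit subtag in the low byte, with the element count in the remaining bits) can be sketched in C. This is only an illustration for the 32-bit case: the names `make_header32`, `header_subtag`, and `header_element_count` are hypothetical and are not taken from the lisp-kernel sources, which remain the definitive reference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. The layout follows the text: the low 8 bits hold
   the subtag, and the remaining 24 bits (on a 32-bit port) hold the
   element count. */
#define SUBTAG_BITS     8u
#define SUBTAG_MASK     0xFFu
#define MAX_ELEMENTS_32 (1u << 24)   /* the 2^24 limit mentioned above */

/* Build a 32-bit header word from an element count and a subtag. */
static uint32_t make_header32(uint32_t element_count, uint8_t subtag) {
    assert(element_count < MAX_ELEMENTS_32);  /* must fit in 24 bits */
    return (element_count << SUBTAG_BITS) | subtag;
}

/* Extract the type information kept in the low byte. */
static uint8_t header_subtag(uint32_t header) {
    return (uint8_t)(header & SUBTAG_MASK);
}

/* Extract the element count kept in the upper 24 bits. */
static uint32_t header_element_count(uint32_t header) {
    return header >> SUBTAG_BITS;
}
```

Because the count shares the word with the subtag, a 32-bit header cannot describe a count of 2^24 or more, which is consistent with the ARRAY-TOTAL-SIZE-LIMIT remark in the text.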
     
    845869
    846870    <sect1 id="Heap-Allocation">
    847       <title>Heap Allocation</title> <para>When the &CCL; kernel
    848       first starts up, a large contiguous chunk of the process's
    849       address space is mapped as "anonymous, no access"
    850       memory. ("Large" means different things in different contexts;
    851       on LinuxPPC32, it means "about 1 gigabyte", on DarwinPPC32, it
    852       means "about 2 gigabytes", and on current 64-bit platforms it
    853       ranges from 128 to 512 gigabytes, depending on OS. These values
    854       are both defaults and upper limits; the --heap-reserve
    855       argument can be used to try to reserve less than the
    856       default.)</para>
     871      <title>Heap Allocation</title> <para>When the &CCL; kernel first
     872      starts up, a large contiguous chunk of the process's address
     873      space is mapped as "anonymous, no access" memory. ("Large" means
     874      different things in different contexts; on LinuxPPC32, it means
     875      "about 1 gigabyte", on DarwinPPC32, it means "about 2
     876      gigabytes", and on current 64-bit platforms it ranges from 128
     877      to 512 gigabytes, depending on OS. These values are both
     878      defaults and upper limits; the --heap-reserve argument can be
     879      used to try to reserve less than the default.)</para>
    857880      <para>Reserving address space that can't (yet) be read or
    858881      written to doesn't cost much; in particular, it doesn't require
    859882      that corresponding swap space or physical memory be available.
    860883      Marking the address range as being "mapped" helps to ensure that
    861       other things (result from random calls to malloc(), dynamically
     884      other things (results from random calls to malloc(), dynamically
    862885      loaded shared libraries) won't be allocated in this region that
    863886      lisp has reserved for its own heap growth.</para>
     
    880903        <para>Each lisp thread has a private "reserved memory
    881904        segment"; when a thread starts up, its reserved memory segment
    882         is empty.  PPC ports maintain the highest unallocated addres
    883         and he lowest allocated address in the current segment in
     905        is empty.  PPC ports maintain the highest unallocated address
     906        and the lowest allocatable address in the current segment in
    884907        registers when running lisp code; on x86-64, these values are
    885908        maintained in the current thread's TCR.  (An "empty" heap
     
    917940  (MR ARG_Z ALLOCPTR)           ; arg_z is the new CONS cell
    918941  (RLWINM ALLOCPTR ALLOCPTR 0 0 28)     ; clear tag bits
    919 </programlisting>
    920         <para>On x86-64, the idea's similar but the implementation is
     942        </programlisting>
     943        <para>On x86-64, the idea's similar but the implementation is
    921944        different.  The high and low pointers to the current thread's
    922945        reserved segment are kept in the TCR, which is addressed by
     
    9921015        code would be able to "cleanly" signal an out-of-memory
    9931016        condition if lisp is ... out of memory</para>
    994         <para>I don't know that I've ever seen an abrupt out-of-memory failure that
    995 wasn't preceeded by several minutes of excessive paging activity.  The
    996 most expedient course in cases like this is to either (a) use less memory
    997 or (b) get more memory; it's generally hard to use memory that you don't
    998 have.</para>
     1017        <para>I don't know that I've ever seen an abrupt out-of-memory
     1018        failure that wasn't preceded by several minutes of excessive
     1019        paging activity.  The most expedient course in cases like this
     1020        is to either (a) use less memory or (b) get more memory; it's
     1021        generally hard to use memory that you don't have.</para>
    9991022      </sect2>
    10001023    </sect1>
     
    10081031      garbage to rebuild free-lists; their execution time is therefore
    10091032      a factor of the total heap size.)</para>
    1010       <para>As mentioned in , two auxiliary data structures
    1011       (proportional to the size of the lisp heap) are maintained. These are</para>
     1033      <para>As mentioned in <xref linkend="Heap-Allocation"/>, two
     1034      auxiliary data structures (proportional to the size of the lisp
     1035      heap) are maintained. These are</para>
    10121036      <orderedlist>
    10131037        <listitem>
    10141038          <para>the markbits bitvector, which contains a bit for
    1015           everydoublenode in the dynamic heap (plus a few extra words
    1016           for alignmentand so that sub-bitvectors can start on word
     1039          every doublenode in the dynamic heap (plus a few extra words
     1040          for alignment and so that sub-bitvectors can start on word
    10171041          boundaries.)</para>
    10181042        </listitem>
     
    10421066        <emphasis>marked</emphasis> if the markbits of all of its
    10431067        constituent doublewords are set and unmarked otherwise;
    1044         setting an object's markbits involves setting the corrsponding
     1068        setting an object's markbits involves setting the corresponding
    10451069        markbits of all constituent doublenodes in the object.</para>
    10461070        <para>The mark phase traverses each root. If the tag of the
     
    10631087          <listitem>
    10641088            <para>Otherwise, the object is a gvector. Recursively mark
    1065             itselements.</para>
     1089            its elements.</para>
    10661090          </listitem>
    10671091        </orderedlist>
     
    10851109          <para>To support a feature called <emphasis>GCTWA
    10861110              <footnote>
    1087                 <para>I believe that theacronym comes from MACLISP,
    1088                 where it stood for "Garbage Collection ofTruly
     1111                <para>I believe that the acronym comes from MACLISP,
     1112                where it stood for "Garbage Collection of Truly
    10891113                Worthless Atoms".</para>
    10901114              </footnote>
    10911115              , </emphasis>the vector which contains the
    1092               internalsymbols of the current package is marked on
     1116              internal symbols of the current package is marked on
    10931117              entry to the mark phase but the symbols themselves are
    1094               not marked at this time. Near the endof the mark phase,
     1118              not marked at this time. Near the end of the mark phase,
    10951119              symbols referenced from this vector which are
    1096               nototherwise marked are marked if and only if they're
    1097               somehowdistinguishable from newly created symbols (by
    1098               virtue of their havingfunction bindings, value bindings,
     1120              not otherwise marked are marked if and only if they're
     1121              somehow distinguishable from newly created symbols (by
     1122              virtue of their having function bindings, value bindings,
    10991123              plists, or other attributes.)</para>
    11001124        </listitem>
    11011125        <listitem>
    11021126          <para>Pools have their first element set to NIL before any
    1103           otherelements are marked.</para>
     1127          other elements are marked.</para>
    11041128        </listitem>
    11051129        <listitem>
     
    11141138        </listitem>
    11151139        </orderedlist>
    1116         <para>At the end of the mark phase, the markbits of all objects which
    1117         are transitively reachable from the roots are set and all other markbits
    1118         are clear.</para>
     1140        <para>At the end of the mark phase, the markbits of all
     1141        objects which are transitively reachable from the roots are
     1142        set and all other markbits are clear.</para>
    11191143      </sect2>
    11201144
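
The markbits convention used by the mark phase (one bit per doublenode, with an object counted as marked only when all of its doublenodes are marked) can be sketched as follows. This is an illustration under stated assumptions: the `markbits_t` type, the function names, the 16-byte doublenode size (a 64-bit port), and the least-significant-bit-first ordering are all hypothetical and may not match the kernel's actual representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DNODE_SIZE 16u   /* assumed: two 8-byte nodes per doublenode */

/* Hypothetical descriptor: one mark bit per doublenode of the heap. */
typedef struct {
    uintptr_t heap_start;  /* address of the first doublenode */
    uint64_t *bits;        /* the markbits bitvector */
    size_t    ndnodes;     /* number of doublenodes covered */
} markbits_t;

/* Map a heap address to the index of the doublenode containing it. */
static size_t dnode_index(const markbits_t *m, uintptr_t addr) {
    assert(addr >= m->heap_start);
    size_t i = (addr - m->heap_start) / DNODE_SIZE;
    assert(i < m->ndnodes);
    return i;
}

/* Mark an object by setting the bit of every doublenode it occupies. */
static void set_markbits(markbits_t *m, uintptr_t addr, size_t n) {
    size_t first = dnode_index(m, addr);
    assert(n <= m->ndnodes - first);
    for (size_t i = first; i < first + n; i++)
        m->bits[i / 64] |= (uint64_t)1 << (i % 64);
}

/* An object is marked only if all n of its doublenodes are marked. */
static int is_marked(const markbits_t *m, uintptr_t addr, size_t n) {
    size_t first = dnode_index(m, addr);
    assert(n <= m->ndnodes - first);
    for (size_t i = first; i < first + n; i++)
        if (((m->bits[i / 64] >> (i % 64)) & 1u) == 0)
            return 0;
    return 1;
}
```

At 16 bytes per doublenode the bitvector costs one bit per 128 bits of heap, which is why it can be sized in proportion to the heap when the address space is reserved.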
     
    12951319     
    12961320      </itemizedlist>
    1297       <para>The intergenerational references table is maintained
    1298       indirectly: whenever a setf operation that may introduce an
    1299       intergenerational reference occurs, a pointer to the doubleword
    1300       being stored into is pushed onto the <emphasis>memo
    1301       buffer</emphasis>, which is a stack whos top is addressed by the
    1302       memo register. Whenever the memo buffer overflows<tip><para>A
    1303       guard page at the end of the memo buffer simplifies overflow
    1304       detection.</para></tip> when the EGC is active, the handler
    1305       scans the buffer and sets bits in the intergenerational
    1306       references table for each doubleword address it finds in the
    1307       buffer that belongs to some generation other than the youngest;
    1308       the same scan is performed on entry to any ephemeral GC.  After
    1309       (possibly) performing this scan, the handler resets the memo
    1310       register to point to the bottom of the memo stack; this means
    1311       that when the EGC is inactive, the memo buffer is constantly
    1312       being filled and emptied for no apparent reason.</para>
    13131321      <para>With one exception (the implicit setfs that occur on entry
    13141322      to and exit from the binding of a special variable), all setfs
    13151323      that might introduce an intergenerational reference must be
    1316       memoized.<tip><para>Note that the implicit setfs that occur when
     1324      memoized.@footnote{Note that the implicit setfs that occur when
    13171325      initializing an object - as in the case of a call to cons or
    13181326      vector - can't introduce intergenerational references, since the
    13191327      newly created object is always younger than the objects used to
    1320       initialize it.</para></tip> It's always safe to push any cons
    1321       cell or gvector locative onto the memo stack; it's never safe to
    1322       push anything else.</para>
     1328      initialize it.} It's always safe to push any cons cell or
     1329      gvector locative onto the memo stack; it's never safe to push
     1330      anything else.
     1331      </para>
     1332
    13231333      <para>Typically, the intergenerational references bitvector is
    13241334      sparse: a relatively small number of old locations are stored
     
    13271337      of work and usually does it fairly often; it uses a simple,
    13281338      brute-force method but might run faster if it was smarter about
    1329       recognizing addresses that it'd already seen.</para>
     1339      recognizing addresses that it'd already seen.
     1340      </para>
     1341
    13301342      <para>When the EGC mark and forward phases scan the
    13311343      intergenerational reference bits, they can clear any bits that
    13321344      denote doublewords that definitely do not contain
    1333       intergenerational references.</para>
     1345      intergenerational references.
     1346      </para>
    13341347    </sect1>
    13351348
    13361349    <sect1 id="Fasl-files">
    13371350      <title>Fasl files</title>
    1338       <para>The information in this section was current in November
    1339       2004.  Saving and loading of Fasl files is implemented in
     1351      <para>Saving and loading of Fasl files is implemented in
    13401352      xdump/faslenv.lisp, level-0/nfasload.lisp, and lib/nfcomp.lisp.
    13411353      The information here is only an overview, which might help when
     
    13441356      format; there are a few differences, but they are minor.  The
    13451357      name "nfasload" comes from the fact that this is the so-called
    1346       "new" Fasl system, which was true in 1986 or so.  The format has
    1347       held up well, although it would certainly need extensions to
    1348       deal with 64-bit data, and some other modernization might be
    1349       possible.</para>
     1358      "new" Fasl system, which was true in 1986 or so.  </para>
    13501359      <para>A Fasl file begins with a "file header", which contains
    13511360      version information and a count of the following "blocks".
     
    13531362      are part of a mechanism for combining multiple logical files
    13541363      into a single physical file, in order to simplify the
    1355       distribution of precompiled programs.  (Nobody seems to be doing
    1356       anything interesting with this feature, at the moment, probably
    1357       because it isn't documented.)</para>
     1364      distribution of precompiled programs. </para>
    13581365      <para>Each block begins with a header for itself, which just
    13591366      describes the size of the data that follows.</para>