# Changeset 15534

Ignore:
Timestamp:
Dec 11, 2012, 6:10:59 PM (8 years ago)
Message:

Translate the rest of the GC chapter. It could now use an update,
both for content and style.

File:
1 edited

### Legend:

Unmodified
 r15520 \chapter{The Garbage Collector} \section{Heap Space Allocation} \CCL\ manages a contiguous region of memory.  It maps a large block of address space at startup, and then uses (and returns) memory from that area as the size of the live lisp heap data grows and shrinks.  After the initial heap image loads and after each full GC, the lisp kernel will try to ensure that a specified amount (the lisp-heap-gc-threshold'') of free memory is available. The initial value of this kernel variable is 16MB on 32-bit implementations and 32MB on 64-bit implementations; it can be manipulated from Lisp (see below.) The large reserved memory block consumes very little in the way of system resources; memory that's actually committed to the lisp heap (live data and the threshold'' area where allocation takes place) consumes finite resources (physical memory and swap space). The lisp's consumption of those resources is proportional to its actual memory usage, which is generally a good thing. The \cd{-R} or \cd{-{}-heap-reserve} command-line option can be used to limit the size of the reserved block and therefore bound heap expansion. \section{Ephemeral GC} \end{defun} \section{Page Reclamation Policy} After a full GC finishes, it'll try to ensure that at least \cd{(lisp-heap-gc-threshold)} of virtual memory are available; objects will be allocated in this block of memory until it fills up, the GC is triggered, and the process repeats itself. Many programs reach near stasis in terms of the amount of logical memory that's in use after full GC (or run for long periods of time in a nearly static state), so the logical address range used for consing after the $n$th full GC is likely to be nearly or entirely identical to the address range used by the $n+1$th full GC. By default (and traditionally in CCL), the GC's policy is to release'' the pages in this address range: to advise the virtual memory system that the pages contain garbage and any physical pages associated with them don't need to be swapped out to disk before being reused and to (re-)map the logical address range so that the pages will be zero-filled by the virtual memory system when they're next accessed. This policy is intended to reduce the load on the VM system and keep CCL's working set to a minimum. For some programs (especially those that cons at a very high rate), the default policy may be less than ideal: releasing pages that are going to be needed almost immediately---and zero-fill-faulting them back in, lazily---incurs unnecessary overhead. (There's a false economy associated with minimizing the size of the working set if it's just going to shoot back up again until the next GC.) A policy of retaining'' pages between GCs might work better in such an environment. Functions described below give the user some control over this behavior. An adaptive, feedback-mediated approach might yield a better solution. \begin{defun}[Function] lisp-heap-gc-threshold Returns the value of the kernel variable that specifies the amount of free space to leave in the heap after full gc. \end{defun} \begin{defun}[Function] set-lisp-heap-gc-threshold new-threshold Sets the value of the kernel variable that specifies the amount of free space to leave in the heap after full GC to {\it new-threshold}, which should be a non-negative fixnum. Returns the value of that kernel variable (which may be somewhat larger than what was specified). \end{defun} \begin{defun}[Function] use-lisp-heap-gc-threshold Tries to grow or shrink lisp's heap space, so that the free space is (approximately) equal to the current heap threshold. Returns \nil. \end{defun} \begin{defun}[Function] gc-retain-pages flag Tries to influence the GC to retain/recycle the pages allocated between GCs if {\it flag} is true, and to release them otherwise. This is generally a tradeoff between paging and other VM considerations. \end{defun} \begin{defun}[Function] gc-retaining-pages Returns \cd{t} if the GC tries to retain pages between full GCs and \nil\ if it's trying to release them to improve VM paging performance. \end{defun} \section{Pure Areas} The function \fn{save-application} identifies code vectors and the pnames of interned symbols and copies these objects to a "pure" area of the image file it creates. (The "pure" area accounts for most of what the \cd{room} function reports as "static" space.) When the resulting image file is loaded, the pure area of the file is memory-mapped with read-only access. Code and pure data are paged in from the image file as needed (and don't compete for global virtual memory resources with other memory areas.) Code-vectors and interned symbol pnames are immutable: it is an error to try to change the contents of such an object. Previously, that error would have manifested itself in some random way. In the new scheme, it'll manifest itself as an "unhandled exception" error in the Lisp kernel. The kernel could probably be made to detect a spurious, accidental write to read-only space and signal a lisp error in that case, but it doesn't yet do so. The image file should be opened and/or mapped in some mode which disallows writing to the memory-mapped regions of the file from other processes. I'm not sure of how to do that; writing to the file when it's mapped by CCL can have unpredictable and unpleasant results. \cd{save-application} will delete its output file's directory entry and create a new file; one may need to exercise care when using file system utilities (like tar, for instance) that might overwrite an existing image file. \section{Weak References} In general, a weak reference'' is a reference to an object which does not prevent the object from being garbage-collected. For example, suppose that you want to keep a list of all the objects of a certain type. If you don't take special steps, the fact that you have a list of them will mean that the objects are always live, because you can always reference them through the list. Therefore, they will never be garbage-collected, and their memory will never be reclaimed, even if they are referenced nowhere else in the program. If you don't want this behavior, you need weak references. CCL supports weak references with two kinds of objects: weak hash tables and populations. Weak hash tables are created with the standard Common Lisp function \cd{make-hash-table}, which is extended to accept the keyword argument \cd{:weak}. Hash tables may be weak with respect to either their keys or their values. To make a hash table with weak keys, invoke make-hash-table with the option \cd{:weak t}, or, equivalently, \cd{:weak :key}. To make one with weak values, use \cd{:weak :value}. When the key is weak, the equality test must be \cd{eq} (because it wouldn't make sense otherwise). When garbage-collection occurs, key-value pairs are removed from the hash table if there are no non-weak references to the weak element of the pair (key or value). In general, weak-key hash tables are useful when you want to use the hash to store some extra information about the objects you look up in it, while weak-value hash tables are useful when you want to use the hash as an index for looking up objects. A population encapsulates an object, causing certain reference from the object to be considered weak. CCL supports two kinds of populations: lists, in which case the encapsulated object is a list of elements, which are spliced out of the list when there are no non-weak references to the element; and alists, in which case the encapsulated object is a list of conses which are spliced out of the list if there are no non-weak references to the car of the cons. If you are experimenting with weak references interactively, remember that the values of the last three expressions are referenced by the variables \cd{*}, \cd{**}, and \cd{***}. The easy workaround is to evaluate a few meaningless expressions in order to get the object out of the REPL variables before invoking the GC. \begin{defun}[Function] population &key :type :initial-contents The value of the \cd{:type} keyword argument must be either \cd{:list} (the default), or \cd{:alist}.  The \cd{:initial-contents} argument should be a sequence of elements (or conses, for type \cd{:alist}) which will be used to initialize the population.  The sequence itself (and the conses in case of an alist) is not stored in the population; a new list or alist is created to hold the elements. The new population is returned. \end{defun} \begin{defun}[Function] population-type population This function returns the type of {\it population}, either \cd{:list} or \cd{:alist}. \end{defun} \begin{defun}[Function] population-contents population This function returns the list encapsulated in {\it population}. Note that as long as there is a direct (non-weak) reference to this list, it will not be modified by the garbage collector. Therefore it is safe to traverse the list, and even modify it, no different from any other list. If you want the elements to become garbage-collectable again, you must stop refering to the list directly. This function may be used with \cd{setf} to update the contents of {\it population}.  The new list is copied;  it is not used directly. \end{defun}