source: trunk/source/doc/src/gc.xml @ 8820

Last change on this file since 8820 was 8820, checked in by jaj, 12 years ago

This commit includes support for docbook 4.5, stylesheet changes, and updated documentation.

In order to support docbook 4.5 in nXML mode, I have added a new directory called docbook-rng-4.5 and changed schemas.xml to point to it. This should just work when editing the documentation in EMACS.

The two most obvious changes to the stylesheets are that the table of contents for each chapter now occurs at the beginning of the chapter, and the format for refentries is cleaner and more concise.

I think that we should consistently use refentry elements for all of the definitions of functions, macros, variables, etc. This retains the structured data for the definitions, which the stylesheets can then reformat to give different appearances. We should also consistently use other docbook elements such as function and varname. I'm not really happy with their appearance right now, but that can be easily tweaked in the stylesheets as long as they are consistently used throughout the documentation.

1<?xml version="1.0" encoding="utf-8"?>
2<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
3<!ENTITY rest "<varname>&amp;rest</varname>">
4<!ENTITY key "<varname>&amp;key</varname>">
5<!ENTITY optional "<varname>&amp;optional</varname>">
6<!ENTITY body "<varname>&amp;body</varname>">
7<!ENTITY aux "<varname>&amp;aux</varname>">
8<!ENTITY allow-other-keys "<varname>&amp;allow-other-keys</varname>">
9<!ENTITY CCL "<literal>CCL</literal>">
10]>
11  <chapter id="Understanding-and-Configuring-the-Garbage-Collector">
12    <title>Understanding and Configuring the Garbage Collector</title>
13
14    <sect1 id="Heap-space-allocation">
15      <title>Heap space allocation</title>
16      <para>Release 0.10 or later of &CCL; uses a different memory
17      management scheme than previous versions did. Those earlier
18      versions would allocate a block of memory (of specified size) at
19      startup and would allocate lisp objects within that block. When
20      that block filled with live (non-GCed) objects, the lisp would
21      signal a "heap full" condition. The heap size imposed a limit on
22      the size of the largest object that could be allocated.</para>
      <para>The new strategy involves reserving a very large (2GB on
      DarwinPPC32, 1GB on LinuxPPC, "very large" on 64-bit
      implementations) block at startup and consuming (and
      relinquishing) its contents as the size of the live lisp heap
      data grows and shrinks. After the initial heap image loads and
      after each full GC, the lisp kernel will try to ensure that a
      specified amount (the "lisp-heap-gc-threshold") of free memory
      is available. The initial value of this kernel variable is 16MB
      on 32-bit implementations and 32MB on 64-bit implementations;
      it can be manipulated from Lisp (see below).</para>
33      <para>The large reserved memory block consumes very little in
34      the way of system resources; memory that's actually committed to
35      the lisp heap (live data and the "threshold" area where
36      allocation takes place) consumes finite resources (physical
37      memory and swap space). The lisp's consumption of those
38      resources is proportional to its actual memory usage, which is
39      generally a good thing.</para>
40      <para>This scheme is much more flexible than the old one, but it
41      may also increase the possibility that those resources can
42      become exhausted.  Neither the new scheme nor the old handles
43      that situation gracefully; under the old scheme, a program that
44      consumes lots of memory may have run into an artificial limit on
45      heap size before exhausting virtual memory.</para> 
46
      <para>The <literal>-R</literal> or <literal>--heap-reserve</literal>
      command-line option can be used to limit the size of the reserved
      block and therefore bound heap expansion. Running</para>
      <programlisting>
> openmcl --heap-reserve 8M
</programlisting>
      <para>would provide an execution environment that's very similar to
      that provided by earlier &CCL; versions.</para>
55    </sect1>
56
57    <sect1 id="The-Ephemeral-GC">
58      <title>The Ephemeral GC</title>
59      <para>For many programs, the following observations are true to
60      a very large degree:</para>
61
62      <orderedlist continuation="restarts" inheritnum="ignore">
63        <listitem>
64          <para>Most heap-allocated objects have very short lifetimes ("are
65          ephemeral"): they become inaccessible soon after they&#39;re created.</para>
66        </listitem>
67
68        <listitem>
69          <para>Most non-ephemeral objects have very long lifetimes: it&#39;s
70          rarely productive for the GC to consider reclaiming them, since
71          it&#39;s rarely able to do so. (An object that&#39;s survived a large
72          number of GCs is likely to survive the next one. That&#39;s not always
73          true of course, but it&#39;s a reasonable heuristic.)</para>
74        </listitem>
75
        <listitem>
          <para>It&#39;s relatively rare for an old object to be destructively
          modified (via SETF) so that it points to a new one; therefore, most
          references to newly-created objects can be found in the stacks and
          registers of active threads. It&#39;s not generally necessary to scan
          the entire heap to find references to new objects (or to prove that
          such references don&#39;t exist), though it is necessary to keep
          track of the (hopefully exceptional) cases where old objects are
          modified to point at new ones.</para>
        </listitem>
86      </orderedlist>
112
      <para>"Ephemeral" (or "generational") garbage collectors try to
      exploit these observations: by concentrating on reclaiming
      newly-created objects frequently and quickly, they make it less
      often necessary to do more expensive GCs of the entire heap in
      order to reclaim unreferenced memory.  In some environments, the
      pauses associated with such full GCs can be noticeable and
      disruptive, and minimizing the frequency (and sometimes the
      duration) of these pauses is probably the EGC's primary goal
      (though there may be other benefits, such as increased locality
      of reference and better paging behavior). The EGC generally
      leads to slightly longer execution times (and slightly higher,
      amortized GC time), but there are cases where it can improve
      overall performance as well; the nature and degree of its impact
      on performance are highly application-dependent.</para>
      <para>Most EGC strategies (including the one employed by
      &CCL;) logically or physically divide memory into one or more
      areas of relatively young objects ("generations") and one or
      more areas of old objects.  Objects that have survived one or
      more GCs as members of a young generation are promoted (or
      "tenured") into an older generation, where they may or may not
      survive long enough to be promoted to the next generation and
      eventually may become "old" objects that can only be reclaimed
      if a full GC proves that there are no live references to them.
      This filtering process isn't perfect - a certain amount of
      premature tenuring may take place - but it usually works very
      well in practice.</para>
      <para>It's important to note that although a GC of the youngest
      generation is typically very fast (perhaps a few milliseconds on
      a modern CPU, depending on various factors), &CCL;'s EGC is
      not concurrent and doesn't offer realtime guarantees.</para>
143      <para>&CCL;'s EGC maintains three ephemeral generations; all
144      newly created objects are created as members of the youngest
145      generation. Each generation has an associated
146      <emphasis>threshold</emphasis>, which indicates the number of
147      bytes in it and all younger generations that can be allocated
148      before a GC is triggered. These GCs will involve the target
149      generation and all younger ones (and may therefore cause some
150      premature tenuring); since the older generations have larger
151      thresholds, they're GCed less frequently and most short-lived
152      objects that make it into an older generation tend not to
153      survive there very long.</para>
154      <para>The EGC can be <emphasis>enabled</emphasis> or
155      <emphasis>disabled</emphasis> under program control; under some
156      circumstances, it may be enabled but
157      <emphasis>inactive</emphasis> (because a full GC is imminent.)
158      Since it may be hard to know or predict the consing behavior of
159      other threads, the distinction between the "active" and
160      "inactive" state isn't very meaningful, especially when native
161      threads are involved.</para>
162    </sect1>
163
164    <sect1 id="GC-Page-reclamation-policy">
165      <title>GC Page reclamation policy</title>
      <para>After a full GC finishes, it'll try to ensure that at
      least (LISP-HEAP-GC-THRESHOLD) bytes of virtual memory are available;
      objects will be allocated in this block of memory until it fills
      up, the GC is triggered, and the process repeats itself.</para>
      <para>Many programs reach near stasis in terms of the amount of
      logical memory that's in use after full GC (or run for long
      periods of time in a nearly static state), so the logical
      address range used for consing after the Nth full GC is likely
      to be nearly or entirely identical to the address range used
      for consing after the N+1th full GC.</para>
176      <para>By default (and traditionally in &CCL;), the GC's policy
177      is to "release" the pages in this address range: to advise the
178      virtual memory system that the pages contain garbage and any
179      physical pages associated with them don't need to be swapped out
180      to disk before being reused and to (re-)map the logical address
181      range so that the pages will be zero-filled by the virtual
182      memory system when they're next accessed.  This policy is
183      intended to reduce the load on the VM system and keep &CCL;'s
184      working set to a minimum.</para>
      <para>For some programs (especially those that cons at a very
      high rate), the default policy may be less than ideal: releasing
      pages that are going to be needed almost immediately - and
      zero-fill-faulting them back in, lazily - incurs unnecessary
      overhead. (There's a false economy associated with minimizing
      the size of the working set if it's just going to shoot back up
      again until the next GC.) A policy of "retaining" pages between
      GCs might work better in such an environment.</para>
193      <para>Functions described below give the user some control over
194      this behavior. An adaptive, feedback-mediated approach might
195      yield a better solution.</para>
196    </sect1>
197
198    <sect1 id="iPure--areas-are-read-only--paged-from-image-file">
199      <title>"Pure" areas are read-only, paged from image file</title>
200      <para>SAVE-APPLICATION identifies code vectors and the pnames of
201      interned symbols and copies these objects to a "pure" area of
202      the image file it creates. (The "pure" area accounts for most of
203      what the ROOM function reports as "static" space.)</para>
204      <para>When the resulting image file is loaded, the pure area of
205      the file is now memory-mapped with read-only access. Code and
206      pure data are paged in from the image file as needed (and don't
207      compete for global virtual memory resources with other memory
208      areas.)</para>
      <para>Code vectors and interned symbol pnames are immutable: it
      is an error to try to change the contents of such an
      object. Previously, that error would have manifested itself in
      some random way. In the new scheme, it'll manifest itself as an
      "unhandled exception" error in the Lisp kernel. The kernel could
      probably be made to detect a spurious, accidental write to
      read-only space and signal a lisp error in that case, but it
      doesn't yet do so.</para>
217      <para>The image file should be opened and/or mapped in some mode
218      which disallows writing to the memory-mapped regions of the file
219      from other processes. I'm not sure of how to do that; writing to
220      the file when it's mapped by &CCL; can have unpredictable and
221      unpleasant results.  SAVE-APPLICATION will delete its output
222      file's directory entry and create a new file; one may need to
223      exercise care when using file system utilities (like tar, for
224      instance) that might overwrite an existing image file.</para>
225    </sect1>
226
227    <sect1 id="Weak-Hash-Tables">
228      <title>Weak Hash Tables</title>
229      <para>In general, a "weak reference" is a reference to an object
230      which will not prevent the object from being garbage-collected.
231      For example, suppose that you want to keep a list of all the
232      objects of a certain type.  If you don't take special steps, the
233      fact that you have a list of them will mean that the objects are
234      always "live", because you can always reference them through the
235      list.  Therefore, they will never be garbage-collected, and
236      their memory will never be reclaimed, even if they are
237      referenced nowhere else in the program.  You may want this
238      behaviour.  If you don't, you need weak references.</para>
239      <para>&CCL; supports weak references with "weak hash tables".
240      Hash tables may be weak with respect to either their keys or
241      their values.  To make a hash table with weak keys, invoke
242      <literal>make-hash-table</literal> with the option :weak t, or,
243      equivalently, :weak :key.  To make one with weak values, use
244      :weak :value.  When the key is weak, the equality test must be
245      #'eq (because it wouldn't make sense otherwise).</para>
246      <para>When garbage-collection occurs, key-value pairs are
247      removed from the hash table if there are no other references to
248      the weak element of the pair (key or value).</para>
249      <para>In general, weak-key hash tables are useful when you want
250      to use the hash to store some extra information about the
251      objects you look up in it, while weak-value hash tables are
252      useful when you want to use the hash as an index for looking up
253      objects.</para>
254      <para>If you are experimenting with weak hash tables
255      interactively, remember that an object is not dead if it was
256      returned by one of the last three interactively-evaluated
257      expressions, because of the variables <literal>*</literal>,
258      <literal>**</literal>, and <literal>***</literal>.  The easy
259      workaround is to evaluate some meaningless expression before
260      invoking <literal>gc</literal>, to get the object out of the
261      repl variables.</para>
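      <para>The points above can be sketched as follows. This is only an
      illustration: the variable name <literal>*notes*</literal> is
      hypothetical, and the sketch assumes that <literal>gc</literal> is
      accessible in the <literal>CCL</literal> package. Whether the entry
      has actually been removed by the time the count is taken is not
      guaranteed here.</para>
      <programlisting>
;; A weak-key table (hypothetical name): an entry does not keep its
;; key alive. Weak keys require an EQ test.
(defvar *notes* (make-hash-table :test 'eq :weak :key))

;; Attach a note to a freshly consed list that nothing else references.
(setf (gethash (list 1 2 3) *notes*) "temporary")

;; Evaluate a meaningless expression so that the REPL variables
;; *, ** and *** are less likely to refer to anything of interest.
nil

;; Force a full GC, then look at how many entries remain.
(ccl:gc)
(hash-table-count *notes*)
</programlisting>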
262    </sect1>
263
264    <sect1 id="Garbage-Collection-Dictionary">
265      <title>Garbage-Collection Dictionary</title>
266
267      <refentry id="f_lisp-heap-gc-threshold">
268        <indexterm zone="f_lisp-heap-gc-threshold">
269          <primary>lisp-heap-gc-threshold</primary>
270        </indexterm>
271
272        <refnamediv>
273          <refname>LISP-HEAP-GC-THRESHOLD</refname>
274          <refpurpose></refpurpose>
275          <refclass>Function</refclass>
276        </refnamediv>
277
278        <refsynopsisdiv>
279          <synopsis><function>lisp-heap-gc-threshold</function></synopsis>
280        </refsynopsisdiv>
281
282        <refsect1>
283          <title>Description</title>
284
285          <para>Returns the value of the kernel variable that specifies the
286          amount of free space to leave in the heap after full GC.</para>
287        </refsect1>
288      </refentry>
289
290      <refentry id="f_set-lisp-heap-gc-threshold">
291        <indexterm zone="f_set-lisp-heap-gc-threshold">
292          <primary>set-lisp-heap-gc-threshold</primary>
293        </indexterm>
294
295        <refnamediv>
296          <refname>SET-LISP-HEAP-GC-THRESHOLD</refname>
297          <refpurpose></refpurpose>
298          <refclass>Function</refclass>
299        </refnamediv>
300
301        <refsynopsisdiv>
          <synopsis>
            <function>set-lisp-heap-gc-threshold</function> new-value
          </synopsis>
305        </refsynopsisdiv>
306
307        <refsect1>
308          <title>Arguments and Values</title>
309
310          <variablelist>
311            <varlistentry>
312              <term>new-value</term>
313
314              <listitem>
315                <para>The requested new lisp-heap-gc-threshold.</para>
316              </listitem>
317            </varlistentry>
318          </variablelist>
319        </refsect1>
320
321        <refsect1>
322          <title>Description</title>
323
324          <para>Sets the value of the kernel variable that specifies the
325          amount of free space to leave in the heap after full GC to
326          new-value, which should be a non-negative fixnum. Returns the
327          value of that kernel variable (which may be somewhat larger than
328          what was specified).</para>
329        </refsect1>
330      </refentry>
331
332      <refentry id="f_use-lisp-heap-gc-threshold">
333        <indexterm zone="f_use-lisp-heap-gc-threshold">
334          <primary>use-lisp-heap-gc-threshold</primary>
335        </indexterm>
336
337        <refnamediv>
338          <refname>USE-LISP-HEAP-GC-THRESHOLD</refname>
339          <refpurpose></refpurpose>
340          <refclass>Function</refclass>
341        </refnamediv>
342
343        <refsynopsisdiv>
344          <synopsis>
345            <function>use-lisp-heap-gc-threshold</function>
346          </synopsis>
347        </refsynopsisdiv>
348
349        <refsect1>
350          <title>Description</title>
351
          <para>Tries to grow or shrink lisp&#39;s heap space, so that the
          free space is (approximately) equal to the current heap threshold.
          Returns NIL.</para>
355        </refsect1>
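
        <refsect1>
          <title>Example</title>

          <para>A minimal sketch of how the three threshold functions
          might be used together. It assumes that the functions are
          accessible in the <literal>CCL</literal> package and that the
          threshold is expressed in bytes; the 64MB figure is an
          arbitrary illustration, not a recommendation.</para>

          <programlisting>
;; Inspect the current threshold.
(ccl:lisp-heap-gc-threshold)

;; Request a 64MB threshold (assumed unit: bytes). The value
;; returned may be somewhat larger than what was requested.
(ccl:set-lisp-heap-gc-threshold (* 64 1024 1024))

;; Try to grow or shrink the heap now, so that the free space
;; approximately matches the new threshold.
(ccl:use-lisp-heap-gc-threshold)
</programlisting>
        </refsect1>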
356      </refentry>
357
358      <refentry id="f_egc">
359        <indexterm zone="f_egc">
360          <primary>egc</primary>
361        </indexterm>
362
363        <refnamediv>
364          <refname>EGC</refname>
365          <refpurpose></refpurpose>
366          <refclass>Function</refclass>
367        </refnamediv>
368
369        <refsynopsisdiv>
370          <synopsis><function>egc</function> arg</synopsis>
371        </refsynopsisdiv>
372
373        <refsect1>
374          <title>Arguments and Values</title>
375
376          <variablelist>
377            <varlistentry>
378              <term>arg</term>
379
380              <listitem>
381                <para>a generalized boolean</para>
382              </listitem>
383            </varlistentry>
384          </variablelist>
385        </refsect1>
386
387        <refsect1>
388          <title>Description</title>
389
390          <para>Enables the EGC if arg is non-nil, disables the EGC
391          otherwise. Returns the previous enabled status. Although this
392          function is thread-safe (in the sense that calls to it are
393          serialized), it doesn&#39;t make a whole lot of sense to be
394          turning the EGC on and off from multiple threads ...</para>
395        </refsect1>
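
        <refsect1>
          <title>Example</title>

          <para>Because <function>egc</function> returns the previous
          enabled status, that value can be used to restore the earlier
          state. The following is a sketch only; it assumes that
          <function>egc</function> is accessible in the
          <literal>CCL</literal> package, and the call to
          <literal>make-list</literal> merely stands in for real
          work.</para>

          <programlisting>
;; Disable the EGC, remembering whether it was enabled before.
(let ((was-enabled (ccl:egc nil)))
  (unwind-protect
       ;; Placeholder for the code to run with the EGC disabled.
       (length (make-list 100000))
    ;; Restore the previous state, even on a non-local exit.
    (ccl:egc was-enabled)))
</programlisting>
        </refsect1>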
396      </refentry>
397
398      <refentry id="f_egc-enabled-p">
399        <indexterm zone="f_egc-enabled-p">
400          <primary>egc-enabled-p</primary>
401        </indexterm>
402
403        <refnamediv>
404          <refname>EGC-ENABLED-P</refname>
405          <refpurpose></refpurpose>
406          <refclass>Function</refclass>
407        </refnamediv>
408
409        <refsynopsisdiv>
410          <synopsis><function>egc-enabled-p</function></synopsis>
411        </refsynopsisdiv>
412
413        <refsect1>
414          <title>Description</title>
415
416          <para>Returns T if the EGC was enabled at the time of the call,
417          NIL otherwise.</para>
418        </refsect1>
419      </refentry>
420
421      <refentry id="f_egc-active-p">
422        <indexterm zone="f_egc-active-p">
423          <primary>egc-active-p</primary>
424        </indexterm>
425
426        <refnamediv>
427          <refname>EGC-ACTIVE-P</refname>
428          <refpurpose></refpurpose>
429          <refclass>Function</refclass>
430        </refnamediv>
431
432        <refsynopsisdiv>
433          <synopsis><function>egc-active-p</function></synopsis>
434        </refsynopsisdiv>
435
436        <refsect1>
437          <title>Description</title>
438
439          <para>Returns T if the EGC was active at the time of the call, NIL
440          otherwise. Since this is generally a volatile piece of
441          information, it&#39;s not clear whether this function serves a
442          useful purpose when native threads are involved.</para>
443        </refsect1>
444      </refentry>
445
446      <refentry id="f_egc-configuration">
447        <indexterm zone="f_egc-configuration">
448          <primary>egc-configuration</primary>
449        </indexterm>
450
451        <refnamediv>
452          <refname>EGC-CONFIGURATION</refname>
453          <refpurpose></refpurpose>
454          <refclass>Function</refclass>
455        </refnamediv>
456
457        <refsynopsisdiv>
458          <synopsis><function>egc-configuration</function></synopsis>
459        </refsynopsisdiv>
460
461        <refsect1>
462          <title>Description</title>
463
          <para>Returns, as multiple values, the sizes in kilobytes of the
          thresholds associated with the youngest ephemeral generation, the
          middle ephemeral generation, and the oldest ephemeral generation.</para>
467        </refsect1>
468      </refentry>
469
470      <refentry id="f_configure-gcc">
471        <indexterm zone="f_configure-gcc">
          <primary>configure-egc</primary>
        </indexterm>

        <refnamediv>
          <refname>CONFIGURE-EGC</refname>
477          <refpurpose></refpurpose>
478          <refclass>Function</refclass>
479        </refnamediv>
480
481        <refsynopsisdiv>
482          <synopsis><function>configure-egc</function>
483          generation-0-size generation-1-size
484          generation-2-size</synopsis>
485        </refsynopsisdiv>
486
487        <refsect1>
488          <title>Arguments and Values</title>
489
490          <variablelist>
491            <varlistentry>
492              <term>generation-0-size</term>
493
494              <listitem>
495                <para>the requested threshold size of the youngest
496                generation, in kilobytes</para>
497              </listitem>
498            </varlistentry>
499
500            <varlistentry>
501              <term>generation-1-size</term>
502
503              <listitem>
504                <para>the requested threshold size of the middle generation,
505                in kilobytes</para>
506              </listitem>
507            </varlistentry>
508
509            <varlistentry>
510              <term>generation-2-size</term>
511
512              <listitem>
513                <para>the requested threshold size of the oldest generation,
514                in kilobytes</para>
515              </listitem>
516            </varlistentry>
517          </variablelist>
518        </refsect1>
519
520        <refsect1>
521          <title>Description</title>
522
          <para>If the EGC is currently disabled, puts the indicated
          threshold sizes in effect and returns T; otherwise, returns NIL.
          (The provided threshold sizes are rounded up to a multiple of
          64 kilobytes in &CCL; 0.14 and to a multiple of 32 kilobytes
          in earlier versions.)</para>
528        </refsect1>
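
        <refsect1>
          <title>Example</title>

          <para>Since the new sizes only take effect while the EGC is
          disabled, one plausible pattern is to disable it, reconfigure,
          and then restore the previous state. This is a sketch under
          that assumption; the sizes shown are arbitrary, and the
          functions are assumed to be accessible in the
          <literal>CCL</literal> package.</para>

          <programlisting>
;; Thresholds are given in kilobytes, youngest generation first.
(let ((was-enabled (ccl:egc nil)))      ; disable, remembering old state
  (unwind-protect
       (ccl:configure-egc 4096 8192 16384)
    (ccl:egc was-enabled)))             ; restore the previous state

;; Inspect the thresholds now in effect (three values, in kilobytes).
(ccl:egc-configuration)
</programlisting>
        </refsect1>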
529      </refentry>
530
531      <refentry id="f_gc-retain-pages">
532        <indexterm zone="f_gc-retain-pages">
533          <primary>gc-retain-pages</primary>
534        </indexterm>
535
536        <refnamediv>
537          <refname>GC-RETAIN-PAGES</refname>
538          <refpurpose></refpurpose>
539          <refclass>Function</refclass>
540        </refnamediv>
541
542        <refsynopsisdiv>
543          <synopsis><function>gc-retain-pages</function> arg</synopsis>
544        </refsynopsisdiv>
545
546        <refsect1>
547          <title>Arguments and Values</title>
548
549          <variablelist>
550            <varlistentry>
551              <term>arg</term>
552
553              <listitem>
554                <para>a generalized boolean</para>
555              </listitem>
556            </varlistentry>
557          </variablelist>
558        </refsect1>
559
560        <refsect1>
561          <title>Description</title>
562
563          <para>Tries to influence the GC to retain/recycle the pages
564          allocated between GCs if arg is true, and to release them
565          otherwise. This is generally a tradeoff between paging and other
566          VM considerations.</para>
567        </refsect1>
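
        <refsect1>
          <title>Example</title>

          <para>A short sketch of switching between the two page
          policies, assuming the functions are accessible in the
          <literal>CCL</literal> package. Whether retaining pages
          actually helps depends on the application.</para>

          <programlisting>
;; Ask the GC to retain pages between GCs; this may suit programs
;; that cons at a very high rate.
(ccl:gc-retain-pages t)

;; Check which policy is currently in effect.
(ccl:gc-retaining-pages)

;; Return to the default policy of releasing pages.
(ccl:gc-retain-pages nil)
</programlisting>
        </refsect1>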
568      </refentry>
569
570      <refentry id="f_gc-retaining-pages">
571        <indexterm zone="f_gc-retaining-pages">
572          <primary>gc-retaining-pages</primary>
573        </indexterm>
574
575        <refnamediv>
576          <refname>GC-RETAINING-PAGES</refname>
577          <refpurpose></refpurpose>
578          <refclass>Function</refclass>
579        </refnamediv>
580
581        <refsynopsisdiv>
582          <synopsis><function>gc-retaining-pages</function></synopsis>
583        </refsynopsisdiv>
584
585        <refsect1>
586          <title>Description</title>
587
588          <para>Returns T if the GC tries to retain pages between full GCs
589          and NIL if it&#39;s trying to release them to improve VM paging
590          performance.</para>
591        </refsect1>
592      </refentry>
593    </sect1>
594  </chapter>