Opened 14 years ago

Last modified 10 years ago

#234 assigned enhancement

[ACL2] Garbage Collection Performance on OpenMCL

Reported by: ragerdl Owned by: gb
Priority: minor Milestone:
Component: Runtime (threads, GC) Version:
Keywords: garbage collection ephemeral faster Cc: hunt@…, boyer@…, gb@…

Description (last modified by gb)

The folks at UTexas have been playing around with mergesort, and since you're already improving the garbage collector, you would probably like to see this problematic example.

Can you make running the example below with garbage collection enabled as fast as (or at least closer in performance to) running it with very large GC thresholds? With GC effectively disabled, the non-parallelized code speeds up by a factor of close to 15, and the parallelized code (via my threading library) speeds up by a factor of close to 100.

This smaller example shows a slowdown of about 3x when using the default garbage collection thresholds. It runs on ACL2-3.3 compiled with OpenMCL x86-64.

If you would like, I can also send you the parallelized mergesort example. Running this would require you to download PACL2, but you've done this before, so maybe this would be a good thing to do.

Thanks, David

(include-book "finite-set-theory/osets/sets" :dir :system)

(defmacro assign$ (name value)
 `(make-event (pprogn (f-put-global (quote ,name) ,value state)
                      (value '(value-triple nil)))))

(in-package "SETS")

(defun integers (n acc)
 (if (zp n)
     (reverse acc)
   (integers (1- n) (cons n acc))))

(ACL2::assign$ int-list (integers 3200000 nil))
; For true dramatic effect use this commented out version instead
; (ACL2::assign$ int-list (integers 32000000 nil))

(ACL2::cw "Timing the serial version of mergesort")

(ACL2::assign$ serial-result (time$ (SETS::mergesort-exec (@ int-list))))

(ccl:set-lisp-heap-gc-threshold (* 4 16777216))
(ccl:configure-egc (expt 2 16) (expt 2 17) (expt 2 17))
; For true dramatic effect use this commented out version instead
; (ccl:configure-egc (expt 2 30) (expt 2 31) (expt 2 32))


  (let ((*package* (find-package "CCL")))
    (eval (read-from-string "

 ;;; Work of Gary Byers.

 ;;; The #_ and #$ reader macros in the code below are part of
 ;;; OpenMCL's ffi; you'd basically need to hide this code in
 ;;; a file that's isolated from other implementations.
 (defun acl2::physical-memory ()
    (rlet ((info :sysinfo))
      (if (eql 0 (#_sysinfo info))
        (pref info :sysinfo.totalram))))"))))

 (defparameter *gc-min-threshold* (expt 2 31))

 (defparameter *max-mem-usage*
   (let ((phys (physical-memory)))
     ;; use 6/8 of physical memory, but no less than the minimum threshold
     (max (floor (* 6/8 phys))
          *gc-min-threshold*)))

 (defun set-and-reset-gc-thresholds ()
   (let ((n (max (- *max-mem-usage* (ccl::%usedbytes))
                 *gc-min-threshold*)))
     (cond ((not (eql n (ccl::lisp-heap-gc-threshold)))
            (ccl::set-lisp-heap-gc-threshold n))))
   (ccl::use-lisp-heap-gc-threshold)
   (cond ((not (eql *gc-min-threshold*
                    (ccl::lisp-heap-gc-threshold)))
          (ccl::set-lisp-heap-gc-threshold *gc-min-threshold*))))

 (defun set-max-mem-usage (max)
   (setf *max-mem-usage* max)
   (set-and-reset-gc-thresholds))

 ;; After each GC, have the initial listener reset the thresholds.
 (ccl::add-gc-hook
  #'(lambda ()
      (ccl::process-interrupt
       (slot-value ccl:*application*
                   'ccl::initial-listener-process)
       #'set-and-reset-gc-thresholds))
  :post-gc)

 (ccl::gc-verbose t t)
 (ccl:egc nil)


(ACL2::cw "GC Disabled.  Timing the serial version of mergesort (again)")

(ACL2::assign$ serial-result (time$ (SETS::mergesort-exec (@ int-list))))

Change History (2)

comment:1 Changed 14 years ago by gb

  • Description modified (diff)
  • Status changed from new to assigned

I added some formatting to your original message; I think that it makes the code easier to read.

If you're asking whether garbage collection of an image containing > 50MB (or ~ half a gigabyte) of list structure can be as fast - all other things being equal - as GC would be if that list wasn't live ... well, zero-cost GC would be a good thing. How would it work?

The time that a GC takes is generally proportional to the amount of live (non-garbage) data in the heap. It's also affected by the structure of that data: the GC has to prove that each and every one of those 3,200,000 CONS cells is reachable (and, in general, that their CARs are, though the CARs in your example are just fixnums.) It's hard to tell what's live data and what isn't without looking at every live object; you can maybe reduce the incremental cost of looking, but this is still likely to be O(n).
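To make the O(n) point concrete, here is a toy sketch (in Python, not CCL's actual collector) of a mark phase over a cons-style heap. The `Cons` class and `mark` function are illustrative inventions; the point is that the work done is proportional to the number of reachable cells, so a fully live list costs one visit per cell on every full GC no matter how little garbage there is.

```python
class Cons:
    """A toy cons cell with a mark bit, standing in for a heap object."""
    __slots__ = ("car", "cdr", "marked")
    def __init__(self, car, cdr):
        self.car, self.cdr, self.marked = car, cdr, False

def mark(root):
    """Mark every reachable cell; return the number of cells visited."""
    visited = 0
    node = root
    while isinstance(node, Cons) and not node.marked:
        node.marked = True
        visited += 1
        # CARs here are fixnum-like ints, so only the CDR chain is traced.
        node = node.cdr
    return visited

# Build a live list of 100,000 cells; the mark phase must touch every one.
root = None
for i in range(100_000):
    root = Cons(i, root)
print(mark(root))  # 100000
```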

If your mergesort algorithm works by making many copies of some or all of the source list (and these copies are all "live", reachable, non-garbage, relatively long-lived objects), then n increases and GC time increases along with it.
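The copying behavior described above can be quantified with a small sketch (a hypothetical top-down mergesort, not the SETS library's implementation): every merge allocates a fresh output list, so sorting n elements allocates on the order of n*log2(n) cells, all of which the GC must either trace while live or reclaim.

```python
import random

alloc_count = 0  # total cells allocated by merging

def merge(xs, ys):
    """Merge two sorted lists into a freshly allocated one."""
    global alloc_count
    out = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i]); i += 1
        else:
            out.append(ys[j]); j += 1
    out.extend(xs[i:])
    out.extend(ys[j:])
    alloc_count += len(out)  # every merged element is a fresh cell
    return out

def mergesort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    return merge(mergesort(xs[:mid]), mergesort(xs[mid:]))

data = [random.random() for _ in range(1024)]
assert mergesort(data) == sorted(data)
print(alloc_count)  # 10240: n * log2(n) cells for n = 1024
```

Scaled to the ticket's 3,200,000-element list, that is tens of millions of cells of allocation per sort, which is exactly the kind of load where GC thresholds dominate timings.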

If the mergesort algorithm works by having each thread repeatedly cons relatively short-lived objects, then enabling the EGC (or rather, not disabling it) may improve locality as well as reduce GC times. I don't know how that algorithm works.
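The benefit of the EGC for that short-lived pattern can be sketched with a toy generational model (hypothetical names, not CCL internals): minor collections scan only the small nursery, so their cost tracks the allocation rate rather than the size of the long-lived heap.

```python
NURSERY_LIMIT = 1000  # toy nursery capacity, in objects

old_generation = list(range(3_200_000))  # long-lived data, never rescanned
nursery = []
minor_scan_work = 0  # total objects scanned by minor GCs

def allocate(obj):
    """Allocate into the nursery; run a minor GC when it fills up."""
    global nursery, minor_scan_work
    nursery.append(obj)
    if len(nursery) >= NURSERY_LIMIT:
        # Minor GC: scan only the nursery; survivors would be promoted.
        minor_scan_work += len(nursery)
        nursery = []  # in this model everything was short-lived garbage

for i in range(10_000):
    allocate((i, i + 1))  # transient pairs, dead as soon as they're made

print(minor_scan_work)  # 10000: cost tracks allocation, not the 3.2M old objects
```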

If you're saying that you want to cons a lot, have the GC collect what it can, and not have this cost anything ... good luck. I'll try to look at this, but in general I wouldn't be surprised if GC costs dominate the execution times of things that allocate and retain lots of objects; conversely, if you can implement a parallel mergesort that doesn't allocate and retain lots of objects, its execution time shouldn't be dominated by GC time. If most of the consing your algorithm does is incidental, then it isn't worthwhile to do full GCs (and once again prove that every cons cell in the original list is still live), and turning off the EGC wouldn't make much sense.

(None of this has anything to do with the cost of processing large numbers of weak object references, which is currently sometimes non-linear.)

comment:2 Changed 10 years ago by rme

  • Priority changed from major to minor
Note: See TracTickets for help on using tickets.