Opened 8 years ago

Last modified 7 years ago

#921 new enhancement

CFASL support

Reported by: fare Owned by:
Priority: normal Milestone:
Component: Compiler Version: trunk
Keywords: ITA Cc:

Description

An optimization used by XCVB is to use CFASL: a facility to save and reproduce the compile-time side-effects of compiling a lisp file, as distinguished from its load-time side-effects, as suggested by James Knight.

Juho Snellman has developed a patch for SBCL that implements this functionality, since included in SBCL 1.0.32 back in 2009. Using this functionality doubled compilation speed with XCVB on a test subset of QRes.
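For reference, a minimal sketch of how that SBCL facility is invoked, assuming an SBCL recent enough to accept the `:emit-cfasl` keyword of `compile-file`. The wrapper name `compile-with-cfasl` is hypothetical; on other implementations it simply falls back to a plain `compile-file`.

```lisp
;; Hypothetical wrapper, not part of SBCL or XCVB.
(defun compile-with-cfasl (source)
  "Compile SOURCE; on SBCL, also ask for a .cfasl next to the .fasl."
  #+sbcl (compile-file source :emit-cfasl t)
  #-sbcl (compile-file source))
```

In both branches the primary return value is the truename of the ordinary FASL, as with any `compile-file` call.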

I would like to have the same feature for CCL.

If additionally compilation to FASL or CFASL was deterministic from the input LISP file given the state of the system (e.g. without timestamps in the file or dependency on random numbers), then it might also help make distributed caching of the resulting files easier.

(Note: this was ITA bug 41937; I'll copy comments.)

Change History (18)

comment:1 Changed 8 years ago by fare

[Copying relevant comments from the ITA bugzilla, skipping some I think not relevant.]

Comment 3 Gail Zacharias 2009-02-19 22:50:03 EST

Let me see if I understand this correctly. In your make system, if you determine that A.fasl is up to date for A.lisp, instead of executing (compile-file "A.lisp"), you will execute (load "A.cfasl"), on the assumption that the latter is significantly faster than the former, and you want the effects of the two options on the running image to be the same. Is that basically right?

Do you need to catch side effects in uses of #.? Side-effects in uses of COMPILER-LET? Side-effects of macroexpansion?

comment:2 Changed 8 years ago by fare

Comment 4 Gail Zacharias 2009-02-19 23:24:34 EST (In reply to comment #0)

If additionally compilation to FASL or CFASL was deterministic from the input LISP file given the state of the system (e.g. without timestamps in the file or dependency on random numbers), then it might also help make distributed caching of the resulting files easier.

I believe that's the case. Or at least a quick scan through the compiler sources reveals no obvious randomizing influences (e.g. no maphash'es that might affect the output, etc.)
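One way to check this empirically, rather than by reading the compiler sources, is to compile the same file twice in the same image and compare the outputs byte for byte. The following is only a sketch in portable Common Lisp; the function names are made up for illustration, and a positive result says nothing about determinism across different images or settings.

```lisp
(defun file-bytes (path)
  "Return the contents of PATH as a vector of octets."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((bytes (make-array (file-length in)
                             :element-type '(unsigned-byte 8))))
      (read-sequence bytes in)
      bytes)))

(defun deterministic-compile-p (source)
  "Compile SOURCE twice and report whether both outputs are identical."
  (flet ((compile-and-read ()
           ;; COMPILE-FILE returns the output truename as its first value.
           (file-bytes (compile-file source))))
    ;; Arguments are evaluated left to right, so the first output is
    ;; read into memory before the second compilation overwrites it.
    (equalp (compile-and-read) (compile-and-read))))
```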

We are considering storing some identifying information about the source file, such as its modification date, in which case the dependency on the Lisp file will go beyond just its contents. But we don't do that right now.

There is of course a large dependency on the state of the Lisp itself, from obvious ones such as a largish assortment of variables that explicitly control how compilation is to be done and what info is to be stored in the fasl file, to more subtle ones such as, for example, the set of logical host names defined at compile time.

Also -- this doesn't affect the contents of the FASL file, but some compile-time warnings depend on the dynamic state of compilation-unit nesting. I.e. there are some side effects of compiling a file that don't affect the global state, but do affect the dynamically bound current compilation unit.

comment:3 Changed 8 years ago by fare

Comment 5 Francois-Rene Rideau 2009-02-20 14:12:58 EST

Re: Comment #3, yes, you've got the gist of it.

We definitely do not want to record effects from #.

For macroexpansion-time effects and compiler-let effects, I'm not sure (I'd say we probably want them). However, the two should clearly be treated the same.

I think you might want to discuss things with Juho Snellman, perhaps by sending email to the sbcl-devel mailing-list.

NB: for reference, the change that Juho Snellman did for SBCL is here: http://repo.or.cz/w/sbcl/jsnell.git?a=treediff;h=refs/heads/cfasl;hp=refs/heads/master;hb=cfasl;hpb=master

Re: Comment #4, please don't include source file date, or allow the information to be overridden (with e.g. source file crypto checksum instead, or nothing at all). On the other hand, if there is a mechanism to extract information about the compiler and its settings, it would be nice if such information could be made available to the user in a documented way, so we could include it in a digest of all that constitutes the "source" for a given object. I do intend to include a digest of the compiler binary in such a digest, but if there are other files that the binary depends on, it would be nice to have an easy way to list all those files (easier than ptrace'ing the syscalls to open(2), say).

For effects depending on compilation-unit nesting, I haven't thought about it, but I don't think we need to special-case them. If compiling the file triggers some warnings, loading the cfasl probably should produce the same warnings (and cause an error to be reported to make).

comment:4 Changed 8 years ago by fare

Comment 6 Francois-Rene Rideau 2009-02-25 12:21:37 EST

After discussion with jsnell, it appears that his patch does NOT record macroexpansion-time side-effects or compiler-let-time side-effects.

Which makes sense.

If you want something in the CFASL, you essentially need an explicit or implicit (eval-when (:compile-toplevel ...) ...). If you want something in the FASL, you similarly need an explicit or implicit (eval-when (:load-toplevel ...) ...).
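To make the distinction concrete, here is a small sketch of a source file using the three common combinations of `eval-when` situations. All names are hypothetical, and the comments describe the rule stated above rather than the guaranteed behavior of any particular implementation.

```lisp
;; Compile-time only: under the rule above this would go in the CFASL
;; and not in the FASL.
(eval-when (:compile-toplevel)
  (defparameter *seen-by-compiler* t))

;; Load-time only: this would go in the FASL and not in the CFASL.
(eval-when (:load-toplevel :execute)
  (defun runtime-helper (x)
    (* 2 x)))

;; Both: needed to compile later files and also at run time.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defmacro twice (form)
    `(progn ,form ,form)))
```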

comment:5 Changed 8 years ago by fare

Comment 7 Dan Weinreb 2009-05-04 18:43:02 EDT

Adding this, from GZ, 2/20:

Ok, I understand this better now. This is an optimization. I.e. xcvb wants to load a cfasl file instead of compiling the source, for an equivalent but faster effect. This means that everything that's needed to implement xcvb and convert QRes to use it should be doable without cfasl files, by just calling the compiler instead.

I recommend waiting until after QRes is converted and xcvb-based builds are tested and reliable without cfasls before adding cfasls to the mix. Note that of all the xcvb goals mentioned in bug 41937, the only one that requires cfasls is improved build times, which, even without cfasls, should be no worse than asdf-based build times, so there doesn't seem to be a compelling reason to rush this.

comment:6 Changed 8 years ago by fare

Comment 8 Dan Weinreb 2009-05-04 18:43:51 EDT

And adding another one from GZ:

Ok. I'd say doing what the sbcl patch seems to do, i.e. modifying the file compiler to capture compile-toplevel forms and dump them in a fasl file, would take around 3 days.

This approach ignores compile-time side effects that don't happen through compile-time toplevel forms, which might lead to subtle problems with macros that have side-effects (e.g. interning stuff, building up a table, etc). For this reason, we would not use this approach in mainline ccl.

Once xcvb is up and running, I would be interested in finding out whether a different approach would be fast enough. What I have in mind is, instead of loading "cfasl" files, add an option in compile-file to just do the first pass, no code generation, no file output, no warnings. This would ensure correct semantics while still speeding up build times (although perhaps by not as much as using incorrect semantics...).
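As a rough illustration of "processing a file for its compile-time effects without generating code", here is a deliberately crude sketch in portable Common Lisp. It is not what CCL does or would do: the function names are invented, and because it only recognizes literal toplevel `eval-when`, `progn`, `defmacro` and `in-package` forms and never macroexpands anything, it does not capture the macroexpansion-time side effects that a real first compiler pass would.

```lisp
(defun process-toplevel-form (form)
  "Evaluate FORM only if COMPILE-FILE would evaluate it at compile time."
  (when (consp form)
    (case (first form)
      (eval-when
       ;; Accept both the modern and the deprecated situation name.
       (when (intersection (second form) '(:compile-toplevel compile))
         (eval `(progn ,@(cddr form)))))
      (progn
       (mapc #'process-toplevel-form (rest form)))
      ((defmacro in-package)
       (eval form)))))

(defun replay-compile-time-effects (path)
  "Read PATH form by form and apply its compile-time effects only."
  (let ((*package* *package*)
        (*readtable* *readtable*))
    (with-open-file (in path)
      ;; The stream itself serves as a unique end-of-file marker.
      (loop for form = (read in nil in)
            until (eq form in)
            do (process-toplevel-form form)))))
```

Plain `defun` forms are skipped entirely, which is where the time saving would come from.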

comment:7 Changed 8 years ago by fare

Comment 9 James Knight 2009-05-05 12:55:39 EDT

This approach ignores compile-time side effects that don't happen through compile-time toplevel forms, which might lead to subtle problems with macros that have side-effects (e.g. interning stuff, building up a table, etc).

To me, it seems like a perfectly sensible choice that can be documented as such. It has a nice symmetry with the other eval "time"s. And macro-expanders really ought not have side-effects anyhow...

I think it cannot possibly lead to subtle problems in the XCVB model, only glaring problems, since every compilation is being done in a fresh lisp. If those un-written side-effects were necessary, your compilation will fail every time.

Furthermore, given that in systems today, it's almost always possible to recompile single files in isolation after having loaded (but not compiled) the rest of the system in the current lisp, it seems unlikely that there is actually much real-life code that depends on macro-expand-time side effects between files.

But there's only one way to really know: try it. So we should try it. :) I don't expect there will be issues which are not immediately and trivially solvable.

Probably the implementation currently in SBCL is enough to do the work with, if QRes still runs properly on SBCL these days.

comment:8 Changed 8 years ago by fare

Comment 11 Gail Zacharias 2009-05-27 13:24:18 EDT

Paraphrased from a conversation with Faré:

The basic idea of cfasls is to make a distinction between code that is needed only at runtime, and code that is needed to compile other files, and use that distinction to speed up separate compilation. The cfasls need only contain code that is needed to compile other files (e.g. proclamations, macros, definitions of functions that will be invoked at macroexpand-time, etc). The more runtime-only code is left out, the faster the cfasl will load compared to loading the entire fasl, and the faster files that depend on it can start compiling.

Common Lisp provides no standard way for source code to specify whether it is only needed at runtime or whether it will be needed to compile other files. As I understand, for SBCL, the convention has been adopted that any code wrapped in eval-when (:compile-toplevel) gets output to the cfasl. In CCL, it is normally the case that definitions wrapped in :compile-toplevel are not accessible to other files, i.e. the exact opposite of the effect of putting them in a cfasl. So I think for CCL, the convention we'll adopt is that only code wrapped in both :compile-toplevel AND :load-toplevel will get output to a cfasl.
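The two conventions can be stated as a predicate over the situation list of an `eval-when` form. This is only a sketch of the rules as described in this comment; the function and the convention keywords are hypothetical names, not anything defined by SBCL or CCL.

```lisp
(defun goes-into-cfasl-p (situations convention)
  "Would a form wrapped in (eval-when SITUATIONS ...) be dumped to the cfasl?"
  (let ((compile-p (member :compile-toplevel situations))
        (load-p (member :load-toplevel situations)))
    (ecase convention
      ;; SBCL as described above: :compile-toplevel alone is enough.
      (:sbcl-style (and compile-p t))
      ;; Proposed for CCL: both situations must be present.
      (:ccl-proposal (and compile-p load-p t)))))
```

The conventions differ only for forms wrapped in `:compile-toplevel` alone, for which `:sbcl-style` answers true and `:ccl-proposal` answers false.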

None of this has anything to do with capturing compile-time side effects of compiling a lisp file, and it is apparently not meant to do that at all.

comment:9 Changed 8 years ago by fare

Comment 12 Dan Weinreb 2009-05-27 14:31:54 EDT

I think that the reason that "compile-time side effects" was the phrase being used is in the sense that when you compile a file, as you go, certain side effects are made in the compiler's Lisp world, such as defining macros, and so on.

What if a file contains this, among other things, at top level:

(eval-when (:compile-toplevel :load-toplevel)
  (setq *foo* 3))

A second file, compiled later on in the build, then has a macro whose expansion depends on the value of *foo*.

What is supposed to happen? STIPULATE that the correct behavior is that the compilation of the second file expands the macro based on *foo* having a value of 3, then the compile-time side-effect of having set *foo* to 3 in the first file is affecting the compilation of the second file.

If we want to use the "cfasl" optimization, then, the setting of *foo* to 3 would have to be expressed in the cfasl file. In that case it DOES have to do with capturing compile-time side effects of compiling a file. Do you agree with that, given the stipulation?

If not, let's work that out. If so, is the stipulation true? If the stipulation is false, then taking the two files and copying the contents of the second one onto the end of the first one would change the semantics of the program. I'm not saying that's wrong, necessarily, but if it's true, we certainly ought to be aware of it.
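The dependency in question can be seen within a single image. In this sketch the first form stands in for the first file and the other two for the second file; the macro and function names are invented for illustration, and `defvar` is used instead of `setq` so that the snippet is self-contained.

```lisp
;; Stands in for the first file: the variable is set at compile time too.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defvar *foo* 3))

;; Stands in for the second file: the expansion bakes in the value
;; that *FOO* has when the macro is expanded.
(defmacro foo-at-expansion-time ()
  `(quote ,*foo*))

(defun captured-foo ()
  (foo-at-expansion-time))
```

If the second file is compiled in a fresh image where the first file's compile-time effect has not been reproduced, `*foo*` is unbound when the macro is expanded and the compilation signals an error instead of silently using a different value.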

comment:10 Changed 8 years ago by fare

Comment 13 Gail Zacharias 2009-05-27 15:03:17 EDT

If we want to use the "cfasl" optimization, then, the setting of *foo* to 3 would have to be expressed in the cfasl file. In that case it DOES have to do with capturing compile-time side effects of compiling a file. Do you agree with that, given the stipulation?

No. If the cfasl is merely an optimization, then the question is, what is it an optimization of? I.e. what is the operation xcvb will perform if the cfasl is not there? What you're saying, and what I originally thought it was supposed to be, is that loading "foo.cfasl" is an optimization of (compile-file "foo.lisp"). But actually, Faré said that if there is no cfasl, then xcvb will just load the fasl. I.e. loading "foo.cfasl" is an optimization of (load "foo.fasl"), and any side effects of (compile-file "foo.lisp") are irrelevant.

comment:11 Changed 8 years ago by fare

Dan Weinreb 2009-05-28 16:24:38 EDT

Yes, my understanding was that loading "foo.cfasl" is an optimization of (compile-file "foo.lisp"). Fare?

comment:12 Changed 8 years ago by fare

Comment 15 Francois-Rene Rideau 2009-06-01 14:58:04 EDT

jknight initially thought of the cfasl as an optimization of (compile-file "foo") that doesn't otherwise actually create a fasl, but only the effects of compiling that are meaningful for further compilations.

Indeed, proclamations (special variables, types, optimization), macros, functions and variables used at macro-expansion time (including e.g. all the user-defined state accumulated while compiling db-class'es) are such effects.

Because XCVB *also* has to be backwards compatible ("if it's not backwards, it's not compatible"), it has to support systems that do not offer CFASLs, even though ideally, every implementation would support CFASL the same way. And so, what XCVB does when a CFASL is not available is the only thing it can do: load the FASL, and hope that works -- requiring the user to always make any non-local :COMPILE-TOPLEVEL effect also :LOAD-TOPLEVEL.
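The fallback just described amounts to something like the following. This is a sketch of the policy only, not XCVB's actual code; both function names are hypothetical, and the "cfasl" file type is assumed to sit next to the implementation's usual FASL.

```lisp
(defun compile-time-artifact (source)
  "Return the file to load so that files depending on SOURCE can be compiled."
  (let* ((fasl (compile-file-pathname source))
         (cfasl (make-pathname :type "cfasl" :defaults fasl)))
    ;; Prefer the CFASL when it exists; otherwise fall back to the FASL.
    (if (probe-file cfasl) cfasl fasl)))

(defun load-compile-time-dependency (source)
  "Load the CFASL for SOURCE if there is one, else the full FASL."
  (load (compile-time-artifact source)))
```

The fallback branch is only equivalent when every non-local `:compile-toplevel` effect is also `:load-toplevel`, which is exactly the requirement placed on the user above.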

*If* it's true that in CCL, some :COMPILE-TOPLEVEL effects are dropped from the Lisp image at the end of the COMPILE-FILE if not also EVAL-WHEN'ed :LOAD-TOPLEVEL, then these effects should not be present in the CFASL. On the other hand, if these effects remain present in a Lisp image at the end of a COMPILE-FILE then we probably want them in the CFASL, too.

In any case, gzacharias's proposal to only include in a CFASL effects that are BOTH :compile-toplevel and :load-toplevel, while slightly different from what I originally envisioned, would probably suit the purpose of XCVB.

comment:13 Changed 8 years ago by fare

Comment 16 James Knight 2009-06-10 12:48:55 EDT

I'm a little confused, if I make two files:

test1.lisp:

(eval-when (:compile-toplevel :load-toplevel :execute)
  (defmacro foo (x)
    `(1+ ,x)))

test2.lisp:

(defun bar ()
  (foo 1))

And run in CCL:

(compile-file "test1.lisp")
(compile-file "test2.lisp")
(load "test2.lx64fsl")
(bar)

It prints 2. (note that I never loaded test1.lisp)

The same thing happens if I change the eval-when to be :compile-toplevel only. So I would expect the CFASL from test1.lisp to include the macro definition in both cases.

comment:14 Changed 8 years ago by fare

Comment 17 Gail Zacharias 2009-06-10 14:13:41 EDT

You are assuming that (LOAD "test1.cfasl") is an optimization of (COMPILE-FILE "test1.lisp"). It is not, it is an optimization of (LOAD "test1.lx64fsl"), in the sense that if there is no CFASL, then xcvb will (LOAD "test1.lx64fsl"). So if your system had a situation where test2 really depended on test1 being compiled in the same image, you'd be out of luck. In order for your system to build on a platform without cfasls, you need to include :load-toplevel in test1.lisp.

comment:15 Changed 8 years ago by fare

Comment 18 James Knight 2009-06-10 14:34:24 EDT

Yes, loading "test1.cfasl" is supposed to be an optimization of compile-file "test1.lisp".

XCVB doesn't do that right now, I guess because compile-file is too slow, and in "real systems" it's theorized that nobody would actually use :compile-toplevel by itself. So it almost always works to load "test1.fasl" instead, despite it being not exactly right.

comment:16 Changed 8 years ago by fare

Comment 19 Gail Zacharias 2009-06-10 16:53:37 EDT

Well, I certainly can't contradict any assertions of what xcvb designer(s) might consider theoretically "right". Nonetheless I'm going to take a gamble and stick with basing ccl's approach on what xcvb actually does, rather than what it would like to do, if only compiling files wasn't slow.

Fortunately, I am pretty sure none of this matters to QRes, so it's ok for sbcl and ccl to take a different approach, although it's kind of interesting that it's going to be the opposite of the usual difference -- this time ccl is going to be stricter and not support system designs that aren't portable (to cfasl-less implementations), whereas sbcl will. But since QRes is going to limit itself to the stricter definition anyway, it's not a decision that ITA has to worry about.

comment:17 Changed 8 years ago by fare

  • Component changed from IDE to Compiler

comment:18 Changed 7 years ago by gz

  • Keywords ITA added