Getting to Know (and Love) the OpenMCL FFI

While many of us, once we learn Lisp, might wish to write every subsequent line of code in Lisp, there are far too many existing libraries and tools written in other languages that we might want to interact with. Besides, who wants to port, debug, and then maintain reams of existing boilerplate operating system code?

So, what can we do? We need a Foreign Function Interface (FFI), which lets us interact with external libraries and operating system features from Lisp. Luckily, OpenMCL comes with one of the best implementations around.

If you cast about for FFIs, you might run across several excellent tools such as UFFI, CLISP's FFI, and others. However, most of them require lots of manual typing and boilerplate to interface Lisp with external libraries.

For example:

;;; We first define the foreign function, manually specifying the argument and return types.
;;; (Schematic syntax: each FFI spells this definition a little differently.)
(foreign-function glvertex4i (INT INT INT INT) VOID "glVertex4i")

;;; Now we can call the function (glVertex4i takes integer arguments):
(glvertex4i 1 2 3 4)

OpenMCL uses a much slicker, automated system: it parses the C header files and generates an interface database, which eliminates nearly all of this tedious (and error-prone) manual typing.

;;; The interface database already contains the definition, so we can just do this
;;; (note the exact C spelling, glVertex4i, since #_ is case-sensitive):
(#_glVertex4i 1 2 3 4)

Simple Cases

(This section is shamelessly stolen from [].) The primary way of referring to foreign constant and function names in OpenMCL is via the #$ and #_ reader macros. These reader macro functions each read a symbol into the "OS" package, look up its constant or function definition in the interface database, and either assign the value of the constant to the symbol or install a macroexpansion function on it.

To respect case sensitivity, the reader macros now read the symbol with READTABLE-CASE :PRESERVE in effect.

This means that you must type the foreign constant or function name in the correct case, but you don't need any special escaping constructs (vertical bars or backslashes) when writing it. For instance:

     (#_read fd buf n) ; refers to foreign symbol "read"
     (#_READ fd buf n) ; refers to foreign symbol "READ", which may
     ; not exist ...
     #$o_rdonly ; Probably doesn't exist
     #$O_RDONLY ; Exists on most platforms
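To make the case-preservation rule concrete, here is a minimal, portable Common Lisp sketch of how a reader macro can read a token with READTABLE-CASE :PRESERVE in effect. It is not OpenMCL's actual implementation: the package FOREIGN-NAME-DEMO (a stand-in for the "OS" package) and the functions READ-FOREIGN-SYMBOL and FOREIGN-NAME-OF are hypothetical, and no interface-database lookup is performed.

```lisp
;;; Hypothetical stand-in for OpenMCL's "OS" package.  It uses no other
;;; packages, so every token read into it becomes a fresh symbol.
(defpackage "FOREIGN-NAME-DEMO" (:use))

(defun read-foreign-symbol (stream)
  "Read one symbol from STREAM with READTABLE-CASE :PRESERVE in effect,
interning it in FOREIGN-NAME-DEMO.  (Hypothetical helper.)"
  (let ((*readtable* (copy-readtable nil))          ; fresh standard readtable
        (*package* (find-package "FOREIGN-NAME-DEMO")))
    (setf (readtable-case *readtable*) :preserve)   ; keep the case as typed
    (read stream t nil nil)))

(defun foreign-name-of (string)
  "Return the name that STRING would denote when read this way."
  (with-input-from-string (s string)
    (symbol-name (read-foreign-symbol s))))
```

With this scheme, "read" and "READ" become two distinct symbols, which is exactly why #_read and #_READ refer to different foreign functions.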

Foreign type, record, and field names

(Ditto.) Constructs like RLET expect a foreign type or record name to be denoted by a symbol (typically a keyword); RREF (and PREF) expect an "accessor" form, typically a keyword formed by concatenating a foreign type or record name with a sequence of one or more foreign field names, separated by dots. These names are interned by the reader like any other Lisp symbols, with whatever value of READTABLE-CASE happens to be in effect (typically :UPCASE). It would be very tedious to force users to manually escape (via vertical-bar or backslash syntax) every lowercase character in symbols used to specify foreign type, record, and field names, especially since many traditional POSIX structure, type, and field names are entirely lowercase.

The approach taken by OpenMCL is to allow the symbols (keywords) used to denote foreign type, record, and field names to contain angle brackets (< and >). Such symbols are translated to foreign names via the following set of conventions:

  • All instances of < and > in the symbol's pname must be balanced and must not nest.
  • Any alphabetic characters in the symbol's pname that aren't enclosed in angle brackets are treated as lower-case, regardless of the value of READTABLE-CASE and regardless of the case in which they were written.
  • Alphabetic characters that appear within angle brackets are mapped to upper-case, again regardless of how they were written or interned.

There may be many ways of "escaping" (with angle brackets) sequences of upper-case and non-lower-case characters in a symbol used to denote a foreign name. When translating in the other direction, OpenMCL always escapes the longest sequence that starts with an upper-case character and doesn't contain a lower-case character.

It's often preferable to use this canonical form of a foreign type name.
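The two translation directions can be sketched in portable Common Lisp. UNESCAPE-FOREIGN-NAME and ESCAPE-FOREIGN-NAME are hypothetical helpers written only to illustrate the conventions above; OpenMCL's own internal functions may be named and implemented differently.

```lisp
(defun unescape-foreign-name (pname)
  "Translate a pname such as \"<W>INDOW\" into a foreign name: characters
inside angle brackets are upcased, all others are downcased.  Signals an
error on unbalanced or nested brackets.  (Hypothetical helper.)"
  (with-output-to-string (out)
    (let ((inside nil))
      (loop for ch across pname
            do (case ch
                 (#\< (when inside (error "Nested < in ~S" pname))
                      (setf inside t))
                 (#\> (unless inside (error "Unbalanced > in ~S" pname))
                      (setf inside nil))
                 (t (write-char (if inside (char-upcase ch) (char-downcase ch))
                                out))))
      (when inside (error "Unclosed < in ~S" pname)))))

(defun escape-foreign-name (name)
  "Return the canonical escaping of the foreign NAME: each longest run that
starts with an upper-case character and contains no lower-case character
is wrapped in < and >.  (Hypothetical helper.)"
  (with-output-to-string (out)
    (let ((i 0)
          (n (length name)))
      (loop while (< i n)
            do (let ((ch (char name i)))
                 (if (upper-case-p ch)
                     ;; The run ends at the next lower-case character, or at the end.
                     (let ((end (or (position-if #'lower-case-p name :start i) n)))
                       (format out "<~A>" (subseq name i end))
                       (setf i end))
                     (progn (write-char ch out)
                            (incf i))))))))
```

For example, the keyword :<w>indow has the pname "<W>INDOW", which unescapes to "Window"; going the other way, "Window" escapes to "<W>indow" and "NSRect" to "<NSR>ect".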

The accessor forms used by PREF/RREF should be viewed as a series of foreign type/record and field names; upper-case sequences in the component names should be escaped with angle brackets, but those sequences shouldn't span components. (More simply, the separating dots shouldn't be enclosed, even if both surrounding characters need to be.)
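As an illustration of the "dots stay outside the brackets" rule, here is a small portable sketch that splits an accessor keyword's pname into its components and rejects a dot that appears inside angle brackets. SPLIT-ACCESSOR is a hypothetical helper, not part of OpenMCL; each resulting component would then be translated with the usual angle-bracket conventions.

```lisp
(defun split-accessor (accessor)
  "Split the pname of an accessor such as :<NSR>ect.origin.x into its
type/record and field components.  Signals an error if a separating dot
appears inside angle brackets.  (Hypothetical helper.)"
  (let ((pname (string accessor))
        (parts '())
        (start 0)
        (inside nil))
    (loop for ch across pname
          for i from 0
          do (case ch
               (#\< (setf inside t))
               (#\> (setf inside nil))
               (#\. (when inside
                      (error "Dot inside angle brackets in ~S" pname))
                    (push (subseq pname start i) parts)
                    (setf start (1+ i)))))
    ;; The text after the last dot is the final component.
    (push (subseq pname start) parts)
    (nreverse parts)))
```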

Older POSIX code tends to use lower-case exclusively for type, record, and field names; there are only a few cases in the OpenMCL sources where mixed-case names need to be escaped.


     ;;; Allocate a record of type "window".
     (rlet ((w :window)) ...)
     ;;; Allocate a record of type "Window", which is probably a
     ;;;  different type
     (rlet ((w :<w>indow)) ...)
     ;;; This is equivalent to the last example
     (rlet ((w :<w>INDOW)) ...)
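To see a record name and an accessor form working together, here is a short sketch that stack-allocates a POSIX struct timeval with RLET and reads one field with PREF. It assumes OpenMCL (Clozure CL) on a POSIX system whose interface database describes struct timeval and gettimeofday(); SECONDS-SINCE-EPOCH is a hypothetical name. Because these POSIX names are entirely lowercase, no angle brackets are needed.

```lisp
;;; Assumes OpenMCL/Clozure CL with the standard POSIX interface database.
(defun seconds-since-epoch ()
  "Return the tv_sec field filled in by gettimeofday(2).  (Hypothetical helper.)"
  (rlet ((tv :timeval))                 ; stack-allocate a struct timeval
    (#_gettimeofday tv (%null-ptr))     ; NULL for the obsolete timezone argument
    (pref tv :timeval.tv_sec)))         ; accessor: record "timeval", field "tv_sec"
```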

Example: Calling C Library Functions

Suppose that we wanted to call the C library function open (which we now know should be spelled #_open), which returns a file descriptor (a small non-negative integer) or -1 (indicating an error), given a file namestring and an integer that specifies some mode flags.

If we did:

? (#_open "/etc/passwd" #$O_RDONLY) ; open an existing file
                                    ; for reading

we'd get an error (a complaint that the string "/etc/passwd" isn't a MACPTR).

In C, a string (for all intents and purposes; it could be argued that C doesn't have a "string" data type, just a few sets of conventions) is just the address of a sequence of 8-bit bytes with a #\nul byte at the end.

In Lisp in general (and in OpenMCL in particular), a string is ... something else entirely. In particular, it has some bits (somewhere) that say that it's a string, and it probably has some bits (somewhere) that say how many elements it contains, as well as some bits (or the absence of some bits) that say whether or not it's a SIMPLE-STRING, and whether or not it has a fill pointer and other things ... To the extent that it has an "address", that address is generally determined by the GC and can change at any instruction boundary. (There are exceptions to this general rule.)


What we sort of want to do in order to be able to call #_open is to:

a) allocate a block of non-relocatable memory somewhere. Since we'll only need that block of memory until we're done calling #_open, a stack would be a good place to allocate it. The block of memory needs to be of length N+1, where N is the length of the lisp string.

b) copy the character codes of the characters in the lisp string to successive 8-bit bytes in the memory block.

c) slap a trailing #\Nul byte on the end of the memory block

d) pass the address of that memory block as the first argument to #_open.
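Steps a) through d) can be sketched by hand using OpenMCL's %STACK-BLOCK (stack-allocated foreign memory) and %GET-UNSIGNED-BYTE. This assumes OpenMCL (Clozure CL) with the POSIX interface database, and a namestring whose character codes each fit in one byte, since no character encoding is performed; OPEN-FOR-READING is a hypothetical name.

```lisp
;;; Assumes OpenMCL/Clozure CL with the POSIX interface database, and
;;; one-byte character codes in NAMESTRING (no encoding is done here).
(defun open-for-reading (namestring)
  "Open NAMESTRING read-only; return the file descriptor, or -1 on error.
(Hypothetical helper.)"
  (let ((n (length namestring)))
    ;; a) a non-relocatable, stack-allocated block of N+1 bytes
    (%stack-block ((c-name (1+ n)))
      ;; b) copy each character code into successive bytes
      (dotimes (i n)
        (setf (%get-unsigned-byte c-name i) (char-code (char namestring i))))
      ;; c) terminate the C string with a #\Nul byte
      (setf (%get-unsigned-byte c-name n) 0)
      ;; d) pass the block's address to open()
      (#_open c-name #$O_RDONLY))))
```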

This is a common enough idiom that there's a macro that does most of it for us (WITH-CSTRS). If we use WITH-CSTRS when calling #_open, as in:

(with-cstrs ((c-name "/etc/passwd"))
  (#_open c-name #$O_RDONLY))

the FFI shouldn't complain and we should get a non-negative file descriptor back from the call to #_open.
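In real code we'd also check the result and close the descriptor. Here is a minimal sketch that does so, reading the first few bytes of a file into a Lisp vector. It assumes OpenMCL (Clozure CL) with the POSIX interface database; READ-FILE-PREFIX is a hypothetical name, not part of OpenMCL.

```lisp
;;; Assumes OpenMCL/Clozure CL with the POSIX interface database.
(defun read-file-prefix (path count)
  "Return up to COUNT bytes from the start of PATH as a vector of octets.
(Hypothetical helper.)"
  (with-cstrs ((c-name path))
    (let ((fd (#_open c-name #$O_RDONLY)))
      (when (minusp fd)
        (error "open(~S) failed" path))
      (unwind-protect
           (%stack-block ((buf count))          ; foreign buffer for read(2)
             (let ((nread (#_read fd buf count)))
               (when (minusp nread)
                 (error "read(~S) failed" path))
               ;; Copy the bytes out of foreign memory into a Lisp vector.
               (let ((octets (make-array nread :element-type '(unsigned-byte 8))))
                 (dotimes (i nread octets)
                   (setf (aref octets i) (%get-unsigned-byte buf i))))))
        (#_close fd)))))                        ; always release the descriptor
```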

Example: OpenGL

For a less contrived example, consider #_glMaterialfv. The best Lisp interface to it might not involve passing a Lisp array around (and there might be some question as to whether an array of element-type SINGLE-FLOAT should be used, or something more general). But assuming that we decided to pass a Lisp SINGLE-FLOAT vector around, we'd wind up with a set of issues similar to those in the string case (above):

In C, an array of SINGLE-FLOATs is just the address of a block of memory that contains some bits that can be interpreted as SINGLE-FLOATs (or what C would probably just call a "float").

In Lisp (especially in OpenMCL), a (VECTOR SINGLE-FLOAT) is something else entirely; as in the case of a Lisp STRING, it has a lot of auxiliary information about it encoded in bits that are part of the object, and (as in the string case) we can't meaningfully talk about the "address" of a (VECTOR SINGLE-FLOAT) in general without running into GC issues.

We'd have to go through essentially the same steps (stack-allocate some foreign memory, copy the values in the Lisp SINGLE-FLOAT array into the foreign memory block, then pass the address of that block to #_glMaterialfv). Doing that would be simpler and less error-prone if there were a WITH-FOREIGN-SINGLE-FLOAT-VECTOR macro (analogous to WITH-CSTRS), but ... there isn't. If we were going to do a lot of this, and had settled whether a Lisp vector is the best way to represent the material parameters and whether a specialized vector type should be used (or something more general), we might find it useful to write WITH-FOREIGN-SINGLE-FLOAT-VECTOR ourselves. (It would also be easier to write if OpenMCL offered some syntactic sugar for dealing with foreign arrays; without that, the code looks uglier and is probably more error-prone as well.)
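One possible shape for such a macro is sketched below. WITH-FOREIGN-SINGLE-FLOAT-VECTOR is hypothetical (OpenMCL does not provide it); the sketch assumes OpenMCL (Clozure CL), 4-byte C floats, and that %STACK-BLOCK accepts a size computed at run time. It coerces each element with FLOAT, so any vector of reals is accepted.

```lisp
;;; A hypothetical WITH-FOREIGN-SINGLE-FLOAT-VECTOR; OpenMCL does not
;;; provide one.  Assumes OpenMCL/Clozure CL and 4-byte C floats.
(defmacro with-foreign-single-float-vector ((ptr lisp-vector) &body body)
  "Bind PTR to stack-allocated foreign memory holding the elements of
LISP-VECTOR as C floats for the dynamic extent of BODY."
  (let ((vec (gensym "VEC"))
        (n (gensym "N"))
        (i (gensym "I")))
    `(let* ((,vec ,lisp-vector)
            (,n (length ,vec)))
       (%stack-block ((,ptr (* 4 ,n)))            ; 4 bytes per C float
         (dotimes (,i ,n)
           (setf (%get-single-float ,ptr (* 4 ,i))
                 (float (aref ,vec ,i) 1.0f0)))    ; coerce reals to SINGLE-FLOAT
         ,@body))))

;;; Usage (requires the OpenGL interfaces to be available):
;;; (with-foreign-single-float-vector (c-ambient #(0.2 0.2 0.2 1.0))
;;;   (#_glMaterialfv #$GL_FRONT #$GL_AMBIENT c-ambient))
```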

A less-than-general approach that'd do the right thing in this particular case would be something like:

(rlet ((c-float-array (:array :float 4)))
  ;; LISP-FLOAT-ARRAY is assumed to be a 4-element vector of SINGLE-FLOATs.
  ;; We happen to know that a single-float value is 32 bits (4 bytes)
  ;; wide, so we can multiply each array index by 4 to get a byte
  ;; index.  The FFI could do this for us.
  (dotimes (i 4)
    (setf (%get-single-float c-float-array (* i 4))
          (aref lisp-float-array i)))
  (#_glMaterialfv #$GL_FRONT #$GL_AMBIENT c-float-array))


Last modified on Feb 24, 2009, 3:52:06 PM