Dec 13, 2012, 6:27:55 AM

Support using the "coding" option in a file's file options line (a
line at the start of a text file that contains name:value pairs
separated by semicolons bracketed by -*- sequences) to determine a
file's character encoding. Specifically:

  • OPEN now allows an external-format of :INFERRED; previously, :INFERRED was shorthand for an external-format whose line termination was inferred and whose character encoding was taken from *DEFAULT-FILE-CHARACTER-ENCODING*. When an input file is opened with an external-format of :INFERRED, its file options are parsed and the value of the "coding" option is used, provided the option is present and names an encoding that CCL supports. Otherwise, *DEFAULT-FILE-CHARACTER-ENCODING* is used as before.
  • In the Cocoa IDE, the Hemlock command "Ensure File Options Line" (bound to Control-Meta-M by default) ensures that the first line in the current buffer is a file options line and fills in some plausible values for the "Mode", "Package", and "Coding" options. The "Process File Options" command (Control-Meta-m) can be used to process the file options line after it's been edited. (The file options line is always processed when the file is first opened; changes to the "coding" option affect how the file will be saved.)
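As a rough illustration of what reading the "coding" option involves, here is a minimal sketch in portable Common Lisp. The function FILE-OPTION-VALUE is a hypothetical helper written for this example; it is not the parser that CCL or Hemlock actually uses, and it assumes the whole options line is already available as a string.

```lisp
;; Hypothetical helper (not CCL's implementation): extract the value of
;; one option from a file options line such as
;;   ;;; -*- Mode: Lisp; Package: CCL; Coding: utf-8 -*-
(defun file-option-value (line name)
  "Return the value of option NAME in LINE as a string, or NIL if absent."
  (let* ((blanks '(#\Space #\Tab))
         (start (search "-*-" line))
         (end (and start (search "-*-" line :start2 (+ start 3)))))
    (when end
      ;; BODY is the text between the two -*- markers.
      (let ((body (subseq line (+ start 3) end))
            (pos 0))
        (loop
          ;; Take the next semicolon-separated name:value pair.
          (let* ((semi (position #\; body :start pos))
                 (pair (subseq body pos semi))
                 (colon (position #\: pair)))
            ;; Option names are compared case-insensitively.
            (when (and colon
                       (string-equal name
                                     (string-trim blanks (subseq pair 0 colon))))
              (return (string-trim blanks (subseq pair (1+ colon)))))
            (if semi
                (setq pos (1+ semi))
                (return nil))))))))
```

With this helper, (file-option-value ";;; -*- Mode: Lisp; Coding: utf-8 -*-" "coding") yields the string "utf-8"; a caller would still have to check that the name denotes a supported encoding before using it. In CCL itself, per the description above, opening the file with an external-format of :INFERRED is what triggers this lookup.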

When a Lisp source file is opened in the IDE editor, the following
character encodings are tried in this order until one of them succeeds:

  • if the "Open ..." panel was used to open the file and an encoding other than "Automatic" (which is now the default) was selected, that encoding is tried.
  • if a "coding" option is found, that encoding is tried.
  • the value of *DEFAULT-FILE-CHARACTER-ENCODING* is tried.
  • iso-8859-1 is tried. All files can be decoded in iso-8859-1.
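The fallback order above can be sketched as a small function. Everything here is illustrative: CHOOSE-ENCODING and its parameters are hypothetical names, and TRY-DECODE stands in for whatever the IDE does to test whether the file decodes in a given encoding; none of this is taken from the IDE source.

```lisp
;; Hypothetical sketch of the fallback order, not the IDE's actual code.
;; PANEL-CHOICE is the encoding picked in the "Open ..." panel, or
;; :AUTOMATIC; CODING-OPTION is the "coding" file option, or NIL;
;; DEFAULT plays the role of *DEFAULT-FILE-CHARACTER-ENCODING*.
(defun choose-encoding (panel-choice coding-option default try-decode)
  "Return the first candidate encoding for which TRY-DECODE returns true."
  (let ((candidates
          (remove nil
                  (list (unless (eq panel-choice :automatic) panel-choice)
                        coding-option
                        default
                        ;; Last resort: every byte value maps to a
                        ;; character in ISO-8859-1.
                        :iso-8859-1))))
    (find-if try-decode candidates)))
```

Because ISO-8859-1 assigns a character to every byte value, a realistic TRY-DECODE always accepts the last candidate, so the chain never comes up empty for a real file.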

This is all supposed to be what Emacs does and I think that it's
pretty close in practice.

A file that caused problems for Paul Krueger a few days ago
because its encoding (ISO-8859-1) wasn't guessed correctly
now has an explicit "coding" option and serves as a test case.

1 edited


  • trunk/source/level-1/l1-streams.lisp

    --- trunk/source/level-1/l1-streams.lisp (r15292)
    +++ trunk/source/level-1/l1-streams.lisp (r15536)
    @@ -2752,8 +2752,8 @@
         (setf (ioblock-unread-char-function ioblock) (select-stream-untyi-function (ioblock-stream ioblock) :input))
         (setf (ioblock-decode-literal-code-unit-limit ioblock)
    -          (if encoding
    +          (if (and encoding (not (eq encoding :inferred)))
                 (character-encoding-decode-literal-code-unit-limit encoding)
                 256))
    -    (if encoding
    +    (if (and encoding (not (eq encoding :inferred)))
           (let* ((unit-size (character-encoding-code-unit-size encoding)))
             (setf (ioblock-peek-char-function ioblock) '%encoded-ioblock-peek-char)
    @@ -3162,8 +3162,12 @@
       (setq line-termination (cdr (assoc line-termination *canonical-line-termination-conventions*)))
       (when encoding
    -    (unless (typep encoding 'character-encoding)
    -      (setq encoding (get-character-encoding encoding)))
    -    (if (eq encoding (get-character-encoding nil))
    -      (setq encoding nil)))
    +    (cond ((and (eq encoding :inferred)
    +                (typep stream 'file-stream)
    +                insize))
    +          (t
    +           (unless (typep encoding 'character-encoding)
    +             (setq encoding (get-character-encoding encoding)))
    +           (if (eq encoding (get-character-encoding nil))
    +             (setq encoding nil)))))
       (when sharing
         (unless (or (eq sharing :private)
    @@ -3261,8 +3265,8 @@
           (setf (ioblock-interactive ioblock) interactive))
         (setf (stream-ioblock stream) ioblock)
    -    (when encoding
    +    (when (and encoding (not (eq encoding :inferred)))
           (setf (ioblock-native-byte-order ioblock)
                 (character-encoding-native-endianness encoding)))
    -    (let* ((bom-info (and insize encoding (character-encoding-use-byte-order-mark encoding))))
    +    (let* ((bom-info (and insize encoding (not (eq encoding :inferred)) (character-encoding-use-byte-order-mark encoding))))
           (when bom-info
             (ioblock-check-input-bom ioblock bom-info sharing)))