Timestamp:
Dec 13, 2012, 6:27:55 AM
Author:
gb
Message:

Support using the "coding" option in a file's file options line (a
line at the start of a text file that contains name:value pairs,
separated by semicolons and bracketed by -*- sequences) to determine
the file's character encoding. Specifically:

  • OPEN now allows an external-format of :INFERRED; previously, this was shorthand for an external-format whose line-termination was inferred and whose character encoding was based on *DEFAULT-FILE-CHARACTER-ENCODING*. When an input file whose external-format is specified as :INFERRED is opened, its file options are parsed and the value of the "coding" option is used if such an option is found and its value names an encoding that CCL supports. If no supported "coding" option is found, *DEFAULT-FILE-CHARACTER-ENCODING* is used as before.
  • In the Cocoa IDE, the Hemlock command "Ensure File Options Line" (bound to Control-Meta-M by default) ensures that the first line in the current buffer is a file options line and fills in some plausible values for the "Mode", "Package", and "Coding" options. The "Process File Options" command (Control-Meta-m) can be used to process the file options line after it's been edited. (The file options line is always processed when the file is first opened; changes to the "coding" option affect how the file will be saved.)
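
As an illustration of the file options line format described above, here is a minimal sketch in portable Common Lisp. The function names PARSE-FILE-OPTIONS-LINE and FILE-OPTIONS-CODING are hypothetical and are not the functions this changeset adds; CCL's actual parsing works on the stream's octet buffer and may differ in detail.

```lisp
;;; Hypothetical helpers, not part of CCL: parse the name:value pairs of a
;;; file options line such as
;;;   ;;; -*- Mode: Lisp; Package: CL-USER; Coding: utf-8 -*-

(defun parse-file-options-line (line)
  "Return an alist of (NAME . VALUE) strings, or NIL if LINE has no -*- ... -*- section."
  (let* ((start (search "-*-" line))
         (end (and start (search "-*-" line :start2 (+ start 3)))))
    (when end
      (let ((body (subseq line (+ start 3) end))
            (options '())
            (pos 0))
        ;; Walk the semicolon-separated pairs; pairs without a colon are skipped.
        (loop
          (let* ((semi (position #\; body :start pos))
                 (pair (subseq body pos semi))
                 (colon (position #\: pair)))
            (when colon
              (push (cons (string-trim " " (subseq pair 0 colon))
                          (string-trim " " (subseq pair (1+ colon))))
                    options))
            (if semi
                (setq pos (1+ semi))
                (return))))
        (nreverse options)))))

(defun file-options-coding (line)
  "Return the value of the \"coding\" option (name compared case-insensitively), or NIL."
  (cdr (assoc "coding" (parse-file-options-line line) :test #'string-equal)))

;; (file-options-coding ";;; -*- Mode: Lisp; Package: CL-USER; Coding: utf-8 -*-")
;; => "utf-8"
```

In CCL itself none of this is needed by the caller: per the description above, passing :external-format :inferred to OPEN (or WITH-OPEN-FILE) on an input file is what triggers the lookup.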

When a Lisp source file is opened in the IDE editor, the following
character encodings are tried in this order until one of them
succeeds:

  • if the "Open ..." panel was used to open the file and an encoding other than "Automatic" (which is now the default) was selected, that encoding is tried.
  • if a "coding" option is found, that encoding is tried.
  • the value of *DEFAULT-FILE-CHARACTER-ENCODING* is tried.
  • iso-8859-1 is tried. This always succeeds, since every octet sequence can be decoded as iso-8859-1.
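
The fallback order above can be sketched as a small selection function. This is an illustration only, not the IDE's code: CHOOSE-FILE-ENCODING, its arguments, and the DECODABLE-P predicate are hypothetical names, and it assumes that an "Automatic" panel selection or a missing "coding" option is represented as NIL.

```lisp
;;; Hypothetical sketch: return the first candidate encoding that
;;; DECODABLE-P accepts, trying candidates in the order listed above.
;;; DEFAULT stands in for the value of *DEFAULT-FILE-CHARACTER-ENCODING*.

(defun choose-file-encoding (decodable-p default &key panel-choice coding-option)
  (find-if decodable-p
           ;; NIL candidates (nothing chosen, no option found) are skipped.
           ;; :ISO-8859-1 comes last as the fallback that cannot fail.
           (remove nil (list panel-choice coding-option default :iso-8859-1))))
```

A caller would supply a predicate that attempts to decode the file's octets in the given encoding, for example (choose-file-encoding #'decodes-cleanly-p :utf-8 :coding-option :utf-16), where DECODES-CLEANLY-P is again a hypothetical function.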

This is all supposed to be what Emacs does and I think that it's
pretty close in practice.

A file that caused problems for Paul Krueger a few days ago
because its encoding (ISO-8859-1) wasn't guessed correctly
now has an explicit "coding" option and serves as a test case.

File:
1 edited

  • trunk/source/level-1/l1-sysio.lisp

    --- trunk/source/level-1/l1-sysio.lisp (r15270)
    +++ trunk/source/level-1/l1-sysio.lisp (r15536)
    @@ -39,4 +39,18 @@
       (setf (file-ioblock-octet-pos file-ioblock)
             (file-octet-filepos file-ioblock)))
    +
    +(defun infer-character-encoding (file-ioblock)
    +  (let* ((inbuf (file-ioblock-inbuf file-ioblock))
    +         (bufptr (io-buffer-bufptr inbuf))
    +         (n (io-buffer-count inbuf)))
    +    (when (zerop n)
    +      (setq n (or (fd-stream-advance (file-ioblock-stream file-ioblock)
    +                                     file-ioblock
    +                                     t)
    +                  0)))
    +    (setf (stream-external-format (file-ioblock-stream file-ioblock))
    +          (or (external-format-from-octet-buffer bufptr n)
    +              (normalize-external-format :file (file-ioblock-line-termination file-ioblock))))))
    +
     
     (defun infer-line-termination (file-ioblock)
    @@ -124,5 +138,6 @@
         (setq character-encoding
               (default-character-encoding domain)))
    -  (unless (lookup-character-encoding character-encoding)
    +  (unless (or (lookup-character-encoding character-encoding)
    +              (and (eq domain :file) (eq character-encoding :inferred)))
         (error "~S is not the name of a known character encoding."
                character-encoding))
    @@ -143,4 +158,6 @@
             ((typep external-format 'external-format)
              external-format)
    +        ((eq external-format :inferred)
    +         (make-external-format :domain domain :character-encoding :inferred :line-termination :inferred))
             ((eq external-format :default)
              (normalize-external-format domain *default-external-format*))
    @@ -872,4 +889,6 @@
                       (setf (file-ioblock-fileeof ioblock)
                             (ioblock-octets-to-elements ioblock (fd-size fd)))
    +                  (when (and in-p (eq encoding :inferred))
    +                    (infer-character-encoding ioblock))
                       (when (and in-p (eq line-termination :inferred))
                         (infer-line-termination ioblock))