Opened 8 years ago

Last modified 8 years ago

#833 reopened defect

newline handled incorrectly by load and compile-file

Reported by: mhd
Owned by:
Priority: normal
Milestone: Future Clozure CL
Component: ANSI CL Compliance
Version: trunk
Keywords:
Cc:

Description

When a line ending appears in a string to be read via LOAD or COMPILE-FILE, the resulting Lisp string object should represent that line ending with the character #\Newline. Clozure CL (CCL) fails to do this on Windows. This is a bug. CLtL2, section 2.2.2, "Line Divisions", gives a very clear description of the requirements and the associated problems; it is online here:

http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node23.html

To test if your Lisp has the bug, write the following code into a Lisp file in a plain text editor (e.g., Notepad) on Windows and then LOAD the resulting file.

(format t "~:c" (char "
" 0)) ; Should print Newline

It should print Newline. Otherwise it's a bug, and you will probably see it print Return instead. On CCL it prints Return. (The same happens with SBCL, by the way. CLISP, ACL, and LispWorks print Newline, which is correct.)
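The same check can be scripted, which removes any dependence on how a particular editor saves files. The following is a sketch in portable Common Lisp; write-crlf-test-file, line-break-char-after-load, *line-break-char*, and the file name crlf-test.lisp are made-up names for illustration. It writes the source file as raw octets, so the string literal is guaranteed to contain a CR LF pair, and then LOADs it with the implementation's default external format.

```lisp
;; Hypothetical helpers: reproduce the ticket's test without an editor.
(defun write-crlf-test-file (pathname)
  "Write PATHNAME as raw octets: one DEFPARAMETER form whose string literal
contains a CR LF line break, followed by a CR LF line ending."
  (let ((text (format nil "(defparameter *line-break-char* (char \"~C~C\" 0))~C~C"
                      (code-char 13) (code-char 10)
                      (code-char 13) (code-char 10))))
    (with-open-file (out pathname :direction :output
                                  :element-type '(unsigned-byte 8)
                                  :if-exists :supersede)
      ;; Every character in TEXT is ASCII, so its code is also its octet.
      (loop for ch across text
            do (write-byte (char-code ch) out))))
  pathname)

(defun line-break-char-after-load (pathname)
  "LOAD PATHNAME with the default external format and return the first
character of the string literal defined in it."
  (load pathname)
  (symbol-value '*line-break-char*))
```

Evaluating (format t "~:C~%" (line-break-char-after-load (write-crlf-test-file "crlf-test.lisp"))) should print Newline on an implementation that maps CRLF to #\Newline, and Return on one that passes the CR through.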

I see two ways to approach this problem.

(1) Interpret any of the following sequences as Newline: CR, LF, or CRLF (and, for about the same money, throw in LFCR). This might be called the liberal approach. It is the approach recommended in CLtL2 (see the reference above), and it is consistent with the Robustness Principle (http://en.wikipedia.org/wiki/Robustness_principle), which is widely believed in the IETF community to apply to the mess that is line-ending conventions in the modern computing world.
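Approach (1) amounts to a small translation step. The sketch below is not CCL's code, and normalize-line-endings is a hypothetical name; it maps each CR, LF, CRLF, or LFCR in a string to #\Newline, treating a CR/LF pair in either order as a single break and a repeated CR or LF as two breaks.

```lisp
;; Hypothetical helper illustrating approach (1); not CCL's implementation.
(defun normalize-line-endings (string)
  "Return a copy of STRING in which every CR, LF, CRLF, or LFCR sequence
is replaced by a single #\\Newline."
  (let ((cr (code-char 13))
        (lf (code-char 10))
        (len (length string)))
    (with-output-to-string (out)
      (loop with i = 0
            while (< i len)
            do (let ((ch (char string i)))
                 (if (or (char= ch cr) (char= ch lf))
                     (let ((next (and (< (1+ i) len) (char string (1+ i)))))
                       (write-char #\Newline out)
                       ;; CR then LF, or LF then CR, is one line break;
                       ;; CR CR or LF LF is two line breaks.
                       (if (and next
                                (or (char= next cr) (char= next lf))
                                (char/= next ch))
                           (incf i 2)
                           (incf i 1)))
                     (progn
                       (write-char ch out)
                       (incf i 1))))))))
```

Pairing greedily from the left keeps a blank line in a Windows file (CR LF CR LF) as exactly two breaks, which is the case that matters most in practice.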

(2) Extend your implementation's definition of EXTERNAL-FORMAT so that it supports different line-ending conventions, create a format appropriate for each platform (e.g., the CRLF convention on Windows), and make that format the default on Windows.

I think both approaches are fine, and they don't conflict with each other. External formats used for input should ideally support the approach in (1), that is, a line-ending convention of "any". (It looks like Franz ACL supports this with the :E-CRLF external format.)

But either (1) or (2) alone would fix the bug with LOAD and COMPILE-FILE as it affects strings in plain-text Lisp source files. Doing (2) also makes the fix available outside LOAD and COMPILE-FILE, i.e., anywhere an :external-format argument is supported.

(Note: the fact that CCL treats tilde-return (~CR) in format strings the same as tilde-newline helps to take the edge off this bug, but is not a fix.)

Change History (4)

comment:1 Changed 8 years ago by gb

  • Resolution set to invalid
  • Status changed from new to closed

(2) above is the obvious solution, which is why CCL (and doubtless most other lisps) implement it. See sections 4.5.2 and 4.5.3 of the CCL manual.

I don't see a bug here. You can make the argument that Notepad-compatibility is so important that CCL should default to using CRLF termination for files on Windows. When I thought about it I came to the opposite conclusion, but both sides of that argument seem to have merit. It'd be pretty clearly wrong to not recognize LF as the universal/standard line terminator on a Unixish system, but the situation with Windows and CRLF isn't nearly that clear-cut.

Most Windows programs seem to deal perfectly well with LF-terminated text files, and interoperability with the real world is often more important than DOS (or is it VMS?) compatibility. (As of a few years ago, Notepad was an exception; it now seems to treat CR-less LFs as horizontal whitespace. I think that it used to just crash.)

The documentation (section 4.5.3) is a bit out of date in that it doesn't mention that the :inferred line termination keyword will recognize CRLF in input files (as well as CR and LF); that can be handy if a file's line termination convention is unknown. That's also heuristic, and in edge cases it can guess wrong. I don't think there's any way in which LOAD, OPEN, and COMPILE-FILE can guess a file's external format with 100% reliability, and that's why these functions take an :EXTERNAL-FORMAT argument (and have for ~20 years).
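To see why any such inference is heuristic, here is a deliberately simple guesser. It is a hypothetical sketch, not CCL's actual :inferred algorithm: it looks only at the first CR or LF in the first few kilobytes of a file. The function names are made up, and the returned keywords :crlf, :cr, and :unix were chosen to resemble CCL's line-termination keywords, which is an assumption.

```lisp
;; Hypothetical line-termination guesser; NOT CCL's :inferred code.
(defun infer-line-termination (octets &key (default :unix))
  "Guess a line-termination convention from the first CR or LF in OCTETS.
Returns :CRLF, :CR, :UNIX, or DEFAULT when no line break is present."
  (let ((pos (position-if (lambda (b) (or (= b 13) (= b 10))) octets)))
    (cond ((null pos) default)                ; nothing to go on
          ((= (aref octets pos) 10) :unix)    ; a bare LF comes first
          ((and (< (1+ pos) (length octets))
                (= (aref octets (1+ pos)) 10))
           :crlf)                             ; CR immediately followed by LF
          (t :cr))))                          ; CR not followed by LF

(defun file-prefix-octets (pathname &optional (limit 4096))
  "Return up to LIMIT leading octets of the file named by PATHNAME."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let* ((buffer (make-array limit :element-type '(unsigned-byte 8)))
           (n-read (read-sequence buffer in)))
      (subseq buffer 0 n-read))))
```

The edge cases are easy to hit: a CR LF pair that straddles the end of the sampled prefix is reported as :cr, and a file whose first line break happens to be a stray character inside a string literal is classified by that accident.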

comment:2 Changed 8 years ago by mhd

  • Resolution invalid deleted
  • Status changed from closed to reopened

What does this have to do with "Notepad compatibility"? Emacs on Windows writes text files (including Lisp files) with CRLF line endings; that's how I produced the test file. FTP ASCII-mode transfers (from Unix systems, e.g.) also produce CRLF line endings. CRLF is simply the standard line ending on Windows.

I don't think Notepad crashing or not has anything to do with this.

LOAD and COMPILE-FILE should use the external format appropriate to Windows when running on Windows, i.e., either accept CRLF or use :inferred. There's no requirement for 100% reliability; they are just required to accept files formatted correctly for the given OS. Specifying an external format is supposed to be for the nonstandard cases.

Many Windows programs these days handle LF-terminated lines, but all of them are required to handle CRLF-terminated lines, and most do. The reason most Windows programs deal perfectly well with LF-terminated text files is that they do something along the lines of what I'm suggesting: for input they use :inferred, or "any", and handle CRLF the same as LF. CCL should do this too, and then it will be a Windows program that handles LF-terminated lines in addition to CRLF-terminated ones.

comment:3 follow-up: Changed 8 years ago by rme

Does it work for you to set ccl:*default-external-format* to :inferred in your init file?

comment:4 in reply to: ↑ 3 Changed 8 years ago by mhd

Replying to rme:

Does it work for you to set ccl:*default-external-format* to :inferred in your init file?

Yes, having ccl:*default-external-format* bound to :inferred around the call to LOAD works in my version of CCL on Windows. But I don't think this fully resolves the bug. If this were the default value (at least on Windows), that would "work" in the sense of being one way of resolving this bug. Basically, one should not have to resort to a CCL extension to read in a source file in the absolute most standard format for this platform.
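rme's suggestion and the variant mhd tested look roughly like this. The sketch assumes, on the strength of this ticket, that ccl:*default-external-format* accepts :inferred; load-inferred is a hypothetical helper name, and the #+ccl/#-ccl conditionals keep the file loadable in other implementations, where the helper is just a plain LOAD.

```lisp
;; Option A (rme's suggestion): in the CCL init file, change the default
;; for every stream opened without an explicit :EXTERNAL-FORMAT.
#+ccl (setf ccl:*default-external-format* :inferred)

;; Option B (what mhd tested): bind the default only around LOAD.
;; LOAD-INFERRED is a hypothetical helper name.
(defun load-inferred (pathname)
  "LOAD PATHNAME, letting CCL infer its line termination; on other
implementations this is a plain LOAD."
  #+ccl (let ((ccl:*default-external-format* :inferred))
          (load pathname))
  #-ccl (load pathname))
```

Option B limits the change to source loading, which matches mhd's point that the request concerns LOAD and COMPILE-FILE on standard platform files rather than every stream.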
