Ticket #985 (closed defect: fixed)

Opened 22 months ago

Last modified 22 months ago

Symbols printed in readtable-case :preserve are never escaped and thus some symbols are printed in a form which cannot be read back in.

Reported by: adapterik Owned by:
Priority: normal Milestone:
Component: ANSI CL Compliance Version: trunk
Keywords: Cc:

Description

Problem: CCL should print symbols with escaping (single escape backslash or multiple escape pipe) when *print-escape* is T if that is needed to ensure that symbol names can be read back in again and result in the same symbol. Currently it does not do this for readtable-case :preserve or :invert.

Solution: Fix level-1/l1-io.lisp so that in print-escape mode, for all readtable-case modes, escaping is carried out for symbol name characters which might otherwise not be read back in correctly. This includes the "invalid", "asterisked", and "package marker" characters in section 2.1.4.2 Constituent Traits.

The following lisp forms point out the problem:

(readtable-case *readtable*)
(setf a (read-from-string "|HiThere|"))
(setf b (read-from-string "|:|"))
(eq a (read-from-string (prin1-to-string a)))
(eq b (read-from-string (prin1-to-string b)))
(setf (readtable-case *readtable*) :preserve)
(EQ A (READ-FROM-STRING (PRIN1-TO-STRING A)))
(EQ B (READ-FROM-STRING (PRIN1-TO-STRING B)))

First we read in some symbols using the multiple escape delimiter |. We confirm that they can be printed in a print-escape environment (prin1) and read back in, and are eq. In each case we should get back the same symbol, thus the eq test.

If we try this on the current 1.9 trunk (version shown in the banner below), we receive an error on the final line:

[epearson@archie ccl]$ ccl64
Welcome to Clozure Common Lisp Version 1.9-dev-r15376M-trunk  (LinuxX8664)!
? (readtable-case *readtable*)
:UPCASE
? (setf a (read-from-string "|HiThere|"))
|HiThere|
? (setf b (read-from-string "|:|"))
\:
? (read-from-string (prin1-to-string a))
|HiThere|
9
? (read-from-string (prin1-to-string b))
\:
2
? (setf (readtable-case *readtable*) :preserve)
:PRESERVE
? (READ-FROM-STRING (PRIN1-TO-STRING A))
HiThere
7
? (READ-FROM-STRING (PRIN1-TO-STRING B))
> Error: Unexpected end of file on #<STRING-INPUT-STREAM  #x3020006EFDDD>, near position 1, within ":"
> While executing: CCL::READ-CHAR-INTERNAL, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 >

If we look at the printed form of B, we can see why:

? (PRIN1-TO-STRING B)
":"

The single colon is printed. The lisp reader will not like a single colon -- it is not a valid form. PRIN1-TO-STRING should rather print "\:" or "|:|". Indeed after fixing the problem (more below), the value printed is:

? (PRIN1-TO-STRING B)
"\\:"

The flaw that is exposed is that the prin1 function is not satisfying the condition that in escape mode it must print symbols so that they can be read back in with the same readtable and become the same symbol. Although it is safe for certain symbols (such as purely alphabetic ones) to be printed without escaping in :preserve readtable-case (because lower and upper case will be kept even without escaping), it is not so for special characters like colon. Such characters must always be escaped; otherwise the printed form either fails to read, as above, or reads back as something else.

Although such symbols are not very common, they are valid. One less contrived example is a symbol with spaces:

? (setf x (read-from-string "|Hi there erik|"))
|Hi there erik|
? (prin1-to-string x)
"|Hi there erik|"
? (setf (readtable-case *readtable*) :preserve)
:PRESERVE
? (PRIN1-TO-STRING X)
"Hi there erik"
? (READ-FROM-STRING (PRIN1-TO-STRING X))
Hi
3
? 
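The round-trip requirement described above can be checked mechanically. The following is a minimal sketch, not part of CCL: the name round-trips-p is hypothetical, and it assumes the symbol is accessible in the current package so that no package prefix is printed. It prints a symbol with prin1 under a fresh standard readtable set to the given readtable-case, reads the text back, and compares with eq.

```lisp
;; Hypothetical helper, not part of CCL: does SYMBOL survive a
;; prin1/read round trip under the given readtable-case?
(defun round-trips-p (symbol readtable-case)
  ;; (copy-readtable nil) returns a fresh copy of the standard readtable,
  ;; so the caller's readtable is left untouched.
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) readtable-case)
    ;; prin1-to-string prints with *print-escape* bound to T.
    (let ((text (prin1-to-string symbol)))
      ;; A reader error (such as the end-of-file error above) counts
      ;; as a failed round trip rather than being signalled.
      (eq symbol (ignore-errors (read-from-string text))))))
```

With a conforming printer, (round-trips-p (intern ":") :preserve) and (round-trips-p (intern "Hi there erik") :preserve) should both be true; on the affected trunk build the transcripts above suggest both would be false.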

The problem exists in level-1/l1-io.lisp in function write-pname. First, at line 1060 there is a form meant to count the number of "slashes" (single escape characters) in the symbol name and possibly switch to using the multiple escape character (well, the spec says you can do it!). This chunk of code specifically excludes handling the case of a readtable-case of :invert or :preserve. Although there is a comment noting that "these never slashify alpha-p", the test excludes all symbol name characters, not just alphabetic ones.

The fix is to ensure that if the readtable-case is :invert or :preserve, the "slashify" is excluded for alphabetic characters but applied otherwise. This can be accomplished by passing the readcase variable to slashify? and handling the readtable-case condition there. This is possible because the slashify? function already contains a test both for whether a character is alphabetic and whether it is one of the "safe" non-alphabetic characters (i.e. not normally syntactically relevant).
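The per-character decision described above might look roughly like the sketch below. This is an illustration in portable Common Lisp, not the CCL source: needs-escape-p and *safe-chars* are hypothetical names, and the list of "safe" characters is an assumption based on the standard constituent characters. It also ignores whole-token issues such as names that look like numbers or consist only of dots, which need escaping regardless of their individual characters.

```lisp
;; Illustrative sketch only; NEEDS-ESCAPE-P and *SAFE-CHARS* are
;; hypothetical names and do not appear in level-1/l1-io.lisp.
(defparameter *safe-chars* "!$%&*+-./<=>?@[]^_{}~"
  "Non-alphabetic constituent characters assumed safe to print unescaped.")

(defun needs-escape-p (char readcase)
  "True if CHAR in a symbol name must be escaped under READCASE."
  (cond ((alpha-char-p char)
         ;; Alphabetic characters depend on the readtable-case:
         ;; only a case the reader would fold away needs escaping.
         (case readcase
           (:upcase (lower-case-p char))
           (:downcase (upper-case-p char))
           (t nil)))                    ; :preserve and :invert
        ((digit-char-p char) nil)
        ;; Everything else (colon, whitespace, parentheses, ...) is
        ;; escaped in every readtable-case unless known to be safe.
        (t (not (find char *safe-chars*)))))
```

Under this sketch, (some (lambda (c) (needs-escape-p c :preserve)) "Hi there erik") is true because of the spaces, while the same test on "HiThere" is false, which matches the behaviour the ticket asks for.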

FWIW this came up in the context of trying to get Shen working in CCL. There were a few minor problems with undeclared globals, but the big issue was figuring this out. Shen has at least one point where there is a comparison to the colon as a symbol. The Shen compiler (which goes from Shen kl -> fixed kl -> Common Lisp -> compiled CL) attempts to make the colon character safe by wrapping it in pipes, |:|, when it creates the fixed kl file. Shen puts the reader into :preserve case. The Shen kl code is not Common Lisp, but it is made of s-expressions, so Shen uses the Lisp reader to read in the fixed kl file (that is why it is fixed first, to make it safe for the reader), processes the kl to translate it to Common Lisp, and then writes out Common Lisp.

Change History

comment:1 Changed 22 months ago by gb

  • Status changed from new to closed
  • Resolution set to fixed

(In [15416]) Fix some cases of improper/insufficient escaping when printing symbols. Fixes ticket:985 in the trunk.
