Ticket #632 (new enhancement)
period (.) is being escaped by merge-pathnames
Reported by: matt.kaufmann
Owned by:
Priority: major
Milestone: Clozure CL 1.10
MERGE-PATHNAMES can insert an extra backslash (\) into its result, as follows (with similar results on 32-bit Linux, 64-bit Linux, and an Intel Mac):
Welcome to Clozure Common Lisp Version 1.5-dev-r13176M-trunk (LinuxX8632)!
? (MERGE-PATHNAMES "foo.xxx.lx86cl64" "/u/kaufmann/temp/")
#P"/u/kaufmann/temp/foo
This is inconvenient for me (and hence, currently at least, for users of a system that I work on, ACL2). I didn't see the extra backslash in tests with SBCL, Allegro CL, CLISP, GCL, CMUCL, or LispWorks. Gary Byers points out that this is actually allowed by the spec (see below), but that he'd be interested in seeing whether anyone likes this escaping behavior and can think of arguments in its favor.
Quoting Gary, with his permission, from an email he sent me 11/25/09:
Section 19.1.1 says that pretty much anything that an implementation does with respect to namestrings that aren't logical pathname namestrings is conformant, and that portable programs can't assume much of anything about the syntax of such namestrings (or about how they're translated to PATHNAME objects and back again).
When translating a namestring into a pathname, most implementations probably follow a convention where a dot character in a namestring separates the substring used to provide the pathname's PATHNAME-NAME from the substring used to provide its PATHNAME-TYPE. When there's more than one dot involved or when dots appear at the beginning or end of the namestring, things are less clear: does ".emacs" describe a pathname whose name is NIL and type is "emacs", a pathname whose name is ".emacs" and whose type is NIL, or something else? Which parts of "a.b.c" are parsed as a pathname name and which parts are parsed as a pathname type?
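The ambiguity can be made concrete with a minimal Common Lisp sketch of two plausible splitting rules. SPLIT-AT-LAST-DOT and SPLIT-AT-FIRST-DOT are hypothetical helpers, not CCL functions, and neither is claimed to match what any particular implementation does; each returns the name and the type as two values.

```lisp
;;; Two hypothetical rules for splitting a file name into NAME and TYPE.
;;; Neither is claimed to be what any particular implementation does.

(defun split-at-last-dot (string)
  "Name is everything before the last dot, type everything after it.
A dot in position 0 is not a separator, so \".emacs\" has no type."
  (let ((pos (position #\. string :from-end t)))
    (if (and pos (> pos 0))
        (values (subseq string 0 pos) (subseq string (1+ pos)))
        (values string nil))))

(defun split-at-first-dot (string)
  "Name is everything before the first dot, type everything after it.
An empty name is returned as NIL, so \".emacs\" has type \"emacs\"."
  (let ((pos (position #\. string)))
    (if pos
        (values (if (zerop pos) nil (subseq string 0 pos))
                (subseq string (1+ pos)))
        (values string nil))))
```

Under the first rule "a.b.c" has name "a.b" and type "c", and ".emacs" is a name with no type; under the second, "a.b.c" has name "a" and type "b.c", and ".emacs" has a NIL name and type "emacs". Both rules are internally consistent, which is why the namestring alone doesn't settle the question.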
When generating a NAMESTRING from a pathname (as happens when printing a pathname), CCL tries to avoid some potential ambiguity by quoting/escaping characters (dots, usually) that're constituents of some component and which might otherwise be used to separate components; this quoting/escaping involves prefixing the character with a backslash on Unix platforms, so:
"a\.b.c" has name "a.b", type "c"
"a.b\.c" has name "a", type "b.c"
a) the rules that CCL uses for introducing/removing escaping can still lead to ambiguity and prevent round-trip translations
b) people don't expect to see it and report the escaping as a bug.
(b) isn't valid, but it still takes time to write messages like this to explain that this is just "implementation-dependent behavior". It'd be easier to justify that particular behavior if it led to less ambiguity and more reliable round-tripping, but it doesn't.
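The escaping convention in Gary's email, in which "a\.b.c" has name "a.b" and type "c", can be modeled with a small Common Lisp sketch. It assumes that a backslash is inserted before every dot inside the name or type, and that parsing splits at the last unescaped dot (ignoring a dot in the first position) and then treats a backslash as making the next character literal. ESCAPE-DOTS, NAME-TYPE-STRING, UNESCAPE and PARSE-NAME-TYPE are hypothetical names; this is not CCL's implementation.

```lisp
;;; Hypothetical model of dot-escaping in namestrings; not CCL's code.

(defun escape-dots (component)
  "Return COMPONENT with a backslash inserted before every dot."
  (with-output-to-string (out)
    (loop for ch across component
          do (when (char= ch #\.)
               (write-char #\\ out))
             (write-char ch out))))

(defun name-type-string (name type)
  "Join NAME and TYPE with an unescaped dot; dots inside either are escaped."
  (if type
      (concatenate 'string (escape-dots name) "." (escape-dots type))
      (escape-dots name)))

(defun unescape (string)
  "Remove each escaping backslash, keeping the character that follows it."
  (with-output-to-string (out)
    (loop with i = 0
          while (< i (length string))
          do (let ((ch (char string i)))
               (when (and (char= ch #\\) (< (1+ i) (length string)))
                 (incf i)
                 (setf ch (char string i)))
               (write-char ch out)
               (incf i)))))

(defun parse-name-type (string)
  "Split STRING at its last unescaped dot (not in position 0), then unescape."
  (let ((split nil))
    ;; Scan left to right; a backslash hides the character after it.
    (loop with i = 0
          while (< i (length string))
          do (let ((ch (char string i)))
               (cond ((char= ch #\\) (incf i 2))
                     (t (when (and (char= ch #\.) (plusp i))
                          (setf split i))
                        (incf i)))))
    (if split
        (values (unescape (subseq string 0 split))
                (unescape (subseq string (1+ split))))
        (values (unescape string) nil))))
```

In this sketch (name-type-string "foo.xxx" "lx86cl64") produces foo\.xxx.lx86cl64, which parses back to name "foo.xxx" and type "lx86cl64". Because backslashes themselves are not escaped, a name ending in a backslash does not survive the round trip: the name a\ (the letter a followed by a backslash) with type b is written as a\.b and read back as the name "a.b" with no type. That is one way such rules can lose information; the ticket does not say which cases break in CCL.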