Ticket #632 (new enhancement)

Opened 5 years ago

Last modified 16 months ago

period (.) is being escaped by merge-pathnames

Reported by: matt.kaufmann
Owned by:
Priority: major
Milestone: Clozure CL 1.10
Component: other
Version: trunk
Keywords: namestring
Cc: kaufmann@…

Description

Merge-pathnames can put an extra backslash (\) in its result, as follows (similar results on 32-bit Linux, 64-bit Linux, and an Intel Mac):

Welcome to Clozure Common Lisp Version 1.5-dev-r13176M-trunk (LinuxX8632)!
? (MERGE-PATHNAMES "foo.xxx.lx86cl64" "/u/kaufmann/temp/")
#P"/u/kaufmann/temp/foo\\.xxx.lx86cl64"
?

This is inconvenient for me (and hence, currently at least, for users of a system that I work on, ACL2). I didn't see the extra backslash in tests with SBCL, Allegro CL, CLISP, GCL, CMUCL, or Lispworks. Gary Byers points out that this is actually allowed by the spec (see below), but that he'd be interested in seeing whether anyone likes this escaping behavior and can think of arguments in its favor.

Quoting Gary, with his permission, from an email he sent me 11/25/09:

.....

Section 19.1.1 says that pretty much anything that an implementation does with respect to namestrings that aren't logical pathname namestrings is conformant, and that portable programs can't assume much of anything about the syntax of such namestrings (or about how they're translated to PATHNAME objects and back again.)

When translating a namestring into a pathname, most implementations probably follow a convention where a dot character in a namestring separates the substring used to provide the pathname's PATHNAME-NAME from the substring used to provide its PATHNAME-TYPE. When there's more than one dot involved or when dots appear at the beginning or end of the namestring, things are less clear: does ".emacs" describe a pathname whose name is NIL and type is "emacs", a pathname whose name is ".emacs" and whose type is NIL, or something else ? Which parts of "a.b.c" are parsed as a pathname name and which parts are parsed as a pathname type ?

When generating a NAMESTRING from a pathname (as happens when printing a pathname), CCL tries to avoid some potential ambiguity by quoting/escaping characters (dots, usually) that're constituents of some component and which might otherwise be used to separate components; this quoting/escaping involves prefixing the character with a backslash on Unix platforms, so:

"a\.b.c" has name "a.b", type "c"
"a.b\.c" has name "a", type "b.c"

Unfortunately:

a) the rules that CCL uses for introducing/removing escaping can still lead to ambiguity and prevent round-trip translations

b) people don't expect to see it and report the escaping as a bug.

(b) isn't valid, but it still takes time to write messages like this to explain that this is just "implementation-dependent behavior". It'd be easier to justify that particular behavior if it led to less ambiguity and more reliable round-tripping, but it doesn't.
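The parsing and round-tripping questions raised in the quoted message can be checked in any implementation with a short sketch like the one below. SHOW-PATHNAME-PARTS is a hypothetical helper introduced only for illustration; no expected output is shown because, as the message says, the results are implementation-dependent.

```lisp
;; Sketch: report how the running implementation parses a namestring and
;; whether NAMESTRING gives the same string back.
;; SHOW-PATHNAME-PARTS is a hypothetical helper, not part of CCL or the standard.
(defun show-pathname-parts (string)
  (let* ((path (pathname string))
         (name (pathname-name path))
         (type (pathname-type path))
         (back (namestring path)))
    (format t "~S: name ~S, type ~S, namestring ~S, round-trips: ~A~%"
            string name type back
            (if (string= string back) "yes" "no"))
    (list name type back)))

;; Try the cases discussed above.
(dolist (s '("foo.lisp" ".emacs" "a.b.c"))
  (show-pathname-parts s))
```

Running the same forms in several Lisps side by side is one way to see where they disagree.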

Change History

comment:1 Changed 5 years ago by rme

The second comment to ticket:452 also comments on the namestring escaping behavior.

comment:2 Changed 20 months ago by avodonosov

I support _not_ escaping the period.

The escaping poses interoperability problems if I create a file name in CCL and pass it to an external program.

Unix commands such as ls "my\.file\.name.asd" or cat 'my\.file\.name.txt' do not understand the backslash: they can't find the file actually named my.file.name.asd. Even if the CLHS allows anything as the printed representation of a pathname, IMHO it is desirable that the printed representation be a valid file name on the given OS.

Other Lisp implementations also do not understand such file names passed from CCL. I have a program which passes file names from CCL to other Lisps. Also see the Quicklisp issue caused by the escaping:  https://github.com/quicklisp/quicklisp-client/issues/71.

Another argument is that software which relies on file types, e.g. Windows, handles files with multiple periods in the file name by looking at the last period. I.e. my.file.name.txt on Windows is a .txt file. Unix software, e.g. the Apache web server, uses the same approach. I would argue that resolving the ambiguity by treating the last period as the separator is an established practice. I suggest that CCL adopt this approach.

This approach implies that file types can not have a period.
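As a minimal sketch of the "last period is the separator" rule, the function below splits a bare file name (no directory part) into a name and a type. SPLIT-AT-LAST-PERIOD is a hypothetical name, and treating a leading period (as in ".emacs") as part of the name is an assumption made for this example, not something CCL or the spec prescribes.

```lisp
;; Sketch of the "last period wins" convention for a bare file name.
;; SPLIT-AT-LAST-PERIOD is a hypothetical helper, not a CCL function.
(defun split-at-last-period (filename)
  "Return (values NAME TYPE) for FILENAME, splitting at the last period.
TYPE is NIL when there is no period, or when the only period is the
first character (an assumption made here for dotfiles)."
  (let ((pos (position #\. filename :from-end t)))
    (if (or (null pos) (zerop pos))
        (values filename nil)
        (values (subseq filename 0 pos)
                (subseq filename (1+ pos))))))

;; Example: everything before the last period becomes the name.
(multiple-value-bind (name type) (split-at-last-period "my.file.name.txt")
  (format t "name ~S, type ~S~%" name type))
```

Under this rule a type can never contain a period, which matches the consequence noted above.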

One may consider an approach of not escaping periods in the file name, but escaping periods in the file type. I don't mind whether periods in the file type are escaped or not: it is unlikely I will ever see such a file. IMHO it isn't worth spending mental energy to implement escaping of periods in file types.

(Even if periods in file types are escaped, we can't be sure the backslash is the best choice. That would only be justified if we found an existing OS that really has a concept of file types, represents them by a period-separated extension of the file name, allows periods in the file types, and defines some escaping rule. Then we could follow that escaping rule in CCL and justify it as useful when interacting with that OS.)

comment:3 Changed 20 months ago by rme

When you want a namestring that you can pass to a foreign function or external program, the thing to use is ccl:native-translated-namestring.

(I'm not arguing for or against the current behavior of escaping dots; I'm just pointing out that there's a way to deal with the problem.)
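A small sketch comparing the two functions on the pathname from the ticket description follows. It is guarded with #+ccl because ccl:native-translated-namestring is CCL-specific; the exact strings printed depend on the CCL version and platform, so none are shown here.

```lisp
;; Sketch, CCL only: compare the Lisp-level namestring with the string
;; intended for foreign functions and external programs.
#+ccl
(let ((path (merge-pathnames "foo.xxx.lx86cl64" "/u/kaufmann/temp/")))
  ;; NAMESTRING may contain CCL's escaping of periods.
  (format t "namestring:                   ~A~%" (namestring path))
  ;; Per the comment above, this is the form to hand to the OS.
  (format t "native-translated-namestring: ~A~%"
          (ccl:native-translated-namestring path)))
```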

comment:4 Changed 20 months ago by avodonosov

Thanks!

comment:5 Changed 16 months ago by max

  • Keywords namestring added; merge-pathnames removed
  • Priority changed from normal to major
  • Milestone set to Clozure CL 1.10

I also support _not_ escaping the output of NAMESTRING. It serves no practical purpose as far as I can tell. Furthermore the current behaviour would force me to do the following:

  1. Write a compatibility function NATIVE-NAMESTRING which looks like this:

     (defun native-namestring (path)
       ;; On CCL, use the unescaped form; elsewhere NAMESTRING is fine.
       #+ccl (ccl:native-translated-namestring path)
       #-ccl (namestring path))

  2. Change every usage of NAMESTRING to use said compat function.

The flip side is that some code may depend on CCL's current behaviour, but I estimate that more code will be incidentally fixed than broken by this change.

I set the priority to major, because I think this saves work and confusion.
