Opened 6 years ago

Last modified 2 years ago

#1089 assigned defect

Cannot support non-ANSI (in this case, Chinese) path names

Reported by: ekd123
Owned by: gb
Priority: normal
Milestone:
Component: ANSI CL Compliance
Version: trunk
Keywords:
Cc:

Description

Clozure CL seems to have full Unicode support. Unfortunately, its pathname handling still falls short. Both problems below involve pathnames, so I believe something is wrong with CCL's pathname code. Maybe not, but they are definitely bugs.

(I don't know which "Component" this ticket should go under, so I chose "ANSI CL Compliance".)

Problem 1: CCL can't start properly if CCL is placed in a directory whose (full) pathname contains Chinese characters

Results in this case:

> Error: The value NIL is not of the expected type (OR STRING PATHNAME STREAM).
> While executing: PATHNAME-DIRECTORY, in process Initial(0).
> Type :GO to continue, :POP to abort, :R for a list of available restarts.
> If continued: Skip (possibly crucial) startup function CCL::INIT-LOGICAL-DIRECTORIES.
> Type :? for other options.
1 > :pop
> Error: There is no package named "ASDF/FIND-SYSTEM" .
> While executing: CCL::%FASL-NVPACKAGE, in process listener(1).
> Type :GO to continue, :POP to abort, :R for a list of available restarts.
> If continued: Retry finding package with name "ASDF/FIND-SYSTEM".
> Type :? for other options.
1 > :pop
Welcome to Clozure Common Lisp Version 1.9-r15757  (LinuxX8664)!
? 

Problem 2: Can't load a file whose (full) pathname contains Chinese characters

? (load "~/下載/hello.lisp")
> Error: File "~/下載/hello.lisp" does not exist.
> While executing: CCL::%LOAD, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > 

Okay, let's see what the shell shows:

mike@laptop ~/下載> cat hello.lisp 
(format t "hello 世界~%")

Well... what just happened? Let's see what happens when hello.lisp is placed in the home directory instead:

? (load "~/hello.lisp")
hello 世界
#P"/home/mike/hello.lisp"

Change History (6)

comment:1 Changed 6 years ago by gb

  • Owner set to gb
  • Status changed from new to assigned

You need to know how the filesystem encodes the pathname's characters; this varies from OS to OS. Windows stores pathnames as a sequence of little-endian UTF-16 code units terminated by a 0; OS X uses UTF-8 in a particular decomposed form terminated by a 0 byte. Linux filesystems generally just view filenames as 0-terminated sequences of bytes; what characters those bytes denote (i.e., how the characters are encoded) isn't specified by the OS. Chinese characters might be encoded in UTF-8 or in some legacy encoding, and AFAIK this is just a matter of local convention.

If you know how pathnames are encoded on your version of Linux, you can use (SETF (CCL:PATHNAME-ENCODING-NAME) encoding-name) to advise CCL of this. (If you don't know, I don't know of an alternative to guessing. I'd be glad to hear of a way for CCL to make an educated guess about this; the LC_CTYPE environment variable is often unreliable.) By default, CCL will assume that pathnames are encoded in ISO-8859-1, and of course that assumption leads to the kinds of errors you're seeing.
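
As a minimal sketch of that advice, assuming the filesystem stores names as UTF-8 (an assumption about the reporter's system, not something CCL detects), the following could be evaluated at the REPL or placed in an init file:

```lisp
;;; Sketch only: assumes pathnames on this system are encoded in UTF-8.
;;; Substitute another encoding name if the local convention differs.
(setf (ccl:pathname-encoding-name) :utf-8)

;; Read the setting back to check which encoding CCL will now assume.
(ccl:pathname-encoding-name)
```

With the encoding set this way, the (load "~/下載/hello.lisp") form from the description is worth retrying.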

Your first problem shows some issues with code that is apparently loaded from your init file; I don't know that it has anything to do with characters in the name of the directory that CCL started from.

(I'm claiming that environment variables like "LC_CTYPE" and "LANG" aren't reliable; that may be true, but they're presumably what the shell is using and it seems to work for you - so much so that you may not even realize that there's an issue here. If CCL used them and they weren't set correctly, then that wouldn't look like a CCL problem.)

comment:2 Changed 6 years ago by rme

Maybe a command-line option to the lisp kernel to specify the pathname encoding?

comment:3 Changed 6 years ago by ekd123

I think it would be better if CCL used UTF-8 as the default pathname encoding on non-Windows systems. After all, more and more systems use UTF-8 as their default encoding.

comment:4 follow-up: Changed 6 years ago by gb

It would be even better if it used what the filesystem used, wouldn't it?

comment:5 in reply to: ↑ 4 Changed 6 years ago by ekd123

Replying to gb:

It would be even better if it used what the filesystem used, wouldn't it?

Yes. If CCL can't infer it, it should fall back to UTF-8.

comment:6 Changed 2 years ago by rme

  • Milestone Clozure CL 1.9 deleted

