Ticket #358 (assigned defect)

Opened 3 months ago

Last modified 4 days ago

filesystem character encoding

Reported by: stassats
Assigned to: gb (accepted)
Priority: minor
Milestone:
Component: Runtime (threads, GC)
Version: trunk
Keywords:
Cc:

Description

CCL doesn't properly handle filenames containing Unicode characters beyond Latin-1.

DIRECTORY doesn't list the correct filenames, and OPEN can't access files with Unicode pathnames.

This happens even when -K utf-8 is supplied and *default-file-character-encoding* => :UTF-8, on both 64-bit and 32-bit Linux.

Change History

10/21/08 09:39:52 changed by gb

  • status changed from new to assigned.

To the best of my knowledge, Linux is completely ignorant of pathname encoding: a file or directory just has a NUL-terminated string for a name, and whether that's encoded in UTF-8 or ASCII or ... is up to the application. (This is in contrast to the approach taken by Darwin - for example - where filenames are internally represented in a kind of weird decomposed UTF-8, regardless of the filesystem. Windows uses UTF-16; FreeBSD either uses UTF-8 or plans to standardize on that in the near future.)
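As a rough illustration of the Linux behavior described above, here is a minimal Python sketch (not CCL code). It reads directory entries as the raw bytes the kernel stores and then interprets them under an encoding chosen by the application. The helper names decode_name, list_raw_names and demo are hypothetical, and the demo assumes a Linux filesystem that accepts arbitrary byte strings as names.

```python
import os
import tempfile


def decode_name(raw, encoding):
    """Interpret raw filename bytes under an encoding the application assumes."""
    return raw.decode(encoding, errors="replace")


def list_raw_names(directory):
    """Return directory entries as raw bytes, with no decoding applied."""
    # Passing a bytes path makes os.listdir return bytes names.
    return sorted(os.listdir(os.fsencode(directory)))


def demo():
    """Create one file with a UTF-8 encoded name and show two readings of it."""
    with tempfile.TemporaryDirectory() as d:
        raw = "файл.txt".encode("utf-8")
        with open(os.path.join(os.fsencode(d), raw), "wb"):
            pass
        for name in list_raw_names(d):
            # Same bytes on disk; only the assumed encoding differs.
            print(name, decode_name(name, "utf-8"), decode_name(name, "latin-1"))
```

Calling demo() prints the same stored bytes decoded two ways. Only the UTF-8 reading gives back the original name, which is consistent with the symptom reported in this ticket if filenames are being read as Latin-1.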

Even if there's no good way for OPEN or DIRECTORY or ... to guess what encoding's in use, there should at least be a default for Linux and other OSes that don't impose or follow encoding conventions.

(Whatever that's called - perhaps *DEFAULT-PATHNAME-ENCODING* - the way that a pathname is encoded doesn't generally have anything to do with how the file's contents are encoded.)

10/21/08 09:48:52 changed by stassats

Maybe it's reasonable to determine the default encoding from the value of LC_CTYPE or LANG.
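A minimal sketch of that idea, in Python rather than CCL: it assumes the usual POSIX lookup order (LC_ALL, then LC_CTYPE, then LANG) and the locale name format language[_territory][.codeset][@modifier]. The function name encoding_from_locale and the "utf-8" fallback are assumptions made for illustration, not anything CCL defines.

```python
import os


def encoding_from_locale(env=None, default="utf-8"):
    """Guess a filename encoding from locale environment variables.

    The first non-empty variable among LC_ALL, LC_CTYPE and LANG decides.
    If it names no codeset (for example "C"), the assumed default is used.
    """
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # Drop any "@modifier", then take whatever follows the first ".".
            codeset = value.partition("@")[0].partition(".")[2]
            return codeset.lower() if codeset else default
    return default
```

For example, encoding_from_locale({"LANG": "ru_RU.KOI8-R"}) returns "koi8-r". A guess like this only reflects the user's current locale; files created under a different locale may still have names in another encoding.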

01/03/09 14:56:06 changed by rme

r11200 and r11202 add some support for specifying the encoding used for filenames.