#756 closed defect (invalid)
read-line() broken for utf-16 and ucs-2
Reported by: | rigus | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Compiler | Version: | trunk |
Keywords: | | Cc: | |
Description
I get a crash when executing something like
(with-open-file (stream ucs-2-file :external-format :ucs-2) (print (read-line stream)))
The same for :utf-16. read-char() is OK.
- Paul
Change History (5)
comment:1 Changed 10 years ago by gb
comment:2 Changed 10 years ago by rigus
Your example works fine for me. My guess is that it fails for files that lack a BOM (U+FEFF): in that case either CCL loops forever while consuming RAM, or Emacs (running SLIME) crashes. If I recall correctly, BOM usage is optional.
comment:3 Changed 10 years ago by rigus
I get the crash with a really big file (the one I originally ran into the problem with). When I modify your file and remove the BOM, I get:
Unexpected end of file on #<BASIC-FILE-CHARACTER-INPUT-STREAM ("home:example-no-bom.txt"/7 UCS-2) #x302000DBDE6D>, near position 158 [Condition of type END-OF-FILE]
comment:4 Changed 10 years ago by gb
- Resolution set to invalid
- Status changed from new to closed
BOM usage isn't exactly "optional": if you say that a file is encoded in UCS-2 (or UTF-16), you're saying that the file either begins with a BOM or is implicitly big-endian (UCS-2BE, UTF-16BE.) If it's in fact UCS-2LE, then the first call to READ-LINE will likely try to read the entire file into a string whose characters have byte-reversed codes and return a second value of T, and subsequent calls will immediately return EOF.
(See http://tools.ietf.org/html/rfc2781. Note that http://en.wikipedia.org/wiki/UTF-16/UCS-2 claims that some Windows software assumes little-endian encoding when no BOM is present, and this behavior may be what you expect.)
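To make the byte-reversal point concrete, here is a minimal sketch in portable Common Lisp of what misreading the byte order does to a single 16-bit code unit. `swap-ucs-2-bytes` is a hypothetical helper written only for illustration; it is not part of CCL.

```lisp
;; Hypothetical helper for illustration only; not a CCL function.
(defun swap-ucs-2-bytes (code)
  "Return the 16-bit code unit CODE with its high and low bytes exchanged."
  (logior (ash (logand code #xFF) 8)   ; low byte moves to the high position
          (ldb (byte 8 8) code)))      ; high byte moves to the low position

;; #\Newline is U+000A. Stored little-endian it is the bytes 0A 00, which a
;; big-endian decoder reads as the code unit #x0A00.
(format t "~4,'0X~%" (swap-ucs-2-bytes #x000A))
```

A byte-swapped U+000A is no longer a line terminator, so a decoder using the wrong byte order has nothing to stop on before end of file. That would be consistent with the first READ-LINE consuming the whole file.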
If you have data that's encoded as UCS-2LE (or UTF-16LE) with no BOM, you don't want to claim that it's UCS-2/UTF-16. E.g., you want to say:
(with-open-file (f path :external-format :ucs-2le) (read-line f) ...)
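If you don't know ahead of time whether a file carries a BOM, one option is to peek at its first two bytes before choosing the external format. The following is only a sketch: `guess-ucs-2-format` is a hypothetical name, and treating BOM-less files as little-endian is an assumption about your data, not something that can be inferred from the file.

```lisp
;; Hypothetical helper; the little-endian fallback is an assumption.
(defun guess-ucs-2-format (path)
  "Return :UCS-2 if the file at PATH starts with a BOM, else :UCS-2LE."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    ;; READ-BYTE returns NIL at end of file instead of signaling an error.
    (let ((b0 (read-byte in nil nil))
          (b1 (read-byte in nil nil)))
      (if (or (and (eql b0 #xFE) (eql b1 #xFF))   ; big-endian BOM
              (and (eql b0 #xFF) (eql b1 #xFE)))  ; little-endian BOM
          :ucs-2      ; a BOM is present, so the decoder can use it
          :ucs-2le)))) ; no BOM: assume little-endian data
```

It could then be used as `(with-open-file (f path :external-format (guess-ucs-2-format path)) (read-line f))`. For UTF-16 data the same check applies, with `:utf-16` and `:utf-16le` substituted for the two keywords.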
comment:5 Changed 10 years ago by rigus
Sorry for my confusion, and thanks for the clarification. This makes sense.
- Paul
What crash or error do you get?
prints the 3 lines of text that were written to the file.