Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#1292 closed enhancement (wontfix)

Error in %MAKE-READTABLE-ITERATOR

Reported by: fare Owned by:
Priority: normal Milestone:
Component: other Version: trunk
Keywords: Cc:

Description

I recently recompiled my ccl from svn trunk, and got this error:

ccl --eval '(require "asdf")' --eval '(asdf:make "named-readtables")'

Error: Undefined function CCL::RDTAB.ALIST called with arguments (#<error printing READTABLE #x30200095218D>) . While executing: %MAKE-READTABLE-ITERATOR, in process listener(1).

Apparently, named-readtables relied on the old rdtab.alist to build a %make-readtable-iterator.

Can you provide some API that iterates over your sparse-vector and/or your readtables?

Here is the corresponding issue in said library: https://github.com/melisgl/named-readtables/issues/5

Change History (9)

comment:1 Changed 4 years ago by gb

There was never any public interface to the old implementation either and when the implementation changed code that depended on implementation details broke.

My personal belief is that that's exactly what should happen to such code.

The issue isn't one of providing a public inerface to sparse vectors or not (hint: read the source. there isn't that much of it. Think a little.) A CL:SIMPLE-VECTOR would be less ... sparse, but would otherwise be a reasonable data structure to map fom characters/codes to functions. Exactly how CCL uses whatever data structure it usesnnis private - not because it's some sort of big secret or something but because it's something that an implementation should be free to change at any time. If that means that MAKE-READTABLE-ITERATOR stops working, feel free to blame me for that.

comment:2 Changed 4 years ago by fare

It's quite OK to break this internal data structure and provide a better one.

But there ought to be a better way of identifying macro-characters than iterating over one million characters and querying the readtable for each one.

Could CCL export an interface to iterate over the interesting parts of a readtable?

comment:3 Changed 4 years ago by gb

COPY-READTABLE copies the parts of the sparse vector that are actually present. It uses a simple but private interface to sparse vectors. That interface will likely remain private, whether you agree with or understand the reasons for keeping it that way or not.

If you want to iterate over the conents of the data structure that's used to store macro characters in the current trunk and understand that you'd be using a private interface to do so and understand the consequences of doing so, go ahead.

I think that there are good reasons for keeping some implementation details private. I'm not trying to be rude (if I was, I'd do a better job of it) but I haven't heard anything but requests to make a supported, public interface to things that I don't believe should have such an interface, and I don't see any reason to change my mind about that.

If you want to know more details about the current implementation and understand that those details are subject to change, fine. That isn't what I hear you asking for, and I have alredy explained why I don't want to provide what you think you want.

Again, there was never any sort of "public inerface" to the old implementation. That didn't stop whoever wrote %MAKE-READTABLE-ITERATOR from poking around in the details enough to write a version of that function that broke when the implementation changed. If you want to poke around in the new implementation and understand that the implemenatation might change, I have no objection to that. If you want to know what macro-characters are defined in a given readtable, you can obviously iterate over all codea from 0 to CHAR-CODE-LIMIT and call GET-MACRO-CHARACTER yourself. If you think that that would be too slow, the obvious thing to speed it up would be to speed up GET-MACRO-CHARACTER, not to invent some new interface that's sort of like it. GET-MACRO-CHARACTER can and should exploit whatever implementation details it wants to.

comment:4 Changed 4 years ago by fare

The problem is not that GET-MACRO-CHARACTER is slow. The problem is that 1,114,112 code points is a LOT of characters to iterate over and that iterating over an entire charset just to find the few interesting entries in the READTABLE is not going to be very efficient.

I submitted a patch to named-readtables to make it aware of your sparse-vector data structure.

comment:5 Changed 4 years ago by alms

On my laptop it takes 0.03 seconds to call GET-MACRO-CHARACTER on every character up to CHAR-CODE-LIMIT. I guess it depends on how often you run through that loop. But if it's something that happens once or twice when you're setting up your system, saving some portion of 0.03 seconds doesn't seem worth the cost of non-portable code. But YMMV I guess.

comment:6 Changed 4 years ago by gb

The work done by GET-MACRO-CHARACTER depends on the structure and contents of the readtable, We can argue about whether calling that function ~1000000 times is or is not "fast enough" in some particilar use case, but that isn't really relevant. I don't think that there is any other way to write COUNT-MACRO-CHARACERS in portable CL withot callinf GET-MACRO-CHARACTER for every character for every possible macro character, and whether that's "fast enouugh" for a particular case depends on a lot of things.. For many users, it pretty much has to be, because they recognize that depending on internal implementation details (and depending on those details not changing) is a bad idea.

Even if I wanted to provide some (largely imaginary) public interface to the current implementation details, I don't have time and I don't know that anyone else does either. If a paying customer needed that and unerstood the risks and tradeoffs involved, i might feel differently. I don't know of any paying customer who would want to invest money on something that they know is likely to break and which they don't have a great need for, and that seems like it should be easier for anyone to undrstand than it has been so far.

comment:7 Changed 4 years ago by fare

I wrote the following function and submitted it to named-readtables, which fixes it. I believe it would be better and easier to maintain in the face of representation change if it lived on the CCL side rather than as a #+clozure function in named-readtables.

(defun %make-readtable-iterator (readtable)
  (flet ((ensure-alist (x)
           (etypecase x
             (list x)
             (ccl::sparse-vector
              (let ((table (ccl::sparse-vector-table x)))
                (loop for i below (length table) for v = (svref table i) when v append
                  (loop with i8 = (ash i 8) for j below (length v) for datum = (svref v j)
                     when datum collect (cons (code-char (+ i8 j)) datum))))))))
    (let ((char-macros
            (ensure-alist (ccl::rdtab.macros readtable))))
      (lambda ()
        (if char-macros
            (destructuring-bind (char . defn) (pop char-macros)
              (if (consp defn)
                  (values t char (car defn) t (ensure-alist (cdr defn)))
                  (values t char defn nil nil)))
            (values nil nil nil nil nil))))))

That said, I suppose it is the whole point of libraries like named-readtables, bordeaux-threads, cffi, etc., to extend Common Lisp in a way that papers over the idiosyncrasies of all the implementations.

comment:8 Changed 4 years ago by gb

  • Resolution set to wontfix
  • Status changed from new to closed

I've thought about it some more, and (though I appeciate your effort) I really don't want to add your code to CCL. As far as I can tell, it deals with both the old and new internal readtable details correctly.

1) libraries like CFFI and Bordeaux Threads depend on implentation-dependent but public APIs that were public before those libraries existed. That is not the case here.

2) CCL has no (other) reason to support the old (alist-based) data structures. Your library has to run in older versions, as well as well as current and future versions.

When the change was first made, I left some of the old code in place, even though all readtables in the image used the new representation and there was no easy way to use the old representation anymore. Some code distributed with CCL - on which some commercial products is based - assumed the old representation and broke. Someone tried to fix that by making some internal function deal with both the old and new representations, and that wasn't really possible, but that wan't clear because the code seemed to still support the old alists. The CCL code that seemed to deal with alist-based readtables has been removed and the cofusion that that presence of that code caused seems to have gone with it.

3) I want to be able to make chnges like this in the future, and I want to be as free as possible to do so. The fact that this change broke some library that depends on private implementation details is unfortunate, but it's neither a reason not to make changes like this in the future nor a reason to make those details public. Public interfaces are intentionally stable and difficult to change.

I don't think that the readtable implementation in CCL has changed in a long time. You were presumably depending on it not changing. No that it has changed, you've decided that it "should" have a public interface to shield your library from future changes. Around the time of the last CCL release, some widely-used library broke because that library was using code lifted from the CCL compiler sources and that code changed. Breaking that library was unfortunate for people that used it, but (fortunately) no one thought that that part of the compiler should offer a stable public interface to shield that library. If you need to find all if the macro characters on a readtable, you know how to do that in the current implementation. There is no guarantee that the implementation won't change, and there never was. Your library needs to support both the old readtable implementation and new one. CCL doesn't.

The library that was (mis)using compiler details is widely used and I'm sure that iy is useful to people that use it, I'm sure that the same things are true of the library that you maintain as well. I don't see any difference between your case and the libary that depended on other implemenration details that might change.

comment:9 Changed 4 years ago by fare

OK. No problem. Thanks a lot for your attention. If you make further changes to this representation, a fair warning is appreciated, but then again, we'll find out pretty quick anyway, I suppose.

Note: See TracTickets for help on using tickets.