Opened 5 years ago

#1326 new defect

ccl:run-program breaks on Solaris when portable allegroserve is loaded and running

Reported by: hhdave
Owned by:
Priority: normal
Milestone:
Component: Runtime (threads, GC)
Version: trunk
Keywords: fork forkall solaris run-program threads
Cc:

Description

On a Solaris system (sx86cl64 runtime), strange things happen when you use ccl:run-program to execute another program, but only if you have first evaluated (ql:quickload :aserve) and (net.aserve:start :port 8080 :server :new). Specifically, a great many messages like the following appear in the output of run-program, and sometimes the exec seems to fail in the child process:

3-aserve-worker: 11/19/15 - 11:50:50 - got error :SITUATION is an invalid initarg to INITIALIZE-INSTANCE for #<STANDARD-CLASS SOCKET-ERROR>.
Valid initargs: (:STREAM :ACTION :CODE :IDENTIFIER).

aserve-accept-6: 11/19/15 - 11:50:50 - accept: error 0 on accept :SITUATION is an invalid initarg to INITIALIZE-INSTANCE for #<STANDARD-CLASS SOCKET-ERROR>.
Valid initargs: (:STREAM :ACTION :CODE :IDENTIFIER).

Obviously the specific error message above is due to acl-compat clobbering the CCL SOCKET-ERROR class, which is a problem in itself, but the bigger problem is the strange behaviour of ccl:run-program, which happens on Solaris only.

When I discussed this with rme on IRC, he pointed me to this changeset, in which it was decided to use forkall on Solaris instead of fork: http://trac.clozure.com/ccl/changeset/11176

I tried simply switching to fork on Solaris, but that seems to deadlock when calling run-program, so I guess that deadlock is the issue which the changeset addressed. (cf. https://docs.oracle.com/cd/E18752_01/html/816-5137/gen-1.html and search for 'deadlocks'.)

Nonetheless, by cloning all the threads in the child, the Solaris version is doing something distinctly different from Darwin and, presumably, Linux. The Oracle docs linked above mention interesting problems that can happen when another whole set of running threads appears with access to the same descriptors, so I don't think the Solaris version is really doing the Right Thing.

So I decided to try to fix the problem addressed by that changeset in a different way: by setting up argv in the thread which will do the exec, rather than in the other thread. I have attached (hopefully) the two changed function definitions. From initial (very quick) testing this seems to work fine, but I haven't tested it very hard.
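For illustration only, the general idea of building argv on the side that performs the exec can be sketched as below. This is a Python analogy and not the attached Lisp fix or CCL's implementation; the helper name run_program_sketch and the exit code 127 for a failed exec are assumptions made for the example.

```python
import os


def run_program_sketch(path, args):
    """Fork, build argv in the child, exec, and return the child's exit code.

    Hypothetical helper for illustration; not part of CCL or the attachment.
    """
    pid = os.fork()
    if pid == 0:
        # Child: argv is constructed here, by the same thread that calls exec,
        # so it does not rely on work done by any other thread in the parent.
        try:
            argv = [path] + list(args)
            os.execv(path, argv)
        except BaseException:
            pass
        # Reached only if the exec failed; exit without running parent cleanup.
        os._exit(127)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    print(run_program_sketch("/bin/ls", ["/"]))
```

The sketch only shows where argv is built relative to the fork. It does not reproduce the fork versus forkall distinction on Solaris, which has to be dealt with in the CCL runtime itself.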

To replicate, on a Solaris system:

(ql:quickload :aserve)
(net.aserve:start :port 8080 :server :new)
(ccl:run-program "/bin/ls" '("/"))

Attachments (1)

solaris-run-program-fix.lisp (6.3 KB) - added by hhdave 5 years ago.
suggested fix


Change History (1)

Changed 5 years ago by hhdave

suggested fix
