Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#1015 closed defect (fixed)

User code is forced into a race condition between ccl:external-process-status and ccl:signal-external-process

Reported by: avodonosov Owned by:
Priority: normal Milestone:
Component: IDE Version: trunk
Keywords: Cc:


The use case.

If an external process I started exceeds a timeout, I try to kill it by sending it signal 9 (SIGKILL) with ccl:signal-external-process.

To detect that the process has exceeded the timeout, I poll it with ccl:external-process-status and check whether the result is :running.

Roughly, my code looks like this:

  (when (and (eq :running
                 (ccl:external-process-status lisp-process))
             (< end-time (get-universal-time)))
    (ccl:signal-external-process lisp-process 9))

But sometimes the process finishes right after (ccl:external-process-status lisp-process) returns :running and before (ccl:signal-external-process lisp-process 9) sends the signal.

In this case, CCL signals SIMPLE-ERROR with text "No such process".

I cannot distinguish this error (the process has already finished) from other errors: the CCL API doesn't specify a particular condition type for this situation, and I can't rely on the text "No such process" either (I believe it is the Linux message for the corresponding errno value, ESRCH).

So I set up a condition handler, and in the handler I check external-process-status again. If it is not :running, I conclude that the condition was caused by the process having terminated (which is what my code wanted anyway, so it can just continue normal processing).
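The handler-based workaround described above might look roughly like the following sketch. It assumes Clozure CL (the `ccl` package), and the helper name `kill-if-timed-out/workaround` is hypothetical. As the next paragraphs explain, it is still racy: the status check inside the handler can run before CCL has noticed that the process exited.

```lisp
;;; Sketch of the handler-based workaround (hypothetical helper name).
;;; Assumes Clozure CL; still subject to the race described in this ticket.
(defun kill-if-timed-out/workaround (process end-time)
  "Send signal 9 to PROCESS if it is :RUNNING and END-TIME (a universal
time) has passed.  Returns true if the signal was sent, NIL otherwise."
  (when (and (eq :running (ccl:external-process-status process))
             (< end-time (get-universal-time)))
    (handler-case (ccl:signal-external-process process 9)
      (error (e)
        ;; No specific condition type is documented, so re-check the status.
        ;; If the process is no longer :RUNNING, assume the error only means
        ;; it already exited, which is what we wanted anyway.
        (if (eq :running (ccl:external-process-status process))
            ;; The status may still read :RUNNING here if CCL's monitor
            ;; thread has not yet updated PROCESS, so this branch can
            ;; re-signal a spurious "No such process" error.
            (error e)
            nil)))))
```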

In most cases it works.

But sometimes, after the "No such process" error is signaled, external-process-status called from the handler still returns :running. I assume this is because the external process is monitored by an internal CCL background thread, which hasn't yet updated the external-process structure that external-process-status inspects.

As you can see, it's not easy to reliably kill an external process on timeout. The API design forces the user to deal with these race conditions.

I am not ready to propose a solution.

Maybe synchronizing the external-process object with the actual state of the process before signaling the error would be a good idea?

Change History (4)

comment:1 Changed 8 years ago by gb

  • Resolution set to fixed
  • Status changed from new to closed

(In [15477]) COPY-LIST: if we find that the list is "fairly long", check to see if it's "very long" and change algorithm if so.

CHECK-SEQUENCE-BOUNDS: take length as an &optional arg, since some callers may want to avoid doing LENGTH multiple times.

CONSTANTLY: return #'TRUE or #'FALSE if appropriate.

REMOVE, REMOVE-IF, REMOVE-IF-NOT: build result, don't do destructive operations on copy. (Fixes ticket:1015 in the trunk, though other sequence functions may do similar things.)

comment:2 Changed 8 years ago by gb

  • Resolution fixed deleted
  • Status changed from closed to reopened

r15477 purports to fix ticket:1014, not this ticket.

comment:3 Changed 8 years ago by gb

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [15494]) SIGNAL-EXTERNAL-PROCESS now takes an :ERROR-IF-EXITED keyword argument. If attempting to signal the process fails because the PID has already exited and ERROR-IF-EXITED is false, returns NIL rather than signaling an error. ERROR-IF-EXITED defaults to T (which is backwards-compatible with traditional behavior); whether this case is treated as an error or quietly ignored depends on application context.

Fixes ticket:1015 in the trunk; this change should be documented.
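With this change, the timeout check from the original report no longer needs a condition handler. A minimal sketch, assuming a CCL trunk build that includes r15494; the helper name `kill-if-timed-out` is hypothetical:

```lisp
;;; Sketch: timeout kill using :ERROR-IF-EXITED (requires CCL with r15494).
(defun kill-if-timed-out (process end-time)
  "Send signal 9 to PROCESS if it is :RUNNING and END-TIME (a universal
time) has passed.  Returns true if the signal was sent, NIL otherwise."
  (when (and (eq :running (ccl:external-process-status process))
             (< end-time (get-universal-time)))
    ;; If PROCESS exits between the status check and the kill, the kill
    ;; fails because the PID is gone; with :ERROR-IF-EXITED NIL that case
    ;; returns NIL instead of signaling "No such process".
    (ccl:signal-external-process process 9 :error-if-exited nil)))
```

The keyword is passed explicitly because, per the commit message above, ERROR-IF-EXITED defaults to T for backward compatibility.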

comment:4 Changed 8 years ago by rme

(In [15497]) Document new :error-if-exited keyword argument to the signal-external-process function. See ticket:1015.
