Ticket #1015 (closed defect: fixed)
user code is forced to have a race condition between ccl:external-process-status and ccl:signal-external-process
|Reported by:||avodonosov||Owned by:|
The use case.
If the external process I started exceeds a timeout, I try to kill it by sending it signal 9 using ccl:signal-external-process.
To detect that the process has exceeded the timeout, I poll it with ccl:external-process-status and check whether the result is :running.
I.e. my code may approximately be represented as:
    (when (and (eq :running (ccl:external-process-status lisp-process))
               (< end-time (get-universal-time)))
      (ccl:signal-external-process lisp-process 9))
But sometimes the process finishes right after I get :running from (ccl:external-process-status lisp-process), but before (ccl:signal-external-process lisp-process 9) sends the signal.
In this case, CCL signals SIMPLE-ERROR with text "No such process".
I cannot distinguish this error, which only means the process has already finished, from other errors: the CCL API doesn't specify a particular condition type for this situation, and I can't rely on the text "No such process" either (I believe the text is the Linux message for the corresponding errno value).
So I set up a condition handler, and in the handler I check external-process-status again. If it is not :running, I deduce that the condition was signaled because the process has terminated (which is what my code wanted when it tried to kill it, so I can just continue normal processing).
In most cases it works.
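The handler approach described above can be sketched roughly as follows. This is only an illustration: kill-if-timed-out and its return values are hypothetical names, not part of CCL, and the handler catches ERROR in general because the API gives no more specific condition type to match on.

```lisp
;; Hypothetical helper (not part of CCL): send signal 9 to PROC once
;; END-TIME (a universal time) has passed and PROC still reports :running.
(defun kill-if-timed-out (proc end-time)
  (when (and (eq :running (ccl:external-process-status proc))
             (< end-time (get-universal-time)))
    (handler-case
        (progn
          (ccl:signal-external-process proc 9)
          :killed)
      (error (e)
        ;; Re-check the status to guess why the signal failed.
        (if (eq :running (ccl:external-process-status proc))
            ;; Still reported as :running, so the cause is unknown:
            ;; pass the original condition on to the caller.
            (error e)
            ;; The process ended on its own between the check and the kill.
            :already-exited)))))
```

The function returns NIL when no kill was attempted, :killed when the signal was sent, and :already-exited when the error is attributed to the process having finished.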
But sometimes, after the "No such process" error was signaled and the handler calls external-process-status, external-process-status returns :running. I assume this is because the external process is monitored by an internal CCL background thread, and the background thread hasn't yet synchronized the external-process CCL structure inspected by external-process-status.
As you see, it's not easy to reliably kill an external process on timeout. The API design forces the user to deal with these race conditions.
I am not ready to propose a solution.
Maybe synchronization of the external-process object with the real situation before signaling the error would be a good idea?
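Until something like that exists inside CCL, user code could approximate it by giving the status a bounded amount of time to catch up after a failed signal. The sketch below is a user-level workaround under that assumption, not a description of what CCL does internally; wait-until-not-running and kill-tolerating-exit are hypothetical names, and the attempt count and interval are arbitrary.

```lisp
;; Hypothetical helper: poll PROC's status up to ATTEMPTS times, sleeping
;; INTERVAL seconds between polls. Returns T once the status is no longer
;; :running, or NIL if it still is after the last attempt.
(defun wait-until-not-running (proc &key (attempts 50) (interval 0.1))
  (loop repeat attempts
        when (not (eq :running (ccl:external-process-status proc)))
          return t
        do (sleep interval)
        finally (return nil)))

;; Hypothetical helper: send signal 9 to PROC. If signaling fails, the error
;; is swallowed only when the status leaves :running within the polling
;; window; otherwise the original condition is re-signaled.
(defun kill-tolerating-exit (proc)
  (handler-case (ccl:signal-external-process proc 9)
    (error (e)
      (unless (wait-until-not-running proc)
        (error e))))
  ;; Whether or not the signal was delivered, wait for the status to settle.
  (wait-until-not-running proc))
```

The trade-off is that a genuine, unrelated error is reported only after the polling window has elapsed (about five seconds with the defaults shown).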
- Status changed from reopened to closed
- Resolution set to fixed