Ticket #1052 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

Foreign callbacks into a thread which is about to die

Reported by: stassats
Owned by:
Priority: normal
Milestone:
Component: Runtime (threads, GC)
Version: trunk
Keywords:
Cc:

Description

Qt-webkit uses pthread_key_create with a destructor, and inside that destructor it calls back into lisp. But apparently the lisp thread infrastructure has already been dismantled by the time the destructor is called.

Here's a test-case:

// gcc test.c -lpthread -shared -o test.so -fPIC

#include <pthread.h>

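/* Key destructor: runs while the thread is exiting and calls back into lisp. */
void finish(void* arg) {
  int (*f)(int, int) = (int(*)(int,int))arg;
  f(1,2);
}

/* Stores the callback under a new key so that finish() runs at thread exit. */
int test (int (*f)(int, int))
{
  pthread_key_t key;
  pthread_key_create(&key, finish);
  pthread_setspecific(key, f);
  return 0;
}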

(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system :cffi))

(cffi:load-foreign-library "/tmp/test.so")

(cffi:defcfun (test-c "test") :int
  (function :pointer))

(cffi:defcallback (test-callback)
    :short ((a :int) (b :int))
  (print (list a b))
  (finish-output)
  1)

(defun test ()
  (ccl:process-run-function
   nil
   (lambda () (test-c (cffi:get-callback 'test-callback)))))

(test)

One idea I had for fixing this is creating a new thread-key with a value of 1. The order of destructor execution is not specified, but it is specified that destructors are called again, up to PTHREAD_DESTRUCTOR_ITERATIONS times, if the key still has a value. So, when the destructor is called the first time, it'll just set its value to 0 and do nothing, then when it's called with 0, do thread clean up. That way the destructors of other keys will get a chance to be called on the first iteration before the clean up is completed.
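The deferral idea can be sketched in C roughly as follows. This is a minimal illustration, not Clozure CL code: cleanup_key, other_key, the sentinels and the event log are all hypothetical names. One assumption worth stating: POSIX resets a key to NULL before invoking its destructor and only calls destructors again for keys that have become non-NULL, so the sketch re-arms the key with a second non-NULL sentinel instead of literally storing 0.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names throughout; this only models the deferral idea. */
enum { OTHER_RAN = 1, CLEANUP_DEFERRED = 2, CLEANUP_DONE = 3 };

static pthread_key_t cleanup_key;   /* stands in for the runtime's own key */
static pthread_key_t other_key;     /* stands in for a library key */
static char first_pass, second_pass; /* distinct non-NULL sentinel values */
static int events[8];               /* order in which the destructors ran */
static int n_events;

static void record_event(int event) {
    if (n_events < 8)
        events[n_events++] = event;
}

static void other_destructor(void *value) {
    (void)value;
    record_event(OTHER_RAN);        /* a real library might call back here */
}

static void cleanup_destructor(void *value) {
    /* value is the old key value; the key itself is already NULL here. */
    if (value == &first_pass) {
        /* First round: re-arm the key so we are called again, do nothing. */
        pthread_setspecific(cleanup_key, &second_pass);
        record_event(CLEANUP_DEFERRED);
    } else {
        /* Later round: every other key has had one chance to run. */
        record_event(CLEANUP_DONE);
    }
}

static void *worker(void *unused) {
    (void)unused;
    pthread_setspecific(cleanup_key, &first_pass);
    pthread_setspecific(other_key, &first_pass);
    return NULL;                    /* destructors run during thread exit */
}

/* Returns 1 if the real cleanup ran after the other key's destructor. */
int run_two_pass_demo(void) {
    pthread_t thread;
    int rc, i, other_at = -1, done_at = -1;

    n_events = 0;
    rc = pthread_key_create(&cleanup_key, cleanup_destructor);
    assert(rc == 0);
    rc = pthread_key_create(&other_key, other_destructor);
    assert(rc == 0);
    rc = pthread_create(&thread, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(thread, NULL);

    for (i = 0; i < n_events; i++) {
        if (events[i] == OTHER_RAN) other_at = i;
        if (events[i] == CLEANUP_DONE) done_at = i;
    }
    return other_at >= 0 && done_at > other_at;
}
```

Whichever order the implementation picks in the first round, the real cleanup lands in a later round. The scheme only protects destructors that finish their work in the first round; a key that defers its own work the same way could still run after the cleanup.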

Change History

comment:1 Changed 2 years ago by gb

If I wanted to install CFFI, I could presumably make a guess as to how this fails. I'd rather not have to install CFFI ...

At the lisp kernel level, a thread is represented by a data structure called a Thread Context Record (TCR). When a thread is created by Lisp code, it creates/initializes a TCR and tries to allocate lisp stacks and other thread-specific things, then it adds its TCR to a global list and goes off and runs lisp code. When that code returns, the thread deallocates those stacks and other thread-specific things, removes the TCR from the global list, and (on some platforms) deallocates the TCR. (On most flavors of Unix other than Darwin, the TCR is in thread-local storage and the threads library/kernel will deallocate it later.) We also try to use the pthread_key_create() mechanism, but (IIRC) that's really only used if the thread doesn't exit normally.

A thread needs to have a TCR in order to run lisp code. (Among other things, the TCR's presence on a list of TCRs enables the GC to find it and control it.) When a thread calls back into lisp, it (generally) calls a function called get_tcr(true) to find and if necessary create its TCR. If the TCR didn't exist (as could be the case if the thread was created by foreign code and had not previously called into lisp), it'll be created and some other callbacks will be invoked to ensure that when the thread runs the (original) callback it'll do so with a lisp PROCESS object in effect and with a standard set of special variable bindings in effect. (Among other things, that set of bindings will include *CURRENT-PROCESS*.)
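The find-or-create behaviour described here can be modelled with a small C sketch. It is an assumption-laden illustration rather than the kernel's code: thread_ctx, get_ctx, all_ctxs and ctx_list_lock are hypothetical stand-ins for the TCR, get_tcr(), the global TCR list and its lock, and the real TCR carries far more state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TCR. */
typedef struct thread_ctx {
    struct thread_ctx *next;   /* link in the global list */
    pthread_t owner;           /* thread this record belongs to */
} thread_ctx;

static pthread_mutex_t ctx_list_lock = PTHREAD_MUTEX_INITIALIZER;
static thread_ctx *all_ctxs;                    /* list a GC could walk */
static _Thread_local thread_ctx *current_ctx;   /* this thread's record */

/* Return the calling thread's record, creating and registering it on
   first use when create is non-zero. */
thread_ctx *get_ctx(int create) {
    thread_ctx *ctx = current_ctx;
    if (ctx == NULL && create) {
        ctx = calloc(1, sizeof *ctx);
        if (ctx == NULL)
            return NULL;
        ctx->owner = pthread_self();
        pthread_mutex_lock(&ctx_list_lock);
        ctx->next = all_ctxs;
        all_ctxs = ctx;
        pthread_mutex_unlock(&ctx_list_lock);
        current_ctx = ctx;
    }
    return ctx;
}

/* Number of records currently on the global list. */
int count_ctxs(void) {
    int n = 0;
    pthread_mutex_lock(&ctx_list_lock);
    for (thread_ctx *c = all_ctxs; c != NULL; c = c->next)
        n++;
    pthread_mutex_unlock(&ctx_list_lock);
    return n;
}

/* Models a thread created by foreign code making two callbacks. */
static void *foreign_thread(void *out) {
    thread_ctx *first = get_ctx(1);    /* created on the first callback */
    thread_ctx *second = get_ctx(1);   /* found on the second */
    *(int *)out = (first != NULL && first == second);
    return NULL;
}

/* Returns 1 if both callbacks saw the same, single registered record. */
int run_get_ctx_demo(void) {
    pthread_t thread;
    int same = 0;
    int rc = pthread_create(&thread, NULL, foreign_thread, &same);
    assert(rc == 0);
    pthread_join(thread, NULL);
    return same && count_ctxs() == 1;
}
```

The sketch never unlinks or frees a record; in the description above that is the job of the normal thread-exit path or of the pthread key destructor.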

My model of how things should work in your example is that the thread exits normally and should clean up its TCR, then the callback is invoked via the pthread key finalization mechanism. The callback glue should call get_tcr(true), get_tcr() should realize that the TCR doesn't exist, and things should work as they do in the case where a callback is invoked from a "foreign" thread (one that lisp didn't create): the code should run on a new PROCESS object, and the new TCR should eventually get cleaned up by the pthread key termination mechanism.

Ah, now I get it. In the function shutdown_thread_tcr() (in lisp-kernel/thread_manager.c), right after

    UNLOCK(lisp_global(TCR_AREA_LOCK),current);

try adding

#ifdef HAVE_TLS
    current_tcr = NULL;
#endif

As is, on platforms that use the toolchain's thread-local-storage mechanism (HAVE_TLS is defined), the callback will call get_tcr(), which will return the stale value of current_tcr. We've just spent a few screens full of code de-initializing that TCR, and we really want the callback to run with a reinitialized one.
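A reduced C model of the stale-pointer problem, under stated assumptions: tls_record, get_record and shutdown_record are hypothetical stand-ins for the TCR, get_tcr() and shutdown_thread_tcr(), and a single flag stands in for all the state that gets torn down. The reset_on_shutdown switch plays the role of the proposed current_tcr = NULL line.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a TCR kept in thread-local storage. */
typedef struct tls_record { int initialized; } tls_record;

static _Thread_local tls_record record_storage;
static _Thread_local tls_record *current_record;   /* like current_tcr */

static pthread_key_t callback_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int reset_on_shutdown;        /* 1 models the proposed fix */
static int callback_saw_initialized; /* what the late callback observed */

/* Find the thread's record, initializing it if there is none. */
static tls_record *get_record(void) {
    if (current_record == NULL) {
        record_storage.initialized = 1;
        current_record = &record_storage;
    }
    return current_record;
}

/* Normal thread-exit teardown. */
static void shutdown_record(void) {
    current_record->initialized = 0;   /* stacks etc. are gone */
    if (reset_on_shutdown)
        current_record = NULL;         /* forget the dead record */
}

/* Key destructor: calls back after the teardown has already happened. */
static void late_callback(void *value) {
    (void)value;
    callback_saw_initialized = get_record()->initialized;
}

static void make_key(void) {
    int rc = pthread_key_create(&callback_key, late_callback);
    assert(rc == 0);
}

static void *worker(void *unused) {
    (void)unused;
    get_record();
    pthread_setspecific(callback_key, &callback_key); /* any non-NULL value */
    shutdown_record();
    return NULL;                       /* late_callback runs after this */
}

/* Returns 1 if the late callback ran on an initialized record, else 0. */
int callback_sees_live_record(int apply_fix) {
    pthread_t thread;
    int rc;

    pthread_once(&key_once, make_key);
    reset_on_shutdown = apply_fix;
    callback_saw_initialized = -1;
    rc = pthread_create(&thread, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(thread, NULL);
    return callback_saw_initialized;
}
```

Without the reset, the late callback finds the non-NULL pointer and runs on the torn-down record; with it, get_record() takes the creation path again, which is the behaviour wanted for the callback.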

Please try this and let me know if it works. If it does, I'll have avoided dealing with asdf and cffi, and I'm always glad to do that ...

comment:2 Changed 2 years ago by stassats

That seems to do it. But just in case, here's a cffi-less test-case:

(ccl:open-shared-library "/tmp/test.so")

(defun test-c (function)
  (external-call "test" :address function :signed-int))

(ccl:defcallback callback
    (:signed-int a :signed-int b :signed-short)
  (let ((a a) (b b))
    (block test-callback
      (print (list a b))
      (finish-output)
      1)))

(defun test ()
  (ccl:process-run-function
   nil
   (lambda () (test-c callback))))

(test)

comment:3 Changed 2 years ago by gb

  • Status changed from new to closed
  • Resolution set to fixed

(In [15577]) In shutdown_thread_tcr(), when HAVE_TLS is defined and the tcr is current_tcr, set current_tcr to NULL and zero out the tcr's contents, just in case.

This enables destructors established with pthread_key_create to make callbacks on a newly-initialized TCR and seems to fix ticket:1052 in the trunk.
