Opened 9 years ago

Closed 9 years ago

#649 closed defect (fixed)

Win32: Process terminates if many threads are created.

Reported by: a_gavrilov Owned by: gb
Priority: major Milestone:
Component: Runtime (threads, GC) Version: 1.4
Keywords: win32 threads crash Cc:

Description

If the program creates many short-lived threads, eventually it starts failing, and subsequently the process silently exits:

E:\Software\ccl>wx86cl.exe
Welcome to Clozure Common Lisp Version 1.4  (WindowsX8632)!
? (dotimes (i 100) (format t "+") (process-run-function :foo (lambda () nil)))
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++
NIL
? (dotimes (i 100) (format t "+") (process-run-function :foo (lambda () nil)))
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++
NIL
? (dotimes (i 100) (format t "+") (process-run-function :foo (lambda () nil)))
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++
NIL
? (dotimes (i 100) (format t "+") (process-run-function :foo (lambda () nil)))
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++
NIL
? (dotimes (i 100) (format t "+") (process-run-function :foo (lambda () nil)))
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++
NIL
? (dotimes (i 100) (format t "+") (process-run-function :foo (lambda () nil)))
CreateThread: 0x8 Not enough storage is available to process this command.

+++++++++++++++++++++++++++++++++++++++++++++++++
> Error: Can't create thread
> While executing: CCL::NEW-TCR, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > :pop

? (dotimes (i 100) (format t "+") (process-run-function :foo (lambda () nil)))
CreateThread: 0x8 Not enough storage is available to process this command.

++
> Error: Can't create thread
> While executing: CCL::NEW-TCR, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > :pop

? (dotimes (i 100) (format t "+") (process-run-function :foo (lambda () nil)))

E:\Software\ccl>

This causes a lot of annoying random disconnects with slime.

Process Explorer from sysinternals shows process Virtual Size starting from 1GB, but growing by about 1.5MB per thread created until it reaches 2GB, where everything dies.

Change History (4)

comment:1 Changed 9 years ago by gb

  • Status changed from new to assigned

It's failing to deallocate some of a thread's stacks when the thread exits, because of some confusion about how Windows memory-mapping primitives work.

comment:2 Changed 9 years ago by a_gavrilov

This appears to fix it:

--- thread_manager.c	(revision 13378)
+++ thread_manager.c	(working copy)
@@ -1583,8 +1587,15 @@
 void
 free_stack(void *s)
 {
+  /* Can't use UnMapMemory() because it only uses MEM_DECOMMIT.
+     Note also that the second parameter must be 0, or it won't work.
+     (With 0 it releases the whole area reserved by VirtualAlloc) */
+#ifdef WINDOWS
+  VirtualFree(s, 0, MEM_RELEASE);
+#else
   size_t size = *((size_t *)s);
-  UnMapMemory(s, size);
+  munmap(s, size);
+#endif
 }
 
 Boolean threads_initialized = false;

Btw, I don't quite understand why UnMapMemory? uses decommit: the posix version appears to do a full munmap.

The program also leaks a lot of thread handles; this seems to fix it:

--- thread_manager.c	(revision 13378)
+++ thread_manager.c	(working copy)
@@ -1425,6 +1425,10 @@
     tcr->tlb_limit = 0;
     free(tcr->tlb_pointer);
     tcr->tlb_pointer = NULL;
+#ifdef WIN32
+    if (tcr->osid != 0)
+      CloseHandle((HANDLE)tcr->osid);
+#endif
     tcr->osid = 0;
     tcr->interrupt_pending = 0;
     tcr->termination_semaphore = NULL;
@@ -1838,6 +1842,9 @@
 
   if (thread_handle == NULL) {
     wperror("CreateThread");
+  } else {
+    // Nobody uses the return value
+    CloseHandle(thread_handle);
   }
   return (LispObj) ptr_to_lispobj(thread_handle);
 }
@@ -2131,8 +2138,10 @@
          mark the TCR as dead and kill thw Windows thread. */
       tcr->osid = 0;
       if (!TerminateThread((HANDLE)osid, 0)) {
+        CloseHandle((HANDLE)osid);
         result = false;
       } else {
+        CloseHandle((HANDLE)osid);
         shutdown_thread_tcr(tcr);
       }
 #else 

I haven't actually tested these changes much, though - only the above test case.

comment:3 Changed 9 years ago by rme

r13431 tries to address this.

comment:4 Changed 9 years ago by rme

  • Resolution set to fixed
  • Status changed from assigned to closed

merged to 1.4 in r13433; new Windows lisp kernel binaries added to 1.4 branch in r13587 and r13588.

Note: See TracTickets for help on using tickets.