Ticket #1044 (reopened defect)

Opened 23 months ago

Last modified 23 months ago

Allocation error when creating/killing many threads

Reported by: jlawrence Owned by:
Priority: normal Milestone:
Component: Runtime (threads, GC) Version: trunk
Keywords: Cc:

Description

(defun test (thread-count)
  (let ((threads (loop repeat thread-count
                       collect (ccl:process-run-function
                                "test" #'sleep 999))))
    ;;(sleep 0.01)
    (mapc #'ccl:process-kill threads)))

(defun run ()
  (loop
     (test 10)
     (ccl:gc)
     (ccl:gc)
     (format t ".")))

Within 500 iterations,

Fatal error: :   Kernel memory allocation failure.  
Can't allocate pointer of size 667648 bytes.

or

> Error: Can't create thread
> While executing: CCL::NEW-TCR, in process listener(1).

No problem when the SLEEP call is added. I can only reproduce on 32-bit Linux and 32-bit Darwin. I have rebuilt with [15557] from ticket:1042.

Change History

comment:1 Changed 23 months ago by gb

  • Status changed from new to closed
  • Resolution set to invalid

I couldn't reproduce this on either 32-bit Linux or Darwin, but I certainly believe that it could happen. What you're doing isn't that much different from:

;;; I'm just typing this into a web browser; I'm not even sure if the parens match
;;; but hopefully it's readable ...
(let* ((threads ()))
  (loop ;; forever, or for a large enough number of times ...
    (push (process-run-function "test" (lambda () (loop (sleep 1)))) threads)))

Obviously, something that tries to create an infinite number of threads will eventually exhaust finite resources (memory for stacks, OS limits). Your example differs from the code above in that it calls PROCESS-KILL on each process soon after it's created; PROCESS-KILL arranges (via PROCESS-INTERRUPT) for the target thread to try to exit cleanly when it's able to run, but there's no guarantee that any of the threads that've been told to exit cleanly will get a chance to do so before the next call to your TEST function creates more threads (and as the number of threads increases, the chance that any of them will get enough of a timeslice to exit cleanly decreases.) Uncommenting the call to SLEEP increases the chance that threads that are waiting for CPU time to kill themselves get that CPU time; otherwise and in general, if your example creates threads faster than they die it'll eventually reach a point where thread creation (or stack allocation for new threads) will fail. (Whether this happens or not likely depends on the number and speed of available CPU cores and other factors; the fact that I can't reproduce this isn't at all conclusive.)

I have no reason to believe that CCL is claiming that resources are exhausted when those resources are in fact available; the different ways in which this can fail have to do with whether the resource allocation attempt happened in the creating thread or the newly-created one. I sometimes think that it should try harder than it does to consistently fail cleanly; the counterargument is that if you're in such bad shape that you can't allocate a few hundred KB for a stack, you probably aren't too far from dying abruptly anyway.

People may find the explanation above unintuitive if they think that a thread has exited at the point where a call to PROCESS-KILL on that thread returns. I don't think that the documentation says or implies this to be true, but it's often true in practice and people may assume more than they should about it.

Since I couldn't reproduce this I don't know with absolute certainty that this explanation is correct, but there seems to be a lot of strong circumstantial evidence to support it. If you see evidence to the contrary, please feel free to reopen this.

comment:2 Changed 23 months ago by jlawrence

Inside the loop ccl:all-processes always reports 2 or 3 threads. I took that as a sign -- alongside the gc calls -- that threads are not building up. Is that a true sign or a false sign?

On 32-bit Darwin, allocation failure happens as well with (lambda ()) instead of #'sleep 999. And ticket:1042 fails in the same way.

comment:3 Changed 23 months ago by gb

A thread created by PROCESS-RUN-FUNCTION does:

1) some initialization, much of which happens in C (and involves creating extra stacks for Lisp use, creating some semaphores and other objects, and registering the thread with the lisp runtime/GC)

2) some further initialization in Lisp; this includes adding the thread to a global list of processes (a copy of that list is returned by ALL-PROCESSES)

3) calls its initial function

4) removes itself from the global list of processes

5) returns to C code where some of its stacks and synchronization objects are deallocated and where it's removed from the low-level list of known threads.

6) exits, letting the OS do further resource cleanup (including deallocating the stack that the thread has been running on.)

Until a thread completes steps 5 and 6 of its lifecycle, it's consuming a few MB of stack space (by default) and other OS resources, and those resources are in visibly limited supply in 32-bit CCL. (You can probably have a few hundred threads that're in some phase of their lifecycle at any point in time in 32-bit CCL; whatever the number is, it's likely to be many times more than the number of CPU cores that are available to run them on.

I think that your test program can create threads faster than those threads can die and therefore exhaust those resources.

If a thread's initial function does nothing (the (lambda ()) case) or if the thread is forced (via PROCESS-KILL) to exit (to start to exit) before that initial function does much of anything, then it'll spend much of its lifetime in phases where it's not visible to ALL-PROCESSES, so yes, that's a false sign.

CCL's GC isn't concurrent, which means that while the GC is running in some thread other threads are prohibited from running. Your code is basically saying:

  (loop
    (create-some-threads)
    (issue-requests-for-those-threads-to-kill-themselves)
    (prevent-those-threads-from-running-by-invoking-GC))

and I think that the basic question is whether those threads are allowed to run (and finish killing themselves) before new threads are created. The GC might have something to do with scheduler behavior (threads that're suspended by a software signal might take a while to get scheduled again when they're able to resume; threads that have been suspended for a while might or might not have priority over newly created threads. All of this depends on the scheduler and other factors; little of it has anything to do with the GC beyond the fact that the GC suspends and resumes threads. [*]

Again, I can't reproduce this and I can't be positive that the explanation I'm offering is correct. I can easily convince myself that it this explanation does explain the behavior that you're seeing. The fact that calling SLEEP affects this is consistent with my explanation (in that it increases the likelyhood that more threads will run to completion) and isn't consistent with other explanations (e.g., threads might just leak significant amounts of memory, but calling SLEEP generally isn't sufficient to fix memory leaks.)


[*] The GC and some other lower-level parts of the CCL runtime have a lower-level view of what a lisp thread is and of what lisp threads exist. At some points - such as when telling other threads to suspend themselves - the GC can notice that a thread has died an abnormal death, and in that (very rare) case the GC can try to arrange to do as much cleanup after that thread as it can. Even in that case, a thread is an OS-level object; a thread can exist before and after the GC is aware of it and the GC doesn't have much influence over how long a thread lives. Your comments and questions seem to suggest that you think it does, and if you've found what I said above difficult to understand you might find it easier if you reexamine any similar assumptions that you may be making.

comment:4 Changed 23 months ago by jlawrence

The two gc calls are only there for good measure. They are not necessary to reproduce the issue.

CCL more often crashes entirely with "Fatal Error:...", and when it doesn't the image is in a poor state. No amount of waiting for ghost threads to finish helps (threads not reported by ccl:all-processes).

It is surprising that you cannot reproduce this, since I can reproduce it on two very different machines with different operating systems.

The following is with the latest from svn.

$ ccl --no-init --load test.lisp --eval '(run)'
...[snip dots]...Fatal error: :   Kernel memory allocation failure.  
Can't allocate pointer of size 667648 bytes.
$ ccl --no-init --load test.lisp --eval '(run)'
...[snip dots]...
> Error: Can't create thread
> While executing: CCL::NEW-TCR, in process listener(1).
> Type :GO to continue, :POP to abort, :R for a list of available restarts.
> If continued: Skip evaluation of (run)
> Type :? for other options.
1 > :pop
Welcome to Clozure Common Lisp Version 1.9-dev-r15564M-trunk  (LinuxX8632)!
? (length (ccl:all-processes))
11
? (ccl:gc)
NIL
? (ccl:gc)
NIL
? (ccl:process-run-function "foo" (lambda ()))
> Error: Can't create thread
> While executing: CCL::NEW-TCR, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > :pop

? (room)
Approximately 16,646,144 bytes of memory can be allocated 
before the next full GC is triggered. 

                   Total Size             Free                 Used
Lisp Heap:       20709376 (20224K)   16646144 (16256K)    4063232 (3968K)
Stacks:         761010016 (743174K)  760683096 (742855K)     326920 (319K)
Static:          14512456 (14172K)          0 (0K)       14512456 (14172K)
2084.250 MB reserved for heap expansion.
NIL

A fresh image looks like:

$ ccl --no-init --eval '(room)'
Approximately 16,646,144 bytes of memory can be allocated 
before the next full GC is triggered. 

                   Total Size             Free                 Used
Lisp Heap:       20316160 (19840K)   16646144 (16256K)    3670016 (3584K)
Stacks:           6357408 (6208K)     6355500 (6207K)        1908 (2K)
Static:          14512456 (14172K)          0 (0K)       14512456 (14172K)
2084.625 MB reserved for heap expansion.
Welcome to Clozure Common Lisp Version 1.9-dev-r15564M-trunk  (LinuxX8632)!

comment:5 Changed 23 months ago by jlawrence

  • Status changed from closed to reopened
  • Resolution invalid deleted

comment:6 Changed 23 months ago by gb

As the ROOM output above shows, in the fresh image you've allocated about 3MB each for stacks for 2 threads; in the image where you couldn't create more threads, the fact that ~750MB of stack space is still allocated suggests that ~250 threads have been created and have not run to completion.

A 32-bit process can't (by definition) access more than 32 bits (4GB) of address space; OSes generally make somewhere between 2 and 3 GB of that actually available. (This is orthogonal to the amount of physical and virtual memory in the system: there may be substantially more memory than address space available to a 32-bit process.) ROOM is saying that over 2GB of CCL's address space is reserved for heap expansion and that another ~750MB is used by stacks that CCL knows about, and this is probably about all of the address space that you can use; thread- and stack-creation failures in this situation aren't at all inconsistent with what ROOM is telling you.

You say that the situation doesn't seem to improve over time (e.g., that the ~250 threads that're tying things up don't seem to be dying.) That may be true, but whenever I've gotten into similar situations I've found that the process is excruciatingly slow but eventually does complete. I don't know for sure that Lisp code isn't somehow preventing the dying threads from making progress (by holding a lock too long, for instance) but most ways in which I can imagine that happening would prevent GC or ROOM from running to completion.

I'm not sure at this point what we're discussing or whether we're agreeing or disagreeing with each other about something. I've claimed that your code could reach this state simply by creating threads faster than they're able to die, and nothing that I've seen is inconsistent with that. If your code creates 10 threads per iteration and 9 of them die before the next iteration (there's a net "gain" of one thread per iteration), then after a few hundred iterations you could have a few hundred more threads than you started with and this seems to be exactly what you're seeing.

You seem to think that there's a CCL bug here. It's honestly not clear to me what you think that bug is. It's -possible- that CCL is somehow preventing the ~250 threads from finishing (and freeing their stacks and otherwise cleaning up after themselves), but competing for CPU time and other resources with ~249 other threads generally does a good job of bringing things to (nearly) a dead halt without any "help" from CCL.

I can think of a couple of further things to look at.

I assume that at the point where your test errors for you that the list returned by (ALL-PROCESSES) contains few processes other than the original 2 and that most of the ~250 threads whose stacks are still visible to ROOM are exiting from lisp and are either stuck in the final stages of shutdown or making excruciatingly slow progress in those stages.

If some/all of the dying threads are still running lisp code and that code conses (allocates lisp objects), then it's possible that a large number of threads each doing "a little bit" of consing is forcing the GC to run very frequently. Calling

(ccl:gc-verbose t)

will cause the GC to print messages to the C "stderr" stream whenever it's invoked. If you enable GC-VERBOSE after the error and see frequent (several times/second) messages, then we'd know one factor that'd contribute to "excruciatingly slow progress." If so, doing:

(ccl:egc nil)

would cause the GC to run less often (and may speed up progress a bit.)

In general, if the dying threads are making progress (and dying), the total size of all stacks (as reported by ROOM) will decrease; I'd expect this to happen very slowly until threads finish dying and to speed up as contention decreases. If several successive calls to ROOM (at several-second intervals) don't show any decrease in total stack size, then that might indicate a bug. If progress is being made (but is being made very slowly), well that's the sort of thing that happens when each thread is only able to get ~1/250th of available CPU time.

comment:7 Changed 23 months ago by jlawrence

As I said, no amount of waiting for ghost threads to finish helps. (ccl:gc-verbose t) shows no activity. 'top' shows no activity. The numbers from (room) don't change.

CCL is in a state where it can't create threads. ccl:all-processes reports N+2 threads, where the N are mine. I can kill N threads and then create N threads, but no more.

You say "excruciatingly slow", but what does that mean, exactly? This is a Core-i7 3.4Ghz and I've had a post-failure REPL open for 15 minutes. Are you saying that, if I sleep on it, by morning Garbage Collecting Santa will have paid me a visit?

I've only encountered this issue while running stress tests; I'm not aware of any practical consequences. Nonetheless I thought it might be of interest, but if you think it's too esoteric then close it.

comment:8 Changed 23 months ago by gb

  • Status changed from reopened to closed
  • Resolution set to invalid

I have no reason to think that this is a CCL bug. I don't know that whatever happens when a C program creates too many threads too quickly is any different from what you're seeing, and you're correct in guessing that I don't think that that's too interesting a question.

comment:9 Changed 23 months ago by jlawrence

OK, but just to clarify again -- your responses largely assume that progress is being made on threads being cleaned up, while I've been reporting that no progress is being made.

comment:10 Changed 23 months ago by gb

  • Status changed from closed to reopened
  • Resolution invalid deleted

I tried

(dotimes (i 1000)
  (let* ((procs ()))
    (dotimes (j 10)
      (push (process-run-function "sleepy" (lambda () (loop (sleep 1)))) procs))
    (dolist (p procs) (process-kill p))))

on x8632 Linux and that ran to completion without error, but ROOM indicated that 3 threads hadn't exited (though only 2 were on ALL-PROCESSES.) Poking around in GDB revealed that the 3 extra threads were apparently looping around and calling SLEEP from lisp code.

Whatever could cause that (something like "being interrupted during thread initialization, and somehow neither exiting or initializing properly" or "running a thread's initial function more than once, somehow, sort of") it's more likely to be a CCL bug than an OS bug and may be more likely to affect real code.

It's entirely possible that something like this is what you've been seeing.

Note: See TracTickets for help on using tickets.