Opened 5 years ago

Last modified 5 years ago

#1279 new defect

Infinite loop on unmasked foreign SIGFPE

Reported by: jared Owned by:
Priority: normal Milestone:
Component: Foreign Function Interface Version: trunk
Keywords: sigfpe, x87, signal, loop Cc:

Description

Hi,

I think CCL/Linux/X86-64 may have a bug that causes an infinite loop when foreign code causes an unmasked x87 floating point exception. A tarball is attached with a distilled example to try to allow you to reproduce easily; see below for additional instructions.

I have no idea what the "right" behavior is in this case, but there are probably any number of approaches that would be better than an unrecoverable loop. Speculation: it would be nice for the FFI to provide some kind of hook so that a custom C function could be provided to handle the error, but for the 99% of users who don't care to supply such a function, any unmasked SIGFPEs could simply be converted into ordinary Lisp errors.

Note: it appears that this kind of infinite loop is a common problem with SIGFPE handlers. The handler has to do something very special (e.g., muck with the FPSW to turn off the exception bits, adjust the program counter, etc.), or else the faulting instruction is simply re-executed when the handler returns and the operating system calls the handler again. Some URLs that came up while searching Google that may be helpful:

The tarball includes a file, standalone.c, containing a working example of a SIGFPE signal handler that does not go into an infinite loop. It also establishes that the tiny assembly code routine will indeed produce a SIGFPE.

Instructions:

  • Extract the tarball somewhere.
  • (Optional) Run "make" to build libtemp.so.0.1. You can try just using the version I've included in the tarball, but I don't know how likely it is to be compatible with your system. If you need to build it yourself, you will need a copy of NASM (http://www.nasm.us) to assemble the tempasm.o file.
  • Run CCL and submit the following to verify that the test code causes an infinite loop:
      (load "temp.lisp")
    

Thanks!

Jared

Attachments (1)

ccl-x87-exceptions-bugreport.tar.gz (8.2 KB) - added by jared 5 years ago.
example code for reproducing the problem


Change History (2)

Changed 5 years ago by jared

example code for reproducing the problem

comment:1 Changed 5 years ago by gb

Except in very limited contexts, CCL has no idea how to handle any kind of exception that occurs in foreign code.

One of those limited contexts involves SIGFPE, and I would guess that the infinite loop you are seeing stems from the fact that our SIGFPE handler runs, finds that it doesn't know what happened (it has no idea of what an x87 is), and causes the thread on which the exception occurred to simply resume execution at the point of the SIGFPE.

If CCL isn't involved, most foreign code will simply treat SIGFPE as being fatal to the (unix) process. I really believe that handling exceptions in foreign code is difficult, largely because most foreign code doesn't expect them to occur.

I understand that some specific piece of foreign code might be able to tolerate having a lisp error be signaled in response to whatever causes a SIGFPE, but I don't think that one can safely assume that all such code (or even very much of it) is tolerant of that.
