Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#899 closed defect (fixed)

segfault in LinuxX8632/WindowsX8632 build using postmodern library

Reported by: emit
Owned by: rme
Priority: normal
Milestone:
Component: Compiler
Version: trunk
Keywords: segfault sigsegv linuxx8632 postmodern
Cc:


I've reproduced the problem with the binary release (1.7-r14925M) and with a rebuild from svn (1.7-r15146M). The 64-bit Linux version is not affected. I've tested LinuxX8632 on 32-bit Linux 2.6.32 (Debian squeeze) and on a 64-bit Linux 3.0 install (Arch Linux). The PostgreSQL server is version 8.4 (not relevant).

steps to reproduce:

;; install latest postmodern somehow. 
;; I've tried the one from quicklisp dist and 
;; also the latest git version from 

(ql:quickload :postmodern)
(pomo:connect-toplevel "dbname" "user" "pass" "host")
(pomo:query "select '42'")

expected result: (("42")) <--- ok in sbcl and LinuxX8664 ccl

in LinuxX8632 ccl I get:

> Error: Database error: Unexpected end of file on #<BASIC-TCP-STREAM :CLOSED #x188F5736>
> While executing: (:INTERNAL #:G5027 CL-POSTGRES:EXEC-QUERY), in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.

At first this looks like a bug in the library, but tcpdump shows mangled data: the length of the simple-describe-message packet is incorrectly declared as 1. This four-byte length value in the packet's header is computed by the define-message macro in postmodern/cl-postgres/messages.lisp.

The length appears to be computed correctly during macro expansion, but when the message is finally sent through (simple-describe-message socket) in cl-postgres:send-query, garbage is read instead.

The integer-writer macro (postmodern/cl-postgres/communicate.lisp) defines a set of functions called uint1, uint2, uint4, etc., which send bytes to the socket with write-byte. I changed every occurrence of write-byte in this macro to my-write-byte and defined

(defun my-write-byte (value socket)
  (format t "~a~%" value)
  (write-byte value socket))

I'm not sure if it's relevant, but I also commented out all the #.*optimize* declarations. With every write-byte routed through my-write-byte, the value of each byte sent (for integer fields) is dumped; when it reaches the length field, it now signals the following error:

;;; (pomo:query "select '42'")
;; protocol.lisp (send-query ..
80      ; 'P' --- Parse command packet (simple-parse-message ..)
19      ; 19 bytes (length of parse packet)
;;;;; string select '42' is sent here
0       ; number of param data types (0)
68      ; 'D' --- Describe command (simple-describe-message ..)
> Error: Fault during read of memory address #x0
> While executing: CCL::VALID-HEADER-P, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
1 > q
;;;; length of message = 1 in 32bit ccl. 6 in sbcl and 64bit ccl

No backtrace is available. The last byte (the least significant byte of the length value) should be 6, which tells PostgreSQL that the packet is six bytes long: a 4-byte length field plus 2 bytes of data. If I check the value of the length variable, "base-length", computed during macro expansion of define-message simple-describe-message, it appears correct (= 6).
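To make the expected bytes concrete, here is a minimal sketch in portable Common Lisp. The names uint4-bytes and describe-message-bytes are hypothetical and are not part of cl-postgres. It assumes that the two data bytes are the 'S' (prepared statement) target type followed by the terminator of an empty statement name, as in the PostgreSQL protocol's Describe message.

```lisp
;; Hypothetical helpers for illustration only; not cl-postgres code.
(defun uint4-bytes (value)
  "Return the four bytes of VALUE as a list, most significant byte first."
  (loop for position from 24 downto 0 by 8
        collect (ldb (byte 8 position) value)))

(defun describe-message-bytes ()
  "Bytes of a Describe message for the unnamed prepared statement (assumed layout)."
  (let ((body (list (char-code #\S) 0)))    ; 'S' + terminator of the empty name
    (append (list (char-code #\D))          ; message type, not counted in the length
            (uint4-bytes (+ 4 (length body))) ; length counts itself plus the body
            body)))

(describe-message-bytes)
;; => (68 0 0 0 6 83 0)
```

The byte 68 matches the last value printed in the dump above, and the length field should follow as 0 0 0 6.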

Sending 0x00000001 as the packet length naturally causes PostgreSQL to log the error "invalid message length" to syslog, and the connection is dropped.

If I repeat with gdb attached to the process I get:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf740ab70 (LWP 5227)]
0x1040547e in ?? ()
(gdb) bt
#0  0x1040547e in ?? ()
#1  0x10404c65 in ?? ()
#2  0x104400a5 in ?? ()
#3  0x1044749d in ?? ()
#4  0x08054445 in _SPFret1valn () at ../x86-spentry32.s:955
#5  0x100d8d55 in ?? ()
#6  0x08054445 in _SPFret1valn () at ../x86-spentry32.s:955
#7  0x10506c9d in ?? ()
#8  0x10502c25 in ?? ()
#9  0x1086eb75 in ?? ()
#10 0x100899f5 in ?? ()

I'm not certain but I'm inclined to believe this is a bug in ccl.

Change History (5)

comment:1 Changed 7 years ago by emit

  • Summary changed from segfault in LinuxX8632 build using postmodern library to segfault in LinuxX8632/WindowsX8632 build using postmodern library

Just confirmed the same behavior with WindowsX8632 on WinXP64. It works with wx86cl64.exe but not with wx86cl.exe.

comment:2 Changed 7 years ago by rme

  • Owner set to rme

comment:3 Changed 7 years ago by rme

Test case:

Compile and load a file with the following contents:

(in-package :cl-user)

(declaim (inline write-u8))

(defun write-u8 (socket value)
  (declare (type (unsigned-byte 32) value))
  (write-byte (ldb (byte 8 0) value) socket))

(defun foo ()
  (with-output-to-vector (v)
    (write-u8 v #x99)))

Evaluating (foo) results in #(38), not in #(153) as expected. Note that (ash #x99 (- x8632::fixnumshift)) is 38.

We are incorrectly loading the unboxed value #x99 into arg_y before calling write-byte.
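The arithmetic behind this can be sketched in portable Common Lisp. The names below are hypothetical, and the shift of 2 is an assumption inferred from the note that (ash #x99 (- x8632::fixnumshift)) is 38. The sketch models only the tagging arithmetic, not CCL's actual code generation.

```lisp
;; Assumed to match x8632::fixnumshift; hypothetical name.
(defparameter *fixnum-shift* 2)

(defun box-fixnum (n)
  "Machine word that represents the fixnum N under the assumed tagging scheme."
  (ash n *fixnum-shift*))

(defun unbox-fixnum (word)
  "Integer that a callee decodes from the machine word WORD."
  (ash word (- *fixnum-shift*)))

(box-fixnum #x99)    ; => 612, the word write-byte should have received
(unbox-fixnum #x99)  ; => 38, what it decodes when handed the raw #x99
(unbox-fixnum 6)     ; => 1
```

Under the same assumption, a raw 6 decodes as 1, which would be consistent with the Describe length of 1 seen in the original report.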

comment:4 Changed 7 years ago by gb

  • Resolution set to fixed
  • Status changed from new to closed

(In [15149]) If the result of a "natural" arithmetic/logical operation is a constant (as may be evident during late constant-folding), use new function X862-NATURAL-CONSTANT to process it (and decide whether to box, etc.)

Fixes ticket:899 in the trunk. Need to check PPC, ARM backends to see if they have the same bug.

comment:5 Changed 7 years ago by rme

(In [15154]) Fix ticket:899 in 1.7 branch also.
