source: trunk/source/level-0/ARM/arm-numbers.lisp @ 15601

Last change on this file since 15601 was 15093, checked in by gb, 8 years ago

New Linux ARM binaries.

The image and FASL versions changed on the ARM, but (if I did it right)
not on other platforms.

(The image and FASL versions are now architecture-specific. This may
make it somewhat easier and less disruptive to change them, since the
motivation for such a change is often also architecture-specific.)
The FASL and current image version are defined (in the "TARGET" package)
in the architecture-specific *-arch.lisp files; the min, max, and current
image versions are defined in the *constants*.h file for the architecture.

Most of the changes are ARM-specific.

Each TCR now contains a 256-word table at byte offset 256. (We've
been using about 168 bytes in the TCR, so there are still 88 bytes/22
words left for expansion.) The table is initialized at TCR-creation
time to contain the absolute addresses of the subprims (there are
currently around 130 defined); we try otherwise not to reference
subprims by absolute address. Jumping to a subprim is:

(ldr pc (:@ rcontext (:$ offset-of-subprim-in-tcr-table)))

and calling one involves loading its address from that table into a
register and doing (blx reg). We canonically use LR as the register,
since it's going to be clobbered by the blx anyway and there doesn't
seem to be a performance hazard there. The old scheme (which involved
using BA and BLA pseudoinstructions to jump to/call a hidden jump table
at the end of the function) is no longer supported.

ARM Subprims no longer need to be aligned (on anything more than an
instruction boundary.) Some remnants of the consequences of an old
scheme (where subprims had to "fit" in small regions and sometimes
had to jump out of line if they would overflow that region's bounds)
still remain, but we can repair that (and it'll be a bit more straightforward
to add new ARM subprims.) We no longer care (much) about where subprims
are mapped in memory, and don't have to bias suprimitive addresses by
a platform-specific constant (and have to figure out whether or not we've
already done so) on (e.g.) Android.

Rather than setting the first element (fn.entrypoint) of a
newly-created function to the (absolute) address of a subprim that updates
that entrypoint on the first call, we use a little LAP function to correct
the address before the function can be called.

Non-function objects that can be stored in symbols' function cells
(the UNDEFINED-FUNCTION object, the things that encapsulate
special-operator names and global macro-functions) need to be
structured like FUNCTIONS: the need to have a word-aligned entrypoint
in element 0 that tracks the CODE-VECTOR object in element 1. We
don't want these things to be of type FUNCTION, but do want the GC to
adjust the entrypoint if the codevector moves. We've been essentially
out of GVECTOR subtags on 32-bit platforms, largely because of the
constraints that vector/array subtags must be greater than other
subtags and numeric types be less. The first constraint is probably
reasonable, but the second isn't: other typecodes (tag-list, etc) may
be less than the maximum numeric typecode, so tests like NUMBERP can't
reliably involve a simple comparison. (As long as a mask of all
numeric typecodes will fit in a machine word/FIXNUM, a simple LOGBITP
test can be used instead.) Removed all portable and ARM-specific code
that made assumptions about numeric typecode ordering, made a few more
gvector typecodes available, and used one of them to define a new
"pseudofunction" type. Made the GC update the entrypoints of
pseudofunctions and used them for the undefined-function object and
for the function cells of macros/special-operators.

Since we don't need the subprim jump table at the end of each function
anymore, we can more easily revive the idea of embedded pc-relative
constant data ("constant pools") and initialize FPRs from constant
data, avoiding most remaining traffic between FPRs and GPRs.

I've had a fairly-reproducible cache-coherency problem: on the first
GC in the cold load, the thread misbehaves mysteriously when it
resumes. The GC tries to synchronize the I and D caches on the entire
range of addresses that may contain newly-moved code-vectors. I'm not
at all sure why, but walking that range and flushing the cache for
each code-vector individually seems to avoid the problem (and may actually
be faster.)

Fix ticket:894

Fixed a few typos in error messages/comments/etc.

I -think- that the non-ARM-specific changes (how FASL/image versions are
defined) should bootstrap cleanly, but won't know for sure until this is
committed. (I imagine that the buildbot will complain if not.)

File size: 6.9 KB
Line 
1;-*- Mode: Lisp; Package: CCL -*-
2;;;
3;;;   Copyright (C) 2009 Clozure Associates
4;;;   Copyright (C) 1994-2001 Digitool, Inc
5;;;   This file is part of Clozure CL. 
6;;;
7;;;   Clozure CL is licensed under the terms of the Lisp Lesser GNU Public
8;;;   License , known as the LLGPL and distributed with Clozure CL as the
9;;;   file "LICENSE".  The LLGPL consists of a preamble and the LGPL,
10;;;   which is distributed with Clozure CL as the file "LGPL".  Where these
11;;;   conflict, the preamble takes precedence. 
12;;;
13;;;   Clozure CL is referenced in the preamble as the "LIBRARY."
14;;;
15;;;   The LLGPL is also available online at
16;;;   http://opensource.franz.com/preamble.html
17
18
19
20(in-package "CCL")
21
22(defarmlapfunction %fixnum-signum ((number arg_z))
23  (cmp number (:$ 0))
24  (movlt arg_z '-1)
25  (movgt arg_z '1)
26  (bx lr))
27
28(defarmlapfunction %ilogcount ((number arg_z))
29  (let ((arg imm0)
30        (shift imm1)
31        (temp imm2))
32    (unbox-fixnum arg number)
33    (movs shift arg)
34    (mov arg_z '0)
35    (b @test)
36    @next
37    (sub  temp shift (:$ 1))
38    (ands shift shift temp)
39    (add arg_z arg_z '1)
40    @test
41    (bne @next)
42    (bx lr)))
43
44(defarmlapfunction %iash ((number arg_y) (count arg_z))
45  (unbox-fixnum imm1 count)
46  (unbox-fixnum imm0 number)
47  (rsbs imm2 imm1 (:$ 0))
48  (blt @left)
49  (mov imm0 (:asr imm0 imm2))
50  (box-fixnum arg_z imm0)
51  (bx lr)
52  @left
53  (mov arg_z (:lsl number imm1))
54  (bx lr))
55
56(defparameter *double-float-zero* 0.0d0)
57(defparameter *short-float-zero* 0.0s0)
58
59
60(defarmlapfunction %sfloat-hwords ((sfloat arg_z))
61  (ldr imm0 (:@ sfloat (:$ arm::single-float.value)))
62  (digit-h temp0 imm0)
63  (digit-l temp1 imm0)
64  (vpush1 temp0)
65  (vpush1 temp1)
66  (add temp0 vsp '2)
67  (set-nargs 2)
68  (spjump .SPvalues))
69
70
71; (integer-length arg) = (- 32 (clz (if (>= arg 0) arg (lognot arg))))
72(defarmlapfunction %fixnum-intlen ((number arg_z)) 
73  (unbox-fixnum imm0 arg_z)
74  (clz imm1 imm0)
75  (cmp imm1 (:$ 0))
76  (bne @nonneg)
77  (mvn imm1 imm0)
78  (clz imm1 imm1)
79  @nonneg
80  (rsb imm1 imm1 (:$ 32))
81  (box-fixnum arg_z imm1)
82  (bx lr))
83
84
85
86
87
88
89;;; Caller guarantees that result fits in a fixnum.
90(defarmlapfunction %truncate-double-float->fixnum ((arg arg_z))
91  (get-double-float d0 arg)
92  (ftosizd s2 d0)
93  (fmrs imm0 s2)
94  (box-fixnum arg_z imm0)
95  (bx lr))
96
97
98
99(defarmlapfunction %truncate-short-float->fixnum ((arg arg_z))
100  (get-single-float s0 arg imm0)
101  (ftosizs s2 s0)
102  (fmrs imm0 s2)
103  (box-fixnum arg_z imm0)
104  (bx lr))
105
106
107
108;;; DOES round to even
109
110(defarmlapfunction %round-nearest-double-float->fixnum ((arg arg_z))
111  (get-double-float d0 arg)
112  (ftosid s2 d0)
113  (fmrs imm0 s2)
114  (box-fixnum arg_z imm0)
115  (bx lr))
116
117
118
119(defarmlapfunction %round-nearest-short-float->fixnum ((arg arg_z))
120  (get-single-float s0 arg imm0)
121  (ftosis s2 s0)
122  (fmrs imm0 s2)
123  (box-fixnum arg_z imm0)
124  (bx lr))
125
126
127
128
129
130
131;;; maybe this could be smarter but frankly scarlett I dont give a damn
132;;; ticket:666 describes one reason to give a damn.
133(defarmlapfunction %fixnum-truncate ((dividend arg_y) (divisor arg_z))
134  (let ((unboxed-quotient imm0)
135        (unboxed-dividend imm0)
136        (unboxed-divisor imm1)
137        (unboxed-remainder imm1)
138        (quotient arg_y)
139        (remainder arg_z))
140    (build-lisp-frame)
141    (mov fn nfn)
142    (cmp divisor '-1)   
143    (unbox-fixnum unboxed-dividend dividend)
144    (unbox-fixnum unboxed-divisor divisor)
145    (beq @neg)
146    (sploadlr .SPsdiv32)
147    (blx lr)
148    (box-fixnum quotient unboxed-quotient)
149    (box-fixnum remainder unboxed-remainder)
150    (stmdb (:! vsp) (quotient remainder))
151    (set-nargs 2)
152    (spjump .SPnvalret)
153    @neg
154    (ldr arg_z (:@ fn '*least-positive-bignum*))
155    (rsbs dividend dividend (:$ 0))
156    (ldrvs dividend (:@ arg_z (:$ arm::symbol.vcell)))
157    @ret
158    (mov temp0 (:$ 0))
159    (vpush1 dividend)
160    (vpush1 temp0)
161    (set-nargs 2)
162    (spjump .SPnvalret)))
163
164
165
166
167(defarmlapfunction called-for-mv-p ()
168  (ref-global imm0 ret1valaddr)
169  (ldr imm1 (:@ sp (:$ arm::lisp-frame.savelr)))
170  (cmp imm1 imm0)
171  (mov arg_z 'nil)
172  (add arg_z arg_z (:$ arm::t-offset))
173  (bx lr))
174
175;;; n1 and n2 must be positive (esp non zero)
176;;; See <http://en.wikipedia.org/wiki/Binary_GCD_algorithm>
177(defarmlapfunction %fixnum-gcd ((n1 arg_y)(n2 arg_z))
178  (mov arg_x rcontext)                  ;need an extra imm reg
179  (unbox-fixnum imm0 n1)
180  (unbox-fixnum imm1 n2)
181  (subs r3 imm0 imm0)                   ; zero power-of-2 counter, set c flag
182  (orrs imm2 imm0 imm1)                 ; preserves carry, set other flags
183  @remove-twos-loop
184  (movsne imm2 (:lsr imm2 (:$ 1)))      ; carry = lsbit
185  (addcc r3 r3 (:$ 1))                  ; increment counter if lsbit 0
186  (bcc @remove-twos-loop)
187  (movs imm0 (:lsr imm0 r3))
188  (movsne imm1 (:lsr imm1 r3))
189  (beq @finish)
190  @check-two-r0
191  (movs imm0 (:lsr imm0 (:$ 1)))
192  (bcc @check-two-r0)
193  @check-two-r1
194  (movs imm1 (:lsr imm1 (:$ 1)))
195  (bcc @check-two-r1)
196  (subs imm1 imm1 imm0)
197  (addcc imm0 imm0 imm1)
198  (rsbcc imm1 imm1 (:$ 0))
199  (bne @check-two-r1)
200  (adc imm0 imm0 imm0)
201  @finish
202  (orr imm0 imm1 (:lsl imm0 r3))
203  (mov rcontext arg_x)
204  (box-fixnum arg_z imm0)
205  (bx lr))
206
207
208
209(defarmlapfunction %mrg31k3p ((state arg_z))
210  (let ((seed temp0)
211        (m1 #x7fffffff))
212    (svref seed 1 state)
213    (u32-ref imm0 1 seed)
214
215    (mov imm1 (:lsr imm0 (:$ 9)))
216    (mov imm2 (:lsl imm0 (:$ 23)))      ;get low 9 bits
217    (mov imm2 (:lsr imm2 (:$ 23)))
218    (add imm1 imm1 (:lsl imm2 (:$ 22)))
219
220    (u32-ref imm0 2 seed)
221    (add imm1 imm1 (:lsr imm0 (:$ 24)))
222    (bic imm2 imm0 (:$ #xff000000))
223    (add imm1 imm1 (:lsl imm2 (:$ 7)))
224
225    (cmp imm1 (:$ m1))
226    (subhi imm1 imm1 (:$ m1))
227
228    (add imm1 imm1 imm0)
229    (cmp imm1 (:$ m1))
230    (subhi imm1 imm1 (:$ m1))
231
232    (u32-ref imm0 1 seed)
233    (u32-set imm0 2 seed)
234    (u32-ref imm0 0 seed)
235    (u32-set imm0 1 seed)
236    (u32-set imm1 0 seed)
237
238    ;; second component
239    (u32-ref imm0 3 seed)
240    (mov imm1 (:$ 20992))
241    (add imm1 imm1 (:$ 77))
242    (mov imm2 (:lsr imm0 (:$ 16)))
243    (mul imm2 imm1 imm2)
244    (mov imm0 (:lsl imm0 (:$ 16)))
245    (add imm0 imm2 (:lsr imm0 (:$ 1)))
246
247    (lri imm2 2147462579)
248    (cmp imm0 imm2)
249    (subhi imm0 imm0 imm2)
250
251    (vpush1 rcontext)
252    (mov rcontext imm0)                 ;save t1
253
254    (u32-ref imm0 5 seed)
255    (mov imm2 (:lsr imm0 (:$ 16)))
256    (mul imm2 imm1 imm2)                ;21069 still in imm1
257    (mov imm1 (:lsl imm0 (:$ 16)))
258    (add imm1 imm2 (:lsr imm1 (:$ 1)))
259
260    (lri imm2 2147462579)
261    (cmp imm1 imm2)
262    (subhi imm1 imm1 imm2)
263
264    (add imm1 imm1 imm0)
265    (cmp imm1 imm2)
266    (subhi imm1 imm1 imm2)
267
268    (add imm1 imm1 rcontext)            ;add in t1 from back when
269    (vpop1 rcontext)
270    (cmp imm1 imm2)
271    (subhi imm1 imm1 imm2)
272
273    (u32-ref imm0 4 seed)
274    (u32-set imm0 5 seed)
275    (u32-ref imm0 3 seed)
276    (u32-set imm0 4 seed)
277    (u32-set imm1 3 seed)
278
279    ;; combination
280    (u32-ref imm0 0 seed)
281    (sub imm2 imm0 imm1)
282    (cmp imm0 imm1)
283    (addls imm2 imm2 (:$ m1))
284    (bic imm2 imm2 (:$ #xe0000000))     ;avoid negative fixnums
285    (box-fixnum arg_z imm2)
286    (bx lr)))
287
288; End of arm-numbers.lisp
Note: See TracBrowser for help on using the repository browser.