TweetFollow Us on Twitter

Float Point 2
Volume Number:2
Issue Number:8
Column Tag:Threaded Code

Floating Point Package, Part II

By Jörg Langowski, EMBL, c/o I.L.L., Grenoble, Cedex, France, Editorial Board

"Fast exp(x) and ln(x) in single precision"

We will continue with numerics this time, in order to give some examples how to put the 32 bit floating point package to practical use, and also because we got feedback that some more information about number crunching would be appreciated.

First, however, it is time for some apologies: the bugs have been creeping into the multiply routine, and when I noticed the last few traces they left, the article was already in press. The problem was that when the number on top of stack was zero, the routine would all of a sudden leave two numbers on the stack, one of which was garbage. This problem has been fixed in the revision, which is printed in Listing 1. I hope there will be no more errors, but please let me know if you find any. A reliable 32 bit package is so important for numerical applications on the Mac!

For many applications, the four basic operations +-*/ by themselves already help a lot in speeding up. However, alone they do not make a functional floating point package. For operations that are not used so frequently, like conversion between integer, single and extended or input/output on can still rely on the built-in SANE routines. But for the standard mathemetical functions you would want to have your own definitions that make full use of the speed of the 32 bit routines.

Developing a complete package of mathematical functions would be a project that is outside the scope of this column. I'll only give you two examples that serve to show that a very reasonable speed can be attained in Forth (here, Mach1) without making too much use of assembly language. The two examples, ln(x) and exp(x) are based on approximations taken from the Handbook of Mathematical Functions by M. Abramowitz and I.A. Stegun, Dover Publications, New York 1970. Furthermore, the routines given here profited a lot from ideas published in the April '86 issue of BYTE on number crunching.

First, we have to realize that a transcendental function like ln(x), using a finite number of calculation steps, can only be approximated over a certain range of input numbers to a certain maximum accuracy. It is intuitively clear that the wider the range of the argument x, the lengthier the calculation gets to achieve the desired accuracy. Therefore, approximation formulas for standard functions are usually given over a very restricted range of x. We have to see that we play some tricks on the input value x so that we can get a reliable approximation over the whole range of allowed floating point numbers, which is approximately 10-38 to 10+38 for the IEEE 32 bit format.

The handbook mentioned gives various approximations for ln(x) with different degrees of accuracy. The accuracy that we need for a 24 bit mantissa is 2-23 10-7, and a suitable approximation for this accuracy would be

ln(1+x)   a1x + a2x2 + a3x3 + a4x4 + a5x5+ a6x6 + a7x7 + a8x8  +  (error),
 [1]

where for 0 ¯ x ¯ 1 the error is less than 3.10-8. The coefficients a1 to a8 are:

a1 =  0.9999964239, a2 =  -0.4998741238, 
a3 =  0.3317990258, a4 =  -0.2407338084,
a5 =  0.1676540711, a6 =  -0.0953293897, 
a7 =  0.0360884937, a8 =  -0.0064535442 . 

To calculate eqn. [1] more rapidly, it is of course convenient to write it as

ln(1+x)   x.(a1 + x.(a2 + x.(a3 + x.(a4 + x.(a5 + x.(a6+ x.(a7 + x.a8))))))
 [2]

where by consecutive addition of coefficients and multiplication by the argument the polynomial may be evaluated with a minimum of operations. ln.base in Listing 2 calculates eqn. [2] and gives a good approximation for ln(x) in the range of x=1 2.

For numbers outside this range, we have to realize that

 ln(a.x) = ln(a) + ln(x),

and in the special case when a = 2n,

 ln(2n .x) = n.ln(2) + ln(x).

Now, all our floating point numbers are already split up in such a way; they contain a binary exponent n and a mantissa x such that x is between 1 and 2. So it remains to separate the exponent and mantissa, calculate eqn.[2] for the mantissa and add n times ln(2), which is a constant that we can calculate and store beforehand.

The separation of exponent and mantissa is done in get.exp, which will leave the biased exponent on top of stack, followed by the mantissa in the format of a 32-bit floating point number between 1 and 2. We now have to multiply the exponent by ln(2), an (integer) times (real) multiplication. Instead of writing another routine do do this, we use a faster method that, however, is a little memory consuming: we build a lookup table for all values of n.ln(2) with n between -127 and +128, the allowed range of exponents. Since the exponent is biased by +127, we can use it directly to index the table. The table consumes 1K of memory, so I wouldn't use it on a 48K CP/M system, but with 0.5 to 1 megabyte on a Mac, this can be justified. The lookup table is created using the SANE routines; this takes a couple of seconds, but it is done only for the initialization.

For faster indexing, I also defined the word 4* in assembly, which does not exist in Mach1 (it does, of course, in MacForth).

The final definition ln first separates exponent and mantissa and then computes ln(x) from those separate parts. Note that ln as well as ln.base are written completely in Forth. Fine-tuning of those routines, using assembler, should speed them up by another factor of 1.5 to 2 (wild guess). Still, you already gain a factor of 12 over the SANE routine (use speed.test to verify). The accuracy is reasonably good; the value calculated here differs from the 'exact' extended precision value by approximately 1 part in 107 to 108, just about the intrinsic precision of 32-bit floating point.

Let's now proceed to the inverse of the logarithm, the exponential. The handbook gives us the approximation

e-x    a1x + a2x2 + a3x3 + a4x4 + a5x5+ a6x6 + a7x7 + (error),

with the coefficients

a1 =  -0.9999999995, a2 =  0.4999999206, 
a3 =  -0.1666653019, a4 =  0.0416573745,
a5 =  -0.0083013598, a6 =  0.0013298820, 
a7 =  -0.0001413161 .

This approximation is valid to within 2.10-10 for x between 0 and ln(2) 0.6, and we use it for x = 0 1 for our purposes here, which still is sufficiently precise for a 24 bit mantissa.

Again, we have to scale down the input value of x in order to get it into the range of validity of the approximation. This time, we use the relationship

 e(N+f) = eN  . ef  ,

where N is the integer and f the fractional part of x. eN will be looked up in a table and ef calculated from the approximation. To get N, we need a real-to-integer conversion routine; this routine, together with its integer-to-real counterpart, is coded in assembler with some Forth code to get the signs correct (words s>i and i>s). The fractional part is calculated by subtracting the integer part from the input number; this is done in Forth without giving up too much in speed. exp puts it all together and calculates ex for the whole possible range of x values.

As before, the lookup table for the eN values is initialized separately, using the SANE routines.

The benchmark, speed.test, shows a 24 fold speed increase of this exponential function as compared to the 80-bit SANE version.

Other mathematical standard functions can be defined in a way very similar to the examples that I gave here. A good source of some approximations is the handbook mentioned above, also, many interesting ideas regarding numerical approximations can be found in BYTE 4/86.

Feedback dept.

Let's turn to some comments that I received through electronic mail on Bitnet and BIX.

Here comes a comment (through BIX) on the IC! bug in NEON, which leads to a very interesting observation regarding the 68000 instruction set:

Memo #82583

From: microprose

Date: Fri, 23 May 86 21:44:08 EDT

To: jlangowski

Cc: mactutor

Message-Id: <memo.82583>

Subject: "IC!" bug -- why it happens

Just got my April '86 MacTutor, and I thought I'd answer your question about the bug in the "IC!" word. Register A7 in the 68000 is always used as the stack pointer, and as such must always be kept word-aligned. As a special case, the pre-decrement and post-increment addressing modes, when used with a byte-sized operand, automatically push or pop an extra padding byte to keep the stack word-aligned. In the case of MOVE.B (A7)+,<dest>, this causes the most-significant byte of the word at the top of the stack to be transferred; then the stack pointer is adjusted by 2 (not 1). I would guess that a similar thing is happening with ADDQ #3,A7; since you mention nothing about a stack underflow, it seems that this instruction is adding 2 to A7, not 4 as I would have suspected. (Otherwise, in combination with the following instruction, an extra word is being removed from the stack.) Since the desired byte is at the bottom of the longword, your solution is the best one (assuming that D0 is a scratch register).

I should point out that this is based only on the material printed in your column, as I do not own Neon. I do, however, have Mach 1 (V1.2), and I am looking forward to more coverage of it in future issues of MacTutor.

Russell Finn

MicroProse Software

[Thank you for that observation. In fact, I tried to single step - with Macsbug - through code that looked like the following:

 NOP
 NOP
 MOVE.L A7,D0
>>>>> ADDQ.L #3,A7     <<<<<
 MOVE.L D0,A7
 etc. etc.

I didn't even get a chance to look at the registers! As soon as the program hits the ADDQ.L instruction, the screen goes dark, bing! reset! Also, running right through that piece of code (setting a breakpoint after the point where A7 was restored) resulted in the same crash. Therefore, this behavior should have nothing to do with A7 being used intermediately by Macsbug. I see two explanations: Either an interrupt occuring while A7 is set to a wrong value or a peculiarity of the 68000, which makes the machine go reset when this instruction is encountered (???). At any rate, the designers of NEON never seem to have tested their IC! definition, otherwise they would have noticed it]

A last comment: we have received a nicely laid out newsletter of the MacForth User's group, which can be contacted at

MFUG,

3081 Westville Station, New Haven, CT 06515.

With the variety of threaded code systems for the Macintosh around and being actively used, I think it is a good idea to keep the topics dealt with in this column as general as possible; even though I am using Mach1 for my work at the moment, most of the things apply to other Forths as well.

What would help us a great deal, of course, is feedback from you readers 'out there'. If you have pieces of information, notes or even whole articles on Forth aspects that you think would be of interest to others (and if it interested you, it will interest others), please, send them in.

Listing 1: 32 bit FP multiply, first revision (and hopefully the last one)
CODE     S*     
         MOVE.L  (A6)+,D1
         BEQ     @zero
         MOVE.L  (A6)+,D0
         BEQ     @end
         MOVE.L  D0,D2
         MOVE.L  D1,D3
         SWAP.W  D2
         SWAP.W  D3
         CLR.W   D4
         CLR.W   D5
         MOVE.B  D2,D4
         MOVE.B  D3,D5
         BSET    #7,D4
         BSET    #7,D5
(        ANDI.W  #$FF80,D2 )
         DC.L    $0242FF80
(        ANDI.W  #$FF80,D3 )
         DC.L    $0243FF80
         ROL.W   #1,D2
         ROL.W   #1,D3
         SUBI.W  #$7F00,D2
         SUBI.W  #$7F00,D3
         ADD.W   D2,D3
         BVS     @ovflchk
         MOVE.W  D4,D2  
         MULU.W  D1,D2  
         MULU.W  D0,D1  
         MULU.W  D5,D0  
         MULU.W  D4,D5 
         ADD.L   D2,D0  
         MOVE.W  D5,D1 
         SWAP.W  D1
         ADD.L   D1,D0  
         BPL     @nohibit
     ADDI.W  #$100,D3
         BVC     @round
         BRA     @ovflchk
@nohibit ADD.L   D0,D0
@round   BTST    #7,D0
         BEQ     @blk.exp
         BTST    #6,D0
         BNE     @incr
         BTST    #8,D0
         BEQ     @blk.exp
@incr    ADDI.L  #$80,D0
         BCC     @blk.exp
         ADDI.W  #$100,D3
         BVC     @blk.exp
@ovflchk BPL     @makezero
         MOVE.L  #$7F800000,-(A6)  
         RTS
@makezero  CLR.L D0
         MOVE.L  D0,-(A6)
         RTS
@zero    CLR.L D0
         MOVE.L  D0,(A6)
         RTS
@blk.exp ADDI.W  #$7F00,D3
         BLE     @makezero
         ROR.W   #1,D3
(        ANDI.W  #$FF80,D3 )
         DC.L    $0243FF80
         LSR.L   #8,D0
         BCLR    #23,D0
         SWAP.W  D3
         CLR.W   D3
         OR.L    D3,D0
@end     MOVE.L  D0,-(A6)
         RTS     
END-CODE          
Listing 2: Example definitions for exponential and natural logarithm, Mach1 
only forth definitions also assembler also sane
include" add.sub"
include" mul.sp"
include" div.sp"
(  files  I keep my floating point routines )

CODE 4*
     MOVE.L (A6)+,D0
     ASL.L  #2,D0
     MOVE.L D0,-(A6)
     RTS
END-CODE MACH

( extract biased exponent & mantissa 
from 32 bit FP # )

CODE get.exp
     MOVE.L  (A6)+,D0
     MOVE.L  D0,D1
     SWAP.W  D0
     LSR.W   #7,D0
     ANDI.L  #$FF,D0
     MOVE.L  D0,-(A6)
     ANDI.L  #$7FFFFF,D1
     ORI.L   #$3F800000,D1
     MOVE.L  D1,-(A6)
     RTS
END-CODE
   
CODE stoi  
        MOVE.L  (A6)+,D0
        MOVE.L  D0,D1
        SWAP.W  D0
        LSR.W   #7,D0
        SUBI.B  #127,D0
        BMI     @zero
        BEQ     @one
        ANDI.L  #$7FFFFF,D1
        BSET    #23,D1
        CMP.B   #8,D0
        BCC     @long.shift
        LSL.L   D0,D1
        CLR.W   D1
        SWAP.W  D1
        LSR.L   #7,D1
        MOVE.L  D1,-(A6)
        RTS
@long.shift
        LSL.L   #7,D1
        SUBQ.B  #7,D0
        CLR.L   D2
@shifts LSL.L   #1,D1
        ROXL.L  #1,D2
        SUBQ.B  #1,D0
        BNE     @shifts
        CLR.W   D1
        SWAP.W  D1
        LSR.L   #7,D1
        LSL.L   #8,D2
        ADD.L   D2,D2
        OR.L    D2,D1
        MOVE.L  D1,-(A6)
        RTS
@zero   CLR.L   D0
        MOVE.L  D0,-(A6)
        RTS
@one    MOVEQ.L #1,D0
        MOVE.L  D0,-(A6)
        RTS
END-CODE

: s>i dup 0< if stoi negate else stoi then ;

CODE itos
        MOVE.L  (A6)+,D0
        BEQ     @zero
        CLR.L   D1
        MOVE.L  #$7F,D2
@shifts CMPI.L  #1,D0
        BEQ     @one
        LSR.L   #1,D0
        ROXR.L  #1,D1
        ADDQ.L  #1,D2
        BRA     @shifts
@one    LSR.L   #8,D1
        LSR.L   #1,D1
        SWAP.W  D2
        LSL.L   #7,D2
        BCLR    #31,D2
        OR.L    D2,D1
        MOVE.L  D1,-(A6)
        RTS
@zero   MOVE.L  D0,-(A6)
        RTS
END-CODE        
hex
: i>s dup 0< if negate itos 80000000 or
 else itos then ;
decimal
 
: s. s>f f. ;

vocabulary maths also maths definitions

decimal
fp 9 float

-inf f>s constant -infinity
 inf f>s constant  infinity

1.0  f>s constant one
10.  f>s constant ten
100. f>s constant hun
pi f>s constant pi.s
2.718281828  f>s constant eu

( exponential, natural log )

 .9999964239 f>s constant a1ln
-.4998741238 f>s constant a2ln
 .3317990258 f>s constant a3ln
-.2407338084 f>s constant a4ln
 .1676540711 f>s constant a5ln
-.0953293897 f>s constant a6ln
 .0360884937 f>s constant a7ln
-.0064535442 f>s constant a8ln

variable ln2table 1020 vallot
  2.0 fln    f>s constant ln2
: fill.ln2table
    256 0 do ln2 i 127 - i>s s*
             i 4* ln2table + !
          loop
;
: ln.base 
    one s- a8ln over s*
           a7ln s+ over s*
           a6ln s+ over s*
           a5ln s+ over s*
           a4ln s+ over s*
           a3ln s+ over s*
           a2ln s+ over s*
           a1ln s+ s*
;
: ln dup 0> if get.exp
               ln.base
               swap 4* ln2table + @
               s+
            else drop -infinity
            then
;
: lnacc
  1000 0 do 
    i . i i>s ln  dup s.
        i i>f fln fdup f.
          s>f f- f. cr
    loop
;
variable exptable 700 vallot
: fill.exptable
      176 0 do i 87 - i>f fe^x f>s
             i 4* exptable + !
          loop
;
  
-.9999999995 f>s constant a1exp
 .4999999206 f>s constant a2exp
-.1666653019 f>s constant a3exp
 .0416573745 f>s constant a4exp
-.0083013598 f>s constant a5exp
 .0013298820 f>s constant a6exp
-.0001413161 f>s constant a7exp

: exp.base a7exp over s*
           a6exp s+ over s*
           a5exp s+ over s*
           a4exp s+ over s*
           a3exp s+ over s*
           a2exp s+ over s*
           a1exp s+ s*
           one s+
           one swap s/
;
: exp dup s>i swap over i>s s- exp.base swap 
          dup -87 < if 2drop 0
     else dup  88 > if 2drop infinity
     else 87 + 4* exptable + @ 
           ( get exp of integer part ) s* then
     then
;
: expacc
  1000 0 do 
    i . i i>s hun  s/  exp  dup s.
        i i>f 100. f/ fe^x fdup f.
          s>f f- f. cr
    loop
;
:  emptyloop 0  1000 0 do  dup  drop loop  drop ;
: femptyloop 0. 1000 0 do fdup fdrop loop fdrop ;
: testexp  ten one s+ 1000 0 do  dup  exp  drop loop  drop ;
: testfexp        11. 1000 0 do fdup fe^x fdrop loop fdrop ;
: testln  ten one s+ 1000 0 do  dup  ln  drop loop  drop ;
: testfln        11. 1000 0 do fdup fln fdrop loop fdrop ;
: speed.test cr
  ." Testing 32 bit routines..." cr
 ."    empty..." counter emptyloop timer cr
."      exp..." counter testexp timer cr
 ."       ln..." counter testln timer cr cr
    ." Testing SANE routines..." cr
    ."    empty..." counter femptyloop timer cr
    ."      exp..." counter testfexp timer cr
    ."       ln..." counter testfln timer cr
;
 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Latest Forum Discussions

See All

Whitethorn Games combines two completely...
If you have ever gone fishing then you know that it is a lesson in patience, sitting around waiting for a bite that may never come. Well, that's because you have been doing it wrong, since as Whitehorn Games now demonstrates in new release Skate... | Read more »
Call of Duty Warzone is a Waiting Simula...
It's always fun when a splashy multiplayer game comes to mobile because they are few and far between, so I was excited to see the notification about Call of Duty: Warzone Mobile (finally) launching last week and wanted to try it out. As someone who... | Read more »
Albion Online introduces some massive ne...
Sandbox Interactive has announced an upcoming update to its flagship MMORPG Albion Online, containing massive updates to its existing guild Vs guild systems. Someone clearly rewatched the Helms Deep battle in Lord of the Rings and spent the next... | Read more »
Chucklefish announces launch date of the...
Chucklefish, the indie London-based team we probably all know from developing Terraria or their stint publishing Stardew Valley, has revealed the mobile release date for roguelike deck-builder Wildfrost. Developed by Gaziter and Deadpan Games, the... | Read more »
Netmarble opens pre-registration for act...
It has been close to three years since Netmarble announced they would be adapting the smash series Solo Leveling into a video game, and at last, they have announced the opening of pre-orders for Solo Leveling: Arise. [Read more] | Read more »
PUBG Mobile celebrates sixth anniversary...
For the past six years, PUBG Mobile has been one of the most popular shooters you can play in the palm of your hand, and Krafton is celebrating this milestone and many years of ups by teaming up with hit music man JVKE to create a special song for... | Read more »
ASTRA: Knights of Veda refuse to pump th...
In perhaps the most recent example of being incredibly eager, ASTRA: Knights of Veda has dropped its second collaboration with South Korean boyband Seventeen, named so as it consists of exactly thirteen members and a video collaboration with Lee... | Read more »
Collect all your cats and caterpillars a...
If you are growing tired of trying to build a town with your phone by using it as a tiny, ineffectual shover then fear no longer, as Independent Arts Software has announced the upcoming release of Construction Simulator 4, from the critically... | Read more »
Backbone complete its lineup of 2nd Gene...
With all the ports of big AAA games that have been coming to mobile, it is becoming more convenient than ever to own a good controller, and to help with this Backbone has announced the completion of their 2nd generation product lineup with their... | Read more »
Zenless Zone Zero opens entries for its...
miHoYo, aka HoYoverse, has become such a big name in mobile gaming that it's hard to believe that arguably their flagship title, Genshin Impact, is only three and a half years old. Now, they continue the road to the next title in their world, with... | Read more »

Price Scanner via MacPrices.net

B&H has Apple’s 13-inch M2 MacBook Airs o...
B&H Photo has 13″ MacBook Airs with M2 CPUs and 256GB of storage in stock and on sale for up to $150 off Apple’s new MSRP, starting at only $849. Free 1-2 day delivery is available to most US... Read more
M2 Mac minis on sale for $100-$200 off MSRP,...
B&H Photo has Apple’s M2-powered Mac minis back in stock and on sale today for $100-$200 off MSRP. Free 1-2 day shipping is available for most US addresses: – Mac mini M2/256GB SSD: $499, save $... Read more
Mac Studios with M2 Max and M2 Ultra CPUs on...
B&H Photo has standard-configuration Mac Studios with Apple’s M2 Max & Ultra CPUs in stock today and on Easter sale for $200 off MSRP. Their prices are the lowest available for these models... Read more
Deal Alert! B&H Photo has Apple’s 14-inch...
B&H Photo has new Gray and Black 14″ M3, M3 Pro, and M3 Max MacBook Pros on sale for $200-$300 off MSRP, starting at only $1399. B&H offers free 1-2 day delivery to most US addresses: – 14″ 8... Read more
Department Of Justice Sets Sights On Apple In...
NEWS – The ball has finally dropped on the big Apple. The ball (metaphorically speaking) — an antitrust lawsuit filed in the U.S. on March 21 by the Department of Justice (DOJ) — came down following... Read more
New 13-inch M3 MacBook Air on sale for $999,...
Amazon has Apple’s new 13″ M3 MacBook Air on sale for $100 off MSRP for the first time, now just $999 shipped. Shipping is free: – 13″ MacBook Air (8GB RAM/256GB SSD/Space Gray): $999 $100 off MSRP... Read more
Amazon has Apple’s 9th-generation WiFi iPads...
Amazon has Apple’s 9th generation 10.2″ WiFi iPads on sale for $80-$100 off MSRP, starting only $249. Their prices are the lowest available for new iPads anywhere: – 10″ 64GB WiFi iPad (Space Gray or... Read more
Discounted 14-inch M3 MacBook Pros with 16GB...
Apple retailer Expercom has 14″ MacBook Pros with M3 CPUs and 16GB of standard memory discounted by up to $120 off Apple’s MSRP: – 14″ M3 MacBook Pro (16GB RAM/256GB SSD): $1691.06 $108 off MSRP – 14... Read more
Clearance 15-inch M2 MacBook Airs on sale for...
B&H Photo has Apple’s 15″ MacBook Airs with M2 CPUs (8GB RAM/256GB SSD) in stock today and on clearance sale for $999 in all four colors. Free 1-2 delivery is available to most US addresses.... Read more
Clearance 13-inch M1 MacBook Airs drop to onl...
B&H has Apple’s base 13″ M1 MacBook Air (Space Gray, Silver, & Gold) in stock and on clearance sale today for $300 off MSRP, only $699. Free 1-2 day shipping is available to most addresses in... Read more

Jobs Board

Medical Assistant - Surgical Oncology- *Apple...
Medical Assistant - Surgical Oncology- Apple Hill Location: WellSpan Medical Group, York, PA Schedule: Full Time Sign-On Bonus Eligible Remote/Hybrid Regular Apply Read more
Omnichannel Associate - *Apple* Blossom Mal...
Omnichannel Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
Cashier - *Apple* Blossom Mall - JCPenney (...
Cashier - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Blossom Mall Read more
Operations Associate - *Apple* Blossom Mall...
Operations Associate - Apple Blossom Mall Location:Winchester, VA, United States (https://jobs.jcp.com/jobs/location/191170/winchester-va-united-states) - Apple Read more
Business Analyst | *Apple* Pay - Banco Popu...
Business Analyst | Apple PayApply now " Apply now + Apply Now + Start applying with LinkedIn Start + Please wait Date:Mar 19, 2024 Location: San Juan-Cupey, PR Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.