TweetFollow Us on Twitter

Fast Square Root Calc

Volume Number: 14 (1998)
Issue Number: 1
Column Tag: Assembler Workshop

Fast Square Root Calculation

by Guillaume Bédard, Frédéric Leblanc, Yohan Plourde and Pierre Marchand, Québec, Canada

Optimizing PowerPC Assembler Code to beat the Toolbox

Introduction

The calculation of the square root of a floating-point number is a frequently encountered task. However, the PowerPC processors don't have a square root instruction. The implementation presented here performs the square root of a double-precision number over the full range of representation of the IEEE 754 standard for normalized numbers (from 2.22507385851E-308 to 1.79769313486E308) with an accuracy of 15 or more decimal digits. It is very fast, at least six times faster than the Toolbox ROM call.

Theory of Operation

A floating point number has three components: a sign, a mantissa and an exponential part. For example, the number +3.5 x 10^4 (35 000) has a plus sign, a mantissa of 3.5 and an exponential part of 104. The mantissa consists of an integer part and a fractional part f.

A double precision number in IEEE 754 format has the same components: a sign bit s, an 11-bit exponent e and a 52 bit fraction f. The exponential part is expressed in powers of 2 and the exponent is biased by adding 1023 to the value of e. The mantissa is normalized to be of the form 1.f. Since the integer part of the normalized mantissa is always 1, it doesn't have to be included in the representation. The number is thus represented as follows: (-1)s x 1.f x 2^e+1023.

For example, the number 5.0 can be expressed in binary as 101.0, which means 101.0 x 2^0, which in turn is equal to 1.010 x 2^2, obtained by dividing the mantissa by 4 and multiplying 2^0 by 4. Therefore, the normalized mantissa is 1.010 and the exponent 2. Fraction f is then .01000000.... The biased exponent is obtained by adding 1023 to e and is 1025, or 10000000001 in binary. The double precision IEEE representation of 5.0 is finally:

-----------------------------------------------------------------------------
|  0 | 10000000001 | 01000000000000000000000000000000000000000000000000000  |
-----------------------------------------------------------------------------

or 4014000000000000 in hexadecimal notation for short.

First Approximation

Given this representation, a first approximation to the square root of a number is obtained by dividing the exponent by 2. If the number is an even power of 2 such as 16 or 64, the exact root is obtained. If the number is an odd power of 2 such as 8 or 32, 1/SQRT(2) times the square root is obtained. In general, the result will be within a factor SQRT(2) of the true value.

Refining the Approximation

The Newton-Raphson method is often used to obtain a more accurate value for the root x of a function f(x) once an initial approximation x0 is given:

[1]

This becomes, in the case of the square root of n, = x2 - n:where f(x)

[2]

An excellent approximation to the square root starting with the initial approximation given above is obtained within 5 iterations using equation [2]. This algorithm is already pretty fast, but its speed is limited by the fact that each iteration requires a double-precision division which is the slowest PowerPC floating-point instruction with 32 cycles on the MPC601 (Motorola, 1993).

Eliminating Divisions

Another approach is to use equation [1] with the function.

In this case, equation [1] becomes:

[3]

There is still a division by n, but since n is constant (it's the original number whose root we want to find), it can be replaced by multiplying by 1/n, which can be calculated once before the beginning of the iteration process. The five 32-cycle divisions are thus replaced by this single division followed by 5 much faster multiplications (5 cycles each). This approach is approximately three times faster than the preceding one. However, care must be taken for large numbers since the term in x02 can cause the operation to overflow.

Use of a table

Finally, an approach that is even faster consists in using a table to obtain a more accurate first approximation. In order to do so, the range of possible values of fraction f (0 to ~1) is divided into 16 sub-ranges by using the first 4 bits of f as an index into a table which contains the first two coefficients of the Taylor expansion of the square root of the mantissa (1.0 to ~2) over that sub-range.

The Taylor expansion is given in general by:

[4]

the first two terms of which yield, in the case where f(x) = SQRT(x):

[5]

The square root of x is thus approximated by 16 straight-line segments. The table therefore contains the values of

A =

and B =

for each of the 16 sub-ranges as shown in Figure 1. This first approximation gives an accuracy of about 1.5 %.

Figure 1. Approximation by straight line segment.

To reach the desired accuracy of 15 digits, equation [2] is applied twice to the result of equation [5]. To avoid having to perform two divisions by repeating the iteration, the two iterations are folded together as follows, which contains only one division:

and

[6]

In order to perform these calculation, the exponent of x and n is reduced to -1 (1022 biased), so that floating-point operations apply only to the values of the mantissa and don't overflow if the exponent is very large. The value of these numbers will therefore be in the range 0.5 to 1.0 since the mantissa is in the range 1.0 to 2.0. If the original exponent was odd, the mantissa is multiplied by SQRT(2) before applying equation [6].

Finally, the original exponent divided by two is restored at the end.

The Code

The SQRoot function shown in Listing 1 has been implemented in CodeWarrior C/C++ version 10.

Listing 1: SQRoot.c

// On entry, fp1 contains a positive number between 2.22507385851E-308
// and 1.79769313486E308. On exit, the result is in fp1.

asm long double SQRoot(long double num);   // prototype

float Table[35] = {
0.353553390593, 0.707106781187, 0.364434493428, 0.685994340570,
0.375000000000, 0.666666666667, 0.385275875186, 0.648885684523,
0.395284707521, 0.632455532034, 0.405046293650, 0.617213399848,
0.414578098794, 0.603022689156, 0.423895623945, 0.589767824620,
0.433012701892, 0.577350269190, 0.441941738242, 0.565685424949,
0.450693909433, 0.554700196225, 0.459279326772, 0.544331053952,
0.467707173347, 0.534522483825, 0.475985819116, 0.525225731439,
0.484122918276, 0.516397779494, 0.492125492126, 0.508000508001,
1.414213562373, 0.000000000000, 0.000000000000 };

asm long double Sqrt(long double num) {

   lwz   r3,Table(rtoc)      // address of Table[]
   lhz   r4,24(sp)           // load
                             // Sign(1)+Exponent(11)+Mantissa(4)
   andi.   r5,r4,0xF         // keep only Mantissa(4)
   ori   r5,r5,0x3FE0        // exponent = -1+BIAS = 1022
   sth   r5,24(sp)           // save reduced number

   rlwinm   r5,r5,3,25,28    // take 8*Mantissa(4) as index
   lfd   fp1,24(sp)          // load reduced number
   lfsux   fp4,r5,r3         // load coefficient A
   lfs   fp5,4(r5)           // load coefficient B
   lfs   fp3,128(r3)         // load SQRT(2)
   fmr   fp2,fp1             // copy reduced number
   rlwinm.   r5,r4,31,18,28  // divide exponent by 2
   beq   @@2                 // if (exponent == 0) then done

   fmadd   fp2,fp2,fp5,fp4   // approximation SQRT(x) = A + B*x
   andi.   r4,r4,0x10        // check if exponent even
   beq   @@1                 // if (exponent even) do iteration
   fmul   fp2,fp2,fp3        // multiply reduced number by SQRT(2)
   fadd   fp1,fp1,fp1        // adjust exponent of original number

@@1:   fadd   fp3,fp2,fp2    // 2*x
   fmul   fp5,fp2,fp1        // x*n
   fadd   fp3,fp3,fp3        // 4*x
   fmadd   fp4,fp2,fp2,fp1   // x*x + n
   fmul   fp5,fp3,fp5        // 4*x*x*n
   fmul   fp6,fp2,fp4        // denominator = x*(x*x + n)
   fmadd   fp5,fp4,fp4,fp5   // numerator = (x*x + n)*(x*x + n) +
                             // 4*x*x*n
   fdiv   fp1,fp5,fp6        // double precision division
   andi.   r5,r5,0x7FF0      // mask exponent 
   addi   r5,r5,0x1FE0       // rectify new exponent

@@2:   sth   r5,132(r3)      // save constant C (power of 2) 
   lfd   fp2,132(r3)         // load constant C
   fmul   fp1,fp1,fp2        // multiply by C to replace exponent
   blr                       // done, the result is in fp1
}

Performance

The code presented above runs in less than 100 cycles, which means less than 1 microsecond on a 7200/75 Power Macintosh and is more than six times faster than the ROM code. The code could be modified to make use of the floating reciprocal square root estimate instruction (frsqrte) that is available on the MPC603 and MPC604 processors, and which has an accuracy of 5 bits. It is not available on the MPC601, however. The method used here could also be used to evaluate other transcendental functions.

Performance was measured by running the code a thousand times and calling a simple timing routine found in (Motorola, 1993), that we called myGetTime(). It uses the real-time clock of the MPC 601 processor (RTCU and RTCL registers) and is shown in Listing 2. The routine would have to be modified to run on MPC603 or MPC604 processors, since they don't have the same real-time clock mechanism.

The code doesn't support denormalized numbers (below 2.22507385851E-308). This could easily be implemented albeit at the cost of a slight reduction in performance.

Listing 2: myGetTime.c

asm long myGetTime()
   {
lp:    mfspr   r4,4           // RTCU
   mfspr   r3,5               // RTCL
   mfspr   r5,4               // RTCU again
    cmpw      r4,r5           // if RTCU has changed, try again
    bne      lp
    rlwinm   r3,r3,25,7,31    // shift right since bits 25-31 are
                              // not used
    blr                       // the result is in r3. 1 unit is
                              // worth 128 ns.
   }

To run the code, a very simple interface using the SIOUX library is provided in Listing 3.

Listing 3: main.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fp.h>

void main()
   {
   long double   num, num2;
   long startTime, endTime, time;
   short i;


   do {
   printf("%2s","> ");           // caret
   scanf("%Lf",&num);           // read long double
   if (num < 0.0) num = 0.0;     // replace by 0.0 if negative
   startTime = myGetTime();
   for (i = 0; i < 1000; i++)    // repeat 1000 times
   num2 = SQRoot(num);              // call our function
   endTime = myGetTime();
   time = endTime - startTime;
   if (num > 1e-6 && num < 1e7)
      printf("%7s%Lf\n","root = ",num2);   // show result
   else
      printf("%7s%Le\n","root = ",num2);
   printf("%7s%d\n","time = ", time);      // show elapsed time
   }
   while (1);                              // repeat until Quit
   }

References

PowerPC 601 RISC Microprocessor User's Manual, Motorola MPC601UM/AD Rev 1, 1993.


The first three authors are undergraduate students in Computer Science at Université Laval in Québec, Canada. This work was done as an assignment in a course on Computer Architecture given by the fourth author.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

BBEdit 11.1.1 - Powerful text and HTML e...
BBEdit is the leading professional HTML and text editor for the Mac. Specifically crafted in response to the needs of Web authors and software developers, this award-winning product provides a... Read more
CrossOver 14.1.3 - Run Windows apps on y...
CrossOver can get your Windows productivity applications and PC games up and running on your Mac quickly and easily. CrossOver runs the Windows software that you need on Mac at home, in the office,... Read more
Little Snitch 3.5.3 - Alerts you about o...
Little Snitch gives you control over your private outgoing data. Track background activity As soon as your computer connects to the Internet, applications often have permission to send any... Read more
OmniGraffle Pro 6.2.3 - Create diagrams,...
OmniGraffle Pro helps you draw beautiful diagrams, family trees, flow charts, org charts, layouts, and (mathematically speaking) any other directed or non-directed graphs. We've had people use... Read more
OmniFocus 2.2 - GTD task manager with iO...
OmniFocus helps you manage your tasks the way that you want, freeing you to focus your attention on the things that matter to you most. Capturing tasks and ideas is always a keyboard shortcut away in... Read more
Cocktail 8.4 - General maintenance and o...
Cocktail is a general purpose utility for OS X that lets you clean, repair and optimize your Mac. It is a powerful digital toolset that helps hundreds of thousands of Mac users around the world get... Read more
PDFKey Pro 4.3 - Edit and print password...
PDFKey Pro can unlock PDF documents protected for printing and copying when you've forgotten your password. It can now also protect your PDF files with a password to prevent unauthorized access and/... Read more
Kodi 15.0.beta1 - Powerful media center...
Kodi (was XBMC) is an award-winning free and open-source (GPL) software media player and entertainment hub that can be installed on Linux, OS X, Windows, iOS, and Android, featuring a 10-foot user... Read more
DiskCatalogMaker 6.4.12 - Catalog your d...
DiskCatalogMaker is a simple disk management tool which catalogs disks. Simple, light-weight, and fast. Finder-like intuitive look and feel. Super-fast search algorithm. Can compress catalog data... Read more
Macs Fan Control 1.3.0.0 - Monitor and c...
Macs Fan Control allows you to monitor and control almost any aspect of your computer's fans, with support for controlling fan speed, temperature sensors pane, menu-bar icon, and autostart with... Read more

MooVee - Your Movies Guru (Entertainmen...
MooVee - Your Movies Guru 1.0 Device: iOS iPhone Category: Entertainment Price: $1.99, Version: 1.0 (iTunes) Description: MooVee helps you effortlessly manage your movies, on your iPhone. | Read more »
Geometry Wars 3: Dimensions (Games)
Geometry Wars 3: Dimensions 1.0.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0.0 (iTunes) Description: Enjoy the next chapter in the award-winning Geometry Wars franchise and enjoy stunning, console-quality... | Read more »
CHAOS RINGS Ⅲ (Games)
CHAOS RINGS Ⅲ 1.0.0 Device: iOS Universal Category: Games Price: $19.99, Version: 1.0.0 (iTunes) Description: The newest addition to the popular smartphone RPG series is finally here! ・CHAOS RINGS Overview | Read more »
The Popular Insight Series of Travel Gui...
Getting around in a country when you can't understand the primary language can be tough. Fortunately there are several options available to help wold travellers with the important stuff like giving directions to a cab driver or asking where the... | Read more »
Desktop Dungeons is Now on the iPad Desp...
Desktop Dungeons has been a well-loved roguelike on PC for quite some time, and now it's finally available for the iPad! Just the iPad, though. Sorry iPhone users. [Read more] | Read more »
Moleskine Timepage – Calendar for iCloud...
Moleskine Timepage – Calendar for iCloud, Google & Exchange 1.0 Device: iOS iPhone Category: Productivity Price: $4.99, Version: 1.0 (iTunes) Description: The most elegant calendar for your pocket and wrist, Timepage is a... | Read more »
QuizUp Gets Social in its New Update
Plain Vanilla Corp has released a new and improved version of their popular trivia game, QuizUp. The app now emphasizes social play so you can challenge friends from all over the world. [Read more] | Read more »
The Deep (Games)
The Deep 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Swipe Controls Delve into the deep in this retro rogue-like! Swipe to move your diver around and keep away from the enemies as you... | Read more »
Sproggiwood (Games)
Sproggiwood 1.2.8 Device: iOS Universal Category: Games Price: $9.99, Version: 1.2.8 (iTunes) Description: Sproggiwood was developed for devices with at least 1GB of RAM. We recommend you only download Sproggiwood if your device... | Read more »
Battle of Gods: Ascension (Games)
Battle of Gods: Ascension 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: TURN-BASED TACTICAL COMBATFight tactical battles against the forces of Hades! In Battle of Gods: Ascension you play... | Read more »

Price Scanner via MacPrices.net

Apple refurbished 2014 13-inch Retina MacBook...
The Apple Store has Apple Certified Refurbished 2014 13″ Retina MacBook Pros available for up to $400 off original MSRP, starting at $979. An Apple one-year warranty is included with each model, and... Read more
What Would the ideal Apple Productivity Platf...
For the past four years I’ve kept a foot in both the Mac and iPad camps respectively. my daily computing hours divided about 50/50 between the two devices with remarkable consistency. However, there’... Read more
PageMeUp 1.2.1 Ten Dollar Page Layout Applica...
Paris, France-based Softobe, an OS X software development company, has announced that their PageMeUp v. 1.2.1, is available on the Mac App Store for $9.99. The license can be installed on up to 5... Read more
Eight New Products For USB Type-C Application...
Fresco Logic, specialists in advanced connectivity technologies and ICs, has introduced two new product families targeting the Type-C connector recently introduced across a number of consumer... Read more
Scripps National Spelling Bee Launches Buzzwo...
Scripps National Spelling Bee fans can monitor the action at the 2015 Spelling Bee with the new Buzzworthy app for iOS, Android and Windows mobile devices. The free Buzzworthy app provides friendly... Read more
13-inch 2.5GHz MacBook Pro on sale for $120 o...
B&H Photo has the 13″ 2.5GHz MacBook Pro on sale for $979 including free shipping plus NY sales tax only. Their price is $120 off MSRP, and it’s the lowest price for this model (except for Apple’... Read more
27-inch 3.3GHz 5K iMac on sale for $1899, $10...
B&H Photo has the new 27″ 3.3GHz 5K iMac on sale for $1899.99 including free shipping plus NY tax only. Their price is $100 off MSRP. Read more
Save up to $50 on iPad Air 2, NY tax only, fr...
B&H Photo has iPad Air 2s on sale for up to $50 off MSRP including free shipping plus NY sales tax only: - 16GB iPad Air 2 WiFi: $469 $30 off - 64GB iPad Air 2 WiFi: $549.99 $50 off - 128GB iPad... Read more
Updated Mac Price Trackers
We’ve updated our Mac Price Trackers with the latest information on prices, bundles, and availability on systems from Apple’s authorized internet/catalog resellers: - 15″ MacBook Pros - 13″ MacBook... Read more
New 13-inch 2.9GHz Retina MacBook Pro on sale...
B&H Photo has the 13″ 2.9GHz/512GB Retina MacBook Pro on sale for $1699.99 including free shipping plus NY tax only. Their price is $100 off MSRP, and it’s the lowest price for this model from... Read more

Jobs Board

*Apple* Solutions Consultant - Retail Sales...
**Job Summary** As an Apple Solutions Consultant (ASC) you are the link between our customers and our products. Your role is to drive the Apple business in a retail Read more
*Apple* Watch SW Application Project Manager...
**Job Summary** The Apple Watch software team is looking for an Application Engineering Project Manager to work on new projects for Apple . The successful candidate Read more
Engineering Manager for *Apple* Maps on the...
…the Maps App Team get to take part in just about any new feature in Apple Maps, often contributing a majority of the feature work. In our day-to-day engineering work, we Read more
Senior Software Engineer - *Apple* SIM - Ap...
Changing the world is all in a day039s work at Apple . If you love innovation, here039s your chance to make a career of it. You039ll work hard. But the job comes with Read more
Lead *Apple* Solutions Consultant - Retail...
**Job Summary** Job Summary The Lead ASC is an Apple employee who serves as the Apple business manager and influencer in a hyper-business critical Reseller's store Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.