Fast Square Root Calc

Volume Number: 14 (1998)
Issue Number: 1
Column Tag: Assembler Workshop

# Fast Square Root Calculation

by Guillaume Bédard, Frédéric Leblanc, Yohan Plourde and Pierre Marchand, Québec, Canada

## Introduction

The calculation of the square root of a floating-point number is a frequently encountered task. However, the PowerPC processors don't have a square root instruction. The implementation presented here performs the square root of a double-precision number over the full range of representation of the IEEE 754 standard for normalized numbers (from 2.22507385851E-308 to 1.79769313486E308) with an accuracy of 15 or more decimal digits. It is very fast, at least six times faster than the Toolbox ROM call.

## Theory of Operation

A floating point number has three components: a sign, a mantissa and an exponential part. For example, the number +3.5 x 10^4 (35 000) has a plus sign, a mantissa of 3.5 and an exponential part of 104. The mantissa consists of an integer part and a fractional part f.

A double precision number in IEEE 754 format has the same components: a sign bit s, an 11-bit exponent e and a 52 bit fraction f. The exponential part is expressed in powers of 2 and the exponent is biased by adding 1023 to the value of e. The mantissa is normalized to be of the form 1.f. Since the integer part of the normalized mantissa is always 1, it doesn't have to be included in the representation. The number is thus represented as follows: (-1)s x 1.f x 2^e+1023.

For example, the number 5.0 can be expressed in binary as 101.0, which means 101.0 x 2^0, which in turn is equal to 1.010 x 2^2, obtained by dividing the mantissa by 4 and multiplying 2^0 by 4. Therefore, the normalized mantissa is 1.010 and the exponent 2. Fraction f is then .01000000.... The biased exponent is obtained by adding 1023 to e and is 1025, or 10000000001 in binary. The double precision IEEE representation of 5.0 is finally:

```-----------------------------------------------------------------------------
|  0 | 10000000001 | 01000000000000000000000000000000000000000000000000000  |
-----------------------------------------------------------------------------
```

or 4014000000000000 in hexadecimal notation for short.

## First Approximation

Given this representation, a first approximation to the square root of a number is obtained by dividing the exponent by 2. If the number is an even power of 2 such as 16 or 64, the exact root is obtained. If the number is an odd power of 2 such as 8 or 32, 1/SQRT(2) times the square root is obtained. In general, the result will be within a factor SQRT(2) of the true value.

## Refining the Approximation

The Newton-Raphson method is often used to obtain a more accurate value for the root x of a function f(x) once an initial approximation x0 is given:

[1]

This becomes, in the case of the square root of n, = x2 - n:where f(x)

[2]

An excellent approximation to the square root starting with the initial approximation given above is obtained within 5 iterations using equation [2]. This algorithm is already pretty fast, but its speed is limited by the fact that each iteration requires a double-precision division which is the slowest PowerPC floating-point instruction with 32 cycles on the MPC601 (Motorola, 1993).

## Eliminating Divisions

Another approach is to use equation [1] with the function.

In this case, equation [1] becomes:

[3]

There is still a division by n, but since n is constant (it's the original number whose root we want to find), it can be replaced by multiplying by 1/n, which can be calculated once before the beginning of the iteration process. The five 32-cycle divisions are thus replaced by this single division followed by 5 much faster multiplications (5 cycles each). This approach is approximately three times faster than the preceding one. However, care must be taken for large numbers since the term in x02 can cause the operation to overflow.

## Use of a table

Finally, an approach that is even faster consists in using a table to obtain a more accurate first approximation. In order to do so, the range of possible values of fraction f (0 to ~1) is divided into 16 sub-ranges by using the first 4 bits of f as an index into a table which contains the first two coefficients of the Taylor expansion of the square root of the mantissa (1.0 to ~2) over that sub-range.

The Taylor expansion is given in general by:

[4]

the first two terms of which yield, in the case where f(x) = SQRT(x):

[5]

The square root of x is thus approximated by 16 straight-line segments. The table therefore contains the values of

A =

and B =

for each of the 16 sub-ranges as shown in Figure 1. This first approximation gives an accuracy of about 1.5 %.

Figure 1. Approximation by straight line segment.

To reach the desired accuracy of 15 digits, equation [2] is applied twice to the result of equation [5]. To avoid having to perform two divisions by repeating the iteration, the two iterations are folded together as follows, which contains only one division:

and

[6]

In order to perform these calculation, the exponent of x and n is reduced to -1 (1022 biased), so that floating-point operations apply only to the values of the mantissa and don't overflow if the exponent is very large. The value of these numbers will therefore be in the range 0.5 to 1.0 since the mantissa is in the range 1.0 to 2.0. If the original exponent was odd, the mantissa is multiplied by SQRT(2) before applying equation [6].

Finally, the original exponent divided by two is restored at the end.

## The Code

The SQRoot function shown in Listing 1 has been implemented in CodeWarrior C/C++ version 10.

Listing 1: SQRoot.c

```// On entry, fp1 contains a positive number between 2.22507385851E-308
// and 1.79769313486E308. On exit, the result is in fp1.

asm long double SQRoot(long double num);   // prototype

float Table[35] = {
0.353553390593, 0.707106781187, 0.364434493428, 0.685994340570,
0.375000000000, 0.666666666667, 0.385275875186, 0.648885684523,
0.395284707521, 0.632455532034, 0.405046293650, 0.617213399848,
0.414578098794, 0.603022689156, 0.423895623945, 0.589767824620,
0.433012701892, 0.577350269190, 0.441941738242, 0.565685424949,
0.450693909433, 0.554700196225, 0.459279326772, 0.544331053952,
0.467707173347, 0.534522483825, 0.475985819116, 0.525225731439,
0.484122918276, 0.516397779494, 0.492125492126, 0.508000508001,
1.414213562373, 0.000000000000, 0.000000000000 };

asm long double Sqrt(long double num) {

lwz   r3,Table(rtoc)      // address of Table[]
// Sign(1)+Exponent(11)+Mantissa(4)
andi.   r5,r4,0xF         // keep only Mantissa(4)
ori   r5,r5,0x3FE0        // exponent = -1+BIAS = 1022
sth   r5,24(sp)           // save reduced number

rlwinm   r5,r5,3,25,28    // take 8*Mantissa(4) as index
lfd   fp1,24(sp)          // load reduced number
lfsux   fp4,r5,r3         // load coefficient A
lfs   fp5,4(r5)           // load coefficient B
fmr   fp2,fp1             // copy reduced number
rlwinm.   r5,r4,31,18,28  // divide exponent by 2
beq   @@2                 // if (exponent == 0) then done

fmadd   fp2,fp2,fp5,fp4   // approximation SQRT(x) = A + B*x
andi.   r4,r4,0x10        // check if exponent even
beq   @@1                 // if (exponent even) do iteration
fmul   fp2,fp2,fp3        // multiply reduced number by SQRT(2)

fmul   fp5,fp2,fp1        // x*n
fmadd   fp4,fp2,fp2,fp1   // x*x + n
fmul   fp5,fp3,fp5        // 4*x*x*n
fmul   fp6,fp2,fp4        // denominator = x*(x*x + n)
fmadd   fp5,fp4,fp4,fp5   // numerator = (x*x + n)*(x*x + n) +
// 4*x*x*n
fdiv   fp1,fp5,fp6        // double precision division
addi   r5,r5,0x1FE0       // rectify new exponent

@@2:   sth   r5,132(r3)      // save constant C (power of 2)
lfd   fp2,132(r3)         // load constant C
fmul   fp1,fp1,fp2        // multiply by C to replace exponent
blr                       // done, the result is in fp1
}
```

## Performance

The code presented above runs in less than 100 cycles, which means less than 1 microsecond on a 7200/75 Power Macintosh and is more than six times faster than the ROM code. The code could be modified to make use of the floating reciprocal square root estimate instruction (frsqrte) that is available on the MPC603 and MPC604 processors, and which has an accuracy of 5 bits. It is not available on the MPC601, however. The method used here could also be used to evaluate other transcendental functions.

Performance was measured by running the code a thousand times and calling a simple timing routine found in (Motorola, 1993), that we called myGetTime(). It uses the real-time clock of the MPC 601 processor (RTCU and RTCL registers) and is shown in Listing 2. The routine would have to be modified to run on MPC603 or MPC604 processors, since they don't have the same real-time clock mechanism.

The code doesn't support denormalized numbers (below 2.22507385851E-308). This could easily be implemented albeit at the cost of a slight reduction in performance.

Listing 2: myGetTime.c

```asm long myGetTime()
{
lp:    mfspr   r4,4           // RTCU
mfspr   r3,5               // RTCL
mfspr   r5,4               // RTCU again
cmpw      r4,r5           // if RTCU has changed, try again
bne      lp
rlwinm   r3,r3,25,7,31    // shift right since bits 25-31 are
// not used
blr                       // the result is in r3. 1 unit is
// worth 128 ns.
}
```

To run the code, a very simple interface using the SIOUX library is provided in Listing 3.

Listing 3: main.c

```#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fp.h>

void main()
{
long double   num, num2;
long startTime, endTime, time;
short i;

do {
printf("%2s","> ");           // caret
if (num < 0.0) num = 0.0;     // replace by 0.0 if negative
startTime = myGetTime();
for (i = 0; i < 1000; i++)    // repeat 1000 times
num2 = SQRoot(num);              // call our function
endTime = myGetTime();
time = endTime - startTime;
if (num > 1e-6 && num < 1e7)
printf("%7s%Lf\n","root = ",num2);   // show result
else
printf("%7s%Le\n","root = ",num2);
printf("%7s%d\n","time = ", time);      // show elapsed time
}
while (1);                              // repeat until Quit
}
```

## References

PowerPC 601 RISC Microprocessor User's Manual, Motorola MPC601UM/AD Rev 1, 1993.

The first three authors are undergraduate students in Computer Science at Université Laval in Québec, Canada. This work was done as an assignment in a course on Computer Architecture given by the fourth author.

Community Search:
MacTech Search:

Chromium 45.0.2454.85 - Fast and stable...
Chromium is an open-source browser project that aims to build a safer, faster, and more stable way for all Internet users to experience the web. Version 45.0.2454.85: Note: Does not contain the "... Read more
OmniFocus 2.2.5 - GTD task manager with...
OmniFocus helps you manage your tasks the way that you want, freeing you to focus your attention on the things that matter to you most. Capturing tasks and ideas is always a keyboard shortcut away in... Read more
iFFmpeg 5.7.1 - Convert multimedia files...
iFFmpeg is a graphical front-end for FFmpeg, a command-line tool used to convert multimedia files between formats. The command line instructions can be very hard to master/understand, so iFFmpeg does... Read more
VOX 2.6 - Music player that supports man...
VOX is a beautiful music player that supports many filetypes. The beauty is in its simplicity, yet behind the minimal exterior lies a powerful music player with a ton of features and support for all... Read more
Box Sync 4.0.6567 - Online synchronizati...
Box Sync gives you a hard-drive in the Cloud for online storage. Note: You must first sign up to use Box. What if the files you need are on your laptop -- but you're on the road with your iPhone? No... Read more
Carbon Copy Cloner 4.1.4 - Easy-to-use b...
Carbon Copy Cloner backups are better than ordinary backups. Suppose the unthinkable happens while you're under deadline to finish a project: your Mac is unresponsive and all you hear is an ominous,... Read more
OmniGraffle Pro 6.3.1 - Create diagrams,...
OmniGraffle Pro helps you draw beautiful diagrams, family trees, flow charts, org charts, layouts, and (mathematically speaking) any other directed or non-directed graphs. We've had people use... Read more
Monosnap 3.1.2 - Versatile screenshot ut...
Monosnap lets you capture screenshots, share files, and record video and .gifs! Capture: Capture full screen, just part of the screen, or a selected window Make your crop area pixel perfect with... Read more
Alfred 2.7.2 - Quick launcher for apps a...
Alfred is an award-winning productivity application for OS X. Alfred saves you time when you search for files online or on your Mac. Be more productive with hotkeys, keywords, and file actions at... Read more
Microsoft Remote Desktop 8.0.19 - Connec...
With Microsoft Remote Desktop, you can connect to a remote PC and your work resources from almost anywhere. Experience the power of Windows with RemoteFX in a Remote Desktop client designed to help... Read more

## Latest Forum Discussions

Goat Simulator MMO Simulator (Games)
Goat Simulator MMO Simulator 1.0 Device: iOS Universal Category: Games Price: \$4.99, Version: 1.0 (iTunes) Description: ** IMPORTANT - SUPPORTED DEVICESiPhone 4S, iPad 2, iPod Touch 5 or better.** Coffee Stain Studios brings next-gen... | Read more »
Worms™ 4 (Games)
Worms™ 4 1.02 Device: iOS Universal Category: Games Price: \$4.99, Version: 1.02 (iTunes) Description: The latest instalment in the worldwide mega hit franchise! Coming soon to iPhone, iPad and iPod touch. When the guys and girls at... | Read more »
The Deer God (Games)
The Deer God 1.0 Device: iOS Universal Category: Games Price: \$6.99, Version: 1.0 (iTunes) Description: 30% off launch sale!!! “It can be a struggle, but it's all worth it when you're shooting fire out your antlers.” Kotaku “The... | Read more »
AppSpy's Patreon campaign kicks off
Occasionally you'll see us use AppSpy's videos here on 148Apps to support an article we've written. That's because we're part of Steel Media, and AppSpy is Steel's video arm, so we're all part of one happy family. [Read more] | Read more »
We're Sorry to Report that Moonrise...
Moonrise is a very promising-looking, Pokemon-esque monster collecting and battling game that we were really looking forward to reviewing, but unfortunately it looks like that's never going to happen. [Read more] | Read more »
The Latest Update for The Sims FreePlay...
Commerce has gotten a little more active with the newest update for The Sims FreePlay, making Sunset Mall more of a hangout than ever before. [Read more] | Read more »
This Week at 148Apps: August 24-28, 2015
The Apps of August With 148Apps How do you know what apps are worth your time and money? Just look to the review team at 148Apps. We sort through the chaos and find the apps you're looking for. The ones we love become Editor’s Choice, standing out... | Read more »
NASCAR in Real Racing 3? Sure, Why Not?
I have to give Firemonkeys credit - it's very cool of them to add NASCAR to Real Racing 3 via an update rather than making a separate game for it. But that's a different discussion for another time; for now let's sit back and enjoy driving in... | Read more »
The nuyu is an Inexpensive Activity Moni...
Today, Health o Meter nuyu has announced a series of health and fitness-related products, including the aforementioned activity monitor along with a wireless scale. All at a decent pricepoint, no less. [Read more] | Read more »
The Makers of Overkill are Trying Someth...
Craneballs, the studio responsible for the Overkill series, is taking a little break from all that violence (a little break) to bring us Cube Worm - a 3D take on one of the most classic PC/calculator games in existence. [Read more] | Read more »

## Price Scanner via MacPrices.net

It looks like we may not have to wait much longer to see what finally materializes as a new, larger-panel iPad (Pro/Plus?) Usually reliable Apple product prognosticator KGI Securities analyst Ming-... Read more
eFileCabinet Announces SMB Document Managemen...
Electronic document management (EDM) eFileCabinet, Inc., a hosted solutions provider for small to medium businesses, has announced that its SecureDrawer and eFileCabinet Online services will be... Read more
San Francisco’s WaterField Designs today unveiled their all-leather Cozmo 2.0 — an elegant attach laptop bag with carefully-designed features to suit any business environment. The Cozmo 2.0 is... Read more
Apple’s 2015 Back to School promotion: Free B...
Purchase a new Mac or iPad at The Apple Store for Education and take up to \$300 off MSRP. All teachers, students, and staff of any educational institution qualify for the discount. Shipping is free,... Read more
128GB MacBook Airs on sale for \$100 off MSRP,...
B&H Photo has 11″ & 13″ MacBook Airs with 128GB SSDs on sale for \$100 off MSRP. Shipping is free, and B&H charges NY sales tax only: - 11″ 1.6GHz/128GB MacBook Air: \$799.99, \$100 off MSRP... Read more
13-inch 2.5GHz MacBook Pro (refurbished) avai...
The Apple Store has Apple Certified Refurbished 13″ 2.5GHz MacBook Pros available for \$829, or \$270 off the cost of new models. Apple’s one-year warranty is standard, and shipping is free: - 13″ 2.... Read more
27-inch 3.2GHz iMac on sale for \$1679, save \$...
B&H Photo has the 27″ 3.2GHz iMac on sale for \$1679.99 including free shipping plus NY sales tax only. Their price is \$120 off MSRP. Read more
Apple and Cisco Partner to Deliver Fast-Lane...
Apple and Cisco have announced a partnership to create a “fast lane” for iOS business users by optimizing Cisco networks for iOS devices and apps. The alliance integrates iPhone with Cisco enterprise... Read more
Apple offering refurbished 2015 13-inch Retin...
The Apple Store is offering Apple Certified Refurbished 2015 13″ Retina MacBook Pros for up to \$270 (15%) off the cost of new models. An Apple one-year warranty is included with each model, and... Read more
Apple refurbished 2015 MacBook Airs available...
The Apple Store has Apple Certified Refurbished 2015 11″ and 13″ MacBook Airs (the latest models), available for up to \$180 off the cost of new models. An Apple one-year warranty is included with... Read more

## Jobs Board

Simply Mac *Apple* Specialist- Repair Techn...
Simply Mac is the greatest premier retailer of Apple products expertise in North America. We're looking for dedicated individuals to provide personalized service and Read more
Simply Mac *Apple* Specialist- Service Repa...
Simply Mac is the greatest premier retailer of Apple products expertise in North America. We're looking for dedicated individuals to provide personalized service and Read more
*Apple* Desktop Analyst - KDS Staffing (Unit...
…field and consistent professional recruiting achievement. Job Description: Title: Apple Desktop AnalystPosition Type: Full-time PermanentLocation: White Plains, NYHot Read more
Simply Mac- *Apple* Specialist- Store Manag...
Simply Mac is the largest premier retailer for Apple products and solutions. We're looking for dedicated individuals with a passion to simplify and enhance the Read more
*Apple* Evangelist - JAMF Software (United S...
The Apple Evangelist is responsible for building and cultivating strategic relationships with Apple 's small and mid-market business development field teams. This Read more