


Enhancing PowerPC Native Speed



When you convert your applications to native PowerPC code, they run lightning fast. To get the most out of RISC processors, however, you need to pay close attention to your code structure and execution. Fast code is no longer measured solely by an instruction timing table. The PowerPC 601 processor includes pipelining, multiple-issue and speculative execution, branch prediction, and a set-associative cache. All of these make it hard to predict which code will run fastest on a Power Macintosh.

Writing tight code for the PowerPC processor isn't hard, especially with a good optimizing compiler to help you. In this column I'll pass on some of what I've learned about tuning PowerPC code. There are gotchas and coding habits to avoid, and there are techniques for squeezing the most from your speed-critical native code. For a good introduction to RISC pipelining and related concepts that appear in this column, see "Making the Leap to PowerPC" in Issue 16.

The power of RISC lies in the ability to execute one or more instructions every machine clock cycle, but RISC processors can do this only in the best of circumstances. At their worst they're as slow as CISC processors. The following loop, for example, averages only one calculation every 2.8 cycles:

float a[], b[], c[], d, e;
for (i=0; i < gArraySize; i++) {
  e = b[i] + c[i] / d;
  a[i] = MySubroutine(b[i], e);
}

By restructuring the code and using other techniques from this column, you can make significant improvements. This next loop generates essentially the same result (multiplying by a reciprocal can differ from a true division in the last bit), yet averages one calculation every 1.9 cycles -- about 50% faster.

reciprocalD = 1 / d;
for (i=0; i < gArraySize; i+=2) {   /* assumes gArraySize is even */
  float result, localB, localC, localE;
  float result2, localB2, localC2, localE2;

  localB = b[i];
  localC = c[i];
  localB2 = b[i+1];
  localC2 = c[i+1];

  localE = localB + (localC * reciprocalD);
  localE2 = localB2 + (localC2 * reciprocalD);
  InlineSubroutine(&result, localB, localE);
  InlineSubroutine(&result2, localB2, localE2);

  a[i] = result;
  a[i+1] = result2;
}

The rest of this column explains the techniques I just used for that speed gain. They include expanding loops, scoping local variables, using inline routines, and using faster math operations.

Your compiler is your best friend, and you should try your hardest to understand its point of view. You should understand how it looks at your code and what assumptions and optimizations it's allowed to make. The more you empathize with your compiler, the more you'll recognize opportunities for optimization.

An optimizing compiler reorders instructions to improve speed. Executing your code line by line usually isn't optimal, because the processor stalls whenever an instruction has to wait for the result of an earlier one. The compiler tries to move independent instructions into those stall points. For example, consider this code:

first = input * numerator;
second = first / denominator;
output = second + adjustment;

Each line depends on the previous line's result, and the compiler will be hard pressed to keep the pipeline full of useful work. This simple example could cause 46 stalled cycles on the PowerPC 601, so the compiler will look at other nearby code for independent instructions to move into the stall points.
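As a rough illustration (the function and parameter names below are hypothetical, not from the column), here is the same multiply, divide, add sequence applied to two inputs at once. Each chain is still strictly serial, but the two chains share no intermediate values, so an optimizing compiler is free to slot instructions from one chain into the stalls of the other. Whether a given compiler actually does so is up to its scheduler.

```c
/* Two independent dependency chains, interleaved in the source.
   Each chain alone is serial (multiply -> divide -> add), but
   chain 1 and chain 2 share no intermediate values, so the
   compiler may overlap them in the pipeline. */
void ScaleTwo(float input1, float input2,
              float numerator, float denominator, float adjustment,
              float *output1, float *output2)
{
    float first1  = input1 * numerator;
    float first2  = input2 * numerator;     /* independent of first1 */

    float second1 = first1 / denominator;
    float second2 = first2 / denominator;   /* can fill the divide stall */

    *output1 = second1 + adjustment;
    *output2 = second2 + adjustment;
}
```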

Loops are often your most speed-critical code, and you can improve their performance in several ways. Loop expanding (also called loop unrolling) is one of the simplest methods. The idea is to perform more than one independent operation in each loop iteration, so that the compiler has more work to reorder in the pipeline and the processor doesn't stall.

For example, in this loop there's too little work to keep the processor busy:

float a[], b[], c[], d;
for (i=0; i < multipleOfThree; i++) {
  a[i] = b[i] + c[i] * d;
}

If we know the data always comes in multiples of a fixed size, we can do more steps in each iteration, as in the following:

for (i=0; i < multipleOfThree; i+=3) {
  a[i] = b[i] + c[i] * d;
  a[i+1] = b[i+1] + c[i+1] * d;
  a[i+2] = b[i+2] + c[i+2] * d;
}

On a CISC processor the second loop wouldn't be much faster, but on the PowerPC processor it's twice as fast as the first. This is because the compiler can schedule independent instructions to keep the pipeline constantly moving. (If the data doesn't occur in nice increments, you can still expand the loop; just add a small loop at the end to handle the extra iterations.)

Be careful not to expand a loop too much, though. Very large loops won't fit in the cache, causing cache misses on each iteration. In addition, the larger a loop gets, the less work can be done entirely in registers. Expand too much and the compiler will have to use memory to store intermediate results, outweighing your marginal gains. Besides, you get the biggest gains from the first few expansions.
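A sketch of the expanded loop together with the cleanup loop mentioned above, for an arbitrary element count n (the function name and the size_t count parameter are assumptions for this example):

```c
#include <stddef.h>

/* Computes a[i] = b[i] + c[i] * d for i in [0, n), expanded by three.
   The main loop handles groups of three; the short loop at the end
   handles the 0-2 leftover elements when n is not a multiple of 3. */
void MultiplyAddExpanded(float a[], const float b[], const float c[],
                         float d, size_t n)
{
    size_t i = 0;

    /* Main expanded loop: three independent operations per iteration. */
    for (; i + 3 <= n; i += 3) {
        a[i]   = b[i]   + c[i]   * d;
        a[i+1] = b[i+1] + c[i+1] * d;
        a[i+2] = b[i+2] + c[i+2] * d;
    }

    /* Cleanup loop for the remaining iterations. */
    for (; i < n; i++) {
        a[i] = b[i] + c[i] * d;
    }
}
```

Writing the main loop's test as i + 3 <= n rather than i < n - 2 keeps it correct when n is smaller than 3, since n - 2 would wrap around for an unsigned size_t.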

If you're new to RISC, you'll be impressed by the number of registers available on the PowerPC chip -- 32 general-purpose registers and 32 floating-point registers. With so many, the processor can often avoid slow memory operations. Your compiler will take advantage of them when it can, but you can help it by carefully scoping your variables and using lots of local variables.

The "scope" of a variable is the area of code in which it is valid. Your compiler examines the scope of each variable when it schedules registers, and your code can provide valuable information about the usage of each variable. Here's an example:

for (i=0; i < gArraySize; i++) {
  a[i] = MyFirstRoutine(b[i], c[i]);
  b[i] = MySecondRoutine(a[i], c[i]);
}

In this loop, the global variable gArraySize is scoped for the whole program. Because we call a subroutine in the loop, the compiler can't tell if gArraySize will change during each iteration. Since the subroutine might modify gArraySize, the compiler has to be conservative. It will reload gArraySize from memory on every iteration, and it won't optimize the loop any further. This is wastefully slow.

On the other hand, if we copy the global into a local variable, we tell the compiler that the loop bound won't change, so it's all right to keep it handy in a register. Likewise, copying c[i] into a local tells the compiler that the value can't change between the two calls. We can also declare temporary variables scoped only within the loop. This tells the compiler how we intend to use the data, so that it can use free registers and release them after the loop. Here's what this would look like:

arraySize = gArraySize;
for (i=0; i < arraySize; i++) {
  float localC;
  localC = c[i];
  a[i] = MyFirstRoutine(b[i], localC);
  b[i] = MySecondRoutine(a[i], localC);
}

These minor changes give the compiler more information about the data, in this instance accelerating the resulting code by 25%.
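A self-contained version of the scoped loop follows. MyFirstRoutine and MySecondRoutine get hypothetical bodies here just so the sketch compiles; in the column's situation they would live elsewhere, where the compiler can't see what they touch. The comments note the two separate things the locals tell the compiler.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the column's routines. */
float MyFirstRoutine(float b, float c)  { return b + c; }
float MySecondRoutine(float a, float c) { return a * c; }

size_t gArraySize;   /* global: any called routine might change it */

void ProcessArrays(float a[], float b[], const float c[])
{
    /* Copy the global once. No other code can reach arraySize, so
       the compiler may keep it in a register for the whole loop
       instead of reloading gArraySize after every call. */
    size_t arraySize = gArraySize;
    size_t i;

    for (i = 0; i < arraySize; i++) {
        /* Scoped to the loop body. Reading c[i] once also matters
           because, as far as the compiler knows, the store to a[i]
           could overwrite c[i] if the arrays overlap. */
        float localC = c[i];

        a[i] = MyFirstRoutine(b[i], localC);
        b[i] = MySecondRoutine(a[i], localC);
    }
}
```

Copying gArraySize is only equivalent to the original loop if the called routines really don't change it. The compiler can't check that, which is exactly why it has to be told.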

Be wary of code that looks complicated. If each line of source code contains complicated dereferences and typecasting, chances are the object code has wasteful memory instructions and inefficient register usage. A great compiler might optimize well anyway, but don't count on it. Judicious use of temporary variables (as mentioned above) will help the compiler understand exactly what you're doing -- plus your code will be easier to read.
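Here's a small, hypothetical illustration of the difference (the Box, Shape, and Record types are invented for this sketch). Both functions compute the same area; the second walks the pointer chain once and names the intermediate values.

```c
/* Hypothetical data structures, for illustration only. */
typedef struct { short left, top, right, bottom; } Box;
typedef struct { Box bounds; } Shape;
typedef struct { Shape *shape; } Record;

/* Dense version: every term repeats the pointer chain and a cast. */
long AreaDense(const Record *rec)
{
    return (long)(rec->shape->bounds.right - rec->shape->bounds.left) *
           (long)(rec->shape->bounds.bottom - rec->shape->bounds.top);
}

/* Same computation with temporaries: the chain is walked once, and
   the intent (width times height) is visible to reader and compiler. */
long AreaWithTemps(const Record *rec)
{
    const Box *bounds = &rec->shape->bounds;
    long width  = bounds->right  - bounds->left;
    long height = bounds->bottom - bounds->top;
    return width * height;
}
```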

Excessive memory dereferencing is a problem made worse by the heavy use of handles on the Macintosh. Code often contains double dereferences, which are necessary when memory can move. But when you can guarantee that memory won't move, use a local pointer so that you dereference the handle only once. This saves load instructions and allows further optimizations.

Casting data types is usually a free operation -- you're just telling the compiler that you know you're copying seemingly incompatible data. But it's not free if the data types have different bit sizes or different representations (integer to floating point, for example), because the conversion adds instructions. Again, avoid this by using local variables for commonly cast data.
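The sketch below shows both ideas with a hypothetical FloatHandle typedef standing in for a Macintosh handle (a pointer to a master pointer). The second version is only safe while the block cannot move, for instance while it is locked with the Memory Manager's HLock and nothing in the loop allocates memory; the sketch itself doesn't call any Toolbox routines.

```c
#include <stddef.h>

/* Hypothetical: a handle is a pointer to a master pointer. */
typedef float **FloatHandle;

/* Every access goes through the master pointer, which stays correct
   even if the block moves, but may cost an extra load per access
   unless the compiler can prove *h is unchanged. The int-to-float
   conversion is also written inside the loop. */
void ScaleViaHandle(FloatHandle h, size_t count, int scale)
{
    size_t i;
    for (i = 0; i < count; i++)
        (*h)[i] = (*h)[i] * (float)scale;
}

/* Valid ONLY while the block is guaranteed not to move. The handle
   is dereferenced once, and scale is converted to float once. */
void ScaleViaLocalPointer(FloatHandle h, size_t count, int scale)
{
    float *p = *h;                     /* single dereference */
    float floatScale = (float)scale;   /* single conversion */
    size_t i;
    for (i = 0; i < count; i++)
        p[i] = p[i] * floatScale;
}
```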

I've heard many times that branches are "free" on the PowerPC processor. It's true that often the pipeline can keep moving even though a branch is encountered, because the branch execution unit will try to resolve branches very early in the pipeline or will predict the direction of the branch. Still, the more subroutines you have, the less your compiler will be able to reorder and intelligently schedule instructions. Keep speed-critical code together, so that more of it can be pipelined and the compiler can schedule your registers better. Use inline routines for short operations, as I did in the improved version of the first example loop in this column.
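The column doesn't show what InlineSubroutine computes, so the body below is a hypothetical placeholder. The point is the declaration: a static inline function lets the compiler substitute the body at the call site, so its instructions can be scheduled together with the rest of the loop. Note that inline is a C99 keyword; with an older C compiler, a macro is the usual way to get the same effect.

```c
#include <stddef.h>

/* Hypothetical body: the column doesn't say what InlineSubroutine
   computes, so this one just stores b * e. static inline permits
   (but doesn't force) the compiler to paste it into the caller. */
static inline void InlineSubroutine(float *result, float b, float e)
{
    *result = b * e;
}

/* Once inlined, the multiply sits in the loop body, where the
   compiler can schedule it alongside the loads and stores. */
void ApplyToArray(float a[], const float b[], const float e[], size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        float result;
        InlineSubroutine(&result, b[i], e[i]);
        a[i] = result;
    }
}
```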

As with all processors, the PowerPC chip has performance tradeoffs you should know about. Some are specific to a particular processor model. For example, the PowerPC 601 has a 32K unified cache, while the 603 has 16K split evenly into an instruction cache and a data cache. In general, though, you should know about floating-point performance and the virtues of memory alignment.

Floating-point multiplication is wicked fast -- up to nine times the speed of integer multiplication -- so use floating-point multiplication if you can. Floating-point division, on the other hand, takes 17 times as long as a floating-point multiplication, so when possible multiply by a reciprocal instead of dividing.
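A minimal sketch of the reciprocal trick (function names are illustrative). One caveat worth knowing: x * (1/d) is not always bit-for-bit identical to x / d, because 1/d is itself rounded, which is why compilers generally won't make this substitution for you unless told that exact IEEE results don't matter.

```c
#include <stddef.h>

/* One floating-point divide per element. */
void DivideAll(float out[], const float in[], float d, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i] / d;
}

/* A single divide, hoisted out of the loop, then one multiply per
   element. Results may differ from DivideAll in the last bit. */
void MultiplyByReciprocal(float out[], const float in[], float d, size_t n)
{
    float reciprocalD = 1.0f / d;
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i] * reciprocalD;
}
```

When d is a power of two, 1/d is exact and the two functions agree exactly; for other divisors they agree only to within rounding.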

Memory accesses go fastest when the data is aligned on 64-bit boundaries. Accesses to unaligned data stall while the processor loads separate words and then shifts and splices them together. For example, be sure to align double-precision floating-point data to 64-bit boundaries, or you'll stall for four cycles while the processor loads the 32-bit halves with two 64-bit accesses.
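One modern way to express this in C is shown below. It uses C11's _Alignas, so it is not something a 1994-era compiler would accept (those controlled struct alignment through compiler-specific options), and the Sample type is hypothetical. IsAligned64 converts the pointer to uintptr_t to test the low bits, which works on mainstream platforms but is technically implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p sits on an 8-byte (64-bit) boundary. */
int IsAligned64(const void *p)
{
    return ((uintptr_t)p % 8) == 0;
}

/* Hypothetical record layout. Doubles go first and each requests
   8-byte alignment, so they land on 64-bit boundaries inside the
   struct; the struct's own alignment becomes at least 8 as well. */
typedef struct {
    _Alignas(8) double x;
    _Alignas(8) double y;
    short       flags;   /* smaller fields go last */
} Sample;

/* Because Sample is 8-aligned and its size is a multiple of 8,
   every element's doubles in this array are 64-bit aligned. */
Sample gSamples[4];
```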

Native PowerPC code runs really fast, so in many cases you don't need to worry about tweaking its performance at all. For your speed-critical code, though, these tips I've given you can make the difference between "too slow" and "fast enough."



DAVE EVANS may be able to tune PowerPC code for Apple, but for the last year he's been repeatedly thwarted when tuning his 1978 Harley-Davidson XLCH motorcycle. Fixing engine stalls, poor timing, and rough starts proved difficult, but he was recently rewarded with the guttural purr of a well-tuned Harley.

Code examples were compiled with the PPCC compiler using the speed optimization option, and then run on a Power Macintosh 6100/66 for profiling. A PowerPC 601 microsecond timing library is provided on this issue's CD.


All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.