December 94 - Balance of Power: PowerPC Branch Prediction

Dave Evans

The PowerPC processors try to predict which way your code will execute. This sounds surprisingly astrological for a digital machine, but it becomes very useful for a pipelined processor and will often speed up your code. In this column I'll go over why and how this works, focusing especially on the new PowerPC 604 processor prediction techniques, and I'll answer the question "Can a Power Macintosh really tell the future?"

PSYCHIC DECISIONS

Typically about one-seventh of the instructions in your code are branches, either to call subroutines or to make logical decisions in your program. The PowerPC processor would ordinarily tend to stall at branches, since it tries to work on more than one instruction at a time and it's not always sure which code it should execute after a branch. It could either take the branch or fall through, and often the processor won't know which until a couple of cycles later.

So the PowerPC processors allow for speculative execution, meaning they'll guess at the most probable direction the branch will go and then will issue those instructions. But the processor doesn't let the instructions commit until it's sure the guess was correct. Usually it guesses right, and a few instructions are already completed when the branch is decided. If the guess was wrong, it throws out those results and starts over with the correct code.

This predictive skill helps keep the processor executing successfully without stalls, and better prediction techniques will yield better overall performance. The new PowerPC 604 processor improves on earlier prediction techniques; I'll discuss all of them in detail below.

But first, a relevant astrological note: The "birthday" of the 601 makes it a Taurus, whereas the 603 is a Libra. The 604 chip had a birthday in April, so it's an Aries.

TAURUS AND LIBRA ARE COMPATIBLE

The PowerPC 601 and 603 processors use basically the same techniques to predict branches. For simple unconditional branches, for example, they both process and remove the branch early in the instruction issue stage. This operation, called branch folding, keeps the instruction stream moving without having to wait for the branch to be processed. The branch is handled early, and the new instructions are fetched from the cache immediately.

For conditional branches, both processors first try to handle the branch early in the instruction issue stage. If the condition being tested has already been evaluated, the branch is folded out of the instruction stream. But if the condition being tested is still in the pipeline, the processor must guess at the branch direction.

Prediction of guessed branches is based on two things: the direction of the branch and a software "hint" bit. If the direction is negative -- backward in your code -- the branch is predicted taken (because loops often iterate a few times backward before falling through, so this heuristic is right more often than not). All other branches are predicted to fall through by default. The hint bit is a way for the compiler to reverse this heuristic: if the bit is set, the prediction is reversed.
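The static rule above is small enough to write down directly. The following is a minimal sketch in C, not a description of the hardware interface: the function name, the signed displacement parameter, and the boolean hint are all illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Static prediction as described for the 601 and 603:
   backward branches are predicted taken, forward branches are predicted
   to fall through, and a set hint bit reverses that default.
   Names and parameters are hypothetical, chosen for illustration. */
static bool predict_taken_static(int32_t displacement, bool hint_bit)
{
    bool default_taken = displacement < 0;   /* negative = backward, likely a loop */
    return hint_bit ? !default_taken : default_taken;
}

/* Usage example: the four combinations of direction and hint. */
void example_static_prediction(void)
{
    assert(predict_taken_static(-16, false) == true);    /* loop branch       */
    assert(predict_taken_static(+16, false) == false);   /* forward branch    */
    assert(predict_taken_static(-16, true)  == false);   /* hint reverses it  */
    assert(predict_taken_static(+16, true)  == true);
}
```

The hint only flips the default, so a compiler or profiler that knows a forward branch is usually taken would set the bit for that one instruction and leave the rest alone.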

As far as I know there are no compilers that allow you to specify the hint bit in your code, although this could be a valuable feature. Also, profilers or similar tools could take statistics on your code flow and then set the bits for you from trial runs of your software.

THE TEMPERAMENT OF ARIES

The PowerPC 604 has much better branch prediction, which means better performance. Because branches tend to repeat themselves, it remembers recent branch results to make its predictions:

  • It has a cache of the last 64 branches that it has taken, and any time it sees one of these branches again it immediately predicts the same branch destination. This technique, called dynamic branch prediction, is used on the Pentium and other processors with great results.

  • It keeps a history of all other branches and predicts based on the recent directions each branch took.

The cache technique has the advantage of being very fast. When the 604 fetches an instruction, it also sends the instruction's address to the branch cache. If the instruction is a recently executed branch, the cache will return the address of where the branch last went. This is immediately used to fetch the next instruction. Because this all occurs during the fetch of the branch instruction itself, there's no delay in fetching the first predicted instruction.
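The lookup described above can be modeled in a few lines of C. This is a sketch under stated assumptions: the column gives only the 64-entry size, so the direct-mapped organization, the index function, and every name below are hypothetical, and the 4-byte sequential fetch simply reflects the fixed PowerPC instruction size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BTC_ENTRIES 64u   /* the column's figure for the 604 */

typedef struct {
    bool     valid;
    uint32_t branch_addr;   /* address of the branch instruction */
    uint32_t target_addr;   /* where that branch went last time  */
} BtcEntry;

typedef struct {
    BtcEntry entry[BTC_ENTRIES];
} BranchTargetCache;

/* Assumption: direct-mapped on the word index of the address. */
static uint32_t btc_index(uint32_t addr)
{
    return (addr >> 2) % BTC_ENTRIES;
}

/* At fetch time: on a hit, report the remembered target as the next fetch
   address; on a miss, keep fetching sequentially. */
static bool btc_lookup(const BranchTargetCache *c, uint32_t fetch_addr,
                       uint32_t *next_fetch)
{
    assert(c != NULL && next_fetch != NULL);
    const BtcEntry *e = &c->entry[btc_index(fetch_addr)];
    if (e->valid && e->branch_addr == fetch_addr) {
        *next_fetch = e->target_addr;
        return true;
    }
    *next_fetch = fetch_addr + 4;
    return false;
}

/* When a branch resolves as taken: remember where it went. */
static void btc_record_taken(BranchTargetCache *c, uint32_t branch_addr,
                             uint32_t target_addr)
{
    assert(c != NULL);
    BtcEntry *e = &c->entry[btc_index(branch_addr)];
    e->valid = true;
    e->branch_addr = branch_addr;
    e->target_addr = target_addr;
}

/* Usage example: a miss, then a hit after the branch has been taken once. */
void example_branch_target_cache(void)
{
    BranchTargetCache cache = {0};
    uint32_t next = 0;

    assert(!btc_lookup(&cache, 0x1000u, &next) && next == 0x1004u);
    btc_record_taken(&cache, 0x1000u, 0x0F00u);
    assert(btc_lookup(&cache, 0x1000u, &next) && next == 0x0F00u);
}
```

The point of the model is that the lookup needs nothing but the fetch address, which is why the prediction can be ready in the same cycle as the fetch.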

For conditional branches that aren't in the branch cache, the 604 keeps a history of the last few times it executed that branch. It keeps 512 such histories, each two bits wide, recording whether the branch was taken during its last few executions. The processor hashes the instruction address to select a history, and hash collisions are very rare.

Each history is set to one of four states: strongly taken, taken, not taken, and strongly not taken. The current state determines the branch prediction as taken or not taken. After the branch commits, the state is updated. Each update adjusts the state one step toward strongly taken or strongly not taken. The two intermediate steps are a hedge so that it will usually take two mistakes before a prediction changes. Because branches tend to repeat, this algorithm generally results in the following prediction:

  • If the branch was taken during the last two executions, the 604 predicts it will again be taken.

  • If the branch wasn't taken during either of the last two executions, the 604 predicts it again won't be taken.

Also with the 604, branches on the count register base their prediction on the current count value. This will usually predict loops correctly and yield good performance, since loops count down for a number of iterations before the final iteration causes an incorrect prediction.
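The two-bit history scheme described earlier (512 histories, four states, one step per update) behaves like a table of saturating counters. Here is a minimal C sketch of that idea. The hash function, the initial state, and all names are assumptions made for illustration; the column does not specify them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BHT_ENTRIES 512u   /* the column's figure for the 604 */

typedef enum {
    STRONGLY_NOT_TAKEN = 0,
    NOT_TAKEN          = 1,
    TAKEN              = 2,
    STRONGLY_TAKEN     = 3
} BranchState;

typedef struct {
    uint8_t state[BHT_ENTRIES];   /* each entry uses only two bits */
} BranchHistoryTable;

/* Assumption: every history starts in the weak "not taken" state. */
static void bht_init(BranchHistoryTable *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < BHT_ENTRIES; i++)
        t->state[i] = NOT_TAKEN;
}

/* Assumption: a simple hash on the word index of the instruction address. */
static uint32_t bht_index(uint32_t addr)
{
    return (addr >> 2) % BHT_ENTRIES;
}

/* The current state alone decides the prediction. */
static bool bht_predict(const BranchHistoryTable *t, uint32_t addr)
{
    return t->state[bht_index(addr)] >= TAKEN;
}

/* After the branch commits, move one step toward the observed outcome,
   saturating at the two "strongly" states. */
static void bht_update(BranchHistoryTable *t, uint32_t addr, bool taken)
{
    uint8_t *s = &t->state[bht_index(addr)];
    if (taken) {
        if (*s < STRONGLY_TAKEN) (*s)++;
    } else {
        if (*s > STRONGLY_NOT_TAKEN) (*s)--;
    }
}

/* Usage example: two taken outcomes flip the prediction, and it then takes
   two not-taken outcomes to flip it back. */
void example_branch_history(void)
{
    BranchHistoryTable t;
    bht_init(&t);

    bht_update(&t, 0x2000u, true);
    bht_update(&t, 0x2000u, true);
    assert(bht_predict(&t, 0x2000u));      /* now strongly taken     */

    bht_update(&t, 0x2000u, false);
    assert(bht_predict(&t, 0x2000u));      /* one miss is tolerated  */

    bht_update(&t, 0x2000u, false);
    assert(!bht_predict(&t, 0x2000u));     /* second miss flips it   */
}
```

A loop branch that is taken many times and then falls through once costs one misprediction at the exit, but the history stays on the "taken" side, so the next run of the loop starts out predicted correctly.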

But these techniques also come with a tradeoff: the 604 has an extra pipeline stage to dispatch instructions. This means instructions take longer to get through the pipe, and mispredicted branches are more expensive.

ARIES RISING

The 604 is the fastest PowerPC processor yet, and I can't talk about it here without also going into why it's such a fast engine. Besides its advanced branch prediction hardware, it has significantly more integer and floating-point hardware, which yields improved overall performance. Given that it's produced with a more advanced silicon process than the original 601, it's clocked above 80 MHz and offers blazingly fast computation for your code.

As a backbone for the chip, the instruction issuing and control logic allow the 604 to issue up to four instructions per clock, compared to the 601's and 603's effective three. As mentioned above, however, its pipeline has one extra decode stage and branches are issued and handled in their own branch unit. To help it speculatively execute more instructions than the other chips, it also comes with twice as many "rename" registers as the 603. Twelve extra general-purpose and eight extra floating-point registers are available to hold speculatively produced results until a branch commits. The 604 is also the first PowerPC processor that can speculatively execute two branches at once. This, combined with advanced branch prediction, should keep the processor screaming even through complex code flow.

What most people will notice, however, is the additional integer math performance on the 604. At any one time, the 604 can have two add-subtract instructions and one multiply-divide instruction completing in a cycle. IBM says that it therefore has three integer units, but the multiply-divide hardware is also used for logical and bit manipulation operations. The bottom line is much better integer performance than the Power Macintosh 8100/80. As an example of this, the following code should execute nearly twice as fast on the 604 as on the 601:

do {
   unsigned long   datapoint;
   /* datasize is assumed to be the index of the last element, so the
      loop visits dataarray[datasize] down to dataarray[0] */
   datapoint = *(dataarray + datasize);
   if (datapoint > kThreshold) {
      /* check that the sum won't overflow before adding */
      if (datapoint > kMaxLong - accumulate)
         MyOverflowError();
      accumulate += datapoint;
      samplecount += 1;
      }
   } while (datasize--);

Looking at this code, we see a few integer operations that will be dual-issued on the 604. As long as the datapoint values aren't too erratic, the 604 will better predict the first if statement's branch: it will assume that the current datapoint is on the same side of the threshold as on the previous iteration, which in fact is where it will tend to be. And the second if statement, which checks for an overflow, will, barring an exceptional case, be predicted correctly on every pass through the loop. The 601 or 603 may predict it incorrectly. So even though one integer unit will be busy doing the math, the overflow checking will effectively occur without stalling the pipeline.
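For readers who want to compile and time the loop, here is one self-contained way to wrap it, offered as a sketch rather than as the column's own code. Several things are assumptions: the value of kThreshold, the definition of kMaxLong as ULONG_MAX, the SampleTotals struct, and the choice to record an overflow in a flag and stop instead of calling MyOverflowError(). It also takes an element count rather than a last index.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define kThreshold 100UL       /* hypothetical threshold value            */
#define kMaxLong   ULONG_MAX   /* assumed: the largest unsigned long      */

typedef struct {
    unsigned long accumulate;    /* sum of the samples above the threshold */
    unsigned long samplecount;   /* how many samples were added            */
    bool          overflowed;    /* stands in for MyOverflowError()        */
} SampleTotals;

static SampleTotals sum_above_threshold(const unsigned long *dataarray,
                                        size_t count)
{
    SampleTotals t = { 0, 0, false };
    size_t i = count;

    assert(dataarray != NULL || count == 0);

    /* Walk from the last element to the first, as the original loop does. */
    while (i-- > 0) {
        unsigned long datapoint = dataarray[i];
        if (datapoint > kThreshold) {
            /* Rarely true, so a history-based predictor learns to skip it. */
            if (datapoint > kMaxLong - t.accumulate) {
                t.overflowed = true;
                break;
            }
            t.accumulate += datapoint;
            t.samplecount += 1;
        }
    }
    return t;
}

/* Usage example: three of the five samples exceed the threshold. */
void example_sum_above_threshold(void)
{
    const unsigned long samples[] = { 50, 150, 200, 100, 101 };
    SampleTotals t = sum_above_threshold(samples, 5);

    assert(t.accumulate == 451UL);
    assert(t.samplecount == 3UL);
    assert(!t.overflowed);
}
```

Feeding it data that stays on one side of the threshold for long runs, and then data that alternates, is a simple way to see how much the first if statement's predictability matters on a given processor.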

The floating-point hardware was also supercharged. On the 601 and 603 processors, a single-precision floating-point instruction can issue and complete each cycle, but double-precision numbers take twice as long. The 604 allows one full double-precision multiply-add instruction to be issued and one to complete each cycle. The chip is twice as fast as the 601 and 603 for these double-precision calculations.
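The kind of code that benefits is a loop whose body is a double-precision multiply followed by an add, such as a dot product. The sketch below is only an illustration of that shape; the function name is hypothetical, and whether a given compiler turns the loop body into a single multiply-add instruction depends on the compiler and its settings.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product in double precision: each iteration is one multiply and one
   add, the pattern a multiply-add instruction handles in one operation. */
static double dot_product(const double *a, const double *b, size_t n)
{
    double sum = 0.0;

    assert((a != NULL && b != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Usage example with small values that are exact in double precision. */
void example_dot_product(void)
{
    const double a[] = { 1.0, 2.0, 3.0 };
    const double b[] = { 4.0, 5.0, 6.0 };

    assert(dot_product(a, b, 3) == 32.0);   /* 4 + 10 + 18 */
}
```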

THE FUTURE IS IN THE STARS

So can Power Macintosh tell your future? It certainly tries to with the prediction techniques described above, and in doing so yields better performance. With the simple methods of the 601 and 603, or the dynamic prediction of the 604, your Power Macintosh will speculatively execute your code with seemingly psychic results.

What about the future of the Power Macintosh? The PowerPC architecture allows excellent growth. When I saw the specifications for the first processor, the 601, I was very impressed. It's an excellent design and it has proven to be a potent engine for the Macintosh. When I saw the specifications for the follow-on chips, however, I was really blown away. The 603 and 604 offer incredible performance for the price, and prove that the PowerPC architecture scales well both into low-cost/low-energy solutions and to the cutting edge in performance. And the technology applied to the 604 can be expanded in future chips, adding more execution units and advanced caches at higher clock speeds. The latest IBM POWER2 processors can issue two load/store, two logic/branch, two floating-point, and two integer instructions per cycle. These processors point to the future of PowerPC performance.

So without any additional tuning on your part, PowerPC will continue to improve your performance in the future. I also feel compelled to reiterate this advice from my previous columns: tune your critical code. Tuning often trades code readability and maintainability for performance, so carefully choose which code to tune and use code profilers (and the stars?) to guide your way.

DAVE EVANS (Aquarius, January 20-February 18) Look for opportunities to communicate. You are bound to have fun. Love is in the air; don't work too much or you'll miss it. Apple continues to hold promise for you. Compatible with Sagittarius.

Thanks to Phil Sohn, Peter Steinauer, and Eric Traut for reviewing this column.

This page was last modified on Sunday, April 06 1997 04:24
 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.