TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Microsoft Office 2016 15.25 - Popular pr...
Microsoft Office 2016 - Unmistakably Office, designed for Mac. The new versions of Word, Excel, PowerPoint, Outlook and OneNote provide the best of both worlds for Mac users - the familiar Office... Read more
FileZilla 3.21.0 - Fast and reliable FTP...
FileZilla (ported from Windows) is a fast and reliable FTP client and server with lots of useful features and an intuitive interface. Version 3.21.0: Fixed Vulnerabilities Fixed a string format... Read more
Fantastical 2.2.5 - Create calendar even...
Fantastical 2 is the Mac calendar you'll actually enjoy using. Creating an event with Fantastical is quick, easy, and fun: Open Fantastical with a single click or keystroke Type in your event... Read more
The Hit List 1.1.26 - Advanced reminder...
The Hit List manages the daily chaos of your modern life. It's easy to learn - it's as easy as making lists. And it's powerful enough to let you plan, then forget, then act when the time is right.... Read more
Typinator 6.10 - Speedy and reliable tex...
Typinator turbo-charges your typing productivity. Type a little. Typinator does the rest. We've all faced projects that require repetitive typing tasks. With Typinator, you can store commonly used... Read more
EtreCheck 3.0.2 - For troubleshooting yo...
EtreCheck is an app that displays the important details of your system configuration and allow you to copy that information to the Clipboard. It is meant to be used with Apple Support Communities to... Read more
FileZilla 3.21.0 - Fast and reliable FTP...
FileZilla (ported from Windows) is a fast and reliable FTP client and server with lots of useful features and an intuitive interface. Version 3.21.0: Fixed Vulnerabilities Fixed a string format... Read more
EtreCheck 3.0.2 - For troubleshooting yo...
EtreCheck is an app that displays the important details of your system configuration and allow you to copy that information to the Clipboard. It is meant to be used with Apple Support Communities to... Read more
Microsoft Office 2016 15.25 - Popular pr...
Microsoft Office 2016 - Unmistakably Office, designed for Mac. The new versions of Word, Excel, PowerPoint, Outlook and OneNote provide the best of both worlds for Mac users - the familiar Office... Read more
Typinator 6.10 - Speedy and reliable tex...
Typinator turbo-charges your typing productivity. Type a little. Typinator does the rest. We've all faced projects that require repetitive typing tasks. With Typinator, you can store commonly used... Read more

Bowmasters tips, tricks and hints
At least for this writer, archery was one of the more pleasant surprises of the 2016 Rio Olympics. As opposed to target shooting with guns, which was dreadfully boring, watching people shoot arrows at targets was pretty darn cool. [Read more] | Read more »
Best apps for watching live TV
The Olympics have come and gone, leaving nearly everyone in a temporary state of "What the heck am I going to watch on TV right now?" Besides old reruns of Golden Girls, but that goes without saying. [Read more] | Read more »
What is Flip Diving, and why has it take...
Move over Pokemon GO. There's a new king in town, and it's "the world's #1 cliff diving game." [Read more] | Read more »
5 places where Pokemon GO is still numbe...
In the U.S., the bloom is off the Pokemon Go rose ever so slightly. It's still doing great, sitting atop the top grossing chart as it has for some time, but it's no longer among the top 10 free apps in downloads, possibly because darn near... | Read more »
Madden NFL Mobile: How defense has chang...
Saying that defense is not a priority in Madden NFL Mobile is a bit of an understatement. In asynchronous head-to-head play, you don't take control of your defenders at all, as the AI manages them while your opponent plays offense. When it's your... | Read more »
Feed Hawk (News)
Feed Hawk 1.0.1 Device: iOS Universal Category: News Price: $2.99, Version: 1.0.1 (iTunes) Description: Feed Hawk makes it easy to subscribe to the RSS feed of the website you are visiting. From within Safari, simply open a share... | Read more »
Reigns character guide: Who's who i...
Know your foes. Keep your friends close, but your enemies closer. And there are probably some other cliches that would apply to your perilous spot on the throne in Reigns as well. [Read more] | Read more »
Match 3 puzzler Small Lime is now availa...
Set to hit Android and IOS on the 17th August, Small Lime is the newest match 3 mobile game, and hopes to throw something a little different into the mix. If you love match 3 puzzles, but are tired of the same old ideas being re-hashed again and... | Read more »
Deus Ex GO tips, tricks, and hints
When Square Enix Montreal first hit us with Hitman GO,it was seen as a clever board game twist on a property that you wouldn't normally think would fit that kind of format. Lara Croft GOexpanded things even further while keeping some of the same... | Read more »
Leap of Fate (Games)
Leap of Fate 1.0 Device: iOS Universal Category: Games Price: $3.99, Version: 1.0 (iTunes) Description: *** Minimum hardware: iPad 4, iPad mini 2, iPhone 5s. *** | Read more »

Price Scanner via MacPrices.net

Typinator 6.10 comes with 50 improvements – G...
Ergonis Software today announced release of Typinator 6.10, a new version of their text expander utility for macOS. Typinator 6.10 comes with 50 improvements, including new features, compatibility... Read more
Taxi Sim 2016 Puts Users Behind the Wheel in...
Ovilex Soft today announces Taxi Sim 2016, an update to their ultra-realistic 3D driving simulator app for iOS and Android devices — literally a global event what with the company’s nearly 450,000... Read more
11-inch 1.6GHz/128GB MacBook Air on sale for...
Amazon has the current-generation 11″ 1.6GHz/128GB MacBook Air (sku MJVM2LL/A) on sale for $788 for a limited time. Their price is $111 off MSRP, and it’s the lowest price available for this model. Read more
Apple refurbished Mac minis available for up...
Apple has Certified Refurbished Mac minis available starting at $419. Apple’s one-year warranty is included with each mini, and shipping is free: - 1.4GHz Mac mini: $419 $80 off MSRP - 2.6GHz Mac... Read more
Apple refurbished 13-inch Retina MacBook Pros...
Apple has Certified Refurbished 13″ Retina MacBook Pros available for up to $270 off the cost of new models. An Apple one-year warranty is included with each model, and shipping is free: - 13″ 2.7GHz... Read more
12-inch 32GB and 128GB WiFi iPad Pros on sale...
B&H Photo has 12″ 32GB & 128GB WiFi Apple iPad Pros on sale for up to $70 off MSRP, each including free shipping. B&H charges sales tax in NY only: - 12″ Space Gray 32GB WiFi iPad Pro: $... Read more
Apple refurbished 11-inch MacBook Airs availa...
Apple has Certified Refurbished 11″ MacBook Airs (the latest models), available for up to $170 off the cost of new models. An Apple one-year warranty is included with each MacBook, and shipping is... Read more
WaterField Launches Kickstarter for Intrepid...
San Francisco based WaterField Design have announced their first Kickstarter campaign for the one-of-a-kind Intrepid iPhone Travel Wallet. The all-new design includes iPhone play-through capability... Read more
Five of Top 10 Worldwide Mobile Phone Vendors...
Global sales of smartphones to end users totaled 344 million units in the second quarter of 2016, a 4.3 percent increase over the same period in 2015, according to Gartner, Inc. Overall sales of... Read more
DriveSavers Offers $300 Off Data Recovery Ser...
DriveSavers, with more than 30 years of experience recovering photos, videos, contact lists, financial records and other important data that may have been kept on devices damaged or even destroyed by... Read more

Jobs Board

*Apple* Retail - Multiple Positions Germanto...
Job Description: Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, Read more
*Apple* Professional Learning Specialist - A...
# Apple Professional Learning Specialist Job Number: 51234379 Portland, Maine, Maine, United States Posted: Aug. 18, 2016 Weekly Hours: 40.00 **Job Summary** The Read more
Lead *Apple* Solutions Consultant - Apple (...
# Lead Apple Solutions Consultant Job Number: 51218465 Richmond, VA, Virginia, United States Posted: Aug. 18, 2016 Weekly Hours: 40.00 **Job Summary** The Lead ASC Read more
*Apple* Solutions Consultant - Apple (United...
# Apple Solutions Consultant Job Number: 51218534 Pleasant Hill, California, United States Posted: Aug. 18, 2016 Weekly Hours: 40.00 **Job Summary** As an Apple Read more
*Apple* Retail - Multiple Positions Chestnut...
Job Description: Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.