TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 
AAPL
$118.93
Apple Inc.
-0.07
MSFT
$47.81
Microsoft Corpora
+0.06
GOOG
$541.83
Google Inc.
+1.46

MacTech Search:
Community Search:

Software Updates via MacUpdate

Adobe Photoshop Elements 13.0 - Consumer...
Adobe Photoshop Elements 12--the #1 selling consumer photo editing software--helps you edit pictures with powerful, easy-to-use options and share them via print, the web, Facebook, and more.Version... Read more
Skype 7.2.0.412 - Voice-over-internet ph...
Skype allows you to talk to friends, family and co-workers across the Internet without the inconvenience of long distance telephone charges. Using peer-to-peer data transmission technology, Skype... Read more
HoudahSpot 3.9.6 - Advanced file search...
HoudahSpot is a powerful file search tool built upon MacOS X Spotlight. Spotlight unleashed Create detailed queries to locate the exact file you need Narrow down searches. Zero in on files Save... Read more
RapidWeaver 6.0.3 - Create template-base...
RapidWeaver is a next-generation Web design application to help you easily create professional-looking Web sites in minutes. No knowledge of complex code is required, RapidWeaver will take care of... Read more
iPhoto Library Manager 4.1.10 - Manage m...
iPhoto Library Manager lets you organize your photos into multiple iPhoto libraries. Separate your high school and college photos from your latest summer vacation pictures. Or keep some photo... Read more
iExplorer 3.5.1.9 - View and transfer al...
iExplorer is an iPhone browser for Mac lets you view the files on your iOS device. By using a drag and drop interface, you can quickly copy files and folders between your Mac and your iPhone or... Read more
MacUpdate Desktop 6.0.3 - Discover and i...
MacUpdate Desktop 6 brings seamless 1-click installs and version updates to your Mac. With a free MacUpdate account and MacUpdate Desktop 6, Mac users can now install almost any Mac app on macupdate.... Read more
SteerMouse 4.2.2 - Powerful third-party...
SteerMouse is an advanced driver for USB and Bluetooth mice. It also supports Apple Mighty Mouse very well. SteerMouse can assign various functions to buttons that Apple's software does not allow,... Read more
iMazing 1.1 - Complete iOS device manage...
iMazing (was DiskAid) is the ultimate iOS device manager with capabilities far beyond what iTunes offers. With iMazing and your iOS device (iPhone, iPad, or iPod), you can: Copy music to and from... Read more
PopChar X 7.0 - Floating window shows av...
PopChar X helps you get the most out of your font collection. With its crystal-clear interface, PopChar X provides a frustration-free way to access any font's special characters. Expanded... Read more

Latest Forum Discussions

See All

Mystery Case Files: Dire Grove, Sacred G...
Mystery Case Files: Dire Grove, Sacred Grove HD Review By Jennifer Allen on November 28th, 2014 Our Rating: iPad Only App - Designed for the iPad A decent new installment for the popular Mystery Case Files series.   | Read more »
Castaway Paradise – Tips, Tricks, and St...
Ahoy there, castaways: Were you curious about our own thoughts regarding this pristine shipwreck? Check out our Castaway Paradise review! Castaway Paradise is out for iOS, finally giving mobile gamers the opportunity to enjoy the idyllic lifestyle... | Read more »
Castaway Paradise VIP Subs are on Sale f...
Castaway Paradise VIP Subs are on Sale for a Limited Time, and a Special Holiday Update is Coming Soon Posted by Rob Rich on November 28th, 2014 [ | Read more »
Primitive Review
Primitive Review By Jordan Minor on November 28th, 2014 Our Rating: :: FOLK ARTUniversal App - Designed for iPhone and iPad True to its name, Primitive is about as straightforward as runners get.   | Read more »
7 tips to get ahead of the competition i...
7 tips to get ahead of the competition in Dynasty of Dungeons Posted by Simon Reed on November 28th, 2014 [ permalink ] Playcrab has launched their action-packed new dungeon crawler, Dynasty of Dungeons, today. | Read more »
Master of Tea Kung Fu Review
Master of Tea Kung Fu Review By Jordan Minor on November 28th, 2014 Our Rating: :: ONE DROP RULESUniversal App - Designed for iPhone and iPad Master of Tea Kung Fu is a creative and complex caffeinated brawler.   | Read more »
Monster Strike Review
Monster Strike Review By Campbell Bird on November 28th, 2014 Our Rating: :: BILLIARD STRATEGYUniversal App - Designed for iPhone and iPad Collect monsters and battle by flinging them across the battlefield in this strangely... | Read more »
Proun+ Review
Proun+ Review By Jennifer Allen on November 28th, 2014 Our Rating: :: TWITCHY RACINGUniversal App - Designed for iPhone and iPad Twitchy racing aplenty in Proun+, an enjoyably tricky title.   | Read more »
Lucha Amigos (Games)
Lucha Amigos 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Forget Ninja Turtles, and meet Wrestlers Turtles! Crazier, Spicier and…Bouncier! Sling carapaces of 7 Luchadores to knock all... | Read more »
Record of Agarest War Zero (Games)
Record of Agarest War Zero 1.0 Device: iOS Universal Category: Games Price: $7.99, Version: 1.0 (iTunes) Description: HyperDevbox Holiday Turkey Black Friday Special Pricing! To celebrate the opening of the holiday season HyperDevbox... | Read more »

Price Scanner via MacPrices.net

Up To 75% Off Infovole Text Apps Over Black F...
Infovole’s entire range of apps, including the Textkraft family of word processors for iPads and iPhones, is being offered at 50-75% off over the Black Friday and Cyber Monday weekend. The five-day... Read more
Black Friday: Up to $60 off Mac minis, NY tax...
 B&H Photo has new 2014 Mac minis on sale for up to $60 off MSRP as part of their Black Friday sale. Shipping is free, and B&H charges NY sales tax only: - 1.4GHz Mac mini: $449.99 $50 off... Read more
Black Friday: 27-inch 5K iMac for $2299, save...
 B&H Photo continues to offer Black Friday sale prices on the 27″ 3.5GHz 5K iMac, in stock today and on sale for $2299 including free shipping plus NY sales tax only. Their price is $200 off MSRP... Read more
Karalux Announces 24K Gold-Plated iPhone 6
Karalux, a Vietnam-based jewellery firm, has launched a unique 24 karat gold-plated iPhone 6 version with gold-cast monolithic dragon on its back panel. The real 24 karat gold plated enclosure doesn’... Read more
Black Friday: 13-inch 2.6GHz Retina MacBook P...
 B&H Photo has lowered their price for the 13″ 2.6GHz/128GB Retina MacBook Pro to $1159 for Black Friday. That’s $140 off MSRP, and it’s the lowest price for this model (except for Apple’s $1099... Read more
View all the Black Friday sales on our Mac Pr...
We’ve updated our Mac Price Trackers with the latest information on prices, bundles, and availability on systems from Apple’s authorized internet/catalog resellers. View Black Friday sale prices at a... Read more
Black Friday: 11-inch MacBook Air for $779, s...
 Best Buy has lowered their price for the 2014 11″ 1.4GHz/128GB MacBook Air to $779.99 for Black Friday. That’s $120 off MSRP. Choose free shipping or free local store pickup (if available). Sale... Read more
Apple Store Black Friday sale for 2014: $100...
BLACK FRIDAY The Apple Store has posted their Black Friday deals for 2014. Receive a $100 PRODUCT(RED) branded iTunes gift card with the purchase of select Macs, $50 with iPads, and $25 with iPods,... Read more
Black Friday: 15% off iTunes Gift Cards
Staples is offering 15% off $50 and $100 iTunes Gift Cards on their online store as part of their Black Friday sale. Click here for more information. Shipping is free. Best Buy is offering $100... Read more
BEVL Releases Dock Tailored for iPhone 6 and...
Seattle based BEVL has released their first product: an iPhone dock that is divergent in build quality, rock-solid function and visual simplicity to complement the iPhone. BEVL is now accepting... Read more

Jobs Board

*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
Senior Event Manager, *Apple* Retail Market...
…This senior level position is responsible for leading and imagining the Apple Retail Team's global event strategy. Delivering an overarching brand story; in-store, Read more
*Apple* Retail - Multiple Positions (US) - A...
Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.