
Optimizing for PPC
Volume Number: 12
Issue Number: 5
Column Tag: Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book that makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.
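As a rough illustration of the kind of down-to-earth answer asked for here, the sketch below works out which addresses collide in such a cache. The 32K size and eight ways come from the text above; the 64-byte line size is an assumption made for the example, and the function names are hypothetical. It is a back-of-the-envelope model, not a description of how the 601 hardware behaves in every case.

```c
#include <stdint.h>

/* Cache geometry. Size and associativity are the figures quoted in the
   review; LINE_BYTES is an ASSUMPTION made for this sketch. */
enum {
    CACHE_BYTES = 32 * 1024,
    CACHE_WAYS  = 8,
    LINE_BYTES  = 64,                        /* assumed line size */
    WAY_BYTES   = CACHE_BYTES / CACHE_WAYS,  /* 4096 bytes per way */
    NUM_SETS    = WAY_BYTES / LINE_BYTES     /* 64 sets */
};

/* Index of the set that a byte address maps to. */
static unsigned cache_set(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Count how many distinct sets are touched when visiting one pixel per
   row down a vertical column of `rows` rows, `row_bytes` apart. */
static unsigned sets_touched_by_column(uint32_t base, uint32_t row_bytes,
                                       unsigned rows)
{
    unsigned char seen[NUM_SETS] = {0};
    unsigned count = 0;
    for (unsigned r = 0; r < rows; r++) {
        unsigned s = cache_set(base + r * row_bytes);
        if (!seen[s]) {
            seen[s] = 1;
            count++;
        }
    }
    return count;
}
```

Under these assumptions, two addresses 4096 bytes apart land in the same set, so a rowBytes of 4096 sends every pixel in a column to a single set, which holds only eight lines. A rowBytes of 4160 (4096 plus one line) spreads 64 rows across all 64 sets.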

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).
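The rule in that passage can be sketched in C. This is only a model of the quoted timing, reading the condition as "rB fits in a signed 16-bit value" (an interpretation for this example; the exact boundary is the book's to state). The function names are hypothetical.

```c
#include <stdint.h>

/* Interpretation used in this sketch: the upper bits are "all sign bits"
   when the value fits in a signed 16-bit integer. */
static int fits_in_16_bits(int32_t v)
{
    return v >= -32768 && v <= 32767;
}

/* Cycles a (non-immediate) multiply spends in IE, per the quoted rule. */
static int mul_ie_cycles(int32_t rb)
{
    return fits_in_16_bits(rb) ? 5 : 9;
}

/* Swap the operands if needed so the smaller-magnitude one ends up in rB.
   Magnitudes are compared in 64 bits so INT32_MIN does not overflow. */
static void order_for_multiply(int32_t *ra, int32_t *rb)
{
    int64_t mag_a = *ra < 0 ? -(int64_t)*ra : (int64_t)*ra;
    int64_t mag_b = *rb < 0 ? -(int64_t)*rb : (int64_t)*rb;
    if (mag_a < mag_b) {
        int32_t tmp = *ra;
        *ra = *rb;
        *rb = tmp;
    }
}
```

For example, multiplying 3 by 1,000,000 with the large value in rB models as nine cycles; after order_for_multiply puts 3 in rB, the same product models as five.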

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system. For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.
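The padding arithmetic behind that last suggestion is simple enough to sketch. The helper below is hypothetical and assumes the loop address is already word-aligned and that the caller supplies the line size (a power of two); the only hardware fact it relies on is that PowerPC instructions are four bytes each.

```c
#include <stdint.h>

enum { INSTR_BYTES = 4 };  /* PowerPC instructions are a fixed 32 bits */

/* Number of nops needed in front of a loop that currently starts at
   `loop_addr` so that it begins on a `line_bytes` boundary.
   Assumes loop_addr is a multiple of 4 and line_bytes is a power of two. */
static unsigned nops_to_align(uint32_t loop_addr, uint32_t line_bytes)
{
    uint32_t offset = loop_addr % line_bytes;
    if (offset == 0)
        return 0;
    return (line_bytes - offset) / INSTR_BYTES;
}
```

In practice an assembler's alignment directive, where one is available, would likely do this for you; the function only shows where the count of nops comes from.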

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer, that is, beyond the top of the stack at lower addresses - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction to Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before-and-after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you had better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.
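To give a flavor of what such a table contains, here is a sketch in C of the standard shift-and-add decompositions for the multipliers 3 through 10. These are the textbook identities, not a transcription of the book's table, and the function name is hypothetical. Each case uses at most three shift, add, or subtract operations.

```c
#include <stdint.h>

/* Multiply x by a small constant k using only shifts, adds and subtracts.
   Unsigned arithmetic is used so that wraparound is well defined. */
static uint32_t mul_by_small_constant(uint32_t x, unsigned k)
{
    switch (k) {
    case 3:  return (x << 1) + x;         /* 2x + x      */
    case 4:  return x << 2;               /* 4x          */
    case 5:  return (x << 2) + x;         /* 4x + x      */
    case 6:  return ((x << 1) + x) << 1;  /* (2x + x)*2  */
    case 7:  return (x << 3) - x;         /* 8x - x      */
    case 8:  return x << 3;               /* 8x          */
    case 9:  return (x << 3) + x;         /* 8x + x      */
    case 10: return ((x << 2) + x) << 1;  /* (4x + x)*2  */
    default: return x * k;                /* fall back to a real multiply */
    }
}
```

Whether any of these actually beats a multiply instruction depends on the timings discussed earlier, so they are worth measuring rather than assuming.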

My biggest complaint is that I want to see real-world code examples (i.e., sequences longer than five instructions) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.
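For what it is worth, the skeleton of such a workbench is small. The sketch below is portable C built on the standard clock() function, which is coarse rather than high-resolution, so it compensates by running the routine many times and keeping the best of several trials. All names are hypothetical, and a serious PowerPC version would presumably read a hardware timer instead.

```c
#include <time.h>

typedef void (*bench_fn)(void);

/* Best observed average time per call, in seconds, over `trials` runs of
   `iters` calls each. Taking the minimum filters out runs disturbed by
   other activity on the machine. Returns 0.0 for empty requests. */
static double time_routine(bench_fn fn, unsigned iters, unsigned trials)
{
    double best = -1.0;
    if (iters == 0 || trials == 0)
        return 0.0;
    for (unsigned t = 0; t < trials; t++) {
        clock_t start = clock();
        for (unsigned i = 0; i < iters; i++)
            fn();
        clock_t end = clock();
        double per_call = ((double)(end - start) / CLOCKS_PER_SEC) / iters;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}

/* Stand-in for the routine under test; `volatile` keeps the compiler
   from optimizing the work away. */
static volatile unsigned sink;

static void sample_work(void)
{
    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}
```

Calling time_routine(sample_work, 1000, 3) before and after a change gives a crude answer to "did I make a difference?", and sample_work could just as well be a C wrapper around a routine written in assembly.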

 
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.