TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 
AAPL
$442.14
Apple Inc.
+0.00
MSFT
$34.15
Microsoft Corpora
+0.00
GOOG
$882.79
Google Inc.
+0.00

MacTech Search:
Community Search:

Software Updates via MacUpdate

Evernote 5.1.1 - Create searchable notes...
Evernote allows you to easily capture information in any environment using whatever device or platform you find most convenient, and makes this information accessible and searchable at anytime, from... Read more
SketchUp 13.0.3688 - Create 3D design co...
SketchUp is an easy-to-learn 3D modeling program that enables you to explore the world in 3D. With just a few simple tools, you can create 3D models of houses, sheds, decks, home additions,... Read more
GarageSale 6.6b10 - Create outstanding e...
GarageSale is a slick, full-featured client application for the eBay online auction system. Create and manage your auctions with ease With GarageSale, you can create, edit, track, and manage... Read more
Twitter 2.2.1 - Official Twitter client...
Twitter (was Tweetie) is a Twitter client with a variety of features. Important Note: As of January 2011, AteBit's Tweetie application has been acquired and renamed by Twitter. Version 1.2.8 of the... Read more
SteerMouse 4.1.6 - Powerful third-party...
SteerMouse is an advanced driver for USB and Bluetooth mice. It also supports Apple Mighty Mouse very well. SteerMouse can assign various functions to buttons that Apple's software does not allow,... Read more
Google Chrome 27.0.1453.93 - Modern and...
Google Chrome is a Web browser by Google, created to be a modern platform for Web pages and applications. It utilizes very fast loading of Web pages and has a V8 engine, which is a custom built... Read more
Labels & Addresses 1.6.5 - Powerful...
Labels & Addresses is a home and office tool for printing all sorts of labels, envelopes, inventory labels, and price tags. Merge-printing capability makes the program a great tool for holiday... Read more
Delicious Library 3.0.2 - Import, browse...
Delicious Library allows you to import, browse, and share all your books, movies, music, and video games with Delicious Library. Run your very own library from your home or office using our... Read more
KeyCue 6.5 - Displays all menu shortcut...
KeyCue helps you to use your OS X applications more effectively. Just hold down the Command key for a while - KeyCue comes to help and shows a table of all currently available keyboard shortcuts.... Read more
HoudahSpot 3.7.8 - Advanced front-end fo...
HoudahSpot is a flexible file-search tool based on Apple's powerful Spotlight engine. Keep frequently used files within reach Retrieve the files you didn't know you still had Don't waste time... Read more

Evernote Update Keeps You Notified, Adds...
Evernote Update Keeps You Notified, Adds New Reminders Feature Posted by Andrew Stevens on May 23rd, 2013 [ permalink ] | Read more »
Clear Shakes Up A New Update: Email Your...
Clear Shakes Up A New Update: Email Your Lists Posted by Andrew Stevens on May 23rd, 2013 [ permalink ] iPhone App - Designed for the iPhone, compatible with the iPad | Read more »
Regular Show: Best Park in the Universe...
Regular Show: Best Park in the Universe Review By Carter Dotson on May 23rd, 2013 Our Rating: :: SLACKERSUniversal App - Designed for iPhone and iPad This park has some good ideas, but a lot of work needs to go into it to make it... | Read more »
Angry Birds Space Launches You Into Spac...
Angry Birds Space Launches You Into Space For Free Posted by Andrew Stevens on May 23rd, 2013 [ permalink ] iPhone App - Designed for the iPhone, compatible with the iPad | Read more »
Mailbox Shows Some Tablet Love, Gets Opt...
Mailbox Shows Some Tablet Love, Gets Optimized For iPad Posted by Andrew Stevens on May 23rd, 2013 [ permalink ] iPhone App - Designed for the iPhone, compatible with the iPad | Read more »
Ayopa Games Offers Their Titles For Free...
Ayopa Games Offers Their Titles For Free This Memorial Day Weekend Posted by Andrew Stevens on May 23rd, 2013 [ permalink ] Ayopa Games is celebrating this Mem | Read more »
Greedy Grub Review
Greedy Grub Review By Rob Rich on May 23rd, 2013 Our Rating: :: A CUTE CRAWLUniversal App - Designed for iPhone and iPad Greedy Grub is certainly adorable, but it’s not particularly ground-breaking.   | Read more »
Finger Tied Jr Review
Finger Tied Jr Review By Jennifer Allen on May 23rd, 2013 Our Rating: :: FINGER TWISTING FUNiPhone App - Designed for the iPhone, compatible with the iPad Finger Tied brought Twister-style gaming to the iPad, and Jr does much the... | Read more »
Zynga’s Battlestone – Mobile Hack ‘n’ Sl...
Zynga’s Battlestone – Mobile Hack ‘n’ Slash Arcade Action Posted by Rob LeFebvre on May 23rd, 2013 [ permalink ] Universal App - Designed for iPhone and iPad | Read more »
Developer Spotlight: Infinite Dreams
With its latest title, Can Knockdown 3, recently earning a coveted Editor’s Choice award here, I took the time to learn a bit more about Polish game developer, Infinite Dreams. Who is Infinite Dreams? Based in the Southern Polish city of Gliwice,... | Read more »

Price Scanner via MacPrices.net

Economic Conservatives Defend Apple’s Tax Strategy
Given Apple’s longtime reputation as the particular darling of the liberal lefty end of the spectrum, it’s been facinating to see mostly prominant conservatives rallying to the defense of Apple’s... Read more
Is Apple Losing Its “Cool” Cachet With The Popular...
SMH’s Steve Colquhoun notes that while Apple has again been rated as the world’s top brand this week, a leading social researcher warns the company and its products are losing touch with Generation Y... Read more
New Rugged Smartphone From…. Caterpillar?!
Bullitt Mobile Ltd., global licensee of Cat phones for Caterpillar Inc., has introduced the new Cat B15 smartphone in North America. The Cat B15 is designed to be the most progressive, durable and... Read more
Mac mini on sale for $25 off, free shipping, NY ta...
B&H Photo has the 2.5GHz Mac mini available for $574.98 including free shipping and NY sales tax only. Their price is $25 off MSRP. B&H will include free copies of Parallels Desktop and Bento... Read more
Updated iPad Price Trackers
We’ve updated our iPad Price Tracker and our iPad mini Price Tracker with the latest information on prices and availability from Apple and other resellers. Read more
Take $20 off with Apple refurbished iPod nanos
The Apple Store has Apple Certified Refurbished 16GB iPod nanos available for $129 including free shipping and Apple’s standard one-year warranty. That’s $20, or 13%, off the cost of new nanos. All... Read more
Apple TV (refurbished) available for $85, 14% off
The Apple Store has Apple Certified Refurbished 2012 Apple TVs available for $85 including free shipping. That’s $14 off the cost of new models. Apple’s one-year warranty is standard. Read more
27″ iMacs on sale for $100 off MSRP
Amazon has 27-inch iMacs on sale for $100 off MSRP: - 27″ 3.2GHz iMac: $1899.99 - 27″ 2.9GHz iMac: $1699.98 Shipping is free Read more
Platform Wars: Tablets Triumphant, But Don’t Write...
The Register’s Paul Kunert says it’s finally official – the epic battle of legendary Apple CEO Steve Jobs is finally won, now that he has toppled the PC platform from beyond the grave, in the UK, at... Read more
Apple Tops 100 Most Valuable Global Brands 2013 Su...
MarketingWeek’s Lou Cooper reports that this years BrandZ ranking of the top 100 valuable global brands sees Apple maintain its reign as number one, ahead of Google and IBM in second and third and... Read more

Jobs Board

*Apple* Retail - Manager - Apple Inc. (...
Job Summary Keeping an Apple Store thriving requires a diverse set of leadership skills, and as a Manager, you’re a master of them all. In the store’s fast-paced, Read more
*Apple* Account Executive - CompuCom (U...
Apple Account Executive Job Location US-IL-Des Plaines Posted Date 3/27/2013 Req # 2013-4905 Apply/Socialize: * Apply Now! * Email this opportunity to a friend or Read more
*Apple* - Solution Architect - CompuCom...
Apple - Solution Architect Job Location US-TX-Dallas Posted Date 4/18/2013 Req # 2013-4932 Apply/Socialize: * Apply Now! * Email this opportunity to a friend or Read more
Mac/ *Apple* Specialist Needed - Enterp...
Mac/ Apple Specialist Needed - Enterprise iPad Deployment A prominent Robert Half client is seeking out a Mac/ Apple Specialist to assist with an iPad deployment Read more
Mac/ *Apple* Specialist Needed | Enterp...
Mac/ Apple Specialist Needed | Enterprise iPad Deployment A prominent Robert Half client is seeking out a Mac/ Apple Specialist to assist with an iPad deployment Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.