TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 
AAPL
$467.36
Apple Inc.
+0.00
MSFT
$32.87
Microsoft Corpora
+0.00
GOOG
$885.51
Google Inc.
+0.00

MacTech Search:
Community Search:

Software Updates via MacUpdate

Acorn 4.1 - Bitmap image editor. (Demo)
Acorn is a new image editor built with one goal in mind - simplicity. Fast, easy, and fluid, Acorn provides the options you'll need without any overhead. Acorn feels right, and won't drain your bank... Read more
Mellel 3.2.3 - Powerful word processor w...
Mellel is the leading word processor for OS X, and has been widely considered the industry standard since its inception. Mellel focuses on writers and scholars for technical writing and multilingual... Read more
Iridient Developer 2.2 - Powerful image...
Iridient Developer (was RAW Developer) is a powerful image conversion application designed specifically for OS X. Iridient Developer gives advanced photographers total control over every aspect of... Read more
Delicious Library 3.1.2 - Import, browse...
Delicious Library allows you to import, browse, and share all your books, movies, music, and video games with Delicious Library. Run your very own library from your home or office using our... Read more
Epson Printer Drivers for OS X 2.15 - Fo...
Epson Printer Drivers includes the latest printing and scanning software for OS X 10.6, 10.7, and 10.8. Click here for a list of supported Epson printers and scanners.OS X 10.6 or laterDownload Now Read more
Freeway Pro 6.1.0 - Drag-and-drop Web de...
Freeway Pro lets you build websites with speed and precision... without writing a line of code! With it's user-oriented drag-and-drop interface, Freeway Pro helps you piece together the website of... Read more
Transmission 2.82 - Popular BitTorrent c...
Transmission is a fast, easy and free multi-platform BitTorrent client. Transmission sets initial preferences so things "Just Work", while advanced features like watch directories, bad peer blocking... Read more
Google Earth Web Plug-in 7.1.1.1888 - Em...
Google Earth Plug-in and its JavaScript API let you embed Google Earth, a true 3D digital globe, into your Web pages. Using the API you can draw markers and lines, drape images over the terrain, add... Read more
Google Earth 7.1.1.1888 - View and contr...
Google Earth gives you a wealth of imagery and geographic information. Explore destinations like Maui and Paris, or browse content from Wikipedia, National Geographic, and more. Google Earth... Read more
SMARTReporter 3.1.1 - Hard drive pre-fai...
SMARTReporter is an application that can warn you of some hard disk drive failures before they actually happen! It does so by periodically polling the S.M.A.R.T. status of your hard disk drive. S.M.... Read more

Strategy & Tactics: World War II Upd...
Strategy & Tactics: World War II Update Adds Two New Scenarios Posted by Andrew Stevens on August 12th, 2013 [ permalink ] Universal App - Designed for iPhone and iPad | Read more »
Expenses Planner Review
Expenses Planner Review By Angela LaFollette on August 12th, 2013 Our Rating: :: PLAIN AND SIMPLEUniversal App - Designed for iPhone and iPad Expenses Planner keeps track of future bills through due date reminders, and it also... | Read more »
Kinesis: Strategy in Motion Brings An Ad...
Kinesis: Strategy in Motion Brings An Adaptation Of The Classic Strategic Board Game To iOS Posted by Andrew Stevens on August 12th, 2013 [ | Read more »
Z-Man Games Creates New Studio, Will Bri...
Z-Man Games Creates New Studio, Will Bring A Digital Version of Pandemic! | Read more »
Minutely Review
Minutely Review By Jennifer Allen on August 12th, 2013 Our Rating: :: CROWDSOURCING WEATHERiPhone App - Designed for the iPhone, compatible with the iPad Work together to track proper weather conditions no matter what area of the... | Read more »
10tons Discuss Publishing Fantasy Hack n...
Recently announced, Trouserheart looks like quite the quirky, DeathSpank-style fantasy action game. Notably, it’s a game that is being published by established Finnish games studio, 10tons and developed by similarly established and Finnish firm,... | Read more »
Boat Watch Lets You Track Ships From Por...
Boat Watch Lets You Track Ships From Port To Port Posted by Andrew Stevens on August 12th, 2013 [ permalink ] Universal App - Designed for iPhone and iPad | Read more »
Expenses Review
Expenses Review By Ruairi O'Gallchoir on August 12th, 2013 Our Rating: :: STUNNINGiPhone App - Designed for the iPhone, compatible with the iPad Although focussing primarily on expenses, Expenses still manages to make tracking... | Read more »
teggle is Gameplay Made Simple, has Play...
teggle is Gameplay Made Simple, has Players Swiping for High Scores Posted by Andrew Stevens on August 12th, 2013 [ permalink ] | Read more »
How To: Manage iCloud Settings
iCloud, much like life, is a scary and often unknowable thing that doesn’t always work the way it should. But much like life, if you know the little things and tweaks, you can make it work much better for you. I think that’s how life works, anyway.... | Read more »

Price Scanner via MacPrices.net

13″ 2.5GHz MacBook Pro on sale for $150 off M...
B&H Photo has the 13″ 2.5GHz MacBook Pro on sale for $1049.95 including free shipping. Their price is $150 off MSRP plus NY sales tax only. B&H will include free copies of Parallels Desktop... Read more
iPod touch (refurbished) available for up to...
The Apple Store is now offering a full line of Apple Certified Refurbished 2012 iPod touches for up to $70 off MSRP. Apple’s one-year warranty is included with each model, and shipping is free: -... Read more
27″ Apple Display (refurbished) available for...
The Apple Store has Apple Certified Refurbished 27″ Thunderbolt Displays available for $799 including free shipping. That’s $200 off the cost of new models. Read more
Apple TV (refurbished) now available for only...
The Apple Store has Apple Certified Refurbished 2012 Apple TVs now available for $75 including free shipping. That’s $24 off the cost of new models. Apple’s one-year warranty is standard. Read more
AnandTech Reviews 2013 MacBook Air (11-inch)...
AnandTech is never the first out with Apple new product reviews, but I’m always interested in reading their detailed, in-depth analyses of Macs and iDevices. AnandTech’s Vivek Gowri bought and tried... Read more
iPad, Tab, Nexus, Surface, And Kindle Fire: W...
VentureBeat’s John Koetsier says: The iPad may have lost the tablet wars to an army of Android tabs, but its still first in peoples hearts. Second place, however, belongs to a somewhat unlikely... Read more
Should You Buy An iPad mini Or An iPad 4?
Macworld UK’s David Price addresses the conundrum of which iPAd to buy? Apple iPad 4, iPad 2, iPad mini? Or hold out for the iPad mini 2 or the iPad 5? Price notes that potential Apple iPad... Read more
iDraw 2.3 A More Economical Alternative To Ad...
If you’re a working graphics pro, you can probably justify paying the stiff monthly rental fee to use Adobe’s Creative Cloud, including the paradigm-setting vector drawing app. Adobe Illustrator. If... Read more
New Documentary By Director Werner Herzog Sho...
Injuring or even killing someone because you were texting while driving is a life-changing experience. There are countless stories of people who took their eyes off the road for a second and ended up... Read more
AppleCare Protection Plans on sale for up to...
B&H Photo has 3-Year AppleCare Warranties on sale for up to $105 off MSRP including free shipping plus NY sales tax only: - Mac Laptops 15″ and Above: $244 $105 off MSRP - Mac Laptops 13″ and... Read more

Jobs Board

Sales Representative - *Apple* Honda - Appl...
APPLE HONDA AUTOMOTIVE CAREER FAIR! NOW HIRING AUTO SALES REPS, AUTO SERVICE BDC REPS & AUTOMOTIVE BILLER! NO EXPERIENCE NEEDED! Apple Honda is offering YOU a Read more
*Apple* Developer Support Advisor - Portugue...
Changing the world is all in a day's work at Apple . If you love innovation, here's your chance to make a career of it. You'll work hard. But the job comes with more than Read more
RBB - *Apple* OS X Platform Engineer - Barc...
RBB - Apple OS X Platform Engineer Ref 63198 Country USA…protected by law. Main Function | The engineering of Apple OS X based solutions, in line with customer and Read more
RBB - Core Software Engineer - Mac Platform (...
RBB - Core Software Engineer - Mac Platform ( Apple OS X) Ref 63199 Country USA City Dallas Business Area Global Technology Contract Type Permanent Estimated publish end Read more
*Apple* Desktop Analyst - Infinity Consultin...
Job Title: Apple Desktop Analyst Location: Yonkers, NY Job Type: Contract to hire Ref No: 13-02843 Date: 2013-07-30 Find other jobs in Yonkers Desktop Analyst The Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.