TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.


Community Search:
MacTech Search:

Software Updates via MacUpdate

VMware Fusion 8.5.1 - Run Windows apps a...
VMware Fusion 8 and Fusion 8 Pro--the latest versions of its virtualization software for running Windows on a Mac without rebooting--include full support for Windows 10, OS X El Capitan, and the... Read more
Apple Final Cut Pro X 10.3 - Professiona...
Apple Final Cut Pro X is a professional video editing solution.Completely redesigned from the ground up, Final Cut Pro adds extraordinary speed, quality, and flexibility to every part of the post-... Read more
Civilization VI 1.0.0 - Next iteration o...
Sid Meier’s Civilization VI is the next entry in the popular Civilization franchise. Originally created by legendary game designer Sid Meier, Civilization is a strategy game in which you attempt to... Read more
Paperless 2.3.7 - $49.95
Paperless is a digital documents manager. Remember when everyone talked about how we would soon be a paperless society? Now it seems like we use paper more than ever. Let's face it - we need and we... Read more
Apple iMovie 10.1.3 - Edit personal vide...
With an all-new design, Apple iMovie lets you enjoy your videos like never before. Browse your clips more easily, instantly share your favorite moments, and create beautiful HD movies and Hollywood-... Read more
Apple Numbers 4.0.5 - Apple's sprea...
With Apple Numbers, sophisticated spreadsheets are just the start. The whole sheet is your canvas. Just add dramatic interactive charts, tables, and images that paint a revealing picture of your data... Read more
Xcode 8.1 - Integrated development envir...
Xcode includes everything developers need to create great applications for Mac, iPhone, iPad, and Apple Watch. Xcode provides developers a unified workflow for user interface design, coding, testing... Read more
iShowU Instant 1.1.0 - Full-featured scr...
iShowU Instant gives you real-time screen recording like you've never seen before! It is the fastest, most feature-filled real-time screen capture tool from shinywhitebox yet. All of the features you... Read more
RestoreMeNot 2.0.4 - Disable window rest...
RestoreMeNot provides a simple way to disable the window restoration for individual applications so that you can fine-tune this behavior to suit your needs. Please note that RestoreMeNot is designed... Read more
DEVONthink Pro 2.9.6 - Knowledge base, i...
DEVONthink Pro is your essential assistant for today's world, where almost everything is digital. From shopping receipts to important research papers, your life often fills your hard drive in the... Read more

Latest Forum Discussions

See All

Roofbot: Puzzler On The Roof (Games)
Roofbot: Puzzler On The Roof 1.0.2 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0.2 (iTunes) Description: Guide Roofie through gorgeous, meditative rooftops and try to get the right color energy balls into the... | Read more »
The 4 best food delivery apps
As the temperatures continue to drop, so does the motivation to venture outside. Sometimes you still want to eat a nice meal from that sushi place down the road though. Thankfully in these trying times, there are a number of fine food delivery... | Read more »
Toca Life: Farm (Education)
Toca Life: Farm 1.0 Device: iOS Universal Category: Education Price: $2.99, Version: 1.0 (iTunes) Description: Work and play the farmer's way! Milk your cow, gather eggs from your hens and raise your crops. Have a picnic, play the... | Read more »
The Lost Shield (Games)
The Lost Shield 1.0.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0.0 (iTunes) Description: The Lost shield is a brick break/adventure game. You play as a hero who must return a powerful but dangerous magic shield... | Read more »
The Forgotten Room (Games)
The Forgotten Room 1.0.1 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0.1 (iTunes) Description: Play as paranormal investigator John “Buster of Ghosts” Murr as he explores yet another mysteriously creepy house. This... | Read more »
5 Halloween mobile games for wimps
If you're anything like me, horror games are a great way to have nightly nightmares for the next decade or three. They're off limits, but perhaps you want to get in on the Halloween celebrations in some way. Fortunately not all Halloween themed... | Read more »
The 5 scariest mobile games
It's the most wonderful time of the year for people who enjoy scaring themselves silly with haunted houses, movies, video games, and what have you. Mobile might not be the first platform you'd turn to for quality scares, but rest assured there are... | Read more »
Lifeline: Flatline (Games)
Lifeline: Flatline 1.0.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0.0 (iTunes) Description: The Lifeline series takes a terrifying turn in this interactive horror experience. Every decision you make could help... | Read more »
Game of Dice is now available on Faceboo...
After celebrating its anniversary in style with a brand new update, there’s even more excitement in store for Game of Dice has after just being launched on Facebook Gameroom. A relatively new platform, Facebook Gameroom has been designed for PC... | Read more »
4 addictive clicker games like Best Fien...
Clickers are passive games that take advantage of basic human psychology to suck you in, and they're totally unashamed of that. As long as you're aware that this game has been created to take hold of your brain and leave you perfectly content to... | Read more »

Price Scanner via

Apple Unveils Redesigned MacBook Pro With Tou...
October 27, 2016 – Apple today introduced the thinnest and lightest MacBook Pro yet, along with a new interface innovation that replaces the traditional row of function keys with a Retina-quality... Read more
Apple Unveils New TV App for Apple TV, iPhone...
October 27, 2016 – Apple today introduced a new TV app, offering a unified experience for discovering and accessing TV shows and movies from multiple apps on Apple TV, iPhone and iPad. The TV app... Read more
Price drops on select refurbished 2015 13″ Re...
Apple dropped prices on select Certified Refurbished 2015 13″ Retina MacBook Pros by as much as $90. An Apple one-year warranty is included with each model, and shipping is free: - 13″ 2.7GHz/256GB... Read more
Apple reveals new next-generation 15″ and 13″...
Apple today revealed their next-generation 15″ and 13″ MacBook Pros. The new models are thinner and lighter than before with a new aluminum design featuring an enhanced keyboard with retina, multi-... Read more
Worldwide Smartphone Shipments Up 1.0% Year o...
According to preliminary results from the International Data Corporation (IDC) Worldwide Quarterly Mobile Phone Tracker, vendors shipped a total of 362.9 million smartphones worldwide in the third... Read more
TuneBand Arm Band For iPhone 7 and 7 Plus Rel...
Grantwood Technology has added the TuneBand for iPhone 7 and 7 Plus to its smartphone armband series. The TuneBand provides a lightweight and comfortable way to wear the iPhone while running,... Read more
1.4GHz Mac mini on sale for $449, save $50
Adorama has the 1.4GHz Mac mini on sale for $50 off MSRP including free shipping plus NY & NJ sales tax only: - 1.4GHz Mac mini (Apple sku# MGEM2LL/A): $449 $50 off MSRP To purchase a mini at... Read more
21-inch 1.6GHz iMac on sale for $999, save $1...
B&H has the 21″ 1.6GHz Apple iMac on sale for $999 including free shipping plus NY sales tax only. Their price is $100 off MSRP. Read more
Macs’ Superior Enterprise Deployment Cost Eco...
IBM’s debunking of conventional wisdom and popular mythology about the relative cost of using Apple Mac computers as opposed to PCs running Microsoft Windows at the sixth annual Jamf Nation User... Read more
12-inch WiFi Apple iPad Pros on sale for $50-...
B&H Photo has 12″ WiFi Apple iPad Pros on sale for $50-$70 off MSRP, each including free shipping. B&H charges sales tax in NY only: - 12″ Space Gray 32GB WiFi iPad Pro: $749 $50 off MSRP -... Read more

Jobs Board

*Apple* Retail - Multiple Positions - Apple,...
Job Description: Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, Read more
Software Engineering Intern: UI Applications...
Job Summary Apple is currently seeking enthusiastic interns who can work full-time for a minimum of 12-weeks between Fall 2015 and Summer 2016. Our software Read more
Security Data Analyst - *Apple* Information...
…data sources need to be collected to allow Information Security to better protect Apple employees and customers from a wide range of threats.Act as the subject Read more
*Apple* Retail - Multiple Positions - Apple,...
Job Description: Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, Read more
*Apple* Solutions Consultant - Apple (United...
# Apple Solutions Consultant Job Number: 52812872 Houston, Texas, United States Posted: Oct. 18, 2016 Weekly Hours: 40.00 **Job Summary** As an Apple Solutions Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.