Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they compete for the same set of cache lines. If I’m working on an image-filtering application whose algorithm visits all the pixels down a vertical column, it is really useful to know which values of rowBytes to avoid so that I don’t thrash the cache.
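To make the rowBytes question concrete, here is a minimal C sketch of the arithmetic involved. The 32K size and eight ways come from the quote above; the 64-byte line size is an assumption that this review does not state, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cache geometry. Size and associativity are from the review;
   LINE_BYTES is an assumption made for illustration only. */
enum {
    CACHE_BYTES = 32 * 1024,
    CACHE_WAYS  = 8,
    LINE_BYTES  = 64,
    NUM_SETS    = CACHE_BYTES / (CACHE_WAYS * LINE_BYTES), /* 64 sets   */
    SET_STRIDE  = NUM_SETS * LINE_BYTES                     /* 4096 bytes */
};

/* Which set an address falls into: drop the offset within the line,
   then wrap around the number of sets. */
uint32_t cache_set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Count how many distinct sets are touched when walking down one
   pixel column of an image with the given rowBytes. A small result
   means the column keeps evicting its own lines. */
uint32_t distinct_sets_in_column(uint32_t base, uint32_t row_bytes,
                                 uint32_t rows)
{
    bool seen[NUM_SETS] = { false };
    uint32_t count = 0;

    for (uint32_t r = 0; r < rows; r++) {
        uint32_t set = cache_set_index(base + r * row_bytes);
        if (!seen[set]) {
            seen[set] = true;
            count++;
        }
    }
    return count;
}
```

Under these assumed numbers, addresses that are a multiple of 4096 bytes apart land in the same set. A rowBytes of 4096 therefore pushes a whole column through a single set (eight lines), while a rowBytes of 4160 spreads 64 consecutive rows across all 64 sets.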

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).
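The operand-ordering advice in that passage can be sketched in C as follows. The cycle counts and the range test are copied from the quoted text; the function names are hypothetical, and this is only a model of the rule, not a measurement of real hardware.

```c
#include <stdint.h>

/* Estimated cycles in IE for a non-immediate multiply, using the
   range exactly as quoted: -2^15 <= rB < (2^15 - 1). */
int mul_ie_cycles(int32_t rb)
{
    return (rb >= -32768 && rb < 32767) ? 5 : 9;
}

/* Pick the operand that should go in rB: the one with the lesser
   magnitude. Magnitudes are computed in 64 bits so that INT32_MIN
   does not overflow when negated. */
int32_t pick_rb(int32_t x, int32_t y)
{
    int64_t mx = x < 0 ? -(int64_t)x : (int64_t)x;
    int64_t my = y < 0 ? -(int64_t)y : (int64_t)y;
    return (mx <= my) ? x : y;
}
```

For example, multiplying 100000 by 7 with 7 in rB is modelled at five cycles, while the same multiply with 100000 in rB is modelled at nine.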

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system. For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.
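The padding arithmetic behind that last suggestion is simple enough to sketch in C. PowerPC instructions are four bytes each; the line size is left as a parameter because the right value depends on the processor, and the function names here are hypothetical.

```c
#include <stdint.h>

enum { INSTR_BYTES = 4 }; /* every PowerPC instruction is 4 bytes */

/* Number of nops to insert so that a loop starting at loop_addr
   begins exactly on a line_bytes boundary. */
uint32_t nops_to_align(uint32_t loop_addr, uint32_t line_bytes)
{
    uint32_t offset = loop_addr % line_bytes;
    uint32_t pad = (line_bytes - offset) % line_bytes;
    return pad / INSTR_BYTES;
}

/* Returns 1 if a loop of instr_count instructions (at least one)
   starting at loop_addr sits entirely within one line, else 0. */
int loop_fits_in_one_line(uint32_t loop_addr, uint32_t instr_count,
                          uint32_t line_bytes)
{
    uint32_t first = loop_addr / line_bytes;
    uint32_t last = (loop_addr + instr_count * INSTR_BYTES - 1) / line_bytes;
    return first == last;
}
```

Assuming a 32-byte boundary purely for illustration, a six-instruction loop at address 0x1014 straddles two lines; three nops move it to 0x1020, where it fits in one.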

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes just beyond the top of the stack, at addresses below the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction to Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before-and-after PowerPC assembly. But I didn’t. So I kept plowing ahead, and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you had better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right, a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.
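As an illustration of the kind of trick that table covers, here is one possible set of shift-and-add decompositions for the multipliers 3 through 10, written in C. These are standard identities worked out for this sketch, not a reproduction of the book’s table, and the function names are hypothetical. Unsigned arithmetic is used so that overflow wraps in a well-defined way.

```c
#include <stdint.h>

/* Each routine uses at most three shift/add/subtract operations. */
uint32_t mul3(uint32_t x)  { return (x << 1) + x; }        /* 2x + x  */
uint32_t mul4(uint32_t x)  { return x << 2; }              /* 4x      */
uint32_t mul5(uint32_t x)  { return (x << 2) + x; }        /* 4x + x  */
uint32_t mul6(uint32_t x)  { return (x << 2) + (x << 1); } /* 4x + 2x */
uint32_t mul7(uint32_t x)  { return (x << 3) - x; }        /* 8x - x  */
uint32_t mul8(uint32_t x)  { return x << 3; }              /* 8x      */
uint32_t mul9(uint32_t x)  { return (x << 3) + x; }        /* 8x + x  */
uint32_t mul10(uint32_t x) { return (x << 3) + (x << 1); } /* 8x + 2x */
```

Whether any of these beats a multiply instruction depends on the operand values and the surrounding code, so they are worth timing rather than assuming.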

My biggest complaint is that I want to see real-world code examples (i.e., sequences of more than five instructions) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.