TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

FileZilla 3.27.0.1 - Fast and reliable F...
FileZilla (ported from Windows) is a fast and reliable FTP client and server with lots of useful features and an intuitive interface. Version 3.27.0.1: MSW: Add misssing file to .zip binary package... Read more
Spotify 1.0.59.395. - Stream music, crea...
Spotify is a streaming music service that gives you on-demand access to millions of songs. Whether you like driving rock, silky R&B, or grandiose classical music, Spotify's massive catalogue puts... Read more
Sierra Cache Cleaner 11.0.6 - Clear cach...
Sierra Cache Cleaner is an award-winning general purpose tool for macOS X. SCC makes system maintenance simple with an easy point-and-click interface to many macOS X functions. Novice and expert... Read more
DiskCatalogMaker 7.1.2 - Catalog your di...
DiskCatalogMaker is a simple disk management tool which catalogs disks. Simple, light-weight, and fast Finder-like intuitive look and feel Super-fast search algorithm Can compress catalog data for... Read more
Live Home 3D Pro 3.1.2 - $69.99
Live Home 3D Pro, a successor of Live Interior 3D, is the powerful yet intuitive home design software that lets you build the house of your dreams right on your Mac. It has every feature of Live Home... Read more
Deeper 2.2.1 - Enable hidden features in...
Deeper is a personalization utility for macOS which allows you to enable and disable the hidden functions of the Finder, Dock, QuickTime, Safari, iTunes, login window, Spotlight, and many of Apple's... Read more
Pinegrow 3.04 - Mockup and design webpag...
Pinegrow (was Pinegrow Web Designer) is desktop app that lets you mockup and design webpages faster with multi-page editing, CSS and LESS styling, and smart components for Bootstrap, Foundation,... Read more
Deeper 2.2.1 - Enable hidden features in...
Deeper is a personalization utility for macOS which allows you to enable and disable the hidden functions of the Finder, Dock, QuickTime, Safari, iTunes, login window, Spotlight, and many of Apple's... Read more
Spotify 1.0.59.395. - Stream music, crea...
Spotify is a streaming music service that gives you on-demand access to millions of songs. Whether you like driving rock, silky R&B, or grandiose classical music, Spotify's massive catalogue puts... Read more
FileZilla 3.27.0.1 - Fast and reliable F...
FileZilla (ported from Windows) is a fast and reliable FTP client and server with lots of useful features and an intuitive interface. Version 3.27.0.1: MSW: Add misssing file to .zip binary package... Read more

Latest Forum Discussions

See All

The best deals on the App Store this wee...
There are quite a few truly superb games on sale on the App Store this week. If you haven't played some of these, many of which are true classics, now's the time to jump on the bandwagon. Here are the deals you need to know about. [Read more] | Read more »
Realpolitiks Mobile (Games)
Realpolitiks Mobile 1.0 Device: iOS Universal Category: Games Price: $5.99, Version: 1.0 (iTunes) Description: PLEASE NOTE: The game might not work properly on discontinued 1GB of RAM devices (iPhone 5s, iPhone 6, iPhone 6 Plus, iPad... | Read more »
Layton’s Mystery Journey (Games)
Layton’s Mystery Journey 1.0.0 Device: iOS Universal Category: Games Price: $15.99, Version: 1.0.0 (iTunes) Description: THE MUCH-LOVED LAYTON SERIES IS BACK WITH A 10TH ANNIVERSARY INSTALLMENT! Developed by LEVEL-5, LAYTON’S... | Read more »
Full Throttle Remastered (Games)
Full Throttle Remastered 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: Originally released by LucasArts in 1995, Full Throttle is a classic graphic adventure game from industry legend Tim... | Read more »
Stunning shooter Morphite gets a new tra...
Morphite is officially landing on iOS in September. The game looks like the space shooter we've been needing on mobile, and we're going to see if it fits the bill quite shortly. The game's a collaborative effort between Blowfish Studios, We're Five... | Read more »
Layton's Mystery Journey arrives to...
As you might recall, Layton's Mystery Journey is headed to iOS and Android -- tomorrow! To celebrate the impending launch, Level-5's released a new trailer, complete with an adorable hamster. [Read more] | Read more »
Sidewords (Games)
Sidewords 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: Grab a cup of coffee and relax with Sidewords. Sidewords is part logic puzzle, part word game, all original. No timers. No... | Read more »
Noodlecake Games' 'Leap On!...
Noodlecake Games is always good for some light-hearted arcade fun, and its latest project, Leap On! could carry on that tradition. It's a bit like high stakes tetherball in a way. Your job is to guide a cute little blob around a series of floating... | Read more »
RuneScape goes mobile later this year
Yes, RuneScape still exists. In fact, it's coming to iOS and Android in just a few short months. Jagex, creators of the hit fantasy MMORPG of yesteryear, is releasing RuneScape Mobile and Old School RuneScape for mobile devices, complete with... | Read more »
Crash of Cars wants you to capture the c...
Crash of Cars is going full on medieval in its latest update, introducing castles and all manner of new cars and skins fresh from the Dark Ages. The update introduces a new castle-themed map (complete with catapults) and a gladiator-style battle... | Read more »

Price Scanner via MacPrices.net

Save or Share
FotoJet Designer, is a simple but powerful new graphic design apps available on both Mac and Windows. With FotoJet Designer’s 900+ templates, thousands of resources, and powerful editing tools you... Read more
Logo Maker Shop iOS App Lets Businesses Get C...
A newly released app is designed to help business owners to get creative with their branding by designing their own logos. With more than 1,000 editable templates, Logo Maker Shop 1.0 provides the... Read more
Sale! New 15-inch MacBook Pros for up to $150...
Amazon has the new 2017 15″ MacBook Pros on sale for up to $150 off MSRP including free shipping: – 15″ 2.8GHz MacBook Pro Space Gray: $2249 $150 off MSRP – 15″ 2.89Hz MacBook Pro Space Gray: $2779 $... Read more
DEVONthink To Go 2.1.7 For iOS Brings Usabili...
DEVONtechnologies has updated DEVONthink To Go, the iOS companion to DEVONthink for Mac, with enhancements and bug fixes. Version 2.1.7 adds an option to clear the Global Inbox and makes the grid... Read more
15-inch 2.2GHz Retina MacBook Pro, Apple refu...
Apple has Certified Refurbished 2015 15″ 2.2GHz Retina MacBook Pros available for $1699. That’s $300 off MSRP, and it’s the lowest price available for a 15″ MacBook Pro. An Apple one-year warranty is... Read more
13-inch 2.3GHz Silver MacBook Pro on sale for...
B&H Photo has the new 2017 13″ 2.3GHz/256GB Silver MacBook Pro (MPXU2LL/A) on sale for $1399 including free shipping plus NY & NJ sales tax only. Their price is $100 off MSRP. Read more
Apple Tackles Distracted Driving With iOS 11...
One of the most important new features coming in iOS 11 is Do Not Disturb while driving, intended to help drivers stay more focused on the road. With Do Not Disturb while driving, your iPhone can... Read more
iMazing Mini for Mac: Free Automatic and Priv...
Geneva, Switzerland-based indie developer DigiDNA has released iMazing Mini, their free macOS utility designed to automatically back up iOS devices over any local Wi-Fi network. The app offers users... Read more
Clearance 2016 13-inch MacBook Airs, Apple re...
Apple dropped prices recently on Certified Refurbished 2016 13″ MacBook Airs, with models now available starting at $809. An Apple one-year warranty is included with each MacBook, and shipping is... Read more
9.7-inch 2017 iPads available for $299, save...
B&H Photo has 2017 9.7″ 32GB WiFi iPads on sale for $30 off MSRP for a limited time. Shipping is free, and pay sales tax in NY & NJ only: – 32GB iPad WiFi: $299, $30 off Read more

Jobs Board

*Apple* Retail - Multiple Positions - Apple...
SalesSpecialist - Retail Customer Service and SalesTransform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
*Apple* Retail - Multiple Positions - Apple...
SalesSpecialist - Retail Customer Service and SalesTransform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
Senior Payments Architect - *Apple* Pay - A...
Changing the world is all in a day's work at Apple . If you love innovation, here's your chance to make a career of it. You'll work hard. But the job comes with more Read more
Frameworks Engineering Manager, *Apple* Wat...
Frameworks Engineering Manager, Apple Watch Job Number: 41632321 Santa Clara Valley, California, United States Posted: Jun. 15, 2017 Weekly Hours: 40.00 Job Summary Read more
Manager, *Apple* Media Products - Apple Inc...
Job Summary The Apple Media Products Discovery, Fraud and Abuse team is responsible for protecting the integrity of Apple services. As a manager of the team, you Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.