TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

PDFKey Pro 4.3 - Edit and print password...
PDFKey Pro can unlock PDF documents protected for printing and copying when you've forgotten your password. It can now also protect your PDF files with a password to prevent unauthorized access and/... Read more
Kodi 15.0.beta1 - Powerful media center...
Kodi (was XBMC) is an award-winning free and open-source (GPL) software media player and entertainment hub that can be installed on Linux, OS X, Windows, iOS, and Android, featuring a 10-foot user... Read more
DiskCatalogMaker 6.4.12 - Catalog your d...
DiskCatalogMaker is a simple disk management tool which catalogs disks. Simple, light-weight, and fast. Finder-like intuitive look and feel. Super-fast search algorithm. Can compress catalog data... Read more
Macs Fan Control 1.3.0.0 - Monitor and c...
Macs Fan Control allows you to monitor and control almost any aspect of your computer's fans, with support for controlling fan speed, temperature sensors pane, menu-bar icon, and autostart with... Read more
Lyn 1.5.11 - Lightweight image browser a...
Lyn is a lightweight and fast image browser and viewer designed for photographers, graphic artists and Web designers. Featuring an extremely versatile and aesthetically pleasing interface, it... Read more
NeoOffice 2014.11 - Mac-tailored, OpenOf...
NeoOffice is a complete office suite for OS X. With NeoOffice, users can view, edit, and save OpenOffice documents, PDF files, and most Microsoft Word, Excel, and PowerPoint documents. NeoOffice 3.x... Read more
LaunchBar 6.4 - Powerful file/URL/email...
LaunchBar is an award-winning productivity utility that offers an amazingly intuitive and efficient way to search and access any kind of information stored on your computer or on the Web. It provides... Read more
Remotix 3.1.4 - Access all your computer...
Remotix is a fast and powerful application to easily access multiple Macs (and PCs) from your own Mac. Features Complete Apple Screen Sharing support - including Mac OS X login, clipboard... Read more
DesktopLyrics 2.6.6 - Displays current i...
DesktopLyrics is an application that displays the lyrics of the song currently playing in "iTunes" right on your desktop. The lyrics for the song have to be set in iTunes; DesktopLyrics does nothing... Read more
VOX 2.5.1 - Music player that supports m...
VOX is a beautiful music player that supports many filetypes. The beauty is in its simplicity, yet behind the minimal exterior lies a powerful music player with a ton of features and support for all... Read more

This Week at 148Apps: May 18-22, 2015
May Days at 148Apps How do you know what apps are worth your time and money? Just look to the review team at 148Apps. We sort through the chaos and find the apps you're looking for. The ones we love become Editor’s Choice, standing out above the... | Read more »
Biz Builder Delux (Games)
Biz Builder Delux 1.0.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0.0 (iTunes) Description: Ah, there's nothing like the rhythmic bustle of a burgeoning business burg... especially when you're the one building it... | Read more »
Auroch Digital is Bringing Back Games Wo...
| Read more »
Blades of Brim is a New Endless Runner f...
SYBO Games, the minds behind the ever-popular Subway Surfers, have announced their latest project: Blades of Brim. [Read more] | Read more »
Carbo - Handwriting in the Digital Age...
Carbo - Handwriting in the Digital Age 1.0 Device: iOS Universal Category: Productivity Price: $3.99, Version: 1.0 (iTunes) Description: | Read more »
Draggy Dead (Games)
Draggy Dead 1.1 Device: iOS Universal Category: Games Price: $.99, Version: 1.1 (iTunes) Description: Ditch your dead end job and take up a rewarding career in Grave Robbing today!Guide the recently deceased to a fun filled life of... | Read more »
Bad Dinos (Games)
Bad Dinos 1.0.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0.0 (iTunes) Description: | Read more »
The Apple Watch isn't Great as a Fi...
| Read more »
Show the World What You See With Stre.am...
Live broadcasting is getting popular on mobile devices, which is why you can now get Stre.am, by Infinite Takes. [Read more] | Read more »
PhotoTime's 2.1 Update Adds Apple W...
The latest PhotoTime update is adding even more functionality to the handy photo organizing app. Yep, including Apple Watch support. [Read more] | Read more »

Price Scanner via MacPrices.net

12-inch MacBook stock status for Monday, May...
The new 12″ Retina MacBooks are still on backorder at The Apple Store with a 3-5 week waiting period. However, a few models are in stock today at Apple resellers. Stock is limited, so act now if you’... Read more
New 27-inch 3.3GHz 5K iMac in stock with free...
Adorama has the new 27″ 3.3GHz 5K iMac in stock today for $1999 including free shipping plus NY & NJ sales tax only. Adorama will include a free copy of Apple’s 3-year AppleCare Protection Plan. Read more
Memorial Day Weekend Sale: New 27-inch 3.3GHz...
Best Buy has the new 27″ 3.3GHz 5K iMac on sale for $1899.99 this weekend. Choose free shipping or free local store pickup (if available). Sale price for online orders only, in-store prices may vary... Read more
OtterBox Maximizes Portability, Productivity...
From the kitchen recipe book to the boarsroom presentation, the OtterBox Agility Tablet System turns tablets into one of the most versatile pieces of handheld technology available. Available now, the... Read more
Launch of New Car App Gallery and Open Develo...
Automatic, a company on a mission to bring the power of the Internet into every car, has announced the launch of the Automatic App Gallery, an app store for nearly every car or truck on the road... Read more
Memorial Day Weekend Sale: 13-inch 1.6GHz Mac...
Best Buy has the new 13″ 1.6GHz/128GB MacBook Air on sale for $849 on their online store this weekend. Choose free shipping or free local store pickup (if available). Sale price for online orders... Read more
Memorial Day Weekend Sale: 27-inch 3.5GHz 5K...
Best Buy has the 27″ 3.5GHz 5K iMac on sale for $2099.99 this weekend. Choose free shipping or free local store pickup (if available). Sale price for online orders only, in-store prices may vary.... Read more
Sale! 16GB iPad mini 3 for $349, save $50
B&H Photo has the 16GB iPad mini 3 WiFi on sale for $349 including free shipping plus NY sales tax only. Their price is $50 off MSRP, and it’s the lowest price available for this model. Read more
Price drop on 2014 15-inch Retina MacBook Pro...
B&H Photo has dropped prices on 2014 15″ Retina MacBook Pros by $200. Shipping is free, and B&H charges NY sales tax only: - 15″ 2.2GHz Retina MacBook Pro: $1799.99 save $200 - 15″ 2.5GHz... Read more
With a Mission to Make Mobile Free, Scratch W...
Scratch Wireless, claiming to be the world’s first truly free mobile service, has announced the availability of a new Scratch-enabled Android smartphone, the Coolpad Arise. The smartphone is equipped... Read more

Jobs Board

Payments Counsel, *Apple* Pay (mobile payme...
**Job Summary** Apple is looking for an atto ey to join Apple 's Legal Department to support Apple Pay. **Key Qualifications** 7+ years of relevant experience Read more
Touch Hardware Design and Integration Enginee...
…Summary** Design, develop, and launch next-generation Touch solutions in the new Apple Watch product category. The Touch team develops cutting-edge Touch solutions and Read more
*Apple* Solutions Consultant - Retail Sales...
**Job Summary** As an Apple Solutions Consultant (ASC) you are the link between our customers and our products. Your role is to drive the Apple business in a retail Read more
*Apple* TV Live Streaming Frameworks Test En...
**Job Summary** Work and contribute towards the engineering of Apple 's state-of-the-art products involving video, audio, and graphics in Interactive Media Group (IMG) at Read more
*Apple* Retail - Multiple Positions (US) - A...
Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.