TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Firefox 52.0.2 - Fast, safe Web browser.
Firefox offers a fast, safe Web browsing experience. Browse quickly, securely, and effortlessly. With its industry-leading features, Firefox is the choice of Web development professionals and casual... Read more
Google Chrome 57.0.2987.133 - Modern and...
Google Chrome is a Web browser by Google, created to be a modern platform for Web pages and applications. It utilizes very fast loading of Web pages and has a V8 engine, which is a custom built... Read more
RapidWeaver 7.3.3 - Create template-base...
RapidWeaver is a next-generation Web design application to help you easily create professional-looking Web sites in minutes. No knowledge of complex code is required, RapidWeaver will take care of... Read more
Chromium 57.0.2987.133 - Fast and stable...
Chromium is an open-source browser project that aims to build a safer, faster, and more stable way for all Internet users to experience the web. Version 27.0.2987.133: Note: This update has no Flash... Read more
Lyn 1.8.8 - Lightweight image browser an...
Lyn is a fast, lightweight image browser and viewer designed for photographers, graphic artists, and Web designers. Featuring an extremely versatile and aesthetically pleasing interface, it delivers... Read more
Adobe Animate CC 2017 16.2.0 - Advanced...
Animate CC 2017 is available as part of Adobe Creative Cloud for as little as $19.99/month (or $9.99/month if you're a previous Flash Professional customer). Animate CC 2017 (was Flash CC) lets you... Read more
Tunnelblick 3.7.0 - GUI for OpenVPN.
Tunnelblick is a free, open source graphic user interface for OpenVPN on OS X. It provides easy control of OpenVPN client and/or server connections. It comes as a ready-to-use application with all... Read more
DEVONthink Pro 2.9.11 - Knowledge base,...
DEVONthink Pro is your essential assistant for today's world, where almost everything is digital. From shopping receipts to important research papers, your life often fills your hard drive in the... Read more
DiskCatalogMaker 6.8.1 - Catalog your di...
DiskCatalogMaker is a simple disk management tool which catalogs disks. Simple, light-weight, and fast Finder-like intuitive look and feel Super-fast search algorithm Can compress catalog data for... Read more
OmniGraffle 7.3 - Create diagrams, flow...
OmniGraffle helps you draw beautiful diagrams, family trees, flow charts, org charts, layouts, and (mathematically speaking) any other directed or non-directed graphs. We've had people use Graffle to... Read more

Dynasty Blades new update introduces a n...
Sharpen your weapons -- Dynasty Blades is back with new and improved hack n’ slash stylings. The Romance of the Three Kingdoms-inspired action MMORPG introduces a bunch of fun new features in its latest update. For the uninitiated, Dynasty Blades... | Read more »
Meganoid(2017) (Games)
Meganoid(2017) 1.0 Device: iOS Universal Category: Games Price: $3.99, Version: 1.0 (iTunes) Description: LAUNCH DISCOUNT 20% UNTIL APRIL 2nd! Support, tip and tricks: http://www.orangepixel.net/forum/ Subscribe to our newsletter... | Read more »
Telltale's Guardians of the Galaxy...
Telltale will be releasing their rendition of Guardians of the Galaxy later this month. The first episode, Tangled Up in Blue, features familiar faces including Star-Lord, Groot, Rocket, Gamora, and Drax. If the first episode's title is any... | Read more »
Royal Dungeon (Games)
Royal Dungeon 1.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0 (iTunes) Description: The king and his queen are trapped in their castle which suddenly turned out as a very dangerous place. The goal is to escape... | Read more »
Tom Clancy's ShadowBreak is a real-...
Ubisoft is treating Tom Clancy fans to the series' very first mobile-exclusive game in ShadowBreak, a real-time, multiplayer shooter in which players snipe at enemies in fast-paced tactics-driven combat. [Read more] | Read more »
Power Rangers: Legacy Wars beginner...
Rita Repulsa is back, but this time she's invading your mobile phone in Power Rangers: Legacy Wars. What looks to be a straightforward beat 'em up is actually a tough-as-nails multiplayer strategy game that requires some deft tactical maneuvering.... | Read more »
Hearthstone celebrates the upcoming Jour...
Hearthstone gets a new expansion, Journey to Un'Goro, in a little over a week, and they'll be welcoming the Year of the Mammoth, the next season, at the same time. There's a lot to be excited about, so Blizzard is celebrating in kind. Players will... | Read more »
4 smart and stylish puzzle games like Ty...
TypeShift launched a little over a week ago, offering some puzzling new challenges for word nerds equipped with an iOS device. Created by Zach Gage, the mind behind Spelltower, TypeShift boasts, like its predecessor, a sleak design and some very... | Read more »
The best deals on the App Store this wee...
Deals, deals, deals. We're all about a good bargain here on 148Apps, and luckily this was another fine week in App Store discounts. There's a big board game sale happening right now, and a few fine indies are still discounted through the weekend.... | Read more »
The best new games we played this week
It's been quite the week, but now that all of that business is out of the way, it's time to hunker down with some of the excellent games that were released over the past few days. There's a fair few to help you relax in your down time or if you're... | Read more »

Price Scanner via MacPrices.net

1.4GHz Mac mini on sale for $399, $100 off MS...
B&H Photo has the 1.4GHz Mac mini on sale for $100 off MSRP including free shipping plus NY sales tax only: - 1.4GHz Mac mini: $399 $100 off MSRP Sale ends on March 31st. Read more
13-inch 128GB MacBook Air on sale for $849, s...
B&H Photo has lowered their price on the 13″ 1.6GHz/128GB MacBook Air to $849, or $150 off MSRP. Shipping is free, and B&H charges NY sales tax only: - 13″ 1.6GHz/128GB MacBook Air (MMGF2LL/A... Read more
Is Apple Planning An iPhone Based Modular Doc...
Today’s more powerful and larger-screened smartphones and phablets are becoming the default anchor computing device for more and more users computing devices, but even a five or six inch panel is not... Read more
Razer Launches New Razer Blade Pro World’s Fi...
Razer, the gaming and high performance hardware specialists, have announced the new Razer Blade Pro laptop — the first laptop to be qualified for THX Mobile Certification, an accreditation reserved... Read more
Gro CRM’s Apple Small Business Mac And iOS CR...
Gro Software, developers of the Mac CRM software for small business and enterprise, are included in FinancesOnline 2017 CRM Rising Stars and Great User Experience lists by business software review... Read more
Deal alert! 15-inch and 13-inch MacBook Pros...
B&H Photo has the new 2016 15″ and 13″ Apple MacBook Pros in stock today and on sale for up to $200 off MSRP. Shipping is free, and B&H charges NY sales tax only: - 15″ 2.7GHz Touch Bar... Read more
Save up to $420 on a new MacBook Pro with App...
Apple is offering Certified Refurbished 2016 15″ and 13″ MacBook Pros, including some Touch Bar models, for up to $420 off original MSRP. An Apple one-year warranty is included with each model, and... Read more
12-inch 1.2GHz Retina MacBooks on sale for $1...
B&H has 12″ 1.2GHz Retina MacBooks on sale for up to $200 off MSRP. Shipping is free, and B&H charges NY sales tax only: - 12″ 1.2GHz Space Gray Retina MacBook: $1449 $150 off MSRP - 12″ 1.... Read more
Is A New 10.5-inch iPad Still Coming In April...
There was no sign or mention of a long-rumored and much anticipated 10.5-inch iPad Pro in Apple’s product announcements last week. The exciting iPad news was release of an upgraded iPad Air with a... Read more
T-Mobile’s Premium Device Protection Now Incl...
Good news for T-Mobile customers who love their iPhones and iPads. The “Un-carrier” has become the first national wireless company to give customers AppleCare Services at zero additional cost as part... Read more

Jobs Board

Fulltime aan de slag als shopmanager in een h...
Ben jij helemaal gek van Apple -producten en vind je het helemaal super om fulltime shopmanager te zijn in een jonge en hippe elektronicazaak? Wil jij werken in Read more
Fulltime aan de slag als shopmanager in een h...
Ben jij helemaal gek van Apple -producten en vind je het helemaal super om fulltime shopmanager te zijn in een jonge en hippe elektronicazaak? Wil jij werken in Read more
Desktop Analyst - *Apple* Products - Montef...
…technology to improve patient care. JOB RESPONSIBILITIES: Provide day-to-day support for Apple Hardware and Software in the environment based on the team's support Read more
*Apple* Mobile Master - Best Buy (United Sta...
**493168BR** **Job Title:** Apple Mobile Master **Location Number:** 000827-Denton-Store **Job Description:** **What does a Best Buy Apple Mobile Master do?** At Read more
Fulltime aan de slag als shopmanager in een h...
Ben jij helemaal gek van Apple -producten en vind je het helemaal super om fulltime shopmanager te zijn in een jonge en hippe elektronicazaak? Wil jij werken in Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.