TweetFollow Us on Twitter

Optimizing for PPC
Volume Number:12
Issue Number:5
Column Tag:Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge any more. It took me years of careful trial, error, repeated error, and determined study, to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book which makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry where the first hundred pages are a dictionary explaining all the words you can use in your poems but not really giving you the context or any examples to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache line. If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB < (2^15 - 1).

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system . For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.

My biggest complaint is that I want to see real-world code examples (i.e. more than five instruction sequences) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

A Better Finder Rename 10.00b1 - File, p...
A Better Finder Rename is the most complete renaming solution available on the market today. That's why, since 1996, tens of thousands of hobbyists, professionals and businesses depend on A Better... Read more
CrossOver 14.1.6 - Run Windows apps on y...
CrossOver can get your Windows productivity applications and PC games up and running on your Mac quickly and easily. CrossOver runs the Windows software that you need on Mac at home, in the office,... Read more
Printopia 2.1.14 - Share Mac printers wi...
Run Printopia on your Mac to share its printers to any capable iPhone, iPad or iPod Touch. Printopia will also add virtual printers, allowing you to save print-outs to your Mac and send to apps.... Read more
Google Drive 1.24 - File backup and shar...
Google Drive is a place where you can create, share, collaborate, and keep all of your stuff. Whether you're working with a friend on a joint research project, planning a wedding with your fiancé, or... Read more
Chromium 45.0.2454.85 - Fast and stable...
Chromium is an open-source browser project that aims to build a safer, faster, and more stable way for all Internet users to experience the web. Version 45.0.2454.85: Note: Does not contain the "... Read more
OmniFocus 2.2.5 - GTD task manager with...
OmniFocus helps you manage your tasks the way that you want, freeing you to focus your attention on the things that matter to you most. Capturing tasks and ideas is always a keyboard shortcut away in... Read more
iFFmpeg 5.7.1 - Convert multimedia files...
iFFmpeg is a graphical front-end for FFmpeg, a command-line tool used to convert multimedia files between formats. The command line instructions can be very hard to master/understand, so iFFmpeg does... Read more
VOX 2.6 - Music player that supports man...
VOX is a beautiful music player that supports many filetypes. The beauty is in its simplicity, yet behind the minimal exterior lies a powerful music player with a ton of features and support for all... Read more
Box Sync 4.0.6567 - Online synchronizati...
Box Sync gives you a hard-drive in the Cloud for online storage. Note: You must first sign up to use Box. What if the files you need are on your laptop -- but you're on the road with your iPhone? No... Read more
Carbon Copy Cloner 4.1.4 - Easy-to-use b...
Carbon Copy Cloner backups are better than ordinary backups. Suppose the unthinkable happens while you're under deadline to finish a project: your Mac is unresponsive and all you hear is an ominous,... Read more

You Can Play Madfinger Games' Unkil...
Madfinger Games - probably best known for the Dead Trigger series - has officially launched their newest zombie shooter (that isn't called Dead Trigger), named Unkilled. [Read more] | Read more »
KORG iELECTRIBE for iPhone (Music)
KORG iELECTRIBE for iPhone 1.0.1 Device: iOS iPhone Category: Music Price: $9.99, Version: 1.0.1 (iTunes) Description: ** 50% OFF Special Launch Sale - For a Limited Time **The ELECTRIBE reborn in an even smaller form A full-fledged... | Read more »
I am Bread (Games)
I am Bread 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: ‘I am Bread’ is the latest quirky adventure from the creators of 'Surgeon Simulator', Bossa Studios. This isn't the best thing... | Read more »
Rock(s) Rider - HD Edition (Games)
Rock(s) Rider - HD Edition 1.0.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0.0 (iTunes) Description: *** PLEASE NOTE: Compatible with iPhone 4s, iPad 2, iPad mini, iPod touch (5th generation) or newer *** Do you... | Read more »
Rebuild 3: Gangs of Deadsville (Games)
Rebuild 3: Gangs of Deadsville 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: It's been a few years since the zombpocalypse turned the world's cities into graveyards and sent the few... | Read more »
Power Ping Pong (Games)
Power Ping Pong 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: Do you wield your bat with zen-like focus or do your balls of fury give you a killer spin? Table tennis goes mobile with a... | Read more »
Z.O.N.A Project X (Games)
Z.O.N.A Project X 1.00 Device: iOS Universal Category: Games Price: $1.99, Version: 1.00 (iTunes) Description: Z.O.N.A Project X - shooter in the post-apocalyptic world. | Read more »
Trick Shot (Games)
Trick Shot 1.0.6 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0.6 (iTunes) Description: A game where all you have to do is throw a ball into a box, simple? Trick Shot is a minimalist physics puzzler with 90 levels... | Read more »
VoxelCity (Games)
VoxelCity 1.0.2 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0.2 (iTunes) Description: Looking for a new city builder? Tired of social media anti-games with no strategy? Look no further! NO IAP EVER! VoxelCity is a... | Read more »
Goat Simulator MMO Simulator (Games)
Goat Simulator MMO Simulator 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: ** IMPORTANT - SUPPORTED DEVICESiPhone 4S, iPad 2, iPod Touch 5 or better.** Coffee Stain Studios brings next-gen... | Read more »

Price Scanner via MacPrices.net

Near-Office Input Functionality Virtually Any...
Today Logitech introduced the Logitech K380 Multi-Device Bluetooth Keyboard and the Logitech M535 Bluetooth Mouse, giving users the freedom to work on any device, most anywhere. According to... Read more
College Student Deals: Additional $100 off Ma...
Take an additional $100 off all MacBooks and iMacs at Best Buy Online with their College Students Deals Savings, valid through September 4, 2015. Anyone with a valid .EDU email address can take... Read more
Will You Buy An iPad Pro? – The ‘Book Mystiqu...
It looks like we may not have to wait much longer to see what finally materializes as a new, larger-panel iPad (Pro/Plus?) Usually reliable Apple product prognosticator KGI Securities analyst Ming-... Read more
eFileCabinet Announces SMB Document Managemen...
Electronic document management (EDM) eFileCabinet, Inc., a hosted solutions provider for small to medium businesses, has announced that its SecureDrawer and eFileCabinet Online services will be... Read more
WaterField Designs Unveils American-Made, All...
San Francisco’s WaterField Designs today unveiled their all-leather Cozmo 2.0 — an elegant attach laptop bag with carefully-designed features to suit any business environment. The Cozmo 2.0 is... Read more
Apple’s 2015 Back to School promotion: Free B...
Purchase a new Mac or iPad at The Apple Store for Education and take up to $300 off MSRP. All teachers, students, and staff of any educational institution qualify for the discount. Shipping is free,... Read more
128GB MacBook Airs on sale for $100 off MSRP,...
B&H Photo has 11″ & 13″ MacBook Airs with 128GB SSDs on sale for $100 off MSRP. Shipping is free, and B&H charges NY sales tax only: - 11″ 1.6GHz/128GB MacBook Air: $799.99, $100 off MSRP... Read more
13-inch 2.5GHz MacBook Pro (refurbished) avai...
The Apple Store has Apple Certified Refurbished 13″ 2.5GHz MacBook Pros available for $829, or $270 off the cost of new models. Apple’s one-year warranty is standard, and shipping is free: - 13″ 2.... Read more
27-inch 3.2GHz iMac on sale for $1679, save $...
B&H Photo has the 27″ 3.2GHz iMac on sale for $1679.99 including free shipping plus NY sales tax only. Their price is $120 off MSRP. Read more
Apple and Cisco Partner to Deliver Fast-Lane...
Apple and Cisco have announced a partnership to create a “fast lane” for iOS business users by optimizing Cisco networks for iOS devices and apps. The alliance integrates iPhone with Cisco enterprise... Read more

Jobs Board

*Apple* Desktop Analyst - KDS Staffing (Unit...
…field and consistent professional recruiting achievement. Job Description: Title: Apple Desktop AnalystPosition Type: Full-time PermanentLocation: White Plains, NYHot Read more
Simply Mac *Apple* Specialist- Repair Techn...
Simply Mac is the greatest premier retailer of Apple products expertise in North America. We're looking for dedicated individuals to provide personalized service and Read more
Simply Mac *Apple* Specialist- Service Repa...
Simply Mac is the greatest premier retailer of Apple products expertise in North America. We're looking for dedicated individuals to provide personalized service and Read more
*Apple* Desktop Analyst - KDS Staffing (Unit...
…field and consistent professional recruiting achievement. Job Description: Title: Apple Desktop AnalystPosition Type: Full-time PermanentLocation: White Plains, NYHot Read more
Simply Mac- *Apple* Specialist- Store Manag...
Simply Mac is the largest premier retailer for Apple products and solutions. We're looking for dedicated individuals with a passion to simplify and enhance the Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.