Optimizing for PPC
Volume Number: 12
Issue Number: 5
Column Tag: Book Review

The Need for Speed

Learn the nitty-gritty of PowerPC optimization

By Mike Scanlin

Optimizing PowerPC Code:
Programming the PowerPC Chip in Assembly Language

By Gary Kacmarcik

Addison-Wesley, 1995

ISBN 0-201-40839-2, 694 pages (softback). $39.95.

I’m disappointed. It’s just no challenge anymore. It took me years of careful trial, error, repeated error, and determined study to perfect my 680x0 optimizing skills to the point where I really understood the chip from a software point of view. I was looking forward to the same kind of challenge on the PowerPC (scrounging for obscure magazine articles, surfing the net looking for example code, writing and timing code three different ways, disassembling all the programs with good performance to see how they did it, etc.). But now that I’ve read this book, all the hard theory has been taken care of, and the only thing remaining is to do a few PowerPC assembly language projects and put the theory to the test. Mr. Kacmarcik has cut short my search for knowledge by writing a book that makes plain everything about the PowerPC chip, including the subtle pipeline and cache interactions that a true optimizer wants to know about.

This book is intended for programmers with some high-level experience and at least a little experience with assembly language. It does not explain what hexadecimal means, for example, but it does define concepts like “latency” and “throughput”.

The first nine of the sixteen chapters review in precise detail the entire PowerPC instruction set and architecture. The purpose of these chapters is to broaden the audience for this book. Anyone with PowerPC experience could skim these 170 pages in an hour or so. For the rest, though, it is a reasonable starting point. Unfortunately, there are too few examples for the descriptions of the individual instructions to be meaningful. It’s like someone handing you a book on how to write poetry whose first hundred pages are a dictionary of all the words you can use in your poems, without the context or examples you need to appreciate them. It’s hard to separate the really important stuff (like everyday instructions, registers and concepts) from the stuff that was just put in for the sake of completeness. An uninitiated person who tries to understand it all will probably become overwhelmed. I can accept that these chapters are meant to be an introduction and a bit of a reference (in addition to the complete references in the appendices), but it’s a little too much, too soon, in my opinion.

The next seven chapters, and especially Appendix D, are the reason to buy this book. They contain the info that is hard to find elsewhere. The chapter titles will give you a good idea of what you’ll find:

10. Memory and Caches

11. Pipelining

12. PowerPC 601 Instruction Timing

13. Programming Model [C calling conventions]

14. Introduction to Optimizing

15. Resource Scheduling

16. More Optimization Techniques

Appendix D. Optimization Summary

The cache discussion reviews how set-associative caches work. This is good info that you can apply to designing your own caches in higher-level languages like C. It is interesting to read that cache simulations have shown nearly identical cache hit rates for caches with random line-replacement algorithms and caches with least-recently-used line-replacement algorithms. There are tidbits of useful information sprinkled throughout this chapter, such as the sentence, “According to the PowerPC ISA, the programmer should assume that the processor has a split (instruction/data) cache, and that the processor will not automatically keep the instruction cache consistent with data written via the store instructions (that is, with the data cache).” Writers of self-modifying code, beware.
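The same set-associative structure works for a software cache. The sketch below is a minimal C illustration, not code from the book: the 16-set, 4-way geometry and the names `SoftCache`, `cache_init`, and `cache_access` are hypothetical. A key's low-order part selects a set, the rest is stored as a tag, and on a miss the least recently used way of that set is replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry for a small software cache: 16 sets x 4 ways. */
#define SC_SETS 16u
#define SC_WAYS 4u

typedef struct {
    bool     valid[SC_SETS][SC_WAYS];
    uint32_t tag[SC_SETS][SC_WAYS];
    uint32_t last_used[SC_SETS][SC_WAYS]; /* access timestamp, for LRU */
    uint32_t clock;                       /* bumped on every access */
} SoftCache;

void cache_init(SoftCache *c) {
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Returns true on a hit. On a miss the key is installed in its set,
   using an empty way if one exists, else evicting the least recently used. */
bool cache_access(SoftCache *c, uint32_t key) {
    uint32_t set = key % SC_SETS; /* low-order part picks the set */
    uint32_t tag = key / SC_SETS; /* the rest identifies the entry */
    c->clock++;

    for (uint32_t w = 0; w < SC_WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->last_used[set][w] = c->clock;
            return true;
        }
    }

    uint32_t victim = 0;
    bool found_empty = false;
    for (uint32_t w = 0; w < SC_WAYS && !found_empty; w++) {
        if (!c->valid[set][w]) {
            victim = w;
            found_empty = true;
        }
    }
    if (!found_empty) {
        /* Every way is full: pick the one touched longest ago. */
        for (uint32_t w = 1; w < SC_WAYS; w++) {
            if (c->last_used[set][w] < c->last_used[set][victim]) victim = w;
        }
    }
    c->valid[set][victim] = true;
    c->tag[set][victim] = tag;
    c->last_used[set][victim] = c->clock;
    return false;
}
```

Replacing the LRU scan with something like `victim = (uint32_t)(rand() % SC_WAYS)` (from `<stdlib.h>`) gives random replacement, the policy that, per the simulations mentioned above, achieves nearly identical hit rates; it also removes the need for the `last_used` timestamps.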

Even though the cache discussion is complete, it illustrates a problem that several of the chapters have: it’s missing down-to-earth examples. For instance, it says the 601 has “a unified 32K, eight-way set associative cache”, and explains what that means technically, but it doesn’t go on to tell me how far apart two addresses need to be before they map to the same cache set (and so compete for the same eight lines). If I’m working on an image-filtering application, it is really useful to know what sizes not to use for rowBytes (to avoid thrashing the data cache) if my algorithm visits all the pixels down a vertical column.
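The missing example can be worked out from the numbers given. With 32K split over eight ways, each way covers 32768 / 8 = 4096 bytes, so any two addresses that are a multiple of 4096 bytes apart fall in the same set, whatever the line size. Because 4096 bytes is also the PowerPC page size, the set index should come out the same whether it is computed from a virtual or a physical address. The C sketch below makes this concrete; the 64-byte line size and the helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 32K and eight-way come from the review; the 64-byte line is an assumption. */
#define CACHE_BYTES (32u * 1024u)
#define CACHE_WAYS  8u
#define LINE_BYTES  64u
#define WAY_BYTES   (CACHE_BYTES / CACHE_WAYS) /* bytes covered by one way */
#define NUM_SETS    (WAY_BYTES / LINE_BYTES)   /* number of sets */

static_assert(CACHE_BYTES % (CACHE_WAYS * LINE_BYTES) == 0,
              "cache geometry must divide evenly");

/* Index of the set that holds the line containing addr. */
uint32_t cache_set(uint32_t addr) {
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* How many distinct sets a walk down one pixel column touches,
   starting at base and stepping row_bytes per row. */
uint32_t sets_touched_by_column(uint32_t base, uint32_t row_bytes, uint32_t rows) {
    unsigned char seen[NUM_SETS] = {0};
    uint32_t count = 0;
    for (uint32_t r = 0; r < rows; r++) {
        uint32_t s = cache_set(base + r * row_bytes);
        if (!seen[s]) {
            seen[s] = 1;
            count++;
        }
    }
    return count;
}
```

Under these assumptions a rowBytes of 4096 confines an entire column to a single set, so only eight rows' worth of lines can stay cached at once; padding rowBytes to 4160 (one extra 64-byte line) spreads 64 consecutive rows across all 64 sets.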

The instruction timing chapter was one of my favorites. Here’s an example of the kind of precision you can expect:

The Multiply Low Immediate (mulli) instruction always takes five cycles in IE. The length of time that the other multiply instructions spend in IE is dependent on the data contained in rB. If the upper 16 bits of rB are all sign bits, then the instruction spends five cycles in IE, otherwise it spends nine cycles. This means that the lesser (in magnitude) of the two arguments should be placed in rB because there is a potential savings of four cycles if -2^15 <= rB <= 2^15 - 1.
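The operand-ordering rule in that quote is easy to model. The C sketch below is a hypothetical helper, not from the book; it encodes only the 601 figures quoted above (five cycles in IE when rB fits in a signed 16-bit value, nine otherwise) and swaps the multiplicands so that a value that fits in 16 bits, if there is one, ends up as rB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the upper 16 bits of v are copies of its sign bit,
   i.e. v lies in the signed 16-bit range [-32768, 32767]. */
bool fits_in_int16(int32_t v) {
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Cycles spent in IE by a 601 register multiply, per the quoted figures. */
int mul601_ie_cycles(int32_t rb) {
    return fits_in_int16(rb) ? 5 : 9;
}

/* Swap the multiplicands if that moves a 16-bit-sized value into rB. */
void order_for_mullw(int32_t *ra, int32_t *rb) {
    assert(ra != 0 && rb != 0);
    if (!fits_in_int16(*rb) && fits_in_int16(*ra)) {
        int32_t tmp = *ra;
        *ra = *rb;
        *rb = tmp;
    }
}
```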

All your favorite timing topics are handled here along with micro-examples to illustrate each stage of the pipeline for the entire sequence of instructions. Topics include: branch prediction (taken and not taken), cache hits and misses, pipeline synchronization, pipeline stalls, misaligned data accesses, and more. Here’s another example of the kind of details you’ll find. This is from the discussion of instruction fetching:

This may seem like a strange thing to affect timing, but the address affects where the data will be stored in the cache, and the cache timing is different when the request is from the upper or lower part of a cache line. If your timings always assume that you’ll receive four or eight instructions at a time, you may be surprised when the code is timed on a real system. For a critical loop, it might be worthwhile to place a few nops before the loop so that it fits nicely into a cache line.
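The padding arithmetic is simple because every PowerPC instruction is four bytes long. A minimal C sketch with hypothetical helper names; the line size is left as a parameter because it differs between PowerPC implementations.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_BYTES 4u /* every PowerPC instruction is 4 bytes */

/* Number of nops needed so that code at addr starts on a line boundary.
   line_bytes must be a power of two. */
uint32_t nops_to_align(uint32_t addr, uint32_t line_bytes) {
    assert(line_bytes >= INSN_BYTES && (line_bytes & (line_bytes - 1)) == 0);
    assert(addr % INSN_BYTES == 0);
    uint32_t misalign = addr & (line_bytes - 1);
    return misalign ? (line_bytes - misalign) / INSN_BYTES : 0;
}

/* Returns 1 if a loop of n_insns instructions starting at addr
   lies entirely within one cache line, else 0. */
int loop_fits_in_line(uint32_t addr, uint32_t n_insns, uint32_t line_bytes) {
    if (n_insns == 0) return 0;
    uint32_t first = addr / line_bytes;
    uint32_t last = (addr + n_insns * INSN_BYTES - 1) / line_bytes;
    return first == last;
}
```

For example, with 32-byte lines a loop that would start at address 0x1008 needs six nops to begin at 0x1020, after which an eight-instruction loop occupies exactly one line.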

The programming model chapter was good. I especially liked the explanation of how leaf routines that don’t need more than 220 bytes of stack space don’t need to allocate a stack frame (because, by convention, interrupt routines know not to use the 220 bytes above the current stack pointer - known as the “Red Zone” in Inside Macintosh). This chapter also discusses why you should not use the Load and Store Multiple instructions.
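The 220-byte figure matches exactly the room needed to save every nonvolatile register. The sketch below assumes the usual PowerPC convention that r13 through r31 (19 GPRs, 4 bytes each) and f14 through f31 (18 FPRs, 8 bytes each) are nonvolatile, which gives 76 + 144 = 220 bytes; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define RED_ZONE_BYTES 220 /* figure quoted in the review */

/* Assumed nonvolatile register counts: r13-r31 and f14-f31. */
#define NONVOLATILE_GPRS 19
#define NONVOLATILE_FPRS 18

static_assert(NONVOLATILE_GPRS * 4 + NONVOLATILE_FPRS * 8 == RED_ZONE_BYTES,
              "saving every nonvolatile register fills the red zone exactly");

/* A leaf routine (one that calls nothing) can skip allocating a frame
   if its saved registers plus local storage fit in the red zone. */
bool leaf_can_skip_frame(int saved_gprs, int saved_fprs, int local_bytes) {
    int needed = saved_gprs * 4 + saved_fprs * 8 + local_bytes;
    return needed <= RED_ZONE_BYTES;
}
```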

I must say I was disappointed that the chapter titled “Introduction To Optimizing” was only eight pages long. I was hoping that after plowing through 300 pages of details I would finally get to see 100 lines of before and after PowerPC assembly. But I didn’t. So I kept plowing ahead and on page 317 I found out that, as a rule of thumb, I should always place two independent instructions between two branches that are taken (jumps to subroutines, perhaps). As I got further and further into the book I would find a gem like this every 20 to 50 pages. I couldn’t help but think: “These are the really useful pieces of information; why can’t he just list everything like this and give lots of examples?” Then I found Appendix D.
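A rule of thumb like this one can be checked mechanically. The C sketch below is hypothetical: it represents a code sequence as an array of `OpKind` values and counts pairs of consecutive taken branches with fewer than `min_gap` other instructions between them. It does not check that those instructions are actually independent, which a real tool would also need to do.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_OTHER, OP_TAKEN_BRANCH } OpKind;

/* Count consecutive taken-branch pairs separated by fewer than
   min_gap non-branch instructions. */
size_t count_tight_branch_pairs(const OpKind *ops, size_t n, size_t min_gap) {
    assert(ops != NULL || n == 0);
    size_t violations = 0;
    size_t gap = 0;      /* instructions since the last taken branch */
    int seen_branch = 0; /* has a taken branch appeared yet? */
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_TAKEN_BRANCH) {
            if (seen_branch && gap < min_gap) violations++;
            seen_branch = 1;
            gap = 0;
        } else {
            gap++;
        }
    }
    return violations;
}
```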

Appendix D begins on page 677 and ends on page 678. But those are the two best pages in the whole book. If you want to apply the 90-10 rule to reading this book and you only have time to read two pages, then you had better make it these two - they are the “rules of thumb” to follow when writing PowerPC assembly code. If you do these things right then a large portion of your optimizing job will be done.

This is a great book. I was frustrated that I had to read almost 700 pages before I found the summary of tricks that I was looking for. But there are lots of little bits sprinkled throughout, such as the table on page 347 that shows how to multiply something by 3 through 10 with no more than 3 integer shifts, adds and subtracts. Mechanically, the book is beautiful to read. It is nicely typeset with fonts, font sizes and diagrams well chosen.
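The book's table is not reproduced here, but decompositions of this kind are easy to check. The C sketch below uses shift-and-add sequences chosen for illustration, each within the three-operation limit; in PowerPC assembly each shift would be an `slwi` and each add or subtract an `add` or `subf`. Unsigned arithmetic keeps overflow well defined, so each function matches `x * k` modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by small constants with at most three shifts/adds/subtracts. */
static inline uint32_t mul3(uint32_t x)  { return (x << 1) + x; }        /* 2 ops */
static inline uint32_t mul4(uint32_t x)  { return x << 2; }              /* 1 op  */
static inline uint32_t mul5(uint32_t x)  { return (x << 2) + x; }        /* 2 ops */
static inline uint32_t mul6(uint32_t x)  { return ((x << 1) + x) << 1; } /* 3 ops */
static inline uint32_t mul7(uint32_t x)  { return (x << 3) - x; }        /* 2 ops */
static inline uint32_t mul8(uint32_t x)  { return x << 3; }              /* 1 op  */
static inline uint32_t mul9(uint32_t x)  { return (x << 3) + x; }        /* 2 ops */
static inline uint32_t mul10(uint32_t x) { return ((x << 2) + x) << 1; } /* 3 ops */
```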

My biggest complaint is that I want to see real-world code examples (i.e., sequences of more than five instructions) in action. I’d like the author to provide some high-resolution timer code so that I can time my own code and know if I’ve made a difference (how about a performance workbench to experiment with?). And I’d like to see things like a C program calling some performance bottleneck written in assembly so I could get a bigger picture of how all this code fits together in a real program. Nevertheless, if you have any interest in writing fast PowerPC code, you should buy this book.
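As a starting point for the kind of timer the review asks for, here is a minimal harness for a modern POSIX system, not anything from the book; it uses `clock_gettime` with `CLOCK_MONOTONIC`, and a 1996-era Mac would have needed a different time source. It runs a routine many times per trial and keeps the best trial, which filters out some interruptions. `sample_workload` is only a hypothetical placeholder for the code under test.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run fn iters times per trial, repeat for trials trials, and
   return the best (lowest) average time per call in nanoseconds. */
double best_ns_per_call(void (*fn)(void), unsigned iters, unsigned trials) {
    assert(iters > 0 && trials > 0);
    uint64_t best = UINT64_MAX;
    for (unsigned t = 0; t < trials; t++) {
        uint64_t start = now_ns();
        for (unsigned i = 0; i < iters; i++) fn();
        uint64_t elapsed = now_ns() - start;
        if (elapsed < best) best = elapsed;
    }
    return (double)best / iters;
}

/* Placeholder workload; volatile keeps the compiler from deleting it. */
static volatile uint32_t sink;
void sample_workload(void) { sink = sink + 1u; }
```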

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.