PowerPC Series
Volume Number:10
Issue Number:1
Column Tag:PowerPC Series

PowerPC Code Generation

What’s the difference between PowerPC and 68K machines?

By Peter A. Jacobson, Absoft Corp.

About the author

Peter is a principal of Absoft. Along with his partner Wood Lotz, he has been developing scientific and engineering software since 1979 for a wide variety of micro- and mini-computers.

This article will discuss some of the issues concerning code generation by high-level language compilers for the IBM PowerPC RISC microprocessor. It will compare and contrast typical code generation strategies employed on CISC-based architectures, such as the Motorola M68000 family of microprocessors, with the approach that might be taken with the PowerPC. The topics addressed will include addressing modes, register sets, instruction sets, instruction pipelines, and superscalar considerations. It should be understood that certain features of the PowerPC will be simplified and various aspects of code generation will be trivialized in order to facilitate this discussion.

It is difficult to arrive at a precise definition of what constitutes a RISC microprocessor. It rarely means that an individual machine actually has fewer instructions than its CISC counterpart. The PowerPC has over 230 instructions while an MC68020 has barely 100. The technology progresses so quickly that definitions are amended before they even come into common usage. In addition, features that were once ascribed only to RISC technology have found their way into CISC architectures. However, one of the most significant differences affecting code generation for RISC microprocessors is that instructions are restricted to one machine word in length and there are consequently only a limited number of instruction formats available which can access memory. Typically, load and store are the only memory operations provided and usually with extremely limited effective addressing modes. The PowerPC provides just one fundamental addressing mode: register indirect with index. The index, which may be either an immediate operand or a general purpose register, is added to a general purpose register to form the effective address. The immediate operand is encoded in the instruction and consists of a 16-bit value, sign extended to 32 bits. The register can be suppressed by specifying general purpose register R0 so that an address is formed from just the index. In this way, absolute addresses can be formed from the immediate operand, but they are limited to just the lowest and the highest 32768 bytes of memory.
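The immediate form of this addressing mode can be sketched in a few lines of Python. This is only an illustrative model, not anything taken from a compiler or from the PowerPC documentation: the names `sign_extend16` and `effective_address` and the list `gpr` are invented for the example, and the form in which the index comes from a second register is left out.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Sign-extend a 16-bit instruction field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(gpr, ra, d):
    """Return the 32-bit effective address for an operand written d(rA).

    gpr is a list of 32 register values, ra is a register number, and d is
    the 16-bit immediate as encoded in the instruction. Register number 0
    suppresses the base register, as described in the text.
    """
    base = 0 if ra == 0 else gpr[ra]
    return (base + sign_extend16(d)) & MASK32

gpr = [0] * 32
gpr[3] = 0x00010000
print(hex(effective_address(gpr, 3, 0x0008)))  # base plus a small offset
print(hex(effective_address(gpr, 3, 0xFFFC)))  # base minus 4
print(hex(effective_address(gpr, 0, 0x7FFF)))  # last byte of the lowest 32768
print(hex(effective_address(gpr, 0, 0x8000)))  # first byte of the highest 32768
```

The last two calls show why absolute addressing reaches only the two ends of memory: with the base suppressed, the sign-extended 16-bit value is the whole address.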

Obviously, this constraint seriously affects code generation strategies. With a Motorola MC680x0, an efficient code generator could add 1 to a variable by incrementing a memory location directly with an ADDQ instruction. On the PowerPC, by contrast, it is necessary to first load the variable from memory, perform the addition, and then store the result. It might appear that this model leads to very inefficient code production, but there are many other factors that must be considered in generating code. First, most program variables are usually accessed more than once in a given procedure. Therefore, for either type of microprocessor, it is almost always more efficient to have the variable already available in a register, rather than repeatedly accessing main memory for it. Since RISC microprocessors provide such a limited number of instructions which can access memory, code generators must be capable of performing a very sophisticated analysis of program flow and allocate registers accordingly. RISC microprocessors typically provide a large set of registers that can be used to maintain copies of program variables. The Motorola MC68040, widely used in the current generation of Macintoshes, is limited to 8 data registers, 8 address registers, 8 floating point registers, and a condition code register. The PowerPC provides 32 general purpose registers, 32 floating point registers, a condition register divided into eight 4-bit fields, and six user-level special purpose registers. The general purpose registers can be used for both addresses and data. Just as with the MC68040, some of the registers are reserved for special purposes (such as stack pointer, data space pointer, etc.), but the PowerPC still provides a large number of registers for program use.
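The benefit of keeping a variable in a register can be illustrated with a toy model that simply counts memory accesses. The class `CountingMemory` and both functions are hypothetical names made up for this sketch; they stand in for what a code generator would emit and do not model any timing.

```python
class CountingMemory:
    """A dictionary-backed memory that counts loads and stores."""

    def __init__(self):
        self.cells = {}
        self.loads = 0
        self.stores = 0

    def load(self, addr):
        self.loads += 1
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        self.stores += 1
        self.cells[addr] = value

def increment_through_memory(mem, addr, times):
    """Load, add 1, and store again on every iteration."""
    for _ in range(times):
        value = mem.load(addr)
        mem.store(addr, value + 1)

def increment_in_register(mem, addr, times):
    """Load once, keep the value in a 'register', store once at the end."""
    reg = mem.load(addr)
    for _ in range(times):
        reg += 1
    mem.store(addr, reg)

naive = CountingMemory()
increment_through_memory(naive, 0x1000, 100)
allocated = CountingMemory()
increment_in_register(allocated, 0x1000, 100)
print(naive.loads + naive.stores, allocated.loads + allocated.stores)
```

Both versions leave the same value in memory; the second simply touches memory twice instead of twice per iteration, which is the effect register allocation is after.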

Compilers also create their own variables, many of which can have short, but very active life spans. Such compiler-created variables are used for loop induction, array indexing, maintaining the intermediate results of expression evaluation, and so on. The register set represents the fastest memory available to the microprocessor and efficient register allocation is critical to program performance. Various register allocation schemes are used by code generators to ensure that the most appropriate variables are allocated to registers, either temporarily within a region of code, or permanently, for the length of the procedure. Further, compilers do not necessarily immediately write the result of an assignment statement to memory. This is known as a delayed store and is employed to allow efficient scheduling of the instruction stream (discussed below). Indeed, variables which are local to a procedure may never be written to memory. However, regardless of how efficiently the compiler allocates variables to registers, it must provide a mechanism by which a programmer can indicate that a variable (and its associated memory location) is volatile. Processes in many real-time systems often communicate with each other through memory locations and use memory mapped I/O to control or react to external devices. If, by setting a variable to a specific value, the programmer intends to control a valve or launch a missile, it would be inappropriate (to say the least) for the code generator not to update the associated memory location immediately.

To the programmer accustomed to the Motorola M68000 family of microprocessors and unfamiliar with RISC architectures, the instruction set of the PowerPC may seem initially puzzling. Nevertheless, the PowerPC architecture has much in common with other RISC microprocessors such as the SPARC, MC88110, R4400, and obviously POWER. The first significant difference is that most of the instructions take three operands, two sources and a destination, and several instructions take more. Also, there is no stack pointer, no instructions for calling subroutines, no obvious way to move the contents of a general purpose register to another general purpose register, and many other apparent deficiencies. (However, programmers familiar with older mainframes and mini-computers will find nothing new here.) Consider the following instruction:

 fnmsubs 6,12,13,18

This is the “Floating Point Negative Multiply-Subtract (Single-Precision)” instruction. Since there is no ambiguity in the instruction set, registers are indicated by number only - register numbers cannot be confused with immediate values. This instruction says to multiply the operand in floating point register 12 by the operand in floating point register 13 and then subtract the operand in floating point register 18 from this intermediate value. The result is rounded, then negated, and finally placed in floating point register 6. The latency of this instruction is just 4 clocks - the total time it takes to execute the instruction and for the result to be available in the destination floating point register.
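The operand order is easy to get wrong, so a small model of the arithmetic may help. The sketch below is an approximation under stated assumptions: the function name `fnmsub` and the list `fpr` are invented here, Python floats are double precision rather than single, and exact rational arithmetic is used so that the model rounds only once, after the subtraction, before negating.

```python
from fractions import Fraction

def fnmsub(fpr, frd, fra, frc, frb):
    """Model fpr[frd] = -(fpr[fra] * fpr[frc] - fpr[frb]).

    The product and difference are formed exactly, the result is rounded
    once to a Python float, and the rounded value is then negated.
    """
    exact = Fraction(fpr[fra]) * Fraction(fpr[frc]) - Fraction(fpr[frb])
    fpr[frd] = -float(exact)

fpr = [0.0] * 32
fpr[12], fpr[13], fpr[18] = 2.0, 3.0, 1.0
fnmsub(fpr, 6, 12, 13, 18)   # mirrors: fnmsubs 6,12,13,18
print(fpr[6])                # -(2.0 * 3.0 - 1.0)
```

The three source registers are left unchanged, which is the point taken up in the next paragraph.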

Since every instruction can have a destination operand different from its source(s), compilers are not forced to either copy or reload values (variables or expressions) that will be used multiple times in a block of code. This is important not only in avoiding unnecessary memory accesses, but as will be seen later, provides opportunities for exploiting the instruction pipeline and the superscalar nature of the PowerPC.

The problem of there being no stack pointer in the PowerPC architecture has been addressed by the various standards bodies concerned with the PowerPC. Through the formalization and adoption of ABIs (Application Binary Interfaces), the needs of high-level languages for a uniform stack pointer and stack frame have been met. General purpose register 1 is normally designated as the stack pointer and various locations in the frame have been reserved for housekeeping purposes. A frame is often created by saving the current stack pointer and then subtracting the required frame amount from the stack pointer to create the new frame. In practice it is easier to accomplish this than it appears, since one form of the store instruction will write the effective address of the destination into the register used to calculate the effective address:

 stwu   rS,d(rA)

This is the “Store Word with Update” instruction, which says to store the contents of the source register rS at an effective address equal to the contents of general purpose register rA plus the immediate index value d and then place that effective address in rA. To create a frame, rS and rA would both be 1 and d would be negative. The instruction would cause r1 to be stored at the effective address r1+d and then update r1 to r1+d; because d is negative, the stack pointer moves down by the size of the frame.
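Frame creation with this instruction can be modeled as follows. The sketch assumes a memory represented as a Python dictionary keyed by address and a 64-byte frame chosen arbitrarily; the function name `stwu` simply mirrors the mnemonic, and d is passed as an ordinary signed integer rather than as an encoded 16-bit field.

```python
MASK32 = 0xFFFFFFFF

def stwu(gpr, memory, rs, d, ra):
    """Model 'stwu rS,d(rA)': store rS at rA + d, then set rA to that address."""
    ea = (gpr[ra] + d) & MASK32
    memory[ea] = gpr[rs]   # the old value of rS is stored, even when rS is rA
    gpr[ra] = ea

gpr = [0] * 32
memory = {}
gpr[1] = 0x0000F000           # current stack pointer
stwu(gpr, memory, 1, -64, 1)  # mirrors: stwu r1,-64(r1)
print(hex(gpr[1]), hex(memory[gpr[1]]))
```

After the call, r1 points at the new frame and the word at that address holds the previous stack pointer, so the caller's frame can be found again when the procedure returns.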

One of the most important locations in the frame is naturally where the return address for a subroutine call is stored. As stated earlier, the PowerPC does not have a subroutine call instruction - instead the branch instruction is used. A form of this instruction places the address of the instruction that follows the branch into a special purpose register called the link register. Any procedure which is not a leaf (i.e. a procedure which calls other procedures) must save the link register before calling another procedure. A subroutine return is accomplished by simply branching to the contents of the link register.
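The reason a non-leaf procedure must save the link register can be seen in a toy model of the branch unit. Everything here is hypothetical: the class `BranchUnit`, the addresses, and the use of a Python variable in place of the frame slot where the link register would really be saved.

```python
class BranchUnit:
    """Just enough state to model branch-and-link and return."""

    def __init__(self, pc=0):
        self.pc = pc
        self.lr = 0

    def branch_and_link(self, target):
        self.lr = self.pc + 4   # address of the instruction after the branch
        self.pc = target

    def branch_to_link(self):
        self.pc = self.lr

def call_chain(save_link):
    """A caller at 0x100 calls a non-leaf at 0x200, which calls a leaf at 0x300."""
    cpu = BranchUnit(pc=0x100)
    cpu.branch_and_link(0x200)               # link register is now 0x104
    saved = cpu.lr if save_link else None    # non-leaf prologue
    cpu.branch_and_link(0x300)               # link register is overwritten
    cpu.branch_to_link()                     # leaf returns into the non-leaf
    if save_link:
        cpu.lr = saved                       # non-leaf epilogue
    cpu.branch_to_link()                     # non-leaf returns
    return cpu.pc

print(hex(call_chain(save_link=True)))
print(hex(call_chain(save_link=False)))
```

With the save, control comes back to the instruction after the original call. Without it, the second return lands inside the non-leaf procedure again, because the nested call replaced the only copy of the return address.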

The so-called fused multiply-add instructions are another feature of the PowerPC instruction set that is important enough to be mentioned here. These instructions can perform a multiplication and an addition in the same amount of time as just a single multiplication or a single addition alone; in other words, twice as fast as the two operations performed separately. Fortunately, this type of operation occurs often enough in mathematical software that the alert code generator will find ample opportunities to exploit them. For example, expressions of the form:

 a1 = a0 + b × c

appear in matrix operations and in polynomial expansions.
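Polynomial evaluation by Horner's rule is a convenient illustration, since every step has exactly this shape. The function below is a sketch with an invented name, `horner`; it only counts the steps that a code generator could map onto one multiply-add each. Python itself rounds after the multiplication and again after the addition, so the fused behavior is not reproduced, only the structure.

```python
def horner(coefficients, x):
    """Evaluate a polynomial given highest-order coefficient first.

    Returns (value, steps). Each step computes acc = c + acc * x, the
    a1 = a0 + b * c form that one multiply-add instruction can handle.
    """
    acc = coefficients[0]
    steps = 0
    for c in coefficients[1:]:
        acc = c + acc * x
        steps += 1
    return acc, steps

value, steps = horner([2.0, 3.0, 4.0], 5.0)   # 2x^2 + 3x + 4 at x = 5
print(value, steps)
```

A polynomial of degree n needs n such steps, where separate instructions would need n multiplications and n additions.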

The PowerPC implements a true superscalar architecture. A superscalar machine is one which can issue multiple instructions to different execution units during each clock cycle. The PowerPC incorporates three different execution units that can operate independently and in parallel. They are the integer unit which affects the general purpose registers, the floating point unit which affects the floating point registers, and the branch unit which affects certain of the special purpose registers. Therefore, an integer shift, a floating point addition, and a branch instruction could all be issued during the same clock cycle. It is important to understand that not all of the PowerPC instructions can execute in a single clock cycle and it would be extremely difficult to schedule all three execution units for simultaneous execution on every cycle, but with careful code generation and attention paid to data dependencies, an exceptionally efficient throughput can be achieved.
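A deliberately crude model shows how the order of instructions affects the number that can be issued per cycle. It assumes in-order issue, at most one instruction per execution unit per cycle, and no data dependencies or multi-cycle instructions; the function `issue_cycles` and the unit names are made up for the sketch, and the issue rules of a real PowerPC implementation are more involved.

```python
def issue_cycles(units):
    """Count the cycles needed to issue instructions in program order.

    units is a sequence of unit names ('int', 'fp', 'branch'), one per
    instruction. In each cycle, consecutive instructions are issued until
    one needs an execution unit already claimed in that cycle.
    """
    cycles = 0
    i = 0
    while i < len(units):
        claimed = set()
        while i < len(units) and units[i] not in claimed:
            claimed.add(units[i])
            i += 1
        cycles += 1
    return cycles

print(issue_cycles(["int", "fp", "branch"]))      # one of each kind
print(issue_cycles(["int", "int", "fp", "fp"]))   # grouped by kind
print(issue_cycles(["int", "fp", "int", "fp"]))   # interleaved
```

The same four instructions take fewer cycles to issue when integer and floating point work is interleaved than when each kind is grouped together.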

It is not necessary for an instruction to completely finish in an individual execution unit before another instruction can be issued. The execution of an instruction consists of multiple stages that can be viewed (very roughly, for the PowerPC is far more complicated) as fetch, decode, execute, and writeback. Each instruction is fetched from an instruction queue, decoded, executed, and the result is then written to the appropriate register file. These stages are called the pipeline and it is possible and certainly desirable for multiple instructions to be in the pipeline at once - each at a different stage. The basic limitation which would cause an instruction to stall is data dependency, which means that the execution of the instruction is dependent on the result of the preceding instruction. An instruction can also be stalled if it is waiting for an instruction with a latency greater than one clock to finish executing. That is, an instruction takes more cycles than there are stages in the pipeline for that execution unit. Instruction latency is determined by how complicated an instruction is (division takes longer than addition) and by memory access considerations. An instruction may stall while waiting for an operand to be delivered from memory. The issues of cache arbitration, both for instructions and data, are beyond the scope of this article.

A code generator which is aware of these two features, multiple execution units and their pipelines, attempts to schedule the instruction stream to make the most efficient use of the resources. Scheduling consists largely of the code generator rearranging or moving instructions to eliminate data dependencies and to keep the individual pipelines busy. This can cause expressions to be executed out of order, array element address calculations to take place far from the memory references, and any number of other reorderings of the instruction stream to eliminate data dependencies. Obviously, register allocation seriously affects this scheduling process and is usually put off as long as possible to prevent any artificial or code-generator created dependencies.
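The effect of scheduling on stalls can be sketched with a small cycle counter. The model assumes a single in-order pipeline that issues at most one instruction per cycle and stalls an instruction until its source registers are ready; the function `total_cycles`, the register names, and the latencies (2 cycles for a load, 1 for an add) are all hypothetical values chosen for the illustration.

```python
def total_cycles(program):
    """Return the cycle at which the last result becomes available.

    program is a list of (dest, sources, latency) tuples in issue order.
    """
    ready = {}        # register name -> cycle when its value is available
    next_issue = 0    # earliest cycle at which the next instruction may issue
    finish = 0
    for dest, sources, latency in program:
        issue = max([next_issue] + [ready.get(s, 0) for s in sources])
        ready[dest] = issue + latency
        finish = max(finish, ready[dest])
        next_issue = issue + 1
    return finish

load_a = ("r3", ["r10"], 2)
add_a = ("r4", ["r3"], 1)    # depends on load_a
load_b = ("r5", ["r11"], 2)
add_b = ("r6", ["r5"], 1)    # depends on load_b

source_order = [load_a, add_a, load_b, add_b]
scheduled = [load_a, load_b, add_a, add_b]
print(total_cycles(source_order), total_cycles(scheduled))
```

Moving the second load ahead of the first add fills the cycle that would otherwise be spent waiting on the first load. The rearrangement is only possible because the two loads target different registers, which is one reason register allocation is delayed until after scheduling.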

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.