TweetFollow Us on Twitter

PowerPC Assembly
Volume Number:10
Issue Number:9
Column Tag:PowerPC Essentials

Understanding PowerPC Assembly

Speak like a native in only two easy lessons! Welcome to lesson two.

By Bill Karsh, BillKarsh@aol.com

This article is part two of a two-part article on understanding the PowerPC architecture and assembly language. Last month we took a brief look at the hardware architecture of the MPC601 processor, and discussed the user programming model. This month, we’ll summarize its assembly language syntax in a condensed and easily digestible form for quick reference. This article is a compressed and intelligently-filtered user manual.

If you haven’t already read last month’s article, you might want to go back and learn about the environment and data types of the 601. If you have, let’s jump in and learn its lingo.

601-Speak - Terms, Notation and Generalities

Let’s introduce an example statement to look at, and enumerate as much as possible that is common to most instructions. Don’t worry. Much of this will be revisited, and the inevitable exceptions will be pointed out, as we go.

addrD,rA,rB

This familiar operation simply adds the contents of GPRs rA and rB together, and writes the result to rD. Immediately, there is a wealth of new stuff to talk about.

Register names - GPRs appear in assembly listings as r0...r31. FPRs are written fp0...fp31, or fr0...fr31 depending on the assembler. The CR fields are cr0...cr7. The remaining special-purpose registers are most often accessed through special-purpose instructions, such as mtspr 1,rA (move contents of rA to special-purpose register 1, the XER). There are usually simplified mnemonics for these - in this example, mtxer rA.

Destination and source order - Unlike 68K assembly, the destination register (purposely called rD) is listed first, and the source(s) second. This is true of all instructions except stores (from register to memory), where the memory destination is now on the right and source register on the left.

Destination flexibility - Unlike the 68K, where an add would be written ADD D0,D1 with D1 being both a source and the destination, the destination on the 601 can be separately specified. However, one can write add rA,rA,rB if desired.

(rA|0) - In many cases (you will know when it makes sense from the context) a zero can be substituted for rA. Here, add rD,0,rB simply copies rB into rD.

No size extensions (.L, .W, .B) - Operations are normally performed on all 32 bits of a register unless otherwise noted. The principal exceptions are bit-field operations, in which one specifies bit ranges, and load/stores, where the operand size is part of the mnemonic, such as lbz (load byte into register and zero bits 0-23).

Condition Register updating is optional - On the 68K, the majority of arithmetic and move instructions implicitly update the 68K CCR (condition code register). Not so on the 601. To save work, the 601’s CR and XER bits are not updated unless specifically requested. This is done through a rather large set of mnemonics, making the instruction set look more formidable than it is. For example, the following all perform the same add operation, but with various status updates:

 
 addc rD,rA,rB   ;update XER(CA)
 addco  rD,rA,rB ;update XER(CA,OV,SO)
 add. rD,rA,rB   ;update cr0

These mnemonic variations set additional bits in the encoding of the basic instruction. Note that load/stores do not offer CR update versions (a final “dot”). That requires a separate compare - remember, the reduced (and specialized) instruction set theme.

The “dot” option is very common and always means the same thing: update cr0. We will not repeat this every time a new instruction group is introduced, but maybe once more to be kind.

Local Addressing modes - Load/Stores provide the only interaction with memory. We will discuss memory addressing modes later. Otherwise, there are only two modes available for register-based operations. Our add example demonstrated register direct mode. There is also immediate half-word mode, which looks like this:

 addi rD,rA,SIMM   or   addis rD,rA,SIMM

The (i) or (is) suffix denotes immediate or immediate shifted mode, respectively. In the manual, SIMM and UIMM denote signed and unsigned immediate 16-bit values, respectively. They are encoded directly into the instructions, which are 32 bits in length. Note that instructions are complete entities, having no extension words. There isn’t room for anything larger than 16-bit immediates to be encoded. That’s the main use of immediate shifted mode. For example, the 32-bit immediate 0xABCDEF23 might be built in r3 by the sequence:

 addis  r3,0,0xABCD  ;load upper half-word, and zero lower half-word
 addi   r3,0,0xEF23  ;load lower half-word

Notation

We’ve already met with several variations of the add instruction. It’s time to introduce some lossless compression, that is, simplifying notation. This will require two types of custom brackets for listing options. Some readers will love this scheme right away. Others will find it to be like ordering a combination plate from a Chinese menu. As familiarity grows, it makes for a much quicker-to-use reference for everybody. Pages can be reduced to lines. Let’s try it.

< > denotes none, or any one from the list.

[ ] denotes none, or any number of options, in the order listed.

Using brackets, all 24 add variations are correctly summarized as follows:

 addi < s, c, c. >
 add  < c, e, me, ze >  [ o, . ]

To practice interpreting this, addi can generate four instructions: addi, addis, addic or addic. - the last ends in “c-dot.” The second line first generates the five possibilities: add, addc, adde, addme or addze. Further, each one of the five has four versions. For example: addze, addzeo, addze., addzeo. - again, the last ends in “o-dot.” You only need to understand “c” once, not six times! We’ll cover what these instructions do when we get to the arithmetic section.

Latency

The issue of timing and scheduling is complicated, but we’ll include the latency with the instruction descriptions (for your edification) in spite of the fact that it tells only part of the timing story. Latency sometimes refers to the total processing time for an instruction. Since most stages (except execution) take one cycle, we’ll take latency to mean execution latency. The vast majority of instructions execute in one cycle, although a handful do differ.

Memory Addressing Modes

To do anything useful, we have to get data into and out of memory. Memory accesses are performed with the load and store instructions, which offer two addressing formats. Again using the lbz instruction as our example, we have:

 lbz    rD,d(rA|0) ;register indirect with offset
 lbzx   rD,(rA|0),rB ;register indirect with index

The (optional) d signifies a 16-bit offset in bytes, encoded in the instruction (sign extended to 32 bits before use). Quite simply, the effective address of the source is the sum rA + d, where rA can be substituted with zero. The destination register for the load is rD, of course. If you thrill to semantics, when d = 0, this is really register indirect mode. With rA zero, addressing like d(0) can be called absolute mode. Register indirect with index is also simple to understand - effective address = rA + rB. Finally, there is the update option, which can be used with either mode, and works as follows. If rA is neither zero, nor the same register as rD, then after the load or store, the effective address is written to rA. That’s all there is to it.

By the way, I loathe semantics and all unnecessary names for things. I still remember being confounded by a DEC VAX manual that had this description of their multiple-precision subtraction library routine, called something like SubX(A,B,C). “Difference C is derived from subtrahend A and minuend B.” An extensive poll proved that neither I nor nearly 100 other scientists at the lab where I was working had any clue which of A and B was which - not one of us! Without these precise and expressive terms, DEC would have been unduly forced to write C = B - A. How crass indeed!

Armed with this general knowledge, we can begin looking at instruction groupings.

Load (Memory to Destination Register rD)

Latency 2

Operand Size Basic Options Operands

unsigned byte lbz [ u, x ] rD,d(rA)

unsigned half-word lhz [ u, x ] rD,d(rA)

signed half-word lha [ u, x ] rD,d(rA)

unsigned word lwz [ u, x ] rD,d(rA)

Options:

z “Load and zero,” right justify operand at low end of register, zero all higher bits, i.e., treat as unsigned. z is mandatory for byte and word loads.

a “Algebraic,” load with sign extension to 32 bits.

u “Update,” if (rA != 0 && rA != rD) rA = effective address (after load).

x “Index,” use register indirect w/index addressing, i.e., lbzx rD,rA,rB.

General Notes:

- The default addressing mode is register indirect w/offset, unless x option is specified, i.e., lbz rD,d(rA).

- The a and u forms may have greater latency on the 604.

- li, lis (load immediate) (see notes for ‘Addition’ section)

- la (load address) (see notes for ‘Addition’ section)

Store (Source Register rS to Memory)

Latency 1

Operand Size Basic Options Operands

byte stb [ u, x ] rS,d(rA)

half-word sth [ u, x ] rS,d(rA)

word stw [ u, x ] rS,d(rA)

Options:

u “Update,” if (rA != 0) rA = effective address (after store). Unlike loads, it is permissible to set rA = rS.

x “Index,” use register indirect w/index addressing, i.e., stbx rS,rA,rB.

General Notes:

- The default addressing mode is register indirect w/offset, unless x option is specified, i.e., stb rS,d(rA).

- Operand ordering is different from normal. The Source is given first.

Addition (negate, +, -)

Latency 1

Result Basic Options Operands

rD = -rA neg [ o, . ] rD,rA

rD = rA + SIMM addi < s, c, c. > none rD,rA,SIMM

rD = rA + rB add < c, e > [ o, . ] rD,rA,rB

rD = rA - 1 addme [ o, . ] rD,rA

rD = rA + 0 addze [ o, . ] rD,rA

rD = SIMM - rA subfic none rD,rA,SIMM

rD = rB - rA subf < c, e > [ o, . ] rD,rA,rB

rD = -1 - rA subfme [ o, . ] rD,rA

rD = 0 - rA subfze [ o, . ] rD,rA

Options:

i “Immediate operand,” specified as the 16-bit value SIMM.

s “Shifted,” the given 16-bit immediate is left-shifted 16-bits to become the high half of a word. The low half is zero.

c “Carry,” update XER(CA) bit.

e “Extended,” the XER(CA) bit is added to the result for multi-word arithmetic. XER(CA) is also updated according to the final result.

m “Minus 1,” simplifies specifying second operand as 32-bit immediate -1 (0xFFFFFFFF).

z “Zero,” simplifies specifying second operand as zero.

o “Overflow,” update XER(OV,SO) bits.

. “Record result,” update cr0.

General Notes:

- Read subf as “subtract from.”

- The ze variants of add and subtract are most useful for multi-word arithmetic. However, they can also be used as register-to-register move or negate and move mnemonics.

- The c, e and o options may introduce additional latency.

- li and lis (load immediate value) are simplified mnemonics for moving an immediate value into a register. They expand (encode) as follows:

 lirD,SIMM;addi  rD,0,SIMM
 lis  rD,SIMM  ;addisrD,0,SIMM

- la (load address) is a simplified mnemonic for loading the effective address of a variable into a register. There are two versions of operand syntax. The first is useful for retrieving the address of a field in a structure, where the structure’s base address is stored in register rA:

 larD,SIMM(rA) ;addi rD,rA,SIMM

The second syntax is sometimes seen in forming the address of local or global variables. Here the base address, rA, is implied - a standard register reserved for just such a purpose by the compiler.

 larD,SIMM;addi  rD,rA,SIMM

Multiplication (*, /)

Result Basic Options Operands Latency

rD = rA * SIMM

(low 32 bits) mulli none rD,rA,SIMM 5

rD = rA * rB

(low 32 bits) mullw [ o, . ] rD,rA,rB 5/9/10Ý

rD = rA * rB

(high 32 bits) mulhw [ u, . ] rD,rA,rB 5/9/10Ý

rD = rA / rB divw [ u, o, . ] rD,rA,rB 36

Options:

u “Unsigned,” treat sources and result as unsigned.

General Notes:

ÝLatency depends upon magnitude and “sign” of rB. In particular, if rB is <= 16 bits in width, latency = 5, else if bit 0 is zero, latency = 9, else, latency = 10. In addition, magnitude |rB| should be <= |rA| to achieve lowest possible latency.

Divide operations treat rA as a 64-bit dividend (numerator). The 32-bit quotient is returned to rD, but the remainder is discarded. To recover the remainder, one can perform the following sequence:

 divw   rD,rA,rB ;rD = trunc(rA/rB)
 mullw  rD,rD,rB ;rD = rA - remainder
 subf   rD,rD,rA ;rD = remainder

Bit-Wise Logical (&, |, ^, ~, sign extension)

Latency 1

Result Basic Options Operands

rD = rA & UIMM andi. none rD,rA,UIMM

andis. none rD,rA,UIMM

rD = rA & rB and [ c, . ] rD,rA,rB

rD = rA | UIMM ori < s > none rD,rA,UIMM

rD = rA | rB or [ c, . ] rD,rA,rB

rD = rA ^ UIMM xori < s > none rD,rA,UIMM

rD = rA ^ rB xor [ c, . ] rD,rA,rB

rD = ~(rA & rB) nand [ . ] rD,rA,rB

rD = ~(rA | rB) nor [ . ] rD,rA,rB

rD = ~(rA ^ rB) eqv [ . ] rD,rA,rB

sign extend low byte extsb [ . ] rD,rA

sign extend low half extsh [ . ] rD,rA

rD = count of leading cntlzw [ . ] rD,rA

zeros in word rA

Options:

s “Shifted,” the given 16-bit immediate is left-shifted 16-bits to become the high half of a word. The low half is zero.

c “Complement,” rB is first one’s complemented (~rB). For example, the instruction andc rD,rA,rB results in rD = rA & ~rB. The contents of rB are unchanged by the operation.

General Notes:

no-op instructions (instructions that effectively do nothing) can be encoded in many ways. You will often encounter interesting ways when looking at disassembly output from various compilers. Motorola provides the official simplified mnemonic “no-op” which expands as follows:

 no-op  ;ori0,0,0

This is the preferred form. no-ops serve to reserve space in code, later to be filled in with addresses by the linker. On some machines, no-ops are used to synchronize the machine. That is unnecessary on the 601, where instructions for that purpose are provided.

mr (move register) is a simplified mnemonic for copying the contents of one register to another. The preferred expansion is:

 mrrD,rB;or rD,0,rB

not (bit-wise not, or complement) is a simplified mnemonic for the C language’s (~) operator, with preferred expansion:

 
 not  rD,rB ;nor rD,0,rB

Shift (<<, >>)

Latency 1

Result Basic Options Operands

rD = rA << rB[26-31] slw [ . ] rD,rA,rB

logical shift left

rD = rA >> rB[26-31] srw [ . ] rD,rA,rB

logical shift right

rD = rA >> SHÝ srawi [ . ] rD,rA,SHÝ

algebraic shift right immediate

rD = rA >> rB[26-31] sraw [ . ] rD,rA,rB

algebraic shift right

General Notes:

- Only bits 26 through 31 of rB are used to form the shift count. If the value formed from these six bits (0-63 possible) is greater than 31 (i.e. bit 26 = 1), the value is taken to be exactly 32. This strategy of checking bit 26 for a very large count allows the quicker operation of zeroing rD, or setting rD to 31 copies of bit 0 for logical or arithmetic shifts, respectively.

- Algebraic right shifts update XER(CA) if rA is negative, and any 1-bits were shifted out of bit position 31.

Ý SH represents a 5 bit wide immediate value encoded in the instruction. Values larger than 31 are nonsensical, and should be flagged as errors by a compiler.

Note that immediate forms for the logical shifts seem to be absent. These functions are actually handled by the ‘Rotate and Mask’ instructions discussed next.

Rotate and Mask

This instruction group is unusual compared to what is offered on the 68K. Following the excellent example of the Motorola manual, we will first introduce the effects possible with these instructions, then build the mnemonics table, and finally explain how they work in more detail.

Extract Bit-field - A selected bit-string (n bits) starting at bit b can be extracted from rA, and right or left justified into rD. All other bits of rD are zeroed.

Insert Bit-field - A selected right or left justified bit-string (n bits) can be extracted from rA and inserted into rD. The inserted bits replace the corresponding bits of rD. Other bits of rD are unchanged.

Rotate - Rotate word left or right n bits.

Clear - Clear the leftmost or rightmost n bits to zero.

Clear Left and Shift Left - Clear the leftmost b bits of rA, then shift rA left n bits. This sounds strange, but is useful for scaling an (unsigned) array index. The clearing on the left allows controlling the maximum size offset obtained. This can help keep indices within bounds, but you’d still be in trouble accessing “elements of surprise.”

All of the above are performed with just a handful of combination rotate-and-mask instructions. Two more operands than normal are required to specify the 32-bit mask. These are MB (begin) and ME (end), both bit position numbers in the range (0-31). The mask is generated as follows, thinking of this as a two step process. First, clear all bits of the mask to zero. Then, starting with MB and marching forward, set bits to one until you get to ME - even if you have to wrap around from bit 31 back through bit 0.

In the simple case MB <= ME, bits MB through ME inclusive are set to one, all others are zero.

If MB > ME, the [MB-31] and [0-ME] ranges are ones, while bits [(ME+1)-(MB-1)] are zero, because of the wrap-around. The mask is always generated this way, but used differently by the different instructions in the group, as we will explain.

Latency 1

Result Basic Options Operands

Rotate rA left by SHÝ then AND with mask

rlwinm [ . ] rD,rA,SHÝ,MB,ME

Rotate rA left by rB* then AND with mask

rlwnm [ . ] rD,rA,rB*,MB,ME

Rotate rA left by SHÝ then mask insert

rlwimi [ . ] rD,rA,SHÝ,MB,ME

General Notes:

Ý SH represents a 5 bit wide immediate value encoded in the instruction.

* Only bits rB[27-31] (5 bits) are used to form the rotate count.

- Values larger than 31 for SH, MB and ME are nonsensical, and should be flagged as errors by a compiler.

These instructions certainly demand some discussion. Note that there are no rotate right instructions. Rotate right by n bits can be achieved by a left-rotate through 32-n bits.

In performing the AND with mask operations, the contents of rA are first rotated left by the number of bits specified by SH or rB. This result is then ANDed with the mask specified by MB and ME, and stored in rD. For example, to simply rotate rA left or right by n bits, one writes, respectively:

rlwnm   rD,rA,n,0,31 ;rotlw  rD,rA,n (simplified)
rlwnm   rD,rA,32-n,0,31

To extract an n-bit field starting at b in rA, and left justify into 
rD, one writes:

rlwinm  rD,rA,b,0,n-1;extlwi  rD,rA,n,b (simplified)

To shift right by an immediate count, n:

rlwinm  rD,rA,32-n,n,31   ;swri  rD,rA,n (simplified).

To clear the right n bits of rA:

rlwinm  rD,rA,0,0,31-n    ;clrrwi  rD,rA,n (simplified).

The mask insert instruction works as follows. rA is rotated left by the number of bits specified by SH or rB, as before. Where there are zeros in the mask, the bits of rD are preserved. Where there are ones in the mask, the corresponding bits of rD are replaced with the corresponding bits of rA. It’s endless fun to work out various possibilities for this operation. Once more, I cheat you, demonstrating only the one example for which there is already a simplifying mnemonic. This extracts an n-bit field, left justified in rA, and inserts it into rD starting at bit b:

rlwimi  rD,rA,32-b,b,b+n-1;inslwi  rD,rA,n,b.

I admit it. I find myself spending inordinate amounts of time thinking about and fixing problems of the “±1” variety. That is, keeping straight the difference between array index number and element count, for example, or that the number of things between a and b inclusive is (b - a + 1). I can’t even figure out my own age easily - do I count the year I was born or not? Perhaps I’m deficient in this way, but if I can help one other dummy like me, I’m happy. Let me offer the three rules I find most useful to remember. Again, a right shift by (number of bits) n is just a left shift by 32-n - convince yourself with examples, and then accept it as law. Second, try working in terms of numbers of bits. When you have that straight, convert a bit count to an index by subtracting 1. Third, use the (b - a + 1) rule to count things correctly. I know, I know, this is Arithmetic 1 for preschoolers, but honest to God, I burn all my time on just this sort of stuff!

If my kind of problem is a recognizable type of brain disease, don’t tell me about it. I like to think that my zealous scientific skepticism started so early that I rejected arithmetic as implausible and groundless, so never succumbed to it.

Let’s walk through the thinking behind the last example in painful detail. First, the bits we want are left justified in rA, so the leftmost bit of the string is at position 0. In the end, the string has to start at bit position b of rD. To get the shift amount correct, I use a test example. If we shift right by 1, the new left would be at 1, so, we clearly want to shift right by b. Now, without thinking, rule one tells me that SH must be 32-b. Next we need a mask that has ones only in the n-bit string that starts at position b. We immediately see that MB is just b. Finally, we want the number of bits from b to ME, inclusive, to be n, so, n = ME-b+1. Or, ME = b+n-1.

Compare

Latency 1

Result Basic Operands

compare immediate (algebraic) cmpi crD,L,rA,SIMM

compare (algebraic) cmp crD,L,rA,rB

compare immediate (logical) cmpli crD,L,rA,UIMM

compare (logical) cmpl crD,L,rA,rB

Options:

l “Logical,” the operands to be compared, rA and rB or UIMM, are treated as unsigned numbers.

L “Long,” the operands to compare are 64-bit if L = 1, 32-bit if L = 0. The 601 does not implement L = 1, since its GPRs are 32 bits.

General Notes:

- For all comparisons, the result of comparing rA to rB (or immediate operand) is placed into one of the eight CR fields cr0...cr7. Remember that this means “rA with respect to rB.” For those who are familiar with 68K conventions, this is works like CMP.L rB,rA

- If crD is omitted from the operand list, cr0 is assumed by the compiler.

- Various dis/assemblers may display the crD field as “crD,” “crfD,” or simply as a number from 0 through 7.

- Since the L flag is always zero on the 601, there is a “compare word” simplified mnemonic for each instruction that relieves you of setting L = 0. For example, cmpw has the expansion:

 cmpw crD,rA,rB  ;cmprD,0,rA,rB

The four simplified versions are: cmpwi, cmpw, cmplwi, cmplw.

Operations on CR

Result Basic Operands Latency

logically combine crand crbD,crbA,crbB 1

two CR bits - cror

bitD = bitA op bitB crxor

crnand

crnor

creqv

crandc

crorc

move field mcrf crD,crA 2

Options:

c “Complement,” bit crbB is complemented before the crandc or crorc operation.

General Notes:

- The crXXX operations can be used to combine separate comparisons. Each has similar syntax as shown in the table. crbD, crbA and crbB are the bit numbers (0-31) of the CR register on which to operate.

- The “move CR field” operation, mcrf, operates on whole 4-bit fields cr0...cr7 of the CR. The crA field is copied to crD.

Branching

Branching has its own addressing modes. There are essentially two of these, absolute mode, and what you knew on the 68K as “PC relative mode,” even though there is no accessible program-counter register on the 601. The term PC is used only as a familiar convenience. The two modes need virtually no explanation as to what they mean. What should be mentioned, though, are the following items.

First, addresses are always signed numbers. That means that absolute addresses or relative displacements to be added to the PC are first sign extended.

Next, instruction words must always be four-byte aligned. Therefore, the low two bits of branch addresses are always ignored (taken to be zero).

In addition, the magnitude of the absolute address’s range or of the displacement depends upon where that number is stored. For example, if the address is to be fetched from the link (LR) or count (CTR) register, it can be a full 32 bits. If, on the other hand, the address is encoded into the branch instruction itself, the available space is more limited. In particular (accounting for implicit zeroing of the two low bits), conditional branches can store numbers effectively 16 bits in width. This gives a range of [-32768, +32767] bytes. Unconditional branches have a simpler encoding, making room for a larger displacement field 26 bits in width. This gives the range [-33554432, +33554431] bytes. (I am going by the formats for branch encodings given in chapter 10 of the manual, not by the addressing diagrams in chapter 3 which are in error. The chapter 3 diagrams show the first instruction code field as 7 bits in width, It is in fact 6 bits wide for all instructions).

Next on the list of items to mention is that virtually any branch instruction can optionally store a return address (the address following the branch instruction) into the link register. This option is specified by appending a trailing (l - lower case L) to the instruction mnemonic. In a similar way, absolute mode is usually specified in the mnemonic by appending (a).

Also, if the target address is in the LR or CTR, then (lr) or (ctr) is made part of the mnemonic itself, and no immediate value operand is given for the address. The contents of the LR or CTR are always interpreted as an absolute address. The other possibility is an address specified by an immediate value as an operand to the instruction. Whether the value is taken as an absolute address or as a relative displacement depends upon whether the (a) option has been specified.

Finally, all conditional branches are encoded with a static predictor bit (the “y” bit in the manual). Static prediction means that the likelihood of needing to take a branch or not is determined at compile time. The 601 may be the last of its breed. Subsequent implementations will probably use dynamic (run-time determined) prediction. Anyway, the y bit works like this. Suppose y = 0. If the sign of the address (absolute mode) or the sign of the displacement (relative mode) is negative, take the branch, otherwise don’t. This is the default behavior, which essentially says, “this looks like a loop, so loop.” The compiler can set y = 1, which reverses the above logic. The default behavior is fine for a loop. It would probably do the right thing, when not in a loop, perhaps 50% of the time by chance alone. Using static prediction effectively to improve the 50% score almost certainly requires programmer knowledge about the algorithm and its likely input parameter ranges - one has to know what’s going on. I don’t believe compilers will use this with any success for a long, long time. On the other hand, dynamic prediction learns which way things went this time, and remembers that for the next time. This looks promising.

There are more branch instructions and their simplified mnemonics on the 601 than you can shake an armload of sticks at - clearly, more is sometimes a lot less. However, the situation can be tamed by realizing that they are all built out of smaller pieces, and with a modest degree of regularity. There are the fundamental types, to which all other simplified versions expand. The simplified versions can be classified into two types, which I deem “class-A” and “class-B.” Give yourself a gold star when you think you “get it.” If you really do get it, become a particle physicist. People with this peculiar talent are suited to that line of work - I warn you, the pay is probably better where you are now.

Fundamental Branches

This is a case where being able to look at the instruction formats will clarify things. In spite of all the complexity of the mnemonic options, what actually gets encoded is not so complicated.

Here we see, as stated, that addresses stored in the LR or CTR are always absolute. Also, the LR-update option (update with return address) is no more than a bit in the instruction.

BI is a 5-bit field that can hold a number from 0 through 31. This specifies which single bit of the CR is to be tested. This bit constitutes “the condition.” The condition is true if the bit is one, otherwise it is false.

BO directs what action to take. This may be to branch unconditionally - only needed in the case of branch to LR or CTR. Otherwise, it is based on some combination of the condition and the result of decrementing the CTR. BO also holds the y-bit for predictions. We give the following table of BO encodings. However, there is no need to familiarize yourself with it in any great detail, right now.

BO Branch If...

0000y decremented CTR != 0 && false

0001y decremented CTR == 0 && false

0010y false

0100y decremented CTR != 0 && true

0101y decremented CTR == 0 && true

0110y true

1000y decremented CTR != 0

1001y decremented CTR == 0

10100 unconditionally

The versions that use the CTR as a decrementing counter deserve some discussion. First, you get your initial count into the CTR using the special-purpose register move, mtctr, and can read it with mfctr. These are discussed further in the ‘Special-Purpose Register Move’ section. The functionality is similar to DBcc on the 68K, but not the same. Let’s recall how DBcc works. DBcc always tests its condition (cc) first. If the condition it true, the DBcc will fall through without decrementing the index register at all. If the condition is false, the index is decremented, then compared against -1. The (index = 0) case does cause a branch to the loop’s top - this case is executed. In contrast, the 601 always decrements the CTR first. What’s tested here is the AND of both the count (as compared against 0), and the condition. In general, the (index = 0) case is not executed on the 601.

With the help of the figure to show us what’s going on with formats, we can easily grasp the set of fundamental branch instructions, to which all other simplified forms expand. Each row of this table corresponds to one of the encoding formats in the figure.

Latency 1

Result Basic Options Operands

Branch Unconditionally b [ l, a ] target

Branch Conditionally bc [ l, a ] BO,BI,target

Branch to LR Conditionally bclr [ l ] BO,BI

Branch to CTR Conditionally bcctr [ l ] BO,BI

Important!

The forms you should use to implement a decrementing count held in the CTR are one of bc or bclr, where the target address is specified separately from the CTR count. The bcctr instruction is intended for holding an alternate target address in the CTR, when you want to preserve the address already in the LR, or do not want the target overwritten by a return address. In fact, encoding BO for decrementing in the bcctr instruction creates an invalid form which might cause an exception on later machines.

Class-A Simplified Branches

In this class, the writing of branches is to be simplified by providing a shorthand for the BO and BI fields. A name is given to each row of the BO table. This name will become part of the instruction mnemonic. A name is also given to the bits of the CR. This name becomes an operand in the syntax of the class-A instructions.

We reintroduce the BO table, but with a new column “BO-name,” that shows the name for each row.

BO Branch If... BO-name

0000y decremented CTR != 0 && false dnzf

0001y decremented CTR == 0 && false dzf

0010y false f

0100y decremented CTR != 0 && true dnzt

0101y decremented CTR == 0 && true dzt

0110y true t

1000y decremented CTR != 0 dnz

1001y decremented CTR == 0 dz

10100 unconditionally nothing

The second table shows how traditional mnemonics like eq, lt, etc., can be correlated with a bit number of the CR - cr0, actually. The blow-up diagram of the CR will make this obvious.

Test Value Meaning

Bit-Name bit in cr0

lt 0 less than

gt 1 greater than

eq 2 equal

so 3 summary

overflow

In order to reach bits beyond cr0, we need the following set of names for field-offsets into the CR.

Offset-Name Value

cr0 0

cr1 4

cr2 8

cr3 12

cr4 16

cr5 20

cr6 24

cr7 28

In this manner, bit 19 (the SO bit of cr4), for example, can be addressed as cr4+so. Bits in cr0 can be addressed as either cr0+eq, for example, or just eq.

Now we can compose the class-A table.

Latency 1

Result Basic Options Operands

Branch Conditionally b [BO-Name] [ l, a ] [crN+bit,] target

Branch to LR Conditionally b [BO-Name] lr [ l ] [crN+bit]

Branch to CTR Conditionally b [BO-Name] ctr [ l ] [crN+bit]

General Notes:

As stated for the fundamental types, the BO-Names encoding decrementing of the CTR are incompatible with b...ctr instructions.

Some examples now follow to clarify:

ba target ;Branch-absolute to target, uncond.  
 ;This is just a fundamental type.

blr;Branch uncond. to address in LR.  No operands needed.

btctr gt;Branch to address in CTR on ”greater than zero.”

bdzdelta;Decrement CTR, Branch-relative by delta if CTR == 0. 
 ;No condition bit needed.

bdnzt cr1+lt,delta ;Decrement CTR. 
 ;Branch-relative by delta if CTR != 0 and bit 4 of CR is set.

bfla    eq,target;Branch-absolute to target on “non-zero.” 
 ;Place return address in LR.

Just about anything can be coded using the spiffy class-A types. However, I suspect that someone saw examples like bf eq, and said “That’s ass-backward! Wouldn’t it be so much more convenient to just say bne?” And so, with nothing better to do, he developed the class-B types...

Class-B Simplified Branches

These are used when the CTR is not being decremented. Specifying the condition is simplified in that more natural looking expressions can be used for the negatives of tests, such as “ge” as opposed to bf lt - false(less than). The mnemonics for the tests now become:

Test Meaning

lt less than

le less than or equal

eq equal

ge greater than or equal

gt greater than

nl not less than

ne not equal

ng not greater than

so summary overflow

ns not summary overflow

These names take care of specifying which bit within a crN field is being tested. Therefore, while crN is still an optional operand, the “+bit” part now becomes incorporated into the mnemonic proper. Here is the class-B table.

Latency 1

Result Basic Options Operands

Branch Conditionally b [Test] [ l, a ] [crN,] target

Branch to LR Conditionally b [Test] lr [ l ] [crN]

Branch to CTR Conditionally b [Test] ctr [ l ] [crN]

Examples:

bnela target;Branch-absolute to target on “non-zero.” Place return address
 ;in LR.Same as last class-A example.

bnslr cr2 ;Branch to address in LR if no summary overflow recorded in 
cr2.

There, not so hard after all.

Special-Purpose Register Moves

Result Basic Operands Latency

move to SPR mtspr SPR,rA variable*

move from SPR mfspr rD,SPR variable*

move to CR fields mtcrf CRM,rA 1,2Ý

move XER[0-3] to CR field mcrxr crD 2ÝÝ

and zero XER[0-3]

move from CR mfcr rD 1

General Notes:

- For mtspr and mfspr, the special-purpose register operand, SPR, is the register’s number as shown in the figure for the user programming model register set. For example, to move register rA to the link register (SPR8) one writes:

 mtspr  8,rA;mtlrrA (simplified)

- For each of the special-purpose registers we are interested in: XER, LR, CTR, there is a simplified mnemonic that relieves writing the register number. The simplified move-to-SPR mnemonics are: mtxer, mtlr, mtctr. For move from SPR, we have: mfxer, mflr, mfctr.

* There is precious little information on what the variable latency might be. Let me provide my informed guess. I believe that moves concerning the XER, LR, and CTR probably execute in one cycle. You would expect that, since they are used routinely in integer and branch processing - note that mfctr may well be called on every iteration of a loop to make the loop index available. Probably, the longer latencies concern registers having to do with controlling the machine’s environment. One can easily imagine that changing some of these involves some reaction from the system to reflect a changed state.

- For move to CR fields, operand CRM is an 8 bit mask. Bit 0 corresponds to cr0, bit 7 corresponds to cr7, etc. Suppose CRM = 0x08. Then cr3 is overwritten with the four bits rA[12-15] (the bits at the same position in rA as the four bits of cr3).

Ý Some versions of the manual list 1 cycle, others 2 for the latency of mtcrf. Also note that mtcrf may execute more slowly if CRM < 0xFF. In other words, mtcrf is fastest if all of the CR is moved rather than parts of it.

- For mcrxr, crD is a number (0-7) for the destination CR field cr0...cr7.

ÝÝ Some versions of the manual list 1 cycle, others 2 for the latency of mcrxr.

- For mfcr, the entire contents of the CR register (bits 0-31) are moved to destination rD.

In Conclusion

I have tried to be accurate and thorough in what has been covered. The areas about which we have said nothing are namely: synchronization instructions for enforcing in-order execution of reads and writes, cache manipulation, and exception handling. Most programmers will have little direct contact with these matters, nor will compilers typically generate this type of code, so you probably won’t get bogged down by these omissions when debugging. I have also left out floating-point operations for reasons of clarity, magazine real estate limitations, and the preservation of trees. You can, of course, learn all about floating-point operations by consulting the 601 User Manual for syntax, and the Inside Macintosh volume PowerPC Numerics for usage.

Since this is really a reference manual, which doesn’t usually have a clever sum-it-all-up conclusion, I will bow to that tradition. I’m keeping my chatty quips and clichés to myself.

References

[The following references are no doubt available in many places. We’ll list the ones we’re sure about - Ed stb]

PowerPC 601 User’s Manual, Motorola. Available from APDA with the Macintosh with PowerPC Starter Kit. $39.95.

PowerPC System Software (New Inside Macintosh). Available as part of APDA’s PowerPCStarter Kit, or separately from the Mail Order Store for $20.20.

PowerPC Numerics (New Inside Macintosh). Available from the Mail Order Store for $28.95

 
AAPL
$98.14
Apple Inc.
+0.47
MSFT
$44.19
Microsoft Corpora
-0.31
GOOG
$589.10
Google Inc.
+0.08

MacTech Search:
Community Search:

Software Updates via MacUpdate

Bartender 1.2.20 - Organize your menu ba...
Bartender lets you organize your menu bar apps. Features: Lets you tidy your menu bar apps how you want. See your menu bar apps when you want. Hide the apps you need to run, but do not need to... Read more
TotalFinder 1.6.2 - Adds tabs, hotkeys,...
TotalFinder is a universally acclaimed navigational companion for your Mac. Enhance your Mac's Finder with features so smart and convenient, you won't believe you ever lived without them. Tab-based... Read more
Vienna 3.0.0 RC 2 :be5265e: - RSS and At...
Vienna is a freeware and Open-Source RSS/Atom newsreader with article storage and management via a SQLite database, written in Objective-C and Cocoa, for the OS X operating system. It provides... Read more
VLC Media Player 2.1.5 - Popular multime...
VLC Media Player is a highly portable multimedia player for various audio and video formats (MPEG-1, MPEG-2, MPEG-4, DivX, MP3, OGG, ...) as well as DVDs, VCDs, and various streaming protocols. It... Read more
Default Folder X 4.6.7 - Enhances Open a...
Default Folder X attaches a toolbar to the right side of the Open and Save dialogs in any OS X-native application. The toolbar gives you fast access to various folders and commands. You just click... Read more
TinkerTool 5.3 - Expanded preference set...
TinkerTool is an application that gives you access to additional preference settings Apple has built into Mac OS X. This allows to activate hidden features in the operating system and in some of the... Read more
Audio Hijack Pro 2.11.0 - Record and enh...
Audio Hijack Pro drastically changes the way you use audio on your computer, giving you the freedom to listen to audio when you want and how you want. Record and enhance any audio with Audio Hijack... Read more
Intermission 1.1.1 - Pause and rewind li...
Intermission allows you to pause and rewind live audio from any application on your Mac. Intermission will buffer up to 3 hours of audio, allowing users to skip through any assortment of audio... Read more
Autopano Giga 3.6 - Stitch multiple imag...
Autopano Giga allows you to stitch 2, 20, or 2,000 images. Version 3.0 integrates impressive new features that will definitely make you adopt Autopano Pro or Autopano Giga: Choose between 9... Read more
Airfoil 4.8.7 - Send audio from any app...
Airfoil allows you to send any audio to AirPort Express units, Apple TVs, and even other Macs and PCs, all in sync! It's your audio - everywhere. With Airfoil you can take audio from any... Read more

Latest Forum Discussions

See All

The Great Prank War Review
The Great Prank War Review By Nadia Oxford on July 28th, 2014 Our Rating: :: PRANKING IS SERIOUS BUSINESSUniversal App - Designed for iPhone and iPad Though short, The Great Prank War offers an interesting and fun mix of action and... | Read more »
Marvel Contest of Champions Announced at...
Marvel Contest of Champions Announced at Comic-Con Posted by Jennifer Allen on July 28th, 2014 [ permalink ] Announced over the weekend at San Diego Comic-Con was the fairly exciting looking Marvel Contest of Champions. | Read more »
Teenage Mutant Ninja Turtles Review
Teenage Mutant Ninja Turtles Review By Jennifer Allen on July 28th, 2014 Our Rating: :: DULL SWIPINGUniversal App - Designed for iPhone and iPad The pizza power is weak when it comes to this Teenage Mutant Ninja Turtles game.   | Read more »
Exploration Focused Puzzle Game Beatbudd...
Exploration Focused Puzzle Game Beatbuddy Set to Make Transition from PC to iOS this September Posted by Jennifer Allen on July 28th, 2014 [ permalink ] | Read more »
PlanetHD
PlanetHD By Nadia Oxford on July 28th, 2014 Our Rating: :: SPACE MADNESSUniversal App - Designed for iPhone and iPad PlanetHD will keep players busy for a while, though its unpredictable physics are a handful to deal with.   | Read more »
This Week at 148Apps: July 21-25, 2014
Another Week of Expert App Reviews   At 148Apps, we help you sort through the great ocean of apps to find the ones we think you’ll like and the ones you’ll need. Our top picks become Editor’s Choice, our stamp of approval for apps with that little... | Read more »
Reddme for iPhone - The Reddit Client (...
Reddme for iPhone - The Reddit Client 1.0 Device: iOS iPhone Category: News Price: $.99, Version: 1.0 (iTunes) Description: Reddme for iPhone is an iOS 7-optimized Reddit client that offers a refreshing new way to experience Reddit... | Read more »
Jacob Jones and the Bigfoot Mystery : Ep...
Jacob Jones and the Bigfoot Mystery : Episode 2 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: Jacob Jones is back in Episode 2 of one of Apples 'Best of 2013' games and an App Store... | Read more »
New Trailer For Outcast Odyssey, A New K...
New Trailer For Outcast Odyssey, A New Kind of Card Battler Posted by Jennifer Allen on July 25th, 2014 [ permalink ] Out this Fall is a new kind of card battle game: Outcast Odyssey. | Read more »
Hay Day – Tip, Tricks, Strategies, and C...
Recently got into Supercell’s other huge hit, Hay Day and could do with some advice on what to do? We’ve got you covered with some helpful trips and tricks to bear in mind! Ticking Along One of the key things to keep in mind while building up that... | Read more »

Price Scanner via MacPrices.net

13-inch 2.5GHz MacBook Pro on sale for $1099,...
Best Buy has the 13″ 2.5GHz MacBook Pro available for $1099.99 on their online store. Choose free shipping or free instant local store pickup (if available). Their price is $100 off MSRP. Price is... Read more
Roundup of Apple refurbished MacBook Pros, th...
The Apple Store has Apple Certified Refurbished 13″ and 15″ MacBook Pros available for up to $400 off the cost of new models. Apple’s one-year warranty is standard, and shipping is free. Their prices... Read more
Record Mac Shipments In Q2/14 Confound Analys...
A Seeking Alpha Trefis commentary notes that Apple’s fiscal Q3 2014 results released July 22, beat market predictions on earnings, although revenues were slightly lower than anticipated. Apple’s Mac’... Read more
Intel To Launch Core M Silicon For Use In Not...
Digitimes’ Monica Chen and Joseph Tsai, report that Intel will launch 14nm-based Core M series processors specifically for use in fanless notebook/tablet 2-in-1 models in Q4 2014, with many models to... Read more
Apple’s 2014 Back to School promotion: $100 g...
 Apple’s 2014 Back to School promotion includes a free $100 App Store Gift Card with the purchase of any new Mac (Mac mini excluded), or a $50 Gift Card with the purchase of an iPad or iPhone,... Read more
iMacs on sale for $150 off MSRP, $250 off for...
Best Buy has iMacs on sale for up to $160 off MSRP for a limited time. Choose free home shipping or free instant local store pickup (if available). Prices are valid for online orders only, in-store... Read more
Mac minis on sale for $100 off MSRP, starting...
Best Buy has Mac minis on sale for $100 off MSRP. Choose free shipping or free instant local store pickup. Prices are for online orders only, in-store prices may vary: 2.5GHz Mac mini: $499.99 2.3GHz... Read more
Global Tablet Market Grows 11% in Q2/14 Notwi...
Worldwide tablet sales grew 11.0 percent year over year in the second quarter of 2014, with shipments reaching 49.3 million units according to preliminary data from the International Data Corporation... Read more
New iPhone 6 Models to Have Staggered Release...
Digitimes’ Cage Chao and Steve Shen report that according to unnamed sources in Apple’s upstream iPhone supply chain, the new 5.5-inch iPhone will be released several months later than the new 4.7-... Read more
New iOS App Helps People Feel Good About thei...
Mobile shoppers looking for big savings at their favorite stores can turn to the Goodshop app, a new iOS app with the latest coupons and deals at more than 5,000 online stores. In addition to being a... Read more

Jobs Board

*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
Sr. Product Leader, *Apple* Store Apps - Ap...
**Job Summary** Imagine what you could do here. At Apple , great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring Read more
*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
*Apple* Solutions Consultant (ASC) - Apple (...
**Job Summary** The ASC is an Apple employee who serves as an Apple brand ambassador and influencer in a Reseller's store. The ASC's role is to grow Apple Read more
WW Sales Program Manager, *Apple* Online St...
**Job Summary** Imagine what you could do here. At Apple , great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.