Column Tag: Assembly Workshop
The Secrets of the Machine
Or, how to read Assembler
By Malcolm H. Teas, Rye, New Hampshire
About the author
Malcolm H. Teas, 556 Long John Road, Rye, NH 03870 Internet: email@example.com
America Online: mhteas
Why Read Assembler?
When we write programs in C or Pascal, what we're really doing is writing in the computer's second language. When I studied French, I was always translating it into English in my head to understand it. Well, that's just what the computer's doing. It's taking C, Pascal, or whatever you're programming with and translating it to its native language, assembler. Translation is a compiler's job. But just as my French translations would lead to errors and awkward speech, the compiler can occasionally make mistakes, and the code that the compiler creates from your source is a little awkward too; it isn't always the most efficient. Sometimes this doesn't matter much since the CPU is quite fast. However, if you're writing a time-critical algorithm, if the speed of your application just isn't what you want it to be, or if you suspect that there's some strange error, then it's time to talk to the machine in its native language. This article is a traveller's phrasebook.
Where do I find assembler?
Although this wasn't always so, these days you can find some way to examine either the assembler code generated from your source or your disassembled application. If you're using the latest version of Think C (version 5.0), try the Disassemble item in the Source menu. This generates the assembler version of the source in the front window. The new window that results shows the assembler and can be printed, saved, and otherwise treated like any other Think C editor window.
MPW (Macintosh Programmer's Workshop) also offers a number of tools to get at your assembler listings. The DumpCode tool takes any type of code resource and disassembles it. It can also list the jump table and other information that is included. I'll talk about jump tables when I cover the memory map of an application. If you use the SourceBug debugger, it has an option to view code either as the original source or as assembler.
ResEdit now has an external that disassembles a code resource when you open it. It is quite helpful in finding the targets of jumps and other memory addresses, which it shows graphically with arrows. Unfortunately, it doesn't permit editing of the assembler. While this external is not officially supported by Apple, it has worked well for me.
If you cannot get any of these, you can always use a low-level debugger like MacsBug or TMON. By getting into the debugger while in your application, you can disassemble the code you're interested in and save it to a file. In MacsBug, you'd use the ip command to disassemble around the program counter, then use the log command to save the output to a file.
What the computer really looks like.
When you read assembler, you see instructions that refer to registers and memory locations and that have an unusual syntax. The programming environment for assembler is the bare machine, so it has some constraints. To learn to read assembler, you need to know something about that environment. Actually, this is the hard part; reading the assembler instructions is easy.
First are the registers. The CPUs (central processing units) used these days have a number of registers to hold data or addresses currently being used by the program. The Motorola chips used in the Macintosh (the 680x0 family) have eight data registers, eight address registers, a program counter (also called the PC), and a status register whose low byte holds the condition codes. The 68020 and later chips have some other specialized registers used by the Mac's operating system for handling interrupts, mapping memory, and managing the CPU's cache. However, these are only used in the operating system and are not interesting to the application programmer.
Data registers are used most often by instructions that manipulate data, like the logical and arithmetic instructions. Address registers are used to address locations in the computer's memory. They're often used to index data and can be used in move instructions to help calculate the memory location of data. Address register seven (A7) is used by the CPU as the stack pointer. Some instructions can address data on the stack and automatically push or pop the stack. The Mac operating system has a convention of using address register five (A5) as the pointer to the top of an application's global data and address register six (A6) as the stack frame pointer. I'll cover more of the stack frame and global data space later.
The program counter (PC) is a special register that holds the address of the next instruction to execute. The status register (SR; its low byte is the CCR, or Condition Code Register) holds flags describing the result of the last data operation: zero, negative, overflow, carry, and extend. These flags are tested by the conditional branching instructions that implement if statements, loops, and multi-way ifs like the C switch statement.
The memory of the Mac, to an assembler language programmer, just looks like a big array. Some of this array holds the program, some holds the system, some holds the application's data, and some is used by other applications. This explains why one application's bugs can cause problems for other applications. The first application can overwrite the contents of memory anywhere, so data or code belonging to another application can be damaged too. As a result, keeping track of pointers and handles is quite important.
But for an application, the Mac's memory is organized into application memory areas which hold the heap, stack, and global data. Every application is expected to stay in its own area. Low memory belongs to the interrupt table and system globals. Above that is the system heap, followed by the MultiFinder area. In high memory are the addresses of the cards and I/O devices that the Mac is equipped with. The Mac operating system divides the MultiFinder area into application memory areas, or partitions, one for each application in memory at the time. The size of an application's partition is determined from its SIZE resource when the application is launched. If there is no SIZE resource, a default partition size of 512K bytes is used. The user can change the partition size in the Finder's Get Info box; this creates a new SIZE resource. (See Inside Mac VI page 5-14 for more information on the SIZE resource.)
This application memory partition is, in turn, subdivided into the application's heap, stack, and global area. Your application's code, opened resources, and handle and pointer blocks are all in the heap, which occupies the bottom part of the partition and may grow upward. The jump table, global variables, and the QuickDraw application globals are all stored in the global area at the top of the partition. Register A5 (by convention) points into this area, at the top of the application's globals. When a routine references global data, it's done as a negative offset from register A5. Because this addressing mode encodes the offset as a signed 16-bit value, the maximum size of the application globals is 32K. Although some compilers have ways around this limit, it's best to stay under it: larger global areas are more difficult to access and make your program less efficient. Parameters for routines and local variables are stored on the stack, which grows downward in memory and is located just beneath the QuickDraw application globals.
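To make the 32K figure concrete, here is a minimal C sketch of how a 16-bit displacement is combined with an address register. The function name disp16_address and the sample addresses are made up for illustration; only the sign-extend-and-add rule comes from the processor.

```c
#include <stdint.h>

/* Effective address for a 16-bit displacement mode such as -$0104(A5).
   The displacement is sign-extended to 32 bits and added to the register,
   so it can reach at most 32768 bytes below the register's value. */
uint32_t disp16_address(uint32_t an, int16_t disp)
{
    return an + (uint32_t)(int32_t)disp;
}
```

With A5 at $00100000, a displacement of -$0104 gives $000FFEFC, and the most negative displacement the instruction can hold reaches $000F8000, exactly 32K below A5.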
The jump table is fixed in memory for the life of the application, so it is used to get around the 32K limit on CODE resources and to allow them to be moved in memory. When a routine is called that isn't in the same code resource as the calling routine, the compiler and linker make a jump table entry. This is a jump instruction to the other code resource. So the calling routine does a JSR (Jump to Subroutine) to the address in the jump table, and the jump table entry then jumps to the location in the new code resource. When a code resource is moved in memory, the jump table is corrected. This also allows the Segment Loader (part of the Mac Toolbox) to load code resources.
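The indirection a jump table provides can be sketched in C with a table of function pointers. This is only an analogy: real jump table entries are jump instructions rather than pointers, and the names twice, negate, call_via_table, and patch_entry are invented for the example.

```c
#include <stddef.h>

typedef int (*Routine)(int);

/* Two routines standing in for code that lives in another code resource. */
int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }

/* The "jump table": a fixed array whose slots record where each routine
   currently lives. */
static Routine jump_table[2] = { twice, negate };

/* Callers never hold a routine's real address; they go through a slot. */
int call_via_table(size_t slot, int arg)
{
    return jump_table[slot](arg);
}

/* When a code resource moves, only its table slot needs correcting. */
void patch_entry(size_t slot, Routine new_location)
{
    jump_table[slot] = new_location;
}
```

Because every caller goes through the table, patching one slot redirects all of them at once, which is the property that lets code resources move.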
The stack grows toward low memory; in other words, when something is pushed onto the stack, the stack pointer is decremented. When it's popped, the stack pointer is incremented. The stack is used to pass parameters to subroutines. The parameters are pushed onto the stack, then the routine is called. The routine then executes the LINK instruction. This instruction pushes the contents of register A6 on the stack, copies register A7 (the stack pointer) into register A6, then decrements the stack pointer by the total size of the routine's local variables. This makes A6 the frame pointer. Each routine called has a frame of state information preserved on the stack. This is what enables debuggers to retrace the stack (called a stack crawl) to find the list of currently active routines. So, when you see your code accessing data via a positive offset from A6, the code's accessing its parameters. A negative offset is used for its local variables.
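The frame set-up just described can be traced with a toy model in C. This is a sketch under stated assumptions: the Machine struct, the 64-byte memory, and the helper names are all invented here, and only A6 and A7 are modelled.

```c
#include <stdint.h>

enum { MEM_SIZE = 64 };

typedef struct {
    uint8_t  mem[MEM_SIZE];  /* toy memory, addresses 0..MEM_SIZE-1 */
    uint32_t a6;             /* frame pointer */
    uint32_t a7;             /* stack pointer */
} Machine;

/* Read a big-endian longword, as the 68000 stores it. */
uint32_t read_long(const Machine *m, uint32_t addr)
{
    return ((uint32_t)m->mem[addr] << 24) | ((uint32_t)m->mem[addr + 1] << 16) |
           ((uint32_t)m->mem[addr + 2] << 8) | (uint32_t)m->mem[addr + 3];
}

/* Push: the stack pointer moves toward low memory first, then the store. */
void push_long(Machine *m, uint32_t v)
{
    m->a7 -= 4;
    m->mem[m->a7]     = (uint8_t)(v >> 24);
    m->mem[m->a7 + 1] = (uint8_t)(v >> 16);
    m->mem[m->a7 + 2] = (uint8_t)(v >> 8);
    m->mem[m->a7 + 3] = (uint8_t)v;
}

/* Pop: the load, then the stack pointer moves back toward high memory. */
uint32_t pop_long(Machine *m)
{
    uint32_t v = read_long(m, m->a7);
    m->a7 += 4;
    return v;
}

/* LINK A6,#-local_bytes */
void link_a6(Machine *m, uint32_t local_bytes)
{
    push_long(m, m->a6);     /* save the caller's frame pointer */
    m->a6 = m->a7;           /* A6 now marks the new frame */
    m->a7 -= local_bytes;    /* reserve room for local variables */
}

/* UNLK A6 */
void unlk_a6(Machine *m)
{
    m->a7 = m->a6;           /* discard the locals */
    m->a6 = pop_long(m);     /* restore the caller's frame pointer */
}
```

Starting with A7 at 64, pushing one longword parameter and a return address and then calling link_a6 with 2 bytes of locals leaves A6 at 52 and A7 at 50. The parameter sits at 8(A6), above the saved A6 and the return address, and the local sits at -2(A6).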
Parameters are passed in different orders depending on the language being used. C passes the parameters from right to left. The rightmost parameters in the C call are pushed on the stack first. This enables a routine to use the information in the parameters topmost on the stack to determine the number of parameters that should follow. The C stdio library routines printf() and scanf() use this technique. But, to keep life interesting, Pascal passes parameters in the opposite order. The leftmost parameters are pushed on the stack first. In addition, the return value for a Pascal routine is passed on the stack. The calling routine reserves a location for the callee's return value before pushing its parameters. A return value for a C routine is passed in register D0. By seeing this, you can tell from the assembler code what language a routine was written in. The Mac Toolbox was written in Pascal, so Toolbox traps take their parameters from left to right. Also, it's possible in both Think C and MPW C to declare a C routine with the type modifier pascal. This then makes it expect its parameters in Pascal order on the stack.
The most common instructions you'll see in assembler code are move instructions, which copy data from one place to another. It often seems that most of what any program does is move data around. Three move instructions follow:
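The two push orders can be compared with a small C sketch. The Convention enum and the push_order function are hypothetical helpers written for this illustration; they only record the order in which a caller would push its arguments.

```c
#include <stddef.h>

typedef enum { C_ORDER, PASCAL_ORDER } Convention;

/* Fill pushed[0..n-1] with the order in which a caller would push
   args[0..n-1], where args[0] is the leftmost parameter in the source.
   pushed[0] is pushed first; pushed[n-1] ends up on top of the stack. */
void push_order(Convention conv, const int *args, size_t n, int *pushed)
{
    for (size_t i = 0; i < n; i++)
        pushed[i] = (conv == C_ORDER) ? args[n - 1 - i] : args[i];
}
```

For the arguments (1, 2, 3), C order pushes 3, 2, 1, leaving the leftmost argument on top of the stack, while Pascal order pushes 1, 2, 3. In the listing at the end of this article, the call to MoveWindow pushes w first, which marks it as a Pascal-order call.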
move.b #4, d4
move.w (a4), d2
move.l 16(a3, d2.w), -(a7)
Notice that the name of the instruction is on the left and its two operands are on the right. The first operand is the source of the data to move; the second is the destination. The first instruction moves the immediate data 4 into the lowest byte of data register four (d4). Immediate data is coded into the instruction; I'll talk more about that in the addressing mode section. We know that this is a byte move from the .b that follows the instruction name. Instructions come in three sizes: b for byte, w for word (a two-byte short integer), and l for long integer (four bytes). On branch instructions you'll also see .s, which marks a short branch displacement rather than a data size. In byte and word operations on a data register, the lower byte or word is always used. The second instruction copies the word pointed to by address register four into data register two. The last instruction also uses an address register as a pointer, but does some address arithmetic before using it. It adds 16 and the word value in D2 to the value of A3 to get the address to copy the data from. Then it subtracts 4 from A7 and puts the data into the memory location now pointed to by A7. This latter part of the move is a stack push. It subtracts 4 from A7 because the size of the move is long, or four bytes.
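The size suffixes and the address arithmetic in the third move can be modelled in a few lines of C. This is a sketch only; the function names move_b, move_w, and indexed_address are invented here, and registers are represented as plain 32-bit integers.

```c
#include <stdint.h>

/* move.b #value,Dn: only the low byte of the register changes. */
uint32_t move_b(uint32_t dn, uint8_t value)
{
    return (dn & 0xFFFFFF00u) | value;
}

/* move.w value,Dn: only the low word of the register changes. */
uint32_t move_w(uint32_t dn, uint16_t value)
{
    return (dn & 0xFFFF0000u) | value;
}

/* Source address for disp(a3,d2.w): the base register, plus the 8-bit
   displacement, plus the low word of d2 treated as a signed value.
   (The narrowing cast assumes the usual two's-complement conversion.) */
uint32_t indexed_address(uint32_t a3, uint32_t d2, int8_t disp)
{
    int16_t index = (int16_t)(uint16_t)(d2 & 0xFFFFu);
    return a3 + (uint32_t)(int32_t)disp + (uint32_t)(int32_t)index;
}
```

For example, with A3 holding $1000 and the low word of D2 holding 4, 16(a3,d2.w) addresses $1014. The upper word of D2 is ignored because the index is marked .w.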
Although move instructions are the most common, following closely are the ALU instructions, so named for the part of the CPU which processes them: the Arithmetic-Logic Unit. These instructions add, subtract, multiply, divide, shift, rotate, and, or, xor, and negate data. There isn't just one instruction for each operation; there may be several. For example, addition is done by ADD, ADDA, ADDI, ADDQ, and ADDX. The different instruction codes for the same operation handle different addressing modes or registers. ADD needs a data register as either its source or its destination; ADDA adds to an address register. ADDI and ADDQ work with immediate data; ADDQ uses a short form of immediate data that can range from one to eight. ADDX adds the source to the destination, and also adds the extend bit. This last is used for multiple-word arithmetic. This sort of naming is common throughout the ALU instructions.
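As an illustration of why ADDX exists, here is a hedged C sketch of a 64-bit addition built from two 32-bit additions. The Wide type and wide_add function are made up for the example; the extend variable plays the role of the extend bit.

```c
#include <stdint.h>

typedef struct { uint32_t hi, lo; } Wide;  /* a 64-bit value as two longwords */

Wide wide_add(Wide a, Wide b)
{
    Wide r;
    r.lo = a.lo + b.lo;                /* like ADD.L on the low longwords */
    uint32_t extend = (r.lo < a.lo);   /* carry out of the low add */
    r.hi = a.hi + b.hi + extend;       /* like ADDX.L: add the extend bit too */
    return r;
}
```

Adding 1 to the value with hi = 1 and lo = $FFFFFFFF wraps the low longword to zero and carries into the high longword, giving hi = 2 and lo = 0.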
The ALU operations are generally done by taking the source data and destination data, operating on them, and leaving the result in the destination's location. Naturally, this constrains some of the addressing modes: immediate data can't be the destination.
Both the ALU operations and the move operations affect the condition codes. These are a set of flags held in the condition code register (CCR) that indicate whether the result is zero or negative, and whether the operation overflowed, carried, or set the extend bit. Different instructions affect the CCR in different ways; some don't affect the CCR at all. For example, ADDA doesn't change the CCR, since it's generally used in calculating addresses, not data; all the other add instructions affect the CCR. If you get into reading assembler often, you'll want one of Motorola's books on the 68000 family of CPUs as a reference. It will tell you, among other things, which instructions affect the CCR and in what ways.
The CCR is used by a set of instructions called conditionals. These instructions make decisions based on the flags in the CCR by directing the program to execute one or another block of instructions, or to optionally skip a block of instructions. The conditional instructions are Bcc and DBcc. The cc is replaced with a conditional test code. This directs the CPU to test some combination of the flags in the CCR; if the condition is met, the branch is taken; otherwise, execution continues with the next instruction. DBcc is a special loop instruction. It tests the condition; if it's true, execution continues with the following instruction. Otherwise, it decrements the data register it's got as an operand. If the data register is now -1, execution continues with the following instruction. Otherwise, it branches to its target address. Another conditional instruction doesn't make conditional jumps; instead it sets or clears the addressed data. Scc sets its destination byte to zero if the condition is false, or to all ones if it's true.
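The DBcc rules are easier to follow when written out as code. The sketch below models one execution of the instruction in C; the names dbcc and body_count are invented, and the counter is passed as a 16-bit value because DBcc works on the low word of the data register.

```c
#include <stdbool.h>
#include <stdint.h>

/* One execution of DBcc Dn,target. Returns true if the branch to target
   is taken, false if execution falls through to the next instruction. */
bool dbcc(bool condition, uint16_t *counter)
{
    if (condition)
        return false;           /* condition true: leave the loop */
    *counter -= 1;              /* decrement the low word of Dn */
    if (*counter == 0xFFFFu)    /* counter reached -1: leave the loop */
        return false;
    return true;                /* otherwise branch back to target */
}

/* Count how often a loop body runs when DBcc closes the loop and the
   condition is never true (the always-false form usually written DBRA). */
unsigned body_count(uint16_t initial)
{
    unsigned n = 0;
    uint16_t counter = initial;
    do {
        n++;                    /* the loop body */
    } while (dbcc(false, &counter));
    return n;
}
```

One consequence worth remembering when reading compiled loops: because the exit test is for -1 rather than zero, a counter loaded with 3 runs the body four times.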
The CCR can also be set with the CMP set of instructions. A compare does the same thing as a subtract instruction, but the result of the subtraction isn't saved; just the CCR flags are set. This is a way of comparing two pieces of data so that the CCR reflects that comparison. The TST instruction also sets the CCR flags. However, since it has just one operand, it simply clears the overflow and carry bits and sets the negative and zero bits according to the data.
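Here is a minimal C sketch of what a word-sized CMP and TST leave in the flags. The Flags struct and the function names cmp_w and tst_w are assumptions made for this example; the extend bit is left out because neither instruction changes it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, v, c; } Flags;  /* negative, zero, overflow, carry */

/* CMP.W src,dst: compute dst - src, keep only the flags. */
Flags cmp_w(uint16_t src, uint16_t dst)
{
    uint16_t result = (uint16_t)(dst - src);
    Flags f;
    f.n = (result & 0x8000u) != 0;
    f.z = (result == 0);
    /* Signed overflow: the operands' signs differ and the result's sign
       differs from the destination's. */
    f.v = (((dst ^ src) & (dst ^ result)) & 0x8000u) != 0;
    f.c = (src > dst);          /* a borrow was needed */
    return f;
}

/* TST.W data: negative and zero follow the data; overflow and carry clear. */
Flags tst_w(uint16_t data)
{
    Flags f;
    f.n = (data & 0x8000u) != 0;
    f.z = (data == 0);
    f.v = false;
    f.c = false;
    return f;
}
```

A branch such as BEQ then only needs to look at the zero flag, which is why the listing at the end of this article can test a pointer for null with nothing more than a move followed by BEQ.S.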
There are also jump and jump-to-subroutine instructions. JMP just unconditionally jumps to a new location in the program. The JSR, or jump-to-subroutine, instruction pushes the current PC (the pointer to the next instruction) on the stack; then, just like the jump instruction, it puts the location of the jump's target in the PC and continues execution. The JSR is one part of a subroutine call. Parameters are pushed on the stack before the JSR is executed; then, at the end of the subroutine, the RTS instruction is executed. It pops the topmost longword off the stack and puts it into the PC. This makes the CPU continue at the instruction just after the JSR. The other elements in the subroutine call are the LINK and UNLK instructions. LINK pushes the value of its first operand (an address register) on the stack, copies the stack pointer (address register seven) into that register, then adds the displacement (its second operand, normally negative) to the stack pointer. This has the effect of saving the old routine's stack frame and allocating a new stack frame for the subroutine. UNLK reverses this: it copies the address register back into the stack pointer, then pops the saved value into the address register. A stack frame is the instantiation of a routine's data, that is, the local data declared for a routine.
Some other useful instructions are designed to shortcut some common operations. LEA and PEA calculate the address location for the data through the given addressing mode and leave that as their result. LEA leaves it in the named address register and PEA pushes it onto the stack. This is a way of using a complex addressing mode and calculating the resulting address location on the fly.
EXG, SWAP, and EXT can also be useful. EXG exchanges the contents of two registers. SWAP exchanges the high and low words of a data register. EXT is used for typecasting bytes to words or words to longwords. It'll take the high bit of the byte, for example, and copy it to all the bits in the upper byte of that word. This has the effect of sign-extending the byte to a word. The word-to-longword conversion operates similarly.
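SWAP and EXT are simple enough to express as bit operations. The C sketch below treats a data register as a 32-bit integer; the function names m68k_swap, ext_w, and ext_l are invented for illustration.

```c
#include <stdint.h>

/* SWAP Dn: exchange the high and low words of the register. */
uint32_t m68k_swap(uint32_t dn)
{
    return (dn << 16) | (dn >> 16);
}

/* EXT.W Dn: copy bit 7 into bits 8..15; the upper word is untouched. */
uint32_t ext_w(uint32_t dn)
{
    uint32_t low = (dn & 0x80u) ? (0xFF00u | (dn & 0xFFu)) : (dn & 0xFFu);
    return (dn & 0xFFFF0000u) | low;
}

/* EXT.L Dn: copy bit 15 into bits 16..31. */
uint32_t ext_l(uint32_t dn)
{
    return (dn & 0x8000u) ? (0xFFFF0000u | (dn & 0xFFFFu)) : (dn & 0xFFFFu);
}
```

The listing at the end of this article uses EXT.L just before each DIVS.W, since that divide takes a 32-bit dividend and the value was computed as a 16-bit word.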
How to find data: Addressing modes
When the CPU saves data from a register to memory or loads a register from memory, it uses an address to determine that memory location. But there are a number of ways that this address is determined. The address can be part of the instruction, in which case it's called absolute. But this is only useful for the low-memory globals on the Macintosh, as memory in the heap is moved around and the application may be loaded in a different area of memory the next time it's run.
More often, the data's address is calculated. A value from an address register is used, possibly in combination with an immediate value (coded into the instruction) indicating an offset. Or, the offset can come from a data register. In some cases, as with jump instructions, the address is offset from the PC register. This is called PC-relative addressing.
An address can also be generated indirectly. An indirect address is a two-step process for the CPU. If we were using an indirect addressing mode with register A3 for example, the CPU would fetch the contents of the location addressed by A3, then use those contents as the address. Offsets can also be done from indirect addresses.
Addressing modes can be quite complex, but most compilers and programmers use a fairly small number. Although the Motorola chip's designers tried to implement operations that would be useful for compiler writers, the compiler writers found that it was usually easier to use the simpler modes. Doing otherwise required compilers to be larger and slower.
The two simplest addressing modes are immediate and register direct. Immediate mode is where the operand for the instruction is included in the instruction. The assembler syntax is to specify the operand with #data where data is the number. Often small or hardcoded integers are handled like this. Register direct is where the operand is already in a register. In that case just specify the register number: D0 is data register zero and A4 is address register four. These modes are the simplest.
The next set of addressing modes is called address register indirect. They're used when the data (operands) are in memory and an address register already has the data's memory address. There are several variations of this mode. Address register indirect, the simplest, uses the specified address register as a pointer to the data. The syntax is (A3); the parentheses indicate the indirection. The CPU also does address register indirect with predecrement or with postincrement. The address register is again used as a pointer, but with predecrement, written -(A3), it's decremented before it's used. With postincrement, written (A3)+, it's incremented after use. The amount of the increment or decrement is the size of the operation: one for byte, two for word, and four for long. (A7 is the exception: byte operations adjust it by two so that the stack pointer stays even.) These modes are useful in scanning arrays or working with stacks.
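C programmers already know these two modes as the pointer idioms *p++ and *--p. The sketch below shows both uses mentioned above, scanning an array and working a downward-growing stack; the function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan an array the way (An)+ does: read through the pointer, then
   advance it by the operand size (two bytes for a word). */
int32_t sum_words(const int16_t *p, size_t count)
{
    int32_t total = 0;
    while (count-- > 0)
        total += *p++;
    return total;
}

/* Push with predecrement, as in MOVE.W D0,-(A7): move the pointer down
   first, then store. */
void push_word(int16_t **sp, int16_t value)
{
    *--(*sp) = value;
}

/* Pop with postincrement, as in MOVE.W (A7)+,D0: load, then move the
   pointer back up. */
int16_t pop_word(int16_t **sp)
{
    return *(*sp)++;
}
```

A caller would start the stack pointer one element past the end of its buffer, for example at stack + 4 for a four-word array, so that the first push lands in the last element.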
When dealing with arrays or structures, the next two address modes are useful. Address register indirect with displacement adds a 16-bit (word or short integer) displacement to the pointer in the specified address register to get the data's address. Address register indirect with index also lets you specify a second register (either data or address) holding an index value to add to the pointer. It has a displacement too, but this time it's only eight bits long. The index register can be treated as a word or a long by appending a size to the index register specification.
You have the tools now to read assembler. If you'd like to write it, you'll need more work. But, as with anything, practice makes perfect. Often just knowing what's really happening is a help. I've found that the lower the level you are in the computer, the simpler things often are; it's just that they're in unfamiliar terms.
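For structures, the displacement in this mode is simply the field's byte offset, which C can report with offsetof. The sketch below uses a mock layout: MockRect and MockPort are stand-ins invented here, with every field declared as a 16-bit integer so that a modern compiler adds no padding. The layout was chosen to reproduce the portRect offsets seen in the listing at the end of this article; it is not a real Toolbox header.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t top, left, bottom, right; } MockRect;

/* Stand-in for the first fields of a window's port (an assumed layout). */
typedef struct {
    int16_t  device;
    uint16_t baseAddrHi, baseAddrLo;   /* a 4-byte pointer, split in halves */
    int16_t  rowBytes;
    MockRect bounds;
    MockRect portRect;
} MockPort;

/* Byte offset of a portRect field from the start of the port: this is the
   displacement a compiler would place in front of (A4). */
size_t port_rect_offset(size_t field_offset_in_rect)
{
    return offsetof(MockPort, portRect) + field_offset_in_rect;
}
```

Calling port_rect_offset(offsetof(MockRect, bottom)) gives 20, or $14, which matches the MOVE.W $0014(A4),D7 that fetches portRect.bottom in the listing.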
/* Center the window or dialog on the current screen. First,
find the height and width of both the window passed in and
the screen. Then take half of the total margin for the
width and height to find the top left point of the window.
Move the window. */
void center (WindowPtr w)
int wHeight, wWidth, /* Window height and width. */
sHeight, sWidth, /* Screen height and width. */
top, left; /* The new top-left of the window. */
; Make the stack frame
00000000 LINK A6,#$FFFE
; Save the registers well use.
00000004 MOVEM.L D3-D7/A4,-(A7)
; Get the parameter in A4
00000008 MOVEA.L $0008(A6),A4
if (w == 0L) return; /* If null window, ignore it. */
; Do a move, this sets the CCR
0000000C MOVE.L A4,D0
; PC-relative branch, if zero, to address $52
0000000E BEQ.S *+$0044
/* Find the heights and widths. */
wHeight = w->portRect.bottom - w->portRect.top;
; Get portRect.bottom
00000010 MOVE.W $0014(A4),D7
; Subtract portRect.top
00000014 SUB.W $0010(A4),D7
wWidth = w->portRect.right - w->portRect.left;
; Get portRect.right
00000018 MOVE.W $0016(A4),D6
; Subtract portRect.left
0000001C SUB.W $0012(A4),D6
sHeight = screenBits.bounds.bottom - screenBits.bounds.top;
; Get screenBits bottom
00000020 MOVE.W $000A(A5),D5
; Subtract screenBits top
00000024 SUB.W $0006(A5),D5
sWidth = screenBits.bounds.right - screenBits.bounds.left;
; Get screenBits right
00000028 MOVE.W $000C(A5),D4
; Subtract screenBits left
0000002C SUB.W $0008(A5),D4
/* Now calculate top-left point of the centered window. */
top = (sHeight - wHeight) / 2;
; Move screen height to D3
00000030 MOVE.W D5,D3
; Subtract window height
00000032 SUB.W D7,D3
; Prepare for division
00000034 EXT.L D3
; Divide height result by two
00000036 DIVS.W #$0002,D3
left = (sWidth - wWidth) / 2;
; Move screen width to D0
0000003A MOVE.W D4,D0
; Subtract window width
0000003C SUB.W D6,D0
; Prepare for division
0000003E EXT.L D0
; Divide width result by two
00000040 DIVS.W #$0002,D0
; Save width result in local var
00000044 MOVE.W D0,$FFFE(A6)
MoveWindow (w, left, top, FALSE); /* And center it. */
; Push the windowPtr on stack
00000048 MOVE.L A4,-(A7)
; Push the width result on stack
0000004A MOVE.W D0,-(A7)
; Push the height result
0000004C MOVE.W D3,-(A7)
; Push FALSE, the last (Boolean) parameter
0000004E CLR.B -(A7)
; Trap call
00000050 _MoveWindow
; Restore registers
00000052 MOVEM.L (A7)+,D3-D7/A4
; Clear stack frame
00000056 UNLK A6
; And return from subroutine
00000058 RTS