Column Tag: Assembly Lab
Assembly Language for the Rest of Us
By Jeffrey B. Kane, MD, Boston, MA
Note: Source code files accompanying this article are located on the MacTech CD-ROM or source code disks.
Introduction and Purpose
I've always found that it's the little things in life that hang me up. The grandiose projects somehow seem to fall into place, but when I want to do a simple project using only obvious facts, I find that those facts must be so obvious that no one has bothered to write them down.
Most people who program do so to create little tools. As engineers or scientists, we might write our code in a high-level language like Pascal, C, Fortran, or LISP so that we can read and debug it efficiently (especially with tools from companies like Think Technologies and Coral Software). Once the code is up and running, we may notice that one of our procedures, let's say one that crunches a lot of numbers using SANE1, is taking up most of the processing time. By rewriting this one part of the program we can speed things up dramatically, often 10 times or more!
This article is about assembly language. It's not about writing a lightning-fast game or a new version of Excel. Rather, it's about using the minimum amount of programming we can get away with to write a short but useful bit of code. Nor is it about writing a stand-alone program in assembly. Instead, we will take a program written in Pascal and rewrite just one of its procedures in assembly language, so the entire program can run much faster.
What we will do
In this article we will recode a small subroutine whose mission in life is to take an array and multiply each of its elements by a scalar constant. Although the program is not the most dramatic use of assembly language, and the code we write will not be the most elegant, hopefully it will be clear and easy to understand2. We will be illustrating the steps needed to interface any code with a running Pascal program. I chose to make use of the 68881 floating point processor3 in our program, not to add complexity, but to illustrate how easy it is to use this chip. Motorola has made the 68881 appear to be simply an extension of the 68000 CPU4. To the programmer, the combination of a 68000 and a 68881 appears as a souped-up 68000: just like the basic version, but turbocharged for numeric speed, with a few more instructions and registers added to it. If you don't have this chip, the same principles of programming apply. You would simply replace the 68881 instructions with a few more 68000 instructions, but the programming logic would be the same.
MPW Assembler, LSP, and MDS
I will be using LightSpeed Pascal® to develop our main Pascal program. As I explained above, the basic program creates an array, then multiplies each number within the array by a constant. LightSpeed is a great product for writing quick and dirty programs, since you can debug and observe your variables as you go. It even has an assembly-level debugger built right in, so if you want to peek at the actual memory to see exactly what your instructions are doing, you can. This is a great way to learn the simple and predictable way each line in your program will affect memory.
We will also be using Macsbug, a low-level (machine-level) debugger that should be available from your local computer dealer5, most user groups, bulletin boards, and/or commercial language products (including LightSpeed). With Macsbug we can look at everything the Mac is doing while it follows our assembly language instructions, one line at a time. Tracing our code as it executes will make it clear that each instruction has very predictable and simple results (remember that if a compressed piece of sand called the 68000 can understand what these instructions mean, so can you!).
Finally, we will be using the assembler built into the MPW shell6 (version 2.02). Consulair also makes a stand-alone Assembler that was the standard of the industry7.
In this article we will assume8 that you know what a bit is (0 or 1), that 8 bits are stored in a byte, that 16 bits make up a word, and finally that 32 bits make up a long word9. We also assume that you have some idea of what a hexadecimal number is10.
The Pascal Code
First, let's take a look at the Pascal code:
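PROGRAM ScaleTest;   { any program name will do }

TYPE
  element = PACKED RECORD
      empty : integer;   { 2 spare bytes in front of each extended }
      n : extended;
    END;
  vector = ARRAY[0..19] OF element;
  matrix = RECORD
      rows : integer;
      columns : integer;
      vecPtr : ^vector;
    END;

VAR
  inVect : vector;
  outVect : vector;
  i : integer;
  inMatrix, outMatrix : matrix;
  scalar : extended;
  error : integer;

FUNCTION ScaleMult (scalar : extended; VAR inMat, outMat : matrix) : integer;
EXTERNAL;   { supplied by the assembly language routine }

BEGIN
  scalar := 35;
  inMatrix.rows := 0;        { DBLT counts down to -1, so 0 means one row }
  inMatrix.columns := 19;    { and 19 means twenty columns }
  inMatrix.vecPtr := @inVect;
  outMatrix.rows := 0;
  outMatrix.columns := 19;
  outMatrix.vecPtr := @outVect;
  writeln('scalar = ', scalar);
  { sizes of the variables, in bytes }
  writeln(SizeOf(scalar), SizeOf(inMatrix), SizeOf(outMatrix), SizeOf(error), SizeOf(inVect));
  { addresses of the variables, handy when following along in Macsbug }
  writeln(longint(@scalar), longint(@inMatrix), longint(@outMatrix));
  FOR i := 0 TO 19 DO
    BEGIN
      inVect[i].empty := 255;
      inVect[i].n := i;
    END;
  error := ScaleMult(scalar, inMatrix, outMatrix);
  writeln('error = ', error);
  FOR i := 0 TO 19 DO
    writeln(i : 5, inVect[i].n : 10, outVect[i].n : 10);
  writeln('scalar = ', scalar);
END.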
In this article, we will rewrite the function ScaleMult in assembly language. From our perspective, all that Pascal does is hand our assembly language program the information in the form we declared in the interface statement. Since we have declared our subroutine as a function, our responsibility will be to return a result to Pascal, also in the form we declared. Pascal does not care how we perform our task, only that we get it done.
The first unusual item that you may notice is our definition of an element. In most programming, numbers with decimal places in them are defined as real numbers. An example of a real number would be 3.14159 or 3.14159E+0. The last part, E+0, means that we will multiply our number 3.14159 by ten raised to the zero power (which is equal to one). We could have simply written the number as 3.14159 and Pascal would have interpreted it the same way. If the number contained E+1, then we would have multiplied our base number by ten raised to the first power (or 10), or we could have written the equivalent real number as 31.4159. Similarly, E+3 would have meant multiplying by ten raised to the third power (or 1000), and the number would have been equivalent to 3,141.59. SANE uses a more accurate form of the real number, called an extended number. The extended format can hold more decimal places than a real number, and it can also raise a number to a larger power than a real number. A real number can range in magnitude from about 1.2E-38 to 3.4E+38, whereas an extended number can range from about 1.7E-4932 to 1.1E+4932, with more decimal places stored for each number. All of those extra decimal places translate into more accuracy as we do our calculations.
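The E notation and the range of a real number can be checked with a few lines of C. The sketch below assumes a compiler whose float is the IEEE single format, which is what SANE calls real. same_value and print_real_range are hypothetical helpers for illustration, not part of any library.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: nonzero when two literals denote the same value. */
int same_value(double a, double b) { return a == b; }

/* Print the range of a C float, assumed here to be the IEEE single
   format that SANE calls 'real'. */
void print_real_range(void) {
    printf("largest real:  %g\n", (double)FLT_MAX);  /* about 3.4E+38 */
    printf("smallest real: %g\n", (double)FLT_MIN);  /* about 1.2E-38 */
}
```

For example, same_value(3.14159E+3, 3141.59) is true because both spellings name the same number.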
In our program we have defined an element to be:
element = PACKED RECORD
    empty : integer;
    n : extended;
  END;
Instead of just holding an extended number, our element has an integer (which is empty) placed in front of it. We have done this to allow for the difference between the way SANE and the 68881 handle extended numbers. Although we will explain the reasoning below (it's actually not hard to understand), for now simply consider each element of the array as an extended format number.
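Here is a byte-level picture of one element, sketched in C. The assumption is that the 10-byte SANE extended is held as raw bytes, so the C compiler adds no padding. This is a model of the memory layout, not the way any Pascal compiler declares the record. Element and Vector are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One Pascal 'element': the empty integer, then a SANE extended
   (10 raw bytes: sign/exponent word followed by a 64-bit mantissa). */
typedef struct {
    uint8_t empty[2];      /* 'empty' integer: 2 bytes                */
    uint8_t sane_ext[10];  /* SANE extended: sign/exponent + mantissa */
} Element;

typedef Element Vector[20];   /* vector = ARRAY[0..19] OF element */
```

Each element comes to 12 bytes. That is exactly the size the 68881 uses for one extended number in memory, and it is why the assembly loop steps from element to element in 12-byte jumps.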
Basic Assembly Language
Now let's take a look at our assembly language function:
            MC68881                    ; we will be using instructions
                                       ; for the 68881 chip
            MACHINE MC68020            ; optimize for the 68020; if you
                                       ; don't have one, remove this line

ScaleMult   FUNC    EXPORT             ; this lets the rest of the world
                                       ; know about our function

_Debugger   OPWORD  $A9FF              ; this will pop us into Macsbug so
                                       ; we can follow every instruction
                                       ; and observe the results

; The following are all displacements so we can address our data on the
; stack, relative to the address stored in A6

result      EQU     20                 ; 2 bytes for our integer result
Scalar      EQU     16                 ; 4 bytes (address of the scalar)
inMat       EQU     12                 ; 4 bytes (address of the input matrix)
outMat      EQU     8                  ; 4 bytes (address of the output matrix)
ReturnAdd   EQU     4                  ; 4 bytes (the return address)
superScalar EQU     -12                ; 12 bytes (hold a place for our
                                       ; 68881 version of the scalar)
oldA2       EQU     -16                ; 4 bytes (the old value of A2)
oldA3       EQU     -20                ; 4 bytes (the old value of A3)

; *** Add the following features (to make this a real program):
;   1) make sure the number of rows and columns in the input
;      matrix jibes with that of the output matrix
;   2) fix the error result to tell the Pascal program if there
;      was a problem, such as rows X columns not fitting the vector

; ***** The first real line of our program (it's about time!)

            _Debugger                  ; jump into Macsbug so we can
                                       ; follow what's going on

; the parameters of
;   FUNCTION ScaleMult (scalar:extended; VAR inMat,
;                       outMat:matrix) : integer
; stay on the stack; we reach them relative to A6

            LINK    A6,#-12            ; push A6 (the frame pointer) onto
                                       ; the stack, load SP into A6, then
                                       ; subtract 12 bytes from A7 to make
                                       ; room for superScalar

; after the LINK instruction the stack looks as follows:
;   integer result (2 bytes)
;   ------------ <- 20(A6)
;   scalar pointer (4 bytes)
;   ------------ <- 16(A6)
;   inMatrix pointer (4 bytes)
;   ------------ <- 12(A6)
;   outMatrix pointer (4 bytes)
;   ------------ <- 8(A6)
;   return address (4 bytes)
;   ------------ <- 4(A6)
;   old A6 (4 bytes)
;   ------------ <- (A6)
;   superScalar (12 bytes)
;   ------------ <- -12(A6) and initial (A7)

            MOVE.L  A2,-(A7)           ; push A2 onto the stack to save it
            MOVE.L  A3,-(A7)           ; do the same to A3

; move the 10-byte SANE scalar to a place with a bit more room: copy
; it 2 bytes into superScalar, so that it has a spare word in front
; of it, just like our array elements

            MOVE.L  Scalar(A6),A2      ; get the address of the scalar
            LEA     -10(A6),A3         ; superScalar + 2
            MOVE.L  (A2)+,(A3)+        ; move the first 4 bytes
            MOVE.L  (A2)+,(A3)+        ; move the next 4 bytes
            MOVE.W  (A2),(A3)          ; move the last 2 bytes
            LEA     superScalar(A6),A2 ; point at the top long word
            MOVE.L  (A2),D2            ; get the top long word of the
            LSL.L   #8,D2              ; scalar and shift it 16 places
            LSL.L   #8,D2              ; (an immediate count can only be
            MOVE.L  D2,(A2)            ; 1 to 8), then put it back
            FMOVE.X (A2),FP1           ; get the modified scalar and
                                       ; store it on the 68881

; now move the addresses onto the chip so we can work with them

            MOVE.L  outMat(A6),A2      ; get the pointer to the output
            MOVE.L  4(A2),A1           ; matrix, then the address of its
                                       ; vector (4 bytes from the start
                                       ; of the matrix record)
            MOVE.L  inMat(A6),A2       ; do the same for the input
            MOVE.L  4(A2),A0           ; matrix
            MOVE.W  (A2),D0            ; get the number of rows (first
                                       ; word of the matrix record)
            MOVE.W  2(A2),D1           ; then the number of columns
                                       ; (second word of the record)

Loop        MOVE.L  (A0),D2            ; get the high long word of the
            LSL.L   #8,D2              ; current input element, shift it
            LSL.L   #8,D2              ; into 68881 position, and put it
            MOVE.L  D2,(A0)            ; back
            FMOVE.X (A0),FP0           ; get the current element
            FMUL.X  FP1,FP0            ; multiply by the scalar
            FMOVE.X FP0,(A1)           ; put the element back in RAM
            MOVE.L  (A1),D2            ; now shift the high long word back
            LSR.L   #8,D2              ; (in the output matrix) and then
            LSR.L   #8,D2              ; put it away in RAM
            MOVE.L  D2,(A1)
            MOVE.L  (A0),D2            ; now shift the high long word back
            LSR.L   #8,D2              ; (in the input matrix) and then
            LSR.L   #8,D2              ; put it away; this MOVE also
            MOVE.L  D2,(A0)            ; clears N and V for the DBLT
            ADDA.W  #12,A0             ; step to the next input element
            ADDA.W  #12,A1             ; and the next output element
                                       ; (ADDA leaves the condition
                                       ; codes alone)
            DBLT    D1,Loop            ; decrement the number of columns
                                       ; and test
; if we are here we have gone through one row
            MOVE.W  2(A2),D1           ; restore the number of columns
            DBLT    D0,Loop            ; test if we have gone through
                                       ; all the rows
; if we are here, we have gone through all the rows

; since we have done our work let's put away our toys (clear the stack
; of all the garbage) and go home.

            MOVE.L  (A7)+,A3           ; pop off the old A3
            MOVE.L  (A7)+,A2           ; pop off the old A2
            UNLK    A6                 ; restore A6 and the stack pointer
            MOVE.L  (A7)+,A0           ; save the return address
            ADDA.L  #12,A7             ; move the stack pointer past the
                                       ; three 4-byte parameters
            CLR.W   (A7)               ; replace the result value on the
                                       ; stack with ours
; ** NOTE: we are returning a value of zero, meaning that nothing went
; wrong. This is bogus and in the real version we need to fix this!
            MOVE.L  A0,-(A7)           ; push the return address back
                                       ; onto the stack
            RTS                        ; return to reality (our calling
                                       ; Pascal program)
            ENDF
            END
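To make the loop structure easier to see, here is a rough C rendering of what ScaleMult does. It is a sketch under two assumptions. First, the elements are plain doubles, so the SANE/68881 format shuffling is left out. Second, rows and columns hold "count minus one", because DBLT keeps looping until its counter reaches -1. scale_mult and its parameters are hypothetical names.

```c
#include <assert.h>

/* C sketch of ScaleMult's loop: d0/d1 play the roles of D0/D1. */
int scale_mult(double scalar, const double *in, double *out,
               short rows, short columns)
{
    short d0 = rows;                 /* outer counter, like D0 */
    short d1 = columns;              /* inner counter, like D1 */

    for (;;) {
        *out++ = *in++ * scalar;     /* FMOVE.X / FMUL.X / FMOVE.X */
        if (--d1 != -1)              /* DBLT D1,Loop */
            continue;
        d1 = columns;                /* MOVE.W 2(A2),D1 */
        if (--d0 != -1)              /* DBLT D0,Loop */
            continue;
        return 0;                    /* CLR.W (A7): always "no error" */
    }
}
```

With rows = 0 and columns = 19, as in the Pascal test program, the body runs 20 times, once for each element of the vector.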
The 68000 Instructions:
First of all the basics. Anything after a semicolon ( ; ) is a comment and not a command. Our Assembler will ignore all comments, just as if they had never been typed in. Each line of code has three basic parts11. If you look at the code you can see that it seems to be divided into three columns. The first column begins with the first character of each new line and is called a label. Labels are used so we can refer to that line of code from within our program. Anything in the second column is an instruction for the 68000 (or the 68881). Anything in the third column tells the chip how (and where) to execute that instruction.
A couple of the instructions that appear at the beginning of the program are actually instructions for the Assembler12 and not the chip. The instruction EQU as in ReturnAdd EQU 4 tells the Assembler to substitute the number 4 every time it sees the label ReturnAdd in our code. We could have written the number 4 in ourselves, but using the label ReturnAdd makes our code a lot more understandable13. FUNC tells the Assembler that this is the name of a function. The word EXPORT tells it to let the rest of the world know that a function exists by the name ScaleMult and where that function can be found. MC68881 states that we will be using some of the 68881 instructions (otherwise our Assembler would say we must be making a mistake when we try to use them). MACHINE MC68020 lets the Assembler use instructions that are available only on the 68020 chip on a Mac II. The MPW Assembler will replace some of our instructions with its own if it sees a faster way of doing the same thing. This is called optimizing the code. If you are not running with a 68020 delete this line. OPWORD is similar to EQU except that it lets us define a new instruction by its numeric equivalent, whereas EQU lets us define a constant used by the instruction. ENDF and END signal that we are finished writing our function and code, respectively.
The 68000 chip has several registers for holding numbers. These can be thought of as mailbox slots, if you will. A mailbox can either hold a parcel (your data) or a slip telling you where your parcel is (an address). Our registers hold 32-bit numbers on the chip. Eight slots (or registers) hold simple data. Another eight registers hold 32-bit addresses (see Figure 1). Addresses are simply numbers that tell the Mac which byte in memory holds your data: 0 is the first byte in RAM, 1 is the second, and so on. Remember that both the data and the address registers hold simple 32-bit numbers. These registers are referred to as D0, D1, ..., D7 for the data and A0, A1, ..., A7 for the addresses. To manipulate numbers on the computer we simply move a number from RAM into one of these registers, do what we want to it, then put it back. Because the 68000 CPU has so many registers on the chip, we don't have to swap back and forth between RAM and the chip nearly as often as we had to on older chips. Our basic command for moving data around is MOVE (doesn't that make sense?).
To get the number stored at address 128 in RAM and move it to register D1 on our chip, we would simply write MOVE 128,D1 (see Figure 2). To actually place the number 68 into register D1 we would write MOVE #68,D1 (see Figure 3). The symbol # means to take the number and use it literally, rather than using the number as the address of our data, as we did in Figure 2.
Putting the name of a register in parentheses means to use the address stored within that register to find our data. This is shown in Figure 4. The same instruction without the parentheses means to use the contents of the register directly, as shown in Figure 5.
Figure 2. MOVE 128, D1 (get the data from memory location 128)
Figure 3. MOVE #68, D1 (put 68 in register D1)
Figure 4. MOVE (A1), D1 (register A1 contains the address of our data)
Figure 5. MOVE A1, D1 (register A1 contains the data)
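The four addressing modes in Figures 2 to 5 can be modelled in C on a toy RAM. This is a sketch for a word-sized MOVE (the default size). The 256-byte array mem and the function names are made up for illustration. Words are fetched high byte first, as on the 68000.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[256];          /* toy RAM */

static uint16_t read_word(uint32_t addr) {
    assert(addr + 1 < sizeof mem);
    return (uint16_t)((mem[addr] << 8) | mem[addr + 1]);  /* big-endian */
}

uint16_t move_absolute(uint32_t addr)   { return read_word(addr); } /* MOVE 128,D1  */
uint16_t move_immediate(uint16_t value) { return value; }           /* MOVE #68,D1  */
uint16_t move_indirect(uint32_t a1)     { return read_word(a1); }   /* MOVE (A1),D1 */
uint16_t move_direct(uint32_t a1)       { return (uint16_t)a1; }    /* MOVE A1,D1   */
```

If A1 holds 128, then MOVE (A1),D1 and MOVE 128,D1 fetch the same data. MOVE A1,D1 copies the number 128 itself.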
Figure 6. How Numbers Are Stored in the 68000 Registers
Basic 68881 Commands
Notice that many of the instructions have suffixes attached to them, such as MOVE.W or MOVE.L (and MOVE.B). These simply tell the 68000 what size number we are moving: .B for a byte (8 bits long), .W for a word (16 bits long, and the default if you forget to specify a suffix), or finally .L for a long word (32 bits). Some of the floating point instructions specify a .X, which tells the 68881 to move three long words (the 96 bits it takes to store an extended number in memory). The .X extension is only understood by the 68881, so if you added it to a 68000 instruction, the CPU would not know what you were talking about. When we move different size numbers onto the chip they are always placed so that bit zero is aligned all the way to the right. Thus the three numbers would be stored as in Figure 6.
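The right-alignment rule can be sketched in C as follows. A byte or word moved into a data register lands with bit 0 on the right, and only the low 8 or 16 bits of the register change. The size letter 'B', 'W' or 'L' is a convention of this sketch, and move_to_dreg is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* MOVE.B / MOVE.W / MOVE.L into a data register. */
uint32_t move_to_dreg(uint32_t dreg, uint32_t value, char size) {
    switch (size) {
    case 'B': return (dreg & 0xFFFFFF00u) | (value & 0x000000FFu);
    case 'W': return (dreg & 0xFFFF0000u) | (value & 0x0000FFFFu);
    default:  return value;                            /* 'L' */
    }
}
```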
The instruction FMOVE works the same way as MOVE, but allows you to move numbers to and from the 68881 coprocessor chip. By using the suffix .X (as in FMOVE.X) we can tell the coprocessor that we are manipulating a 96-bit extended number, although .L, .W, and .B will also work if you are referring to those quantities. The 68881 FPP has 8 registers of its own (FP0 to FP7), each with enough room to store an extended number.
Figure 7. How numbers are stored in RAM
Bits and Bytes
I am not going to go over a lengthy explanation of bits, bytes, or how to convert between decimal numbers and hexadecimal numbers (since these are just different ways of writing the same number). There are many good books on these kinds of basics, and they are referred to in the appendix [1-3].
The 68000 stores numbers in memory in the same way that you would read them from a piece of paper. It starts at a low memory address and writes the number, starting with the most significant digit and going to the least significant digit. If another number is to be written, it repeats the sequence. Thus if I were going to store the number $38240193 at memory location 128, and then store the number $42310000 in the next position (which would start at memory location 132), the memory would appear as in Figure 7.
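The storage order described above can be sketched in C by writing one byte at a time into a toy RAM array, most significant byte at the lowest address. The array name ram and its size are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t ram[256];   /* toy RAM */

/* Store a long word the way the 68000 does: most significant byte at
   the lowest address, least significant byte at the highest. */
void store_long(uint32_t addr, uint32_t value) {
    assert(addr + 3 < sizeof ram);
    ram[addr]     = (uint8_t)(value >> 24);
    ram[addr + 1] = (uint8_t)(value >> 16);
    ram[addr + 2] = (uint8_t)(value >> 8);
    ram[addr + 3] = (uint8_t)value;
}
```

Calling store_long(128, 0x38240193) and then store_long(132, 0x42310000) leaves bytes $38 $24 $01 $93 $42 $31 $00 $00 at addresses 128 through 135.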
The stack is a key concept in writing assembly language. The best example of a stack is one of those old plate stackers that we all remember from the school cafeteria. If you imagine that each plate is a number, you simply place the next plate on the stack, piling them up as you go. If you need a plate, you simply pop it off the top. The 68000 uses the same concept to store numbers. To define a stack you first pick where in memory you want to start. This starting address is usually placed in register A6 (this is simply a convention, and we could just as easily have stashed it in A4 or any other register). We then place a number on the stack (we call this pushing the number onto the stack) by simply writing the number into memory, using the memory location pointed to by register A6. The next thing we want to keep track of is the end of our stack. We can do this using another one of our registers, A7. If we just added 2 bytes to the stack by pushing our number, then we need to adjust register A7 to point to the next free byte of memory, or in other words, the end of the stack.
What if we want to take the top number off of the stack? A7 points to the top, so we get our number, move it somewhere else, then change A7 to point to the new top of the stack. Using this system we can push numbers onto the top of the stack, and pop them off at will. This type of memory arrangement is called a Last In/First Out system, since the last number we pushed onto the stack is the First number we have available to pop off again.
Most computer systems use the stack as a method of storing numbers. The only little twist in the way the Macintosh handles a stack is that instead of starting at a low memory address and working higher, the Macintosh starts at a high memory address and works downward, toward lower memory. This twist doesn't change the way we work with the stack. If we decide that we want to start our stack at memory address 128, we store 128 in register A6 and then set A7 to point to the end of the stack, which in this case is also 128. We now want to add a 4-byte number to the stack, such as $32411000. We simply subtract 4 from the address in A7 (128 - 4 = 124), and move our 4-byte number to RAM, starting at address 124. This is accomplished with the command MOVE.L #$32411000,-(A7). If we look at the RAM starting at address 124, we will now see the number $32411000.
Take a close look at the instruction we just issued. By appending .L to the MOVE command we have told the assembler that we are moving a long word, which is by definition 4 bytes (32 bits) long. The #$32411000 portion tells the 68000 to use the number $32411000 literally. Finally, the operand -(A7) does something special. It tells the 68000 to subtract the length of our long word, 4, from the address stored in A7, and place the resulting address back in A7. It also does this before it executes the MOVE instruction, so our number gets moved to address 124 instead of 128. This leaves 124 in A7 when we are finished, so our stack pointer now points to the end of the stack again. We call this kind of addressing Predecrement Indirect. Don't worry about what this mode of addressing is called; just take note of what it does. The fact that the 68000 can do these kinds of instructions in one step makes life very simple for us. The 68000 can also do the converse operation. Let's say we want to remove our number and store it in RAM starting at address 200. We will then issue the command MOVE.L (A7)+,200. Again, the suffix .L tells us that we are dealing with a 4-byte long word. Note that if we used .W instead, the 68000 would do the same calculation, but would know that we were talking about moving a (2-byte) word. Our command, MOVE.L, will get the 4-byte number starting at the memory location pointed to by register A7, and move that number to memory address 200. After it is done, it will add 4 to the address stored in register A7, so that A7 now contains 124 + 4, or 128, which is the top of the stack again. This type of addressing is called Postincrement Indirect. We have just successfully pushed a 4-byte number onto the stack and then popped it off again, all the while keeping register A7 pointing to the end of the stack.
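The push and pop just described can be sketched in C with a toy RAM and a variable a7 standing in for the stack pointer. The stack starts at 128 and grows downward, as in the text. All the names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t ram[256];   /* toy RAM */
static uint32_t a7 = 128;  /* stack starts at 128 and grows downward */

void put_long(uint32_t addr, uint32_t v) {
    assert(addr + 3 < sizeof ram);
    ram[addr] = (uint8_t)(v >> 24);     ram[addr + 1] = (uint8_t)(v >> 16);
    ram[addr + 2] = (uint8_t)(v >> 8);  ram[addr + 3] = (uint8_t)v;
}

uint32_t get_long(uint32_t addr) {
    assert(addr + 3 < sizeof ram);
    return (uint32_t)ram[addr] << 24 | (uint32_t)ram[addr + 1] << 16 |
           (uint32_t)ram[addr + 2] << 8 | ram[addr + 3];
}

/* MOVE.L #value,-(A7): predecrement, then store */
void push_long(uint32_t value) { a7 -= 4; put_long(a7, value); }

/* MOVE.L (A7)+,dest: fetch, then postincrement */
uint32_t pop_long(void) { uint32_t v = get_long(a7); a7 += 4; return v; }
```

put_long(200, pop_long()) plays the part of MOVE.L (A7)+,200. Pushing two numbers and popping them back returns them in reverse order, which is the Last In/First Out behaviour described above.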
We can use this algorithm to push many numbers onto the stack, one after another. This is exactly what a Pascal program does when it calls your assembly language subroutine. It pushes all the variables it is going to hand you onto the stack, and then jumps to your routine. Since we know the kinds of numbers that the main program will pass to us, we can pop them off, do our calculation, push our result back onto the stack, then jump back to the main program. We will talk more about actually doing this a little later.
Let's look at a few more assembly language instructions. The instruction CLR (which stands for Clear) will put all zeros at an address or in a register. Thus CLR.W 128 will put 2 bytes' worth of zeros starting at address 128. CLR.L (A1) will store 4 bytes' worth of zeros at the address stored in register A1. CLR.B D1 will fill the lowest (rightmost) byte of register D1 with zeros.
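In C, the three sizes of CLR can be modelled as below. In a register, only the low 8, 16 or 32 bits are cleared. In memory, 1, 2 or 4 bytes are zeroed. The toy RAM ram, the size letters and both function names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t ram[256];   /* toy RAM */

/* CLR.B/.W/.L Dn */
uint32_t clr_dreg(uint32_t dn, char size) {
    switch (size) {
    case 'B': return dn & 0xFFFFFF00u;
    case 'W': return dn & 0xFFFF0000u;
    default:  return 0;                        /* 'L' */
    }
}

/* CLR.B/.W/.L addr: write 1, 2 or 4 zero bytes starting at addr */
void clr_mem(uint32_t addr, char size) {
    size_t n = size == 'B' ? 1 : size == 'W' ? 2 : 4;
    assert(addr + n <= sizeof ram);
    memset(&ram[addr], 0, n);
}
```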
The instruction ADD will do what its name would suggest. ADD.L D1,D2 will add the two long words stored in registers D1 and D2 and store the result in D2. ADD.W (A1)+,D1 will get a number from memory, stored at the address contained in A1. It will then add that number to the number stored in D1. The final result is stored in D1. Because we wrote (A1)+, the 68000 will increase the address stored in A1 by 2 (since we specified .W) after it completes the addition. This is very useful if we are adding a bunch of numbers to D1 and they are stored in memory one after another, starting at the address in A1. After we execute our ADD.W (A1)+,D1 command, A1 automatically points to the next number in memory. This command, when placed in a simple loop, is a powerful way to add many numbers.
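Here is a loop around ADD.W (A1)+,D1, sketched in C. The assumptions are a big-endian toy memory, a 16-bit sum that wraps around, and only the low word of D1 changing. The condition codes are not modelled, and add_words is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 'count' words starting at address a1 into the low word of d1. */
uint32_t add_words(const uint8_t *mem, size_t memsize,
                   uint32_t a1, int count, uint32_t d1) {
    while (count-- > 0) {
        assert(a1 + 2 <= memsize);
        uint16_t w = (uint16_t)((mem[a1] << 8) | mem[a1 + 1]);  /* big-endian word */
        d1 = (d1 & 0xFFFF0000u) | (uint16_t)(d1 + w);           /* low word only    */
        a1 += 2;                                                /* (A1)+ with .W    */
    }
    return d1;
}
```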
Figure 8. The Macintosh Stack
Another use for this type of addressing is to move a range of bytes from one part of RAM to another. Store the beginning address of our data in A1 and the address of the place we want to move it to in A2. By repeatedly executing MOVE.L (A1)+,(A2)+ we then move our data from (A1) to (A2) in RAM. After each move, the addresses stored in A1 and A2 are both automatically incremented by 4, so we are ready to move the next long word. Nifty, isn't it?
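The block move can be sketched in C on a toy memory array. move_longs is an illustrative name. The copy goes one long word at a time, and both addresses advance by 4 after each move, as the postincrement mode does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A loop of MOVE.L (A1)+,(A2)+ copying n long words from a1 to a2. */
void move_longs(uint8_t *mem, size_t memsize, uint32_t a1, uint32_t a2, int n) {
    while (n-- > 0) {
        assert(a1 + 4 <= memsize && a2 + 4 <= memsize);
        for (int i = 0; i < 4; i++)
            mem[a2 + i] = mem[a1 + i];
        a1 += 4;   /* (A1)+ */
        a2 += 4;   /* (A2)+ */
    }
}
```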
The commands LSR and LSL stand for Logical Shift Right and Logical Shift Left. What they do is move each bit in a register (or in memory) to the right or left a certain number of places. The bits that fall off the end disappear14, and the places that we leave empty are filled with zeros. An example of a logical shift is shown in Figure 9. LSL.L #5,D1 will shift the binary number stored in register D1 to the left 5 places, pushing zeros in from the right. An LSR instruction works in the converse fashion.
Figure 9. LSR #5, D1
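C's shifts on unsigned 32-bit values behave like LSL.L and LSR.L: bits shifted out are lost and the vacated bits become zero. The sketch below limits counts to 0 to 31 because C leaves larger shifts undefined. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

uint32_t lsl_l(uint32_t d, unsigned count) { assert(count < 32); return d << count; }
uint32_t lsr_l(uint32_t d, unsigned count) { assert(count < 32); return d >> count; }
```

For example, lsl_l(0xF8000001, 5) gives 0x00000020: the five leading 1 bits fall off the left end.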
LSR and LSL will become important later on when we want to quickly manipulate our numbers. The next instruction that we use a bit is LEA, which stands for Load Effective Address. An example might be LEA Scalar(A6),A2. This instruction calculates the address given by 16(A6) and stores that address in register A2 (remember that we defined the label Scalar to be the number 16). Note that it doesn't go to the address; it just calculates what the address should be. You probably don't know what 16(A6) would be, but it is not hard to figure out. We take the address stored in register A6 (let's say it's our old friend 128 again) and add 16 to it. We then take the result (128 + 16 = 144) and put it in A2. We can now use the address in A2 for other instructions. This type of instruction is good if we know we want a number stored 16 bytes away from some base address, and that base address will be given to us later. When we get that base address, we can store it in A6 and use our LEA instruction to find the address of our data. Many times we will know the form of a bunch of data (e.g. an integer is stored first, then a real number, then a pointer to an array, i.e. the address of some other data), but we won't know exactly where this data is going to be in RAM. LEA lets us take the beginning of our data and calculate where everything else should be.
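The difference between LEA and a MOVE with the same operand can be sketched in C. LEA only does the addition. MOVE.L d(An),Am goes to the resulting address and fetches 4 bytes. The displacement is a signed 16-bit number, so negative offsets such as -12(A6) work too. The toy RAM and the function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t ram[256];   /* toy RAM */

/* LEA d(An),Am: only computes An + d */
uint32_t lea(int16_t d, uint32_t an) { return an + (uint32_t)(int32_t)d; }

/* MOVE.L d(An),Am, by contrast, fetches the long word at that address */
uint32_t move_l_disp(int16_t d, uint32_t an) {
    uint32_t addr = lea(d, an);
    assert(addr + 3 < sizeof ram);
    return (uint32_t)ram[addr] << 24 | (uint32_t)ram[addr + 1] << 16 |
           (uint32_t)ram[addr + 2] << 8 | ram[addr + 3];
}
```

With A6 = 128, lea(16, 128) is 144, matching the arithmetic in the text.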
The final instructions that we need to look closely at are LINK and UNLK (i.e. LINK A6,#-12 and UNLK A6). If you remember when we were talking about stacks, we said we would remember the beginning address of the stack by putting it in register A6. The technique of beginning your stack at A6 is called using a stack frame. It works as follows. The main Pascal program has its own stack, and it pushes all of the data it needs to give your subroutine onto that stack. Finally, it pushes a return address onto the stack (so you know which instruction to return to when you're done executing your code), and then jumps to your assembly code. You dutifully save the return address by popping it off the stack and moving it someplace safe. You then pop all of the data off the stack and are ready to go.
Figure 10. The Stack After LINK
At this point you most likely want to create your own stack, and the best place is right at the end of the main program's stack. The first thing you do is save the old value of A6 that marks the beginning of the Pascal program's stack. A great way to store this value is to push it right onto the stack. Take the new value of A7 (the address that we are going to use as the base of our new stack) and store that in A6. We are now ready to go: A6 and A7 both point to the beginning of our stack. We then move the stack pointer (A7) down to make room for any storage space that we might want to use. As we push and pop values to and from our stack we will adjust A7 appropriately with the predecrementing and postincrementing commands that we discussed. These steps are illustrated in Figure 10. All of the steps necessary to create a stack frame are contained in the LINK instruction. The instruction LINK A6,#-12 will first push A6 onto the stack after decreasing A7 by 4 (since an address is a long word of 4 bytes). The new value of A7 is copied into A6, since that will be the base of our new stack. Finally, the stack pointer A7 is decremented by 12 bytes to make room for new data that we might want to store there. We decrement rather than increment because, if you remember, our stack grows downward in memory. Decreasing A7 by 12 leaves room for 12 new bytes of storage space (just enough room for us to stash a 96-bit extended number in the 68881 format!).
The nice thing about having A6 point to the beginning of our stack is that when we are ready to jump back to the main program we can take A6 and copy it into A7. That maneuver makes our stack pointer point to the bottom of our stack frame. We then pop the old value of A6 off the stack and put it back into register A6, adding 4 to the contents of A7, just like our old friend the MOVE.L (A7)+,A6 instruction. We have now returned the stack to its state just before we created our stack frame. The 68000 does all of this with a single instruction, UNLK. In our case UNLK A6 will do all of the above steps.
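LINK and UNLK can be sketched in C as the step-by-step sequences described above. The sketch runs on a toy machine with two address registers and a small big-endian RAM. The Cpu struct, its field names and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t a6, a7;
    uint8_t ram[256];
} Cpu;

void put_long(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr + 3 < sizeof c->ram);
    for (int i = 3; i >= 0; i--) { c->ram[addr + i] = (uint8_t)v; v >>= 8; }
}

uint32_t get_long(const Cpu *c, uint32_t addr) {
    assert(addr + 3 < sizeof c->ram);
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) v = (v << 8) | c->ram[addr + i];
    return v;
}

/* LINK A6,#disp */
void link_a6(Cpu *c, int16_t disp) {
    c->a7 -= 4;
    put_long(c, c->a7, c->a6);          /* push the old A6                     */
    c->a6 = c->a7;                      /* A6 marks the base of the new frame  */
    c->a7 += (uint32_t)(int32_t)disp;   /* a negative disp reserves local room */
}

/* UNLK A6 */
void unlk_a6(Cpu *c) {
    c->a7 = c->a6;                      /* discard the locals */
    c->a6 = get_long(c, c->a7);         /* pop the old A6     */
    c->a7 += 4;
}
```

Starting with A7 = 200, LINK A6,#-12 leaves A6 = 196 and A7 = 184. UNLK A6 then restores both registers exactly.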
The second nicety of having A6 point to the bottom of our stack frame is that we can easily store our own variables on the stack. Although we do not know in advance where A6 will point, we do know the relative position of our data on the stack. Let's say one variable will be located 4 bytes below A6, another 6 bytes below A6, and so on. By using addresses of the form -4(A6) and -6(A6) we can access these variables whenever we need to.
Another advantage of addressing all of our data relative to the stack frame in A6 is that we don't necessarily have to waste time moving the variables that we received from our Pascal program off of the stack. We know what the stack looks like after we execute our LINK A6,#-12 instruction (see Figure 10). So we know that the pointer to inMat is located 12 bytes above the address in A6, and that the pointer to outMat is 8 bytes above it. We then use the same form of addressing (e.g. MOVE.L 12(A6),A0) to move the inMat pointer into A0 so we can work with it.
In order to quickly manipulate every element in our matrix, we will create a loop that performs our calculation, then repeats itself with the next element in the matrix, until we get to the end. To create a loop and test for a specific condition (so we know when to stop looping) we will use the instruction DBLT (test the condition "less than", decrement, and branch). This is one member of a class of instructions written in shorthand as DBcc, where cc is a test. The easiest way to think of this instruction is "don't branch if the condition is true." The instruction works by first testing whether the condition is true: did we just execute an instruction whose result was less than zero (in other words, a negative number)? If the prior instruction created a negative number, then we don't branch but instead execute the next instruction. Not branching leaves our loop, whereas branching would bring us back to the beginning of the loop. If the last instruction did not create a negative number (and thus the test failed), then we decrease register D1 by one (only its low word is used as the counter) and see if D1 is now less than zero, that is, -1. If it is, then we don't branch but instead continue on to the next instruction. If neither of these conditions is true, then round we go again, branching to the beginning of our loop. This instruction has several advantages. First, it looks for a specific condition that will end the loop. Failing this, it counts down a register until it is less than zero and then exits the loop.
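The DBcc rule can be sketched in C as a function that reports whether the branch is taken. Here cond stands for whatever condition the cc tests (for DBLT, "less than" from the previous instruction), and only the low word of the register is decremented. dbcc and loop_passes are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DBcc Dn,label: returns true when the branch back to label is taken. */
bool dbcc(bool cond, uint32_t *dn) {
    if (cond)
        return false;                       /* condition true: fall through  */
    uint16_t low = (uint16_t)(*dn - 1);     /* decrement the low word only   */
    *dn = (*dn & 0xFFFF0000u) | low;
    return low != 0xFFFFu;                  /* fall through once it hits -1  */
}

/* How many times a loop body ending in DBcc runs when the condition
   never becomes true, starting from counter value n. */
int loop_passes(uint16_t n) {
    uint32_t d1 = n;
    int passes = 0;
    do { passes++; } while (dbcc(false, &d1));
    return passes;
}
```

A counter that starts at 19 gives 20 passes, one more than its starting value. This is why the Pascal program stores 19 in columns for a 20-element row.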
The last instruction in our program is RTS (Return from Subroutine). This instruction assumes that we have pushed the return address back onto the top of the stack, and it will jump to that address. If we had stored our return address in A0, we could have accomplished the same feat by using the command JMP (A0) (jump to the address stored in register A0).
SANE vs. the 68881
Although SANE and the 68881 both use the same basic format for an extended number, there is one fundamental difference. In SANE the extended number is 80 bits (10 bytes) long: a 16-bit sign/exponent word followed by a 64-bit mantissa. The 68881, on the other hand, places two bytes of zeros between the sign/exponent word and the mantissa, making the number 96 bits (12 bytes) long (this is illustrated in Figure 11). The 68881 does this because it is more efficient for it to move three long words than to manipulate 2 1/2 long words. Thanks to the empty integer we placed in front of each SANE number, the sign/exponent word sits in the low half of a long word. So we can easily convert between the formats by grabbing that long word and shifting it 16 places to the left (using the LSL instruction). When we want to return the number to SANE we simply shift the top long word of our 68881 extended number back to the right 16 places (using LSR)15. Since an immediate shift count can only be 1 to 8, a 16-place shift takes either two shifts of 8 or a count held in a data register.
Figure 11. SANE vs. 68881 extended numbers
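To make the in-place conversion concrete, here is a small C sketch that performs the same 16-bit shift on a 12-byte buffer laid out like one of our elements (a spare word followed by a 10-byte SANE extended number). The helper names are hypothetical, and the bytes are handled one at a time in big-endian order, the way the 68000 stores them, so the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian long-word access, the byte order the 68000 uses. */
static uint32_t get_long(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_long(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* elem: spare word, SANE sign/exponent word, 64-bit significand (12 bytes).
   Shifting the first long word left 16 bits moves sign/exponent into the
   spare word and leaves the zero padding word the 68881 expects. */
static void sane_to_fpu(uint8_t elem[12])
{
    put_long(elem, get_long(elem) << 16);
}

/* The reverse: shift the first long word right 16 bits, which puts the
   sign/exponent word back directly in front of the significand. */
static void fpu_to_sane(uint8_t elem[12])
{
    assert(elem[2] == 0 && elem[3] == 0);   /* padding word must be zero */
    put_long(elem, get_long(elem) >> 16);
}
```

For example, the extended value 1.0 has the sign/exponent word $3FFF and the significand $8000000000000000. After sane_to_fpu the buffer begins $3FFF 0000 8000, and fpu_to_sane moves the $3FFF back into the third and fourth bytes; the significand itself is never touched.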
The interface for a Pascal calling routine
A Pascal program calls a procedure in a very orderly fashion. In our example the procedure that we will write in assembly language is declared to Pascal as:
FUNCTION ScaleMult (scalar : extended;VAR inMat, outMat : matrix) : integer;
Pascal will prepare to call our subroutine by first pushing enough space onto the stack for our result. If our subroutine were declared as a procedure, Pascal would not expect a return value and would skip this step. Pascal then pushes the parameters onto the stack in the order they are written, from left to right. In our case Pascal would next push a pointer to the variable scalar. Pascal pushes a pointer16 to the data if it is declared as a VAR parameter (meaning that the subroutine can change the actual variable), or if the data is larger than four bytes in size17; scalar is a 10-byte extended number, so it is passed by address. A pointer to inMat is then pushed onto the stack, followed by a pointer to outMat. Finally the return address is pushed onto the stack and the program jumps to our subroutine. Pascal cannot do any type checking when it jumps to an assembly language routine. It simply assumes you know what kind of data you are expecting, and that you will place the correct result on the stack (if your routine is a function and not a procedure) when you return to the main program.
Figure 12. The Stack as passed from Pascal
When the Pascal calling routine makes space for the return value (the very first thing we said Pascal pushes onto the stack when it calls a function), it follows the same rule: if the result is 4 bytes or less it reserves room for the value itself, but if the result is larger than 4 bytes, as an extended number is, it pushes a pointer to the variable instead. We would then use that address to store the result of our function. When we return to Pascal we would leave the address of our return variable on the stack so the calling Pascal program can remember where to get its result.
In our assembly language program we will remove all the information from the stack, complete our routine, then push an integer (2 bytes) onto the stack as a result. Our final maneuver will be to jump back to the return address that was given to us initially. Figure 12 shows us the stack as it would appear when we initially enter our subroutine.
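The whole calling sequence can be acted out in C with a simulated stack. This is only a sketch: the 64-byte array, the helper names and the addresses pushed for scalar, inMat, outMat and the return address are all made up, and the routine's actual arithmetic is left out. What it does show is where each item sits on entry, following the push order described above (return address at 0(SP), outMat at 4(SP), inMat at 8(SP), scalar at 12(SP), and the 2-byte result slot at 16(SP)), and that the exit sequence leaves just the integer result on the stack for Pascal to pop.

```c
#include <assert.h>
#include <stdint.h>

/* A hypothetical 64-byte stack. sp plays the part of A7: it grows downward
   and words are stored big-endian, as on the 68000. */
static uint8_t stk[64];
static int sp = (int)sizeof stk;

static void push_w(uint16_t v)
{
    sp -= 2;
    stk[sp] = (uint8_t)(v >> 8);
    stk[sp + 1] = (uint8_t)v;
}

static uint16_t pop_w(void)
{
    uint16_t v = (uint16_t)((stk[sp] << 8) | stk[sp + 1]);
    sp += 2;
    return v;
}

static void push_l(uint32_t v)
{
    push_w((uint16_t)v);           /* low word lands at the higher address */
    push_w((uint16_t)(v >> 16));   /* high word on top */
}

static uint32_t peek_l(int offset) /* the long word at offset(SP) */
{
    const uint8_t *p = &stk[sp + offset];
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* What the routine finds on the stack when it is entered. */
struct entry_frame { uint32_t ret, out_mat, in_mat, scalar; };

/* The routine: read its arguments, then run the exit sequence from the text.
   Returns the return address in place of actually jumping to it. */
static uint32_t scale_mult_model(int16_t result, struct entry_frame *seen)
{
    seen->ret     = peek_l(0);
    seen->out_mat = peek_l(4);
    seen->in_mat  = peek_l(8);
    seen->scalar  = peek_l(12);

    uint32_t a0 = peek_l(0);       /* pop the return address into A0 ...   */
    sp += 4 + 3 * 4 + 2;           /* ... and drop 3 pointers + result slot */
    push_w((uint16_t)result);      /* push the integer result */
    return a0;                     /* stands in for JMP (A0) */
}

/* The Pascal side: reserve the result, push arguments left to right,
   "call", then pop the function result. */
static int16_t call_scale_mult(int16_t result, struct entry_frame *seen)
{
    push_w(0);                     /* space for the integer result */
    push_l(0x1000);                /* @scalar (extended is passed by address) */
    push_l(0x2000);                /* @inMat (VAR) */
    push_l(0x3000);                /* @outMat (VAR) */
    push_l(0x4000);                /* return address */
    uint32_t back = scale_mult_model(result, seen);
    assert(back == 0x4000);
    return (int16_t)pop_w();
}
```

After the call, the stack pointer is back where it started: every byte Pascal pushed has been accounted for, which is exactly the bookkeeping the assembly routine must get right.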
One point I would like to clarify is this business of Pascal pushing a pointer onto the stack if that variable is more than four bytes long, or if it is declared as a VAR in the interface statement. Figure 13 shows the size of some common variables used in Pascal.
Variable                   What is passed on the stack
integer                    2 bytes
longInt                    4 bytes
real                       4 bytes
extended                   4-byte pointer to the 10-byte SANE number
pointer                    4 bytes
string                     4-byte pointer to your string (if it is over 4 bytes in size).
                           A string takes up n+1 bytes, where n is the number of
                           characters; the very first byte tells you how many
                           characters are in the string.
boolean                    2 bytes (put your result in the high-order byte of the
                           word, the byte at the lower address, when you return to
                           Pascal)
char                       2 bytes
anything declared as VAR   4-byte pointer to the variable in RAM
Figure 13. Size of Pascal Variables Pushed onto the stack
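The rule behind Figure 13 is simple enough to write down as a function. This is a minimal sketch with a made-up name (stack_bytes); it assumes, as the table does for char and boolean, that values smaller than a word are padded to a full 2-byte word, since the 68000 keeps the stack pointer on an even address.

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes one parameter occupies on the stack (hypothetical helper).
   value_size - size of the Pascal value in bytes (integer 2, extended 10, ...)
   is_var     - true if the parameter is declared VAR */
static int stack_bytes(int value_size, bool is_var)
{
    assert(value_size > 0);
    if (is_var || value_size > 4)
        return 4;                       /* passed as a 4-byte pointer */
    return (value_size + 1) & ~1;       /* by value, rounded up to a word */
}
```

With it, an integer (2 bytes) takes 2, a char or boolean (1 byte) takes 2, a longInt or real (4 bytes) takes 4, and an extended (10 bytes) or any VAR parameter takes a 4-byte pointer.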
Remember, if we wrote a procedure instead of a function we would leave the stack empty when we returned to the main program. An illuminating example would be if we had declared our function to return an extended number. Since we don't want to push the entire 10 bytes of data onto the stack, we simply store our data at the address supplied by the calling routine. I always considered it courteous of Pascal to take care of the memory for any variables our function needs to return.
In our program, we are returning an integer which is two bytes in size, so we can just push the actual value of the integer onto the stack, and not worry about pointers at all. Remember that Pascal has already made enough room on the stack to hold our integer.
Pascal lets us freely use registers A0, A1, D0, D1, and D2. We can of course use any of the other registers on the 68000 chip, just so long as we save the values stored there and put them back before we return to the Pascal environment.
Now that we understand all the basic instructions, let's go through our assembly language code. Our program receives three pointers (or addresses) from Pascal. The data is stored at these addresses in SANE extended number format. We take each of these numbers and shift over the top word so it is in the correct format for the 68881 chip. Next, we move one element of our matrix (from inMat) onto the 68881, along with the scalar, multiply them, then save the result in the output matrix (outMat). We repeat this for all of the numbers in our matrix before returning to the main program. The LINK instruction is used to illustrate how we can make room on the stack for our own variables and then easily access them at fixed offsets from the frame pointer that LINK sets up. Our funny data structure for an element may make a little more sense now, in light of the above differences between the SANE and 68881 representations of an extended number. Each element of our vector is a record of the form:
element = PACKED RECORD
    empty : integer;
    n     : extended
  END;
The empty integer that I have thrown in before each SANE extended number takes up one word of space (16 bits). This gives us enough room to change each of our 80-bit SANE extended numbers into the larger 96-bit 68881 format. Because we have this extra room we can store our numbers right back in their array. This may not be important for our simple little program, but if we were doing a lot of math with these numbers, converting back and forth for each mathematical operation would take a lot of time. This way we can convert each element to the 68881 extended format when we start our program and convert them all back to the SANE format when we are finished.
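The LINK instruction mentioned above does three things: it pushes the address register you name (A6 by convention), copies the stack pointer into that register, and then adds a negative displacement to the stack pointer to reserve room for local variables. UNLK undoes all of this. Here is a rough C model of that bookkeeping; the names (ram, sp, do_link, do_unlk) and the addresses in the example are hypothetical. Assuming LINK runs before anything else is pushed, the return address ends up at 4(A6), the last parameter Pascal pushed (outMat in our case) at 8(A6), and the locals at negative offsets such as -8(A6).

```c
#include <assert.h>
#include <stdint.h>

static uint8_t ram[256];               /* hypothetical RAM */
static uint32_t sp = sizeof ram;       /* plays the part of A7 */

static void write_long(uint32_t addr, uint32_t v)   /* big-endian store */
{
    ram[addr]     = (uint8_t)(v >> 24);
    ram[addr + 1] = (uint8_t)(v >> 16);
    ram[addr + 2] = (uint8_t)(v >> 8);
    ram[addr + 3] = (uint8_t)v;
}

static uint32_t read_long(uint32_t addr)             /* big-endian load */
{
    return ((uint32_t)ram[addr] << 24) | ((uint32_t)ram[addr + 1] << 16) |
           ((uint32_t)ram[addr + 2] << 8) | ram[addr + 3];
}

/* LINK An,#disp */
static void do_link(uint32_t *an, int16_t disp)
{
    assert(disp <= 0 && disp % 2 == 0);  /* locals grow downward, word aligned */
    sp -= 4;
    write_long(sp, *an);                 /* push the old An */
    *an = sp;                            /* An now marks the frame */
    sp -= (uint32_t)(-disp);             /* reserve -disp bytes of locals */
}

/* UNLK An */
static void do_unlk(uint32_t *an)
{
    sp = *an;                            /* discard the locals */
    *an = read_long(sp);                 /* pop the saved An */
    sp += 4;
}
```

In a run with 8 bytes of locals, A6 ends up pointing at the saved copy of the old A6, with the caller's return address just above it, and UNLK restores both A6 and the stack pointer exactly.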
A vector is simply defined as an array of many elements.
vector = ARRAY[0..19] OF element;
matrix = RECORD
    rows    : integer;
    columns : integer;
    vecPtr  : ^vector
  END;
A matrix is a data structure that specifies the number of rows and columns it will contain, followed by a pointer (vecPtr) to a vector that is rows × columns elements in size. Again, all of this is not strictly needed for our code, but it does illustrate how we can write a general algorithm that takes a matrix of arbitrary size (up to the 20 elements a vector can hold), does whatever math is needed, and then stores the results in a new matrix.
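For comparison, here is roughly what ScaleMult computes, written in C instead of Pascal. It is a sketch, not a translation of the article's code: the struct and function names are hypothetical, long double stands in for SANE extended (with no spare word, since no format conversion is needed), and, because the article does not say what the integer result means, this version simply returns the number of elements it processed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of the Pascal matrix record. */
typedef struct {
    int rows;
    int columns;
    long double *elems;     /* rows * columns values, like vecPtr^ */
} matrix;

/* Multiplies every element of *in by scalar and stores it in *out.
   The return value (elements processed) is an assumption. */
static int scale_mult(long double scalar, const matrix *in, matrix *out)
{
    assert(in->rows == out->rows && in->columns == out->columns);
    size_t n = (size_t)in->rows * (size_t)in->columns;
    for (size_t i = 0; i < n; i++)
        out->elems[i] = scalar * in->elems[i];
    return (int)n;
}
```

This is the loop the assembly version speeds up: the same multiply per element, with the 68881 doing the arithmetic and DBLT doing the counting.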
How to Compile and Put the Program Together
I have always hated articles that give you great code18, but then leave you in the lurch as to exactly how they put it all together. I'm going to give instructions for using the MPW assembler, since it already supports the 68881 and 68020 chips.
First of all, open the MPW shell and type Command-N to get a fresh document. Type in all of the assembly code as it is written (of course you can leave out the comments if you want). Save this document as ScalarMult.a. Click on the Worksheet19 and type the magical incantation Asm ScalarMult.a followed by the Enter key (or Command-Return if you don't have an Enter key on your keyboard). [Please note that the Return key and Enter key are treated differently in the MPW environment.] If you didn't make any mistakes, MPW will have just created a file named ScalarMult.a.o. If everything worked out all right, quit MPW.
Version 1.11 of LightSpeed Pascal includes a little utility called the .O Converter. Double-click on its icon, then select our assembled file ScalarMult.a.o from the dialog box. After you quit the .O Converter, you will see that we have just created a LightSpeed library file.
Our final chore will be to launch LightSpeed Pascal and open a new project. First get a fresh window up, and type in the Pascal main program. Next go to the Project menu and select Add Files. Add the main program that we just typed in, along with the library file we created from our assembly code. That's all there is to it! You can either run the program by typing Command-G (for Go) from within LightSpeed, or elect to build a stand-alone program20.
Remember that our routine executes the _Debugger trap as its very first instruction, so the program will drop into Macsbug and we can follow it step by step. If you don't have Macsbug in your System Folder, remove the _Debugger instruction or the computer will bomb with an ID=1 error.
In this article, I hope I have introduced enough basic information to let you write an assembly language subroutine. We looked at the basic 68000 instructions and a few instructions from the 68881 numeric coprocessor chip. As we saw, 68881 instructions are easily executed, as if they were part of a souped-up 68000 CPU.
We saw how Pascal calls functions and procedures, and we created a short assembly language subroutine that could store its own local variables and manipulate them. Finally, we looked at how to assemble the code with the MPW assembler and interface it with LightSpeed Pascal. I hope that this introduction lets you see that assembly language is not as intimidating as it looks, and that by selectively rewriting certain Pascal procedures we can realize a great increase in the speed of many of our programs, especially those that are calculation intensive.
---------- Footnotes ----------
1. Apple's Standard Apple Numerics Environment, which handles complex math very well, if a little slowly.
2. Note that in many cases we are presenting a simplified picture of what our assembly language instructions do. If you are an experienced assembly language programmer, you may note some deviations from fact (i.e. white lies). These are meant to shield the inexperienced user from unnecessary, complicating details, and will be noted. Come to think of it, if you're such a hotshot programmer, why do you need to read this? See any of the books in the reference section for a more complete picture of assembly language programming.
4. Central Processing Unit.
5. I know... when you ask the salesman, they've never heard of it.
6. Macintosh Programmer's Workshop
7. And the only native Mac language system supported by Apple for a long time
8. Yes, I know the definition of AssUMe.
9. If you didn't know it, you do now.
10. A hexadecimal number is a number, similar to our ordinary base 10 counting numbers. Instead of counting from zero to nine (then indicating ten by putting a one in the next significant place, as in 10), in hexadecimal we count from zero to fifteen (representing the numbers ten through fifteen with the symbols A through F). We would then represent the next number, 16, by putting a one in the next significant place, as in $10. I will use the $ sign before any hexadecimal number so you don't confuse $10 with the decimal number ten. Hexadecimal notation is a convenient way to write numbers when we deal with computers. See any of the books in the reference section for a good review of the subject.
11. Not counting the comments
12. The program that translates our written words into the actual numbers that the 68000 and 68881 understand
13. At least for us humans. The computer prefers numbers and will substitute numbers for all of our words, including the instructions. Oh well, to each his own.
14. Not exactly: each bit first goes to a place called the Carry bit before it drops off into never-never land, but that's not really important.
15. An even easier way to accomplish the same thing would be to MOVE the sign/exponent word directly, for example using the instruction MOVE.W 2(A0),(A0), provided A0 points to the beginning of the element (the spare word). You should always follow this with CLR.W 2(A0) to make sure that bits 64 through 79 are zero, just to ensure compatibility with future products.
16. A pointer is simply the address of where our data starts in RAM
17. Note that the pointer given to us actually points to a copy of the variable, known as a dummy variable. This will prevent us from accidentally changing the value of the variable in the main program.
18. Don't worry, I'm not assuming that you have gotten any great code from this article.
19. It's the document that doesn't have a go-away box.
20. Of course you could have compiled and linked your program entirely from within MPW using MPW Pascal, but if you know how to do all of that, you probably didn't need my help to begin with.
---------- Bibliography ----------
MPW and Assembly Language Programming for the Macintosh, Scott Kronick, Hayden Books, 1987
Programming the Macintosh in Assembly Language, James Coffron, Sybex Inc., 1986
How to Write Macintosh Software, Scott Knaster, Hayden Book Company, 1986