Workstation
Volume Number:   3

Issue Number:   3

Column Tag:   MacCad

Workstation Potential of the New Macs
By Paul Zarchan, Cambridge, Mass
Paul Zarchan is an engineer for a well-known East Coast laboratory with a heavy investment in engineering workstations. In this article, Paul examines the potential of the new Macs to penetrate the world of Sun, Apollo and other desktop CAD systems. MacTutor is committed to covering MacCad, an important area of applications development on the new Macs, in the coming year, and this is the first article for the new column.
Introduction
Today there are nearly 10 million personal computers in both the home and business world, including nearly 1 million Macs, according to Apple's press releases. Personal computer hardware has improved greatly since the first Apple II was introduced 10 years ago. In addition, the personal computer industry has spawned new software applications such as word processing, spreadsheets and desktop publishing. These desktop computers also have another, albeit relatively unknown, capability: computing! It will be demonstrated that a currently available $6,000 Prodigy 4 (Macintosh plus Levco board) can run engineering simulations at approximately one half the speed of a $250,000 super minicomputer and about one tenth the speed of a multimillion dollar mainframe. With the advent of a new family of Macintosh workstations based on technology similar to the Prodigy board, it is expected that Apple will move heavily into the engineering workstation market. How effective will this new market be for Apple? If this article is any indication, the new Macs will be quite successful indeed!
A New World for Micros
Digital computer simulation is often used as an aid in solving many engineering problems. For example, simulation is the heart of many circuit analysis programs. In this application, system dynamics are modeled with differential equations and numerical integration techniques are utilized for solving these equations. Unlike most PC software applications, the speed of floating point arithmetic is critical to simulation. A slow computer or inefficient language can make the simulation of high order systems unattractive because of excessive computation time. It will be demonstrated that for simulation on a Macintosh, MacFORTRAN provides execution speed that is unmatched by any other high level language. A chart is presented showing how to predict execution time based upon system order, simulation time and integration step size for simulations written in FORTRAN on a variety of computers.
A simple, standard floating point benchmark, written in interpretive BASIC, is first used to compare a variety of 8, 16 and 32 bit commercially available PCs. Based upon the relative execution speeds, an empirical relationship involving number of bits and computer clock rate is developed, not only to understand why certain micros are faster than others but also to predict the relative speeds of PCs which are not yet commercially available. Next a simulation example, involving the transient response of a Butterworth filter, is programmed in FORTRAN. It is demonstrated that although this more realistic example yields different speed comparisons between micros than did the BASIC floating point benchmark, the trend is in the same direction: more bits and higher clock rates increase computation speed. The same simulation example is then run on a Macintosh computer using a variety of computer languages. It is shown that on this computer FORTRAN and compiled BASIC offer computation speeds superior to that of either compiled PASCAL or C. Finally, using FORTRAN as the common language for the simulation example, a variety of micro, super mini and mainframe computers are compared in terms of execution speed. It is shown that a 32 bit, 16 MHz Prodigy 4 (Macintosh computer plus Levco board) can run simulations at approximately one half the speed of a VAX 785 and approximately one tenth the speed of an IBM 3084Q.
Microcomputer Comparison
In order to compare microcomputer speed, a common example written in a common language is required. A standard benchmark, representative of single precision arithmetic, was developed by BYTE magazine using the BASIC language. BASIC was chosen as the standard because it is available on all microcomputers. The program listing appears below. Note that the question of accuracy, which of course is vital to engineering applications, is not addressed here.
10 PRINT TIME$
20 a=2.71828
30 b=3.14159
40 c=1
50 FOR i=1 TO 5000
60 c=c*a
70 c=c*b
80 c=c/a
90 c=c/b
100 NEXT i
110 PRINT TIME$
120 END
Listing 1 - BYTE Floating Point Benchmark
We can see from the listing that the program executes 5,000 passes of the loop, each with two multiplies and two divides, for a total of 10,000 single precision multiplies and 10,000 single precision divides. The timing results for a variety of commercially available PCs, along with other information, appear in Table 1. The last column of Table 1 gives the running time of an IBM PC divided by the running time of the indicated machine. A ratio of 2.2 means that the indicated microcomputer was 2.2 times faster than an IBM PC for this BASIC benchmark. We shall see later that the ratios for computer speed using BASIC on this benchmark are indicative of relative computer speeds using other languages and more realistic problems.
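The operation count and the ratio column can be checked with a short sketch. It is written in Python rather than the BASIC of Listing 1, so it runs in double precision on a modern machine and its timing says nothing about the micros in Table 1. The function names `byte_benchmark` and `speed_ratio` are illustrative assumptions, not part of the BYTE benchmark.

```python
import time


def byte_benchmark(iterations=5000):
    """Python port of the loop in Listing 1, with operation counters added."""
    a = 2.71828
    b = 3.14159
    c = 1.0
    multiplies = 0
    divides = 0
    start = time.perf_counter()
    for _ in range(iterations):
        c = c * a
        c = c * b
        c = c / a
        c = c / b
        multiplies += 2   # two multiplies per pass
        divides += 2      # two divides per pass
    elapsed = time.perf_counter() - start
    return c, multiplies, divides, elapsed


def speed_ratio(ibm_pc_seconds, machine_seconds):
    """Ratio column of Table 1: IBM PC time divided by the machine's time."""
    return ibm_pc_seconds / machine_seconds


c, multiplies, divides, elapsed = byte_benchmark()
print("multiplies:", multiplies, "divides:", divides)
print("final c (should stay close to 1):", c)
print("IBM AT ratio from the Table 1 times:", round(speed_ratio(69, 26), 2))
```

The counters slow the loop slightly, so the sketch is meant for checking the arithmetic of the benchmark, not for timing it.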
Table 1 indicates that the microcomputers fall into three different speed classes. Generally these speed classes are a function of the number of bits and the computer clock rate. The table shows that, contrary to popular opinion, the 16 bit 80286 based IBM AT is in the same class as the 68000 based 16 bit Atari, Macintosh and Amiga computers for the BASIC benchmark. The table also shows that clock rate is important in determining computer speed. In fact, to a first approximation, doubling either the clock rate or the number of bits doubles the computer speed. For example, a Prodigy 4 is approximately 4 times as fast as a Macintosh (32 bits at 16 MHz versus 16 bits at 7.8 MHz) and a 6 MHz IBM AT is nearly 2.7 times faster than an IBM PC (16 bits at 6 MHz versus 8 bits at 4.77 MHz). In the IBM world the importance of clock rate is well known. That is why there exist many third party upgrades that increase the clock rate of an IBM AT or compatible up to 12 MHz. Figure 1 displays, in bar chart form, the actual speed for the BASIC benchmark of various micros relative to an IBM PC. Superimposed on the chart is a line graph using the empirical relationship involving bits and clock rate. The proximity of the line graph and bars shows that the simple empirical relationship is quite accurate in the case of the standard BASIC benchmark.
Machine                CPU     Bits   Clock (MHz)   Time (sec)   Ratio
Commodore 128          -       8      -             60           1.15
Apple 2E               6502    8      1.0           94           0.73
IBM PC, XT             8088    8      4.77          69           1
Tandy 1000             8088    8      4.77          68           1.01
IBM AT                 80286   16     6             26           2.65
Compaq 286             80286   16     6             22           3.1
Atari 520 ST           68000   16     8             31           2.2
Amiga                  68000   16     7.16          18           3.8
Macintosh 512, Plus *  -       16     7.8           21           3.29
Compaq 386             80386   32     16            7            9.86
Prodigy 4              68020   32     16            5            13.8
* Language was MS Basic 3.0 Interpreter
Table 1 - Microcomputer Speed Comparison
Fig. 1 Speed Improvement Predicted Knowing Bits and Clock Rate.
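The empirical relationship behind the line graph of Fig. 1 is described only in words: doubling the bits or doubling the clock rate doubles the speed. One plausible reading, assumed here and not confirmed by the article, is a product of the two normalized to the 8 bit, 4.77 MHz IBM PC. The Python sketch below (the name `predicted_ratio` is hypothetical) compares that assumed form with a few measured ratios copied from Table 1.

```python
IBM_PC_BITS = 8
IBM_PC_CLOCK_MHZ = 4.77


def predicted_ratio(bits, clock_mhz):
    """Assumed empirical form: speed scales with bits times clock rate,
    normalized so that the IBM PC itself has a ratio of 1."""
    return (bits / IBM_PC_BITS) * (clock_mhz / IBM_PC_CLOCK_MHZ)


# (machine, bits, clock in MHz, measured ratio) taken from Table 1
table1_rows = [
    ("IBM AT", 16, 6.0, 2.65),
    ("Amiga", 16, 7.16, 3.8),
    ("Macintosh 512, Plus", 16, 7.8, 3.29),
    ("Compaq 386", 32, 16.0, 9.86),
    ("Prodigy 4", 32, 16.0, 13.8),
]

for machine, bits, clock, measured in table1_rows:
    print(f"{machine:22s} predicted {predicted_ratio(bits, clock):6.2f}"
          f"   measured {measured:6.2f}")

# Prodigy 4 relative to a Macintosh, as in the example in the text
print("Prodigy 4 / Macintosh:",
      round(predicted_ratio(32, 16.0) / predicted_ratio(16, 7.8), 2))
```

Under this assumed form the fit is close for some machines and looser for others, which is consistent with the relationship being a rule of thumb.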
Simulation Example
In order to compare various machines, a more realistic example, representative of simulation, is taken from filtering theory. A sixth order Butterworth filter can be represented by the transfer function:
$$\frac{y}{x} = \frac{1}{1 + a_1\frac{s}{w_0} + a_2\frac{s^2}{w_0^2} + a_3\frac{s^3}{w_0^3} + a_4\frac{s^4}{w_0^4} + a_5\frac{s^5}{w_0^5} + a_6\frac{s^6}{w_0^6}}$$
where x is the filter input, y is the filter output, w0 is the natural frequency of the filter and s is the Laplace transform notation for a derivative. The coefficients of a sixth order Butterworth filter are given by:
a1 = 3.86
a2 = 7.46
a3 = 9.13
a4 = 7.46
a5 = 3.86
a6 = 1
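Cross multiplying the numerator and denominator of the transfer function and solving for the highest derivative yields the following sixth order differential equation:
$$y^{(6)} = \frac{w_0^6}{a_6}\left(x - \frac{a_5}{w_0^5}y^{(5)} - \frac{a_4}{w_0^4}y^{(4)} - \frac{a_3}{w_0^3}y^{(3)} - \frac{a_2}{w_0^2}y^{(2)} - \frac{a_1}{w_0}y^{(1)} - y\right)$$
where $y^{(n)}$ denotes the n-th derivative of y with respect to time. This differential equation can be represented in block diagram form as shown in Fig. 2. In this diagram each "1/s" represents an integration. The output of each integrator has been labeled x(1) through x(6).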
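For simulation purposes it is more convenient to express the sixth order differential equation describing the Butterworth filter as 6 first order differential equations. These equations can be obtained from Fig. 2 by inspection and are given by:
$$\dot{x}_1 = x_2,\quad \dot{x}_2 = x_3,\quad \dot{x}_3 = x_4,\quad \dot{x}_4 = x_5,\quad \dot{x}_5 = x_6$$
$$\dot{x}_6 = \frac{w_0^6}{a_6}\left(x - \frac{a_5}{w_0^5}x_6 - \frac{a_4}{w_0^4}x_5 - \frac{a_3}{w_0^3}x_4 - \frac{a_2}{w_0^2}x_3 - \frac{a_1}{w_0}x_2 - x_1\right)$$
where $x_i$ is the integrator output labeled x(i), x without a subscript is the filter input, and the filter output is y = x(1).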
A FORTRAN listing of the simulation of the sixth order Butterworth filter appears below.
C     Simulation of a sixth order Butterworth filter using
C     second order Runge-Kutta integration.
      INTEGER ORDER
      DIMENSION X(6),XOLD(6),XD(6)
      DATA B1,B2,B3,B4,B5,B6,W0/3.86,7.46,9.13,7.46,3.86,1.,50./
      DATA XIN/1./
      ORDER=6
C     Precompute the powers of the natural frequency W0.
      W02=W0*W0
      W03=W02*W0
      W04=W03*W0
      W05=W04*W0
      W06=W05*W0
C     Zero the initial conditions.
      DO 10 I=1,ORDER
      X(I)=0.
   10 CONTINUE
      T=0.
      H=.0001
      S=0.
C     Main loop: integrate until T reaches .5 sec.
    5 IF(T.GE..5)GOTO 999
      S=S+H
      DO 20 I=1,ORDER
      XOLD(I)=X(I)
   20 CONTINUE
C     First pass: Euler step using the derivatives at time T.
      CALL INTEG(T,XD,X,W02,W03,W04,W05,W06)
      DO 30 I=1,ORDER
      X(I)=X(I)+H*XD(I)
   30 CONTINUE
      T=T+H
C     Second pass: average the two derivative estimates.
      CALL INTEG(T,XD,X,W02,W03,W04,W05,W06)
      DO 40 I=1,ORDER
      X(I)=(XOLD(I)+X(I))/2.+.5*H*XD(I)
   40 CONTINUE
C     Write the filter output about every .005 sec.
      IF(S.GE..004999)THEN
      S=0.
      WRITE(9,*)T,X(1)
      END IF
      GOTO 5
  999 CONTINUE
      END
C     Derivatives of the six filter states.
      SUBROUTINE INTEG(T,XD,X,W02,W03,W04,W05,W06)
      SAVE
      INTEGER ORDER
      DIMENSION X(6),XOLD(6),XD(6)
      DATA B1,B2,B3,B4,B5,B6,W0/3.86,7.46,9.13,7.46,3.86,1.,50./
      DATA XIN/1./
      ORDER=6
      XD(1)=X(2)
      XD(2)=X(3)
      XD(3)=X(4)
      XD(4)=X(5)
      XD(5)=X(6)
      XD(6)=W06*(XIN-B5*X(6)/W05-B4*X(5)/W04
     1 -B3*X(4)/W03-B2*X(3)/W02-B1*X(2)/W0-X(1))/B6
      RETURN
      END
Listing 2 - FORTRAN Simulation of Sixth Order Butterworth Filter
We can see from the listing that the integration step size, H, is .0001 sec and that the numerical integration method is second order Runge-Kutta. Since the simulation time is 0.5 sec, the ratio of the simulation time to the step size is 5000. Because the second order method evaluates the derivatives twice per step, 10,000 passes are made to the integration subroutine. The extremely small integration step size was chosen to ensure accurate answers if the natural frequency of the filter was made much larger. The resultant filter transient response due to a step input (x=1), as obtained on any of the 8, 16 or 32 bit micros tested, is shown in Fig. 3.
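For readers without a FORTRAN compiler, the same integration scheme can be sketched in Python. This is a port under stated assumptions, not the code that was timed: it runs in double precision, and it takes a fixed count of 5000 steps instead of testing T against .5, so that floating point round-off cannot change the step count. The names `derivatives` and `simulate` are illustrative.

```python
def derivatives(x, x_in, w0, b):
    """State derivatives of the filter; x[0]..x[5] stand for x(1)..x(6)."""
    b1, b2, b3, b4, b5, b6 = b
    xd = [x[1], x[2], x[3], x[4], x[5], 0.0]
    xd[5] = w0**6 * (x_in - b5 * x[5] / w0**5 - b4 * x[4] / w0**4
                     - b3 * x[3] / w0**3 - b2 * x[2] / w0**2
                     - b1 * x[1] / w0 - x[0]) / b6
    return xd


def simulate(w0=50.0, h=0.0001, t_end=0.5, x_in=1.0):
    """Second order Runge-Kutta integration of the filter step response."""
    b = (3.86, 7.46, 9.13, 7.46, 3.86, 1.0)
    n_steps = round(t_end / h)      # assumption: fixed step count
    x = [0.0] * 6
    calls = 0
    for _ in range(n_steps):
        x_old = x[:]
        # First pass: Euler step with the derivatives at the start of the step
        xd = derivatives(x, x_in, w0, b)
        calls += 1
        x = [xi + h * xdi for xi, xdi in zip(x, xd)]
        # Second pass: average the two derivative estimates
        xd = derivatives(x, x_in, w0, b)
        calls += 1
        x = [(xo + xi) / 2.0 + 0.5 * h * xdi
             for xo, xi, xdi in zip(x_old, x, xd)]
    return x[0], n_steps, calls


y_final, n_steps, calls = simulate()
print("steps:", n_steps, "derivative passes:", calls)
print("filter output at t = 0.5 sec:", y_final)
```

With a unit step input the output should settle near 1, since setting all derivatives to zero in the state equations leaves x(1) equal to the input.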
Fig. 2 Block Diagram of Sixth Order Butterworth Filter
Fig. 3 Filter Transient Response Due to a Step Input
FORTRAN Comparison
So far it has been shown how the various microcomputers perform relative to one another in terms of single precision floating point multiplies and divides for the BYTE BASIC benchmark. The simulation of the sixth order Butterworth filter, using the FORTRAN source code of Listing 2, is now run on PCs representative of the 8, 16 and 32 bit world and the running times are compared.
The machines used in this comparison are the original IBM PC, an improved PC, an IBM AT, a Macintosh and a Prodigy 4. First we must determine whether the relative speeds of the computers on the BASIC benchmark are indicative of the speed ratios for this more realistic simulation example written in FORTRAN. The performance of the machines is compared with and without math coprocessors. Table 2 presents the running time comparisons.
Table 2 - FORTRAN Running Time Comparison
First we notice that the original IBM PC is very slow relative to the other machines on the simulation example, much slower than was predicted by the BASIC benchmark. However, newer versions of the 4.77 MHz, 8 bit IBM PC and clones are significantly faster, and their speed relative to other machines is more accurately predicted by the BASIC benchmark. For example, the IBM AT is about twice as fast as the improved IBM PC and the Prodigy 4 is four times faster than the Macintosh, as was predicted by the BASIC benchmark. Using the math coprocessor significantly improves the speed of both the Prodigy 4 and the improved IBM PC. However, using the math coprocessor on an IBM AT results in negligible speed improvement, because the math coprocessor operates at 4 MHz while the machine is running at 6 MHz. Figure 4 depicts the speed of each machine relative to the original IBM PC for the simulation example. Here we can see that the 32 bit Prodigy 4 is nearly 35 times faster than the original 8 bit IBM PC. When the math coprocessor is used it is nearly 70 times faster. Clearly much has improved since the introduction of the first IBM PC 5 years ago.
The sample problem was also run in FORTRAN on two super minicomputers and one mainframe computer. The running times are summarized in Table 3.
In this table the running time for the larger machines corresponds to CPU time with a single-user load on a time-sharing system. Usually large machines are shared among many users and the CPU time is indicative only of what the user is charged for a session. In addition, on large machines the turnaround time (the elapsed time it takes the user to get the output) may be hours even though the CPU time may be in seconds. On a micro the CPU time is the turnaround time. Nonetheless, Table 3 indicates that the Prodigy 4 is only 2.4 times slower than the VAX/785 and 12 times slower than the mainframe. Considering that the Prodigy 4 costs about $6,000 whereas the VAX/785 is about $250,000 and the IBM/3084Q is several million dollars, the comparison is even more impressive. More importantly, the sample problem could be solved on any of the machines in a very reasonable amount of time. In the case of the microcomputers the user gets the answers immediately, whereas on the larger machines the turnaround time may be significantly longer.
The answers for the sample FORTRAN problem were for 6 differential equations, a simulation time of 0.5 sec and an integration interval of .0001 sec. Doubling the simulation time, doubling the number of differential equations or halving the integration interval will work in the direction of doubling the computer running time, regardless of computer. Figure 5 presents computational information so that an engineer can approximate the computer running time for a given problem. All that has to be done is to multiply the number of differential equations by the simulation time and divide by the integration step size in order to compute the abscissa of the figure. The computer running time in seconds is the ordinate.
For example, the simulation of 100 differential equations in FORTRAN for a 10 sec simulation with an integration step size of .01 sec gives an abscissa of 100 x 10 / .01 = 100,000 and would require a run time of 1730 sec on the original IBM PC, 116 sec on an IBM AT, 24.7 sec on a Prodigy 4, 10.3 sec on a VAX/785 and 2.03 sec on an IBM/3084Q.
Fig. 4 Relative Speed Performance for Realistic Simulation Example.
Table 3 - Micro, Mini and Mainframe Running Time Comparison
Fig. 5 Computer Running Time for Given Problem
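The chart reading described above can be approximated in a few lines of Python. The sketch assumes that running time is strictly proportional to the abscissa (equations times simulation time divided by step size), which is a simplification of the statement that each factor works "in the direction of" doubling the time. The per-machine constants are back-computed from the worked example in the text (an abscissa of 100,000), not read from Table 3, and the names are illustrative.

```python
# Seconds of run time per unit of abscissa, derived from the worked example
# (100 equations, 10 sec simulation, .01 sec step, abscissa of 100,000).
SECONDS_PER_UNIT = {
    "IBM PC (original)": 1730 / 100_000,
    "IBM AT": 116 / 100_000,
    "Prodigy 4": 24.7 / 100_000,
    "VAX/785": 10.3 / 100_000,
    "IBM/3084Q": 2.03 / 100_000,
}


def abscissa(n_equations, sim_time, step_size):
    """Problem size: equations times simulation time over step size."""
    return n_equations * sim_time / step_size


def estimate_run_time(machine, n_equations, sim_time, step_size):
    """Estimated running time in seconds, assuming linear scaling."""
    return SECONDS_PER_UNIT[machine] * abscissa(n_equations, sim_time, step_size)


# The Butterworth example: 6 equations, 0.5 sec, .0001 sec step
print("abscissa of the sample problem:", round(abscissa(6, 0.5, 0.0001)))
for machine in SECONDS_PER_UNIT:
    print(f"{machine:18s} {estimate_run_time(machine, 6, 0.5, 0.0001):8.2f} sec")
```

The sample problem has an abscissa of 30,000, so under this assumption each machine needs three tenths of the time quoted in the worked example.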
Languages
FORTRAN was the first high level language introduced on a large scale. It was first proposed in the 1953-1954 time frame by John Backus of IBM. The goal of this language was to demonstrate that a high order language could be developed which would produce programs as efficient as hand coded programs in machine language for a wide class of problems. The objective of FORTRAN at that time was to make programming on the IBM 704 computer much faster, cheaper and more reliable. Originally the FORTRAN language reflected the hardware constraints of the IBM 704. Due to the popularity of the language, other computer manufacturers developed their own versions of FORTRAN, tailored to their own hardware. By 1963 virtually all mainframe manufacturers had committed themselves to some version of FORTRAN. Since that time FORTRAN has evolved, eliminating undesirable features and adding some elements of structured programming.
PASCAL was developed in the late 1960s by Niklaus Wirth of the Eidgenössische Technische Hochschule in Zurich. The principal aims of PASCAL were:
(1)  to make available a language suitable to teach programming as a systematic discipline based on certain fundamental concepts clearly and naturally reflected by the language (structured programming)
(2)  to develop implementations of this language which are both reliable and efficient on presently available computers.
The first PASCAL compiler was available in 1970. PASCAL has gained widespread acceptance as a teaching language for structured programming.
BASIC was originally developed in 1963 at Dartmouth College by John Kemeny and Thomas Kurtz. The language was originally designed for beginners with interactive computing in mind. By 1970 the language had grown so that it could handle the most sophisticated and complex applications. During the 1970s, as graphics devices became available, easy-to-use graphics commands were added to the language. When microcomputers first appeared, BASIC was the most popular language for implementation on them, because it was a clean and simple language.
Although commonly implemented as an interpretive language, excellent compilers exist today for BASIC in both the IBM and Apple microcomputer world. Although generally scorned by professional programmers, BASIC continues to be the world's most popular programming language because it is easy to use and is generally included, without cost, with each PC purchase, at least until the Macintosh came along!
C is a general purpose programming language. It has been called a "systems programming language" because it is useful for writing operating systems. It has been used in the microcomputer world to write major text processing and database programs. C is a relatively low level language when compared to either FORTRAN or PASCAL. This language stems from the language BCPL, developed by Martin Richards. The influence of BCPL on C proceeded indirectly through the language B, which was written by Ken Thompson in 1970 for the first UNIX system on the PDP-7.
For a given machine the choice of language makes a significant difference in the computer running time. Interpretive languages, such as BASIC, are slower than compiled languages such as FORTRAN. However interpretive languages, due to the friendly debugging environment, often reduce the program development time. If one has to develop a program quickly and only run it a few times an interpretive language like BASIC might be the best choice. If the program has many differential equations and will be run many times, a compiled language such as FORTRAN might be a better choice.
Several popular microcomputer languages were used on the sample problem of the previous section and the computer running times on a Macintosh are summarized in Table 4.
Table 4 - FORTRAN is the Fastest Language
Surprisingly, although C is the de facto standard for commercial program development on microcomputers, it is nearly as slow as interpretive BASIC for problems involving floating point computation on the Mac. Similarly, although PASCAL is the standard in many educational institutions, it is very slow for problems involving the integration of differential equations. The slowness of C and PASCAL on the Macintosh for simulation type problems is due to the way floating point operations are performed. Both C and PASCAL use the standard package provided by Apple for floating point operations, called SANE. BASIC and FORTRAN, however, use their own software floating point routines, which give up some accuracy in exchange for greatly improved speed. The accuracy of both BASIC and FORTRAN in single precision was sufficient to get the correct answers for the sample problem. The table also shows that when BASIC is compiled, it is faster than either C or PASCAL. In the case of the Macintosh, FORTRAN is clearly the language of choice when speed is an issue.
Of the Macintosh languages considered, only FORTRAN can currently take advantage of the 68881 math coprocessor. Table 5 shows the running time for all the languages on a Prodigy 4 (Macintosh plus Levco board).
Table 5 - Using Math Coprocessor Reduces Running Time on Prodigy 4 By Factor of 2
We can see that BASIC and FORTRAN running times were reduced by a factor of 4 when the Prodigy 4 was used. This reduction is due to the doubling of both the bits and the clock rate relative to the Macintosh. On the other hand, running times were reduced even more for Pascal and C. The reason is that both Pascal and C use the Apple numerics library, SANE, for floating point computation. The Prodigy 4 intercepts calls to this library and reroutes them to the math coprocessor. It is hoped the new Macs will also have this ability.
Conclusions
The purpose of this paper was to demonstrate that problems involving the integration of differential equations can be simulated to great advantage on the new Macintosh workstations. Simulation speed can be enhanced by using a language built for floating point speed, such as FORTRAN. The paper demonstrates that the 32 bit, 16 MHz Macintosh with Levco board, costing $6,000, can perform simulations in FORTRAN at approximately one half the speed of a $250,000 super minicomputer and approximately one tenth the speed of a multimillion dollar mainframe. It is expected that this same capability will be present in the recently announced next generation of the Macintosh.
References
1 MacLennan, B. J., "Principles of Programming Languages: Design, Evaluation, and Implementation", Holt, Rinehart and Winston, 1983.
2 Karni, S. "Network Theory: Analysis and Synthesis", Allyn and Bacon, Inc., 1966.
Acknowledgements
I wish to thank John McClure and Stuart Adams for converting my spaghetti code to C and PASCAL. I am also indebted to Owen Deutsch for using his persuasive powers in getting the appropriate people to run my FORTRAN code on the VAX and IBM computers.
A Minimal 68020/881 Add-on
by John R. Novy, Novy Systems Inc.
Most Macintosh upgrades using the 68881 coprocessor include 32 bit memory enhancements and processor speed improvements. One such upgrade is the Levco Prodigy 4, which increases the CPU clock rate to 16 MHz and adds 32 bit wide memory. Cost effective improvements may be obtained from a minimal upgrade which includes neither 32 bit memory nor increased processor speed. Applications which are floating point intensive are improved as much by the use of the coprocessor as by added memory or increased processor speed. This is shown by the Fortran benchmark in Paul Zarchan's article, which documents an improvement of about 200% (a factor of 2 in run time) when using a coprocessor in the IBM PC or Prodigy 4.
Based on Paul's article we would expect a 68881 coprocessor, with no increase in processor speed, to show a 200% improvement in the Macintosh run time of 61 seconds. Using a Novy Systems MAC20 (with the 020 cache disabled, and not using Fortran/020) a run time of 31 seconds was obtained, which is an improvement of about 200% (a factor of 2). The cache was disabled to approximate 68000 performance. The Novy Systems MAC20 is an implementation of a 68020 running at 7.8 MHz with a 68881 and no additional memory. With the cache enabled the 68020/881 combination ran in 22 seconds, for an improvement of about 300% (a factor of 2.8). A run time of 17 seconds was obtained when Fortran/020 was used to duplicate the testing done on the Prodigy 4.
The advantage of a minimal system is, for many applications, a throughput improvement of 200 to 400% at a much lower cost. The Novy Systems MAC20, at less than $750, can run simulations at approximately one half the speed of a maximum system upgrade for less than one sixth the price.
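The percentages quoted above are ratios of run times, with 200% meaning half the run time. A small Python sketch makes the arithmetic explicit, using only the run times given in the text; the dictionary labels and the function name `speedup` are illustrative.

```python
BASELINE_SEC = 61  # stock Macintosh run time quoted in the text

# Run times in seconds for the MAC20 configurations described above
mac20_runs = {
    "68881, cache disabled": 31,
    "68881, cache enabled": 22,
    "68881, cache enabled, Fortran/020": 17,
}


def speedup(baseline_sec, run_sec):
    """Factor by which the run time shrinks relative to the baseline."""
    return baseline_sec / run_sec


for label, seconds in mac20_runs.items():
    factor = speedup(BASELINE_SEC, seconds)
    print(f"{label:36s} {seconds:3d} sec   {factor:4.2f}x   ({factor * 100:.0f}%)")
```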