Column Tag: Jörg's Folder
Accuracy & Speed in FORTRAN Compilers
MacFortran II compared with Language Systems Fortran on the Quadra.
By Jörg Langowski, MacTutor Regular Contributing Author
Note: Source code files accompanying this article are located on the MacTech CD-ROM or source code disks.
Absoft has recently announced MacFORTRAN II version 3.1, which is supposed to run very fast on Quadras. Since I reviewed Language Systems Fortran not long ago, I was curious how big the speed difference between these two compilers really was. So again, I went back to my benchmark programs. Two of them are (in)famous. The first is the Whetstone benchmark, which mixes floating-point arithmetic and math library calls with some integer work, and depends a lot on the efficiency of subroutine calling because it works by calling lots of small routines repeatedly from a loop. The second is the Linpack, which tests the efficiency of floating point matrix operations using a package of linear algebra routines that are quite useful by themselves.
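As a rough illustration of what Linpack actually times: its LU factorization spends most of its time in a SAXPY-style loop (y := a*x + y), and the MFlops rating is conventionally computed from an operation count of 2n^3/3 + 2n^2 for solving an n x n system. The sketch below shows both pieces; it is written in modern free-form Fortran rather than the FORTRAN 77 of the benchmark, it is not the benchmark's own source, and the module and routine names are hypothetical.

```fortran
module linpack_sketch
  implicit none
contains
  ! Hypothetical SAXPY kernel: y := a*x + y in single precision,
  ! the operation that dominates Linpack's LU factorization.
  subroutine saxpy_sketch(n, a, x, y)
    integer, intent(in) :: n
    real, intent(in) :: a, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + a * x(i)
    end do
  end subroutine saxpy_sketch

  ! Operation count conventionally used to turn the time for
  ! solving an n x n system into an MFlops figure.
  pure function linpack_flops(n) result(ops)
    integer, intent(in) :: n
    double precision :: ops
    ops = 2.0d0 * dble(n)**3 / 3.0d0 + 2.0d0 * dble(n)**2
  end function linpack_flops

  ! MFlops from the operation count and the elapsed time in seconds.
  pure function mflops(n, seconds) result(rate)
    integer, intent(in) :: n
    double precision, intent(in) :: seconds
    double precision :: rate
    rate = linpack_flops(n) / (seconds * 1.0d6)
  end function mflops
end module linpack_sketch
```

For a 100 x 100 system the count is about 686,667 operations, so a run taking 0.5 seconds would be rated at roughly 1.37 MFlops.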
Furthermore, I found a third useful program among the demos included on the LS Fortran diskettes: it is called Paranoia (mentioned already in V8#1), and it checks the consistency of the floating point arithmetic. It checks things like whether 3*4 equals 4*3, 1+0 = 1, 1*0 = 0, etc. These may seem trivial to you, but as you will see, they aren't when you are dealing with Fortran compilers.
Numerical accuracy tests
I'd like to describe the Paranoia benchmark first, and its results on the two Fortran compilers. The program was originally written in Basic and then ported to Fortran; the person responsible for the version I used is Richard Karpinski at the Computer Center of the University of California (San Francisco, CA 94143-0704). His description of the program follows:
===== PARANOIA =====
A test program that evaluates the quality of a numerics environment. Note that numerics quality depends on the compiler used as well as the underlying system hardware and software.
Paranoia is a rather large program, devised by Prof. Kahan of Berkeley, to explore the floating point system on your computer. These files are being redistributed as encouraged in the note copied below.
Paranoia.f single precision Fortran, 29 Jan. 1986
Dear Fellow Paranoid,
1. Please distribute these programs as widely as you like. Be sure to include as much help as possible. Please include these requests as well.
2. Please let me know if you can provide it on other media. Other potential users may be saved from retyping 2500 lines of code if you can provide the programs on a Whizbang 200 disk/tape/notched-stick. It is reasonable to ask money for this.
3. PLEASE let me know about any source changes that you made to make Paranoia work on your system. The exact model/version/date of your system is quite important in order to understand the changes. If you send the new version, please indicate where the changes occur. Machine readable is preferred.
4. Please send me your results. Note which version of Paranoia gave the results and what machine/language model/version etc. they apply to.
5. Suggestions and comments are always welcome.
Thank you for participating in this unique investigation of contemporary floating-point arithmetic. Your help is vital to the project.
Modula Assured Quality Software, 6521 Raymond Street
Oakland, CA 94609
ps: Send a stamped, self addressed envelope to the above address to get a sheet on the current status of the project at any time.
The program is too long to print here (104 K, enclosed in compressed form on the source code disk). Just to give you a feeling for what it does, here are some typical lines of code:
C...  LOOK FOR SOME OBVIOUS MISTAKES
      IF (0.0E0+0.0E0 .EQ. 0.0E0 .AND.
     1    1.0E0-1.0E0 .EQ. 0.0E0 .AND.
     1    1.0E0 .GT. 0.0E0 .AND.
     1    1.0E0+1.0E0 .EQ. 2.0E0) GOTO 930
      WRITE(*,920)
  920 FORMAT(' FAILURE: violation of 0+0=0 or 1-1=0 or',
     1       ' 1>0 or 1+1 = 2.')
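The same kind of check can be written so that the comparisons are evaluated at run time by the floating-point hardware instead of being folded away by the compiler. A minimal sketch, covering the four tests above plus the identities mentioned earlier (3*4 = 4*3, 1+0 = 1, 1*0 = 0); it uses the Fortran 2003 VOLATILE attribute, so it is not meant for the 1992 compilers, and the module and function names are hypothetical.

```fortran
module obvious_checks
  implicit none
contains
  ! Run-time versions of Paranoia's first sanity checks plus a few
  ! more identities. VOLATILE forces the operands to be read at run
  ! time, so the compiler cannot evaluate the comparisons itself.
  logical function obvious_ok()
    real, volatile :: zero, one, two, three, four
    zero = 0.0
    one = 1.0
    two = 2.0
    three = 3.0
    four = 4.0
    obvious_ok = (zero + zero == zero) .and. (one - one == zero) &
      .and. (one > zero) .and. (one + one == two) &
      .and. (three * four == four * three) &
      .and. (one + zero == one) .and. (one * zero == zero)
  end function obvious_ok
end module obvious_checks
```

A caller would simply print the FAILURE message when obvious_ok() returns .FALSE., as the Paranoia fragment does.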
You wouldn't believe that these expressions could be evaluated the wrong way, but there must be a reason why these tests are done.
Less obvious are some other mistakes that Paranoia tests for. For instance, the basic arithmetic operations should carry some guard digits, that is, a few more digits than are held in the representation of the floating point number in memory. Also, results should be correctly rounded, in a way that doesn't introduce inconsistencies. It is not trivial to design floating point arithmetic so that roundoff errors do not affect the consistency of the result. This becomes apparent when the Paranoia benchmark is compiled with the Language Systems (v3.0) and Absoft (v3.1.2) compilers, using various settings of the optimizations and other options.
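The identities behind Paranoia's guard-digit messages (1*x = x, x/1 = x, and 1/1.00...001 < 1) are easy to test directly. A minimal sketch, assuming IEEE single precision, with EPSILON(1.0) standing in for the "...001" part; it is not Paranoia's own code, and the module and function names are hypothetical.

```fortran
module guard_checks
  implicit none
contains
  ! True if 1*x and x/1 both reproduce x exactly. Paranoia reports
  ! "lacks a guard digit" when one of these identities fails.
  logical function identities_ok(x)
    real, intent(in) :: x
    real, volatile :: one, prod, quot
    one = 1.0
    prod = one * x
    quot = x / one
    identities_ok = (prod == x) .and. (quot == x)
  end function identities_ok

  ! True if 1/(1 + eps) < 1, where eps = EPSILON(1.0) is the gap
  ! between 1.0 and the next larger REAL. Correct rounding gives a
  ! result just below 1, never 1 itself.
  logical function reciprocal_ok()
    real, volatile :: one, denom, recip
    one = 1.0
    denom = one + epsilon(one)
    recip = one / denom
    reciprocal_ok = recip < one
  end function reciprocal_ok
end module guard_checks
```

With correctly rounded arithmetic these return .TRUE. for inputs such as 3.0 or 1.0 + EPSILON(1.0); a .FALSE. result corresponds to the kind of message Absoft's optimized code produced.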
First, the code generated by the Language Systems compiler, at any optimization level, does not trigger any error messages at all, so the arithmetic seems to behave in a consistent way. The rounding conforms to the IEEE 754 standard.
Compiling the benchmark with Absoft MacFortran 3.1.2, with and without the basic optimizations, produces a lot of error messages, like these:
(1-u1)-1/2 < 1/2 is false, so this program may malfunction.
Multiplication lacks a guard digit violating 1*x = x .
Division lacks a guard digit violating x/1 = x .
Computed value of 1/1.00...001 is not less than 1 .
The errors are typical of floating point implementations where roundoff is not handled correctly; it seems to be mainly due to the fact that Absoft tries to keep intermediate values in registers as long as possible. These errors are not as serious as they look, since they are mainly deviations of one unit in the last significant digit of the single-precision numbers used in the test. Some of the error messages are actually a consequence of the fact that the program computes the machine precision at the beginning, from the minimum distance of two floating point numbers close to 1.0. The Absoft-optimized code without any further options comes up with a much too small value for this distance, and therefore some later tests, where this machine precision is used, report errors as well.
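Paranoia's first job, as described above, is to find the gap between 1.0 and the next larger floating-point number (ULPPLS in the listing at the end of this column, 2^-23 in IEEE single precision). A minimal sketch of that search, assuming IEEE single-precision REALs; the VOLATILE attribute plays roughly the role of Absoft's -e store-and-reload, forcing each trial sum to be rounded to REAL before it is compared. The module and function names are hypothetical.

```fortran
module ulp_sketch
  implicit none
contains
  ! Halve u until 1 + u/2 rounds back to exactly 1; u is then the gap
  ! between 1.0 and the next larger REAL. Because trial is VOLATILE,
  ! every sum is stored (and so rounded to REAL) before the test.
  real function ulp_of_one()
    real, volatile :: u, trial
    u = 1.0
    do
      trial = 1.0 + u / 2.0
      if (trial == 1.0) exit
      u = u / 2.0
    end do
    ulp_of_one = u
  end function ulp_of_one
end module ulp_sketch
```

With every trial stored as a REAL, ulp_of_one() returns 2^-23, the same value as the intrinsic EPSILON(1.0) and as ULPPLS = 2.0*5.96046440E-08. If the comparison were made while 1.0 + u/2.0 was still in an extended-precision register, the loop could keep halving far past that point, which is the "much too small" value described above.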
All these considerations may seem quite academic to you, because we're really dealing with very small differences between the computed and the true result, of the order of 10^-8 or so. But problems where the precise computation of a small difference between two large numbers is important are not that rare, and in those cases such errors will quickly blow up. It is for this reason that the IEEE standard for floating point arithmetic has been developed, and that Apple has implemented the SANE package which follows the IEEE proposal.
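A small illustration of a difference between two large numbers, assuming IEEE single and double precision: near 10^8, adjacent single-precision numbers are 8 apart, so (10^8 + 1) - 10^8 comes out as 0 when the sum is rounded to REAL, but as 1 when the sum is kept in double precision, which here stands in for a wider FPU register. The module and function names are hypothetical.

```fortran
module cancel_sketch
  implicit none
contains
  ! (big + small) - big with the sum rounded to REAL before the
  ! subtraction; near 1e8 a REAL cannot represent big + 1.
  real function diff_single(big, small)
    real, intent(in) :: big, small
    real, volatile :: s
    s = big + small
    diff_single = s - big
  end function diff_single

  ! The same difference with the sum held in double precision,
  ! as a wider register would hold it.
  double precision function diff_wide(big, small)
    real, intent(in) :: big, small
    double precision, volatile :: s
    s = dble(big) + dble(small)
    diff_wide = s - dble(big)
  end function diff_wide
end module cancel_sketch
```

diff_single(1.0e8, 1.0) gives 0.0 and diff_wide(1.0e8, 1.0) gives 1.0: the same source expression yields different answers depending on where the intermediate value lives, which is exactly why keeping values in registers changes Paranoia's results.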
There is a remedy, however: one can force the Absoft compiler (with the -e option) to use extended precision in all subexpression calculations, and store values to memory from the registers and re-load them after every assignment. That seems to make the Paranoia benchmark work correctly, and - almost - no more errors are reported. Only now, there are some numbers x and y where x*y is not equal to y*x! (See listing - I reprinted the part of the Paranoia benchmark that reported the error). The only way around this error, and the way to make the benchmark run in a fully consistent fashion, is to compile the code non-optimized and with the -e option. In this case, however, Absoft MacFortran loses all its speed advantage over Language Systems Fortran.
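The x*y /= y*x defect fits the explanation given above: if one product is rounded to single precision while the other is still held at higher precision, their difference need not be zero. The sketch below assumes that mechanism (it is an illustration, not a reconstruction of what the Absoft code generator actually does), with double precision standing in for the 68882's extended registers, and compares it with the consistent case where both products are stored as REAL. The module and function names are hypothetical.

```fortran
module commute_sketch
  implicit none
contains
  ! x*y rounded to REAL minus y*x kept in double precision. A nonzero
  ! result is the kind of x*y /= y*x defect Paranoia reported.
  double precision function mixed_gap(x, y)
    real, intent(in) :: x, y
    real, volatile :: xy
    double precision, volatile :: yx
    xy = x * y
    yx = dble(y) * dble(x)
    mixed_gap = dble(xy) - yx
  end function mixed_gap

  ! Both products rounded to REAL before subtracting: zero under
  ! IEEE rounding, since multiplication itself commutes.
  real function consistent_gap(x, y)
    real, intent(in) :: x, y
    real, volatile :: xy, yx
    xy = x * y
    yx = y * x
    consistent_gap = xy - yx
  end function consistent_gap
end module commute_sketch
```

For x = y = 1 + 2^-23, mixed_gap returns -2^-46 (about -1.4E-14), while consistent_gap returns 0.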
Speed tests with Whetstone and Linpack
To compare the speed figures for the Whetstone and Linpack benchmarks, we should therefore use the Absoft compiler at least with the -e option, and maybe also drop the optimizations. Since both the Language Systems (LSF) and the Absoft (ABF) compilers now exist in versions that support 68040 code, I've compared the benchmarks both on a Mac IIx and on a Quadra 700. LSF was always run at the highest optimization level, since the arithmetic seems to be OK at that level; for Absoft, I used the basic optimizations (-O) with and without the extended precision option (-e).
Linpack (single precision), MFlops
LSF 030/882 code            Mac IIx       0.12
LSF 030/882 code            Quadra 700    1.32
LSF 040 code                Quadra 700    1.37
ABF 030/882 code -O         Quadra 700    1.30
ABF 030/882 code -O -e      Quadra 700    1.27
ABF 040 code -O             Quadra 700    1.60
ABF 040 code -O -e          Quadra 700    1.58
Whetstone (single precision), K Whetstones/s
LSF 030/882 code            Mac IIx        974
LSF 030/882 code            Quadra 700    3976
LSF 040 code                Quadra 700    3984
ABF 030/882 code -O         Mac IIx       1215
ABF 030/882 code -O -e      Mac IIx        955
ABF 030/882 code -O         Quadra 700    4051
ABF 030/882 code -O -e      Quadra 700    3829
ABF 040 code -O             Quadra 700    5864
ABF 040 code -O -e          Quadra 700    5381
The essence of these figures is that there is no significant execution speed difference between Absoft and Language Systems Fortran on a Mac IIx (974 K versus 955 K Whetstones/s), if you make sure that both generate arithmetically consistent code. On the Quadra, however, it shows that Language Systems creates almost the same code for 68030 and 68040 systems: almost no additional speed is gained using the -68040 option, while Absoft gains 24% on the Linpack benchmark and 40% on the Whetstone benchmark when 68040 code generation is switched on (comparing the -O -e figures). All in all, Absoft code on the Quadra (040 code, -O -e) is 15% faster than LSF code for the Linpack and 35% faster for the Whetstone benchmark.
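The percentages above can be re-derived from the -O -e rows of the tables, truncated to whole percent. The helper below is a hypothetical one-liner, shown only to make that arithmetic explicit.

```fortran
module speedup_sketch
  implicit none
contains
  ! Percentage by which rate 'after' exceeds rate 'before'
  ! (both in the same unit, e.g. MFlops or K Whetstones/s).
  pure double precision function gain_percent(before, after)
    double precision, intent(in) :: before, after
    gain_percent = 100.0d0 * (after / before - 1.0d0)
  end function gain_percent
end module speedup_sketch
```

For example, gain_percent(1.27d0, 1.58d0) is about 24.4 (Absoft Linpack, 030 versus 040 code) and gain_percent(3984d0, 5381d0) is about 35.1 (Whetstone, LSF 040 versus Absoft 040 -O -e).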
If you are running Fortran on anything other than a Quadra, LSF is really the only choice, given its excellent support of the Macintosh user interface, System 7 features such as AppleEvents, 100% VAX compatibility going into such details as the syntax of I/O statements, and its documentation. For pure numeric performance on a Quadra, Absoft still features faster execution. Still, I'd like to see Absoft's threaded math library for the 68040 incorporated into LSF; that would be my dream system.
For the Forth readers of this column, I recently got news from the creators of two of the best public domain Forth development systems for the Macintosh. You read in the August 1992 issue of MacTutor (Vol. 8, No. 4) about the update to Mops 2.1, the object-oriented Forth implemented by Michael Hore. He has now sent me the tutorial that he is distributing together with Mops, a 340K Microsoft Word file.
I'm not going to print the Mops Tutorial here, obviously. But if you need it and don't find it on any of the obvious sources (ftp from oddjob.uchicago.edu or sumex-aim.stanford.edu), drop me some bytes at firstname.lastname@example.org and I'll be happy to mail the file to you.
Another message came from Chris Heilman, who created Pocket Forth:
Subj: Apple Events in Pocket Forth
Hello again. How have things been going for you? I read your latest article in MacTutor's June issue (congrats, by the way, on getting it back together) about Apple Events in LS Fortran. I've been working along exactly those lines.
Just two weeks ago I completed and released Pocket Forth 6. While release 5 was capable of handling high-level events, PF6 makes it easy and fun. Two new words, AE: and ;AE are used to define event handlers. Another word, ,S takes a 4 character token from the input stream and puts a 32 bit number on the stack. Use them like this:
,s misc ,s dosc AE: blah, blah blah, .... ;AE
This installs the dosc event handler into a list. The dictionary is then saved. The next time the program starts, it handles dosc events. Of course the four required events are handled automatically, but their actions can be changed at any time.
But wait, that's not all! I've added SANE floating point for any numeric token with an E or decimal point, new icons, drag&drop file loading and a rewritten manual in TeachText format.
I'd like to send you a copy, so expect a disk soon. I'd email a copy to you, but our system has been choking on loooong mail lately, and I've gotten real frustrated sending 200K files 3 or 4 times at 2400 bps.
Later, Chris Heilman
I'm looking forward to seeing the new version of Pocket Forth. Expect some lines in this column when I've reviewed it.
Example: check if multiplication commutes
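C     very short main program
      PROGRAM MAIN
      CALL COMMUT(20)
      END

      SUBROUTINE COMMUT(NUMTRY)
      INTEGER NUMTRY
      REAL ULPPLS, ULPMIN
      REAL FP0, FP1, FP2, FP3, HALF
      REAL R9, X, X9, Y, Y9, Z, Z9
      INTEGER I, NN
      FP0 = 0
      FP1 = 1
      FP2 = FP1+FP1
      FP3 = FP2+FP1
      HALF = FP1 / FP2
      ULPMIN = 5.96046440E-08
      ULPPLS = 2.0*ULPMIN
 2920 WRITE(*,2921) NUMTRY
 2921 FORMAT(/' Does multiplication commute?',
     1       ' Testing if x*y = y*x for', I4, ' random pairs:')
 2930 R9 = SQRT(FP3)
      I = NUMTRY + 1
      X9 = FP0 / FP3
C     DRAW A PAIR (X9, Y9), FORM BOTH PRODUCTS AND THEIR DIFFERENCE
 2960 CALL RANDOM (X, Y, X9, R9)
      Y9 = X9
      CALL RANDOM (X, Y, X9, R9)
      Z = X9 * Y9
      Y = Y9 * X9
      Z9 = Z - Y
      I = I - 1
      IF (I .GT. 0 .AND. Z9 .EQ. FP0) GOTO 2960
C     A FAILURE EITHER ENDED THE LOOP EARLY OR OCCURRED ON THE LAST PAIR
 2970 IF (I .GT. 0) GOTO 3000
      IF (Z9 .NE. FP0) GOTO 3000
      WRITE(*, 2990) NUMTRY
 2990 FORMAT(' No failure found in ',I4,' randomly chosen pairs.')
      RETURN
 3000 NN = NUMTRY + 1 - I
      WRITE(*, 3001) X9, Y9
      WRITE(*, 3002) Z, Y, Z9
      WRITE(*, 3003) NN
 3001 FORMAT(' DEFECT: x*y = y*x violated at x = ',E15.7,
     1       ', y = ',E15.7)
 3002 FORMAT(' x*y =',E15.7,', y*x =',E15.7,', x*y-y*x =',E15.7)
 3003 FORMAT(' ... pair no.', I4)
      RETURN
      END

      SUBROUTINE RANDOM (X, Y, X9, R9)
      REAL X, Y, X9, R9
C     PSEUDO-RANDOM GENERATOR: X = (X9+R9)**5, NEW X9 = FRAC(X)+5E-6*X
      X = X9 + R9
      Y = X * X
      Y = Y * Y
      X = X * Y
      Y = X - AINT(X)
      X9 = Y + X * 0.000005
      RETURN
      END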