TweetFollow Us on Twitter

Fine Tune MPW
Volume Number:6
Issue Number:7
Column Tag:Mac Workshop

Fine Tuning Code With MPW

By Allen Stenger, Gardena, CA

Tuning with MPW

[Allen Stenger works on the F-15 radar software for Hughes Aircraft Co. His technical interests are software reliability, computer architecture, and computer languages. Programming the Macintosh has been his hobby for the past five years.]

The Macintosh Programmer’s Workshop documentation does a good job of describing how to use the performance analysis tools, but it does not explain what to do with the results. This article is a tutorial on how to tune a program with MPW, using the performance reports as a guide to which areas of the program are most in need of improvement.

For the example program we will use a dragon-curve drawer. This is an example in the Smalltalk books, and Listing 1 is a straightforward translation of the Smalltalk-80 program into SemperSoft Modula-2. This straightforward program takes about 8 seconds (477 ticks) to draw the dragon. This is not extremely slow, but one imagines that it could be done faster. In fact, we will be able to get a 16-times speedup with some simple transformations, and by being a little trickier we will get a 22-times speedup.

This is an artificial example, for a couple of reasons. First, for any imaginable use to which DrawDragon might be put, its present performance is adequate--there is no practical value to speeding it up. Second, the MPW tools would normally be applied to much larger programs, and are in some ways an overkill for this simple example.

Speculations

We will start out with some speculations on the causes of the slowness of the straightforward dragon. After doing the tuning we will revisit these and see how good our intuition was.

From inspecting the code, one might generate a list of possible “problem areas” such as the following:

Use of recursion. According to the folklore of programming, recursive programs are slow, and certainly the Dragon procedure seems to spend most of its time calling itself rather than doing any useful work.

Use of high-order language. Traditionally, in order to get the highest performance, one must write in assembly language. The SemperSoft compiler is a one-pass compiler, and presumably does little optimization.

Use of range-checking. This example is compiled with checks on, which in some programs can cause a significant slowdown.

Use of floating-point. This example was run on a Macintosh Plus, which has no floating-point hardware and does all floating point in software.

Graphics operations. These typically take a lot of CPU time. (There might not be much we can do about this one, since the program’s whole purpose is to draw on the screen.)

Tuning -- Preparation

We can instrument the dragon program following the recipes in the MPW manual, Chapter 14 (the method is the same regardless of language, although the interface details vary slightly). Here we will only change the calling program, DrawDragon, and leave the other modules undisturbed. Since the performance measurements are done by random sampling of the program counter, the whole program will be instrumented merely by starting the measurement at the beginning of the run and ending it at the end - it is not necessary to modify each routine. The instrumented DrawDragon module is shown in Listing 2.

The performance analyzer yields an execution “profile” of the program, characterizing it by where it spends its time. It is usually only helpful to look at the top five routines - everything below that takes such a small amount of time that, even if we could eliminate a routine totally, it would have a negligible effect on the timing. Here, on our first run, we need only look at the #1 routine, since it takes 72.6% of the time (total time for the program is 493 ticks). This is the Pack4 routine, which is not in the Dragon program -- it is the SANE floating-point routines, as may be looked up in Inside Macintosh. So our first tuning step is to reduce the amount of time spent doing floating-point work.

Tuning -- First Pass

Some people (particularly FORTH fans) advocate doing no floating point at all, but using instead scaled integer arithmetic. For most programs, eliminating floating point altogether would be a drastic operation. For our little DrawDragon program it is not so bad, but let’s see what else we could do first. Unfortunately the report does not break down the floating-point operations used, but by examining the code we see that each line drawn requires 2 integer-to-float conversions, 2 float-to-integer conversions, 3 floating-point multiplies, and 1 each sine and cosine. It seems likely that most of the floating-point time is spent on the sine and cosine, with the multiplies coming in a distant second. Further examination of the program reveals that sine and cosine are called with only 4 possible arguments (0, 90, 180, and 270 degrees), so this suggests the use of a look-up table storing the values of sine and cosine for these arguments.

Making this change and re-running the program and reports shows that this does cause a large speedup - the timing is now 86 ticks, or 5.7 times faster. The top item on the performance report is still Pack4 with 41.6%, so our next step is to try to eliminate still more floating-point operations.

Tuning -- Second Pass

After scrutinizing the improved program some more, we may remember that the reason we put in the sines and cosines in the first place was to handle drawing of lines at arbitrary angles. For the 4 angles that we actually use, the sines and cosines have integer values, so (for our particular case) there is no need for floating point anyway! We therefore make a simple re-coding of the Go procedure to do all calculations as integers. (Since the routine is now getting faster, there are not enough program-counter samples to get a good breakdown of where the time is spent, so we will also draw the dragon 10 times in a loop and divide by 10 to get the time per dragon.) Upon re-running this, we find that the time has decreased to 30 ticks (2.9 times faster), and the top 5 routines in the performance report are all QuickDraw routines and comprise 73.2% of the time. Further improvement seems to depend on inventing some way of drawing lines faster than QuickDraw can, which seems a dim hope.

Tuning -- Third Pass

Surprisingly, this is possible, in the special case we are dealing with (which has only horizontal and vertical lines). By trying some separate timing tests, it appears that our lines can be drawn in 1/3 less time by using FillRect rather than Line. Implementing this leads to a program which runs in 22 ticks (1.4 times faster), for an overall improvement over the straightforward Dragon of 22:1. This speedup requires changes to only one routine, Go. The revised Pen module is shown in Listing 3.

Speculations Inspected

The big winners among our speculations were floating point and graphics operations. Floating point took up most of the time in the original program, and by eliminating it we were able to get most of the speedup. Graphics operations take up most of the remaining time, and although we made some reduction in this, it seems unlikely that there is much more we can do. Possibly we could do something sneaky such as building the whole picture in memory without using graphics operations, then copying it to the screen with CopyBits; but this would be very complicated and perhaps not much of an improvement. (One of the hardest parts of tuning is knowing when to stop. Most programs can be improved indefinitely, but they gradually get more complicated and error-prone.)

The issue of recursion turned out to be a red herring. The total time spent in the Dragon procedure is only 3.9% of the total, and this includes the recursive calls. Recursion has a bad reputation, for two reasons. First is the bad recursion examples often given in textbooks (usually factorials or Fibonacci numbers), which can be done more simply and faster by iteration. Second, in some older languages and implementations, either there is no recursion in the language, so it must be simulated by the program (FORTRAN), or it is treated as a special case (JOVIAL, PL/I) and really is much slower because it requires a special dynamic allocation of variables rather than the normal static allocation. In more modern languages such as Pascal or Modula-2, all variables are dynamically allocated and recursive calls are no different (and therefore no slower) than other calls.

The impact of using a high-order language or of range checking varies a great deal depending on the program and on the language implementation. In this example most of the time was taken by things outside the generated code, which seems to be inherent in the particular problem being solved and not likely to be affected much by the implementation language. In the final version of the program, the Modula-2 routines take a total of only 13.3%, so re-writing in assembly or taking out the range checks could not possibly save more than this. The typical Macintosh application tends to consist mainly of calls to the Toolbox, interspersed with occasional calculations, so the timing of a tuned program tends to be dominated by the Toolbox time and does not depend greatly on the implementation language. For programs such as spreadsheets and compilers this is not true, since they really do spend a great deal of time on internal calculations, but it is true of most applications.

Conclusion

By applying the MPW performance analysis tools and making the improvements suggested by the program profile, we were able to speed up the dragon-drawer 22 times. The value of the execution profile is in telling us what not to look at. Most of a program’s execution time is spent in a few very places, and we should concentrate our efforts on those few places. (In Quality Control this is called the principle of the “vital few and the trivial many.”) Profiles are more valuable for large programs, since there is more to ignore there. In any case, success in tuning depends on measurement.

For Further Reading

Jon Louis Bentley, Writing Efficient Programs. Prentice-Hall, 1982. An excellent book on tuning; of value both to the professional programmer and the hobbyist. Our Dragon example illustrates his principles Space-For-Time Rule 2 -- Store Precomputed Results (sine and cosine tables), and Procedure Rule 2 -- Exploit Common Cases (eliminate floating-point since not needed in our case, and use FillRect for faster horizontal and vertical line drawing). He gives many other principles which we did not have a chance to use. In the beginning of the book he goes through 10 steps of optimizing an Approximate Travelling-Salesman Tour to get an overall improvement of 17:1.

Adele Goldberg and David Robson, Smalltalk-80: The Language and Its Implementation. Addison-Wesley, 1983. The source of the Dragon program (on pp. 372-3).

Donald E. Knuth, “An Empirical Study of FORTRAN Programs”, Software--Practice and Experience, v. 1 (1971), pp. 105-133. Source of the term “profile.” This article is written from the viewpoint of a compiler developer, and considers several levels of optimization which can be applied to programs. Many examples of optimization.

Mike Morton, “Faster Bitmap Rotation”. MacTutor, v. 4 (1988), no. 11, pp. 86-90 (reprinted in The Definitive MacTutor, pp. 56-60). Another example of tuning, using only the total time of the routine as a measure and yielding a speedup of 3:1. Morton was careful to measure each attempted improvement to the routine; the exact method of measurement is not important, but you must measure.

Stephen Dubin, Thomas W. Moore, and Sheel Kishore, “Using Regions in Medicine with C”. MacTutor, v. 3 (1987), no. 10, pp. 27-31 (reprinted in The Essential MacTutor, pp. 146-150). Description of re-coding a routine to calculate the area of a region from Pascal or C to assembler, yielding a 1000:1 speedup.

James Plamondon, “Finding The Area Of A Region in C” (letter). MacTutor, v. 4 (1988), no. 4 (reprinted in The Definitive MacTutor, p. 699). Criticizes the previous reference for solving the wrong problem (i.e., working very hard to get a very good estimate of the area of a region drawn freehand). Gives a faster C routine for doing a less-accurate but adequate estimate. In most applications you have some leeway in the problem you solve, and you can use this to make the solution easier and faster. We did not take advantage of this in the DrawDragon program; one speedup we might have applied is to draw the line only one pixel wide (which is about twice as fast as drawing it 4 pixels wide). Bentley (reference above) also considers this in his Approximate Travelling-Salesman Tour -- since it is only an approximate solution anyway, there is little harm in going from floating-point to slightly less accurate scaled fixed point numbers.

Source Listings

Listing 1 - Straightforward Dragon

(******************************************************)
(*   *)
(* file:  DrawDragon.m      *)
(*   *)
(* Main program to test Dragon method.   *)
(*   *)
(* Written in SemperSoft Modula-2 v.1.1.2                  *)
(*   *)
(* Allen Stenger August 1989    *)
(*   *)
(******************************************************)

MODULE DrawDragon;

FROM InOutIMPORT WriteLong, WriteString, Read;
FROM InsideMac   IMPORT TickCount;
FROM DragonModuleIMPORT Dragon;
FROM PenIMPORT Home;

VAR
 oldTime,
 newTime: LONGINT;
 ch     : CHAR;

BEGIN
 oldTime := TickCount();
 
 Home;
 Dragon( 8 );
 
 newTime := TickCount();
 WriteString( “Run time is “ );
 WriteLong( newTime - oldTime, 6 );
 WriteString( “ -- press space to exit “ );
 Read( ch );
END DrawDragon.

(******************************************************)
(*   *)
(* file:  DragonModule.d    *)
(*   *)
(* Implementation of Dragon method from                    *)
(* Goldberg and Robson,     *)
(* Smalltalk-80:  The Language and Its   *)
(*   Implementation, pp. 372-3.      *)
(*   *)
(* Written in SemperSoft Modula-2 v.1.1.2                  *)
(*   *)
(* Allen Stenger August 1989    *)
(*   *)
(******************************************************)

DEFINITION MODULE DragonModule;

PROCEDURE Dragon( order : INTEGER );

END DragonModule.

(******************************************************)
(* file:  DragonModule.m    *)
(* Dragon method - see definition module for         *)
(* description.    *)
(* Written in SemperSoft Modula-2 v.1.1.2                  *)
(* Allen Stenger August 1989    *)
(******************************************************)

IMPLEMENTATION MODULE DragonModule;

FROM PenIMPORT Go, Turn;

PROCEDURE Dragon( order : INTEGER );
BEGIN
 IF order = 0
 THEN 
 Go( 10 );
 ELSE
 IF order > 0
 THEN
 Dragon( order - 1 );
 Turn( 90 );
 Dragon( 1 - order );
 ELSE
 Dragon( -1 - order );
 Turn( -90 );
 Dragon( 1 + order );
 END; (* IF *)
 END; (* IF *)
END Dragon;

BEGIN
END DragonModule.

(******************************************************)
(* file:  Pen.d    *)
(* Implements some of the methods for the Pen class  *)
(* in Smalltalk-80.  *)
(* Written in SemperSoft Modula-2 v.1.1.2                  *)
(* Allen Stenger August 1989    *)
(******************************************************)

DEFINITION MODULE Pen;

(*
 Go in the current direction the specified distance
 (units of pixels).
*)
PROCEDURE Go( distance : CARDINAL );

(*
 Change the current direction by turning degrees 
 (positive degrees = clockwise).
*)
PROCEDURE Turn( degrees : INTEGER );

(*
 Move to original pen position.
*)
PROCEDURE Home;

END Pen.

(******************************************************)
(* file:  Pen.m    *)
(* Pen methods - see definition module for                 *)
(* descriptions.   *)
(* Written in SemperSoft Modula-2 v.1.1.2                  *)
(* Allen Stenger August 1989    *)
(******************************************************)

IMPLEMENTATION MODULE Pen;

FROM Terminal  IMPORT ClearScreen;
FROM InOutIMPORT Write;
FROM MathLib0  IMPORT sin, cos, entier, real, pi;
FROM InsideMac IMPORT Line, MoveTo, PenSize;

CONST
 DegreesToRadians = pi / 180.;
 (* conversion factor *)

VAR
 currentDegrees  : [0..359];
 (* direction we are
 facing -- 0 = right. *)

PROCEDURE Go( distance : CARDINAL );
VAR
 currentRadians  : REAL;
 realDistance  : REAL;
BEGIN
 currentRadians := DegreesToRadians 
 * real( currentDegrees );
 realDistance := real( distance );
 Line(  entier( realDistance 
 * cos( currentRadians ) ),
 entier( realDistance 
 * sin( currentRadians ) )
 );
END Go;

PROCEDURE Turn( degrees : INTEGER );
VAR
 tempDegrees:  INTEGER; 
BEGIN
 tempDegrees := INTEGER(currentDegrees) + degrees;
 DEC( tempDegrees, 360 * ( tempDegrees DIV 360 ) );
 IF tempDegrees < 0 
 THEN INC( tempDegrees, 360 );
 END; (* IF *)
 currentDegrees := tempDegrees;
END Turn;

PROCEDURE Home;
BEGIN
 MoveTo( 110, 200 );
 currentDegrees := 270; (* facing up *)
END Home;

BEGIN
 ClearScreen;
 Write( 0C );  (* graphics initialization kludge *)
 PenSize( 4, 4 );
END Pen.
Listing 2 - Instrumented DrawDragon module
(******************************************************)
(* Main program to test Dragon method.   *)
(* This version includes MPW performance analyzer    *)
(* calls.   *)
(* Written in SemperSoft Modula-2 v.1.1.2                  *)
(* Allen Stenger August 1989    *)
(******************************************************)

MODULE DrawDragon;

FROM InOutIMPORT WriteLong, WriteString, Read;
FROM InsideMac   IMPORT TickCount;
FROM PerformIMPORT InitPerf, PerfControl, 
 PerfDump, TermPerf, 
 TP2PerfGlobals;
FROM DragonModuleIMPORT Dragon;
FROM PenIMPORT Home;

VAR
 oldTime,
 newTime: LONGINT;
 ch     : CHAR;
 thePerfGlobals : TP2PerfGlobals;
 junk : BOOLEAN;

BEGIN
 (* Initialize performance measurement *)
 thePerfGlobals := NIL;
 IF InitPerf(  
 thePerfGlobals, (* measurement block *)
 20,    (* sample interval *)
 8,(* bucket size *)
 TRUE,  (* measure ROM *)
 TRUE,  (* measure application *)
 “CODE”,(* resource type to 
 measure *)
 0,(* ROM ID *)
 ‘’,    (* ROM name *)
 FALSE, (* measure RAM misses *)
 0,0,0  (* for RAM misses *)
 )
 THEN
 ELSE WriteString( “Initialization failed” );
 END; (* IF *)
 
 (* Start performance measurement *)
 junk := PerfControl( thePerfGlobals, TRUE );
 
 oldTime := TickCount();

 Home;
 Dragon( 8 );
 
 newTime := TickCount();
 
 (* End performance measurement *)
 IF 0 = PerfDump(  thePerfGlobals,
 “DrawDragon.dump”,(* dump file for 
 results *)
 FALSE, (* histograms *)
 0 (* histograms *)
 )
 THEN
 ELSE WriteString( “PerfDump failed” );
 END; (* IF *)
 TermPerf( thePerfGlobals );
 
 WriteString( “Run time is “ );
 WriteLong( newTime - oldTime, 6 );
 WriteString( “ -- press space to exit “ );
 Read( ch );
END DrawDragon.
Listing 3 - Tuned Pen module
(******************************************************)
(* file:  Pen.m    *)
(* Pen methods - see definition module for                 *)
(* descriptions.   *)
(* This version contains improvements suggested by   *)
(* running MPW performance analysis tools.                 *)
(* Written in SemperSoft Modula-2 v.1.1.2                  *)
(* Allen Stenger August 1989    *)
(******************************************************)

IMPLEMENTATION MODULE Pen;

FROM Terminal  IMPORT ClearScreen;
FROM InOutIMPORT Write;
FROM InsideMac   IMPORT FillRect, SetRect;
FROM InsideMac   IMPORT black, Rect;

VAR
 currentDegrees  : [0..359];
 (* direction we are
 facing -- 0 = right. *)
 currentH,
 currentV : CARDINAL;(* where situated *)
 
PROCEDURE Go( distance : CARDINAL );
CONST
 LineSize = 4;
VAR
 lineRect : Rect;
BEGIN
 CASE currentDegrees OF
 0:SetRect( lineRect, currentH, currentV, 
 currentH + distance + LineSize,
 currentV + LineSize ); 
 INC( currentH, distance ); |
 90:    SetRect( lineRect, currentH, currentV, 
 currentH + LineSize,
 currentV + distance + LineSize
 ); 
 INC( currentV, distance ); |
 180: SetRect( lineRect, currentH - distance,
 currentV, 
 currentH + LineSize,
 currentV + LineSize ); 
 DEC( currentH, distance ); |
 270: SetRect( lineRect, currentH, 
 currentV - distance, 
 currentH + LineSize,
 currentV + LineSize ); 
 DEC( currentV, distance ); 
 END; (* CASE *)
 FillRect( lineRect, black );
END Go;

PROCEDURE Turn( degrees : INTEGER );
VAR
 tempDegrees:  INTEGER; 
BEGIN
 tempDegrees := INTEGER(currentDegrees) + degrees;
 DEC( tempDegrees, 360 * ( tempDegrees DIV 360 ) );
 IF tempDegrees < 0 
 THEN INC( tempDegrees, 360 );
 END; (* IF *)
 currentDegrees := tempDegrees;
END Turn;

PROCEDURE Home;
BEGIN
 currentH := 110;
 currentV := 200;
 currentDegrees := 270; (* facing up *)
END Home;

BEGIN
 ClearScreen;
 Write( 0C );  (* graphics initialization kludge *)
END Pen.

 
AAPL
$570.56
Apple Inc.
+13.59
MSFT
$29.11
Microsoft Corpora
-0.65
GOOG
$609.46
Google Inc.
+8.66
MacTech Search:
Community Search:

Fruit Ninja Gets New Update With Powerup...
Fruit Ninja is about to get its biggest update yet to celebrate its second anniversary on Thursday, May 24th. The key new element in the game appears to be that players will now be able to earn an in-game currency, called starfruit, that can be used... | Read more »
Fotor – CameraBag Review
Fotor – CameraBag Review By Jennifer Allen on May 23rd, 2012 Our Rating: :: PLENTIFULiPhone App - Designed for the iPhone, compatible with the iPad A photography app that wants to be able to do everything that could ever be asked... | Read more »
playGO AP1 is the Next Generation of Aud...
With all of Apple’s relatively recent success in the smartphone and tablet market, we can forget sometimes that what kicked off their modern dominance was a device that simply played music. BICOM, Inc. has been recognizing how important music is to... | Read more »
Monkey Pong Review
Monkey Pong Review By Angela LaFollette on May 23rd, 2012 Our Rating: :: BALL BUSTING ACTIONiPhone App - Designed for the iPhone, compatible with the iPad Help the hungry monkey reach all the fruit by bouncing a ball in this family... | Read more »
Heroes & Generals Enters Closed Beta
Creators of Hitman, Roto-Moto, has launched a closed beta of their game, Heroes & Generals. The game is a massively multiplayer first-person shooter involving online fighting between the Axis and Allied forces in Europe. | Read more »
FeedFriendly Review
FeedFriendly Review By Angela LaFollette on May 23rd, 2012 Our Rating: :: EASY TO USEUniversal App - Designed for iPhone and iPad Combine the top three social network newsfeed updates into one location with the help of FeedFriendly... | Read more »
Favorite 4: Euro 2012 Apps
In a matter of weeks, one of the biggest soccer tournaments out there begins: Euro 2012. Qualification is over and 16 European teams are all lined up to prove which one is the best of the bunch. As a Brit, I’m ever hopeful that England will achieve... | Read more »

Price Scanner via MacPrices.net

Are You Sure You Really Want A Retina Display MacB...
Apple didn’t invent the laptop computer, but over the past 21 years they’ve continuously set and reset the bar for laptop innovation and engineering advances, with PC competitors mostly playing catch... Read more
Two PC Pundits Weigh In On PC To Mac Switching (Or...
ZNet’s Stephen Chapman and Forbes’ Brian Caulfield have posted recent blogs on the topic of their personally switching from Windows PCs to Macs. From PC to Mac 10-Months Later ZNet blogger Stephen... Read more
Apple Maintains Top Mobile PC Share in Q112 on Str...
Apple shipped nearly 17.2 million mobile PCs in Q112, accounting for 118% year-over-year shipment growth, according to preliminary results from the latest NPD DisplaySearch Quarterly Mobile PC... Read more
Apple offering refurbished 17″ MacBook Pros for $3...
 The Apple Store has Apple Certified Refurbished 17″ 2.4GHz MacBook Pros available for $2119 including free shipping. That’s $380 off the price of new models. Apple’s one-year warranty is standard. Read more
Week’s Best MacBook Deals
We’ve posted the Week’s Best Deals on MacBook Airs and MacBook Pros for Wednesday, the 23rd of May. Find the lowest price or the best set of bundles from Apple’s Authorized Resellers with these deals... Read more
MacBook Airs on sale for up to $101 off MSRP, free...
 Adorama has MacBook Airs on sale today for up to $101 off MSRP including free shipping. NY and NJ sales tax only. Their prices are among the lowest available for these models from any Apple... Read more
Open-box special: 2.3GHz Mac mini for $493
MacMall has open-box return 2.3GHz Mac minis available for $493 including free shipping. That’s $106 off MSRP. Apple’s one-year warranty and all materials are included. Act now if you’re interested,... Read more
Apple iPhone Charger’s Secrets And Engineering Sup...
Blogger Ken Shirriff’s has posted a thoroughgoing Apple iPhone charger teardown and analysis, the one-line takeaway being: “quality in a tiny expensive package.” Shirriff says that disassembling... Read more

Jobs Board

*Apple* Solutions Consultant-Retail Sal...
Requisition Number 15545402 Job title Apple Solutions Consultant-Retail Sales Location Mobile Country United States City Mobile State Alabama Job type Job description Read more
iPhone Developer at Mastech (Los Angeles...
We are currently seeking an Android/ iPhone Developer for our client in the Insurance domain. We value our professionals, providing comprehensive benefits, exciting challenges, and the opportunity... Read more
24 funny 2d Charaters for iPhone game. a...
We are developing an iPhone game and desire to have 24 characters drawn to our specification. Attached is the detailed spec. Desired Skills: Cartoon, Illustration Read more
*Apple* Solutions Consultant-Retail Sal...
Requisition Number 15545261 Job title Apple Solutions Consultant-Retail Sales Location Spanish Fort Country United States City Spanish Fort State Alabama Job type Job Read more
Android and Iphone Application at Elance...
I need an interval timer application to be created for iphone and android platforms... I am on a tight budget but this ... & IPHONES) not just one so if you can only do one don't waste your time... Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.