Column Tag: Programmer's Spotlight
Andy Hertzfeld on QuickerDraw
By Chester Peterson Jr., Reporter-At-Large, Lindsborg, Kansas
The Story of QuickerDraw
QuickDraw, the imaging program used on the Mac, doesn't always live up to its name when used on the Mac II.
Incredibly fast and wonderfully crisp at one bit per pixel, it bogs down to something that could be more aptly described as SlowDraw at eight bits per pixel. The normal subtle responsiveness of the Mac II suffers.
"Actually, in the eight bit per pixel mode it almost feels like you're using your Mac II under water," is how Andy Hertzfeld describes the action--or lack thereof.
Hertzfeld is, of course, famous in Mac circles as the man responsible for much of the Mac's operating system and the design of the Toolbox.
So, in late December he decided to satisfy his curiosity about the QuickDraw graphics routines and how they were coded. "This is really easy to do," he says.
"Just get the Mac II to do the graphics operation in which you're interested, and then randomly hit the interrupt button. This will interrupt it statistically in the place it's executing the most--the inner loop."
What Hertzfeld discovered was that the inner loops weren't optimally coded. His initial strategy was to move the entire QuickDraw into RAM. He wrote an INIT that moved 60k of the ROM out into RAM where he could patch it.
And, although there were some problems with that, Hertzfeld got it working. But, as he progressed, disassembling to the bottom of the system, he saw this wasn't really necessary.
The reason: Apple had the foresight to route all the inner loops through a low-level jump table. All he had to do was replace addresses in that small in-memory jump table to take over the inner loops in a clean way.
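The jump-table idea can be sketched in C as a table of function pointers. This is only an illustration under stated assumptions: the loop signature, the slot number, and every name below are hypothetical, since the article does not describe the actual table layout in the Mac II ROM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical inner-loop signature: fill `count` 8-bit pixels with `color`. */
typedef void (*InnerLoop)(uint8_t *dst, size_t count, uint8_t color);

/* Hypothetical slot numbers; the real table layout is not given in the article. */
enum { kFillLoop = 0, kLoopCount = 1 };

/* Stand-in for a stock routine: one store per pixel. */
static void stock_fill(uint8_t *dst, size_t count, uint8_t color) {
    for (size_t i = 0; i < count; i++) {
        dst[i] = color;
    }
}

/* Stand-in for a tuned replacement that honors the same contract. */
static void fast_fill(uint8_t *dst, size_t count, uint8_t color) {
    memset(dst, color, count);
}

/* The table that every caller dispatches through. */
static InnerLoop jump_table[kLoopCount] = { stock_fill };

/* Install a replacement and return the old entry so it can be restored. */
static InnerLoop patch_loop(int slot, InnerLoop replacement) {
    assert(slot >= 0 && slot < kLoopCount);
    InnerLoop old = jump_table[slot];
    jump_table[slot] = replacement;
    return old;
}

/* Callers never name a routine directly; they always go through the table. */
static void fill_pixels(uint8_t *dst, size_t count, uint8_t color) {
    jump_table[kFillLoop](dst, count, color);
}
```

Because callers only ever see the table, swapping one address changes the behavior of every caller at once, and keeping the old address makes the patch easy to undo.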
"So, once I saw that, I thought, 'Hey, this could be a project worth looking into,'" Hertzfeld recalls.
"And, the more I got into it, the more I was able to find ways to increase the speed of QuickDraw. I ended up improving the speed of some important operations by a factor of three or so, ending up with QuickerDraw," or, as Apple has called it in release 6.0 of the operating system, QuickerGraf.
Something that confuses people and which he thinks is important is that the performance increases are anything but flat. Instead it's a spiky curve, with some things speeding up a whole lot and others not at all.
The explanation is that the speed-ups are both case dependent and data dependent. Depending on exactly what you're doing, you'll get different responses.
"My point is that the speed-ups aren't uniform," Hertzfeld points out. "Apple has some of the code, such as when you say either EraseRect or PaintRect with black, that are already fairly well optimized. I wasn't able to improve them only because they're already about as good as they can be.
"But, if you take PaintRect with a color that isn't black or white, then it goes to a different loop that wasn't well done. Here's where I was able to improve speed by that factor of three."
Hertzfeld believes that the most important item in the graphical programmer's bag of tricks is special casing. In other words, certain instances of a particular problem are easier to handle than are other instances.
So, he thinks that when speed isn't important, a programmer should try to fold the cases together, writing as little code as possible to handle the entire situation.
"But, when speed is essential, as it is in the QuickDraw routines, the opposite approach must be used," he says. "This involves picking off all the different cases and seeing if you can handle each case a little faster."
A compromise Apple made on its standard graphics card was that it has to support one bit, two bits, four bits, and eight bits per pixel.
A lot of the QuickDraw routines were coded in such a way that they were common for four different screen formats, according to Hertzfeld.
"I was able to special case the eight bit per pixel case, because that's the only one that's really important from a performance point of view," he says.
"While Apple used rather slow bit-field instructions, I used special-case code to take advantage of the faster addressing modes in the 68020 to do things faster.
"I also saved some registers doing that, registers that the Apple code uses just for maintaining which bits per pixel are to be used. Freeing up these for other things allowed me to go faster."
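To make the trade-off concrete, here is a minimal C sketch contrasting a folded routine that handles all four depths with one that only handles eight bits per pixel. The function names are hypothetical, and the packing order (leftmost pixel in the most significant bits of each byte) is an assumption of the sketch; it is not Apple's code or Hertzfeld's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded version: one routine for 1, 2, 4, or 8 bits per pixel.
   It has to carry the depth around and shift and mask on every access. */
static unsigned get_pixel_generic(const uint8_t *row, size_t x, unsigned depth) {
    assert(depth == 1 || depth == 2 || depth == 4 || depth == 8);
    size_t bit = x * depth;                            /* bit offset of the pixel in the row */
    unsigned shift = 8u - depth - (unsigned)(bit % 8); /* distance from the low end of its byte */
    unsigned mask = (1u << depth) - 1u;                /* `depth` low bits set */
    return (row[bit / 8] >> shift) & mask;
}

/* Special case: at eight bits per pixel a pixel is exactly one byte,
   so plain indexed addressing replaces the shift-and-mask work. */
static unsigned get_pixel_8(const uint8_t *row, size_t x) {
    return row[x];
}
```

The special-cased routine also has no depth parameter to keep in a register, which mirrors the register savings described above.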
Hertzfeld also took advantage of the principle of locality. He defines this as meaning "whatever you're doing, it's pretty likely you just did the same thing a short time ago."
He exploited this in producing QuickerDraw through the use of caches. In the computationally intensive parts of QuickDraw like the arithmetic transfer modes, he put in caches that say, "Hey, this is just the same as what I saw before--I don't have to do all the work again, because I've already figured out the answer."
Hertzfeld used this technique in the case of CopyBits between two pixel maps that have different color look-up tables, a common thing on the Mac II with digitized images.
Each digitized image would have its own color look-up table that wouldn't be identical to the one on the screen.
When you do a CopyBits, it has to do a mapping operation, taking each pixel and looking it up in a table to find the correct pixel in the destination bit map.
Hertzfeld changed this so that long word mappings are remembered, short-circuiting the memory references involved in doing the look-up. He used similar techniques in many places to gain significant speed-ups.
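One way such a cache might look is sketched below in C. It remembers only the most recent long word of four 8-bit pixels and its translation; the single-entry design, the MapCache structure, and the miss counter are all assumptions made for illustration, as the article does not say how Hertzfeld's cache is organized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-entry cache: the last source long word and its translation. */
typedef struct {
    int      valid;
    uint32_t src;
    uint32_t dst;
    unsigned misses;   /* how often the full look-up ran (for illustration only) */
} MapCache;

/* Full path: translate four packed 8-bit pixels through a 256-entry table. */
static uint32_t map_long_slow(uint32_t src, const uint8_t table[256]) {
    return ((uint32_t)table[(src >> 24) & 0xFF] << 24) |
           ((uint32_t)table[(src >> 16) & 0xFF] << 16) |
           ((uint32_t)table[(src >>  8) & 0xFF] <<  8) |
            (uint32_t)table[ src        & 0xFF];
}

/* Cached path: if this long word matches the previous one, reuse the answer. */
static uint32_t map_long_cached(uint32_t src, const uint8_t table[256], MapCache *c) {
    assert(c != NULL);
    if (c->valid && c->src == src) {
        return c->dst;             /* same as last time: no table look-ups */
    }
    c->src = src;
    c->dst = map_long_slow(src, table);
    c->valid = 1;
    c->misses++;
    return c->dst;
}

/* Translate a row of long words, letting runs of identical pixels hit the cache. */
static void map_row(uint32_t *dst, const uint32_t *src, size_t n_longs,
                    const uint8_t table[256], MapCache *c) {
    for (size_t i = 0; i < n_longs; i++) {
        dst[i] = map_long_cached(src[i], table, c);
    }
}
```

A cache like this pays off only when neighboring long words repeat, which is the locality argument made above; on noisy image data the extra compare is pure overhead.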
"You want to hit memory as little as possible," he advises. "A lot of the Apple loops were doing essentially one memory reference per pixel.
"My routines always do one memory reference per long word. Why? Because the 68020 is capable of pulling in 32 bits just as quickly as it can pull in eight bits at a time."
The Apple routines take the approach that is a little easier to code, accessing memory eight bits at a time, while Hertzfeld accesses memory 32 bits at a time, spinning it around in the registers and making it faster.
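The difference between the two access patterns can be sketched in C using a simple invert operation on 8-bit pixels. Both routines are hypothetical stand-ins written for this illustration; memcpy of four bytes is used as a portable way to express one 32-bit load or store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One memory read and one memory write per pixel. */
static void invert_bytes(uint8_t *p, size_t n) {
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        p[i] = (uint8_t)~p[i];
    }
}

/* One memory read and one memory write per long word (four pixels). */
static void invert_longs(uint8_t *p, size_t n) {
    assert(p != NULL || n == 0);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, 4);      /* one 32-bit load */
        w = ~w;                    /* all four pixels handled in a register */
        memcpy(p + i, &w, 4);      /* one 32-bit store */
    }
    for (; i < n; i++) {           /* leftover pixels at the end of the row */
        p[i] = (uint8_t)~p[i];
    }
}
```

Both routines produce the same pixels; the second simply touches memory a quarter as often for the bulk of the row.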
"You just attempt to be as clever as possible when you're trying to code," he says. "This is interesting code to write, because it has an unusual sort of design criterion."
With most code in normal circumstances you're always balancing the twin trade-offs between speed and space, or as Hertzfeld puts it, "trying to serve two masters while producing the nicest code possible."
But, the interesting thing about the QuickerDraw code he wrote is that space isn't a consideration. He says the system spends so much time in the QuickDraw inner loops that he did everything to make them go faster. He used a different coding style that also made it a little more interesting and fun.
"Like, for example, I did everything possible to avoid a subroutine call in the inner loops," Hertzfeld explains. "You copy 50 in-line instructions, because it's worth it in the context of the inner loop."
Hertzfeld also devised another creative and interesting technique to speed up QuickDraw, something he calls "region counting."
"As I was speeding up QuickDraw, I was just a little bit disappointed that I wasn't getting as much speed-up as I would have liked when I was clipping to regions," he says.
"What I then realized is that the region mask doesn't change much from scan line to scan line."
"The other thing to notice is an eight bit per pixel region mask is eight times as long as it would be in one bit per pixel, or eight times as likely to be homogeneous," Hertzfeld observes.
"If you pick up a long word of the region mask it's extremely likely that it will be all ones or all zeros." Hertzfeld started special casing the region mask.
He found that normally when masking you have to do something like a seven-instruction sequence that involves three memory references to plot a long word with a mask. But, if it turns out the mask is all zeros you don't have to do anything, because it's all going to be masked out.
You don't even have to hit memory at all, just skip over it. If the mask is all ones, you can just use one store instead of having to read it back and do the coding in order to accomplish the masking.
So, he began special casing that way. And, even though the tests cost him a little, he still won enough to make it worthwhile, because the region mask does tend to be homogeneous. The result: A 40 percent speed increase from that special casing of the region mask.
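The three cases for a long word of region mask can be sketched in C as follows. The function name and the PlotStats counters are hypothetical and exist only so the example can show which path each long word took; the real routines are 68020 assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters showing which path each long word took. */
typedef struct {
    unsigned skipped;   /* mask all zeros: nothing read or written   */
    unsigned stored;    /* mask all ones: a plain store              */
    unsigned merged;    /* mixed mask: full read-modify-write        */
} PlotStats;

/* Plot `n` long words of `src` into `dst`, clipped by `mask`. */
static void plot_masked(uint32_t *dst, const uint32_t *src,
                        const uint32_t *mask, size_t n, PlotStats *st) {
    assert(st != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t m = mask[i];
        if (m == 0) {                          /* fully clipped: skip over it */
            st->skipped++;
        } else if (m == 0xFFFFFFFFu) {         /* fully visible: one store */
            dst[i] = src[i];
            st->stored++;
        } else {                               /* partly visible: merge under the mask */
            dst[i] = (dst[i] & ~m) | (src[i] & m);
            st->merged++;
        }
    }
}
```

The two tests cost a little on every long word, so the approach only wins when most of the mask is homogeneous, which is the observation the technique rests on.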
Then as Hertzfeld was looking at the region mask as it went by, he began counting up runs in it so it could remember how many successive long words in a row were all zeros or how many successive long words in a row were all ones.
"If the region mask doesn't change from scan line to scan line, which it doesn't more than 90 percent of the time, I don't have to fetch it. As a matter of fact, if it's all masked out at the beginning I can just skip over it," he observes.
Where the Apple routines were pulling a long word from memory, then sticking the same long word back, Hertzfeld just skipped over all that.
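Region counting might be sketched in C as a run-length summary of the mask that is built once and reused while the mask stays the same. The Run structure and function names are hypothetical, and the sketch assumes the caller knows when the mask changes between scan lines; the article does not say how QuickerDraw detects that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { RUN_ZEROS, RUN_ONES, RUN_MIXED } RunKind;

/* A run of consecutive mask long words of the same kind. */
typedef struct { RunKind kind; size_t length; } Run;

static RunKind classify(uint32_t m) {
    return m == 0 ? RUN_ZEROS : (m == 0xFFFFFFFFu ? RUN_ONES : RUN_MIXED);
}

/* Summarize the mask as runs. `runs` must have room for `n` entries.
   Returns the number of runs written. */
static size_t count_runs(const uint32_t *mask, size_t n, Run *runs) {
    assert(mask != NULL && runs != NULL);
    size_t n_runs = 0;
    for (size_t i = 0; i < n; i++) {
        RunKind k = classify(mask[i]);
        if (n_runs > 0 && runs[n_runs - 1].kind == k) {
            runs[n_runs - 1].length++;          /* extend the current run */
        } else {
            runs[n_runs].kind = k;              /* start a new run */
            runs[n_runs].length = 1;
            n_runs++;
        }
    }
    return n_runs;
}

/* Plot one scan line from the run summary. The mask itself is only
   read inside mixed runs. */
static void plot_with_runs(uint32_t *dst, const uint32_t *src, const uint32_t *mask,
                           const Run *runs, size_t n_runs) {
    size_t i = 0;
    for (size_t r = 0; r < n_runs; r++) {
        size_t len = runs[r].length;
        switch (runs[r].kind) {
        case RUN_ZEROS:                         /* fully clipped: skip the whole run */
            break;
        case RUN_ONES:                          /* fully visible: straight copy */
            memcpy(dst + i, src + i, len * sizeof *dst);
            break;
        case RUN_MIXED:                         /* partly visible: merge word by word */
            for (size_t j = i; j < i + len; j++) {
                dst[j] = (dst[j] & ~mask[j]) | (src[j] & mask[j]);
            }
            break;
        }
        i += len;
    }
}
```

In use, count_runs would be called again only when the region mask changes; every scan line in between reuses the same runs, so long clipped stretches cost a single addition rather than a fetch per long word.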
He's proud of this original technique of region counting for obtaining a tremendous speed increase when things are heavily clipped.
Contrary to a misconception about its size, the QuickerDraw memory resident code is only approximately 10k. And, half of that is devoted to the arithmetic transfer modes that aren't used too often.
The QuickerDraw file is 27k, but that includes logo resources. The nice colored picture that it comes up with is 12k alone.
Incidentally, the arithmetic transfer modes were introduced with the Mac II and are only relevant to color. Most applications don't use them yet.
Hertzfeld accomplished his QuickerDraw core work in a two-week period between this last December 22 and January 7. It then became apparent that Apple was interested in his acceleration of QuickDraw.
Hertzfeld realized that if he was truly producing a speed-up, then he'd also have to address the arithmetic transfer modes. A second two-week burst of work got these speed-ups implemented, too.
The bottom line: QuickerDraw involves no change in the architecture of QuickDraw. Instead, view it as implementing a high performance tune-up of Apple's standard.
Hertzfeld signed a non-exclusive contract with Apple for QuickerDraw in February, accepting less money so he could upload it to CompuServe and distribute it on his own.
Apple will incorporate QuickerDraw in its next system release, 6.0, due out at the end of May.
"Although there are a few cases that I didn't handle, I do think I'm pretty close to the optimal plotting speed of QuickDraw," Hertzfeld comments. "I basically just re-implemented the inner loops so they were more efficient."
There will be no need to further refine QuickerDraw for the 68030. This is because it has an instruction set identical to the 68020's.
"The things that will make Apple change QuickDraw next are the architectural issues such as scalable fonts and resolution independent display routines--basically catching up with Display PostScript," Hertzfeld thinks.
He'd like to see Apple offer both an enhanced QuickDraw and PostScript so applications programmers could select their choice for both screens and printers.
"The Macintosh would be better off if it could have both. And, I also think it would be a little less risky for Apple than to continue trying to develop on their own all the things that PostScript does so well," Hertzfeld says.
"In the meantime, my QuickerDraw tune-up will make graphics production easier and faster on the Mac II."
Hertzfeld on Creativity
Is computer programming creative, creative in the same sense as producing a masterpiece painting or writing a best-seller?
"Absolutely!" Hertzfeld states.
"There are two different types of programming creativity, though," he advises, "and both are equally important in a good programmer.
"The first sort of creativity is involved in initially picking the right area and then the right problem on which to work. This involves thinking about what the users really need that will help them the most.
"Then there's the actual writing of code and choosing instructions, which can be as individualistic as any painting or writing style," he says.