Jul 94 Dialogue Box
Column Tag: Dialogue Box
By Scott T Boyd, Editor
In The Cornfield == In The Weeds?
You raised many fine points about the need (or lack thereof) to write assembly language for PowerPC. I would have appreciated more detail to illustrate the points you made. For instance, I would like to know what the two lines of code that sped up the search algorithm by a factor of three did, in general. I would also like to see a simple example of PowerPC code you saw rewritten by someone else where the rewrite ran slower, or an example where a PowerPC rewrite looked faster but did not run faster. Some of your points raised big question marks for me.
The article as a whole appeared directed toward the C programmer, but it said very little about how to write C that compiles to good PowerPC code.
A cursory examination of the PowerPC instruction set shows that it is not a C machine; there are instructions that do not match well to C, as well as C constructs that do not map well to PowerPC instructions. For instance, the statement unsigned char c = f; (where f is a float) compiles to over 20 instructions on an RS6000 using the xlc compiler with the O3 optimization (the highest level), regardless of the rounding convention chosen. This is in part because there are no instructions that move data between the general purpose registers and the floating point registers, and in part because both the unsigned char data type and the float data type, while supported on the PowerPC in theory, are not natural data types; at least the compiler does not think so.
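To reproduce the observation above, it helps to isolate the conversion in a tiny function and disassemble it. The following is only a sketch; the function names are hypothetical and not from the letter. Note that in C the direct conversion is undefined when the truncated value does not fit in an unsigned char, so a clamped variant is shown as well.

```c
/* Hypothetical helper names, for illustration only. */

/* Direct conversion, as in the letter: the fractional part is discarded.
   Undefined behavior if the truncated value is outside 0..255. */
unsigned char float_to_byte(float f)
{
    unsigned char c = f;
    return c;
}

/* Clamped variant: well defined for any finite input. */
unsigned char float_to_byte_clamped(float f)
{
    if (f <= 0.0f) return 0;
    if (f >= 255.0f) return 255;
    return (unsigned char)(int)f;   /* go through int, then narrow */
}
```

Compiling each function on its own and reading the generated code is the quickest way to see how many instructions a particular compiler spends on the conversion.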
Another example, true for both 68K and PowerPC, is that while the processors make overflow detection easy, the C language does not provide any natural method to write code that detects arithmetic overflow in integer arithmetic. Thus, while 68K and PowerPC both have instructions to support multiple word precision arithmetic easily, writing it portably is not so easy (although it is possible; QuickDraw GX supplied PowerPC with C implementations of 64-bit-wide numbers thanks to the magic of Apple engineer Rob Johnson).
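As an illustration of what "possible but not natural" looks like, here is a minimal sketch of a two-word add with carry and a signed overflow test in portable C. The type and function names are hypothetical, the fixed-width types come from C99's stdint.h (which postdates this letter), and this is not the QuickDraw GX implementation.

```c
#include <stdint.h>

/* Hypothetical 64-bit value built from two 32-bit words. */
typedef struct { uint32_t hi; uint32_t lo; } wide64;

/* Add two wide values. C has no carry flag, so the carry is recovered
   by checking whether the low-word sum wrapped around. */
wide64 wide_add(wide64 a, wide64 b)
{
    wide64 r;
    uint32_t carry;
    r.lo = a.lo + b.lo;          /* unsigned arithmetic wraps modulo 2^32 */
    carry = (r.lo < a.lo);       /* wrapped if the sum is below an operand */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Return nonzero if a + b would overflow a signed 32-bit integer.
   The test is done before adding, because signed overflow is undefined. */
int add_overflows(int32_t a, int32_t b)
{
    if (b > 0) return a > INT32_MAX - b;
    return a < INT32_MIN - b;
}
```

Both routines replace what would be one or two instructions in assembly (add with carry, or a branch on the overflow bit) with explicit comparisons.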
You mention that bit manipulation is too hard to do in C, etc., claiming that if the arguer knew C well, there would be little or no argument. Not always so; for instance, finding the first set bit in a register is a single instruction for both PowerPC and 68K, but I think you'll have a hard time writing a C construct that generates that instruction. Similarly, I challenge you to write C constructs for the 68K instructions BFINS or BFEXT, or the more complicated cases of the PowerPC instruction rlwimi; for instance, some compilers will not generate a simple bit rotate no matter how clever your C is (MPW C will, but only because of requests on my part).
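For comparison, this is roughly what the portable C for two of these operations looks like. It is a sketch with hypothetical function names; whether any given compiler collapses either routine into a single instruction is exactly the open question, and has to be checked in the disassembly.

```c
#include <stdint.h>

/* Number of zero bits above the highest set bit; 32 when x is 0.
   Written as a plain loop, one bit per iteration. */
int count_leading_zeros(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Insert the low `width` bits of `value` into `dest`, starting at bit
   `shift` (counted from the least significant bit).
   Requires 0 < width < 32 and shift + width <= 32. */
uint32_t bitfield_insert(uint32_t dest, uint32_t value, int shift, int width)
{
    uint32_t mask = ((1u << width) - 1u) << shift;
    return (dest & ~mask) | ((value << shift) & mask);
}
```

For example, bitfield_insert(0xFFFFFFFF, 0x5, 4, 4) clears bits 4 through 7 and writes 0101 there, giving 0xFFFFFF5F.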
Sometimes the best way around these sorts of problems is to use a standard C library, since the compiler author is likely to make sure that at least the standard library calls generate the correct instructions. For instance, the fabs() function will generate the corresponding instruction on the xlc compiler. But, I tried to get the compiler to generate the fnabs instruction (negative absolute value of a double) using various permutations of
-(a > 0 ? a : -a)
by moving the negation inside the expression and reversing the conditional, and not only did it never generate the correct instruction, each time it generated a different set of instructions for each of the identical permutations!
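The permutations described above can be written out as separate functions and disassembled one by one. This is a sketch with hypothetical names. One observation that is mine rather than the letter's: the forms agree for every nonzero number, but under IEEE 754 they differ on positive zero (the conditional forms return +0.0 where -fabs(a) returns -0.0), so a strictly conforming compiler may not be free to treat them as the same operation.

```c
#include <math.h>

/* The form quoted in the letter. */
double nabs_conditional(double a)
{
    return -(a > 0 ? a : -a);
}

/* Negation moved inside the conditional. */
double nabs_inverted(double a)
{
    return a > 0 ? -a : a;
}

/* Built on the standard library call, which the letter reports does
   generate the matching instruction for fabs() on xlc. */
double nabs_library(double a)
{
    return -fabs(a);
}
```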
To suggest that the compiler is always smarter than the programmer is a bit naive. While I do not doubt that this can be the case, I suggest that a thorough understanding of the instruction set, and the experimental knowledge gained from attempting to write C constructs to generate those instructions, can go a long way towards high end program tuning. For instance, I had no trouble getting both Metrowerks and xlc to generate a single instruction out of the line: d = -(d * d + d); where d is a double. At first, your article seemed to support an argument I often hear: that all engineers should write C as well as they can without regard to the assembly that is produced, because the compiler will always be smarter than they are, and because they want to be portable, etc. It is fine to drive a car without understanding how the engine works, but a little less savory to drive one knowing that the designer did not understand the same. In order to become expert at computer programming, you have to understand how computers work, including dirty assembly. It is difficult to gain that understanding without ever having written a line of it.
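The line quoted above is easy to try in isolation. Here is a minimal sketch, with a hypothetical function name; the single instruction in question would be a fused negative multiply-add, and whether a given compiler emits it depends on the compiler and its floating-point settings.

```c
/* The expression from the letter, wrapped so it can be compiled and
   disassembled by itself. */
double neg_mul_add(double d)
{
    d = -(d * d + d);
    return d;
}
```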
I am not suggesting writing the next version of MacWorks in assembly; far from it. I just finished working on QuickDraw GX, and the native PowerPC version has nearly no assembly at all; the 68K version has maybe 1%, and all assembly has portable C equivalents. But as I consider graphics algorithms for the next version, I immediately consider what assembly best implements the algorithms, and how that defines the high level representation for those algorithms. I rely on my experience writing very large projects completely in assembly, and expect those working with me to have a similar depth of knowledge.
Further on in the article, I get the feeling that you do expect the reader to understand assembly. But rather than heading the paragraph "How to Write the Code in Assembly Language," I suggest "How to Verify the Algorithm Written in a High Level Language." Disassemble it. Understand what you are asking the computer to do. In the PowerPC case, important concepts include leaf node routines, how floats and integers work, register conventions, and C switch statements (not cheap because of the architecture), to mention a few.
Finally, I disagree that PowerPC assembly language is more difficult than 68K. I suggest it is just different. 68K code can slow down or speed up by a factor of 2 depending on how it is loaded in the cache, and another factor of 2 if the data is misaligned; you won't detect either easily by looking at a code listing. While the scheduling and pipelining issues in PowerPC complicate writing good assembly, there are not a lot of rules to learn; the simplified memory model and uniformity of instruction latency make it actually much easier than the cycle counting I used to do for 68K.
Well, I thought I was done, but I disagree with one more thing: your assessment that THINK C has adequate performance tools. By this I assume that you mean the profiler option. First, it is not easy to adapt this to monitor the performance of any piece of code (as you suggest) since the compiler must generate special callouts in the body of the code for the profiler to work at all. Second, it works unmodified using Ticks, not the Microsecond timer; coarse by any measure. Third, when adapted to use the Microsecond timer, the software must be calibrated to eliminate the time used by the profiler code itself. Without doing this, the software can't tell the difference between one function callout and two. Fourth, even with the corrections, I have found that I need to either remove all network connections or turn off interrupts on my IIfx (slow enough for microseconds to be useful) to get accurate timings. And you can't just jam a register on a PowerPC to turn off interrupts. Lastly, I have to rewrite the printouts to get useful info; the THINK standard ones just aren't very good. Not a big deal, but work nonetheless.
I hope I wasn't being too big of a pain in the backside with everything written above; I liked the article. And, if I am wrong about any of the points in my rebuttal, please do not hesitate to correct my shortcomings.
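The calibration step mentioned above can be sketched generically. This is not the THINK C profiler or the Macintosh Microsecond timer; it uses the standard C clock() call as a stand-in, and the function names are hypothetical. The idea is to estimate the cost of one timer read, then subtract that cost once per callout from the raw measurement.

```c
#include <time.h>

/* Estimate the cost of one timer read, in clock ticks, by timing many
   back-to-back reads and averaging. */
double timer_overhead_ticks(long samples)
{
    volatile clock_t sink = 0;   /* volatile so the reads are not removed */
    clock_t start = clock();
    long i;
    for (i = 0; i < samples; i++) {
        sink = clock();
    }
    (void)sink;
    return (double)(clock() - start) / (double)samples;
}

/* Remove the profiler's own cost from a raw measurement. The result is
   clamped at zero because the overhead figure is only an estimate. */
double corrected_ticks(double raw_ticks, double overhead_per_callout,
                       long callouts)
{
    double t = raw_ticks - overhead_per_callout * (double)callouts;
    return t < 0.0 ? 0.0 : t;
}
```

With a raw reading of 100 ticks, an overhead of 2 ticks per callout, and 10 callouts, the corrected figure is 80 ticks.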
- Cary Clark, Apple Computer, Inc.
More 601 Assembly Feedback
I would like to comment on the prevailing propaganda regarding assembly language and the PowerPC. Every time I hear it, I feel that my intelligence is being insulted.
It has been expressed (primarily by Apple, but also by Metrowerks) that trying to use assembly language on the PowerPC is a bad idea. They give a number of reasons including the difficulty of porting, the difficulty of optimizing, the need to adapt to different PowerPC implementations and the quality of existing compilers. I am told that I couldn't match the speed of compiled C/C++ code even if I tried.
In fact I agree with their reasons and have often generated optimal code simply by breaking long expressions into many pieces, using extra variables to hold intermediate values and reordering the statements to schedule well on the PowerPC.
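As an illustration of the technique described above, here is the same cubic polynomial written once as a single expression and once broken into pieces with intermediate variables. The example and its names are hypothetical, and whether the split form actually schedules better depends on the compiler and the processor. Note also that the split form adds the terms in a different order, so for general floating-point inputs the two results can differ in the last bits.

```c
/* Single long expression. */
double poly_compact(double x, double a, double b, double c, double d)
{
    return a * x * x * x + b * x * x + c * x + d;
}

/* Same polynomial with intermediate values named, ordered so that
   independent multiplies and adds sit next to each other. */
double poly_split(double x, double a, double b, double c, double d)
{
    double x2 = x * x;
    double t2 = c * x;      /* independent of x2 */
    double x3 = x2 * x;
    double t1 = b * x2;     /* independent of x3 */
    double t0 = a * x3;
    double s1 = t2 + d;     /* independent of t0 and t1 */
    double s0 = t0 + t1;
    return s0 + s1;
}
```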
However, there is an important point that everyone seems to be missing! The PowerPC architecture defines an instruction set that is significantly larger than what the compilers actually use. For example, compile:
x = (x << 31) + (x >> 1);  // Rotate right one bit
and you will probably get at least three instructions, when one (a right longword rotate by one bit) would suffice on both the 680x0 and the PowerPC. No compiler I have ever seen (and I've seen quite a few) generates rotate instructions.
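For anyone who wants to test a compiler against the claim above, here is a minimal sketch of the rotate written as a function, plus a general rotate by n bits. The names are hypothetical and the fixed-width type comes from C99's stdint.h. The operand must be unsigned and exactly 32 bits wide for the idiom to be a true rotate.

```c
#include <stdint.h>

/* Rotate right by one bit, as in the letter (with | in place of +;
   the two halves never overlap, so the result is the same). */
uint32_t rotate_right_1(uint32_t x)
{
    return (x << 31) | (x >> 1);
}

/* Rotate right by n bits. The count is masked so that n == 0 does not
   produce a shift by 32, which is undefined in C. */
uint32_t rotate_right(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

Whether either function comes out as a single rotate instruction is something to confirm in the disassembly for each compiler and optimization level.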
There are numerous other examples, including absolute value, multiply long high word, add/subtract with extend, and count leading zeros. All are useful in specialized compute-intensive operations. In addition, certain no-op instructions are necessary on the PowerPC 601 to keep the pipeline running at full speed.
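To show how far the portable version of one of these is from a single instruction, here is a sketch of "multiply long high word" (the upper 32 bits of a 32 x 32 bit unsigned product) using only 32-bit arithmetic. The function name is hypothetical and the fixed-width type comes from C99's stdint.h.

```c
#include <stdint.h>

/* Upper 32 bits of the 64-bit product a * b, computed from four
   16 x 16 bit partial products so nothing overflows 32 bits. */
uint32_t mul_high_u32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t lo_lo = a_lo * b_lo;
    uint32_t hi_lo = a_hi * b_lo;
    uint32_t lo_hi = a_lo * b_hi;
    uint32_t hi_hi = a_hi * b_hi;

    /* Sum of everything that lands in bits 16..31 of the product;
       its upper half is the carry into the high word. */
    uint32_t cross = (lo_lo >> 16) + (hi_lo & 0xFFFFu) + (lo_hi & 0xFFFFu);

    return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (cross >> 16);
}
```

Four multiplies and a handful of shifts, masks, and adds stand in for what the processor can do in one instruction.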
Of course, none of them correspond to built-in operations of the C, C++, or Pascal language -- and that's the real reason we need assembler. In fact, it's one of the reasons that inline assembler is part of the developing ANSI C++ specification.
I would appreciate it if a more generous attitude were taken towards assembly language in the future.
- Robert P. Munafo, Malden, MA
Sometimes our mailbox gets to be a bit interactive. In a followup letter, Robert added:
Thank you for your reply. I feel a lot better after hearing what you had to say about the use of assembler at Apple. [I basically said that most everyone was writing system software in C, with a handful of exceptions, like Mixed Mode, and the emulator - Ed stb]
I was trying not to respond exclusively to Steve's article ("Thoughts from the Cornfield," MacTech Vol. 10, No. 5) but to each of the other times assembler has been discouraged and the general "C++ is now sufficient for everything" propaganda. For example:
Same issue, p. 68, column 1, last ¶
Vol. 10, No. 3 Page 82 column 1, ¶ 3
The place in the CodeWarrior manual where they explain why inline assembler for PowerPC is not yet supported.
I do appreciate the section on "How to write the code in assembly language" inasmuch as it points out the advantages of using the compiler's output as a first step, and benchmarking your efforts to see if they really speed things up, etc.
I think that even if you are not going to write any assembler, you still need to be rather familiar with the operation of the PowerPC chip in order to optimize your C or C++ code.
It would be very interesting indeed if compilers could be improved a bit as a result of this discussion! Even GCC and G++ (the GNU compilers), which are widely regarded as pretty much the best thing going, do not generate rotate instructions. However, it is quite easy to generate rotate instructions if you have already implemented a peephole optimizer.
- Robert Munafo
You Can Beat MPW's Code Generator
You must have been up late for you to have thought people like Symantec's quick turnaround tools and MPW's code generation. It's OK, cuz we know it was probably a simple case of getting the object files swapped - and, alas, it was very late. <grin> I think MPW's code generation is brain-dead; THINK's code is significantly more intelligent. Is it not smaller, too?
Thanks for mentioning your visit to the Software Developers Conference. Going behind enemy lines is good for our side. Maybe as a result you will broaden the horizons of Mac tool developers, causing the number and the quality of Mac development tools to grow. Miracles are possible...
- dk smith, mtn. view, ca
I must have been asleep when the balance changed. Besides, I never said that MPW's code generation wasn't brain-dead - Ed stb
But You Can't Beat A Good Algorithm
Your article on performance and the misconceptions about the use of assembly ("Thoughts from the Cornfield," May 1994) was excellent. The importance of a good algorithm cannot be overlooked. The thought of thousands of lines of 68K code being ported to run native on the PowerPC architecture is scary - there's a lot of Toolbox/OS code in this category. Let's hope those porting 68K code to C don't blindly port the assembly language program's structure, interfaces, and algorithms. This could waste the PowerPC's speed and nullify our hardware performance advantages over Pentium.
- dk smith, mtn. view, ca
Free the SDKs
This is an open letter to decision makers at Apple in which I request that the policy of charging extra for crucial SDKs be discontinued.
Why are SDKs important? Software Development Kits (SDKs) contain critical information that enables developers to support specific components of the Macintosh OS and User Interface.
As a small developer, I find it difficult enough to budget for necessities like E.T.O., the Developer Program, and WWDC, but the added pain of buying an SDK for every feature I want to add to my application is burdensome in terms both of time and money.
Apple is fond of comparing itself to Microsoft. So let's make a rough comparison of the yearly cost of the Apple and Microsoft developer programs, and see how Apple's policy of charging for SDKs dramatically alters the cost equation.
The comparisons assume I want:
1. Access to all the SDKs I'll need for a platform.
2. Programming tools to use the SDKs.
3. System software to test with.
4. Programmer's documentation.
Apple
  Associates Program (& dev. CDs)      $350   yearly
  E.T.O. (development tools)           $400   yearly (initial $1295)
  Add a few SDKs (a sampling for reference only):
    QuickTime SDK                      $195
    Easy Open SDK                      $150
    AppleScript SDK                    $199
  Some SDKs                            $544
  Add it all up to get                 $1294  total

Microsoft
  Developer Network Level 2            $495   yearly
  Visual C++                           $99    update (initial $599)
I'll point out areas in which the above comparison favors Apple:
1. As SDKs are updated, Apple charges update fees in the ~$100 range per SDK (reference AppleScript SDK and QuickTime SDK). So though I've listed Apple's initial SDK charge, there's a recurring cost component as well. Microsoft ships updates to SDKs on each quarterly Developer Network CD. Apple forces developers to purchase these updates (once they realize something is not up-to-date).
2. We'll also ignore the fact that Microsoft ships the Windows programming documentation with Visual C++. Apple developers need to spend untold extra $$$ hundreds $$$ for $Inside $Mac.
3. The items bought above from Apple do not guarantee access to all SDKs and all copies of system software that we need from Apple. The list above contains only a few. I'm omitting some little things...like AOCE :-). The items from Microsoft include ALL WINDOWS SDKs and OS versions. Including NT. And soon Chicago. It's coming...
So what am I bitching about, and why?
My major gripe is that Apple's policy of charging for SDKs adds substantial cost and time overhead to Macintosh development that hinders support of new features.
Not only do I need to spend the money to purchase the additional SDKs, but I need to take the time to order from APDA and wait for the material before I can start to implement new features. This extra pain makes it less likely that I (or others) will add support for new system SW features.
What's the point, Apple?
Do you want us to support new System 7.x features like scripting, QuickTime, Easy Open?
Are you trying to fund development by charging developers for SDKs?
Wouldn't you rather do everything you can to encourage developers to continue to support the Mac and to add Macintosh-specific features to cross-platform programs in order to maintain the shrinking differentiation between the platforms so that you can maintain or improve market share?
I've heard folks from Apple argue that the charge for SDKs is simply to cover the cost of their delivery. Surely you're joking, Mr. Spindler!!! Apple claimed that the E.T.O. (and/or, at one time or another, The Developer CD Series) was the one-stop source for developer tools and information for Macintosh development. Okay, we paid for it. But where are those SDKs? Surely this is the place?
The recent re-shuffle of the Developer CD series into Tools, Systems Software, and Reference Library was to make more room on the CDs. So where are those SDKs? Surely you can fit on the Easy Open SDK ($150 for a floppy and 100 pages of docs - isn't that just pure greed?). And surely there's room for AppleScript docs, headers, and samples?
Go ahead, make my day - Here's what I want:
Put all SDKs, System versions, System Extensions, and DocViewer copies of associated documentation on the Developer CD Series. If I want paper documentation, I'll pay extra. Or print it.
So why should I care?
I've supported Apple for a long time. I've been writing commercial Macintosh Software since 1984. In 1984 Apple did everything it could to encourage development on the Mac. 1994 is like 1984 in that this is a hard sell. 1994 is not like 1984 in that Apple is doing less to support developers. Let's make 1994 more like 1984. I want the Mac to succeed, and to trample Windows. But on a recent project (Houdini) I took a trip to the dark side. And discovered some truths: Microsoft does far better than Apple in providing access to its system technologies. Go ahead, make my day. Make it easier for me to make the Macintosh shine.
- James Berry, email@example.com
As Ike Nassi, Apple's Vice President of Development Products, said at the WWDC in May, Apple is committed to improving the way in which we currently distribute SDKs to developers. In fact, we are - right now - finalizing a new plan that will allow us to deliver a complete set of system software SDKs to developers on a regular basis for a very attractive price. We expect to be able to present the details of this plan within the next month or so. We're confident that our new approach to SDK distribution will address most of the concerns of Mr. Berry and other developers with whom we've discussed similar issues in the past few months.
- Gary Little, Product Manager,
Macintosh Development Tools
Apple Computer, Inc.