Nov 94 Dialog Box
Column Tag: Dialog Box
By Scott T Boyd, Editor
Don't Scream, Send 'em The Article!
I'd like to applaud Eric Shapiro's article entitled "Multiple Monitors vs. Your Application." As a sometimes one-, two-, or three-monitor user, I routinely experience everything he griped about. Purchasing multiple monitors is the most cost-effective way to get a large on-screen workspace, and developers would do well to support it better. (Having multiple monitors has also worked in the past to really freak out some of my IBM user/programmer friends!)
Even Apple has problems writing friendly windowing code. I can't tell you how many times I've disconnected my second monitor to use it somewhere else, restarted, launched AppleTalk Remote Access 1.0, and found one of its crucial windows appearing offscreen. I have to reconnect the second monitor, move the windows to the main monitor, and try again! At least when I encounter this bug in Excel I can arrange it back on...
I too was aghast to see that the new version of one popular developer tool creates the window and then snaps it into position. The first few times I brought up a window, I was sure that two were appearing. How could something that obvious slip through, especially when the fix should be a single line of code? Anyway, thanks for the article. If I see one more window jump back to my main monitor when I try to enlarge it on my second monitor by clicking the zoom box, I'll scream!
- Jeff Mallett, firstname.lastname@example.org
PPC Assembly Article Comments
When Bill Karsh repeated last month the worn-out advice originally promoted by the Apple folks last year ("You don't need assembler; the compiler can do better than handwritten assembly," or words to that effect), it hit me with particular irony. You see, the lack of adequate compiler tools (thanks, Apple, for your inimitable support here) has forced me to write more assembly code for the 601 in the last couple of months than for all other computers combined over the previous decade. Anyway, I read his article with great interest. Some comments:
1. Perhaps your readers should know that the sequence
addis r3,r0,0xABCD
addi r3,r0,0xEF23
is unlikely to load the hex value ABCDEF23 into r3, for two reasons. First of all, the result of the addis instruction will be discarded by the second, since it sums ZERO + EF23, not the previous result. Better to use r3 as the second parameter. But it still won't work, because addi takes a SIGNED immediate operand, and EF23 sign-extends to FFFFEF23, not 0000EF23, which adds -1 to the previously loaded upper half. The correct sequence for loading ABCDEF23 into r3 is:
addis r3,r0,0xABCE
addi r3,r3,0xEF23
or alternatively,
lis r3,0xABCE
addi r3,r3,0xEF23
or still better, because it's more understandable (ori takes an UNsigned operand):
addis r3,r0,0xABCD
ori r3,r3,0xEF23
2. I'm not sure how Bill intends to use the -ze variants of add and subtract as register-to-register "move" or "negate and move" mnemonics, but he's likely to be surprised when he tries to do so and finds the previous contents of the carry flag (XER.CA) randomizing his results somewhat. Better to stick to ORI for move, and NEG for negate and move.
3. Bill tells us that "Divide operations treat rA as a 64-bit dividend..." Perhaps somebody should tell Motorola, because their manual reports the much more reasonable proposition that the dividend is 32 bits. If it's 64, where do the other 32 bits come from?
4. It's really too bad we are stuck with the IBM syntax for the rotate operators. Or I should say YOU are stuck with it: very early on I realized I was, like Bill, burning a lot of time on this stuff, and altered my assembler and disassembler to reflect what is REALLY going on. All three of the rotates have very simple semantics: they rotate the source operand left n bits, then replace some bits in the destination with the rotated bits under the control of a mask. The remaining bits are either zeroed or left unchanged (the fundamental difference between rlwimi and rlwinm). The problem is specifying the mask. See how much simpler these two instructions are to read:
rotm r29,r27,#3,=0007FFF8 ; rotate left 3, replace indicated bits
rotz r6,r15,#1,=00000080 ; rotate left 1, pick out a single bit
when compared to:
rlwimi r29,r27,3,13,28
rlwinm r6,r15,1,24,24
5. The latency figures Bill gives for branch instructions are likely to be misleading - perhaps this is why everybody makes the case for compilers being better. Branches are free if you give them enough setup time - basically three integer instructions after the one that altered the CR, CTR, or LR register the branch depends on - but a sequence of branches with no data dependencies has a different kind of problem. After a sequence of integer operations, the fourth consecutive branch not taken will introduce a bubble in the pipeline, for an effective 1-cycle delay. Consecutive branches taken cost two cycles each; they become free only if two or more integer operations separate each pair of branches taken. Then there are boundary conditions, but these three rules make for pretty efficient code.
6. I think Bill temporarily forgot that IBM numbers the 601 bits Big-Endian when he illustrated the mtcrf instruction. If CRM = 0x08, then it's cr4 (not cr3) that is replaced, with bits 16-19 (not 12-15). He got the visual image correct, but he would be surprised when he went to use the bits by number. Another argument for the superiority of a visual mask over bit numbers. And yes, my assembler lets me use a visual mask syntax here, as in the rotates. Perhaps somebody will come up with a macro preprocessor for the MPW assembler to parse the bit-image syntax into something the assembler understands.
- Tom Pittman, Itty Bitty Computers
Bill Karsh responds: I am grateful to Tom Pittman for scrutinizing my article in such detail, and pleased that the readers and I will benefit from the corrections. I agree with Tom on most of his points, but let me respond to each.
0) High-level vs. assembly programming - To everybody (not just Tom) who hates tired dogma: maybe I was not clear enough about my personal feelings on assembly. First of all, there is absolutely no question that just about anything coded in (good) PPC assembly can beat the pants off the best compiler yet available, and probably any ever likely to be available. I never could have intended otherwise. In fact, I code in assembly myself, but my particular work demands peak performance for a handful of core operations. It takes a great deal of effort to achieve this, and one can always improve the code by small changes here and there in a never-ending process of refinement. If your particular job specification is to speed up existing and otherwise correct code, you can do much in C, but you can always do more in assembly, by paying the price of being absolutely tied to machine-specific code. That's fine if you think it's worth the time that could be spent writing new, more portable and maintainable code. Yes, sometimes it is worth it. The optimization should be well targeted in any case.
What I wanted to argue about compilers was that the capability is there in the hardware to ease the compiler writer's job of optimizing. It ought to be possible for compilers to do better than they do today at PPC code, and better at PPC code than they have ever been at 68K code. Since there is so much for you to do just to get your project on its feet, personally optimizing things should be a lower priority than making them correct and meeting specs. You will gain (some) optimization implicitly as the compilers improve, and there is some reason to be optimistic about this happening. Don't forget that as the machines get faster, the need for touch-ups keeps diminishing. There will always be a place for some killer assembly or some compiler hand-holding, but the genuine need should not arise as often as it used to. A blanket statement about assembly or optimization being evil would just be foolish.
1) Loading 0xABCDEF23 into a register - Tom is correct. My example is the result of hastily copying notes from place to place and incurring typos, for which I have no excuse. Each of his examples of loading a long literal constant is correct.
2) Clever uses of addze and subze as moves - Of course, the carry bit would have to be cleared for the moves to work as I suggested. Tom is right again. If writing one's own assembly, his are the preferred methods for effecting the moves reliably. Otherwise, the ze instructions should really only be employed for extended arithmetic.
3) The sizes of divw rD,rA,rB operands and results - I have no argument with Tom here either. The numerator (N), denominator (d), and quotient (Q) are all 32-bit quantities. When I said what I did about N being treated as 64 bits, I was merely likening the division to that familiar from the 68K divs.w instruction, where N is exactly twice the width of the d or Q registers. I meant that you might consider N in your mind as extended in this way, as a formal convenience - not that the hardware operates this way.
4) Shift and rotate semantics - I think Tom is saying that he has created some simplifying macros for himself, which can only be lauded. However, I was concerned in the article with interpreting what most users are likely to see in their standard disassemblers output.
5) Branch timing - I agree only in spirit. There is much to say about branch timing. I reported a latency of one cycle for branch execution, which is generally true - that's how long a branch takes to execute (in a vacuum, so to speak). This gives little hint that a variety of things can happen depending on the context of the branch. I take issue with Tom's trying to characterize timing in the language of branches "taken" or "not taken." Those are the rules for 68K branch timing; on the PPC that is too simplistic. Branch timing is mainly governed by whether branches are correctly or incorrectly predicted. Incorrectly predicted branches hurt something awful, causing the IQ to be flushed, everything contingently executed to be discarded, and new instructions to be fetched. This can cause a delay of more than one or two cycles. Further, the BPU handles one branch at a time, which is why stacking them up is a no-no. The rules for employing branching to best advantage are complicated - too much so to be meaningfully summarized in the space of a letter.
6) mtcrf mask bits - Yep, I mistakenly reversed the bit numbering in the CRM mask parameter. The left-most bit of CRM corresponds to the left-most CR field (cr0) and similarly the right-most bit <-> right-most field (cr7). Whoops!
Let me elaborate on one thing that can be confusing and that occurs frequently in code. The extsb (sign extend byte) instruction extends to a width of 32 bits, unlike the 68K ext.b instruction, which extends a byte to 16 bits. This behavior is in keeping with the idea that PPC arithmetic instructions act, in general, on the whole of a register.
If anything else is annoying or just plain wrong in the presentation, let me hear about it.
OpenDoc, OLE, and Real Developers
I have just received Microsoft's OLE SDK (free of charge) and have been browsing it. There is a lot of marketing (evangelizing?) stuff in it, including some deep technical comparisons between OLE and OpenDoc. "If you program the Macintosh, your future is OpenDoc," Apple says.
Well, Microsoft has different plans. "If you program for OLE, which is available now, and it's free of charge, you can port your components to Windows and you can work with Excel or Word now," B. Gates says. After much reading and studying, I came to a conclusion. I will support OpenDoc because, frankly, I neither care for nor like Windows, and I do vertical apps for Macs and UNIX; but if I were a mainstream developer, I would go for OLE.
There are some technical differences. According to MS, OLE is a superior technology now, and it will get better in the future. OpenDoc has several technical merits but, alas, it's not yet available. OpenDoc has a HUGE advantage too: it's open, which means that source code is available and it's going to be ported to everything from PDAs to mainframes. Microsoft says that OLE is cross-platform (Win-Mac) now, and that's true, and it says OLE will run under UNIX for free - only if you license (surprise) its Win32 API!!! And they call that open. Please don't make me laugh.
I would like to see some input about ISD's future plans for these technologies, and some technical comparisons too.
So, take your pick, because we are going to start coding parts and putting them together like chips in a computer.
- Daniel Nofal, TecH S.A., Buenos Aires, ARGENTINA
Dylan Takes A Load Off
Thanks for the Sept. Dylan article. I was disappointed, though, that the article didn't use the example code to emphasize what makes Dylan different from C++ and other static languages. I'm not sure how many readers would wade through the code to discover the link between the interface definition of the OpenMovieFile function:
function OpenMovieFile, output-argument: resRefNum;
and its invocation in the open-movie-file method:
let (err,ref-num) = OpenMovieFile(spec, file.data-permission);
nor notice some of the pleasures of Dylan they illustrate. err and ref-num didn't have to be declared prior to their use - Dylan figured it out from the context and created properly typed objects. OpenMovieFile() returns multiple values. The developer didn't have to concern herself with whether arguments should be passed by value or by reference, nor worry about the intricacies of memory management, because Dylan has automatic garbage collection.
- Steve Palmen, email@example.com