



Volume Number: 15 (1999)
Issue Number: 8
Column Tag: Tips & Tidbits
by Jeff Clites, tips@mactech.com
The June issue of MacTech highlighted some ways to move pixels around, but there was no mention of a little-known pair of PPC assembly instructions, lmw and stmw (load and store multiple word), that can be used to speed up pixel blitting (and all memory-moving functions, for that matter).
When the data is aligned, a normal memory move takes 4 cycles. These two instructions are very powerful because they take 3 + n cycles to move n words, so each additional word costs only 1 cycle. The instructions behave differently on different PPCs: the 601 treats them like a series of individual word loads (lwz), while the 603e, 604, and G3 treat them as a single multiword move.
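To make the arithmetic concrete, here is a minimal sketch that plugs the figures above into two helper functions. The function names are hypothetical, and the sketch assumes the 4-cycle figure applies to each aligned word moved individually; it ignores loop overhead and cache effects.

```c
/* Cycle estimates built only from the figures quoted in the text.
   Assumption: "4 cycles" is the cost of moving one aligned word. */

/* n words moved one at a time: 4 cycles each. */
unsigned cycles_single_moves(unsigned n_words)
{
    return 4u * n_words;
}

/* n words moved with one lmw/stmw pair: 3 + n cycles. */
unsigned cycles_multiple_move(unsigned n_words)
{
    return 3u + n_words;
}
```

Under these assumptions, moving 6 words costs 24 cycles one at a time versus 9 cycles with the multiple-word form, which is roughly 2.7 times fewer cycles.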
There are a few restrictions on the following code:
The following is an example of how to move pixels three times faster than the fastest method presented in Fast Blit Strategies.
        export  SpeedCopy[DS]
        export  .SpeedCopy[PR]

        toc
        tc      SpeedCopy[TC], SpeedCopy[DS]    ;TOC entry "SpeedCopy" for
                                                ;transition vector "SpeedCopy"

        csect   SpeedCopy[DS]       ;Define transition vector "SpeedCopy"
        dc.l    .SpeedCopy[PR]      ;Pointer to code
        dc.l    TOC[tc0]            ;Pointer to TOC
        dc.l    0

;Prolog: SpeedCopy
;void SpeedCopy(long height, long width, long destRowbytes,
;               unsigned long *dest, long srcRowbytes, unsigned long *src);

        csect   .SpeedCopy[PR]      ;Prolog begins here

;r3 = dest.height
;r4 = dest.width
;r5 = dest.rowbytes
;r6 = dest.baseAddr
;r7 = src.rowbytes
;r8 = src.baseAddr

        stmw    r22,-40(SP)         ;store temp register space (r22 thru r31)
        li      r22,1               ;r22 = line decrement
        li      r23,24              ;r23 = bytes moved per pass

@lineLoop:
        mr      r12,r4              ;x = dest.width
        mr      r10,r6              ;tmpdest = dest
        mr      r11,r8              ;tmpsrc = src

@pixelLoop:
        lmw     r26,0(r11)          ;Move 4 + 4 + 4 + 4 + 4 + 4 from
                                    ;tempSource to r26 thru r31
        subf.   r12,r23,r12         ;Subtract num Pixels from total, test against 0
        addi    r11,r11,24          ;Add pixel width to tempSource even for
                                    ;different size pixels
        stmw    r26,0(r10)          ;Move pixels from r26 thru r31 to screen
        addi    r10,r10,24          ;Add pixel width to dest even for different
                                    ;size pixels
        bgt     @pixelLoop          ;Loop if the subtraction is greater than 0

        subf.   r3,r22,r3           ;Subtract one line from height, test against 0
        add     r8,r8,r7            ;Add src.rowbytes to src.baseaddr
        add     r6,r6,r5            ;Add dest.rowbytes to dest.baseaddr
        bne     @lineLoop           ;Loop if height not equal to 0

        lmw     r22,-40(SP)         ;Restore register space
        blr

        end
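For readers who want to check the loop structure without a PPC assembler, the following is a rough C sketch of what the listing appears to do. It is not the author's code: the name speed_copy_reference is hypothetical, the parameter order follows the register comments in the listing (r3 through r8), and the sketch assumes the width is counted in bytes and is a positive multiple of 24, matching the 24-byte steps applied to both pointers on each pass. Unlike the assembly, which always makes at least one pass, this version does nothing when the height or width is zero.

```c
#include <string.h>

enum { CHUNK_BYTES = 24 };  /* six 32-bit words per pass, as in the listing */

/* Hypothetical C reference for the assembly routine.
   Assumption: width_bytes is a positive multiple of CHUNK_BYTES. */
void speed_copy_reference(long height, long width_bytes,
                          long dest_rowbytes, unsigned char *dest,
                          long src_rowbytes, const unsigned char *src)
{
    long y;
    for (y = 0; y < height; ++y) {
        unsigned char *d = dest;          /* tmpdest = dest */
        const unsigned char *s = src;     /* tmpsrc = src   */
        long remaining;

        /* Inner loop: one 24-byte block per pass (the lmw/stmw pair). */
        for (remaining = width_bytes; remaining > 0; remaining -= CHUNK_BYTES) {
            memcpy(d, s, CHUNK_BYTES);
            d += CHUNK_BYTES;
            s += CHUNK_BYTES;
        }

        /* Step both base addresses to the next row. */
        dest += dest_rowbytes;
        src += src_rowbytes;
    }
}
```

A version like this is handy as a baseline: copy the same buffer with both routines and compare the results before timing the assembly.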
With a few changes this code can pixel double, copy every other line, or both.
The best way to make a machine go faster is to make it do less. This is one of only a handful of cases where you can do more with less.
Brad Anderson anderson@rpmmusic.com




