Code Optimization

Volume Number: 15 (1999)
Issue Number: 4
Column Tag: CodeWarrior Workshop

Code Optimization

by Andrew Downs

Code Optimization Options in CodeWarrior

Introduction

Code optimization involves the application of rules and algorithms to program code with the goal of making it faster, smaller, more efficient, and so on. Often these types of optimizations conflict with each other: for instance, faster code usually ends up larger, not smaller. Optimizations can be performed at several levels (e.g. source code, intermediate representations), and by various parties, such as the developer or the compiler/optimizer.

This article discusses and demonstrates the code optimization options available in CodeWarrior Pro. In spite of its ever-growing sophistication, CodeWarrior still makes it easy to select the right level of optimization for your program. Understanding how CodeWarrior applies optimizations will help when it is time to debug your code, especially when using a low-level debugger or when the source code is not available.

Optimization Terms

In order to understand the process, you need to speak the language. Here are definitions for the terms discussed in this article, plus a few extras. Keep in mind that the words replace, substitute, etc. do not imply physical changes to your original source code, but rather are reflected in the generated machine code, after your source code has been parsed and optimized.

Each term includes a brief description, including why it is important. Code examples illustrating some of these terms will be presented later in the article.

Common subexpression elimination: If the value resulting from the calculation of a subexpression is used multiple times, perform the calculation once and substitute the result for each individual calculation.
Constant propagation: Replace variables that rely on an unchanging value with the value itself.
Copy propagation: Replace multiple variables that use the same calculated value with one variable.
Dead code elimination: Code that never gets executed can be removed from the object file to reduce stored size and runtime footprint.
Dead store elimination: Variables and related store operations that are never used can be removed from the object file to reduce stored size and runtime footprint.
Global register allocation: Variables that do not overlap in scope may be placed in registers, rather than remaining in RAM. Accessing values stored in registers is faster than accessing values in RAM.
Global vs. peephole scope: Global optimization allows the compiler/optimizer to look at the overall program and determine how best to apply the desired optimization level. Peephole provides local optimizations, which do not account for patterns or conditions in the program as a whole. Local optimizations may include instruction substitutions.
Inline calls: A function that is fairly small can have its machine instructions substituted at the point of each call to the function, instead of generating an actual call. This trades space (the size of the function code) for speed (no function call overhead).
Instruction scheduling: Instructions for a specific processor may be generated, resulting in more efficient code for that processor but possible compatibility or efficiency problems on other processors. This optimization may be better applied to embedded systems, where the CPU type is known at build time. Most desktop software should use the Generic PowerPC option in CodeWarrior.
Lifetime analysis: A register can be reused for multiple variables, as long as those variables do not overlap in scope.
Loop counting using CTR: The count register (CTR) on the PowerPC is intended to be used for counting, such as in loop iterations. There are some branch instructions that specifically use the value in the CTR.
Loop invariant expressions (code motion): Values that do not change during execution of a loop can be moved out of the loop, speeding up loop execution.
Loop transformations: Some loop constructs can be rewritten to execute more quickly. For example, a for{} loop can be rewritten as a while{} loop, saving one or more instructions.
Loop unrolling: Statements within a loop that rely on sequential indices or accesses can be repeated more than once in the body of the loop. This results in checking the loop conditional less often.
Strength reduction: Certain operations and their corresponding machine code instructions require more time to execute than simpler, possibly less efficient counterparts.

CodeWarrior Optimization Dialog

Figure 1 shows the dialog box pane used to specify instruction scheduling and peephole optimization in CodeWarrior.

Figure 1. PPC processor pane.

Global optimizations are specified in the optimization pane, shown in Figure 2. Global optimization can be performed with the intent of making the resulting code faster or smaller, but not both.

Figure 2. Global optimization pane.

The code examples presented in this article use either peephole optimization or the faster execution speed settings. The smaller code size setting is briefly discussed later in the article. Profiled results for smaller code size are provided in the article archive, but are not illustrated in code examples.

Disassembling Code

The Disassemble menu item under the Project menu in CodeWarrior allows you to view the compiler's output resulting from application of any optimization choices. This is extremely useful in that it gives you a chance to save an assembly listing of your code for use during debugging sessions. That menu item was used to produce the listings presented in this article.

Just Enough Assembly Language

Listed below are the PowerPC instructions used in this article. Following each is the expanded mnemonic, and a short description of what the instruction does (using the specified operands). The instructions are presented in mnemonic alphabetical order, not the order in which they are encountered in the code. For a thorough treatment of Power PC assembly language, refer to the references listed at the end of this article.

add r30,r30,r0: Add: store the sum of the values stored in r0 and r30 in r30.
addi r30,r30,1: Add Immediate: add 1 to the value stored in r30, and store the result in r30.
b *+12: Branch: continue execution at the location 12 bytes (3 instructions) after this instruction.
bdnz *-80: Branch if Decremented Count Register is Not Zero: decrement the Count Register, and if its value is not zero, continue execution 80 bytes (20 instructions) before this instruction.
bl.whileTest__Fv: Branch and Link: branch to the location specified and continue execution, saving the address of the next instruction to be executed in this routine in the Link Register.
blt *-12: Branch if Less Than: if the result of the previous comparison resulted in the Condition Register LT bit being set, continue execution at the instruction 12 bytes (3 instructions) prior to this instruction.
bne *-28: Branch if Not Equal: if the result of the previous compare operation is not zero, as recorded in the Condition Register EQ bit, continue execution at the instruction 28 bytes (7 instructions) before this instruction.
cmplw r31,r29: Compare Logical Word: perform an unsigned comparison between r29 and r31, and update the Condition Register as appropriate.
cmplwir31,$0000: Compare Logical Word Immediate: compare the value in r31 against 0, and set the appropriate flags in the Condition Register.
li r30,0: Load Immediate: move the value 0 into register r30.
lwzr3,gLimit(RTOC): Load Word and Zero: move the value of the global variable gLimit into register r3. Global variables are referenced as offsets from the Table of Contents register (RTOC).
mr r31,r29: Move Register: copy the contents of r29 into r31.
mtctr r0: Move To Count Register: copy the value in r0 to the Count Register (typically used for loop iteration).
mullw r0,r29,r28: Multiply integer word, return Low Word: store the product of the values stored in r28 and r29 in r0.
stmw r26,-24(SP): Store Multiple Word: store registers r26-r31 beginning at the location -24 bytes off the current stack pointer location.
stw r31,-4(SP): Store Word: store register r31 at the location -4 bytes off the current stack pointer location.
subir31,r31,1: Subtract Immediate: subtract 1 from the value in r31, and store the result in r31.
subic.r31,r31,1: Subtract Immediate Carrying: similar to subi, but set the Exception Register CA bit with the result of the carry (whether it occurs or not).

Example Code

The archive for this article includes the .c and .dump files. Listing 1 contains the source code used to test the optimization options. For illustration purposes, some of the functions contain redundant or unused variables, unoptimized loops, and so on.

The profiler calls and header include statement are commented out. The sample project file in the archive includes the profiler library, so if you would like to use the profiler, simply uncomment the four lines in the source code, compile, and run.

Listing 1: test.c

test.c
Sample code with loops and variable usage.

#include <stdio.h>
#include <stdlib.h>
//#include <profiler.h>

unsigned long cseTest( void );
unsigned long doWhileTest( void );
unsigned long whileTest( void );
unsigned long forTest( void );

unsigned long gLimit = 1000L;

int main( void ) {
//ProfilerInit( collectSummary, bestTimeBase, 10, 3 );
//ProfilerSetStatus( 1 );
   unsigned long   theResult;
   unsigned long   j = 0L;

   j++;

   theResult = whileTest();
   
   theResult = doWhileTest();
   
   theResult = cseTest();

   theResult = forTest();
   
   j-;
//ProfilerDump( "\ptest.c.prof");
//ProfilerTerm();
   return 0;
}

unsigned long whileTest( void ) {
   unsigned long   theResult = 0L;
   unsigned long   i, limit = gLimit;

   i = 0;
   
   while ( i < limit )
      i++;
   
   return 0;
}

unsigned long doWhileTest( void ) {
   unsigned long   theResult = 0L;
   unsigned long   i, limit = 1000L;
   unsigned long dest[ 1000 ], index = 0;

   i = limit;
   
   do {
      dest[ index++ ] = i;
      i-;
   } while ( i > 0 );

   return 0;
}

unsigned long cseTest( void ) {
   unsigned long   theResult = 0L;
   unsigned long   i, limit = 1000L;
   unsigned long   product = 0L;
   unsigned long   x = 4;
   unsigned long   y = 3;

   theResult = limit;
   
   i = limit;
   
   do {
      product += x * y;
   } while ( i- > 0 );

   return 0;
}

unsigned long forTest( void ) {
   unsigned long   i, limit = 1000, sum = 0;
   
   for ( i= 0; i < limit; i++ )
      sum += i;

   return sum;
}

Peephole Optimization

The peephole optimizer applies a set of pattern-matching rules to the code, with the goal of replacing inefficient sequences of instructions with more efficient sequences. A second goal is to perform some register usage optimizations based on variable lifetimes. Both benefits are shown in the following examples.

Note that function prolog and epilog code is only occasionally presented in the listings in this article. The full listings are included in the article archive.

In the first example, the non-optimized version of main() contains the following assembly code, where register r30 is used to hold the value of j:

int main( void ) {
//...

   // Initialize j.
   // unsigned long   j = 0L;
      li    r30,0

   // Increment j.
   // j++;
      addi   r30,r30,1

   // theResult = whileTest();
      bl    .whileTest__Fv

   // ...

Enabling the peephole optimizer results in no action being taken on j, since its value is never used:

int main( void ) {
//...

   // Initialize j.
   // unsigned long   j = 0L;

   // Increment j.
   // j++;
   // No instructions generated!

   // theResult = whileTest();
      bl    .whileTest__Fv

   // ...

The second example uses the function doWhileTest(). The variable limit is accessed twice, but its value never changes. In addition, there is a subtract/compare instruction sequence. First, the unoptimized version:

unsigned long doWhileTest( void ) {
// ...

   // unsigned long   i, limit = 1000L;
   // Initialize limit in r29.
      li    r29,1000

   // i = limit;
   // Copy limit into r31.
      mr    r31,r29

   // i-;
   // Subtract 1 from i (in r31).
      subi   r31,r31,1

   // } while ( i > 0 );
   // Compare to 0 and branch if not equal.
      cmplwi  r31,$0000
      bne   *-28

The peephole optimizer detects the redundant register usage for the variable limit, and the fact that the value of limit never changes. The peephole optimizer also finds the inefficient subtract/compare sequence. It transforms the function into:

unsigned long doWhileTest( void ) {
// ...

   // Initialize limit in r31.
   // This is the same register used for i, so no copy/move operation is required.
      li    r31,1000

   // i-;
   // Subtract with carry and set condition flags.
      subic.  r31,r31,1
      
   // Branch based on result of subtraction operation.
   // Comparison operation is implicit.
      bne   *-20

The profiler says:

The CodeWarrior profiler, which can be used to measure (among other things) the amount of time spent in various functions, showed the following execution times for doWhileTest():

Optimization Level	Function	Time (ms)	% of Total Time
None	doWhileTest()	0.131	39.3
Peephole	doWhileTest()	0.105	34.4

These results were obtained using the bestTimeBase setting. At runtime, this translated into the PowerPC timebase setting, which uses the Real-Time Clock (RTC) register on PPC 601-based machines, and the Time Base (TB) register on later implementations, for accurate, low-overhead timing.

The fourth column is the percentage of total program running time spent in this function. This is very useful in determining where to begin optimizing. To use it properly, you should compare the results for all of the functions at a given optimization level. (The profiler summary for all functions and all optimization levels is included in the article archive.) You need to be careful, because the time and % of total time columns do not necessarily mirror each other. You will see in one of the examples (Global Optimization - Level 4) that sometimes an inverse relationship occurs: less time spent in a function may be accompanied by a greater percentage of total time!

In summary, even though its scope is relatively local, the peephole optimizer may be able to provide some performance benefits, depending on patterns it detects in the code. In this case, it decreased execution time slightly (by approximately 20%).

Global Optimization - Level 1

This level of optimization provides global register allocation and dead code elimination.

Here is the non-optimized version of whileTest():

unsigned long whileTest( void ) {
      stw   r31,-4(SP)
      stw   r30,-8(SP)

   // unsigned long   theResult = 0L;
      li    r0,0
      stw   r0,-16(SP)

   // unsigned long   i, limit = gLimit;
      lwz   r3,gLimit(RTOC)
      lwz   r30,0(r3)

   // i = 0;
      li    r31,0

   // while ( i < limit )
      b    *+8
      i++;
   
      addi   r31,r31,1
      cmplw  r31,r30
      blt   *-8

The level 1 optimized version eliminates the non-volatile register saves, and uses volatile registers for local variables. Also, the assignment of theResult is eliminated, since its value is never used.

unsigned long whileTest( void ) {
   // unsigned long   theResult = 0L;
   // unsigned long   i, limit = gLimit;
      lwz   r3,gLimit(RTOC)
      lwz   r0,0(r3)

   // i = 0;
      li    r3,0

   // while ( i < limit )
      b    *+8

      // i++;
      addi   r3,r3,1
      cmplw  r3,r0
      blt   *-8

The profiler says:

Optimization Level	Function	Time (ms)	% of Total Time
None	whileTest()	0.025	7.5
Global - Faster 1	whileTest()	0.025	7.5

In this case, the result is not faster, but it is more efficient in terms of number of instructions used (code size).

Global Optimization - Level 2

This level of optimization adds common subexpression elimination and copy propagation to those provided by level 1.

Here is the non-optimized version of cseTest():

unsigned long cseTest( void ) {
   // Preserve the non-volatile registers (r26-r31).
      stmw   r26,-24(SP)

   // unsigned long   i, limit = 1000L;
      li    r27,1000

   // unsigned long   product = 0L;
      li    r30,0

   // unsigned long   x = 4;
      li    r29,4

   // unsigned long   y = 3;
      li    r28,3

   // theResult = limit;
   
   // i = limit;
      mr    r31,r27

   // do {
      // product += x * y;
      // A multiply is an expensive (slow) operation.
      mullw  r0,r29,r28
      add   r30,r30,r0

   // } while ( i- > 0 );
      cmplwi  r31,$0000
      subi   r31,r31,1
      bne   *-16

Enabling level 2 optimization changes the multiply instruction to an add (using a constant value of 12). As with level 1, the use of volatile registers (r0, r3-r12) replaces usage of non-volatile registers (r13-r31). As a result, the store multiple instruction goes away.

unsigned long cseTest( void ) {
   // No store multiple register instruction.
   // unsigned long   theResult = 0L;
   // ...

   // theResult = limit;   
      li    r4,0
   
   // i = limit;
   // do {
      li    r3,1000

      // product += x * y;
      // Add the constant value 12 to product.
      addi   r4,r4,12

   // } while ( i- > 0 );
      cmplwi  r3,$0000
      subi   r3,r3,1
      bne   *-12

The profiler says:

Optimization Level	Function	Time (ms)	% of Total Time
None	cseTest()	0.131	39.3
Global - Faster 2	cseTest()	0.047	18.7

The optimized version is approximately 64% faster than the original!

Global Optimization - Level 3

This level of optimization adds loop transformations, strength reduction, and loop-invariant code motion to those provided by level 2.

Here is the non-optimized version of forTest():

unsigned long forTest( void ) {
      stw   r31,-4(SP)
      stw   r30,-8(SP)
      stw   r29,-12(SP)

   // unsigned long   i, limit = 1000, sum = 0;
      li    r29,1000
      li    r30,0

   // for ( i= 0; i < limit; i++ )
      li    r31,0
      b    *+12

      // sum += i;
      add   r30,r30,r31
      addi   r31,r31,1
      cmplw  r31,r29
      blt   *-12

Applying the optimizations results in an unrolling of the loop. This code now contains the equivalent of ten loop iterations in the body of the loop. Notice that the iterator (limit) has been reduced to the value of 100 to compensate.

unsigned long forTest( void ) {
   // unsigned long   i, limit = 1000, sum = 0;
      li    r3,0
      li    r4,0
      b    *+4

   // for ( i= 0; i < limit; i++ )
      // Iterate over the loop 1000 times (total).
      // Use the count register to hold the adjusted value of limit (100).
      li    r0,100
      mtctr  r0
      b    *+4

      // sum += i;
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      add   r3,r3,r4
      addi   r4,r4,1
      bdnz   *-80

The profiler says:

Optimization Level	Function	Time (ms)	% of Total Time
None	forTest()	0.046	13.9
Global - Faster 3	forTest()	0.031	14.9

The optimized version is approximately 33% faster. Notice the increased percentage of total time: the time savings obtained in this loop must have been outweighed by more significant savings elsewhere in the optimized program.

Global Optimization - Level 4

This level of optimization adds repeated loop-invariant code motion and common subexpression elimination. The optimized code becomes a candidate for further optimizations, until no more opportunities are found.

For this example, we will compare the level 3 and 4 versions of the same function. Here is the level 3 optimized version of cseTest():

unsigned long cseTest( void ) {
   // unsigned long   theResult = 0L;
   // unsigned long   i, limit = 1000L;
   // unsigned long   product = 0L;
   // unsigned long   x = 4;
   // unsigned long   y = 3;

   // theResult = limit;
   
   // i = limit;
   
   // do {
      li    r4,0
      li    r3,1000
      b    *+4

      // product += x * y;
   // } while ( i- > 0 );
      addi   r4,r4,12
      cmplwi  r3,$0000
      subi   r3,r3,1
      bne   *-12

Here is the level 4 version. Notice that the value of product is never calculated, probably because it is never used.

unsigned long cseTest( void ) {
   // unsigned long   theResult = 0L;
   // ...
   
   // do {
      li    r3,1000
      b    *+4
      b    *+4

      // product += x * y;
   // } while ( i- > 0 );
      cmplwi  r3,$0000
      subi   r3,r3,1
      bne   *-8

The profiler says:

Optimization Level	Function	Time (ms)	% of Total Time
Global - Faster 3	cseTest()	0.046	22.4
Global - Faster 4	cseTest()	0.028	14.8

Again, much (39%) faster the second time around!

Faster vs. Smaller

Although I am not presenting examples using global optimizations for smaller code, the profiler results for those settings are included in the archive. For reference, here are the generated code sizes for faster vs. smaller, for each optimization level:

Faster 0 360 48 (constant) Faster 1 260 Faster 2 220 Faster 3 308 Faster 4 316 Smaller 0 312 Smaller 1 260 Smaller 2 220 Smaller 3 236 Smaller 4 244

Faster/Smaller	Level	Code Size (bytes)	Data Size (bytes)
Faster	0	360	48 (constant)
Faster	1	260
Faster	2	220
Faster	3	308
Faster	4	316
Smaller	0	312
Smaller	1	260
Smaller	2	220
Smaller	3	236
Smaller	4	244

The smaller option always generated code that was no larger, and usually smaller, than its faster counterpart at each optimization level. Applying levels 3 and 4 always resulted in increased code size (compared to level 2). However, the percentage increase from level 2 to level 4 for the smaller setting (10.9%) is much lower than for the faster setting (43.6%).

Be careful in trying to apply these results to another program. Each program has its own unique characteristics, which may respond favorably to some types of optimization but not to others. Plus, your needs may dictate which type of optimization to perform. For example, code destined for embedded systems may require the smaller option.

Doing it Yourself

Hand-optimized programs may still present the best performance. For example, you can optimize loops yourself, or include assembly functions mixed with high-level code. On the other hand, that approach is time-consuming, costly, and possibly less portable. Plus, not all developers have the necessary skills to hand-optimize. Relying on the development environment to perform optimizations is a good thing.

Writing a straightforward, correct program is key to generating an optimized program. That version becomes the starting point in the optimization process. Every subsequent adjustment or optimization, whether applied by the developer or the development environment, should only make the code faster or tighter.

Debugging efforts should focus on the unoptimized version, since the generated machine code most closely mirrors the original source. Once the program is bug-free (!), the desired type and level of optimization should be applied.

Conclusion

Optimization techniques can improve code size or speed. CodeWarrior provides several categories of optimization, including peephole, global, and instruction scheduling. You can apply any of these in combination via the project preferences dialog. It is worth reviewing your disassembled code to assess the impact the optimizations will have on subsequent debugging sessions. Debugging will be easier if performed on the unoptimized version of your code, since the optimized version of the code may not closely mirror the original source. Once the bugs are out, add the appropriate type of optimization.

References

Compiler Construction, Niklaus Wirth, Addison-Wesley, 1996.
Optimizing PowerPC Code, Gary Kacmarcik, Addison-Wesley, 1995.
PowerPC Microprocessor Developer's Guide, John Bunda et al, Sams Publishing, 1995.
Writing Compilers and Interpreters, Ronald Mak, Wiley Computer Publishing, 1996.

Credits

Thanks to Tom Thompson of Metrowerks for reviewing this article and providing insight into CodeWarrior's optimization process.

Archive URL

The article archive, containing a CodeWarrior project file, source code, and disassembler and profiler dump files, is available at <www.mactech.com/>.

Andrew Downs is a Technical Lead for Template Software in New Orleans, LA, designing and building enterprise apps. He also teaches Java programming at Tulane University College. Andrew wrote the Macintosh freeware program Recent Additions, and the Java application UDPing. You can reach him at andrew@downs.net.

Software Updates via MacUpdate

Latest Forum Discussions

Make the passage of time your plaything...

While some of us are still waiting for a chance to get our hands on Ash Prime - yes, don’t remind me I could currently buy him this month I’m barely hanging on - Digital Extremes has announced its next anticipated Prime Form for Warframe. Starting... | Read more »

If you can find it and fit through the d...

The holy trinity of amazing company names have come together, to release their equally amazing and adorable mobile game, Hamster Inn. Published by HyperBeard Games, and co-developed by Mum Not Proud and Little Sasquatch Studios, it's time to... | Read more »

Amikin Survival opens for pre-orders on...

Join me on the wonderful trip down the inspiration rabbit hole; much as Palworld seemingly “borrowed” many aspects from the hit Pokemon franchise, it is time for the heavily armed animal survival to also spawn some illegitimate children as Helio... | Read more »

PUBG Mobile teams up with global phenome...

Since launching in 2019, SpyxFamily has exploded to damn near catastrophic popularity, so it was only a matter of time before a mobile game snapped up a collaboration. Enter PUBG Mobile. Until May 12th, players will be able to collect a host of... | Read more »

Embark into the frozen tundra of certain...

Chucklefish, developers of hit action-adventure sandbox game Starbound and owner of one of the cutest logos in gaming, has released their roguelike deck-builder Wildfrost. Created alongside developers Gaziter and Deadpan Games, Wildfrost will... | Read more »

MoreFun Studios has announced Season 4,...

Tension has escalated in the ever-volatile world of Arena Breakout, as your old pal Randall Fisher and bosses Fred and Perrero continue to lob insults and explosives at each other, bringing us to a new phase of warfare. Season 4, Into The Fog of... | Read more »

Top Mobile Game Discounts

Every day, we pick out a curated list of the best mobile discounts on the App Store and post them here. This list won't be comprehensive, but it every game on it is recommended. Feel free to check out the coverage we did on them in the links below... | Read more »

Marvel Future Fight celebrates nine year...

Announced alongside an advertising image I can only assume was aimed squarely at myself with the prominent Deadpool and Odin featured on it, Netmarble has revealed their celebrations for the 9th anniversary of Marvel Future Fight. The Countdown... | Read more »

HoYoFair 2024 prepares to showcase over...

To say Genshin Impact took the world by storm when it was released would be an understatement. However, I think the most surprising part of the launch was just how much further it went than gaming. There have been concerts, art shows, massive... | Read more »

Explore some of BBCs' most iconic s...

Despite your personal opinion on the BBC at a managerial level, it is undeniable that it has overseen some fantastic British shows in the past, and now thanks to a partnership with Roblox, players will be able to interact with some of these... | Read more »

Price Scanner via MacPrices.net

You can save $300-$480 on a 14-inch M3 Pro/Ma...

Apple has 14″ M3 Pro and M3 Max MacBook Pros in stock today and available, Certified Refurbished, starting at $1699 and ranging up to $480 off MSRP. Each model features a new outer case, shipping is... Read more

24-inch M1 iMacs available at Apple starting...

Apple has clearance M1 iMacs available in their Certified Refurbished store starting at $1049 and ranging up to $300 off original MSRP. Each iMac is in like-new condition and comes with Apple’s... Read more

Walmart continues to offer $699 13-inch M1 Ma...

Walmart continues to offer new Apple 13″ M1 MacBook Airs (8GB RAM, 256GB SSD) online for $699, $300 off original MSRP, in Space Gray, Silver, and Gold colors. These are new MacBook for sale by... Read more

B&H has 13-inch M2 MacBook Airs with 16GB...

B&H Photo has 13″ MacBook Airs with M2 CPUs, 16GB of memory, and 256GB of storage in stock and on sale for $1099, $100 off Apple’s MSRP for this configuration. Free 1-2 day delivery is available... Read more

14-inch M3 MacBook Pro with 16GB of RAM avail...

Apple has the 14″ M3 MacBook Pro with 16GB of RAM and 1TB of storage, Certified Refurbished, available for $300 off MSRP. Each MacBook Pro features a new outer case, shipping is free, and an Apple 1-... Read more

Apple M2 Mac minis on sale for up to $150 off...

Amazon has Apple’s M2-powered Mac minis in stock and on sale for $100-$150 off MSRP, each including free delivery: – Mac mini M2/256GB SSD: $499, save $100 – Mac mini M2/512GB SSD: $699, save $100 –... Read more

Amazon is offering a $200 discount on 14-inch...

Amazon has 14-inch M3 MacBook Pros in stock and on sale for $200 off MSRP. Shipping is free. Note that Amazon’s stock tends to come and go: – 14″ M3 MacBook Pro (8GB RAM/512GB SSD): $1399.99, $200... Read more

Sunday Sale: 13-inch M3 MacBook Air for $999,...

Several Apple retailers have the new 13″ MacBook Air with an M3 CPU in stock and on sale today for only $999 in Midnight. These are the lowest prices currently available for new 13″ M3 MacBook Airs... Read more

Multiple Apple retailers are offering 13-inch...

Several Apple retailers have 13″ MacBook Airs with M2 CPUs in stock and on sale this weekend starting at only $849 in Space Gray, Silver, Starlight, and Midnight colors. These are the lowest prices... Read more

Roundup of Verizon’s April Apple iPhone Promo...

Verizon is offering a number of iPhone deals for the month of April. Switch, and open a new of service, and you can qualify for a free iPhone 15 or heavy monthly discounts on other models: – 128GB... Read more

Jobs Board

IN6728 Optometrist- *Apple* Valley, CA- Tar...

Date: Apr 9, 2024 Brand: Target Optical Location: Apple Valley, CA, US, 92308 **Requisition ID:** 824398 At Target Optical, we help people see and look great - and Read more

Medical Assistant - Orthopedics *Apple* Hil...

Medical Assistant - Orthopedics Apple Hill York Location: WellSpan Medical Group, York, PA Schedule: Full Time Sign-On Bonus Eligible Remote/Hybrid Regular Apply Now Read more

*Apple* Systems Administrator - JAMF - Activ...

…**Public Trust/Other Required:** None **Job Family:** Systems Administration **Skills:** Apple Platforms,Computer Servers,Jamf Pro **Experience:** 3 + years of Read more

Liquor Stock Clerk - S. *Apple* St. - Idaho...

Liquor Stock Clerk - S. Apple St. Boise Posting Begin Date: 2023/10/10 Posting End Date: 2024/10/14 Category: Retail Sub Category: Customer Service Work Type: Part Read more

Top Secret *Apple* System Admin - Insight G...

Job Description Day to Day: * Configure and maintain the client's Apple Device Management (ADM) solution. The current solution is JAMF supporting 250-500 end points, Read more

SPREAD THE WORD:
Slashdot
Digg
Del.icio.us
Reddit
Newsvine

MacTech

Code Optimization

Code Optimization Options in CodeWarrior

Introduction

Optimization Terms

CodeWarrior Optimization Dialog

Disassembling Code

Just Enough Assembly Language

Example Code

Listing 1: test.c

Peephole Optimization

The profiler says:

Global Optimization - Level 1

The profiler says:

Global Optimization - Level 2

The profiler says:

Global Optimization - Level 3

The profiler says:

Global Optimization - Level 4

The profiler says:

Faster vs. Smaller

Doing it Yourself

Conclusion

References

Credits

Archive URL

Software Updates via MacUpdate

Latest Forum Discussions

Price Scanner via MacPrices.net

Jobs Board