TweetFollow Us on Twitter

Parallel Programming With the Intel Core Duo Processor

Volume Number: 22 (2006)
Issue Number: 9
Column Tag: Performance Optimization

Parallel Programming With the Intel Core Duo Processor

Extracting maximum performance from your applications

by Ron Wayne Green

Introduction

This is the third and final part of a three part series that addresses the most effective techniques to optimize applications for the Intel(R) Core(TM) Duo processor-based Macintosh. Part one of this series introduced key aspects of the Intel Core Duo processor and exposed the architectural features for which tuning is most important. Also presented in that first article was a data-driven performance methodology using the software development tools available on an Intel-based Macintosh to highlight tuning and optimization opportunities. The second article of this three-part series introduced the Intel(R) Digital Media Boost technology of the Intel Core Duo processor, its capabilities, and how a programmer can exploit this computing power. We wrap up the final part of this three-part series with the most interesting and creative aspect to programming the Intel(R) Core Duo processor - taking advantage of both execution cores in the Intel Core Duo processor.

The Need for Parallelization

Multi-processor workstations have been available for many years. However, until recently systems with more than one execution core have been beyond the price range of the average home user. The Intel(R) Core(TM) Duo processor is helping to change this. Now parallel processing capability is within the reach of every user. The question that naturally comes up is "Now that I have two cores, what can I do with them"? The short answer to this question is that you either:

    1. Run more programs simultaneously (multi-tasking)

    2. Parallelize your applications (multi-theading or parallel programming)

In this article we will take a look at these two methods to take advantage of the additional processing power provided by a dual-core processor. We will emphasize parallel programming models as this is perhaps most of interest to software developers interested in high-performance, scalable applications.

Multi-tasking Parallelism

Multi-tasking parallelism is a model where two or more processes are created and run simultaneously by the OS. This follows The creation of the processes is done either programmatically via fork()/exec() function calls or often times by simply running multiple instances of the same program simultaneously. The processes created in either of these methods will be scheduled on the two cores by the OS.

This model of parallelism works well when the user is performing parametric studies, performing Monte Carlo simulations, statistical studies, or where a large number of job runs are used to characterize a solution space. Typically these types of problems use tens to thousands of input files with variations on the characteristic parameters. The same executable program is run as a separate job for each file. The goal with this model is throughput, or processing as many runs in a given wall clock time as possible.

Where this model is applicable: This model of parallelism benefits from the multiple cores by treating each core as an execution unit for each of the separate tasks (jobs). However, having the extra processing core in the Intel(R) Core(TM) Duo processor benefits this model by providing the system with twice the throughput capability over a single-core system. This model should be straightforward and familiar.

Parallelization via Multi-threading

The remainder of this article focuses on multi-threaded programming techniques. These techniques enable a single program to create multiple threads of execution. Threads are scheduled equally on available executions cores providing increased application performance.

Parallelization via auto-parallelizing compilers

Auto-parallelizing compilers attempt to analyze programs and automatically generate parallel code. As far as ease of use for the programmer, there is nothing easier than taking an existing serial code and allowing an auto-parallelizing compiler to generate parallel code. There is an old saying that applies here: If something seems too good to be true, it probably is. Creating an auto-parallelizing compiler that produces efficient code in all cases for a wide variety of applications has proven to be a very difficult task. Thus, auto-parallelization remains a hot research topic and the perfect auto-parallelizing compiler is yet to be produced. However, there are some cases where auto-parallelization is perfectly suited.

Let us examine the auto-parallelization capabilities of the Intel Fortran and C++ Compilers for Mac OS. Both these compilers support auto-parallelization. In addition, these compilers have to capability to produce a parallelization report. Auto-parallelizing compilers are most effective at very small, tight, uncomplicated loop structures. There are some applications where the majority of time is spent in a tight little uncomplicated loop. Consider the code example shown below which we save in a file named "heat.f90"

      1   program heat
      2
      3   real, dimension(500,500), target :: plate
      4   real, dimension(498,498)         :: new_inside
      5   real, pointer, dimension(:,:)  :: n, e, s, w, inside
      6   real    :: diff
      7   integer :: i,j, niter
      8
      9   ! Set up initial conditions
     10   plate = 0
     11   plate(1:500,1) = 1.0  ! boundary values
     12   plate(1,1:500) = (/ ( 0.01*j, j = 500, 1, -1 ) /)
     13
     14   !  Point to parts of the plate
     15   inside => plate(2:499,2:499)
     16   n => plate(1:498,2:499)
     17   s => plate(3:500,2:499)
     18   e => plate(2:499,3:500)
     19   w => plate(2:499,1:498)
     20
     21   write(*,*) "starting iterations"
     22   do
     23     new_inside = (n + s + e + w)/4.0
     24     diff = maxval(abs(new_inside-inside))
     25     inside = new_inside
     26     if (diff < 1.0e-4) then
     27       exit  !...exits the do loop, not the program
     28     end if
     29   end do
     30   write(*,*) "done"
     31   end program heat

This example is a highly simplified 2-D heat transfer problem. In this example we simulate a 2-D plate of material with some heat applied to two edges. For simplicity we assume that this initial boundary condition is maintained. The purpose of this example is to simulate the heat transfer over simulated time from the edges of the plate into and across the plate until steady state is reached. The general method to simulate the heat transfer is to divide the plate into cells. In our example, we divide the plate into 500x500 cells represented by the array elements in plate. Focusing on a single cell, we assume that the heat energy in a cell at time T+1 is the average of the heat energy at time T in the four neighboring cells in the north, south, east and west directions. The syntax for this is:

new = ( old(north) + old(south) + old(east) + old(west) ) / 4

or

new_energy( i,j ) = ( old( i-1, j )+ old( i+1,j )+ old( I,j+1 )+ old( I,j-1) )/4.0

Which we can visualize as shown in Figure 4 below



Figure 1. Energy for cell (i,j) at T+1

This example may take some consideration if you are not familiar with modern Fortran 90 array syntax and pointers. Let's examine this code. On line 3 we declare a 500x500 element array named plate. In Fortran 90, pointers can only point to variables with the TARGET attribute as we see in this declaration. The variable plate will hold the current, or time T, energy values. On line 4 we declare a second array named new_inside that will be used to store the new values inside the plate, the time T+1 values. The inside of the plate excludes the boundary cells which are assumed to be fixed energy and will not change over simulated time.

Keeping in mind that Fortran 90 pointers can point to array sections, we set up a pointer named inside that points to the non-boundary elements of plate. These are rows 2 through 499 and columns 2 through 499. This is done on line 15 in the code example and can be visualized by the diagram below. Keep in mind that pointer variable inside is pointing to a subset of the same memory as plate.



Figure 2. Pointer "inside" of "plate"

Now, visualize taking this inside section of plate and shifting it 1 cell north. This is what the pointer variable n (north) is set to with the statement on line 16. We do a similar initialization for the variables s (south), e (east), and w (west) on lines 17, 18, and 19 respectively. This can be visualized as shown in Figure 3.



Figure 3. Pointers for north, south, east, and west

Looking at the example code, on line 22 starts a do loop that simulates the time steps for the simulation. Note that the exit criteria for the simulation is handled in the conditional on lines 26-28. The bulk of the work then is contained in the statement:

new_inside = (n + s + e + w)/4.0

This seeming simple statement does an element-by-element addition of the cells pointed to by n, e, s and w. After some study you will see that this does the equivalent double-nested loop:

do i=1,498
  do j=1,498
    new_inside(i,j) = ( n(i,j)+s(i,j)+e(i,j)+w(i,j) )/4.0
  end do
end do

The next interesting statement, line 24, takes advantage of the fact that Fortran 90 intrinsic functions abs and maxval can operate on array operands. The statement:

 diff = maxval(abs(new_inside-inside))

will find the absolute values of the differences between the new energies ( new_inside ) and the old energies ( inside ) on all cells. This result of the abs function is an array of shape ( 1:498, 1:498 ). This array then is the argument to MAXVAL instrinsic which will return the scalar value of the maximum value of these differences. This we store in the scalar variable diff which is then used to determine when the difference between the old and the new energies have reached a threshold when we can assume that the plate has reached a steady state. Again, this same functionality could be implemented with a double-nested loop. Obviously this example ignores many thermodynamic principles such as realistic time steps and thermal conductivity constants. Still, it is representative of many iterative approximation methods.

To test the efficiency of the auto-parallelization capabilities of the Intel Fortran Compiler for Mac OS, we first need to do a baseline run. For this example we use -O2 which you will recall is the default optimization level when -O is not specified.

ifort -o heat -O2  heat.f90
$ time ./heat
 starting iterations
 done
real    0m17.614s
user    0m17.555s
sys     0m0.025s

To auto-parallelize an application with the Intel Fortran and C++ compilers for Mac OS, we use the -parallel compiler option. This causes the compiler to look for parallelization opportunites on an entire source file.

$ ifort -o heat -O2 -parallel heat.f90
heat.f90(23) : (col. 2) remark: LOOP WAS AUTO-PARALLELIZED.
heat.f90(24) : (col. 9) remark: LOOP WAS AUTO-PARALLELIZED.

Note the remarks for lines 23 and 24. These indicate that the Fortran 90 array syntax was recognized and parallelized. We now run this parallelized version of the code.

$ time ./heat
 starting iterations
 done
real    0m11.422s
user    0m22.284s
sys     0m0.193s

The wall clock time is shown on the line labeled "real". Keep in mind that the "user" time reflects the time spent on BOTH cores added together. Before parallelization the real time was 17.6 seconds. After parallelization this was reduced to 11.4 seconds. When analyzing the improvement, a value called Speedup is used. Speedup is calculated as:

Speedup = ( Time sequential ) / (Time parallel)

Which for our example gives: Speedup = 17.6 / 11.4 or 1.54. This is a 54% improvement. So is this acceptable? Keep in mind we used a crude wall clock measure to gauge our performance. This included application startup and teardown overhead. Also, when examples run for short periods of time such as this example there is not enough work in the parallel regions to amortize the overhead costs of thread creation, dispatch, join and teardown. We also have to look at the return on investment - how much effort did this take? We got a 54% speedup out of a simple compilation and link with zero changes to the source code. Quite a good return on investment by any measure.

From this example, the primary limitation of auto-parallelization may have become apparent. Auto-parallelization is applied to an entire source file. There is no method to control which loops or array syntax equivalents are parallelized. In some cases the loop may be parallelized by may not have enough work to amortize the cost of the parallelization overhead. Thus, it is not uncommon to find cases where an auto-parallelized application actually slows down relative to it's sequential counterpart. To help determine where this is occurring, the Intel Fortran and C++ Compilers provide a parallelization report capability. This functionality is enabled with the -par-report compiler option:

    Syntax: -par-report[n]

    Arguments: n is the level of diagnostic messages to display.

Possible values are:

    0. No diagnostic messages are displayed

    1. Diagnostic messages are displayed indicating loops successfully auto-parallelized

    2. Diagnostic messages are displayed indicating loops successfully and unsuccessfully auto-parallelized

    3. The same diagnostic messages are displayed as specified by 2 plus additional information about any proven or assumed dependencies inhibiting auto-parallelization (reasons for not parallelizing).

Where this model is applicable: Auto-parallelization is most effective for codes with very simple, tight looping structures that are easily identifiable by the compiler. Keep in mind that a profile of your code should identify that the majority of run time should be in these tight loops. Auto-parallelization is performed at the source file scope. Thus, it is most effective for modular code. Auto-parallelization should be applied selectively since in some cases it has been shown to negatively affect runtime performance.

Parallelization via parallel libraries

There is a class of applications that spend the majority of runtime in common function calls. "Common" in this context refers to a function designed to solve widespread or well-known mathematical operations or data processing tasks. Examples include matrix operations and manipulations, fast Fourier transforms, image processing, video and audio encoding, encryption, signal processing and compression/decompression. And I cannot repeat the following advice often enough: profile your application and know where it is spending time. Then examine these program hot spots and look to see if this hot spot is performing something very common. I am constantly amazed at how many applications contain hand-coded matrix solvers like matrix-matrix multiplication, eigenvalue calculation, matrix inversion, etc. Also commonly found in user applications are user-written 2-D or 3-D fast Fourier transforms. Performance of user written functions varies wildly. However, in almost all cases the performance of user written functions is substandard when compared to professionally written and tuned libraries from vendors.

It is also common for applications to rely on freely distributed source libraries such as BLAS, LAPACK, Atlas, and FFTW, to name a few. These library implementations are available for download in source form and are then compiled by the user and linked into an application. While these libraries often outperform user written code, they are typically not tuned for a particular target platform.

In last month's article we introduced the Intel(R) Math Kernel Library for Mac OS (Intel(R) MKL). We mentioned that Intel MKL provides functions for linear algebra ( BLAS, LAPACK, sparse solvers), fast-fourier transforms (FFTs), vector math libraries, and vector statistical functions. In addition to being highly vectorized, many of the Intel MKL routines libraries are internally threaded. Thus, a sequential application can call an Intel MKL library routine and that routine will automatically spawn threads and split the work across the available cores.

Where this model is applicable: This model applies to codes that spend the majority of their runtime in common library routines. In this model the programmer does not need to worry about manually parallelizing their application, they simply rely on the threaded parallel library function to leverage the available cores. Again, this model works best when the bulk of the application runtime is consumed in common library routines.

Parallelization via openmp

OpenMP is an application program interface (API) that provides a means for specifying parallelism in C++ or Fortran. OpenMP is a multi-platform, vendor-independent specification for a set of compiler directives, library routines, and environment variables used to specify shared memory parallelism. It is a widely, readily available, flexible and relatively easy to learn. OpenMP is supported by an independent organization that can be found at http://www.openmp.org. In addition, there are several books and many online tutorials and examples that can be used to learn OpenMP programming. A full description of OpenMP is far beyond the scope of this article. Instead we examine an example OpenMP application, describe how to build and run the application, and examine performance on a dual-core system.

OpenMP is not a computer language. Rather, it works with an existing OpenMP capable Fortran or C++ compiler. The Intel Fortran and C++ Compilers for Mac OS support the OpenMP version 2.5 API specification. OpenMP allows a programmer to specify parallelism at the loop level. This is done by adding pragmas (C/C++) or compiler directives (Fortran) to existing source code. The advantage of this design is that non-OpenMP compilers simply ignore the pragmas or compiler directives. Thus only one source needs to be maintained for both OpenMP and sequential targets.

Let us begin our exploration of OpenMP by examining a serial application:

      1 #include <stdio.h>
      2 int main() {
      3  int n, i;
      4  double sum, pi, x, h;
      5
      6  n = 1000000000;
      7  h = 1.0 / n;
      8  sum = 0.0;
      9  printf("starting \n");
     10  for ( i=1 ; i<n ; i++ ){
     11    x = h * ( i-0.5);
     12    sum = sum + (4.0/(1.0+x*x));
     13  }
     14  pi = h*sum;
     15  printf("pi is %f \n",pi);
     16 }

Without going into a lot of detail, this program calculates an estimate for the value PI by integrating the function f(X) = 4.0 / ( 1 + X^2 ) between the values 0 <= X < 1.0. This uses trapezoidal rule for the approximation of the integral.

Again, we generate a baseline run at -O2, this time focusing on the wall clock or "real" time:

icc -o pi -O2 pi.c
time ./pi
starting
pi is 3.141593
real    0m15.669s

Now we examine an OpenMP version of this same code:

      1 #include <stdio.h>
      2
      3 int main() {
      4  int n, i;
      5  double sum, pi, x, h ;
      6  n = 1000000000;
      7  h = 1.0 / n;
      8  sum = 0.0;
      9  printf("starting \n");
     10 #pragma omp parallel for private(i,x) shared(h) reduction(+:sum)
     11  for ( i=1 ; i<n ; i++ ){
     12    x = h * ( i-0.5);
     13    sum = sum + (4.0/(1.0+x*x));
     14  }
     15  pi = h*sum;
     16  printf("pi is %f \n",pi);
     17 }

The only difference in this version of the program from the serial version of the program is the OpenMP pragma at line 10. This pragma tells the compiler that the C for loop that follows should be parallelized. This defines what is known as a parallel region. The iterations of this for loop are split equally amongst the available threads. For a dual-core processor, two threads are created by default. The first thread will process loop iterations n=1 through n/2 -1. The second thread will process loop iterations n/2 through n.

The private, shared, and reduction clauses control sharing of variables within the loop. The private clause specifies variables that are owned created and owned by each thread and are not shared. The variable h holds a value that is shared by the two threads. Finally, the reduction clause specifies that each thread will get a private copy of the variable sum. On exit from the parallel region the separate copies are combined with the "+" operation.

To compile an OpenMP program using the Intel C/C++ or Fortran Compilers for Mac OS the -openmp compiler option is used:

icc -o pi_omp -O2 -openmp pi_omp.c
pi_omp.c(10) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.

By default, when we run this program we will create one thread for each core in the processor. For an Intel(R) Core(TM) Duo processor, this will be 2 threads. For an Intel(R) Core(TM) Solo processor, this will be 1 thread. Thus, on an Intel(R) Core(TM) Duo processor based system we get the following timing:

$ time ./pi_omp
starting
pi is 3.141593
real    0m8.421s

Calculating the speedup ( 15.669sec serial vs. 8.421sec parallel) we get:

    Speedup = 15.669 / 8.421 = 1.860 or approximately 86% speedup

OpenMP also allows a programmer to run an executable and force a non-default number of threads. This is done by setting the OMP_NUM_THREADS environment variable. For example, we can run this same executable on one core by setting OMP_NUM_THREADS=1 and simply rerunning the program (no recompilation required):

export OMP_NUM_THREADS=1
time ./pi_omp
starting
pi is 3.141593
real    0m16.651s

Collecting our results for a two-core system we see:

  Pi serial:                     15.669 (seconds)
  OpenMP PI run with 1 thread:    16.651 (seconds)
  OpenMP PI run with 2 threads:.   8.421 (seconds)

Note that the 1-thread OpenMP case has about a 1 second overhead compared to the non-OpenMP version of the same code. This is overhead associated with initial process startup and parallel region overhead even when we do not use more than one thread. Keep in mind that this 1 second overhead would be considered "noise" in a real-world, longer-running application.

Where this model is applicable: OpenMP provides parallelization of looping constructs (for, DO, etc). This model is applicable to user codes where the majority of runtime is consumed inside loops. Unlike auto-parallelization, these looping structures can be large and/or complex. This makes OpenMP applicable to a large number of applications. Because of this, OpenMP is considered by many to be the method of choice for portable, flexible and easy parallel programming.

Parallelization via explicit threading

POSIX(R) Threads programming, or more commonly known as pThreads programming, uses explicit API calls to create and manage user threads. This is the most low-level parallel programming method presented in this article. To understand this model, we take the PI example above and present a pThread implementation:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <pthread.h>
#define THREADS 2
#define N 1000000000
typedef struct {
        int id;
} parm;
double sum[THREADS], h;
void *Calc_Partials(void *arg) //thread function to calculate partial sum
{
  int my_thrd;           //this thread's number ( 0 through NUM_THREADS-1 )
  int my_iters;          //number of iterations for each thread
  int my_start, my_end;  //iteration start and end for each thread
  int i;
  double  x, local_sum;
  parm *p=(parm *)arg;
  my_iters = N / THREADS;
  my_thrd = p->id;
  my_start = (my_iters * my_thrd) + 1;
  my_end = my_iters * (my_thrd+1);
  for ( i=my_start ; i < my_end ; i++ ){
   x = h * ( i-0.5);
   local_sum = local_sum + (4.0/(1.0+x*x));
  }
  sum[my_thrd] = local_sum;  //store local partial to global partial
  return ( NULL );
}
int main(int argc, char* argv[]) {
        int n,i;
        pthread_t *threads;
        pthread_attr_t pthread_custom_attr;
        parm *p;
        h = 1.0 / N;
        threads=(pthread_t *)malloc(THREADS*sizeof(*threads));
        pthread_attr_init(&pthread_custom_attr);
        p=(parm *)malloc(sizeof(parm)*THREADS);
        /* Start up thread */
        for (i=0; i<THREADS; i++) {
                p[i].id=i;
                pthread_create(
                   &threads[i],
                   &pthread_custom_attr,
                   Calc_Partials,
                   (void *)(p+i));
        }
        /* Synchronize the completion of each thread. */
        for (i=0; i<THREADS; i++) pthread_join(threads[i],NULL);
        printf( "pi is %f\n", (sum[0]+sum[1])*h);
        free(p);
}  

Running this example we will use 2 threads. These will each get scheduled on each of the execution cores of the Intel(R) Core(TM) Duo processor.

time ./pi_pthreads
pi is 3.141593
real    0m8.123s

which is very similar to the OpenMP time of 8.421 seconds for the same problem.

There is obviously a lot of detail that could be discussed in this example. One immediate observation is that the pThreads version of the PI program is considerably more complex than the OpenMP version. In pThreads programming, the threads are dispatched via the pthread_create() function call. This call creates a thread and dispatches it to a user-written function, which in this example is named Calc_Partials(). You will recall that OpenMP is a method to parallelize loops. Thus pThreads programming provides a more coarse level of parallelism than OpenMP.

Where this model is applicable: This model is most useful for applications that do not necessarily have loop structures to parallelize. Certain applications do not have clearly defined hot spots or hot loops. Rather, these applications may have a deep and complex call tree, or speed a lot of time in IO, or have a need for background processing. Or perhaps there is a need for non-symmetric parallelism. In cases like this, one thread may run function A while the second thread performs something completely different in function B. In these ways pThread programming provides the greatest flexibility of all the methods presented at the cost of increased program complexity.

Summary

The Intel(R) Core(TM) Duo processor provides parallel processing capability at a very affordable cost. This is spawning a demand for applications that can utilize the extra processing capability. Developers are responding by taking at look at their applications and determining how best to parallelize their code. This article presented an introduction to parallelization methods. There is no one model that is universally applicable to all applications. It is hoped this overview will allow you to analyze your application and determine which method will work best for your application.

Further Reading

Examples shown in this article are available for download. Please visit the MacTech source code ftp and select this month's issue, 22.09 :

http://www.mactech.com/editorial/filearchives.html

There are many outstanding books in parallel programming techniques, OpenMP, and POSIX(R) threads programming. The following list provides a starting point but is by no means a conclusive list.

A good starting point for understanding the Intel(R) Core(TM) Duo processor and parallel programming models is Multi-Core Programming, Shameem Akter and Jason Roberts, Intel Press, ISBN 0-9764832-4-6.

OpenMP tutorials and examples are readily available online at www.openmp.org. In addition, this website has lists of several very thorough books on OpenMP.

POSIX(R) Threads programming information can be found in several books. A good starting point is Programming with POSIX(R) Threads, David R. Butenhof, Addison-Wesley Professional, ISBN 0201633922.


Ron Wayne Green is a member of the Intel Compiler team. He has been involved in Fortran and high-performance computing applications development and support for over twenty years and contributes his expertise to technical computing issues.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Adobe Premiere Pro CC 2015 9.0.1 - Digit...
Premiere Pro CC 2015 is available as part of Adobe Creative Cloud for as little as $19.99/month (or $9.99/month if you're a previous Premiere Pro customer). Premiere Pro CS6 is still available for... Read more
Adobe After Effects CC 2015 13.5.1 - Cre...
After Effects CC 2015 is available as part of Adobe Creative Cloud for as little as $19.99/month (or $9.99/month if you're a previous After Effects customer). After Effects CS6 is still available... Read more
Adobe Creative Cloud 2.2.0.129 - Access...
Adobe Creative Cloud costs $49.99/month (or less if you're a previous Creative Suite customer). Creative Suite 6 is still available for purchase (without a monthly plan) if you prefer. Introducing... Read more
Tower 2.2.3 - Version control with Git m...
Tower is a powerful Git client for OS X that makes using Git easy and more efficient. Users benefit from its elegant and comprehensive interface and a feature set that lets them enjoy the full power... Read more
Apple Java 2015-001 - For OS X 10.7, 10....
Apple Java for OS X 2015-001 installs the legacy Java 6 runtime for OS X 10.11 El Capitan, OS X 10.10 Yosemite, OS X 10.9 Mavericks, OS X 10.8 Mountain Lion, and OS X 10.7 Lion. This package is... Read more
Adobe Muse CC 2015 2015.0.1 - Design and...
Muse CC 2015 is available as part of Adobe Creative Cloud for as little as $14.99/month (or $9.99/month if you're a previous Muse customer). Muse CS6 is still available for purchase (without a... Read more
Adobe Illustrator CC 2015 19.1.0 - Profe...
Illustrator CC 2015 is available as part of Adobe Creative Cloud for as little as $19.99/month (or $9.99/month if you're a previous Illustrator customer). Illustrator CS6 is still available for... Read more
Corel Painter 14.1.0.1105 - Digital art...
Corel Painter helps you create astonishing art in a variety of media. Paint with vivid oil paints, fluid water colors, and earthy charcoals. Corel Painter flawlessly recreates the tones and textures... Read more
Pacifist 3.5.4 - Install individual file...
Pacifist opens up .pkg installer packages, .dmg disk images, .zip, .tar. tar.gz, .tar.bz2, .pax, and .xar archives and more, and lets you extract or install individual files out of them. This is... Read more
Dropbox 3.8.4 - Cloud backup and synchro...
Dropbox is an application that creates a special Finder folder that automatically syncs online and between your computers. It allows you to both backup files and keep them up-to-date between systems... Read more

Mazes of Karradash (Games)
Mazes of Karradash 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: The city of Karradash is under attack: the monsters of the Shadow Realms are emerging from the depths.No adventurer is... | Read more »
Battle Golf is the Newest Game from the...
Wrassling was a pretty weird - and equally great - little wressling game. Now the developers, Folmer Kelly and Colin Lane, have turned their attention to a different sport: golfing. This is gonna be weird. [Read more] | Read more »
Qbert Rebooted has the App Store Going...
The weird little orange... whatever... is back, mostly thanks to that movie which shall remain nameless (you know the one). But anyway it's been "rebooted" and now you can play the fancy-looking Qbert Rebooted on iOS devices. [Read more] | Read more »
Giant Monsters Run Amok in The Sandbox...
So The Sandbox has just hit version number 1.99987 (seriously), and it's added a lot more stuff. Just like every other update, really. [Read more] | Read more »
Fish Pond Park (Games)
Fish Pond Park 1.0.0 Device: iOS Universal Category: Games Price: $2.99, Version: 1.0.0 (iTunes) Description: Nurture an idyllic slice of tourist's heaven into the top nature spot of the nation, furnishing it with a variety of... | Read more »
Look after Baby Buddy on your Apple Watc...
Parigami Gold is the new premium version of the match three puzzler that includes Apple Watch support and all new content. You won't simply be sliding tiles around on your wrist, the Apple Watch companion app is an all new mini-game in itself. You'... | Read more »
Swallow all of your opponents as the big...
Eat all of the opposition and become the largest ball in Battle of Balls now available in the App Store and Google Play. Battle of Balls pits you against other opponents in real time and challenges you to eat more balls and grow larger than all of... | Read more »
PAC-MAN Championship Edition DX (Games)
PAC-MAN Championship Edition DX 1.0.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0.0 (iTunes) Description: It’s Your World. EAT IT! Get ready for more ghost chain gobbling and frantic action in PAC-MAN® CE-DX! The... | Read more »
incurve (Games)
incurve 1.0 Device: iOS Universal Category: Games Price: $.99, Version: 1.0 (iTunes) Description: Get ready for 2 different gravities Goal is to hit as many white dots on your way up.When you're touching the screen, the dots have a... | Read more »
Crossy Road has its Own Merch Store Now....
Do you like Crossy Road? I mean do you really like Crossy Road? Well then you're in luck! Hipster Whale has opened up a Crossy Road store, so you can show off your fandom via official T-shirts. [Read more] | Read more »

Price Scanner via MacPrices.net

Sale! 13″ 1.6GHz/256GB MacBook Air for $1099,...
B&H Photo has the 13″ 1.6GHz/256GB MacBook Air on sale for $1099 including free shipping plus NY tax only. Their price is $100 off MSRP, and it’s the lowest price available for this model. Read more
iPad mini 4 To Be Upgraded To iPad Air 2 Spec...
There’s a certain inevitability about making Apple product predictions this time of year. Come September, we can pretty reliably count on the release of refreshed iPhones, along with the iOS 9... Read more
Apple restocks refurbished Mac minis for up t...
The Apple Store has restocked Apple Certified Refurbished 2014 Mac minis, with models available starting at $419. Apple’s one-year warranty is included with each mini, and shipping is free: - 1.4GHz... Read more
13-inch 2.5GHz MacBook Pro on sale for $899,...
Best Buy has the 13″ 2.5GHz MacBook Pro available for $899.99 on their online store. Choose free shipping or free instant local store pickup (if available). Their price is $200 off MSRP. Price is... Read more
21-inch 2.9GHz iMac on sale for $1299, save $...
Best Buy has the 21″ 2.9GHz iMac on sale today for $1299.99 on their online store. Choose free shipping or free local store pickup (if available). Their price is $200 off MSRP, and it’s the lowest... Read more
Free Image Sizer 1.3 for iOS Offers Photo Edi...
Xi’An, China based G-Power has announced the release of Image Sizer 1.3 for the iPhone, iPad, and iPod touch, an important update to their free photo editing app. Image Sizer’s collection of easy to... Read more
Sale! 13″ 1.6GHz/128GB MacBook Air for $899,...
B&H Photo has the 13″ 1.6GHz/128GB MacBook Air on sale for $899 including free shipping plus NY tax only. Their price is $100 off MSRP, and it’s the lowest price available for this model. Read more
13-inch Retina MacBook Pros on sale for $100...
Best Buy has 13-inch Retina MacBook Pros on sale for $100 off MSRP on their online store. Choose free shipping or free local store pickup (if available). Prices are for online orders only, in-store... Read more
Will BMW’s i3 Electric Vehicle Be The Automo...
The German-language business journal Manager Magazin’s Michael Freitag reports that Apple and the German performance/luxury automaker Bayerishe Motoren Werke (BMW) are back at far-reaching... Read more
Sale! $250 off 15-inch Retina MacBook Pro, $2...
B&H Photo has lowered their price for the 15″ 2.2GHz Retina MacBook Pro to $1749, or $250 off MSRP. Shipping is free, and B&H charges NY sales tax only. They have the 27″ 3.3GHz 5K iMac on... Read more

Jobs Board

*Apple* Customer Experience (ACE) Leader - A...
…management to deliver on business objectives Training partner store staff on Apple products, services, and merchandising guidelines Coaching partner store staff on Read more
Project Manager - *Apple* Pay Security - Ap...
**Job Summary** The Apple Pay Security team is seeking a highly organized, results-driven Project Manager to drive the development of Apple Pay Security. If you are Read more
*Apple* TV Product Design Internship (Spring...
…the mechanical design effort associated with creating world-class products with the Apple TV PD Group. Responsibilities will include working closely with manufacturing, Read more
*Apple* Watch SW Application Project Manager...
**Job Summary** The Apple Watch software team is looking for an Application Engineering Project Manager to work on new projects for Apple . The successful candidate Read more
*Apple* Retail - Multiple Positions (US) - A...
Sales Specialist - Retail Customer Service and Sales Transform Apple Store visitors into loyal Apple customers. When customers enter the store, you're also the Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.