Test Bench and Types

When choosing computer models, we selected current Macs that give a good representation of what most people are likely to have. Faster configurations of these machines will, of course, perform even better.

We chose five Mac models to compare: a MacBook Air, a MacBook Pro, a tricked-out MacBook Pro with Retina display, an iMac (late 2011), and a Mac Pro. Given the RAM requirements of Windows 7, the minimum configuration tested (including on the MacBook) was 4 GB.

8 GB MacBook Air 13-inch, 1.8 GHz dual-core Intel Core i5 processor
Specifically: 8 GB/256 GB, Intel HD Graphics 4000

4 GB MacBook Pro 15.4-inch, 2.3 GHz quad-core Intel Core i7 processor
Specifically: 4 GB 1600 MHz memory/500 GB 5400-rpm
Intel HD Graphics 4000 and NVIDIA GeForce GT 650M with 512 MB of GDDR5 memory

16 GB MacBook Pro with Retina display 15-inch, 2.7 GHz quad-core Intel Core i7 processor
Specifically: 16 GB 1600 MHz DDR3L SDRAM/768 GB flash storage
Intel HD Graphics 4000 and NVIDIA GeForce GT 650M with 1 GB of GDDR5 memory

4 GB iMac 27-inch, 2.7 GHz quad-core Intel Core i5 processor
Specifically: 4 GB/1 TB 7200-rpm
AMD Radeon HD 6770M with 512 MB

6 GB Mac Pro, one 3.2 GHz quad-core Intel Xeon processor
Specifically: 6 GB/1 TB 7200-rpm/ATI Radeon HD 5770 with 1 GB GDDR5

Memory for virtual machines can be configured with a wide array of settings. As a rule, VMware Fusion's and Parallels Desktop's default memory allocations for a given combination of physical RAM and guest OS were the same, and we verified that they matched. Windows 7 and 8 virtual machines ran with 1 GB of virtual machine RAM (except for gaming). Lion and Mountain Lion (OS X) guests ran with 2 GB of virtual machine RAM. For gaming, we used 1.5 GB on 4 GB hardware and 2 GB on hardware with 6 GB or more.
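As a concrete illustration of those memory rules, here is a minimal sketch in Python (our own shorthand, not anything taken from either product; the function name and arguments are made up) of the allocation policy described above:

```python
def vm_ram_mb(guest_os, host_ram_gb, gaming=False):
    """Return the virtual machine RAM we assigned, in megabytes.

    Illustrative only: the function and its names are invented for this
    article, but the values mirror the policy described above.
    """
    if guest_os in ("Lion", "Mountain Lion"):
        return 2048                                  # OS X guests always got 2 GB
    # Windows 7 and 8 guests
    if gaming:
        return 1536 if host_ram_gb <= 4 else 2048    # 1.5 GB on 4 GB Macs, 2 GB at 6 GB or more
    return 1024                                      # 1 GB for everything else


# Example: the gaming configuration on the 4 GB MacBook Pro
print(vm_ram_mb("Windows 7", host_ram_gb=4, gaming=True))   # 1536
```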

Disk allocation was likewise standardized across all of the virtual machines. We used default-sized expanding disks (64 GB on Parallels, 60 GB on VMware) stored in a single file (not 2 GB chunks), but the disks were pre-expanded so that expansion operations wouldn't affect results. The virtual hard drive was kept in a comparable physical location on each computer, since placement can make a significant difference in disk performance.
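Pre-expansion can be accomplished in more than one way. As a rough illustration of the general idea (not our exact procedure; the path and sizes below are placeholders), one generic approach is to fill the guest disk with throwaway data and then delete it, so the growable disk file reaches full size before any timed runs:

```python
import os

def prefill_guest_disk(path="C:\\prefill.bin", chunk_mb=64, total_gb=40):
    """Write, then delete, a large throwaway file inside the guest.

    One generic way to force a growable virtual disk to expand before
    benchmarking, so expansion I/O does not pollute the timed results.
    Illustration only; the path and sizes are placeholders.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)   # non-zero data so blocks really get allocated
    written_mb = 0
    with open(path, "wb") as f:
        while written_mb < total_gb * 1024:
            f.write(chunk)
            written_mb += chunk_mb
    os.remove(path)   # free the space in the guest; the host-side disk file stays expanded

if __name__ == "__main__":
    prefill_guest_disk()
```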

The tests compared VMware Fusion 5.0.2 with Parallels Desktop for Mac 8.0.18314.813278, running on Mac OS X 10.8.2 with all updates. All "important" (but not "optional") Windows updates were also installed for Windows 7 and 8. By the time we got to the game testing, new versions had been released that could affect the results, so the games (and only the games) were tested on VMware Fusion 5.0.2 and Parallels Desktop for Mac 8.0.18354.823166, along with additional updates to OS X 10.8.2.

Test Types

There are a variety of widely cited and widely used benchmarking suites in the computer industry, including the SPEC tests, PCMark, WorldBench, Performance Test, Unixbench, and others. Each suite runs a series of tests that measure performance in a consistent way to assess specific kinds of workloads. The key to any such test is appropriateness, repeatability, and accuracy.

We are sometimes asked how we select the tests in our benchmarks. The goal is to represent the kinds of actions virtualization users perform on a regular basis. In particular, we focus on everyday user tasks rather than installation or configuration, because those are typically done only once or infrequently.

In the PC market, PC World magazine uses WorldBench as its benchmarking tool of choice. If you aren't familiar with WorldBench, you can learn more at http://www.worldbench.com. WorldBench 7 bundles a set of applications together to produce its scores. That may be a perfectly good approach for PC World, but it obviously does not reflect the virtualization part of the experience, nor is it weighted toward the typical things people do with virtualization on the Mac.

There are a variety of other benchmarks available, and we've looked at many of them as possible additions to our mix. Often, we find their measurements simply don't reflect a true user experience. Other times, they don't reflect how a virtualization user actually uses Windows, or they simply report erroneous results. For example, in previous attempts to use PassMark, we found its graphics scores were not at all representative of real-life graphics performance in games or other graphics uses; those tests reported items as faster when, in fact, they were much slower. Benchmarks are only useful if they reflect the real world.

Rather than use WorldBench or the others, we focus on the types of measurements we believe best represent the experience users actually see (in terms of speed). And while it takes practice and some skill, we time virtualization operations with a stopwatch, as the closest representation of how a real user would experience them. CPU-crunching ability, for example, is measured through tests like zipping files directly in Windows. For Office users, we use the most up-to-date version of Microsoft Office for Windows (2010), again with updates.
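For readers who want to reproduce the idea behind the zip test, a scripted equivalent might look like the sketch below (our actual runs were timed by hand with a stopwatch, and the folder path here is a placeholder):

```python
import time
import zipfile
from pathlib import Path

def time_zip(source_dir, archive="test.zip"):
    """Return how long it takes to compress a folder, in seconds.

    A scripted stand-in for the manual stopwatch measurement; point it
    at whatever test data you like.
    """
    start = time.perf_counter()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in Path(source_dir).rglob("*"):
            if path.is_file():
                zf.write(path, path.relative_to(source_dir))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"{time_zip('C:/TestData'):.1f} seconds")   # 'C:/TestData' is a placeholder
```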

There are two exceptions to this: graphics/gaming and CPU utilization. In these two cases, we found that testing utilities not only work well but are also necessary to produce the most repeatable and concrete results. For graphics benchmarking, we used 3DMark06 (the industry standard for tuning gaming systems). For gameplay performance, we used FRAPS to measure frames per second. Finally, for CPU utilization we used "top" combined with Apple Remote Desktop.
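As an aside for anyone who wants to look at the same numbers locally, host-side CPU usage can be sampled from macOS's top in logging mode. The minimal Python sketch below is only an illustration of reading those figures; our actual collection went through Apple Remote Desktop:

```python
import re
import subprocess

def host_cpu_usage(samples=2):
    """Sample overall host CPU usage with macOS `top` in logging mode.

    Returns (user%, sys%, idle%) from the last sample; the first sample
    reflects averages since boot, so we take the final one. This is an
    illustration only, not how we gathered our published numbers.
    """
    out = subprocess.run(["top", "-l", str(samples), "-n", "0"],
                         capture_output=True, text=True, check=True).stdout
    # Lines look like: "CPU usage: 7.14% user, 10.71% sys, 82.14% idle"
    matches = re.findall(r"CPU usage:\s*([\d.]+)% user,\s*([\d.]+)% sys,\s*([\d.]+)% idle", out)
    return tuple(float(x) for x in matches[-1])

if __name__ == "__main__":
    print(host_cpu_usage())
```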

Remember, benchmarks are not a review: they are benchmarks. They are simply meant to tell you which product runs faster. If you are interested in a specific feature, support, the user interface, or any of the other criteria for deciding on a product, that's a different article.