TweetFollow Us on Twitter

Performance Sampling

Volume Number: 20 (2004)
Issue Number: 5
Column Tag: Programming

Performance Sampling

by John A. Vink

Making code faster through introspection

Do It

Profiling your code is essential. You can't speed up your code if you don't know what is taking so long. You might think you know where the slowdown is, but most likely you'd be surprised. Engineers from the Safari team recount that they had a perfect record of incorrectly predicting what was slowing down their code. Only after doing some profiling, they discovered the real bottlenecks.

The process of profiling is:

    1. profile your code

    2. find the parts of the profile that belong to you and take significant amounts of time

    3. optimize

    4. lather, rinse, repeat

Here I am going to discuss the first two steps of profiling. You should already be familiar with "repeat". The second and following times through this loop you also need to see if the changes you made really did make things faster.

What are you talking about?

Sampling can be done from a command line tool or from a GUI application.

First, let's talk about what sampling actually does.

Sampling is finding out what your application is doing at any given time. About every 10 ms your application is asked, "What are you doing now? How about now? And now?" Your application responds by giving a stack trace each time. These are called samples. When the sampling period has completed, the results are summarized into a call graph.

Actually, that's just a simple way to conceptualize it. What's really happening is that the sampling application suspends the sampled application at periodic times. While the sampled app is suspended, the sampling app walks the stack for each of the sampled process' threads to ascertain the stack trace.

As an aside, sampling is very useful when your application appears hung. You can sample your hung application to see exactly where it is hung, giving you clues about how to fix it.

So let's imagine you sampled for 5 seconds, which would mean 500 samples when sampling every 10 ms. When sampling the main thread, the main() function is going to appear near the top of the stack since it's part of that thread's entry point. So it'll show up 500 times. Let's say you only have 2 functions in main - KindaQuick() and KindaLong(). KindaQuick() might show up 100 times, and KindaLong() 400 times. So your sample log will show main at 500 samples, and inside that, it will show KindaLong() at 400 samples and KindaQuick() at 100 samples. It would look something like this:

    500 main
      400 KindaLong
      100 KindaQuick

Some things to note about samples is that if you have a function that can complete between two samples and you call it just once, then it might not show up in your sample log. Because it started after sample n, and completed before sample n + 1, no samples will show this function. But if you call that function a bunch of times, then chances are it will show up in your sample. This shouldn't be of much concern since if your function runs so quickly to be invisible to samples, there probably isn't much opportunity for optimization.

If your thread is sleeping, it is still being sampled. Sampling doesn't concern itself with actual CPU time used. It will look like a function is being really inefficient because it shows up in so many samples, but that's because the thread is just sitting around waiting for a reason to wake up. Sleeping threads are a good thing since they don't take up any CPU time. Your application would be really efficient if all its threads were always sleeping, although your application wouldn't do much.

Sampling doesn't tell you when a function appeared in a stack trace - only how often. The only "when" information you can learn is which function called the function you're interested in at a particular point in the sample. You also can't tell how many times a function was called, only the number of times that the function appeared in a stack trace. However, you can learn much of this from gprof, described later.

Sampler

Sampler is the GUI sampling application that lives in /Developer/Applications. You can attach to a running application, or specify an application you want launched and sampled.

Give it a whirl. Run Sampler. Pick Attach... from the File menu. You'll get a list of applications that Sampler is able to attach to. Typically these are applications that are running as the same uid as you. If you need to sample something that is running as another user, you can try running Sampler as root or that other user.

Pick an application and hit OK. You'll get a sampling window which lets you choose the sampling interval. Actual sampling doesn't start until you hit the Start Sampling button. Hit the start button, then play around in the application for a few seconds. Then come back to Sampler and hit the stop button. After a few seconds of processing, it displays the result of your sampling. Look at Figure 1 for an example.


Figure 1. Main Sampler window.

As you click on function names in the left column view, the next column to the right will populate showing all the functions called by the function you just clicked along with the number of samples for each. The right scroller will show you the stack trace up to that function, and the highest sampling functions after that. You can see in this figure that we've drilled down to __CFRunLoopDoSources(). You can see exactly where its parent, __CFRunLoopRun, spent all of its 534 samples. 202 samples were in mach_msg, which, if that path were followed, would reveal that the thread was sleeping. All of the time spent in __CFRunLoopDoSources() was spent in _sendCallbacks. The remaining 104 samples from __CFRunLoopRun were shared among __CFRunLoopDoObservers, __CFRunLoopDoTimers, __CFRunLoopDoSource1, and __CFRunLoopRun.

If you were tracking performance problems, you want to investigate the functions that are taking the most time, ignoring the samples that are sleeping. Keep drilling down until you see something that surprises you. 534 samples in __CFRunLoopRun is not surprising, and neither is 228 samples in __CFRunLoopDoSources, but perhaps 97 samples in WebIconLoader might be, so if that's the case, that's what you want to check out.

sample

sample is the command line tool that allows you to sample a process. This can be useful if you're remotely connected to the machine.

To sample a process, you invoke sample with the PID of the process you're interested in, and the number of seconds to sample for. You can optionally provide the duration between samples. So, first get the PID of the process you're interested in:

[vinkjo:~] jav% ps -aux | grep MyApp
jav    452   0.0  2.2    99616  22624  ??  S    Sun03PM   2:55.17 MyApp
jav   1696   0.0  0.0     1416    308 std  S+    5:49PM   0:00.00 grep MyApp

So now you know the PID you are interested in is 452. Now run the sample command:

[vinkjo:~] jav% sample 452 5
Sampling process 452 each 10 msecs 500 times
Sample analysis of process 452 written to file /tmp/MyApp_452.sample.txt

Opening the resulting sample file will reveal that it looks something like this:

Analysis of sampling pid 452 every 10 milliseconds
Call graph:
    500 main
      400 KindaLong
        400  BlockMoveData [STACK TOP]
      100 KindaQuick
        100  memcpy [STACK TOP]

Sort by top of stack, same collapsed (when >= 5):
        BlockMoveData [STACK TOP]        400
        memcpy [STACK TOP]        100

In this hypothetical example we see that KindaLong took 4 times longer than KindaQuick. Perhaps this surprises us since both functions copy the same amount of data. If that's true, we can see that memcpy is much faster than BlockMoveData for the type and size of data we're giving it.

The sample shows [STACK TOP] to show when a sample shows that particular function at the top of the stack. This means, at the time the sample was taken, the code in that function was executing - not code in any other function that might be called from it.

You can open the result of the sample command line tool in the Sampler 2.0 GUI application. You can select the sample file from the Open... dialog in Sampler, or open it from the command line like this:

[vinkjo:~] jav% open -a Sampler /tmp/MyApp_452.sample.txt

gprof

Sampler and sample watch your code while it's running. For gprof, you run your code with profiling compiled and linked in, and when you're done, you use gprof to analyze the results. This allows you to profile command line tools and quickly running applications.

Using gprof requires you to rebuild your code. Because you need to have your code recompiled to take advantage of gprof, it might not be suitable when you're using a lot of third party frameworks whose code you can't recompile. Make a new build style and set the OTHER_CFLAGS and OTHER_LDFLAGS as shown in Figure 2.


Figure 2. Setting compiler options in Project Builder

When your program completes, a file named gmon.out will be created in the current working folder from where you launched the application. This can be confusing, since if you launched it from the Finder, the gmon.out file will appear at /.

After you get your gmon.out file, you need to process it with gprof into something readable. To do that, run gprof something like this:

> gprof /BuildResults/MyApp.app/Contents/MacOS/MyApp gmon.out > gprof.out

This will give you a report in the file gprof.out. There are two main sections to this report - the Call Graph and the Flat Profile.

The Flat Profile looks something like this:

granularity: each sample hit covers 4 byte(s) for 1.56% of 0.64 seconds
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 12.5       0.08     0.08                             _objc_msgSend [1]
  4.7       0.11     0.03                             _DoLigatureXSubtable [2]
  3.1       0.13     0.02                             _CFHash [3]
  3.1       0.15     0.02                             __class_lookupMethodAndLoadCache [4]
  3.1       0.17     0.02                             _objc_getNilObjectMsgHandler [5]
  3.1       0.19     0.02                             _pthread_getspecific [6]
  1.6       0.20     0.01                             +[NSDictionary 
                                                         dictionaryWithObjectsAndKeys:] [7]
  1.6       0.21     0.01                             -[NSLayoutManager 
                                                         defaultLineHeightForFont:] [8]
  1.6       0.24     0.01                             -[NSString isEqual:] [11]
  1.6       0.25     0.01                             -[NSUnarchiver 
                                                         decodeValuesOfObjCTypes:] [12]
  1.6       0.27     0.01                             _CFAllocatorDeallocate [14]
  1.6       0.28     0.01                             _CFDictionaryGetValue [15]
  1.6       0.29     0.01                             _CFRelease [16]
  1.6       0.30     0.01                             _CFRetain [17]
.
.
.
  0.0       0.64     0.00       20     0.00     0.00  __ZN13BaseConverter15GenericSetValueEtPc 
                                                         [18043]
  0.0       0.64     0.00       10     0.00     0.00  -[ConverterView textFieldType:] [52]
  0.0       0.64     0.00        5     0.00     0.00  -[ConverterView textDidChange:] [53]
  0.0       0.64     0.00        5     0.00     0.00  -[ConverterView 
                                                         updateFieldsWithNewNumbers:] [54]

This shows the amount of time spent in each function, sorted in decreasing order by the number of seconds actually spent in each function (as opposed to time spent in it and the functions that it calls). Then it is sorted by the number of calls (this is only available for sources compiled with the -pg flag. So, your sources, not the frameworks), and then alphabetically by name.

The % time is the percentage of total execution time that your program spent in this function. The cumulative seconds is the amount of time that was spent running this function plus any function that it calls. If the number of calls for a function are available, you can discover the number of milliseconds spent in just this function per call (self ms/call), and the number of milliseconds spent in this function plus any functions it calls per call (total ms/call).

Here I can see that the C++ function BaseConverter::GenericSetValue() gets called 20 times. If this is more than I expect, then I should look into why it's being called so many times. You can see that the flat profile can tell you how many times a particular function was called, which is not easy to do with the output from sample, and you can also see the amount of time spent in an individual function compared to how long was spent in the functions that that function called.

It's important to note when a function appears to take a long time to execute because the function itself is slow or because it is called a large number of times. In the above example, _objc_msgSend comes out as the biggest "time sink", which may lead you to believe that it is the performance issue. When in fact, it probably isn't. The performance issue, if any, is likely to be that some code gets executed too much that happens to call _objc_msgSend a lot, and instead of focussing on speeding up the leaf routine, one should find out why the leaf routine is called so much. In your sources that you compile with the -pg flag, this will be more obvious since you get the call count, but keep this in mind for functions that you don't get the call count.

The other part of the gprof report is the Call Graph, which looks something like this:

granularity: each sample hit covers 4 byte(s) for 1.56% of 0.64 seconds
                                  called/total       parents 
index  %time    self descendents  called+self    name           index
                                  called/total       children
                0.00        0.00       5/10          -[ConverterView textDidChange:] [53]
                0.00        0.00       5/10          -[ConverterView 
                                                       updateFieldsWithNewNumbers:] [54]
[52]     0.0    0.00        0.00      10         -[ConverterView textFieldType:] [52]
-----------------------------------------------
                0.00        0.00       5/5           __nsNotificationCenterCallBack [85241]
[53]     0.0    0.00        0.00       5         -[ConverterView textDidChange:] [53]
                0.00        0.00       5/10          -[ConverterView textFieldType:] [52]
                0.00        0.00       5/5           __ZN13BaseConverter14SetUnsignedDecEm 
                                                        [18045]
                0.00        0.00       5/5           -[ConverterView 
                                                         updateFieldsWithNewNumbers:] [54]
-----------------------------------------------
                0.00        0.00       1/1           __start [85480]
[18052   0.0    0.00        0.00       1         _main [18052]
-----------------------------------------------

Using the call graph, you can see which functions call a particular function, and also see what functions a particular function calls. Looking at the first entry, we can see that -[ConverterView textFieldType:] is called a total of 10 times - 5 times from -[ConverterView textDidChange:] and 5 times from -[ConverterView updateFieldsWithNewNumbers:]. Either -[ConverterView textFieldType:] did not call any other functions, or the functions that it did call were not compiled and linked with the -pg flag.

In the next entry, we can see the functions that -[ConverterView textDidChange:] called. It called -[ConverterView textFieldType:] 5 times out of the 10 times that the function was called throughout the program execution. It also called BaseConverter::SetUnsignedDec and -[ConverterView updateFieldsWithNewNumbers:] each 5 times.

With the results you get from gprof, here are some of the things you should be looking for:

    1. Look for functions that use up a lot of self ms/call in the flat profile. A lot of time is spent in these functions, and the amount of time can not be blamed on other functions that it calls.

    2. Take a look at the number of calls that your functions get. If they are larger than you expect, track down why they are larger than you expect. Some functions may be called redundantly.

    3. Scan over the numbers and see if anything looks surprising or slightly unexpected. A big part of optimization entails looking for things that do not look right.

Which Functions to Optimize

Here are some ideas for finding which functions you should spend some attention on:

    1. If a function takes a long time to execute but only executes once, then tuning that function's code is the best thing you can do. If a function gets run millions of times but spends little time executing, then the best thing you can do is get rid of the need to call it millions of times.

    2. Scan your results to find "things that make you go hmmmm..." Surprising results means things aren't operating the way you had anticiapted. This could mean some design issues with your algorithm, some functions are more expensive than you had anticipated, or just implementation mishaps.

    3. Go for the biggest bang. You may have a terribly inefficient function, but if it only takes up 0.1% of the time, then the biggest gain you can possibly get is 0.1%. Go after the function that takes 10% instead.

Summary

Don't postulate at what's wrong. Look at what's wrong.

References

For additional information, see Inside Mac OS X : Performance. More information on gprof is available at <http://www.gnu.org/manual/gprof-2.9.1>. Thanks to Yan Arrouye, Robert Bowdidge, Scott Boyd, and John Wendt for reviewing this article.


John A. Vink is one of Apple's most gifted engineers. He currently does performance analysis on code that you, the user, run constantly every day. He hopes you'll read this and make his job easier. It's possible to email him at vink@apple.com.

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

iExplorer 4.1.10 - View and transfer fil...
iExplorer is an iPhone browser for Mac lets you view the files on your iOS device. By using a drag and drop interface, you can quickly copy files and folders between your Mac and your iPhone or... Read more
iExplorer 4.1.10 - View and transfer fil...
iExplorer is an iPhone browser for Mac lets you view the files on your iOS device. By using a drag and drop interface, you can quickly copy files and folders between your Mac and your iPhone or... Read more
Adobe InCopy CC 2018 13.0.1.207 - Create...
InCopy CC 2018 is available as part of Adobe Creative Cloud for as little as $19.99/month (or $9.99/month if you're a previous InCopy customer). Adobe InCopy CC 2018, ideal for large team projects... Read more
Microsoft Office 2016 15.40 - Popular pr...
Microsoft Office 2016 - Unmistakably Office, designed for Mac. The new versions of Word, Excel, PowerPoint, Outlook and OneNote provide the best of both worlds for Mac users - the familiar Office... Read more
Adobe InDesign CC 2018 13.0.1.207 - Prof...
InDesign CC 2018 is available as part of Adobe Creative Cloud for as little as $19.99/month (or $9.99/month if you're a previous InDesign customer). Adobe InDesign CC 2018 is part of Creative Cloud.... Read more
Apple iOS 11.1.2 - The latest version of...
iOS 11 sets a new standard for what is already the world’s most advanced mobile operating system. It makes iPhone better than before. It makes iPad more capable than ever. And now it opens up both to... Read more
Slack 2.9.0 - Collaborative communicatio...
Slack is a collaborative communication app that simplifies real-time messaging, archiving, and search for modern working teams. Version 2.9.0: Slack now officially, and fully, supports Japanese.... Read more
iExplorer 4.1.9 - View and transfer file...
iExplorer is an iPhone browser for Mac lets you view the files on your iOS device. By using a drag and drop interface, you can quickly copy files and folders between your Mac and your iPhone or... Read more
PCalc 4.5.3 - Full-featured scientific c...
PCalc is a full-featured, scriptable scientific calculator with support for hexadecimal, octal, and binary calculations, as well as an RPN mode, programmable functions, and an extensive set of unit... Read more
iExplorer 4.1.9 - View and transfer file...
iExplorer is an iPhone browser for Mac lets you view the files on your iOS device. By using a drag and drop interface, you can quickly copy files and folders between your Mac and your iPhone or... Read more

Latest Forum Discussions

See All

Mighty Battles guide - how to build a so...
Mighty Battles, the latest title from Hothead Games, is set to take the App Store by storm. The game puts a welcome twist on lane battlers, adding FPS elements to spice things up a bit. You'll collect cards to put your own military unit to gether,... | Read more »
Rules of Survival guide - how to be the...
The PUBG craze makes its way to mobile, with more and more battle royale games debuting on iOS and Android. Rules of Survival joins the ranks of mobile PUBG-likes, offering a classic battle royale experiences that doesn't vary too much from its... | Read more »
The best new games we played this week -...
The weekend is upon us friends, and it's time to take a look back and reflect on all of the wonderful games we've played over the past few days. This week was jam packed with new releases. There were some big, long awaited launches, some fun... | Read more »
Lineage II: Revolution guide - tips and...
At long last, Lineage II: Revolution has now come to western shores, bring Netmarble's sweeping MMORPG to mobile devices. It's an addictive, epic experience, but some of the systems in the game can be a bit overwhelming. Here are a few tips to help... | Read more »
A Boy and His Blob (Games)
A Boy and His Blob 1.0 Device: iOS Universal Category: Games Price: $4.99, Version: 1.0 (iTunes) Description: | Read more »
Fight terrible monsters and collect epic...
Released on Western markets early last month, Dragon Project, created by Japanese developer COLOPL, brings epic monster hunting action to mobile for the very first time. Collect a huge array of weapons and armor, and join up with friends to fight... | Read more »
I Am The Hero (Games)
I Am The Hero 1.0 Device: iOS Universal Category: Games Price: $1.99, Version: 1.0 (iTunes) Description: I Am The Hero is a pixel art, beat 'em up, fighting game that tells the story of a "Hero" with a glorious but mysterious past.... | Read more »
Kauldron (Music)
Kauldron 1.0 Device: iOS Universal Category: Music Price: $3.99, Version: 1.0 (iTunes) Description: Kauldron is our warmest sounding, punchiest synth yet! A completely new modeling technology, combined with carefully designed... | Read more »
Lineage II: Revolution is mobile’s bigge...
NCSoft’s hit fantasy MMORPG series has just made the leap to mobile with the help of Netmarble in Lineage II: Revolution. With over 1.5 million players having already pre-registered ahead of the game’s launch, Revolution hit the app stores... | Read more »
Swing skilfully in new physics-based pla...
Sometimes it’s the most difficult of obstacles that can be the most rewarding. One game hoping to prove this is OCMO, the new tough but fair platformer from developers Team Ocmo. Primed to set every speedrunner’s pulse racing, as an otherworldly... | Read more »

Price Scanner via MacPrices.net

Black Friday pricing on Macs and iPads now av...
B&H Photo has lowered prices on many Macs, iPads, and iPad Pros as part of their Black Friday week sale. Save up to $200 on MacBooks and iMacs and up to $150 on iPads. B&H charges sales tax... Read more
Best Apple iPad deals this weekend, up to $80...
Apple resellers are offering 9.7″ iPads and 10.5″ iPad Pros for up to $80 off MSRP this weekend as part of their early Holiday and Black Friday sales: Adorama is offering new 2017 9.7″ 32GB WiFi... Read more
Early Black Friday sale: Apple iMacs for up t...
B&H Photo has 27-inch iMacs in stock and on sale for up $130-$150 off MSRP including free shipping. B&H charges sales tax in NY & NJ only: – 27″ 3.8GHz iMac (MNED2LL/A): $2149 $150 off... Read more
Apple restocks refurbished Mac minis starting...
Apple has restocked Certified Refurbished Mac minis starting at $419. Apple’s one-year warranty is included with each mini, and shipping is free: – 1.4GHz Mac mini: $419 $80 off MSRP – 2.6GHz Mac... Read more
Save on 12″ MacBooks, Apple refurbished model...
Apple has Certified Refurbished 2017 12″ Retina MacBooks available for $200-$240 off the cost of new models. Apple will include a standard one-year warranty with each MacBook, and shipping is free.... Read more
Early Holiday sale: 12″ iPad Pros for up to $...
B&H Photo has 12″ iPad Pros on sale today for up to $130 off MSRP. Shipping is free, and B&H collects no sales tax outside NY & NJ: – 12″ 64GB WiFi iPad Pro: $749, save $50 – 12″ 256GB... Read more
Holiday sale prices on Apple 13″ MacBook Pros...
B&H Photo has 2017 13″ MacBook Pros in stock today and on sale for $100-$150 off MSRP, each including free shipping plus NY & NJ sales tax only: – 13-inch 2.3GHz/128GB Space Gray MacBook Pro... Read more
Sale: 13″ MacBook Airs starting at $899, $100...
B&H Photo has 2017 13″ MacBook Airs on sale today for $100 off MSRP including free shipping. B&H charges NY & NJ sales tax only: – 13″ 1.8GHz/128GB MacBook Air (MQD32LL/A): $899, $100 off... Read more
Week’s Best Deal on 13″ MacBook Pros: Apple r...
Apple has a full line of Apple Certified Refurbished 2017 13″ MacBook Pros available for $200-$300 off MSRP. A standard Apple one-year warranty is included with each MacBook, and shipping is free.... Read more
Deal: 15″ 2.6GHz MacBook Pro for $1799 w/free...
B&H Photo has clearance 2016 15″ 2.6GHz Touch Bar MacBook Pros in stock today and available for $600 off original MSRP. Shipping is free, and B&H charges NY & NJ sales tax only: – 15″ 2.... Read more

Jobs Board

Product Manager - *Apple* Pay on the *Appl...
Job Summary Apple is looking for a talented product manager to drive the expansion of Apple Pay on the Apple Online Store. This position includes a unique Read more
*Apple* Pro/Consumer Apps Support Engineer -...
…exemplify AppleCare's expert technical support paired with exceptional customer service for Apple 's software apps. This person is a problem solver, who understands Read more
Partner Marketing Manager, *Apple* Pay - Ap...
Job Summary The Apple Pay partner marketing team is looking for a Marketing Manager to develop and drive US programs. The right candidate will be passionate about Read more
*Apple* Solution Consultant - Apple (United...
# Apple Solution Consultant - Rochester, MN Job Number: 113037950 Rochester, MN, Minnesota, United States Posted: 19-Sep-2017 Weekly Hours: 40.00 **Job Summary** Are Read more
Sr. Experience Producer, Today at *Apple* -...
# Sr. Experience Producer, Today at Apple Job Number: 56495251 Santa Clara Valley, California, United States Posted: 23-Jun-2017 Weekly Hours: 40.00 **Job Summary** Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.