CSI: Crash Scene Investigation

Volume Number: 25
Issue Number: 12
Column Tag: Debugging

CSI: Crash Scene Investigation

Examining crashes to catch the culprit

by David Garcea

The 911 Call

You've written an application. Great! However, every application will crash eventually. Every crash is a mystery waiting to be solved, and the deduction and experimentation required to solve it can make you feel like Gil Grissom, but first you must know it happened. When an application crashes, the Crash Reporter will gather information about it and send it to Apple. This works great for Apple's products, but it doesn't help third party developers. You will have to do some coding to redirect this information to you. There are several packages that you can use to accomplish this, or you can do it all yourself. Here are a few of the free, ready-made options:

Smart Crash Reports by Unsanity is an InputManager-based enhancement for Apple's Crash Reporter application, which causes the crash report to be posted to a CGI on either your web server, or on Unsanity's server, as well as sending it to Apple. http://www.smartcrashreports.com
HDCrashReporter resides solely inside your application, but requires the user to relaunch your application after a crash, in order to email the report to you. The source code is available under the GNU Lesser General Public License, so you can customize it. http://www.profcast.com/developers/HDCrashReporter.php
ILCrashReporter is a framework that contains a custom CrashReporter application. When you start your application, you launch the CrashReporter and it watches your process for unexpected termination. When a crash occurs, the crash report and console log are emailed to you. The source code for both the framework and application are available. http://www.infinite-loop.dk/developer

If you choose one of the above options, you may still want to expand on the information they gather. To avoid delays caused by going back to ask for more information, get everything you might need all at once. In addition to the crash report, you should acquire a system profile to determine which environments are susceptible to the crash, and therefore how many users are likely to be affected by it. You should also get the console log, which may hold only a single line that pertains to your program, but that line often pinpoints the problem. These files can be obtained automatically, so that all the user must do is permit the information to be sent to you. The less they have to do, the more likely they are to report the crash. Consider providing a way for the user to describe the incident as well. They witnessed it, so they may know something crucial to reproducing it. You will also want to note the name and contact information of the reporter, so that you can ask for more information if necessary, and get confirmation once you have fixed it.

The Witness Interview

Eyewitness statements are even more unreliable in the technology industry than they are in criminal investigations. While the witness statement could provide the clues that you need to solve the problem, they could also contain misused terminology, specious assertions, and misleading statements. Always examine what the witness said, and stay open to it as a possibility, but do not to assume any of it is accurate.

The problem with witness statements is the difficulty inherent in describing in words what is seen on the screen. To circumvent this, ask the reporter to take a screenshot of your application the moment before they reproduce the crash. You will notice the details that the reporter did not think to mention. At Telestream, we take this further and ask our Quality Assurance department to use ScreenFlow to record a video of the steps leading up to a crash or bug, which is better than a single picture, as it shows you the state of the program at each step. You could use the free demo of ScreenFlow (http://www.telestream.net/screen-flow), or Jing (http://www.jingproject.com) to do the same.

The Victim's Wallet

Once you have all of the documentation, examine it, starting with the crash report, which contains sections describing the process, the report, the crash, the threads, the registers, and the binaries.

The Process section contains identifying information about the process that crashed, including the name, process identification number, how the process was launched, and the executable path for the process.

Ensure that the identifier matches one of yours. If it does not, then the crash is out of your jurisdiction and you cannot fix it. Inform the reporter to send it to the appropriate party.

After the process name is a number in brackets. This is the process identification (PID) number that was assigned to the process when it started. Every process is given a number, starting with zero, and incrementing for each new process that is launched. If this number is high, you know that a lot of processes have been run since the last time the computer booted up, which suggests that the computer has been running for a while without a restart.

Next is the path to the executable for this process. If this is a location that you did not expect, investigate how your program behaves when run from this location. The user may have been running from a locked disk image, or in a directory where they did not have proper permissions, both of which could cause problems if your code is not designed to handle these situations.

The version number of your product is next, and it is essential in correlating the crash to a specific version of your code. The standard Mac version number scheme contains a major version, a minor version, and a bug fix number. This scheme lacks one essential feature. It does not provide a unique identifier for each and every build. Consider using a scheme that consists of the major version, minor version, bug fix number, and build number, thereby assigning a single unique identifier to each and every build. You can then correlate this number to the date and time that a build was made, and then retrieve the exact version of every source file used to make that build from your source control management system. This will save time that might otherwise be lost by trying to reproduce a crash with the wrong source code.

The code type specifies whether the PowerPC or Intel code inside your universal binary was the one that crashed. If you have code written for a specific architecture, such as AltiVec or SSE, this will tell you which was executed.

The parent process tells you how your application or plug-in was launched. For applications, this is typically launchd. If your product was launched by another process, it may have been in an environment or workflow that you had not anticipated.

The Crime Scene

The report section includes the date and time that the crash occurred, which can be used to correlate the crash with the console log, as most of its entries are time-stamped. You can then focus on the log entries immediately prior to the time of the crash.

The version of the operating system is important for reproducing issues, as they may be specific to a certain version of Mac OS X. If it is a new version of Mac OS X that was released after this version of your software, or if it is a very old version of MacOS X, this could signal an incompatibility.

Lastly, the report version describes the format of the crash report, for use by automated analysis programs.

The Cause of Death

Crashes are caused by exceptions. The crash section describes the exception that caused the crash using two identifiers: the exception type, which is the category for the exception; and the exception code, which is the specific identifier. The most common exception types are EXC_ARITHMETIC, EXC_BAD_INSTRUCTION, and EXC_BAD_ACCESS. The line for the exception code may also include the offending address or value that caused the exception. The last item is the number of the thread that was executing when the exception was encountered.

The EXC_ARITHMETIC exception type covers any arithmetic that is considered illegal, such as dividing by zero (EXC_I386_DIV). Mathematically, the result of a division by zero is undefined. Intel processors are strict when it comes to dividing by zero, and they will not allow it. PowerPC processors were more forgiving, albeit mathematically incorrect. Instead of causing a crash, they returned zero as the result.

Listing 1: ExceptionController.m
Divide By Zero

The following demonstrates causing an EXC_ARITHMETIC/EXC_I386_DIV (divide by zero) exception. Note that the compiler will warn you if it sees that you are trying to do a divide operation with a literal constant of zero as the divisor. However, it will not catch situations where a variable with a value of zero is used as the divisor.

int divisor = 0;

   
// This line will cause the exception on an Intel processor.
int result = 128 / divisor;
// Modulus operations use division, so they can
// also cause this exception.
result = 128 % divisor;

The EXC_BAD_INSTRUCTION exception type means that the processor was given an instruction that it does not understand. This means that your code has corrupted the instruction pointer, which is a register that points to the memory location that holds the next instruction to execute. When that pointer is corrupted it points to some other part of memory and the processor tries to interpret that memory as an instruction, when it was intended to be something else.

In order to prevent a problem in one program from crashing other programs, or even the entire system, Mac OS X uses protected memory. Every process is given a virtual address space, which is divided into segments. Each segment has permissions that specify whether you can read from it, write to it, or execute it. When you allocate memory, it is mapped from the physical address that it resides on to the virtual address that is given to your program. The EXC_BAD_ACCESS exception type means that your program attempted to access memory that either was not mapped (KERN_INVALID_ADDRESS), or was not allowed to access (KERN_PROTECTION_FAILURE) because of the permissions on that segment. To examine the virtual memory maps for your application, pass the PID of your application to the vmmap command line tool.

Listing 2: ExceptionController.m
Kernel Invalid Address

The following demonstrates causing an EXC_BAD_ACCESS/ KERN_INVALID_ADDRESS exception.

   
// On 32-Bit systems, each process can have up to 4GB of 
// memory. Here, we try to write to the very last byte,
// which is neither likely to be mapped already, nor 
// mapped by us via allocation. While you aren't likely
// to explicitly do this in application, if you try to 
// write to a pointer that has been corrupted, you may 
// end up doing just this.
memcpy( (void *)0xFFFFFFFF, "d", 1);

Listing 3: ExceptionController.m
Kernel Protection Failure

The following demonstrates causing an EXC_BAD_ACCESS/ KERN_PROTECTION_FAILURE exception.

// Trying to write to a NULL pointer will cause this 
// exception, as memory address zero resides in a 
// virtual memory segment called "__PAGEZERO", which
// does not allow write access.
long *badPointer = NULL;
// This line will cause the crash.
*badPointer = 0xFEEDFACE;

Now that you know how to cause these bugs, you will be better prepared to find them and fix them.

The Corpse

The body of the crash report is the threads section. This shows the call stack for every thread in your program when your process crashed. Each entry in the call stack contains a number defining its position in the stack, a universal type identifier, the address of the function, the function name, and the offset to the instruction that caused the crash. The first line is the function that the thread was in at the time of the crash. The identifier tells you what binary contains that function. If the identifier in the first line of the call stack for the crashed thread is not the identifier for your application, you can check the Binary Images Description portion of the crash log for more information on that binary. We will cover more on that section later.

Threads can either be actively executing, or blocked, waiting to execute. Determining which threads were active at the time of the crash and which were blocked, will allow you ferret out the potential culprits. Consider any thread that was blocked to have an alibi. Any thread, whose current function name contains one of the following words, was most likely blocked: wait, delay, semaphore, mutex, and sleep.

If you notice a thread with one or more function names repeated, particularly if the call stack is very deep, the exception might be caused by runaway recursion. Recursion is when a function calls itself, either directly or indirectly. This can be quite useful technique, particularly when dealing with hierarchical data, but if left unchecked, the recursion could keep going until it uses up all of the available memory, which will cause a crash. Recursion can also happen unintentionally, for instance, if you call were to call [self display] in the drawing routine of a custom view.

If you see question marks in place of the binary identifiers, you could have a stack corruption. These can be difficult to solve because the application will continue to run after the memory has been corrupted, crashing instead in code that is executed much later. If you suspect you are dealing with a stack corruption, try turning on stack canaries in Xcode by adding the –fstack-protector (or –fstack-protector-all) flag to the "Other C Flags" setting for your project. Stack canaries work like a canary in a coal mine, as an early warning system. When stack canaries are on, the integrity of the stack is checked when you return from a function. If the stack has been corrupted, an error message is printed to the console to help you find the problem.

Multithreading problems are also difficult to track down because the crashes may only happen a small percentage of the time, and the offending code might not be the in the thread that crashed. Your best resource for these types of issues is collecting multiple crash logs and comparing them together. If you find the same two threads are always in similar locations when the crash occurs, try checking for unsynchronized access to shared resources, which is usually the culprit. Check your semaphores, and mutexes, to see if there is a case you might have left vulnerable to simultaneous access.

The Brain

After the threads section, you will find a table listing the registers and their values at the time of the crash. The x86 processor architecture designates eight registers for general purposes, six segment registers for memory management, a flags register to describe or control the results of operations, and the instruction pointer, which holds the address of the next instruction to execute. Not all x86 registers are listed in the crash report, but the ones that are can provide clues as to the cause of the crash.

x86 General Purpose Registers Listed In Crash Reports

Register 
(32 Bit / 64 Bit)      Purpose   
EAX   /RAX      Accumulator    
EBX   /RBX      Base   
ECX   /RCX      Counter   
EDX   /RDX      Data   
EDI   /RDI      Destination Index   
ESI   /RSI      Source Index   
EBP   /RBP      Base Pointer   
ESP   /RSP      Stack Pointer

While the general-purpose registers can be used for anything, most have certain tasks that they are optimized for. The accumulator is where most arithmetic calculations are performed. The base register has no specialized purpose. The counter register is designed for use as the index in loops. The data register is for storing data used in the calculations occurring in the accumulator. The destination index is for use as a pointer to the current location in a write operation. Similarly, the source index is for use as a pointer in a read operation. The base pointer points to the bottom of the stack, and the stack pointer points to the top of the stack.

x86 Other Registers Listed In Crash Reports

Register       Purpose   
SS      Stack Segment   
EFL/RFL      Flags   
EIP/RIP      Instruction Pointer   
CS      Code Segment   
DS      Data Segment   
ES      Extra Segment   
FS      F (Extra) Segment   
GS      G (Extra) Segment   
CR2      Control Register 2

The remaining registers have dedicated purposes. The segment registers are for supporting memory protection via segmentation. However, paging is now the preferred method of memory protection, so most of these registers are set to the same value. The F and G segments may store data specific to a thread. The flags register is used to control the results of operations, and to store information about those results, such as if the result overflowed the register. CR2 contains the offending address when a page fault occurs.

The Known Associates

The Binary Images Description section of the crash report has a list of all of the binaries involved in running your application, including the frameworks, plug-ins, and dynamically-linked libraries. There is one line per binary, and each entry contains the memory address span, the identifier, the version, and the file path it was loaded from. This list is usually long, even for the most trivial applications. If your application uses plug-ins, look for them here to see what versions were present.

If you are having trouble finding the cause of your crash, it is worth taking a few minutes to review this list. Look for anything that is unusual, meaning any entry whose identifier is neither yours nor Apple's (i.e. com.apple.whatever). When you find one that you do not recognize, look it up online. If it seems like it could interfere, try uninstalling it and see if the problem disappears.

The Modus Operandi

Some bugs cause immediate crashes, such as the EXC_I386_DIV exception, others start causing problems that will lead to a crash. If the crash happens at the same line of code every time it is executed, it is probably going to be easy to fix. If, however, the crash only happens occasionally, or at different lines of code, then it is a delayed crash, and will be tougher. To fix these issues, you will have to backtrack and find the initial problem.

To tackle delayed crashes, there are several techniques you can use to narrow down the problem. Law enforcement has a better chance of catching a serial killer each time he commits a new murder because each incident provides the investigators with more information. Similarly, the more instances of the crash you have to examine, the easier it will be to solve. Collect documentation for multiple instances of the crash and compare them, using your favorite diff tool. The information that is the same may be the conditions that are required to cause the crash, which are hints to the cause. You can also use this technique to exclude unrelated crash reports, if they are dramatically different than all of the others that you have collected on the issue.

The Reenactment

Your goal now is to be able to reliably reproduce the crash. If you cannot, you will never be able to verify that it was fixed, so you might as well drop it in the cold case drawer.

The next step is reducing the time it takes to reproduce the crash to under a few minutes. If the crash takes an hour to occur, it will take an eternity to investigate it.

If appropriate, try stressing the program by reducing the resources it has available, such as RAM, virtual memory, disk space, network bandwidth, etc. The crash might require one of those resources reaching a critically low point. By reducing those resources from the beginning, such as by launching a lot of applications, filling up disk space with large files, or starting extremely large file transfers, you can induce the required conditions without the usual wait.

Try examining what you think it is doing around the time of the crash. If it is working on a certain part of a large file, then try making that part of the file into the beginning of the file, either by moving it, or by cutting out everything preceding it. If it is in last stage of a multistage process, then try disabling the prior stages.

Once your crash is easily reproducible, you will need to narrow down the problem. Try taking out easily removable items such as plug-ins and frameworks. Next try commenting out half the suspected code, in a way that leaves the other half still compiling and usable. If the problem persists, then you know the bug is in the uncommented half. Then try commenting out half of that code, and continue with this technique until the bug becomes evident.

Another useful technique is regression, which requires that you use a source control management (SCM) product, such as CVS, Subversion, or Perforce. It will also be useful to have unique version numbers for every build of your product like we discussed earlier. Try going back through previous builds of your product until you find one where the problem did not occur. Then, using your SCM tool, find the changes that were made between the unaffected build and the affected build. Those changes are likely to contain the bug.

The Suspect Lineup

You should now be able to turn the problem on or off at will, making the crash occur or not occur.

Now that you have found the fix, you might be thinking that you are done, but there are often many possible fixes for any problem. Do you really want to use whichever one happened to be the first that you found? Take the time to think of a few other possible ways to fix the problem. Then consider the benefits and drawbacks of each. Consider the time it takes to implement, the maintainability of the code, the scope of the changes, and the likelihood that the fix will cause more problems. Now you can pick the best fix and implement it.

The Conviction

You are almost done. Document the bug and the fix in your code, so that neither you, nor the other members of your team inadvertently reintroduce the problem. Document it in your source code management system as well, so that you know when you fixed it, both in regards to time and to versioning. And finally, document it in your release notes so that your users know that this is the update that will fix the problem they are encountering. Do not be so ashamed of the crash that you omit it from the release notes. All applications crash, even Apple's. The fact that you found it, fixed it fast, and made the fix available to your users quickly and honestly, is something to be proud of. Keeping excellent records like this will help prevent the problem from reappearing, and will also provide you with valuable resources for tracking down your next crash.

Software Updates via MacUpdate

Latest Forum Discussions

Make the passage of time your plaything...

While some of us are still waiting for a chance to get our hands on Ash Prime - yes, don’t remind me I could currently buy him this month I’m barely hanging on - Digital Extremes has announced its next anticipated Prime Form for Warframe. Starting... | Read more »

If you can find it and fit through the d...

The holy trinity of amazing company names have come together, to release their equally amazing and adorable mobile game, Hamster Inn. Published by HyperBeard Games, and co-developed by Mum Not Proud and Little Sasquatch Studios, it's time to... | Read more »

Amikin Survival opens for pre-orders on...

Join me on the wonderful trip down the inspiration rabbit hole; much as Palworld seemingly “borrowed” many aspects from the hit Pokemon franchise, it is time for the heavily armed animal survival to also spawn some illegitimate children as Helio... | Read more »

PUBG Mobile teams up with global phenome...

Since launching in 2019, SpyxFamily has exploded to damn near catastrophic popularity, so it was only a matter of time before a mobile game snapped up a collaboration. Enter PUBG Mobile. Until May 12th, players will be able to collect a host of... | Read more »

Embark into the frozen tundra of certain...

Chucklefish, developers of hit action-adventure sandbox game Starbound and owner of one of the cutest logos in gaming, has released their roguelike deck-builder Wildfrost. Created alongside developers Gaziter and Deadpan Games, Wildfrost will... | Read more »

MoreFun Studios has announced Season 4,...

Tension has escalated in the ever-volatile world of Arena Breakout, as your old pal Randall Fisher and bosses Fred and Perrero continue to lob insults and explosives at each other, bringing us to a new phase of warfare. Season 4, Into The Fog of... | Read more »

Top Mobile Game Discounts

Every day, we pick out a curated list of the best mobile discounts on the App Store and post them here. This list won't be comprehensive, but it every game on it is recommended. Feel free to check out the coverage we did on them in the links below... | Read more »

Marvel Future Fight celebrates nine year...

Announced alongside an advertising image I can only assume was aimed squarely at myself with the prominent Deadpool and Odin featured on it, Netmarble has revealed their celebrations for the 9th anniversary of Marvel Future Fight. The Countdown... | Read more »

HoYoFair 2024 prepares to showcase over...

To say Genshin Impact took the world by storm when it was released would be an understatement. However, I think the most surprising part of the launch was just how much further it went than gaming. There have been concerts, art shows, massive... | Read more »

Explore some of BBCs' most iconic s...

Despite your personal opinion on the BBC at a managerial level, it is undeniable that it has overseen some fantastic British shows in the past, and now thanks to a partnership with Roblox, players will be able to interact with some of these... | Read more »

Price Scanner via MacPrices.net

You can save $300-$480 on a 14-inch M3 Pro/Ma...

Apple has 14″ M3 Pro and M3 Max MacBook Pros in stock today and available, Certified Refurbished, starting at $1699 and ranging up to $480 off MSRP. Each model features a new outer case, shipping is... Read more

24-inch M1 iMacs available at Apple starting...

Apple has clearance M1 iMacs available in their Certified Refurbished store starting at $1049 and ranging up to $300 off original MSRP. Each iMac is in like-new condition and comes with Apple’s... Read more

Walmart continues to offer $699 13-inch M1 Ma...

Walmart continues to offer new Apple 13″ M1 MacBook Airs (8GB RAM, 256GB SSD) online for $699, $300 off original MSRP, in Space Gray, Silver, and Gold colors. These are new MacBook for sale by... Read more

B&H has 13-inch M2 MacBook Airs with 16GB...

B&H Photo has 13″ MacBook Airs with M2 CPUs, 16GB of memory, and 256GB of storage in stock and on sale for $1099, $100 off Apple’s MSRP for this configuration. Free 1-2 day delivery is available... Read more

14-inch M3 MacBook Pro with 16GB of RAM avail...

Apple has the 14″ M3 MacBook Pro with 16GB of RAM and 1TB of storage, Certified Refurbished, available for $300 off MSRP. Each MacBook Pro features a new outer case, shipping is free, and an Apple 1-... Read more

Apple M2 Mac minis on sale for up to $150 off...

Amazon has Apple’s M2-powered Mac minis in stock and on sale for $100-$150 off MSRP, each including free delivery: – Mac mini M2/256GB SSD: $499, save $100 – Mac mini M2/512GB SSD: $699, save $100 –... Read more

Amazon is offering a $200 discount on 14-inch...

Amazon has 14-inch M3 MacBook Pros in stock and on sale for $200 off MSRP. Shipping is free. Note that Amazon’s stock tends to come and go: – 14″ M3 MacBook Pro (8GB RAM/512GB SSD): $1399.99, $200... Read more

Sunday Sale: 13-inch M3 MacBook Air for $999,...

Several Apple retailers have the new 13″ MacBook Air with an M3 CPU in stock and on sale today for only $999 in Midnight. These are the lowest prices currently available for new 13″ M3 MacBook Airs... Read more

Multiple Apple retailers are offering 13-inch...

Several Apple retailers have 13″ MacBook Airs with M2 CPUs in stock and on sale this weekend starting at only $849 in Space Gray, Silver, Starlight, and Midnight colors. These are the lowest prices... Read more

Roundup of Verizon’s April Apple iPhone Promo...

Verizon is offering a number of iPhone deals for the month of April. Switch, and open a new of service, and you can qualify for a free iPhone 15 or heavy monthly discounts on other models: – 128GB... Read more

Jobs Board

Relationship Banker - *Apple* Valley Financ...

Relationship Banker - Apple Valley Financial Center APPLE VALLEY, Minnesota **Job Description:** At Bank of America, we are guided by a common purpose to help Read more

IN6728 Optometrist- *Apple* Valley, CA- Tar...

Date: Apr 9, 2024 Brand: Target Optical Location: Apple Valley, CA, US, 92308 **Requisition ID:** 824398 At Target Optical, we help people see and look great - and Read more

Medical Assistant - Orthopedics *Apple* Hil...

Medical Assistant - Orthopedics Apple Hill York Location: WellSpan Medical Group, York, PA Schedule: Full Time Sign-On Bonus Eligible Remote/Hybrid Regular Apply Now Read more

*Apple* Systems Administrator - JAMF - Activ...

…**Public Trust/Other Required:** None **Job Family:** Systems Administration **Skills:** Apple Platforms,Computer Servers,Jamf Pro **Experience:** 3 + years of Read more

Liquor Stock Clerk - S. *Apple* St. - Idaho...

Liquor Stock Clerk - S. Apple St. Boise Posting Begin Date: 2023/10/10 Posting End Date: 2024/10/14 Category: Retail Sub Category: Customer Service Work Type: Part Read more

SPREAD THE WORD:
Slashdot
Digg
Del.icio.us
Reddit
Newsvine

MacTech

CSI: Crash Scene Investigation

Examining crashes to catch the culprit

The 911 Call

The Witness Interview

The Victim's Wallet

The Crime Scene

The Cause of Death

The Corpse

The Brain

The Known Associates

The Modus Operandi

The Reenactment

The Suspect Lineup

The Conviction

Suggested Reading

Software Updates via MacUpdate

Latest Forum Discussions

Price Scanner via MacPrices.net

Jobs Board