OS X Investigation and Troubleshooting
Volume Number: 22 (2006)
Issue Number: 5
Column Tag: Programming
Mac In The Shell
by Edward Marczak
The Secrets to OS X success
"How did you know that?" A question I'm often asked. Usually right after pulling out some arcane bit of OS X knowledge. Now, I hardly know everything - far, far from it. But, I try to stay a little ahead of the curve. What you're reading now is part 1 of a multi-part column on learning the depths of OS X. Of course, the deeper you dig, the more quickly you can troubleshoot the system. At the end, I hope you will have picked up some new tips and tricks. Of course, we need to begin at the beginning.
One of the first things I like to install on my own machines is Tripwire (this, by the way, goes for end-user stations and servers). Long known as a security tool, a tripwire will take a snapshot of the file system, and then report any changes made to that system. From a security perspective, that's incredibly important, especially when you see something change in an area that shouldn't ever change! It is also a great way to learn about your Mac. What changes every day? Did the patch to your software install exactly what it claims? (And that's it...I'm not going to launch into the history of the product this time!)
Currently, I run the "Tripwire" tripwire (there are others out there). You can download an OS X binary from <http://www.macguru.net/~frodo/Tripwire-osx.html>. Go forth, download and install. Once you've installed it, the tricky part is the configuration and setup, and that's what I'll cover here. But, I will breeze through the install.
Please note that the binaries in the download are PPC only. If you're on an Intel Mac, grab the source, and compile it up yourself. Who knows, there may be a pre-packaged version somewhere by the time this column runs.
This is a full command-line install, so fire up Terminal.app (or iTerm, or...). Uncompress the tarball, get root (sudo bash, sudo -s, su -...take your pick), and run ./install.sh. Press ENTER to read through the license agreement (space, space, space, space, space, space), and agree. Note that:
This program will copy Tripwire files to the following directories:
Follow the rest of the instructions, and definitely initialize the database when asked, even though it will undoubtedly take a while. Tripwire is a deep and somewhat complex product. I've been using it longer than I can remember, both on servers, where I install it with a little more attention to security, and on my personal workstations. Tripwire usage alone could take an article or two. A quick Google search turned up this good tutorial - http://www.weberdev.com/Manuals/rhl-rg-en-80/ch-tripwire.html - and I recommend you read it if you want to get into Tripwire deeper than I present here. Just remember to adjust paths in your head for the install you just did.
Tripwire operates against a file policy. You should notice that all files were installed under /usr/local. In /usr/local/tripwire/policy, you'll find two files, tw.pol and twpol.txt. The first is a signed binary file - the one Tripwire actually runs from. The second is just a text file. To change the policy, you need to sign a text file into a binary using your passphrase. Use twpol.txt as a guide. For the most part, the default policy is just dated, and you can comment out anything relating to System 9. Make your changes to your policy, change to the /usr/local/tripwire/bin directory and run:
# ./twadmin -m P ../policy/twpol.txt
Please enter your site passphrase:
Wrote policy file: /usr/local/tripwire/policy/tw.pol
Since this is security software, we can't just allow any change to policy, right? We need to re-initialize the baseline snapshot. So, if you've chosen to do this, run ./tripwire -m i from the /usr/local/tripwire/bin directory and enter your passphrase. Now you're using your custom policy. From there, when you run tripwire -m c, a report will be output to your terminal and to /usr/local/tripwire/report, where you can pick up a text-based report.
Tripwire just gives me that extra happy feeling that I know what's going on with my machine. When applications piggyback on another's install (yes Smart Crash Reports, I'm looking in your direction...), it won't surprise you later on. I would like to share how I automate Tripwire reports, since it may not be entirely obvious. Here's a portion of the shell script that I had perform some nightly maintenance:
## Check and report on differences
/usr/local/sbin/tripwire -m c > /var/root/logs/`date +%Y%m%d`.txt
## Update the database
echo "***********************" >> /var/root/logs/`date +%Y%m%d`.txt
/usr/local/sbin/tripwire -m u -a -r /usr/local/lib/tripwire/report/`ls /usr/local/lib/tripwire/report` -P "My Passphrase Here" -v >>
First thing that gets done is a report, which is redirected to a file - the name of which is based on the date. Then, after writing a marker to my log, I update the Tripwire database with a new snapshot, so I'm ready for the next night. (You might notice from this snippet that I don't have things in the exact same place you might. So if you want to steal this, make sure you get the paths right!)
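One small refinement worth considering: the snippet calls date every time it builds the log name, so a run that straddles midnight could end up split across two files. Here's a minimal sketch that works the name out once and reuses it. The function name nightly_log_path and the LOG variable are my own inventions for illustration, and I'm assuming every step of the script is meant to land in the same dated log.

```shell
# Build the dated log file name a single time, so that a run which
# crosses midnight still writes everything to one file.
# nightly_log_path is a hypothetical helper, not part of Tripwire.
nightly_log_path() {
    # $1: log directory (the script above uses /var/root/logs)
    printf '%s/%s.txt\n' "$1" "$(date +%Y%m%d)"
}

LOG=$(nightly_log_path /var/root/logs)

# Each step of the nightly script would then refer to "$LOG", e.g.:
#   /usr/local/sbin/tripwire -m c > "$LOG"
#   echo "***********************" >> "$LOG"
```

As with the original, adjust the paths to match your own install.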
While I still do rely on Tripwire for high-level changes, please make note that it's far from perfect in a Macintosh environment - especially with Tiger. Tripwire is slightly aged at this point, and worked well when it arrived on the scene. However, it will miss changes in HFS+ metadata, like keywords for Spotlight and ACL information. Just understand that Tripwire is no longer an ideal security solution for the Mac (if it ever was).
The find command - something I've been meaning to dig into, in a column somewhere. Sounds simple, right? find finds files. However, there's an impressive array of options that let you narrow down the scope of your results. Of course, you can find by name:
find . -name "report" -print
You have to tell find where to begin looking, that's the first "." - start in the current directory. From there, you have to give find its criteria. In this case, we're looking for a name. Finally, we have to tell it what to do with the items it finds. Here, we just want it printed to our terminal. That's OK, but not for the purposes of this article. You should hit the man page for all of the options that find contains, but we'll look at some practical OS X example usage. Here's one of the more useful ones: find files updated since boot. Since OS X creates the /mach file at every boot, we have a great marker to use as a time stamp. If you want to find all files in the /System hierarchy that have been updated since boot, use this:
# find /System -newer /mach -print
On my machine, this currently yields this:
The kernelcaches file comes up because, well, I was fiddling with kernel extensions. The ARD files got modified because I needed to fire up ARD to get back into my machine. (looooong story why I had to configure and fire up ARD through ssh on my own machine...)
If you want to find files that have been modified since a certain time, not necessarily boot, use touch to drop a marker, and use find against that. Something like this:
# touch -t 200601011300.00 marker
# ls -l marker
-rw-r--r-- 1 root wheel 0 Jan 1 13:00 marker
# find / -newer marker -print
This will find all files created or modified on my machine since the first of January, 2006, 1pm. Since I suspect that would be a fairly high number, I'll skip the output and leave that as an exercise for the reader.
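If you find yourself doing the marker dance often, it can be wrapped up in a small function. This is only a sketch: changed_since is a name I made up, it drops its marker in a temporary location rather than the current directory, and it adds -type f so that only regular files are reported (leave that off if you want directories too).

```shell
# changed_since STAMP DIR
# Print regular files under DIR modified after STAMP.
# STAMP uses the touch -t format: [[CC]YY]MMDDhhmm[.SS]
# (changed_since is a hypothetical helper name.)
changed_since() {
    stamp=$1
    dir=$2
    # Create a throwaway marker file outside of DIR
    marker=$(mktemp "${TMPDIR:-/tmp}/marker.XXXXXX") || return 1
    # Set the marker's modification time to the requested stamp
    touch -t "$stamp" "$marker"
    # Anything newer than the marker has changed since STAMP
    find "$dir" -type f -newer "$marker" -print
    rm -f "$marker"
}
```

Usage would look something like changed_since 200601011300.00 /Library, run as root if you want to see everything.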
Of course, with no constraints, find will just return everything under a certain hierarchy, which comes in handy for quick before and after snapshots. find /Library -print > ~/liblist.txt will print out all files in /Library, and redirect the output to a file in your home directory named "liblist.txt". Run that before installing software, and again, with a different capture file name, after the software is installed, and compare the two. You'll find any new files that the installer may have dropped into /Library.
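The before-and-after comparison can be sketched like this. The function names snapshot and new_entries are mine, and I'm using sort and comm for the comparison, where diff would do just as well. Sorting both lists first matters, since comm expects sorted input.

```shell
# snapshot DIR OUTFILE
# Write a sorted list of everything under DIR to OUTFILE.
snapshot() {
    find "$1" -print | sort > "$2"
}

# new_entries BEFORE AFTER
# Print the lines that appear only in AFTER, i.e. what was added.
new_entries() {
    comm -13 "$1" "$2"
}
```

In practice: snapshot /Library ~/before.txt, run the installer, snapshot /Library ~/after.txt, and then new_entries ~/before.txt ~/after.txt to see what was dropped in. Swap -13 for -23 to see what was removed instead.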
That's not an exhaustive look at find, as it's not the sole focus of this article. But, make no mistake - find is incredibly useful. If you've ever examined the locate.updatedb script that runs weekly, you'll see that it builds its database using find. The deeper you dig, the more uses you'll find.
What's da BOM?
Speaking of file tracking and installation, did you know that the Apple installer will happily show you the files it will install before it installs them? Really. Next time you need to install software, look for "Show Files" under the File menu (or press Apple-I). That's part of the package's bill of materials. Figure 1 shows the beginning of the bill of materials for Viva Designer.
Figure 1: Installer showing a bill of materials
In conjunction with our other techniques above, this is a handy way to see where an installer may want to spray files. Additionally, if you end up running Tripwire, the file changes it reports should match up with the BOM that an installer presents. If not, someone is lying!
You can also determine the bill of materials from the command-line, if you are installing or inspecting a package remotely. The lsbom binary will display the contents of a BOM archive. Witness:
$ cd VivaDesigner-Free-5.1.0-4055.pkg/Contents
$ ls -la
drwxr-xr-x 7 marczak marczak 238 Feb 15 11:13 .
drwxr-xr-x 3 marczak marczak 102 Feb 15 11:13 ..
-r--r--r-- 1 marczak marczak 46540 Feb 15 11:13 Archive.bom
-r--r--r-- 1 marczak marczak 76584362 Feb 15 11:13 Archive.pax.gz
-r--r--r-- 1 marczak marczak 1373 Feb 15 11:13 Info.plist
-r--r--r-- 1 marczak marczak 8 Feb 15 11:13 PkgInfo
drwxr-xr-x 12 marczak marczak 408 Feb 17 15:34 Resources
$ lsbom Archive.bom
. 40755 501/80
./Applications 40755 501/80
./Applications/Viva 40777 0/80
./Applications/Viva/AddIns 40777 0/80
./Applications/Viva/AddIns/RTF Import 40777 0/80
./Applications/Viva/AddIns/RTF Import/Resources 40777 0/80
./Applications/Viva/AddIns/RTF Import/Resources/ansi-gen 100666 0/80 3231 2569670117
./Applications/Viva/AddIns/RTF Import/Resources/ansi-sym 100666 0/80 1498 2978768719
./Applications/Viva/AddIns/RTF Import/Resources/ansi.code 100666 0/80 1449 784701331
./Applications/Viva/AddIns/RTF Import/Resources/mac-gen 100666 0/80 3137 1114243711
(output clipped for sanity)
If you've never looked at the contents of a package, take a look again at the previous listing. Archive.bom is the package's bill of materials. Archive.pax.gz contains the files themselves! So, if you ever need to grab one file from a package, that's where you can get it from.
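A BOM listing mixes directories and files together, and a mode beginning with 40 marks a directory while 100 marks a regular file. Here's a rough sketch for pulling out just the file paths. It assumes the lsbom fields (path, mode, uid/gid, size, checksum) are separated by tabs, which is what lets paths with spaces such as "RTF Import" survive. The name bom_files is hypothetical.

```shell
# bom_files
# Read lsbom-style output on stdin and print only the paths of
# regular files (mode beginning with 100).
# Assumption: fields are tab-separated, with the path first and
# the mode second.
bom_files() {
    awk -F '\t' '$2 ~ /^100/ { print $1 }'
}
```

You'd feed it with something like lsbom Archive.bom | bom_files.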
The next-to-last thing I'm going to delve into this month is the process model of OS X - an important area of understanding for the advanced topics later on. Despite Apple pushing the notion that OS X is Unix, it's not quite, really. It's a Mach kernel with Unix-like behavior and APIs. This makes all of that Unix source code compile neatly (mostly), but you're still always operating under the Mach-based kernel, which does things a little differently than the traditional *BSD and System V kernels, and derivative Unix-like works such as the Linux and IRIX kernels. All in all, it's a unique mix of a known kernel, a modified BSD Unix that rides on top, and unique parts from Apple that haven't been seen before.
Describing the Mach and microkernel architecture would take a book by itself (one that I would guess exists already), but its foundations are important to understand if we're to troubleshoot deeply. I'm going to run us through the talking points, and the highlights that get us to OS X.
Mach came about after Unix was already in existence. From that perspective, it could see the good points of Unix and use them, and the downsides of Unix and avoid them. Mach was originally developed at Carnegie Mellon University, leapfrogging off of a BSD Unix core. Little by little, Mach replaced parts of the BSD core. To keep compatibility, much of BSD remained in the Mach kernel. Mach v3 moved all BSD code outside of the kernel, resulting in the microkernel featured today. The goal of a microkernel architecture is for the kernel to provide a minimal set of services, with extensions running up in userland. Interestingly, this provides a system that allows other operating systems to sit on top. That's one of Mach's primary goals: a simple, extensible kernel. Note, finally, that traditionally, when talking about the kernel, you'd only be referring to the microkernel itself. Apple, with Mac OS X, gets a little more liberal with the definition, as they've forged their own path. When Apple refers to the kernel, that primarily encompasses the Mach kernel, BSD, and I/O Kit. This is done for valid performance reasons.
Here's the important part: Mach's execution environment is called a task. Other Unices (like System V, traditionally) break an executing program down into a process. A process allows the kernel to keep track of:
- Context - the current location of program execution
- The program's credentials (rights)
- Memory space that the program has allocated/access to
...but that's not exactly what we're interested in. Mach abstracts things a little differently. A task provides the address space for execution. There is no such thing as a "process" in Mach! A thread is the basic unit of execution. A thread runs inside a task. A task does nothing unless it has a thread running inside it. A task allows communication with the rest of the system via ports (these have nothing to do with IP ports!). Threads communicate over ports via messages.
A task with just one thread running is similar to a Unix process. The fork system call creates a new process under Unix, and it creates a new task under Mach. So, a task provides virtual memory space, and ports for the threads that are running inside of it. Tasks and threads can be in only one of two states: running and suspended. Operating on a task affects all threads in the task. Mach allows for kernel tasks and threads, and of course, userland tasks and threads.
There you have it: a ridiculously simplified view of Mach. The important point to take away: A task is either running in the kernel or in userland. Thanks to the BSD roots, and Apple's bundling of BSD in kernel-space, you're still going to see plenty of references to "processes" - don't be confused by that - the BSD in Apple's kernel space still references processes. These are retrofitted into OS X by associating a process to a Mach task.
Listing All Open Files
The last utility I'm going to cover this month wraps up everything we've talked about: lsof (list open files). First, one must remember that Unix treats just about everything as a file. So this command, if you're not already familiar with it, may do more than you expect. Go on, get a shell and try it. Just type lsof by itself. You got an absolute ton of output, right? Things that certainly don't look like files, for sure. If you run this command as root, you get everything - everyone else gets a little less. (This is a compile-time option that, thankfully, Apple chose to enable). Specifically, as non-root, you'll only see processes that you have credentials to see. Of course, when troubleshooting, grep comes in extra-handy here (and there are plenty of switches that modify lsof's output). Let's look at a snippet of the files that Word has open, while I type this month's column:
(That's only a handful of the 195 files that actually were listed!) What is all of that? Let's look at a shorter listing:
$ lsof | head -7
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kernel_ta 0 root cwd VDIR 14,2 1360 2 /
launchd 1 root cwd VDIR 14,2 1360 2 /
launchd 1 root txt VREG 14,2 80112 2471328 /sbin/launchd
launchd 1 root txt VREG 14,2 1165460 4986063 /usr/lib/dyld
launchd 1 root txt VREG 14,2 4314524 7229742 /usr/lib/libSystem.B.dylib
launchd 1 root 0r VCHR 3,2 0t0 47460484 /dev/null
The command column lists the name of the process that holds a file open. Well, at least the process' first 9 characters by default. That can be changed with a switch. Next is the PID, or, process ID column. The user column lists the login name (or, failing that, the user ID number) of the user that owns the respective process. The remainder of the columns may require a little deeper explanation.
FD is the file descriptor number of the file or one of the following:
- cwd current working directory
- jld jail directory
- ltx shared library text (code and data)
- Mxx hex memory-mapped type number xx
- mem memory-mapped file
- mmap memory-mapped device
- pd parent directory
- rtd root directory
- txt program text (code and data)
The file descriptor number may be followed by a character (see the final line in the example listing above), which has the meaning:
r - file is open for read.
w - file is open for write.
u - file is open for read/write.
space (no character) - unknown mode, no lock character.
- (hyphen) - unknown mode and lock character follows.
The lock character will be one of the following:
N for an NFS lock of unknown type;
r for read lock on part of the file;
R for a read lock on the entire file;
w for a write lock on part of the file;
W for a write lock on the entire file;
u for a read and write lock of any length;
U for a lock of unknown type;
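To make the FD column a little less cryptic, here's a minimal sketch that splits an entry such as 0r or 12u into its descriptor number and access mode. The name decode_fd is my own, and the sketch ignores the lock character entirely.

```shell
# decode_fd FIELD
# Describe an lsof FD field such as "0r", "12u" or "cwd".
# (decode_fd is a hypothetical helper; lock characters are ignored.)
decode_fd() {
    fd=$1
    case $fd in
        [0-9]*) ;;   # starts with a digit: a real descriptor number
        *)
            printf '%s: special entry (not a descriptor number)\n' "$fd"
            return 0
            ;;
    esac
    num=${fd%%[!0-9]*}    # keep only the leading digits
    rest=${fd#"$num"}     # whatever follows: mode, then optional lock
    mode=$(printf '%s' "$rest" | cut -c1)
    case $mode in
        r) desc="read" ;;
        w) desc="write" ;;
        u) desc="read/write" ;;
        *) desc="unknown mode" ;;
    esac
    printf 'fd %s: %s\n' "$num" "$desc"
}
```

So decode_fd 0r reports descriptor 0 open for read, which matches the /dev/null line in the listing above.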
The type column lists what type of file is open. While lsof can report on many different types, it makes the most sense to concentrate on the types you'll see most:
FIFO - A FIFO pipe. Much like a regular pipe, but operates as part of the file system and can be accessed by multiple processes. man 1 mkfifo, if you need to know.
IPv4 - An open IPv4 socket.
IPv6 - An open IPv6 socket.
KQUEUE - A kernel event queue file. man 2 kevent if you're really interested.
PIPE - An open unix pipe.
PSXSEM - Posix semaphore file. A semaphore is like a lock, but with a little more control. With a semaphore, more than one thread can be performing a given operation at once, whereas a lock will restrict operations to a single thread.
PSXSHM - Posix shared memory.
VCHR - a character device.
VDIR - a directory on the filesystem.
VREG - a regular file on the filesystem.
VGER - that thing from Star Trek (oh, wait...you won't see that in lsof).
LINK - a symbolic link.
systm - a system domain socket.
unix - a unix domain socket.
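A quick way to get a feel for which types dominate on your own machine is to tally the TYPE column. This is just a sketch: count_types is a made-up name, and it assumes TYPE is the fifth whitespace-separated field, as it is in the default listing shown earlier.

```shell
# count_types
# Read lsof output on stdin and print "count TYPE" lines,
# most common type first.
# Assumption: default lsof column layout, with TYPE in field 5.
count_types() {
    awk 'NR > 1 { n[$5]++ } END { for (t in n) print n[t], t }' | sort -rn
}
```

Run it as lsof | count_types. The NR > 1 test simply skips the header line.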
The device column is an important one: it tells you which device said file is open on. On OS X, possibly not a big deal as you may be running with a single disk (as I am on my PowerBook at the moment). However, OS X Server may present you with more possibilities (as I hope you're separating the system and user data on a server...but that's another article). The listed 'device' may look a little odd. In some cases, it will be a memory address (in the case of PSXSHM, for example). Files and directories will list the device node number. A device node number looks something like this: 14, 2 - in an ls -l listing of /dev, it appears where the size column would normally be. Perhaps this pleads for further explanation.
Device files live in the /dev directory. Take a peek in there and you'll see files that are pretty much like none other on the system. In the permissions column, where you'd expect to either find a 'd' denoting a directory, or a '-' denoting a file, we see instead a 'c' or 'b'. Those represent character or block devices. A character, or raw, device is something like a tty (teletype terminal - what you're using when you fire up Terminal or ssh into another machine). A block device is typically used for a disk or tape device (OK, I still wish OS X had raw tape support...). Without getting too deep into this, a block device gets a buffer assigned by the kernel, and allows you to perform non-sequential access. A character device typically gets used where you'd be reading a stream of information (like from a serial port). What about those crazy numbers?
The numbers in the size column are called the device's major and minor numbers. All of these device entries represent device drivers. The actual driver is either compiled into the kernel (/mach_kernel) or loaded as an extension. The /dev entry is just a pointer to the driver in kernel space. Just because there is an entry in /dev, does not mean that there's a corresponding driver in the kernel. The major number represents the kind of device, while the minor number represents the specific part of that device we're interested in. Let's take a look at some examples:
brw-r----- 1 root operator 14, 0 Mar 7 16:40 disk0
br--r----- 1 root operator 14, 1 Mar 7 16:40 disk0s1
brw-r----- 1 root operator 14, 2 Mar 7 16:40 disk0s3
From this listing, we can immediately see that disk0, disk0s1, and disk0s3 are all block devices with major number 14. Notice what differentiates each of the devices: the minor number - representing a different slice on the disk. Let's look at another snippet:
crw--w---- 1 root tty 4, 0 Mar 7 16:42 ttyp0
crw------- 1 marczak tty 4, 1 Mar 14 06:26 ttyp1
crw--w---- 1 marczak tty 4, 2 Mar 7 16:46 ttyp2
crw--w---- 1 marczak tty 4, 3 Mar 13 19:20 ttyp3
Not only can we immediately see that these are character devices, but, as expected, they have a different major number. Once again, the differentiating factor for each of the ttyp entries is the minor number, each addressing a different ttyp.
Take away this: entries in /dev are not device drivers, nor are they code, but rather, they are simple pointers. Creating an entry in /dev does not create code in the kernel to support the device. You create these special entries with mknod, but you should never have to touch a thing in /dev. Of course, how does this fit into our discussion about lsof? You'll notice that many times in a listing, the device entry will be a major and minor combination. Now you know what that means! Just go look it up in /dev if things aren't adding up. In my case, I have many files open on 14,2. So, I'd do this:
# ls -l /dev | grep "14, *2"
brw-r----- 1 root operator 14, 2 Mar 7 16:40 disk0s3
crw-r----- 1 root operator 14, 2 Mar 7 16:40 rdisk0s3
Ah, of course! That makes sense: 14,2 represents my main (and only, at the moment) disk.
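One caveat with that grep: the pattern would also catch a minor number like 20, or a major number like 114. If that ever bites you, here's a slightly stricter sketch that compares the major and minor fields exactly. The name dev_by_number is hypothetical, and it assumes the ls -l layout shown above, where the major number (with its trailing comma) and the minor number are the fifth and sixth fields.

```shell
# dev_by_number MAJOR MINOR
# Read `ls -l /dev` output on stdin and print the names of block or
# character devices whose major and minor numbers match exactly.
# Assumption: field 5 is "MAJOR," and field 6 is MINOR.
dev_by_number() {
    awk -v maj="$1" -v min="$2" \
        '($1 ~ /^[bc]/) && $5 == (maj ",") && $6 == min { print $NF }'
}
```

Usage: ls -l /dev | dev_by_number 14 2.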
Following the device column is the "Size/Off" column. This column shows the size of the open file, if it is an actual file, or the offset into the file, depending on the file type (look for the 0t or 0x prefix, which marks an offset). It may also show no value at all if lsof can't make a determination, or if a size isn't appropriate.
The next-to-last column is "Node". This will list the file's node number on a local disk, the inode of an NFS file, the Internet protocol type, or possibly nothing, depending on the file type.
We made it! Whew! Last column: Name. This is yet another column whose contents will change based on what type of file is being displayed, and there are many possibilities. I'm going to cover ones that you're most likely to see. For a regular file or directory, the full path name to the file or directory will be displayed. In other cases, we may be looking at a block or character device. Network connections will be listed with appropriate information.
Now, lsof is an incredible utility. Do understand that we're lucky that Apple includes it with OS X: lsof is not included with most Unix distributions. It's not a "built-in"; rather, it's an add-on. As such, it works its magic by digging in where it can, making inferences and generally peeking where most utilities don't. It works slightly differently on different varieties of Unix. What I'm getting at here is that it may not be 1000% accurate. Don't let that fill you with doubt, though, as it does a better job than anything out there, really. But there may be a minority of situations where it just doesn't pull up a file, or grabs data from the kernel cache that no longer reflects reality. For the purposes of files that a given process is accessing, though, I haven't had any issues to complain about.
We covered a lot of ground this month. All of this is to get you a little more intimate, intertwined, and aware of the system that is OS X. Many times, troubleshooting and learning about OS X comes down to figuring out where a file is or what files have changed within a certain period of time. Next month, we'll carry on using this column as a foundation.
Media of the month: Go rent (or take off your shelf - you know who you are) Tron. Seriously. Incredibly far ahead of its time. If you watch it after having read this column, it should connect a few more synapses.
Until next month, dig in, experiment and enjoy!
Mac OS X Tech O: Core OS
Kernel Programming Guide:
Ed Marczak owns and operates Radiotope, a consultancy that assists companies with technology planning, and implementation. He helps guide business leaders around the pitfalls of technology, and to find ways of connecting them with their clients. Guidance at http://www.radiotope.com.