Debugging with EtherPeek
Volume Number: 13 (1997)
Issue Number: 8
Column Tag: Tools Of The Trade
Debugging with EtherPeek
by Sean Gilligan, Open Systems Development
Use a Macintosh-based network analyzer to circumvent the Heisenberg Uncertainty Principle and debug your network code
If you are developing network software and you're not using a network analyzer to help you debug your product, you'll be amazed at how much time you can save. If you are using EtherPeek or a similar product to debug your network code, you probably wonder how you could do your job without it. After reading this article, you will have a better understanding of the capabilities of EtherPeek and how you can use them to get your job done faster. We'll start with an overview of network analyzers and then move on to the basic features of EtherPeek. Next, we'll look at the methodology of debugging with a network analyzer. Finally, we'll look at a simple HTTP client that contains a trivial bug and then show how to use EtherPeek to help fix it. (This example is based on a true story.)
What is a Network Analyzer Anyway?
In 1986, when I began developing network software on Ethernet at 3Com, our Macintosh team was given an internal, DOS-based, utility program called "EtherSpy". After reading the specifications for 3Com's DOS networking client, we used EtherSpy to understand how the DOS client really connected to the server and to design, develop, and debug the Macintosh version. I have used a variety of network analyzer programs over the years and I can't imagine trying to develop any substantial networking software without one.
A software-based network analyzer is a network utility program that runs on a standard desktop or laptop computer using a standard Ethernet adapter. A hardware-based network analyzer is sold as a complete package of a laptop or portable computer, Ethernet adapter, and software. (Network analyzers are also available for Token-Ring and other networks.) A network analyzer is able to receive all packets on its network (LAN segment) by using a feature of Ethernet called promiscuous mode. Normally an Ethernet adapter and its driver software are programmed to only receive packets that are directed to that specific adapter or are broadcast to all adapters on that network. (Packets destined for adapters on another network on an internet or on the Internet must be directed to a router.) When an Ethernet adapter is operating in promiscuous mode, it will receive all packets on its network.
Once triggered, the analyzer software captures all packets on the network using promiscuous mode, discards packets not matching user-specified filters, and then displays the captured packets in a human-readable format. This human readable format is achieved through the use of protocol decoders. An IP protocol decoder will know to display the 32-bit IP address in dotted decimal notation, based on the location of the IP address within the packet (e.g., 18.104.22.168). Good decoders will take this a step further and do a reverse-name lookup and display the machine name, such as "www.apple.com," in addition to the dotted decimal notation.
For most developers, a software-based analyzer is the only affordable option. A small company may be able to afford only a software-based analyzer, and getting access to a hardware-based analyzer in a large company often requires waiting your turn. A software-based analyzer can be installed on a machine in your office or in your engineering test lab -- you can set it up and leave it monitoring your software for extended periods of time. Hardware-based analyzers do offer a few advantages, such as protocol decoders for some of the more obscure protocols, expert analysis capabilities that can find and recommend a solution to a network problem, and the best performance and hardware error reporting capabilities. In general, these features tend to be more useful for network managers than for software developers. You should verify that any protocols you need are supported before purchasing an analyzer.
There are several software-based network analyzers available for both the MacOS and for Windows. In addition, there are hardware-based network analyzers such as the Sniffer from Network General Corporation. EtherPeek, from the AG Group, Inc., with a list price of $995 (developer discounts are available), is a fraction of the cost of the Sniffer, easier to use, and equally capable for the vast majority of applications. (Network General does offer optional modules to decode client/server messages for Oracle, Sybase, and Microsoft SQL Server. This capability could be very valuable to some developers.) EtherPeek is available for both Windows and the MacOS.
Basic EtherPeek Operation
Before using EtherPeek you must install it on a system with a supported Ethernet adapter. Built-In Ethernet on PCI and NuBus-based PowerMacs is supported. Most other common adapters are supported, but you should check the AG Group's web site before assuming a particular adapter is supported. EtherPeek is drag-installable and does not require any special files in the System Folder.
When you first run EtherPeek you'll be asked to select an Ethernet adapter, then the main window will open. Click "Start Capture" to begin capturing packets in the capture buffer. The "Start Capture" button will toggle to "Stop Capture". Click "Stop Capture" to end your trace. Figure 1 shows the main window with the results of a trace displayed in it. Each line of text in the window displays summary information for a single packet.
Figure 1. EtherPeek Main Window.
Double-click a packet summary line in the main window and you can view the details of a packet (as shown below).
Figure 2. EtherPeek Packet Contents Window.
As you can see, capturing and viewing packets is very easy. By using filters you can tell EtherPeek to capture only packets to or from certain addresses or capture packets containing particular protocols. Triggers are used to tell EtherPeek when to begin or end a trace. Filters are typically used to look at a dialog between two specific machines and ignore the remaining traffic on your network. Later, in the HTTP client debugging example, we will setup a filter to catch only the packets from the Macintosh on which we are debugging. Triggers are useful if you don't know exactly when an error is going to occur, but you know that the presence of a certain packet indicates the error has just occurred or is about to occur.
EtherPeek 3.1 allows you to automatically resolve AppleTalk and IP addresses into their corresponding human-readable names (using the NBP and DNS protocols, respectively). You can also manually enter names for the 6-byte Ethernet addresses of any adapter cards that you want to monitor.
Looking at Figure 2, you can see that EtherPeek has knowledge of the TCP/IP protocols and uses this knowledge to display various fields of the packet in a human-readable form. EtherPeek uses a set of rules called packet decoders to display these fields. EtherPeek comes with packet decoders for a wide variety of protocols and even provides a method for you to write your own decoders (using resource files) if you are working with custom protocols.
Another extensible feature of EtherPeek is support for Plug-Ins. Plug-Ins can extract information from packets and display it on the screen, log it to a file, send it to a pager, pass it to an AppleScript, or even "speak" the message. In the debugging example that follows, we will use a Plug-In that displays Web URLs in the packet list. As with decoders, you can write you own Plug-Ins if necessary.
Methodology: Using a Network Analyzer to Narrow the Field
A network analyzer is mandatory if you are developing low-level networking software such as device drivers and protocol stacks, and is essential in higher-level networking software or even network-based applications. In this section, we'll present a general approach, or methodology, for using EtherPeek to debug network software. For purposes of discussion we'll refer to the software you are developing as an application even though it could be system software or a device driver.
A typical type of error in networking software is the "host not responding" category. This is where the machine running your application sends a message to another machine that should respond to it with another message. For discussion purposes, we will define a message as the smallest chunk of data sent by your application to another machine in a single transaction. Typically, the target application on the other machine sends a message back to your machine as a response. The message could be as simple as an IP (ICMP) ping message that fits in a single packet or it could be a more complicated message that requires multiple Ethernet packets.
Imagine you're debugging your "pre-alpha" application and it reports a "host not responding" error. At this point you are probably smiling because a few minutes before your application was crashing. You step through your code in the debugger and everything looks fine: as far as you can tell the message was built correctly; the send() routine returned with no error; if your code is asynchronous, the completion routine was called with no indication of error; and now your application is waiting around patiently for a response, but the receive() routine never returns any data or your receive handler never gets called. You tell the debugger to "go" and after the timeout period you get "host not responding".
This is where you pull EtherPeek out of your bag of tools. There are six basic explanations for what may be going wrong at this point.
- Bad Message: You really aren't building the message right. It looks good to you, but you misread the specification -- or perhaps it contained an error. If you are communicating cross-platform, maybe there is an "endian" difference you haven't accounted for.
- Message Corrupted: The software below the send() routine that you are calling is corrupting your data. If you are writing an Ethernet driver and the send() routine you are calling is really a register in a prototype controller chip, maybe the chip has a bug.
- Message Not Sent: The software or hardware below the send() routine that you are calling is not working correctly or was not configured correctly and is failing to send your message.
- Message Sent, Remote Host Didn't Respond: A valid message was sent to the remote host, but either the host system failed to deliver it to the target application on the other machine, the target application failed to respond, or the send() routine and supporting software or hardware on the other machine failed to deliver the response to the network.
- Invalid Response Sent: The remote host is sending an invalid response. Perhaps the target application gave its send() routine a bad message or a bad address to send the data to. Maybe the send() routine and its supporting hardware or software corrupted the response data or sent it to the wrong address.
- Response Dropped: A correct response was sent back to your machine, but the software or hardware supporting the receive() routine failed to deliver the message. For example, the Ethernet driver or protocol stack could have run out of buffers. If you are writing an Ethernet driver, the adapter hardware could have run out of buffers. You might want to double check your machine's configuration, as well.
By looking at an EtherPeek trace of the session you can view the packets (if any) that were actually sent as part of your message and the response message (if any). With a basic understanding of your message format and the format of the underlying protocols (if any), you can usually narrow it down to one of the following six explanations.
- Bad Message: You will be able to see your message in the EtherPeek trace. For any normal protocol stack, such as TCP/IP, EtherPeek will have decoded the protocol information and provided you with a separate display of "user data" that will contain your message. If you see a message and no response, look closely at your message. Was some padding introduced that didn't show up in the debugger? Double check the specification. If you can trace a known working application that talks to the same target, compare the two traces. If you still don't see anything wrong, investigate case #4.
- Message Corrupted: Similar to case #1, but the bad data you see in the message doesn't look like anything you could have done (but don't be too quick to point the finger). If parts of the packet other than your "user data" are corrupted, that is a sign that it may not be your fault after all (unless your application is scribbling on memory).
- Message Not Sent: Easy to spot -- you have an empty trace. (Make sure you have EtherPeek installed correctly and your filters are correctly configured, though.)
- Message Sent, Remote Host Didn't Respond: You won't see a response message in the trace. If you're sure that the message you sent was valid, then the remote host is the culprit.
- Invalid Response Sent: You see that the remote host has sent an invalid response. Case closed.
- Response Dropped: You see a correct response coming back to your machine. If you're sure your debugger breakpoint was set correctly and no response was delivered, then you can safely blame the receive() routine or the configuration of the software or hardware responsible for receiving the message.
We'll demonstrate setting up a trace later, but for an easily reproducible failure, setting up, running, and analyzing a trace can be done in as little as fifteen minutes. With this much information instantly available from EtherPeek, I don't understand why more network software developers don't use it.
One of the nicest things about using EtherPeek as a debugging tool is that is has no effect on your program's timing. In physics, the Heisenberg Uncertainty Principle says that you can't observe a system without affecting it. In software development the presence of a debugger, let alone actually using one, can cause the bug's symptoms to change or go away altogether. Network software, particularly at the lower layers, is real-time software: packets are received from other systems asynchronously, timers are used to compute time-outs and to retransmit packets, and interrupt handlers contend for critical sections.
You can't step through this type of code in the debugger when you have a problem. With EtherPeek you can collect a significant amount of information about your bug without setting any breakpoints or stepping through your code. Sometimes the information you glean from a trace will tell you exactly what you must do to fix the bug. If not, you can set your initial breakpoints much more intelligently.
Example: Debugging a Simple HTTP Client
Now, it's time to solve a real problem with EtherPeek. This example is based on a real problem that OSD solved for a client. (The names have been changed and then rewritten to protect the innocent.) To follow this example step-by-step, you'll need two Macintosh systems connected to the same Ethernet network with access to any web server. On one machine you will run EtherPeek and on the test system you will run the buggy HTTP client. We'll add some dramatic context, to make things interesting, and get on with the debugging...
Imagine you arrive at work one morning to find out that one of the engineers working on your project has abruptly left the company to join a monastery overseas. His part of the project was a set of utility programs to download files from an HTTP server. He was working on a test application called "WebGet" that would use the HTTP protocol to download a single file from a web server. You are asked to finish the test application, so you copy the CodeWarrior project file to your hard disk and are able to successfully build WebGet. When you run WebGet, it prompts you for a URL to download. You enter a URL and press return and the application hangs. The hard disk doesn't sound like it downloaded anything, so you assume that the download was unsuccessful and that the program lacks some error checking.
Rather than dive into the source code, you decide to apply the methodology we examined above and take a look at the WebGet's network traffic using EtherPeek. The first thing to do is setup a filter to capture only IP traffic to or from the machine on which WebGet is running. Figure 3 shows the dialog used to create an EtherPeek filter named WebGet from Duo that will capture only IP packets from the Ethernet-attached PowerBook Duo 2300 on which you are running WebGet.
Figure 3. Filter Settings for WebGet.
For my network, I check the Address Filter box and enter the name SeanDuo IP as the only address captured packets should be sent to or from. This assumes that the name SeanDuo IP exists in EtherPeek's name table, otherwise I would have to enter a numeric IP address. I then check the Protocol Filter box and select Internet Protocol as the only protocol to capture -- this way I can still use AppleTalk on the test system.
After confirming these filter settings, I use the Plug-Ins dialog box and turn off IP Details and turn on Web URL for display in the packet list. This will show me the URLs of any web pages requested on the packet summary line in EtherPeek's main window. Finally, I click Start Capture in EtherPeek's main window and run WebGet.
Looking at the trace (Figure 4), our attention is drawn to packet #3, because EtherPeek's Web URL plug-in is showing a URL of "/" at the end of the packet summary line. Let's double click on packet #3 and take a closer look.
Figure 4. A Trace of the Problem Session.
In the TCP Data Area field of the Packet Contents Windows (Figure 5), we see what must be the HTTP GET Request. Unless you know HTTP fairly well, there is nothing obviously wrong with this packet. It says GET / using HTTP/1.0. Notice that the packet is terminated with a carriage-return/line-feed combination (0d 0a), which we'll refer to as CRLF. Looking back at the trace in the main window we don't see any packets large enough to contain any data coming back from the server. Packets #4, #5, and #6 are all 64 bytes long and therefore must be TCP overhead packets.
Figure 5. The Problem Packet.
Let's examine this evidence with our methodological framework from above. The basic situation looks like a "host not responding" error. We can rule out explanations #5 (invalid response) and #6 (response dropped), because the web server didn't send any response at all. Since we are talking to a production web server, we can probably rule out explanation #4 (valid message sent, remote host didn't respond). It can't be #3 (message not sent), because we've seen the request. It's probably too soon to blame it on Open Transport, so let's rule out explanation #2 (message corrupted) for now. That leaves explanation #1 (bad message) as the logical place to start.
Since this problem was just "dropped in our lap" and we haven't had time to read the HTTP specification, let's try running Netscape 3.0.1 and see what it sends for a request. Then we'll compare our request with Netscape's and see how ours is different. Start a new trace and run Netscape on the test system. You should get something that looks similar to Figure 1 (all traces in this example were generated using IP addresses so DNS lookup packets are not shown). Packet #4 is the GET request from NetScape 3.0.1 and is shown in Figure 2. One significant difference between Netscape's GET request and ours is that Netscape seems to be passing a handful of parameters of the form:
parameter-name : parameter-value CRLF
Connection: Keep-Alive CRLF
User-Agent: Mozilla/3.0 (Macintosh; I; PPC) CRLF
Do we have to add some of these parameters to our request? Let's keep looking -- there is one more difference: each of the parameters is terminated with a CRLF, but there is an extra CRLF at the end of the entire request! If we assume the parameters are optional and remove them all, we still seem to be missing this extra CRLF at the end of the packet.
The Problem Function
Now that we suspect we are building our HTTP GET Request incorrectly, let's look at some code. It seems likely that the problem might be in a routine called BuildGetRequest() (Listing 1). This routine builds the GET Request that we saw in Figure 5. It takes as input a C string containing a Uniform Resource Identifier (URI). A URI is a more general form of a Uniform Resource Locator (URL) that allows for a relative path. The relative path in this case is "/" which tells the web server to return the default document from its root directory. BuildGetRequest() takes the URI that it was passed and inserts it into a string that conforms to the syntax for an HTTP Get Request and puts the resulting string into the specified buffer. A valid HTTP Get Request is defined in Section 5 of RFC 1945 "Hypertext Transfer Protocol -- HTTP/1.0". From looking at the RFC (or from looking at the correct line of code that is commented out), we can see that the request requires a CRLF. Now, we just make a one-line code change and it's time to go home.
Listing 1: BuildGetRequest()
Builds an HTTP Get Request to retrieve the specified URI.
// Builds an HTTP Get Request to retrieve the specified URI.
// theURI C String with URI to retrieve
// requestBuffer Pointer to buffer to return built
// request message in.
// *requestBuffer Buffer will contain the request
char *theURI, // URI to retrieve
char *requestBuffer) // Where to put request message
// Invalid HTTP 1.0 (missing a CRLF) ...
char request = "GET %s HTTP/1.0\015\012";
// Correct HTTP 1.0 (has two CRLF's)...
// char request = "GET %s HTTP/1.0\015\012\015\012";
// Build the request message
sprintf(requestBuffer, request, theURI);
We have just used EtherPeek to fix a bug in unfamiliar code that uses a protocol with unfamiliar internals. Although it was a relatively simple example, I'm sure you see the full potential in this method of debugging. By using a network analyzer to quickly eliminate possibilities, you can greatly reduce or even eliminate much of the time you would spend stepping through your code in a debugger. If you are writing or debugging network software, you should definitely add EtherPeek to your toolbox!
Sean Gilligan, Sean_Gilligan@opsystems.com, has been developing cross-platform networking software on the Macintosh and other platforms since 1985. In 1988, he founded Open Systems Development (OSD) to provide software development services to hardware and software manufacturers. OSD develops device drivers, protocol stacks, and networking applications for Macintosh, Windows, and Web-based platforms.