Feb 02 Mac OS X
Volume Number: 18 (2002)
Issue Number: 02
Column Tag: Programming in Cocoa
by Dan Wood, Alameda CA
A Feature-Rich Cocoa Class for Fetching URLs
Cocoa developers who have made any effort to support URLs in their application—for example, to fetch a resource from a Web or FTP site—have no doubt encountered NSURL and NSURLHandle, foundation classes for URL retrieval. NSURL is a class that wraps around any URL (as specified in RFCs 1808 and 1738, not that you need to know) and allows access to the individual components, such as the host, the port, and the scheme). It provides rudimentary methods for fetching the contents of that URL. NSURLHandle is a more robust class that represents the data associated with a URL, and it provides methods for foreground and background loading of these data. But it doesn't go far enough. This article introduces CURLHandle, an open-source class that you can use in your applications for even richer URL access.
With NSURL and NSURLHandle, you can create a full-featured application that can fetch files via ftp:// URLs, Web-based data via http:// URLs, or local files wrapped in file:// URLs. But if you need more than what's provided in these classes, you may find yourself a long way from home. The NSURLHandle class could use some additional features that Apple didn't provide, such as:
- Proxy support. If you have proxies configured in the System Preferences, NSURLHandle will not pay attention to those settings, and fail. You yourself may not use proxies, but if you are writing an application for the general population, you will need to take this class of user into account. In addition, you will probably want to support authenticated proxies, meaning that an ID and password will need to be provided to access the proxy.
- "Post" support. If you are attempting to fetch a Web page's content via an HTTP POST rather than a GET, you cannot accomplish this with NSURLHandle.
- Better access to HTTP request and response headers. If you wish to set a particular header in your HTTP request, such as the user agent, or you wish to process the header from the response, such as the content length, you are limited with NSURLHandle.
In creating Watson, I found that I needed such abilities! Many Web sites that the program accesses need to be POSTed to or have to have the request headers set. I also wanted to be able to show determinate progress bars as the pages load, and I couldn't do that without access to the headers to determine the transfer sizes. And if I had been using NSURLHandle, I would not have been able support users working behind a proxy server. But where to turn?
CURL Has These Abilities
CURL, an open-source utility by Daniel Stenberg, is a great tool for accessing URLs. Apple includes the curl command line tool with Mac OS X (version 10.1 and greater), so you can fetch web resources via the command line. It is indispensable for shell scripts or just quick fetching of a resource via FTP. For instance, you might download OmniDictionary with the following command:
curl -O ftp://ftp.omnigroup.com/pub/software/
CURL's functionality is available both as this command-line utility, and as a library of C functions, called libcurl, which must be downloaded separately from CURL's home page at http://curl.haxx.se/. libcurl is a full programmatic interface to the functionality in CURL. There are simple C functions for setting up the transfer, for performing it, for processing the data as it is received, and for getting information about the transfer after the fact.
CURL is a particularly attractive tool because its license is quite permissive; it can be used in commercial distributions without any hassles.
C, Meet Objective C
Writing pure C code in the middle of an Objective C Cocoa application is possible, except that you have to stop thinking in terms of objects and start thinking in terms of pointers, structures, threads, and callbacks. This is not any fun for a spoiled Cocoa programmer! Fortunately, somebody (that would be me) has "taken the hit" and wrapped libcurl in an Objective-C class for you, so you don't have to. Nevertheless, it's a good learning exercise to understand how this works, as it brings up issues of memory management, threading, ports, notification, and caching. So let's go over how this was done.
CURLHandle is a subclass of NSURLHandle. The NSURLHandle class provides some of the functionality that we want to inherit, and a lot of structure. Actually, NSURLHandle is a somewhat abstract class, with hidden subclasses doing the real work behind the scenes. (See http://developer.apple.com/techpubs/macosx/Cocoa/TasksAndConcepts/ProgrammingTopics/Foundation/Concepts/ClassClusters.html for a discussion of "Class Clusters.") In essence, NSURLHandle provides an interface to conform to, the advantage being interoperability with other Cocoa classes. In fact, you can configure your program to use CURLHandle instead of NSURLHandle globally using the curlHelloSignature:acceptAll: class method by passing in a YES as the second parameter.
The documentation for NSURLHandle provides hints for how to subclass it. (It probably wasn't intended to be subclassed to handle protocols such as FTP and HTTP, since it handles that already. But there's no reason not to, either.) One challenge in building CURLHandle was that the source code to NSURLHandle, like all of Cocoa, is not available to the public. Anybody who has tried to subclass an non-trivial object for which they do not have the source code is probably aware that this can be tricky! Methods in the base class interact with other methods in ways that are not documented; it's not always clear (especially in a language such as Objective C without "public" vs. "protected" methods) when a public method is or is not meant to be overridden (and whether the superclass behavior should be inherited before or after the override or not at all). So subclassing an opaque class requires a lot of good documentation (which has been lacking in much of Cocoa) and a bit of trial and error. OK, a lot of trial and error!
The architecture of libcurl is such that before the transfer is to take place, the options for what to do with the incoming data are set, in the form of a C callback, as follows:
The libcurl API indicates that the buffer is passed into the callback as a void *. When the transfer is invoked, the callback is invoked multiple times as chunks of data come in.
size_t function( void *ptr, size_t size, size_t nmemb,
To wrap this in a class, it makes sense to minimize the amount of pure C, and jump back into Objective C from the C callback as quickly as possible, so we can update the GUI (in the foreground thread) and perform other operations from familiar object-oriented methods. To accomplish this, I pass a reference to the CURLHandle object as the inSelf buffer parameter. inSelf is cast back to a CURLHandle object and the Objective C syntax takes care of the rest. This way, the entire functionality of the callback, except for this little stub, is written in Objective C.
void *ptr, size_t size, size_t nmemb, void *inSelf)
return [(CURLHandle *)inSelf curlWritePtr:ptr size:size
Loading a URL in the foreground thread is perfectly fine for a file:// URL, and is possible — but not a good idea—for a remote URL, since you don't want your app to have to wait for data from the remote host. Still, it can be done if you and your users won't mind a "spinning rainbow" cursor.
Foreground loading is very straightforward; -[NSURLHandle loadInForeground] is overridden to call the libcurl method curl_easy_perform. What differs from the standard NSURLHandle foreground loading is that it's possible to specify an indeterminate progress bar to animate while the data is coming through. Each time the curlWritePtr: callback is invoked from a foreground load, the animate message is sent to the attached NSProgressIndicator object.
To really make a program useable, a program should load network URLs in a background thread, so that the foreground will still be responsive.
Threading, on any platform or language, is not trivial. Cocoa has some techniques to shield you from some of the intracacies of threading by using a callback-like mechanism, working in conjunction with the event loop that covers simple cases of multithreading. An application can, for example, start a URL loading in the background and then respond to messages—treated much the same as user events—generated by a background thread. A client of an NSURLHandle or a CURLHandle (implementing the NSURLHandleClient protocol) just implements methods to react to progress loading; these are invoked in the foreground thread even though the loading is taking place in the background.
To create this illusion of simplicity, CURLHandle has to do actual threading. Cocoa provides some classes that wrap Mach threads, but there are always subtleties to be nailed down. My goal was to initiate a background load in a new thread but be notified in the foreground thread as the load progresses so that the user interface can be updated or the load stopped. Thus, the background thread needs to send messages to the foreground thread.
For CURLHandle, I make use of several classes: NSThread, NSRunLoop, NSPort, and NSPortMessage. I haven't found a need for any of the locking classes, and I chose not to use the more sophisticated Distributed Object methods that were a bit overkill in light of the very simple messaging needed here.
In the initialization of each CURLHandle, I store a reference to the current thread, so that during a load, when a callback needs to find out if it is running in the main thread or not, it can just compare the current thread to that reference. I also store a reference to the current NSPort, which is used as the sending and receiving port for the messages from the background thread to the foreground thread. I set the delegate of that port to be the CURLHandle object itself, so that it will handle messages sent to that port. Finally, I add that port to the Run Loop, meaning that when the application is processing events, this port will be a valid receiver of messages. These messages will be received as the foreground thread waits for and processes events.
mMainThread = [[NSThread currentThread] retain];
mPort = [[NSPort port] retain];
[[NSRunLoop currentRunLoop] addPort:mPort
To actually begin a background load, a new thread is detached in the override of beginLoadInBackground. This causes the method curlThreadBackgroundLoad: to be sent to self in the new thread.
In the new thread, curlThreadBackgroundLoad: invokes the curl_easy_perform C function to actually invoke the transfer. Periodically, when data arrives, the curlHeaderFunction and curlBodyFunction are invoked, as described above. These wrappers invoke their Objective-C counterpart, curlWritePtr:size:number:message:, in order append the data to the data buffer, and update the user interface.
If this method determines that it is not running in the foreground thread, it constructs an NSPortMessage to send to the port. This message is sent from and to the NSPort we saved earlier. The port is given a numerical message (which we define in code to represent "done loading," "loading failure,' "header data," or "body data"), and then the message is sent (with a timeout value of 60 seconds, meaning that it will attempt to send the message for 60 seconds before giving up).
= [[NSPortMessage alloc] initWithSendPort:mPort
sent = [message sendBeforeDate:[NSDate
After the load is finished in the background thread, a final message is sent to the foreground, indicating that the load is finished, and then the thread exits.
While the thread is executing in the background, the foreground thread should be idling, waiting for user inputs and messages in the port we created. When a message is sent from the background, the handlePortMessage: method is invoked in our class. We use a simple switch statement to process the numerical message and handle the cases of a completed load, an error in loading, or new header or body data. (Again, distributed objects would be handy here if the communications model got any more complex than this.) A typical handling of such a message would be to invoke the NSURLHandle method that is be used as data is loaded -[NSURLHandle didLoadBytes:loadComplete:]. The client that is paying attention to the URL Handle could then update a progress bar to show that the data is coming in.
Cancelling a load is simpler than it might seem. Rather than trying to have the foreground thread literally send a message to the background thread to stop the load, we just set a simple flag that will signal libcurl to abort a load the next time it invokes the callback. In other words, we just leave a message to cancel the load that the background thread will be checking for each time through.
Thus, we override -[NSURLHandle cancelLoadInBackground] to set the mAbortBackground field of the CURLhandle to YES. And in our callback (or rather, in the Objective-C method that our callback invokes), we just check for the state of that field. If it is set, then we return a value of -1, which will indicate to libcurl that the load has been cancelled.
Dealing with memory is simple in Cocoa once you get the hang of it, knowing that you must retain anything you want to keep around, and that you can autorelease any object that you just need for the duration of the processing you are doing. But in order to combine this Object-based memory model with the raw malloc-style memory blocks needed for libcurl, a few interesting design decisions had to be made.
To set an a loading option in libcurl before the transfer takes place, you typically use curl_easy_setopt with a C string as a parameter. The method -[NSString cString] returns a pointer to a C string that is effectively autorelease'd. So at first glance, it would seem that you could just wrap a libcurl function in a higher-level function that takes an NSString as a parameter and converts it to the C string.
Not so fast. This doesn't quite do the right thing for background loads. When you set your options, and then begin loading in the background, the method that you used to set these options will finish, and then the run loop will clean up the "autorelease" pool. This means that the memory that had been temporarily allocated for these strings will be deallocated, even though libcurl is, in the background thread, still trying to make use of that memory!
So instead, we store the options that we want in an NSDictionary attached to the CURLHandle object, as shown in Listing 1. Then in the background thread, right before the load actually commences, we grab all these options from the dictionary and set the libcurl options. This way, the memory stays around while we still need it, and we don't have to remember to explicitly deallocate anything. Problem solved.
Listing 1: CURLHandle.m (fragment)
Store the given string as an option for the load
- (void) setString:(NSString *)inString
Just before the transfer commences, grab the entries from the saved dictionary and set these as options
for the transfer.
- (void) prepareAndPerformCurl
struct curl_slist *httpHeaders = nil;
NSEnumerator *theEnum = [mStringOptions keyEnumerator];
while (nil != (theKey = [theEnum nextObject]) )
= [mStringOptions objectForKey:theKey];
mResult = curl_easy_setopt(
mCURL, [theKey intValue], [theString cString]);
if (0 != mResult)
// ... Now perform the transfer with curl_easy_perform... (not shown here)
Cache Management and Visualization
The NSURLHandle specification allows you to cache your handles and their contents; this is convenient when you might be retrieving the same resources multiple times and don't want to perform redundant transfers. CURLHandle employs a very simple caching mechanism, by just remembering everything you cached while the application is running (although it could be modified for a more sophisticated "Least Recently Used" mechanism). There are a few additional methods not provided in NSURLHandle, however, for clearing out the cache and for visualizing the contents of the cache.
The cache is implemented as a simple NSMutableDictionary. The key for each entry is a URL; the value of is the CURLHandle object, which contains the data that was loaded.
When writing an application that uses CURLHandle, you may find it helpful to be able to "see" the contents of the cache while the application runs. To aid visualization, CURLHandle posts notifications (CURLHandleCacheCreateNotification, CURLHandleCacheDeleteNotification, and CURLHandleCacheChangeNotification) when the cache is changed.
Why is this of any use? You can build debugging code in your application that listens to these notifications, and updates a display of the contents of that cache. In this example, I use an NSTableView (with a single column) to display each of the URL keys in the cache.
This technique, by the way, is handy across a wide variety of applications. If you have a data structure in memory that you would like to be able to visualize while a program is running, just write a simple class that mirrors its contents in a table view, and build a nib file with a window and a table. You will save quite a bit of debugging time if you can glance at your internal states rather than relying on logging or breakpoints. Listings 2 and 3 show the controller object for a window that displays the cache.
Listing 2: DebugCurlController.h
@interface DebugCurlController : NSObject
IBOutlet NSTableView *oTable;
Listing 3: DebugCurlController.m
When the nib file is finished loading and all the actions and outlets are connected, subscribe to
notifications of the cache being changed.
- (void) awakeFromNib
When this object is deallocated, release our reference to the cache, and stop listening for
- (void) dealloc
Respond to a notification that the cache has changed. Get the associated object (the cache itself)
if we don't have it yet, and then tell the table to redisplay itself.
- (void) updateCURLHandleCache:
if (nil == mTrackingCache)
mTrackingCache = [[inNotification object] retain];
Return the row count, which is just the count of items in the cache.
- (int)numberOfRowsInTableView:(NSTableView *)inTableView
return [mTrackingCache count];
Return a value to place in the given column/row of the given table. There is just one column, so just
return the URL associated with the nth key in the cache.
- (id)tableView:(NSTableView *)inTableView
return [[mTrackingCache allKeys] objectAtIndex:inRow];
libcurl has a lot of pre-transfer options and post-transfer queries. Most of them are not central to the basic functionality of CURLHandle. In order to keep the implementation of CURLHandle as readable as possible, I've split the non-essential functions into a separate file and into a class category on CURLHandle. This file can be included in your application if the functions are needed, or it can be left out. If you include the file, you have access to simple Cocoa methods for setting HTTP headers, cookies, timeout values, and querying transfer speeds and download sizes.
The CURLHandle package comes with a testing application, aptly named CURLHandleTester (figure 1), that serves to put CURLHandle through its paces. It boasts a simple, geeky user interface that allows you to provide a URL to load and set a number of loading options, then invoke the transfer and examine its results. It takes full advantage of the CURLHandle (and NSURLHandle) interface, and provides examples on starting and stopping a transfer, monitoring progress with a progress bar, and viewing the headers and body of the transferred data. Like CURLHandle, the source code is freely available.
Figure 1. CURLHandleTester in action.
Opportunities for Expansion
The goals for CURLHandle were to extend the basic functionality already available in NSURLHandle of accessing data via the HTTP and FTP protocols, by adding the ability to work with proxies and be configurable for other needs such as HTTP Post, getting and setting cookies, accessing headers, and so forth. By wrapping around CURL, I was able to achieve this and have a foundation for more options.
To get CURLHandle for yourself, go to http://curlhandle.sourceforge.net/. The latest version is available there, along with plenty of documentation and the "CURLHandleTester" project. Since this is open-source, it will continue to improve as more developers embrace and extend its capabilities.
Dan Wood has been programming on the Mac since the days of black and white screens, in various languages and frameworks. His latest creation is Watson, an application written in Cocoa. You can reach him at firstname.lastname@example.org. Dan thanks John C. Randolph for a technical review of this article.