User, Meet Apache. Apache, Hug.
Volume Number: 19 (2003)
Issue Number: 7
Column Tag: Untangling the Web
by Kevin Hemenway
Or: How To Learn More About Your Cuddly New Web Server.
Hopefully, if you were at all interested in the last column (MacTech, June 2003), you've mentally prepared yourself for the coming months of web serving hemming and hawing. I'll continue to assume that you know more about your own network and internet provider than I ever could, and as such, will only touch briefly on ISP-related workarounds. In this installment, we'll turn on our web server, explore its directory structure, and learn how to interact via the command line. You should know how to operate Apple's Terminal application - if you don't, I highly recommend brushing up on Chris Stone's series, Learning the Terminal in Jaguar, on MacDevCenter.com.
HARTMAN: Today You People Are No Longer Maggots
A word of encouragement: 63% of the web sites on the Internet are using Apache (according to a 2002 report from http://netcraft.com/). Apple, convincingly enough, has included Apache in its own OS X and OS X Server products. It isn't a "ported" version or a slimmed-down feature set, but rather a full-fledged implementation of Apache with all the fixin's. Much of what you learn in these articles will apply to any installation of Apache, regardless of whether it's on Mac, Linux, or even Windows.
If you're familiar with the differences between Mac OS X and Mac OS X Server, you're probably aware that Server contains more GUI-based administration tools for programs like Apache, sendmail or MySQL. This doesn't indicate that one Apache is better than the other - everything you can do in Server can be done in the consumer OS, and vice versa. As of this writing, these articles assume you're using the consumer Jaguar, and not the Server version (which we won't be covering).
Alternatively, a word of possible disillusionment: much like any web developer worth their salt will use a raw text editor like BBEdit (which I prefer, see http://www.barebones.com/) to code their HTML, most system administrators eschew hand-holding GUI tools and dive right into text editing a configuration file. This is decidedly un-Mac-like but, until a few years ago, so was the thought of including Linux (or, more accurately, BSD) as an under layer. If you want to become proficient at web serving or programming, you're going to have to get used to editing configuration files. Yes, there are GUI-based tools available, but you'll do a lot better if you learn to fish, rather than plunking quarters into a vending machine (that made sense, right? Right!)
Enough soap-boxing. Commence the casting.
KAHN: Butterfly In The Sky... I Can Fly Twice As High!
There are a few different ways you can get a rise out of Apache, the most immediate of which is through the Macintosh GUI. In Jaguar, this setting is hidden underneath the Sharing System Preference; open that now (Apple Menu > System Preferences... > Sharing). The first tab we see, Services, lists a number of capabilities we can turn on or off, as well as our current network address (which may or may not be accessible to the outside world).
In Figure 1, you'll see that "Personal Web Sharing" has been highlighted. To turn our Apache web server on, either put a check in the box on the left, or click the "Start" button on the right. A few seconds later, we'll see the results shown in Figure 2. Of special interest is the new information at the bottom of the screen - the first URL is the home of the primary web site of your machine, and the second is the address of the current user's personal web site.
Figure 1. The Services tab of the Sharing System Preference.
Figure 2. The Apache web server has been started.
Depending on your network or ISP's configuration, you may be able to type (or cut and paste) those URLs into your browser's address bar and see the default pages of your built-in web sites (Figure 3 and Figure 4). If you don't see those pages, or you get an error message concerning connectivity, you should run through some of the steps in last month's column to try to determine your external IP and whether it's viewable to the outside world. For now, you should be able to follow along by using http://127.0.0.1/ and http://127.0.0.1/~username/ respectively.
Figure 3 shows the default web page that is shipped with most Apache distributions - it's just a quick confirmation that the Apache web server is up and running smoothly. Ultimately, you should see this page only once (that once is now). You'll also be informed that the Apache documentation can be accessed from http://127.0.0.1/manual/.
Figure 3. The default root web page of Apache.
Figure 4, on the other hand, shows a friendly blurb that Apple created to ease new users into their virginal web serving experience. It briefly covers what we just did (turning on the web server) as well as quick definitions of HTML and Apache. You should read over both the default pages - you'll probably be deleting the files that represent them shortly.
But, where are these files physically located? The first URL, being the root location of the web server, is served, semantically enough, from the root web directory of Apache: /Library/WebServer/Documents. The second URL, being user specific, is served from /Users/username/Sites (as explained in Figure 4).
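The mapping from URL to disk can be sketched in a few lines of shell. This is only an illustration of the two default locations named above: the function name url_to_file is made up for this example, and a real server can point these paths elsewhere in its configuration.

```shell
#!/bin/sh
# Sketch: map a URL path to the file Apache would serve by default on OS X.
# url_to_file is a hypothetical helper, not part of Apache or OS X.
url_to_file() {
    path=$1
    case $path in
        /~*)
            # User site: strip the leading "/~", then split "username/rest".
            rest=${path#/~}
            user=${rest%%/*}
            case $rest in
                */*) remainder=/${rest#*/} ;;
                *)   remainder= ;;
            esac
            printf '%s\n' "/Users/$user/Sites$remainder"
            ;;
        *)
            # Everything else comes from the root web directory.
            printf '%s\n' "/Library/WebServer/Documents$path"
            ;;
    esac
}

url_to_file "/index.html"
url_to_file "/~morbus/"
url_to_file "/~morbus/photos/cat.jpg"
```

The three calls print the root document path, the user's Sites folder, and a file inside that folder, in that order.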
And that, as they say, is that. Your Apache web server is running, you're seeing the default web pages, and you could publish vanilla HTML with nary a peek at the underlying sprockets and cogs. So far, though, this hasn't been very satisfying. It's hard to feel good about yourself when all you did was click a button and piddly-type a URL.
Figure 4. The default user web page from Apple.
BATISE (TRANSLATING): If No Pain, Nothing Good Is Born
Being spelunkers of the technical sort, however, it's time we dig deeper. To do so, we'll open up a Terminal and type httpd -V which gives us the results shown in Figure 5. What we're doing is asking Apache (represented by the shell program httpd) to show its compile-time settings. These will change depending on your distribution (Red Hat's output will be different from SuSE's output, which will be different from OS X's, and so forth). I'll explain some of the more important entries below.
Note: The screenshot, and my explanations, are based off version 10.2.6 of OS X. If you're following along with 10.1 (waiting for Panther, eh?) or earlier versions of 10.2, you may see a slightly different output. Don't fret... the differences rarely equate to much importance, and when they do, I'll make a glib comment.
Figure 5. The results of an httpd -V shell command.
The first bit of info is the server version of the currently installed Apache (and, for the esoteric, the timestamp of when it was actually compiled). Thankfully, Apple has generally been responsive with their security updates, and most OS X users are running the latest release (1.3.27, although, at the time of this writing, there were rumors of an impending 1.3.28).
The next line worth exploring is -D HTTPD_ROOT, which tells us where the Apache binaries have been installed. The most important files, like httpd and apachectl, live in /usr/sbin. All of Apache's modules (read: plugins) live in /usr/libexec/httpd. If you've had experience with Linux programs before, this layout is relatively familiar. The next line, concerning SUEXEC_BIN, can safely be ignored - suexec isn't enabled or shipped under OS X (we can, however, recompile Apache and add it ourselves. Long story. Eventually.)
-D DEFAULT_ERRORLOG, the next entry of importance, is probably the single greatest answer to all your problems before, now, and after. Whenever something goes wrong, check your error log. Whenever something goes right that shouldn't have gone right, check your error log. Whenever you suspect someone is chuckling behind your back, the error log will have their home address. "Check your error log" is the Apache equivalent of "RTFM" - before in-the-know users will answer any of your tech support questions, they'll want to know what the error log says. More often than not, the error log will tell you exactly what went wrong. Don't be embarrassed. Check your error log. The quickest and most helpful way is with tail /var/log/httpd/error_log, which spits out the last ten lines of whatever file you pass to it.
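If you'd like to see tail's ten-line default in action without waiting for a real error, here's a small sketch. It uses a throwaway file rather than the real log, and last_errors is a name invented for this example.

```shell
#!/bin/sh
# Sketch: wrap tail so the line count is optional (defaults to ten).
# last_errors is a hypothetical helper; $1 = file, $2 = optional line count.
last_errors() {
    tail -n "${2:-10}" "$1"
}

# Build a stand-in log with fifteen numbered entries.
demo_log=$(mktemp)
i=1
while [ "$i" -le 15 ]; do
    echo "entry $i" >> "$demo_log"
    i=$((i + 1))
done

last_errors "$demo_log"        # entries 6 through 15
last_errors "$demo_log" 3      # entries 13 through 15

rm -f "$demo_log"
```

On a real machine you'd point it at /var/log/httpd/error_log instead. It can also help to leave tail -f /var/log/httpd/error_log running in a second Terminal window, which keeps printing new lines as Apache writes them.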
We'll talk a bit more about log files in an upcoming column, but for now, realize that a matching /var/log/httpd/access_log covers successful operations (in the sense that the original URL request garnered a "proper" response). In previous versions of OS X, you would have seen a matching -D DEFAULT_ACCESSLOG in Figure 5's output. This has since moved inside the configuration file, which is what our remaining four lines cover.
The first of the last, /etc/httpd/mime.types, contains the mapping between a file extension and the MIME type sent to the browser (or, more generically, the "requesting user-agent"). For now, we'll leave the actual definition of MIME types to a later column, but if a .jpg were served as text/html (and not its proper image/jpeg), then the browser wouldn't be able to properly render and display the picture.
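To make the extension-to-type mapping concrete, here's a rough sketch that looks up an extension in a file laid out like mime.types (a type, followed by the extensions that map to it). The helper name mime_for_ext and the two-line sample file are mine, for illustration only; the real file is far longer.

```shell
#!/bin/sh
# Sketch: find the MIME type for an extension in a mime.types-style file.
# mime_for_ext is a hypothetical helper; $1 = extension, $2 = file to search.
mime_for_ext() {
    awk -v ext="$1" '
        $1 ~ /^#/ { next }                   # skip comment lines
        {
            for (i = 2; i <= NF; i++)        # fields after the type are extensions
                if ($i == ext) { print $1; exit }
        }
    ' "$2"
}

# A tiny stand-in for the real file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# MIME type       extensions
text/html         html htm
image/jpeg        jpeg jpg jpe
EOF

mime_for_ext jpg "$sample"    # prints image/jpeg
mime_for_ext htm "$sample"    # prints text/html

rm -f "$sample"
```

Assuming your copy follows the same layout, mime_for_ext jpg /etc/httpd/mime.types should answer the same question against the real thing.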
The next entry is The Big One - /etc/httpd/httpd.conf points to the file that handles all the configuration of our Apache web server. It's suitably large, suitably commented, and suitably intimidating, enough so that the included early warning should be taken to heart: "Do NOT simply read the instructions in here without understanding what they do. They're here only as hints or reminders. If you are unsure consult the online docs. You have been warned." The last two entries in our output can be safely ignored - they're deprecated configuration files that have since been merged with the master configuration. If you get caught using them, your error log will spit out my home address.
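Once you know which entries matter, you can fish a single one out of the httpd -V output instead of squinting at the whole list. The sketch below assumes each entry is printed as -D NAME="value"; the helper name compile_setting is invented here, and the sample text is a hand-typed stand-in built from the paths mentioned above, not a copy of Figure 5.

```shell
#!/bin/sh
# Sketch: print the value of one -D entry from `httpd -V` output on stdin.
# compile_setting is a hypothetical helper; $1 = the entry name to look for.
compile_setting() {
    sed -n 's/^[[:space:]]*-D '"$1"'="\(.*\)"$/\1/p'
}

# Illustrative stand-in for real output (assumed format, article's paths).
sample=' -D DEFAULT_ERRORLOG="/var/log/httpd/error_log"
 -D SERVER_CONFIG_FILE="/etc/httpd/httpd.conf"'

printf '%s\n' "$sample" | compile_setting DEFAULT_ERRORLOG
printf '%s\n' "$sample" | compile_setting SERVER_CONFIG_FILE
```

Against a live server, the equivalent would be httpd -V | compile_setting SERVER_CONFIG_FILE, which is handy when you sit down at an unfamiliar machine and need to know which configuration file is actually in charge.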
HAMMOND: All Major Theme Parks Have Delays
If you recall from the first column, one of the ways an ISP can put a dent in your web serving plans is by filtering incoming HTTP traffic. With such a filter, any incoming requests on port 80 will be dropped, and they'll never reach your anxious and ever-ready Apache. Working around this is easy enough to serve as an introduction to editing the master configuration file. Note: For those who DON'T have an evil ISP, just mentally follow along - you'll have less mundane things to do come next column.
The first hurdle concerns saving our upcoming changes to /etc/httpd/httpd.conf. This file, being "special", requires heightened privileges to allow modifications - privileges that your user account doesn't have by default. To get these special administrative privileges in the Terminal, you'll need to preface your intent with the sudo command. Said command gives you, for that one instant, super powers - enough so to save your changes to an otherwise protected file. The downside of using sudo is that you need to be proficient in a shell editor like pico, emacs, or vi.
Myself, I prefer BBEdit 7.0's shell utility (which can be installed via their Preferences > Tools > Install "bbedit" Tool). BBEdit's utility is smart enough to know that when you attempt to save a protected file, you should be prompted for an administrative password. In my case, I'll launch into our next paragraphs with bbedit /etc/httpd/httpd.conf. For those without BBEdit, utter sudo EDITOR /etc/httpd/httpd.conf where EDITOR is your editor of choice. If you're new to shell editing, there's a quick tutorial on using pico in Chris Stone's MacDevCenter.com series.
However you get there, we should now be looking at Apache's primary configuration file. Within the first screen of information, you should see the warning I quoted above, and I'll warn you again: here there be dragons. Friendly Puff-like dragons, but dragons nonetheless.
When you want to modify the configuration of Apache, a search for your desire will usually turn up something worth investigating. In our example, we've got problems with the ISP filtering port 80 traffic, so do a search for the word "port". You will find, soon enough, the following bit of text:
# Port: The port to which the standalone server listens.
# For ports < 1023, you will need httpd to be run as root
Port 80
Here, you can see that Apache has been configured to start up on the default (and expected) HTTP port 80. Since our theoretical ISP has blocked that, we need to change it to something else. Good alternative choices are 8000, 8080, or 8088. Sadly, whatever you choose will turn all your URLs ugly... if you choose 8000, you'll be forced to give out http://127.0.0.1:8000/~morbus/ instead of http://127.0.0.1/~morbus/. Choose your poison, and save the file.
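If you'd rather not make the change by hand, the same edit can be sketched with sed. This is one possible approach, not the only one: set_port is a name made up for this example, and it deliberately writes to standard output rather than touching the file it reads.

```shell
#!/bin/sh
# Sketch: print a config file with its Port directive changed.
# set_port is a hypothetical helper; $1 = new port, $2 = config file to read.
set_port() {
    # Only lines that BEGIN with "Port" are touched, so the "# Port:"
    # comment above the directive is left alone.
    sed "s/^Port[[:space:]][[:space:]]*[0-9][0-9]*/Port $1/" "$2"
}

# A three-line stand-in for the relevant part of httpd.conf.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
# Port: The port to which the standalone server listens.
# For ports < 1023, you will need httpd to be run as root
Port 80
EOF

set_port 8000 "$sample_conf"

rm -f "$sample_conf"
```

For the real file, a cautious routine would be to save the output somewhere harmless (set_port 8000 /etc/httpd/httpd.conf > /tmp/httpd.conf.new), read it over, and only then copy it into place with sudo cp, keeping a backup of the original just in case.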
Since we've made a change to the configuration, we've now got to restart Apache. You'll become familiar with these steps as the column progresses: any time a change is made to the configuration, it won't be put into play until Apache is stopped and then started. Before we actually do that, it's often handy to run httpd -t first. Much like -V gave information about the built-in compile time settings, -t gives us some insight on our configuration file by testing it for errors. If everything's grand, we'll get a Syntax OK... if not, we'll be told on what line we actually screwed up. The benefit of testing before restarting should be obvious - it gives us a chance to fix problems before our visitors start complaining.
We can restart Apache one of two ways: by toggling the buttons in our Sharing System Preference (refer back to Figure 1 and Figure 2) or by using the apachectl shell utility. I personally prefer the shell utility, due to a shortcoming in the System Preference: if your httpd.conf has an error in it, the System Preference will attempt to "start" indefinitely, expecting a positive response that it will never get. With apachectl restart (or its implied brethren: apachectl stop and apachectl start) you'll be given an httpd -t diagnosis if anything goes wrong. Once Apache is restarted, the configuration change has taken effect, and you should be able to visit your newly tweaked http://127.0.0.1:8000/~morbus/ (or whatever actual port you chose).
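The test-then-restart habit is easy to wrap up so you never forget the first half. Here's a minimal sketch: checked_restart and the two overridable variables (HTTPD and APACHECTL) are my own inventions for this example, and since apachectl already reports configuration problems itself, treat this as belt and suspenders rather than a necessity.

```shell
#!/bin/sh
# Sketch: restart Apache only when the configuration passes its syntax test.
# HTTPD and APACHECTL are hypothetical override points, handy for trying
# the function out without a real server.
HTTPD=${HTTPD:-httpd}
APACHECTL=${APACHECTL:-apachectl}

checked_restart() {
    if "$HTTPD" -t; then
        # Syntax is fine, so it's safe to bounce the server.
        "$APACHECTL" restart
    else
        echo "configuration test failed; not restarting" >&2
        return 1
    fi
}
```

Save it as a script with a final line that calls checked_restart, and run that script via sudo so the restart has the privileges it needs.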
In our next column, we'll begin exploring our first major feature: server side includes. Often ignored for being "too simple", they can do some fairly useful things without much effort. Until then, students may contact the teacher at firstname.lastname@example.org.
- Peruse /etc/httpd/httpd.conf. Familiarize yourself.
- The Apache documentation already loaded on your machine (see Figure 3) is some of the best open source documentation around. I'll refer to it from time to time.
- Liked reading about URL design from the first column? Check out "Toward Next Generation URLs" by Thomas A. Powell and Joe Lima: http://port80software.com/support/articles/nextgenerationurls
- Complete the ever-enlarging animal tree: maggot, butterfly, what, and what?
- Each of the headings is a quote from a movie or TV show. Name them.
Kevin Hemenway, coauthor of Mac OS X Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine (like the popular open-sourced aggregator AmphetaDesk, the best-kept gaming secret Gamegrene.com, articles for Apple's Internet Developer and the O'Reilly Network, etc.), he's an ardent supporter of writing incorrect passwords on sticky notes, just to confuse peepers. Contact him at email@example.com.