

  MacTech Magazine

The journal of Macintosh technology

 
 

Volume Number: 23
Issue Number: 03
Column Tag: Network Administration

OSX Failover - Part 1

A Beginner's Guide

By Ben Greisler

Introduction

OS X Server can provide IP failover, a high availability feature that allows a secondary backup server to take over for a failed primary server. It is a great feature and can be very handy for keeping your services available, but it has its limitations and constraints. We will review the basics of IP failover in this article and then expand on the concept in later issues. This article is aimed at getting the beginner up and running with a minimum of hassle.

IP Failover Concepts

There are two major parts to the failover process: the primary server sends out notification that it is up and running, and the secondary server monitors that signal. Kind of like, "Can you hear me now?" but without the primary server repeating "Good" after each question. This process is handled by two daemons, heartbeatd and failoverd. Both are available on OS X Server, but not on OS X client.

On the primary server, heartbeatd sends out a message every second via port 1694 on both of the network interfaces involved in the process. This is the signal to the other machine in the failover pair that the primary is still alive and well, or at least well enough to keep a heartbeat going.

On the secondary server, failoverd listens for the heartbeat message on port 1694 on both network interfaces. If it stops receiving the heartbeat message it will start the failover process.

Initial configuration of IP failover starts in /etc/hostconfig, where you define which role each server will play. We'll get into the specifics in the next section. A startup item at /System/Library/StartupItems/IPFailover checks for the configuration entries and starts either heartbeatd or failoverd (both located in /usr/sbin) as appropriate.

When failoverd on the secondary server realizes that it isn't receiving a heartbeat message, it sets off a series of events based on scripts located in /usr/libexec. The NotifyFailover script grabs the email address of the failover recipient from /etc/hostconfig and sends a message to that address. The ProcessFailover script then creates an IP alias on a network interface, allowing the secondary server to take over the IP address of the primary server. Both scripts are available for examination and are pretty well commented.

Another purpose of the ProcessFailover script is to execute scripts located in the /Library/IPFailover/ folder. This folder does not exist in a standard install of OS X Server and has to be created if needed. Within that folder can be four subfolders: PreAcq, PostAcq, PreRel and PostRel. You can use these folders to perform certain actions. The names are self-explanatory and define when the scripts they contain will be run (for example, before IP acquisition or after IP release). This is where the power and flexibility of IP failover resides.
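As a minimal sketch of the folder layout just described, the four subfolders could be created like this. The function name make_failover_dirs is made up for illustration, and the article does not say what ownership or permissions the folders need, so none are set here.

```shell
#!/bin/sh
# Sketch: create the four optional script folders named in the article.
# The root defaults to /Library/IPFailover; pass a scratch path to try it safely.
# make_failover_dirs is a hypothetical helper name, not an Apple tool.
make_failover_dirs() {
    root="${1:-/Library/IPFailover}"
    for sub in PreAcq PostAcq PreRel PostRel; do
        mkdir -p "$root/$sub" || return 1
    done
}

# On the server this would be run with root privileges, for example from a
# root shell:
#   make_failover_dirs
```

Passing a different root, such as a folder under /tmp, is a harmless way to check the result before touching /Library.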

More information can be found in the High Availability Administration document (http://images.apple.com/server/pdfs/High_Availability_Admin_v10.4.pdf), but it contains some incorrect information, as noted in this Apple tech article: http://docs.info.apple.com/article.html?artnum=305066

Setting up IP Failover

In this article, we will set up the most basic IP failover configuration to show that it works. In general, IP failover can be done in three easy steps:

1. Set up OS X Server on two machines with appropriate network configurations.

2. Add the appropriate entries to /etc/hostconfig on both machines.

3. Reboot each machine and have a working IP failover pair.

Easy, huh? Ok, now to the steps needed to accommodate the above.

It is best that the two machines in the failover pair be as identical as possible. You wouldn't want the machines to be on different OS versions, or have a secondary server that can't handle the load that the primary server normally handles. It is also tempting to give the secondary server other work to do while it is just sitting there listening to the heartbeat of the primary server, but refrain from that. Its job is to be a backup server, pure and simple.

We need to set up two networks for the IP failover pair to join. One will probably be your existing network that your other machines use to connect to your server. The other will be a private network that the pair will communicate over. Typically this will be IP over FireWire. You don't have to do it this way, but it does preserve your secondary Ethernet port on machines that have one and allows a private network on machines that don't have a second Ethernet port (e.g., the Mac mini).

Let's set up our networking like this:

Primary Server

192.168.254.165 on en0

255.255.255.0 Subnet Mask

192.168.254.1 Gateway

10.0.0.165 on fw0

255.255.0.0 Subnet Mask

Secondary Server

192.168.254.170 on en0

255.255.255.0 Subnet Mask

192.168.254.1 Gateway

10.0.0.170 on fw0

255.255.0.0 Subnet Mask

Make sure that you have good DNS entries for both machines and test them. Do not enter DNS servers or gateway information for the FireWire interface.

Now, let's edit /etc/hostconfig on each server (using your favorite editor via sudo). Add the following lines:

Primary Server

FAILOVER_BCAST_IPS="192.168.254.170 10.0.0.170"

FAILOVER_EMAIL_RECIPIENT=user@domain.com

Secondary Server

FAILOVER_PEER_IP_PAIRS="en0:192.168.254.165"

FAILOVER_PEER_IP="10.0.0.165"

FAILOVER_EMAIL_RECIPIENT=user@domain.com

So, what does all that mean?

FAILOVER_BCAST_IPS="192.168.254.170 10.0.0.170": This tells the primary server the IP addresses of the network interfaces of the secondary server. You can either specify the IPs of the secondary server or use the broadcast address of each subnet (with the subnet masks used above, 192.168.254.255 for en0 and 10.0.255.255 for fw0).

FAILOVER_PEER_IP_PAIRS="en0:192.168.254.165": This identifies the primary interface IP of the primary server. Note the syntax of "en0:" when creating your configuration.

FAILOVER_PEER_IP="10.0.0.165": This identifies the secondary interface on the primary server. In this case it is the FireWire port (fw0).

FAILOVER_EMAIL_RECIPIENT=user@domain.com: This is the email address of the person who needs to know about failover actions. Make sure that your machine is configured to be able to send mail. You may need to configure SMTP services.
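If you prefer broadcast addresses in FAILOVER_BCAST_IPS, each one follows from a host IP and its subnet mask. The sketch below works it out with plain shell arithmetic; the function name broadcast is made up for illustration. Note that with the 255.255.0.0 mask used on fw0 in this example the result is 10.0.255.255; 10.0.0.255 would apply only if that interface used a 255.255.255.0 mask.

```shell
#!/bin/sh
# Sketch: derive a subnet's broadcast address from an IPv4 address and mask.
# broadcast is a hypothetical helper name used only for this example.
broadcast() {
    ip=$1
    mask=$2
    oldifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $ip $mask
    IFS=$oldifs
    # For each octet, turn on every bit that the mask leaves as host bits.
    printf '%s\n' "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}

broadcast 192.168.254.170 255.255.255.0   # prints 192.168.254.255
broadcast 10.0.0.170 255.255.0.0          # prints 10.0.255.255
```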

Hook up the servers to the Ethernet network and connect a FireWire cable between the two machines. Check that you can ping each machine on each interface from the other. Both machines need to be able to see one another. Now restart the primary machine and then the secondary. The order is important: if you start the secondary machine before the primary, it won't hear the heartbeat message from the primary and will try to fail over immediately.
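The ping check described above can be scripted so that it uses exactly the addresses you typed into /etc/hostconfig. This is a rough sketch, not an Apple tool: peer_addresses and check_peers are made-up names, and the parsing assumes the simple KEY="value" lines shown earlier. If you put broadcast addresses in FAILOVER_BCAST_IPS, ping the secondary's real IPs by hand instead.

```shell
#!/bin/sh
# Sketch: list the peer addresses found in a hostconfig-style file.
# peer_addresses and check_peers are hypothetical helper names.
peer_addresses() {
    file="${1:-/etc/hostconfig}"
    for key in FAILOVER_BCAST_IPS FAILOVER_PEER_IP_PAIRS FAILOVER_PEER_IP; do
        # Take the text after KEY= and drop the surrounding double quotes.
        value=$(sed -n "s/^$key=//p" "$file" | tr -d '"')
        for item in $value; do
            # Strip an "en0:"-style interface prefix if one is present.
            printf '%s\n' "${item#*:}"
        done
    done
}

# Send one ping to every peer address and report the result.
check_peers() {
    for addr in $(peer_addresses "$1"); do
        if ping -c 1 "$addr" >/dev/null 2>&1; then
            echo "ok   $addr"
        else
            echo "FAIL $addr"
        fi
    done
}

# Usage on either server, once the entries are in place:
#   check_peers /etc/hostconfig
```

Run it on both machines; every line should read "ok" before you restart anything.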

OK, now that each server is up and running, let's test it out. On a third machine, ping the primary server's public IP address. You should get a good solid return. Now open Console on each machine and view the system log (or use tail on /var/log/system.log) so you can see what is going on with each machine. Then pull the FireWire cable and the Ethernet cable on the primary machine. You will notice that you stop getting ping responses from the primary server. Wait a few seconds and you should see the pings start to return again. This is the secondary machine reacting to the loss of the heartbeat message from the primary machine and running the ProcessFailover script, which lets the secondary machine acquire the IP of the primary machine. You have just gotten IP failover to work!
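To see roughly how long the address goes quiet during this test, a timestamped probe loop on the third machine can help. This is a sketch with a made-up function name, watch_ip. It sends one ping about once a second and prints the time with each result. A probe to a dead host may take longer than a second to give up, so treat the gap as approximate.

```shell
#!/bin/sh
# Sketch: probe an address about once a second and timestamp each result.
# watch_ip is a hypothetical helper name. The second argument limits the
# number of probes; leave it out (or pass 0) to run until interrupted.
watch_ip() {
    addr=$1
    count=${2:-0}
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        if ping -c 1 "$addr" >/dev/null 2>&1; then
            state=up
        else
            state=DOWN
        fi
        echo "$(date '+%H:%M:%S') $addr $state"
        n=$((n + 1))
        sleep 1
    done
}

# Usage on the third machine, stopped with Control-C after the test:
#   watch_ip 192.168.254.165
```

The run of DOWN lines between the cable pull and the first up line is the window in which clients could not reach the service.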

To fail back, I suggest not just plugging the cables back into the primary machine. In a production environment you may have to shut down the secondary server in a controlled manner, bring the primary back online and then bring up the secondary. This is inconvenient, as it would be great if you could just have everything fail back to its original state, but practice has shown that this doesn't happen exactly the way you would want it to in every case.

Conclusion

So, it's great that we can failover from one server to another, but what good does this really do us? In the next article we will start making IP failover do some tricks for us that will be useful. Stay tuned!

References:

http://images.apple.com/server/pdfs/High_Availability_Admin_v10.4.pdf

http://docs.info.apple.com/article.html?artnum=305066

man heartbeatd

man failoverd


Ben has worked on Apple-based technology integration projects from Maine to Japan, learning all the way. When not collecting frequent flyer miles, he spends his favorite time with his wife and 2.5-year-old daughter at their home outside of Philadelphia. He can be reached at magikben@mac.com.





All contents are Copyright 1984-2007 by Xplain Corporation. All rights reserved.
