Nagios on OS X, Part 2

Volume Number: 22 (2006)
Issue Number: 2
Column Tag: Programming

Patch Panel

by John C. Welch

Setting Up Nagios 2.0

Since part one of this article was published, there have been some changes in Nagios (which are, in fact, the reason part two was delayed). Nagios 2 has reached the release candidate stage, so I felt that this part should deal with what should be the current version by the time you read this. Luckily, this doesn't really change anything in part one other than the version of Nagios you download and install, and one minor change to the configure step of the installation.

Addendum and Errata

The change involves an additional group, the nagios command group. I use nagioscmd for the name of this group, and you create it as you did the nagios group in part 1. This brings us to a rather obvious error in part one, and one I should have caught. The configure command in part one is incorrect, and if it works at all, will give you an incorrect setup. With Nagios 2 in mind, the correct configure command is:

./configure --with-gd-lib=/opt/local/lib --with-gd-inc=/opt/local/include \
   --prefix=/usr/local/nagios --with-cgiurl=/cgi-bin --with-htmlurl=/ --with-nagios-user=nagios \
   --with-nagios-group=nagios --with-command-group=nagioscmd

That should work correctly for you. The rest of part one should be unchanged; it has been for me.

Initial Configuration

One change in Nagios 2, and one that will be welcomed by administrators new to Nagios, is the initial configuration. In Nagios 1.2 and later, you had to use multiple configuration files, and getting them set up and grasping the relationships between them was a little tricky. With Nagios 2, if you're new to Nagios, or you want to play with a smaller setup before you get into mapping the entire Internet, you can now use a much smaller number of config files (around three for a minimal configuration). In fact, you have your choice of two templates to work from, one called "minimal.cfg-sample" and one called "bigger.cfg-sample". As the names imply, they are for minimal and slightly bigger Nagios setups.

In case I haven't done this, let me stress one critical point: this article is not, nor should it be taken as, a replacement for the Nagios documentation! That documentation is far more complete, and if this article disagrees with the Nagios docs, the Nagios docs should be assumed to be more correct, unless they are assuming a platform other than Mac OS X.

The Config Files

As I have alluded to, Nagios makes use of text config files for its setup and to do its work. They can seem daunting at first, but they do follow a logical flow and they should not intimidate you in the least. When you initially install the config files, they all have the name pattern of <something>.cfg-sample. When you have a file set up as you like, remove the -sample from the end of the name, and Nagios will be able to use it. Note that in general, any changes to a config file will probably require a restart of the Nagios process for those changes to be used.

Basic Nagios config theory

Before we get into specifics of the files, we need to look at the relationship of things in Nagios, so that we might have a better mental picture of what's going on. The basic 'unit' in Nagios is the host. A host is a computer, a router, a switch, etc. It's anything that Nagios directly probes. Each host has a series of characteristics, like IP address and name, that define it to Nagios. Since you can have hundreds, if not thousands, of hosts in a network, and applying settings and services to them individually would be rather tedious, you have hostgroups. Hostgroups are just what they sound like: collections of hosts that let you work with many hosts at once. So you can apply a probe to a single hostgroup instead of 300 hosts. Easier, no?

Now, since in addition to monitoring we have to notify people, we have contacts. Contacts are the human versions of hosts. You can use email, paging, or whatever method you can script Nagios to use to notify contacts. As we'll see, you can also set up working hours and non-working hours for notifications, so non-critical notifications aren't paging people at 2am. Like hostgroups, we have contactgroups, since in larger organizations you may have quite a few people who need to receive notifications from the same host or hostgroup.

The actual probes Nagios uses are checkcommands, and they determine what information Nagios checks. However, you don't apply checkcommands directly; rather, you use services, which combine the checkcommand definition with other parameters to probe hosts and hostgroups. This makes it easier to have many checks per host.

Another basic concept is the dependency. This can be used to make sure that you're not getting spurious alerts. For example, if you have a monitored switch that has 24 monitored hosts connected to it and the switch goes down, you'd potentially get alerts from all 25 hosts and however many services are being monitored on each host. If the switch is down, you can't talk to the hosts anyway, so those alerts are somewhat useless, as what they're really telling you is that the hosts are unreachable, not that they're actually malfunctioning. So, we use dependencies to say "If Switch A is down/unreachable, don't bother me with alerts from its attached hosts." You can have both service and host dependencies, and both are critical to a happy Nagios installation.
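
As a rough sketch of the switch example above, a host dependency and a service dependency might look something like the following. The host name switch-a and its PING service are hypothetical; xserve01 and its KDC check are borrowed from the service example later in this article. Treat this as an illustration and check the object definition docs for your Nagios version before relying on the exact directives.

```cfg
# Hypothetical: xserve01 hangs off switch-a, so suppress xserve01's
# host notifications while switch-a is DOWN or UNREACHABLE
define hostdependency{
   host_name                       switch-a
   dependent_host_name             xserve01
   notification_failure_criteria   d,u
   }

# Hypothetical: suppress notifications for the KDC check on xserve01
# while the PING service on switch-a is in a non-OK state
define servicedependency{
   host_name                       switch-a
   service_description             PING
   dependent_host_name             xserve01
   dependent_service_description   SNMP KDC Process Check
   notification_failure_criteria   w,u,c
   }
```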

Finally, we have extended info for both hosts and services, which can be thought of as metadata. This is useful for things like 3-D icons, connecting Nagios to various graphing utilities, etc.

nagios.cfg

The nagios.cfg file is the heart of Nagios. Without it, nothing works. So, understanding it is critical. Luckily, while it's long, it's really pretty simple, and well-documented. (I really have to put in a commendation to the Nagios team for their documentation. It's an excellent example of how to do useful documentation, and far more projects, commercial and open source both, could do well to learn from Nagios' example.)

The nagios.cfg file is really a listing of other configuration files and settings that apply to Nagios as a whole. For example this line:

log_file=/usr/local/nagios/var/nagios.log

Tells Nagios where its log file lives. That's the default from a new Nagios installation. Going down nagios.cfg, we see the entries for the checkcommands file, the misccommands file, and the minimal.cfg file, which is the very basic Nagios configuration file, and handy for new users. As you go down the list, you'll see that you can get quite complex with the configuration files, giving you the flexibility to grow your Nagios installation to be as large as you need it to be. (At the high end, you can have clusters of Nagios servers all talking to each other. We're not going to get into those here.)
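
For orientation, the file-listing part of nagios.cfg looks roughly like this. This is a sketch that assumes the /usr/local/nagios prefix from the configure command earlier; your copy may list the files in a different order or include more of them.

```cfg
# Where Nagios writes its main log
log_file=/usr/local/nagios/var/nagios.log

# Object config files Nagios reads at startup (paths assume the
# /usr/local/nagios prefix used in this article)
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/minimal.cfg

# File holding the $USERx$ macro definitions
resource_file=/usr/local/nagios/etc/resource.cfg
```

Each cfg_file line adds one more file of object definitions, which is how a setup grows from minimal.cfg into separate hosts, services, and contacts files.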

Other entries in nagios.cfg include the user and group Nagios runs as, and whether external commands are to be used. (Note that to issue commands from the Web interface, you have to set check_external_commands=1 in nagios.cfg.) One of the nicest things about Nagios is that pretty much every entry in nagios.cfg is nicely commented, so you don't have to guess at what any of them does. Since this is just a basic intro to Nagios, we only need to enable external commands. We'll leave the rest alone.
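
The entries in question look something like the sketch below. The user and group match the ones passed to configure earlier; the command_file path is an assumption based on the /usr/local/nagios prefix, so check it against your own nagios.cfg.

```cfg
# Account and group the Nagios daemon runs as
nagios_user=nagios
nagios_group=nagios

# Allow Nagios to accept external commands (needed for issuing
# commands from the Web interface)
check_external_commands=1

# Where Nagios looks for those commands (path assumed from
# the /usr/local/nagios prefix)
command_file=/usr/local/nagios/var/rw/nagios.cmd
```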

minimal.cfg

This file is (obviously) a minimal config file that will let you get Nagios up and running with a minimum of work. While it's not going to be one you'll want to use for large installations, it has everything you need to get started.

The first section defines time periods (these can be separately defined in timeperiods.cfg). While minimal.cfg only defines the "24x7" time period, you can create others to suit your own needs (working_hours, non_working_hours, weekends, etc). The syntax is pretty self-explanatory: a 'define timeperiod' block containing a name used by other config files, an alias that can be more descriptive and contain spaces, and the days in the timeperiod, one per line, each with the hours, in 24-hour time, that the timeperiod covers on that day. You can create more if you like, or just use the 24x7 one.
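
As an illustration, two extra timeperiods might be sketched like this. The names working_hours and weekends and the 09:00-17:00 hours are my own assumptions, not something minimal.cfg ships with.

```cfg
# Hypothetical: Monday through Friday, 9am to 5pm
define timeperiod{
   timeperiod_name   working_hours
   alias             Normal Working Hours
   monday            09:00-17:00
   tuesday           09:00-17:00
   wednesday         09:00-17:00
   thursday          09:00-17:00
   friday            09:00-17:00
   }

# Hypothetical: all day Saturday and Sunday
define timeperiod{
   timeperiod_name   weekends
   alias             Weekends
   saturday          00:00-24:00
   sunday            00:00-24:00
   }
```

Days that aren't listed are simply not part of the timeperiod.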

The next section defines the commands Nagios uses to talk to hosts and hostgroups. (These can be separately defined in checkcommands.cfg and misccommands.cfg; misccommands is used for things like notification commands and other commands that don't directly use a Nagios command plugin.) This is where you define the commands that make up services. Looking at this section, we see the checkcommands.cfg file has some commands already set up. We'll skip down past the first two notification commands to the check-host-alive command, as it's simpler to explain. The basic syntax is simple:

# Command to check to see if a host is "alive" (up) by pinging it
# (lines starting with # are comments for your use)
# "define command{" starts the command definition block
define command{
# command_name: the name you refer to the command as in the rest of Nagios
   command_name   check-host-alive
# command_line: the actual command
   command_line   $USER1$/check_ping -H $HOSTADDRESS$ -w 99,99% -c 100,100% -p 1
   }

The command string is pretty straightforward. The $USER1$ macro, defined in resource.cfg, is the path to the plugin directory, normally /usr/local/nagios/libexec. You can use the $USERx$ macros to hold all kinds of values that commands need, like SNMP community strings. You can have up to 32 of them, and they make life a lot easier. check_ping is the actual plugin executable name (in general, Nagios command plugins are named check_<functionname>), followed by various switches. A common one is the "-H" switch, which is the host address. Rather than entering it manually for every host, we use the $HOSTADDRESS$ macro, which is filled in from the host entry in minimal.cfg; we'll see that in a bit. The rest of the switches are command-specific, and are explained by the command's help function. You can bring this up by running /usr/local/nagios/libexec/commandname -h in Terminal, which prints the command's syntax definitions. I've yet to run into a command where this didn't work. Before using a command on a host or hostgroup, it's a good idea to use -h to make sure you know how the command works and what it is going to tell you.
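
To make the $USERx$ idea concrete, a resource.cfg might contain something like the following sketch. The $USER1$ line matches the plugin path mentioned above; the $USER3$ entry and its value are purely hypothetical.

```cfg
# Path to the plugin directory
$USER1$=/usr/local/nagios/libexec

# Hypothetical: an SNMP community string, kept here so it doesn't
# have to be repeated in every command definition
$USER3$=my-community-string
```

Any command_line can then refer to $USER3$ instead of spelling the value out, and changing it in one place updates every command that uses it.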

Next up is the contact definition, (also defined by contacts.cfg), which tells Nagios who to notify when it needs to. Since minimal.cfg is by design a simple setup file, there's only one entry here, although you can put more in if you like. The contact_name is how you refer to the contact in the rest of Nagios, the alias is there for more human-friendly labeling. Note that you have separate host and service notification periods. While they're the same in this default, there are cases where you may want them to be different. For example, a backup service that only runs on the weekends wouldn't need 24x7 monitoring, but the host it runs on would. So you could set up a "backup_admin" contact that only received service notifications during a "weekend" time period.

The service and host notification options lines are nothing but a set of switches, defined thusly.

Host notification options:

    d = send notifications on a DOWN state
    u = send notifications on an UNREACHABLE state
    r = send notifications on recoveries (OK state)
    f = send notifications when the host starts and stops flapping
    n = no host notifications will be sent out

Service notification options (r, f, and n work the same way for services):

    w = send notifications on a WARNING state
    u = send notifications on an UNKNOWN state
    c = send notifications on a CRITICAL state

Note that "u" can have different definitions depending on what kind of notifications you have. "Flapping" is the Nagios term for a host or service that is changing states too often. To catch this, you can enable and configure "flap detection" in nagios.cfg. Flap detection can be quite useful if you have a balky host or service, or if you're having other problems causing hosts or services to look like they're coming up and down a lot.
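
The flap detection settings in nagios.cfg look roughly like this. The threshold values shown are illustrative percentages of state change, not recommendations; the comments in your own nagios.cfg explain how they're calculated.

```cfg
# Turn flap detection on globally
enable_flap_detection=1

# A service is considered flapping once its state change percentage
# rises above the high threshold, and stops once it falls below the low one
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0

# Same idea for hosts
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
```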

The service_notification_commands and host_notification_commands lines set how you are notified for service and host alerts (email in this case), and then you have a line for the email address you wish to use for the notifications.
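
Putting those pieces together, the "backup_admin" idea from above might be sketched like this. It assumes you've defined a timeperiod named weekends, that the notify-by-email and host-notify-by-email commands exist in your command definitions, and the email address is a placeholder.

```cfg
# Hypothetical contact: service alerts only on weekends,
# host alerts around the clock
define contact{
   contact_name                    backup_admin
   alias                           Backup Administrator
   service_notification_period     weekends
   host_notification_period        24x7
   service_notification_options    w,u,c,r
   host_notification_options       d,u,r
   service_notification_commands   notify-by-email
   host_notification_commands      host-notify-by-email
   email                           backup-admin@example.com
   }
```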

The Contact Groups section follows, (defined separately in contactgroups.cfg), and is, obviously, where you create contact groups. The syntax is similar to the contacts configuration, (you'll note that Nagios uses as many common terms as possible in its config files, which makes things much easier on you) with the "members" line being a comma-delimited list of contacts that will be notified when that particular contact group is notified.
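
A contact group with more than one member might look like this sketch. The group name admins matches the one used in the host definitions in this article; the second member, backup_admin, is a hypothetical contact.

```cfg
define contactgroup{
   contactgroup_name   admins
   alias               Nagios Administrators
   members             nagios-admin,backup_admin
   }
```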

Next up is the Hosts section, (defined separately in hosts.cfg). This is the section where you tell Nagios what to monitor. The host definition has, by necessity, a largish list of terms, even in minimal.cfg:

define host{
   use                     generic-host; Name of host template to use
   host_name               localhost
   alias                   localhost
   address                 127.0.0.1
   hostgroups              test
   check_command           check-host-alive
   max_check_attempts      10
   notification_interval   120
   notification_period     24x7
   notification_options    d,r
   contact_groups          admins
        }

The use line tells the definition which template to use, in this case, "generic-host", defined just above the host definitions in minimal.cfg. The templates are handy for defining values that are going to be common to a host or group of hosts (not a hostgroup), so that your host definitions don't have to be needlessly long.
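
For reference, a trimmed-down template might look like the sketch below. The stock generic-host in minimal.cfg sets more options than this; the ones shown are just a representative handful.

```cfg
# A template is an ordinary definition with a "name" and "register 0"
define host{
   name                           generic-host   ; name other definitions refer to with "use"
   notifications_enabled          1              ; host notifications are enabled
   event_handler_enabled          1              ; host event handler is enabled
   flap_detection_enabled         1              ; flap detection is enabled
   process_perf_data              1              ; process performance data
   retain_status_information      1              ; keep status info across restarts
   retain_nonstatus_information   1              ; keep non-status info across restarts
   register                       0              ; template only, not a real host
   }
```

The register 0 line is what keeps Nagios from treating the template as an actual host to monitor.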

The host_name is how you refer to the host in the rest of Nagios; the alias is a more human-friendly label. The address is what is used by the $HOSTADDRESS$ macro we saw in the command definitions earlier. It can be a fully qualified DNS name or an IP address. For servers, I prefer the IP address wherever possible, since that way, the DNS service on my network dropping doesn't kill Nagios' ability to find hosts. One line not in the default, but that I included here, is hostgroups. You can, if you like, define a host's hostgroups in the host definition or in a separate hostgroup definition. I recommend picking one and sticking with it to avoid confusion. The check_command line is a single command that you can use as the basic "is it up or not" command. This isn't where you set up all the services you check on a host; we'll look at that later. This is just a default command for a given host. max_check_attempts is how many times Nagios will retry the check_command if the result is anything other than OK. (This only applies to the check command in the host definition, not all services running against that host.)

notification_interval is how many time units (the default unit is minutes) Nagios will wait before re-sending notifications for a host that is still down or unreachable. That's continuously down. If the host comes up and goes back down, that's different. notification_period is the timeperiod during which notifications for this host are allowed, and uses the timeperiod(s) you have defined. notification_options determines the conditions under which notifications are sent out. Usually, you want at least d,r, so that you know when a host goes down, and if it comes back up by itself. contact_groups is self-evident: these are the groups who get notified. If you want multiple contact groups, use a comma-delimited list. Please note that this example is not a complete list of host parameters by any means, and you should consult the Nagios documentation for a full list.

Since we just defined the host, we should next define the host group the host belongs to, and that's the next section, (separately in hostgroups.cfg):

define hostgroup{
   hostgroup_name    test
   alias             Test Servers
   members           localhost
        }

As you can see, hostgroups look a lot like contact groups. The members parameter is a comma-delimited list of hosts if multiple hosts are used.

The final section in minimal.cfg is services, (defined separately in services.cfg). This is where you really get into the meat of Nagios. Services are how you apply commands to multiple hosts with a single entry, and notify multiple contacts or contact groups. Services can be any command Nagios knows about, and you can get quite specific. For example, while there's no specific command to check the KDC status on OS X Server, I was able to do so by using SNMP to check for the KDC process by using the following command definition:

#check_kdc_process_via_snmp command definition
define command{
        command_name    check_kdc_process_via_snmp
        command_line    $USER1$/check_proc_by_snmp $HOSTADDRESS$ $USER3$ $USER9$
        }

and wrapping it in a service:

define service{
   use   generic-service         ; Name of service template to use
        
   host_name   xserve01
   service_description   SNMP KDC Process Check
   is_volatile   0  
   check_period   24x7
   max_check_attempts   3
   normal_check_interval   3
   retry_check_interval   1
   contact_groups   xserve-admins,nt-admins
   notification_interval   120
   notification_period   24x7
   notification_options   w,u,c,r
   check_command                   check_kdc_process_via_snmp
        }

Like host definitions, service definitions have a template option, so you can set common parameters once and apply them to all the services that use this template. Let's take a look at the default "PING" definition in minimal.cfg:

define service{
   use                     generic-service ; Name of service template to use
   host_name               localhost
   service_description     PING
   is_volatile             0
   check_period            24x7
   max_check_attempts      4
   normal_check_interval   5
   retry_check_interval    1
   contact_groups          admins
   notification_options    w,u,c,r
   notification_interval   960
   notification_period     24x7
   check_command           check_ping!100.0,20%!500.0,60%
        }

If we compare the service to the host definitions, we see that they're quite similar. The use parameter is the template the definition uses. The host_name parameter is a comma-delimited list of hosts this service runs against. The max_check_attempts, notification_interval, and notification_period parameters are the same as for the host definition. The is_volatile parameter doesn't apply to most services, and should be left at its default unless you have a need to change it. The normal_check_interval setting is how many time units to wait between regular checks of a service. By default, this is set to every five minutes. You can increase the frequency of checks, but this will also increase the load on your server and network, and should be done with caution. The retry_check_interval is how long to wait before scheduling a 're-check', which only happens in the case of a non-'OK' return from a check. The contact_groups parameter is exactly the same as for the host definition. The check_command is the name of the command as defined in the checkcommand definition, followed by any additional parameters the command needs.
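
Those additional parameters are separated by "!" characters, and Nagios hands them to the command definition as $ARG1$, $ARG2$, and so on. A check_ping command definition along these lines would accept the two arguments used in the PING service above. This is a sketch; the -p value here is an assumption, and your checkcommands.cfg may differ.

```cfg
# With check_ping!100.0,20%!500.0,60% in the service definition,
# $ARG1$ becomes 100.0,20% (warning) and $ARG2$ becomes 500.0,60% (critical)
define command{
   command_name   check_ping
   command_line   $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
   }
```

This is what lets one command definition serve many services with different thresholds.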

That, in a nutshell, is minimal.cfg, and almost all the settings you need to get started with Nagios. However, there are still a couple more files to set up before we're ready to start Nagios and get monitoring.

cgi.cfg

This is the config file that controls how Nagios talks to the CGIs and who can access which CGIs. This file isn't too complicated, but it has to be correct for Nagios' web interface to work correctly. Like all Nagios files, it's well commented. Running down the entries, most are self explanatory, so I won't comment on all of them. One of the ones that can catch you off guard is the url_html_path entry. Remember, with Mac OS X Server, that's going to be the root defined for the site in Server Admin, so you want that to point at the path defined by the physical_html_path parameter just above it.

The use_authentication parameter and the access control sections that follow it are critical, and you really, really want to read the Nagios documentation on how they work. If you are using a lot of external commands that can reboot hosts, etc, (all possible with Nagios), properly configuring your CGI access controls is critical to the security of your Nagios installation.
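
As a starting point, the access control section of cgi.cfg might be sketched as follows. The user name nagiosadmin is hypothetical, and has to match a user your Web server actually authenticates; the Nagios documentation covers what each directive grants.

```cfg
# Require authenticated users for the CGIs
use_authentication=1

# Hypothetical: a single administrative user who can see and do everything
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
```

Each directive takes a comma-delimited list of users, so you can grant viewing rights more widely than command rights.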

The statuswrl_include parameter is if you want to create a 3-D VRML 'flythrough' view of your network. It's not really any more useful than any of the 2-D views, but it's pretty cool for corner office types. The rest of the options can be left alone for an initial installation.

Testing Your Configuration

Of course you read all the docs and did everything right, but just in case, Nagios gives you a way to test your config, via the -v option for the nagios executable. The syntax is <path to the nagios executable> -v <path to nagios.cfg>. So for our example, we'd use:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If we did everything right, Nagios will tell us, and we can start it up. If there are config file errors, Nagios will do its best to give you the file name and the line number with the error. This info has always been fairly accurate in my experience, so just look where Nagios tells you, and you should be able to find any errors quickly.

If you didn't get any errors, then let's start Nagios as a daemon. This is done by using the -d switch as below:

/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Once that's done, run top to make sure Nagios is running. If it is, congratulations, you have a working Nagios installation. If not, run Nagios with the -v option to see what you may have missed. Checking system.log can help here as well.

Conclusion

Well, we've gone over a very basic Nagios configuration setup guide, (and corrected some errors from part 1). In the third part of this, we'll take a look at the actual interface, and get an idea of what we're looking at when we check on Nagios, along with some of the email notifications you might get. Thanks!

Bibliography and References

There are two sites that you really must get familiar with to use Nagios. http://www.nagios.org/ is the main Nagios site, and has tons of excellent information for you to use. The other is http://nagiosexchange.org, the biggest collection of Nagios plugins you'll find anywhere.


John Welch (jwelch@bynkii.com) is a Unix/Open Systems administrator for Kansas City Life Insurance (http://www.kclife.com/), a Technical Strategist for Provar (http://www.provar.com/), and the "GeekSpeak" segment producer for Your Mac Life (http://www.yourmaclife.com/). He has over fifteen years of experience at making Macs work with other computer systems. John specializes in figuring out ways in which to make the Mac do what nobody thinks it can, showing that the Mac is a superior administrative platform, and teaching others how to use it in interesting, if sometimes frightening ways. He also does things that don't involve computertry on occasion, or at least that's the rumor.

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.