Nagios

For the last several weeks I’ve been working on implementing a Nagios setup in my client’s environment. For those who don’t know, Nagios is an amazing piece of open source software that enables monitoring of pretty much any kind of network node or service. Unfortunately, it’s also amazingly complicated. There are several components to Nagios, and in order to use it effectively you need to understand how to set up all of them.

I think the logical place to start is with the server application. I chose Ubuntu 12.04 Server as the host OS for this service, which makes the installation very easy: `sudo apt-get install nagios3 nagios-nrpe-plugin`. This should install all of the components needed for the Nagios server and its web pages to run.

Before we go any further, we need to make a few configuration changes so that Nagios runs correctly.

Edit ‘/etc/nagios3/nagios.cfg’

Change the setting to ‘check_external_commands=1’

Edit ‘/etc/group’

Add ‘www-data’ to the nagios group: ‘nagios:x:127:www-data’ (the group ID, 127 here, may differ on your system; leave it as it is)

Edit ‘/etc/nagios3/cgi.cfg’ and edit/add entries:

‘authorized_for_all_host_commands=nagiosadmin’

‘authorized_for_all_service_commands=nagiosadmin’

Execute `chmod g+x /var/lib/nagios3/rw`
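If you would rather script the key=value edits than make them by hand, they can be sketched in shell. This is only a minimal sketch: `set_cfg_value` is a hypothetical helper name, not something Nagios ships, and it assumes simple keys and values that contain no ‘|’ character.

```shell
# set_cfg_value FILE KEY VALUE
# Hypothetical helper: replace an existing "KEY=..." line in FILE,
# or append "KEY=VALUE" if no such line exists yet.
set_cfg_value() {
    _file=$1
    _key=$2
    _value=$3
    if grep -q "^${_key}=" "$_file"; then
        _tmp=$(mktemp) || return 1
        # Write the edited copy back with cat so the original file keeps
        # its owner and permissions.
        sed "s|^${_key}=.*|${_key}=${_value}|" "$_file" > "$_tmp" \
            && cat "$_tmp" > "$_file"
        rm -f "$_tmp"
    else
        printf '%s=%s\n' "$_key" "$_value" >> "$_file"
    fi
}
```

Run as root, it would be called as `set_cfg_value /etc/nagios3/nagios.cfg check_external_commands 1`, and likewise for the two entries in the CGI configuration file. For the group change, `usermod -a -G nagios www-data` should achieve the same result as editing ‘/etc/group’ by hand.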

With the server installed and configured, the next step for me was to create host-groups based on common services. This can be done by editing the file ‘/etc/nagios3/conf.d/hostgroups_nagios2.cfg’. For example, I would create a ‘web-servers’ host group with an entry like:

    define hostgroup {
        hostgroup_name  web-servers
        alias           Web Servers
        members         web-server1,web-server2
    }

The alias field is what will be displayed on the Nagios webpage for this host-group. The members field defines which servers are to be included in this group. The names that are used here are not necessarily the hostnames of those servers. Rather they are the names that we use in other configuration files that define those hosts…more on that in a minute.

With the host-group defined, the next thing I want to do is define a service to check against that host-group. For this I’ll edit the file ‘/etc/nagios3/conf.d/services_nagios2.cfg’. I’m going to create a check that uses the check_http plugin, which runs from the Nagios server and makes sure that a webpage is being served by the web server. The command that invokes this plugin is also called ‘check_http’ and is defined in ‘/etc/nagios-plugins/config/http.cfg’. So if you need to modify the way it checks, edit http.cfg. For our purposes here, we’ll just stick with the default.

    define service {
        hostgroup_name       web-servers
        service_description  HTTP
        check_command        check_http
        use                  generic-service
    }

Note that this check does not require an agent to be installed on the machine it is checking since it is directly querying that service.
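Because the check runs from the Nagios server, you can also try the plugin by hand before letting Nagios schedule it. Plugins report their result through the exit code (0 is OK, 1 is WARNING, 2 is CRITICAL, 3 is UNKNOWN). The sketch below wraps that convention; `nagios_state` and `run_check` are hypothetical helper names, and the plugin path in the usage line assumes the Ubuntu packages put plugins in ‘/usr/lib/nagios/plugins’.

```shell
# Map a plugin exit code to the state name Nagios associates with it.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID($1)" ;;   # outside the range plugins should use
    esac
}

# Run any plugin command line, show its output, then print the state
# that its exit code signals. Returns the plugin's exit code.
run_check() {
    rc=0
    "$@" || rc=$?
    echo "state: $(nagios_state "$rc")"
    return "$rc"
}
```

For the HTTP check this would look like `run_check /usr/lib/nagios/plugins/check_http -H <address of web-server1>`.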

Another way to make sure the Apache2 web server is running is to use the nrpe plugin to talk to the agent we’ll be installing on the web server in a minute, and have the agent check that the web server process, in this case apache2, is running. How this check is called depends on the operating system of the machine being checked, so you should create a new configuration file just for the client machine.

It needs a host definition like this:

    define host {
        use        generic-host
        host_name  web-server1
        alias      Web Server 1
        address    <IP or Domain Name of web-server1>
    }
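Every member of a host group needs its own host definition, so it can save typing to generate them. A minimal sketch, assuming the same ‘generic-host’ template as above; `make_host_cfg` is a hypothetical helper name, and the file name and address in the usage line are placeholders.

```shell
# make_host_cfg HOST_NAME ALIAS ADDRESS
# Hypothetical helper: print a Nagios host definition to standard output.
make_host_cfg() {
    cat <<EOF
define host {
    use        generic-host
    host_name  $1
    alias      $2
    address    $3
}
EOF
}
```

For example, `make_host_cfg web-server2 "Web Server 2" 192.0.2.12 | sudo tee /etc/nagios3/conf.d/web-server2.cfg` would write a definition for the second web server.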

Now append this service definition to the host’s configuration file if the client runs Linux:

    define service {
        hostgroup_name       web-servers
        service_description  Apache2
        check_command        check_nrpe!check_apache
        use                  generic-service
    }

Or this service definition if it is Windows:

    define service {
        hostgroup_name       web-servers
        service_description  Apache2
        check_command        check_process!apache2.exe
        use                  generic-service
    }

The reason for this difference is that Nagios has made the job of checking Windows a bit easier on us by already having several commands in the ‘/etc/nagios-plugins/config/nt.cfg’ file that interface with the check_nrpe plugin for us.

Okay, so with the checks configured, it’s time to install agents on the Windows and Linux client machines. Again, I use Ubuntu in my environment pretty much exclusively, so it’s very simple to install the Nagios agent: `sudo apt-get install nagios-nrpe-server`. From there I just need to jump into the nrpe.cfg file and make a few changes to allow my Nagios server to talk to the agent, and then set up the checks it will be allowed to run.

On the client, edit the file ‘/etc/nagios/nrpe.cfg’ and change the entry ‘allowed_hosts=127.0.0.1’ so it reflects the IP address of your Nagios server. Now scroll to the bottom and add the line

‘command[check_apache]=/usr/lib/nagios/plugins/check_procs -c 1: -a apache’

This line tells the agent that, when told to run a check called ‘check_apache’, it should execute ‘/usr/lib/nagios/plugins/check_procs’, a standard Nagios plugin that looks through the process list based on the arguments it is supplied. In our case, ‘-a apache’ tells it to look for processes whose arguments contain ‘apache’, and ‘-c 1:’ makes the result critical when fewer than one such process is found. Without a threshold, check_procs reports OK no matter how many processes match.
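Before going back to the server, it is worth confirming that nrpe.cfg really contains what you think it does. The following is a rough sketch only: `nrpe_allows` and `nrpe_commands` are hypothetical helper names, and the address match is a plain string comparison against the comma-separated ‘allowed_hosts’ list, so it knows nothing about hostnames or network ranges.

```shell
# nrpe_allows FILE IP
# Succeed if IP appears literally in the allowed_hosts list of FILE.
nrpe_allows() {
    grep '^allowed_hosts=' "$1" | cut -d= -f2 | tr ',' '\n' | tr -d ' ' \
        | grep -qxF "$2"
}

# nrpe_commands FILE
# Print the name of every check defined as command[NAME]=... in FILE.
nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

The agent only reads its configuration at startup, so restart it after editing with `sudo /etc/init.d/nagios-nrpe-server restart`. From the Nagios server you can then test the whole path with `/usr/lib/nagios/plugins/check_nrpe -H <client IP> -c check_apache`.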

Now on Windows, I have to go to sourceforge.net, download NSClient++, and install it on my Windows client. Much like on the Linux client, I have to change some lines in the configuration file once it’s done installing: to load the modules needed for the checks I want to run, to allow my Nagios server to talk to it, and to add my machine-specific checks. I ended up changing this config file quite a bit and removed all of the comments for the sake of readability.

Make sure the following lines are uncommented:

    [modules]
    CheckSystem.dll
    CheckDisk.dll
    NRPEListener.dll
    CheckExternalScripts.dll

    [Settings]
    use_file=1
    allowed_hosts=<Your_Nagios_Server_IP>

    [NRPE]
    allow_arguments=1
    use_ssl=1

That should pretty much do it. Now go back to your Nagios server. Since our host group lists both web-server1 and web-server2, make sure that you create a host configuration file for web-server2 as well. If you don’t, Nagios will report an error when you test the config. The other option is simply to remove web-server2 from the web-servers host-group. Now run:

 `sudo nagios3 -v /etc/nagios3/nagios.cfg`

This tests your configuration. If it comes back without any errors then proceed to run:

 `sudo /etc/init.d/nagios3 restart`
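The two commands can also be chained so that the restart only happens when the verification passes. A minimal sketch, with `verify_then_restart` as a hypothetical helper name; both arguments are command strings handed to `sh -c`.

```shell
# verify_then_restart VERIFY_CMD RESTART_CMD
# Run RESTART_CMD only if VERIFY_CMD exits successfully.
verify_then_restart() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

Here that would be `verify_then_restart 'sudo nagios3 -v /etc/nagios3/nagios.cfg' 'sudo /etc/init.d/nagios3 restart'`, which leaves the running Nagios untouched if the new configuration has errors.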

I hope this guide helps prevent others from struggling as much as I did.