So, you run your own monitoring? Nice!
But how do you monitor your monitoring? Who will alert you when your monitoring service is down?
We wanted to encourage all Nagios and Icinga administrators out there to start monitoring their own monitoring. So we did something unprecedented.
We made Server Density free
That’s right. If you are looking for a tool to monitor your monitoring, then look no further. We are offering a free Server Density account, valid for one server and one website.
But we didn’t stop there.
To help you monitor your monitoring, we even created Server Density plugins for Icinga and Nagios.
This article outlines the key Nagios / Icinga metrics you need to monitor, followed by step-by-step instructions for doing this with Server Density.
Nagios performance metrics
Let’s take a look at some Nagios performance metrics:
Metric | Comments | Suggested Alert |
Total Hosts/Services | Number of monitored hosts or services | If the count goes below a certain threshold, some servers may have been removed. Depending on how dynamic your infrastructure is, it might be a good idea to add some alerts here. |
Hosts/Services Actively Checked | Number of checks that run directly from Nagios. These are performance killers: each one has to be scheduled and, once a thread is available, Nagios must shell out, execute the script, wait for the results, parse them, append them to the command buffer and process the buffer. | |
Hosts/Services Passively Checked | Third-party checks whose results are forwarded to Nagios. For example: check_MK. | |
Active/Passive Host/Service Latency 1/5/15 min | Measures how long each check waits between being scheduled and actually running. | Trigger alert when > 40s. The right value depends on how busy your Nagios server is, but it should never be higher than interval_length. |
Active/Passive Host/Service Checks Last 1/5/15 min | Measures the number of checks carried out over the specified time interval. | Trigger alert when < X, where X is your regular rate of checks. This alert verifies that Nagios is actually running and processing the configured checks. |
Uptime | Tracks how long your Nagios server has been up and running. If Nagios is respawning, we would need to be notified. | Trigger when < 300s. |
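To make the thresholds in the table concrete, here is a minimal Python sketch of how they could be evaluated against output from nagiostats in MRTG mode. It is not the Server Density plugin's code. The variable names (NUMSERVICES, AVGACTSVCLAT, NUMACTSVCCHECKS5M, PROGRUNTIMETT), the assumption that latency is reported in milliseconds, and the helper names are illustrative assumptions; check them against `nagiostats --help` on your own installation.

```python
import subprocess

# Assumed nagiostats MRTG variable names; verify with `nagiostats --help`.
VARIABLES = ["NUMSERVICES", "AVGACTSVCLAT", "NUMACTSVCCHECKS5M", "PROGRUNTIMETT"]


def parse_mrtg_output(output, variables):
    """Pair each requested variable with the value on the matching output line."""
    values = [line.strip() for line in output.splitlines() if line.strip()]
    if len(values) != len(variables):
        raise ValueError("expected %d values, got %d" % (len(variables), len(values)))
    return {name: float(value) for name, value in zip(variables, values)}


def read_nagiostats(cmd_path="/usr/local/nagios/bin/nagiostats"):
    """Run nagiostats in MRTG mode and return the parsed metrics."""
    result = subprocess.run(
        [cmd_path, "--mrtg", "--data=" + ",".join(VARIABLES)],
        capture_output=True, text=True, check=True,
    )
    return parse_mrtg_output(result.stdout, VARIABLES)


def evaluate_alerts(metrics, min_services, min_checks_5m,
                    max_latency_s=40.0, min_uptime_s=300.0):
    """Return the list of alert messages triggered by the given metrics."""
    alerts = []
    if metrics["NUMSERVICES"] < min_services:
        alerts.append("service count below expected minimum")
    # Assumption: nagiostats reports latency in milliseconds.
    if metrics["AVGACTSVCLAT"] / 1000.0 > max_latency_s:
        alerts.append("active service check latency too high")
    if metrics["NUMACTSVCCHECKS5M"] < min_checks_5m:
        alerts.append("too few active service checks in the last 5 minutes")
    # Assumption: PROGRUNTIMETT is the process run time in seconds.
    if metrics["PROGRUNTIMETT"] < min_uptime_s:
        alerts.append("Nagios restarted recently")
    return alerts
```

Calling `evaluate_alerts(read_nagiostats(), min_services=100, min_checks_5m=200)` on the monitoring server would return the triggered alerts; the two minimums are placeholders for your own baseline.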
Nagios server alerts
It’s also a good idea to configure alerts on standard system metrics:
Metric | Comments | Suggested Alert |
Load | A busy Nagios server can experience excessive loads. | Trigger alert when > X, where X is 4 * the number of CPU cores. |
Disk usage | We need a sufficient amount of available disk space for services to operate normally. | Disk usage > 90% |
Swap usage | Disk swapping impacts system performance | Swap usage > 128MB |
Process running | Checks to make sure the Nagios daemon is running. | Process count for /usr/local/nagios/bin/nagios < 1 |
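The load, disk and swap thresholds above are simple arithmetic, which the following Python sketch spells out. The function names are hypothetical and the code only illustrates the rules in the table; in practice the Server Density agent collects these values for you.

```python
import os
import shutil


def load_threshold(cpu_cores, factor=4):
    """Load alert threshold: 4 * the number of CPU cores."""
    return factor * cpu_cores


def disk_usage_percent(total_bytes, used_bytes):
    """Used disk space as a percentage of the total."""
    return 100.0 * used_bytes / total_bytes


def system_alerts(load_1min, cpu_cores, disk_percent, swap_used_mb):
    """Return the names of the system alerts that would trigger."""
    alerts = []
    if load_1min > load_threshold(cpu_cores):
        alerts.append("load")
    if disk_percent > 90:
        alerts.append("disk")
    if swap_used_mb > 128:
        alerts.append("swap")
    return alerts


def current_load_and_disk(path="/"):
    """Read the 1-minute load, core count and disk usage of this machine (Unix only)."""
    usage = shutil.disk_usage(path)
    cores = os.cpu_count() or 1
    return os.getloadavg()[0], cores, disk_usage_percent(usage.total, usage.used)
```

For example, on a 4-core server the load alert would trigger above 16.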
In addition to the above, you should also collect metrics for disk IO and network bandwidth. If you use MySQL or PostgreSQL IDO, then monitoring those databases is highly recommended too.
Monitoring Nagios itself with Server Density
Here is what you need to do:
- Register for your free Server Density account (limited to 1 server + 1 website)
- Enter your credit card information in Preferences > Billing. This is only used to verify your account; you will not be charged.
- Install the agent on your monitoring server:
$ curl -Lk https://www.serverdensity.com/downloads/agent-install.sh > install.sh
$ chmod +x install.sh
$ sudo ./install.sh -a https://example.serverdensity.io -t your_licensekey
- Configure your plugin path:
$ sudo sed -i 's/plugin_directory: $/plugin_directory: \/usr\/local\/share\/sd-plugins\//g' /etc/sd-agent/config.cfg
- Add the sd-agent user to your Nagios group (same for Icinga):
$ sudo adduser sd-agent nagios
- Install the plugin:
For Nagios3/Icinga1:
$ cd /usr/local/share/sd-plugins/ && sudo wget https://raw.githubusercontent.com/bencer/sd-agent-plugins/master/Nagios/Nagios.py
$ sudo vi /etc/sd-agent/plugins.cfg
[Nagios]
cmd_path = /usr/local/nagios/bin/nagiostats
And restart the agent to apply changes:
$ sudo service sd-agent restart
For Icinga2:
$ cd /usr/local/share/sd-plugins/ && sudo wget https://raw.githubusercontent.com/bencer/sd-agent-plugins/master/Icinga2/Icinga.py
Now we need to configure Icinga2 API (if not configured yet):
$ cd - ; sudo -s
# icinga2 api setup
information/cli: Generating new CA.
[...]
Done.
Now restart your Icinga 2 daemon to finish the installation.
# service icinga2 restart
Check your auto-generated username and password and configure the credentials for the plugin:
$ sudo cat /etc/icinga2/conf.d/api-users.conf
$ sudo vi /etc/sd-agent/plugins.cfg
[Icinga]
api_user = root
api_passwd = your_password
# you will need FQDN to match the cert CN
api_stats_url = https://your_fqdn:5665/v1/status
# Icinga PKI CA cert
icinga_ca_crt = /etc/icinga2/pki/ca.crt
And restart the agent to apply changes:
$ sudo service sd-agent restart
- Log in to Server Density and start experimenting with alerts and dashboards based on the metrics above.
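If you went the Icinga2 route and no data shows up, it can help to confirm that the API credentials from plugins.cfg work on their own. The Python sketch below queries the same status URL with HTTP basic auth and the Icinga CA certificate. It is not the plugin's implementation; the helper names are hypothetical, and the "CIB" component name and "uptime" field in the usage note are assumptions about the response layout that you should check against your own API output.

```python
import base64
import json
import ssl
import urllib.request


def build_status_request(url, user, password):
    """Build a GET request with HTTP basic auth that asks for JSON."""
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={
        "Authorization": "Basic " + token,
        "Accept": "application/json",
    })


def fetch_status(url, user, password, ca_cert):
    """Query the status URL, verifying the server against the Icinga CA cert."""
    context = ssl.create_default_context(cafile=ca_cert)
    request = build_status_request(url, user, password)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def find_component(payload, name):
    """Return the status dict of the named component, or None if absent.

    Assumes the response has a top-level "results" list whose entries
    carry "name" and "status" keys.
    """
    for entry in payload.get("results", []):
        if entry.get("name") == name:
            return entry.get("status", {})
    return None
```

With the values from plugins.cfg, `fetch_status("https://your_fqdn:5665/v1/status", "root", "your_password", "/etc/icinga2/pki/ca.crt")` should return the parsed status document, and `find_component(payload, "CIB")` would then pick out one component from it. A certificate error at this point usually means the hostname in the URL does not match the certificate CN.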
Summary
If you run your own monitoring servers then make sure that “monitoring your monitoring” is high on your DevOps checklist.
Grab your free Server Density account, play around with the alerts and dashboards and let us know what you think.
Do you monitor any other performance metrics on your monitoring service? Make sure you add those in the comments.
The post How to Monitor Nagios itself for free appeared first on Server Density Blog.