Distributed Monitoring in Nagios with check_mk multisite

For some time now, I’ve been exploring the best ways of configuring a distributed Nagios setup. With the “federated” configuration that Nagios recommends, you pass data from remote Nagios instances back to a central Nagios server using passive checks combined with NSCA or NRDP. Whilst this works well, duplicating the configuration on each server soon becomes tedious and unmanageable. There are alternatives such as mod_gearman, but in my opinion these lack the intelligence to be effective.

Nagios doesn’t currently support centralised configuration in a distributed setup, so I have shifted my focus towards centralised reporting, where data from several independent Nagios instances is aggregated in one central location. This provides the benefits of multiple Nagios instances at remote sites without the overhead and complexity of duplicating the configuration. There are quite a few tools offering centralised reporting, such as Nagios Fusion and Thruk, but my favourite by far is check_mk multisite.

check_mk multisite uses the extremely fast MK Livestatus module to access data from Nagios backends. Best of all, it can be bolted on to existing Nagios implementations easily. Setting it up is as simple as:

  • Download the latest stable version of check_mk. If you are willing to brave the beta versions, there are some very cool new features such as customisable dashboards.
  • Install check_mk multisite on all nodes (standard install, no configuration required).
  • Tweak the Apache/Nagios security settings on the master and remote sites as per your requirements. The usernames need to be consistent across all Nagios instances, although the passwords are not important since only the central site performs authentication.
  • Make Livestatus reachable via TCP on each remote site through xinetd, listing your central server's IP in only_from:

RemoteSite2:~ # vi /etc/xinetd.d/livestatus
service livestatus
{
    type            = UNLISTED
    port            = 6557
    socket_type     = stream
    protocol        = tcp
    wait            = no
    # limit to 100 connections per second. Disable 3 secs if above.
    cps             = 100 3
    # set the number of maximum allowed parallel instances of unixcat.
    # Please make sure that this value is at least as high as
    # the number of threads defined with num_client_threads in
    # etc/mk-livestatus/nagios.cfg
    instances       = 500
    # limit the maximum number of simultaneous connections from
    # one source IP address
    per_source      = 250
    # Disable TCP delay, makes connection more responsive
    flags           = NODELAY
    user            = nagios
    server          = /usr/bin/unixcat
    server_args     = /usr/local/nagios/var/rw/live
    # configure the IP address(es) of your Nagios server here:
    only_from       = 10.0.0.3/32 127.0.0.1
    disable         = no
}
RemoteSite2:~ # /etc/init.d/xinetd reload
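
Once xinetd has been reloaded, it can be worth confirming from the central server that Livestatus really answers on port 6557 before going any further. The following is a minimal Python sketch and not part of check_mk: the helper names build_query and query_livestatus are my own, and it assumes the remote site accepts plain Livestatus queries over TCP as configured above and that the status table exposes a program_version column.

```python
import json
import socket


def build_query(table, columns):
    """Build a Livestatus query that asks for JSON output."""
    lines = [
        "GET %s" % table,
        "Columns: %s" % " ".join(columns),
        "OutputFormat: json",
    ]
    # A Livestatus query is terminated by an empty line.
    return "\n".join(lines) + "\n\n"


def query_livestatus(host, port, query, timeout=5.0):
    """Send one query over TCP and return the decoded JSON rows."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(query.encode("utf-8"))
        # Half-close the connection so the remote end sees the request as complete.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # An empty reply (for example when only_from rejects the caller)
    # makes json.loads raise a ValueError.
    return json.loads(b"".join(chunks).decode("utf-8"))
```

Run from the central server, query_livestatus("10.2.2.5", 6557, build_query("status", ["program_version"])) should return a single row if the connection is allowed. A timeout, a refused connection or an empty reply usually points at the only_from line or a firewall in between.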

  • Configure your master and remote sites in check_mk

Edit /etc/check_mk/multisite.mk on the master:
sites = {
    "CentralSite": {
        "alias":          "CentralSite",
    },
    "RemoteSite1": {
        "alias":          "RemoteSite1",
        "socket":         "tcp:10.1.1.5:6557",
        "url_prefix":     "http://10.1.1.5/",
    },
    "RemoteSite2": {
        "alias":          "RemoteSite2",
        "socket":         "tcp:10.2.2.5:6557",
        "url_prefix":     "http://10.2.2.5/",
    },
}
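
Since multisite.mk uses Python syntax, the same sites dictionary can be reused in a small script to list which endpoints the master will try to reach. This is only an illustrative sketch: parse_tcp_socket and remote_endpoints are hypothetical helper names of mine, the code assumes IPv4 addresses or hostnames in the "tcp:HOST:PORT" form shown above, and it treats a site without a "socket" entry as the local site.

```python
def parse_tcp_socket(spec):
    """Split a 'tcp:HOST:PORT' string into a (host, port) tuple."""
    scheme, host, port = spec.split(":")
    if scheme != "tcp":
        raise ValueError("not a TCP socket spec: %r" % spec)
    return host, int(port)


def remote_endpoints(sites):
    """Map site id -> (host, port) for every site that has a TCP socket.

    Sites without a "socket" key are assumed to be local and are skipped.
    """
    endpoints = {}
    for site_id, site in sites.items():
        spec = site.get("socket")
        if spec and spec.startswith("tcp:"):
            endpoints[site_id] = parse_tcp_socket(spec)
    return endpoints


# Same structure as the multisite.mk example in this article.
sites = {
    "CentralSite": {
        "alias": "CentralSite",
    },
    "RemoteSite1": {
        "alias": "RemoteSite1",
        "socket": "tcp:10.1.1.5:6557",
        "url_prefix": "http://10.1.1.5/",
    },
    "RemoteSite2": {
        "alias": "RemoteSite2",
        "socket": "tcp:10.2.2.5:6557",
        "url_prefix": "http://10.2.2.5/",
    },
}

for site_id, (host, port) in sorted(remote_endpoints(sites).items()):
    print("%s -> %s:%d" % (site_id, host, port))
```

Each address listed this way needs a matching only_from entry for the master in the xinetd configuration on that remote site.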

There is an easy-to-follow guide here: http://mathias-kettner.de/checkmk_multisite_setup.html
