So after an awesome set of presentations at the Nagios World Conference 2012, one of the hot topics for discussion was clearly the upcoming Nagios Core 4 release. Andreas Ericsson has been hard at work overhauling the Core engine to optimize performance and reduce disk and CPU usage for Nagios, and initial tests show his work has paid off in a substantial way. For this experiment, we’re going to use a system with the following specs:
- Virtual Machine running under VMware Workstation 8
- 2GB of RAM
- 1 CPU, 4 Cores
- 80GB Hard Disk
- Nagios XI Installed
- Nagios binaries replaced with the Nagios 4 monitoring engine
- ndoutils binaries replaced with the latest SVN code for ndoutils: nagios/ndoutils/branches/ndoutils-2-0
- No initial performance tweaks other than Nagios 4 and ndoutils 2
I’ll post setup instructions below for users who also want to play around with this setup. Note: This setup is not intended for production installs. Use it in test environments only!
Start with Nagios XI installed, either through the pre-installed VM or with a manual installation. I chose a manual installation for this demo so I could set up the hardware to my liking and give it sufficient hard drive space to test a LOT of hosts. My first attempt at the prototype only had 10GB on the box, and it filled up quite quickly because of performance data. I ran the following commands after the initial Nagios XI installation and setup was completed.
From the command-line:
yum install -y subversion
svn co https://nagios.svn.sourceforge.net/svnroot/nagios/ndoutils/branches/ndoutils-2-0
svn export ndoutils-2-0/ ndoutils
svn co https://nagios.svn.sourceforge.net/svnroot/nagios/nagioscore/trunk/ coretrunk
svn export coretrunk/ nagioscore
service nagios stop
service ndo2db stop
./configure; make; make install
./upgradedb -u root -p nagiosxi -h localhost -d nagios
service ndo2db start
service nagios start
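Once both services are back up, the Nagios log is the place to look for confirmation. Here is a minimal sketch of pulling the recent warning lines out of it. The log path is the default one named in this post, while the function name `show_nagios_warnings` and the 20-line limit are just illustrative choices.

```shell
#!/bin/sh
# Default log path as used in this post; set NAGIOS_LOG or pass a file to override.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"

# Print up to the last 20 lines mentioning "warning" (case-insensitive).
# show_nagios_warnings is a hypothetical helper, not part of Nagios.
show_nagios_warnings() {
    log_file="${1:-$NAGIOS_LOG}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    grep -i 'warning' "$log_file" | tail -n 20
}
```

Calling `show_nagios_warnings` with no arguments reads the default log. Pass another path if your install logs somewhere else.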
You can verify the upgrade succeeded by reviewing the /usr/local/nagios/var/nagios.log file. There should be some new warnings about obsolete definitions like “failure_prediction_enabled”, which we won’t worry about for now. Next, I’d like to see what kind of performance impact I can expect from a large number of checks running on this machine, so I need to create a large number of checks quickly. I’ll do this by running a tools script that we include with every installation of Nagios XI.
./create_checks.php --hosts=1000 --prefix=_MASS1_ > /usr/local/nagios/etc/static/_MASS1.cfg
I chose to use static configs instead of the CCM for this benchmark to keep setup time short and make removal easy later on. The script also creates the checks so that 25% of the services show up as critical, which is useful for testing a system stressed with alerts and notifications. However, I’m also going to turn off notifications and event handlers during this setup phase just to make sure I don’t bottleneck somewhere and tank the entire box. Now let’s restart Nagios to start using the new configs.
service nagios restart
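The notification and event handler switches mentioned above can also be flipped from the shell once Nagios is running again. The sketch below uses the standard Nagios external commands (DISABLE_NOTIFICATIONS, DISABLE_EVENT_HANDLERS, and their ENABLE counterparts). The command file path is an assumption based on a default source install, so check `command_file` in your nagios.cfg, and the helper names are made up for this example.

```shell
#!/bin/sh
# Assumed default external command file; adjust to match command_file in nagios.cfg.
NAGIOS_CMD_FILE="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Write one external command in the "[timestamp] COMMAND" format Nagios expects.
send_nagios_command() {
    command_name="$1"
    target="${2:-$NAGIOS_CMD_FILE}"
    printf '[%s] %s\n' "$(date +%s)" "$command_name" >> "$target"
}

# Globally switch off notifications and event handlers (hypothetical helper).
quiet_nagios() {
    send_nagios_command DISABLE_NOTIFICATIONS "$1"
    send_nagios_command DISABLE_EVENT_HANDLERS "$1"
}

# Switch both back on once the checks have settled (hypothetical helper).
resume_nagios() {
    send_nagios_command ENABLE_NOTIFICATIONS "$1"
    send_nagios_command ENABLE_EVENT_HANDLERS "$1"
}
```

Run `quiet_nagios` while the new checks are settling and `resume_nagios` afterwards. Both take an optional path if your command file is not at the assumed location.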
After adding 1000 hosts and 4000 services, all at a 5-minute interval, the CPU load is running at a nominal level, averaging anywhere from 0.30 to 0.70, which is pretty impressive for a 4-core system! There is still some disk I/O because performance data processing is happening for each service, and this will likely be one of the noticeable bottlenecks as we add more checks to this system. After the system levels out and all of the checks have settled into a hard state, I turn on notifications and event handlers and begin watching the system and testing for bottlenecks. I’ll post back with some results soon! If there are any XI users out there who want to give this a shot in their test environments and post back with their results, we’d love to hear what you find!
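If you want to record load numbers for your own comparison, here is a rough sketch that samples the 1-minute load average at a fixed interval. It assumes a Linux box, since it reads /proc/loadavg, and the function name and defaults are illustrative rather than anything shipped with Nagios XI.

```shell
#!/bin/sh
# Print "HH:MM:SS load" lines: the 1-minute load average, sampled repeatedly.
# Arguments: number of samples, seconds between samples, file to read.
# sample_load is a hypothetical helper; /proc/loadavg is Linux-specific.
sample_load() {
    samples="${1:-5}"
    interval="${2:-60}"
    source_file="${3:-/proc/loadavg}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        # The first field of /proc/loadavg is the 1-minute load average.
        read -r one_min _rest < "$source_file"
        printf '%s %s\n' "$(date +%H:%M:%S)" "$one_min"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_load 12 300` would take one reading per check interval for an hour. Redirect the output to a file if you want to compare runs before and after adding more hosts.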