Archive for the 'Nagios Core' Category

Nagios Core 4.0.0 Now Available

We are pleased to announce that Nagios Core 4.0.0 is now available for download. Core 4 brings numerous performance enhancements and new features to a platform that is already the world leader in monitoring. Key changes included in this release are outlined below:

  1. Performance Improvements:
    –  Core Workers (see below for more information.)
    –  Configuration Verification: In Core 4, each configuration item is only verified once with an O(n) operation, whereas in Core 3 it was an O(n^2) operation.
    –  Event Queue: In Core 4, inserting events into the event queue is now an O(log n) operation, whereas in Core 3 it was an O(n) operation.
    –  Macros: In Core 4, macros are now sorted at start-up so that a binary search can be used rather than the linear search that was used in Core 3. In addition, frequently accessed macros such as $USERx$, $ARGx$, and $HOSTADDRESS$ are given special case, early lookups.
  2. Query Handler: The query handler provides a simple interface for external entities communicating with Nagios Core. Core workers use the query handler interface.
  3. Core Workers: The process of performing checks is now handled by a lightweight core worker process.
    –  There are standard worker processes that are created when Core starts that stay running as long as Core is running. This eliminates at least one fork of Nagios Core when a check is performed and in many cases two forks, thus speeding up the checks.
    –  Core workers communicate with the main Nagios Core process with an in-memory IPC mechanism (currently Unix-domain sockets), eliminating the I/O bottleneck that Core 3 encountered while writing and reading check results to/from disk.
    –  The core worker architecture is extensible to special purpose workers that could potentially perform checks even faster and/or distribute the check load.
  4. libnagios: libnagios is a library of functions useful to developers of query handlers and core workers.
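
To make the event queue and macro lookup changes above concrete, here is a minimal sketch in Python (Nagios Core itself is written in C). It assumes a binary heap as one structure that gives O(log n) inserts, and a sorted table searched with bisection for macros. The names `EventQueue` and `lookup_macro` and the sample macro values are illustrative assumptions, not Core's actual internals.

```python
import bisect
import heapq
import itertools


class EventQueue:
    """Min-heap of (run_time, sequence, name); push and pop are O(log n)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for events with equal run_time

    def schedule(self, run_time, name):
        heapq.heappush(self._heap, (run_time, next(self._seq), name))

    def pop_next(self):
        run_time, _, name = heapq.heappop(self._heap)
        return run_time, name


# Hypothetical macro table, sorted once at start-up so lookups can bisect.
MACROS = sorted([
    ("SERVICESTATE", "CRITICAL"),
    ("HOSTNAME", "web01"),
    ("HOSTADDRESS", "192.0.2.10"),
    ("SERVICEDESC", "HTTP"),
])
MACRO_NAMES = [name for name, _ in MACROS]


def lookup_macro(name):
    """Binary search by name; returns None when the macro is unknown."""
    i = bisect.bisect_left(MACRO_NAMES, name)
    if i < len(MACRO_NAMES) and MACRO_NAMES[i] == name:
        return MACROS[i][1]
    return None


if __name__ == "__main__":
    queue = EventQueue()
    for run_time, name in [(300, "check_http"), (60, "check_ping"), (120, "check_disk")]:
        queue.schedule(run_time, name)
    print(queue.pop_next())             # earliest event comes out first
    print(lookup_macro("HOSTADDRESS"))
```

A plain sorted list would need an O(n) shift on every insert, which is the cost the heap avoids.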

For more detailed information on what’s new in Core 4, visit:
http://nagios.sourceforge.net/docs/nagioscore/4/en/whatsnew.html
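
The core worker design described above (long-lived workers that talk to the main process over Unix-domain sockets) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not Core's wire protocol: the JSON-per-line message format and the helper names `start_worker`, `worker_loop`, and `submit` are hypothetical, and it needs a Unix-like system for `os.fork`.

```python
import json
import os
import socket
import subprocess
import sys


def worker_loop(sock):
    """Long-lived worker: read one JSON job per line, run it, send back the result."""
    reader, writer = sock.makefile("r"), sock.makefile("w")
    for line in reader:  # ends when the parent closes its end of the socket
        job = json.loads(line)
        proc = subprocess.run(job["argv"], capture_output=True, text=True)
        reply = {"id": job["id"], "code": proc.returncode,
                 "output": proc.stdout.strip()}
        writer.write(json.dumps(reply) + "\n")
        writer.flush()


def start_worker():
    """Fork one worker up front; return its pid and the parent's socket."""
    parent_sock, child_sock = socket.socketpair()  # Unix-domain pair, no disk I/O
    pid = os.fork()
    if pid == 0:  # child process
        parent_sock.close()
        try:
            worker_loop(child_sock)
        finally:
            os._exit(0)
    child_sock.close()
    return pid, parent_sock


def submit(reader, writer, job_id, argv):
    """Send one check to the worker and wait for its result."""
    writer.write(json.dumps({"id": job_id, "argv": argv}) + "\n")
    writer.flush()
    return json.loads(reader.readline())


if __name__ == "__main__":
    pid, sock = start_worker()
    reader, writer = sock.makefile("r"), sock.makefile("w")
    dummy_check = [sys.executable, "-c", "print('OK - dummy check')"]
    for job_id in range(3):
        print(submit(reader, writer, job_id, dummy_check))
    writer.close()
    reader.close()
    sock.close()
    os.waitpid(pid, 0)
```

In this sketch the small worker is the process that launches each check, so the large main process does not have to fork per check, and results travel back over the socket instead of through files on disk.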

NRPE 2.15 Released – Now with IPv6 Support

NRPE 2.15 was released earlier today. The primary update in this version of NRPE is full support for IPv6.

The NRPE daemon now has the ability to listen on IPv4 and/or IPv6 addresses. In addition, the check_nrpe plugin now accepts switches that specify whether an IPv4 or IPv6 connection should be made to the NRPE daemon. The NRPE daemon has always had the ability to perform checks using IPv6, assuming the plugin it runs supports it. Thanks to Leo Baltus for the patch that made this possible.
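
As a rough illustration of what choosing an IPv4 or IPv6 connection means on the client side, the Python sketch below maps a preference to a socket address family and asks the resolver for matching addresses. NRPE itself is written in C; the function names and the `prefer` parameter here are hypothetical and do not mirror check_nrpe's actual switches.

```python
import socket

NRPE_PORT = 5666  # NRPE's default port


def address_family(prefer=None):
    """Map a preference ("ipv4", "ipv6", or None for either) to a socket family."""
    families = {"ipv4": socket.AF_INET, "ipv6": socket.AF_INET6}
    return families.get(prefer, socket.AF_UNSPEC)


def candidate_addresses(host, port=NRPE_PORT, prefer=None):
    """Return (family, sockaddr) pairs a client could try, in resolver order."""
    infos = socket.getaddrinfo(host, port, address_family(prefer), socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _name, sockaddr in infos]


if __name__ == "__main__":
    print(candidate_addresses("127.0.0.1", prefer="ipv4"))
```

With no preference, the resolver may return both IPv4 and IPv6 candidates, and a client would typically try them in order until one connects.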

IPv6 communication has been tested on Linux (RHEL/CentOS) and is known to work there. It is also known to compile on the other Unix platforms that we have access to: Solaris 10, AIX 5.3 and 6.1, and HP-UX 11i v1. Feedback on these and other platforms is welcome. Bugs or enhancements (preferably with patches) can be submitted to http://tracker.nagios.org. Other discussion/questions can be sent to the mailing lists or http://support.nagios.com/forum.

Building a Nagios 4 / Nagios XI Prototype Box

After an awesome set of presentations at the Nagios World Conference 2012, one of the hot topics for discussion was clearly the upcoming Nagios Core 4 release. Andreas Ericsson has been hard at work overhauling the Core engine to optimize performance and reduce disk and CPU usage for Nagios, and initial tests are showing his work has paid off in a substantial way. For this experiment, we’re going to use a system with the following specs:

  • Virtual machine running under VMware Workstation 8
  • 2GB of RAM
  • 1 CPU, 4 Cores
  • 80GB Hard Disk
  • Nagios XI Installed
  • Nagios binaries replaced with the Nagios 4 monitoring engine
  • ndoutils binaries replaced with the latest SVN code for ndoutils: nagios/ndoutils/branches/ndoutils-2-0
  • No initial performance tweaks other than Nagios 4 and ndoutils 2

I’ll post setup instructions below for users who also want to play around with this setup. Note: This setup is not intended for production installs; use it in test environments only!

Start with Nagios XI installed, either through the pre-installed VM or with a manual installation. I chose a manual installation for this demo so I could set up the hardware to my liking and give it sufficient hard drive space to test a LOT of hosts. My first attempt at the prototype only had 10GB on the box, and it filled up quite quickly because of performance data. I ran the following commands after the initial Nagios XI installation and setup was completed.

From the command-line:

You can verify the upgrade succeeded by reviewing the /usr/local/nagios/var/nagios.log file. There should be some new warnings about obsolete definitions like “failure_prediction_enabled”, which we won’t worry about for now. Next, I’d like to see what kind of performance impact I can expect from a large number of checks being run on this machine, so I need to quickly create a large number of checks. I’ll achieve this by running a tools script that we include with every installation of Nagios XI.

I chose to use static configs instead of the CCM for this benchmark because they are quick to set up and easy to remove later on. This also creates a list of checks with 25% of the services showing up as critical, which is useful in testing a system stressed with alerts and notifications. However, I’m also going to turn off notifications and event handlers during this setup phase just to make sure I don’t bottleneck somewhere and tank the entire box. Now let’s restart Nagios to start using the new configs.
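
For readers without the XI tools script at hand, the Python sketch below generates a comparable set of static object definitions, with one service in four forced to CRITICAL. It is a hypothetical stand-in, not the script shipped with Nagios XI. The template names `generic-host` and `generic-service`, the `bench-host-` naming, and a `check_dummy` command that takes the return state as its first argument are all assumptions that would need to match your own configuration.

```python
def render_host(name, template="generic-host"):
    """One static host definition pointing at localhost."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        "    address    127.0.0.1\n"
        "}\n"
    )


def render_service(host, index, state, template="generic-service"):
    """One static service whose dummy check always returns `state`."""
    return (
        "define service {\n"
        f"    use                  {template}\n"
        f"    host_name            {host}\n"
        f"    service_description  bench-svc-{index}\n"
        f"    check_command        check_dummy!{state}\n"
        "}\n"
    )


def build_config(hosts=1000, services_per_host=4):
    """Return config text; every fourth service is CRITICAL (state 2), the rest OK."""
    parts = []
    for h in range(1, hosts + 1):
        name = f"bench-host-{h:04d}"
        parts.append(render_host(name))
        for s in range(services_per_host):
            state = 2 if s % 4 == 0 else 0
            parts.append(render_service(name, s + 1, state))
    return "\n".join(parts)


if __name__ == "__main__":
    config = build_config(hosts=2)
    print(config)
```

Writing the output to a single file in a directory that Nagios already reads keeps removal simple: delete the file and restart.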

After adding 1000 hosts and 4000 services, all at a 5-minute interval, the CPU load is running at a nominal level, averaging anywhere from 0.30 to 0.70, which is pretty impressive for a 4-core system! There is still some disk I/O because performance data processing is happening for each service, and this will likely be one of the noticeable bottlenecks as we add more checks to this system. After the system levels out and all of the checks have settled into a hard state, I turn on notifications and event handlers and begin watching the system and testing for bottlenecks. I’ll post back with some results soon! If there are any XI users out there who want to give this a shot in their test environments and post back with their results, we’d love to hear what you find!
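
To put those numbers in perspective, here is a small back-of-the-envelope calculation in Python. It assumes host checks run on the same 5-minute interval as the service checks and that checks are spread evenly across the interval; both are simplifications.

```python
def checks_per_second(hosts, services, interval_seconds):
    """Average check rate if every object is checked once per interval."""
    return (hosts + services) / interval_seconds


def load_per_core(load_average, cores):
    """Load average divided evenly across the available cores."""
    return load_average / cores


if __name__ == "__main__":
    rate = checks_per_second(hosts=1000, services=4000, interval_seconds=5 * 60)
    print(f"average check rate: {rate:.2f} checks/second")
    for load in (0.30, 0.70):
        print(f"load {load:.2f} on 4 cores = {load_per_core(load, 4):.3f} per core")
```

That works out to roughly 17 checks per second, with the reported load equal to less than a fifth of one core's capacity per core.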

Nagios V-Shell 1.9 Released

Nagios V-Shell 1.9 includes major performance updates and a re-implementation of PHP caching that should decrease V-Shell page load times anywhere from 40-75%. I ran some benchmarking tests on a test system (dual-core desktop with 4GB of RAM) with 1800 hosts and 7200 services. This system runs with an average CPU load of 2.0-6.0 throughout the day, so the hardware is being pushed pretty hard already from the check load. V-Shell 1.8 created page load times anywhere from 18-28 seconds throughout the interface without APC caching enabled. Needless to say, this is problematic for many users with larger environments. The Core CGIs were able to load anywhere from 2-11 seconds, with the service status page taking around 9-11 seconds to load all of the data.

My goal for 1.9 was to minimize any unnecessary processing and optimize any functions that were inefficient or using slower PHP built-in functions. The differences in 1.9 are substantial. Without any caching enabled at all, I was able to decrease the average page load time to 9-14 seconds, which is 40-50% faster by itself.

Once I had the code optimized, I reworked the APC caching functionality. If a user has PHP’s APC caching packages installed and enabled on their web server, V-Shell will cache the objects.cache file until it detects any changes in the file, while the data in the status.dat file will be cached based on a TTL (time to live) config option which now exists in 1.9. Once the data is cached in APC, the page load times throughout the interface averaged between 4-5 seconds for all pages, which is a 75% decrease in load time on average.
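
The two caching policies described above (reuse parsed data until the file changes, and reuse it until a TTL expires) can be sketched as follows. V-Shell is written in PHP and uses APC; this Python version with an in-process dictionary only illustrates the idea, and the class and method names are hypothetical.

```python
import os
import time


class FileCache:
    """Caches parsed file contents under two different invalidation policies."""

    def __init__(self, parse, clock=time.monotonic):
        self._parse = parse      # callable: path -> parsed data
        self._clock = clock      # injectable so the TTL logic can be tested
        self._by_stamp = {}      # path -> ((mtime_ns, size), value)
        self._by_ttl = {}        # path -> (cached_at, value)

    def get_until_changed(self, path):
        """Reparse only when mtime or size changes (the objects.cache policy)."""
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        hit = self._by_stamp.get(path)
        if hit is not None and hit[0] == stamp:
            return hit[1]
        value = self._parse(path)
        self._by_stamp[path] = (stamp, value)
        return value

    def get_with_ttl(self, path, ttl_seconds):
        """Reparse once the cached copy is older than the TTL (the status.dat policy)."""
        now = self._clock()
        hit = self._by_ttl.get(path)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]
        value = self._parse(path)
        self._by_ttl[path] = (now, value)
        return value


if __name__ == "__main__":
    def read_text(path):
        with open(path) as handle:
            return handle.read()

    cache = FileCache(read_text)
    print(len(cache.get_until_changed(__file__)))
    print(len(cache.get_with_ttl(__file__, ttl_seconds=10)))
```

The split reflects how the files behave: objects.cache changes only when Nagios reloads its configuration, while status.dat is rewritten constantly, so a change check would almost never hit and a short TTL is the more useful trade-off.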

My goal for the next version of V-Shell is to add support for mklivestatus and ndoutils for backend data, which will eliminate the need to parse the objects.cache and status.dat files on systems with those backends. This should further improve performance for larger installations.

Download Nagios V-Shell 1.9

CHANGELOG