Archive for the 'Nagios Core' Category

Building a Nagios 4 / Nagios XI Prototype Box

So after an awesome set of presentations at the Nagios World Conference 2012, one of the clear hot topics for discussion was the upcoming Nagios Core 4 release. Andreas Ericsson has been hard at work overhauling the Core engine to optimize performance and reduce disk and CPU usage, and initial tests show that his work has paid off substantially. For this experiment, we’re going to use a system with the following specs:

  • Virtual machine running under VMware Workstation 8
  • 2GB of RAM
  • 1 CPU, 4 cores
  • 80GB hard disk
  • Nagios XI installed
  • Nagios binaries replaced with the Nagios 4 monitoring engine
  • ndoutils binaries replaced with the latest SVN code for ndoutils: nagios/ndoutils/branches/ndoutils-2-0
  • No initial performance tweaks other than Nagios 4 and ndoutils 2

I’ll post setup instructions below for users who also want to play around with this setup. Note: this setup is not intended for production installs; use it in test environments only!

Start with Nagios XI installed, either through the pre-installed VM or with a manual installation. I chose a manual installation for this demo so I could configure the virtual hardware to my liking and give it enough disk space to test a LOT of hosts. My first attempt at the prototype had only 10GB on the box, which filled up quite quickly because of performance data. I ran the following commands after the initial Nagios XI installation and setup were complete.

From the command line:

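cd /tmp
yum install -y subversion
# Check out the ndoutils 2.0 branch and Nagios Core trunk, then export clean copies
svn co https://nagios.svn.sourceforge.net/svnroot/nagios/ndoutils/branches/ndoutils-2-0
svn export ndoutils-2-0/ ndoutils
svn co https://nagios.svn.sourceforge.net/svnroot/nagios/nagioscore/trunk/ coretrunk
svn export coretrunk/ nagioscore
# Stop the engine and the database broker before replacing their binaries
service nagios stop
service ndo2db stop
# Build and install Nagios Core 4 over the existing XI install
cd nagioscore
./configure --with-command-group=nagcmd
make all
make install
# Build and install ndoutils 2 (stop on the first failure), then upgrade the NDO database schema
cd /tmp/ndoutils
./configure && make && make install
cd db
./upgradedb -u root -p nagiosxi -h localhost -d nagios
# Bring both services back up
service ndo2db start
service nagios start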
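
The log check described next can also be scripted. This is a minimal sketch, assuming the log path used in this post and that each startup entry contains the text "starting..." (for example "Nagios 4.x starting... (PID=...)"); the function name check_nagios_log is hypothetical. It prints the most recent startup line, which shows the running version, and counts warning lines logged since that startup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize the latest startup recorded in nagios.log.
# Assumes startup lines contain "starting..."; path defaults to this post's location.
check_nagios_log() {
    local log="${1:-/usr/local/nagios/var/nagios.log}"
    awk '
        # A startup line resets the counter, so only the latest run is counted
        tolower($0) ~ /starting\.\.\./ { last = $0; n = 0; next }
        # Obsolete directives such as failure_prediction_enabled show up as warnings
        tolower($0) ~ /warning/        { n++ }
        END { print last; print "warnings since startup: " n + 0 }
    ' "$log"
}
```

Run it as check_nagios_log, or pass a different log path as the first argument.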

You can verify the upgrade succeeded by reviewing the /usr/local/nagios/var/nagios.log file. You should see some new warnings about obsolete directives such as “failure_prediction_enabled”, which we can ignore for now. Next, I want to see what kind of performance impact to expect from a large number of checks on this machine, so I need to create a lot of checks quickly. I’ll do this with a script from the tools directory that ships with every installation of Nagios XI.

cd /usr/local/nagiosxi/tools
./create_checks.php --hosts=1000 --prefix=_MASS1_ > /usr/local/nagios/etc/static/_MASS1.cfg

I chose static configs instead of the CCM for this benchmark because they are quick to set up and easy to remove later. The script also generates services with 25% of them in a critical state, which is useful for testing a system stressed with alerts and notifications. However, I’m also going to turn off notifications and event handlers during this setup phase to make sure I don’t hit a bottleneck somewhere and tank the entire box. Now let’s restart Nagios to load the new configs.

service nagios restart
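
Besides the web interface, one way to switch notifications and event handlers off globally is to write external commands to the Nagios command file once the restart has finished. DISABLE_NOTIFICATIONS, DISABLE_EVENT_HANDLERS and their ENABLE_ counterparts are standard Nagios external commands; the command-file path below is an assumed default that may differ on your system, and send_nagios_cmd is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Path to the Nagios external command pipe (assumed default; override via env).
NAGIOS_CMD_FILE="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Hypothetical helper: submit one external command.
# Nagios expects lines of the form "[epoch_seconds] COMMAND_NAME;arg1;arg2..."
send_nagios_cmd() {
    printf '[%s] %s\n' "$(date +%s)" "$1" >> "$NAGIOS_CMD_FILE"
}
```

Run send_nagios_cmd DISABLE_NOTIFICATIONS and send_nagios_cmd DISABLE_EVENT_HANDLERS as a user that can write to the pipe (for example, a member of the nagcmd command group configured above), then send the ENABLE_ versions once the checks have settled.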

After adding 1000 hosts and 4000 services, all at a 5-minute interval, the CPU load is running at a nominal level, averaging anywhere from 0.30 to 0.70, which is pretty impressive for a 4-core system! There is still some disk I/O because performance data is processed for each service, and this will likely be one of the noticeable bottlenecks as we add more checks to this system. Once the system levels out and all of the checks have settled into a hard state, I’ll turn on notifications and event handlers and start watching the system for bottlenecks. I’ll post back with some results soon! If there are any XI users out there who want to give this a shot in their test environments and post their results, we’d love to hear what you find!
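
For a sense of scale, 1000 host checks plus 4000 service checks, each on a 5-minute interval, average out to (1000 + 4000) / 300 s, or roughly 16.7 checks per second. The hypothetical helper below does the same arithmetic for other test loads; it ignores retries and scheduling jitter, so treat the result as an average, not a peak.

```bash
#!/usr/bin/env bash
# Hypothetical helper: average scheduled checks per second.
# Args: number of hosts, number of services, check interval in minutes.
checks_per_second() {
    awk -v h="$1" -v s="$2" -v m="$3" 'BEGIN { printf "%.2f\n", (h + s) / (m * 60) }'
}
```

For this box, checks_per_second 1000 4000 5 prints 16.67.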

Nagios V-Shell 1.9 Released

Nagios V-Shell 1.9 includes major performance updates and a re-implementation of PHP caching that should decrease V-Shell page load times by anywhere from 40-75%. I ran some benchmarks on a test system (a dual-core desktop with 4GB of RAM) with 1800 hosts and 7200 services. This system runs with an average CPU load of 2.0-6.0 throughout the day, so the hardware is already being pushed pretty hard by the check load. Without APC caching enabled, V-Shell 1.8 produced page load times of anywhere from 18-28 seconds throughout the interface. Needless to say, this is a problem for many users with larger environments. The Core CGIs loaded in anywhere from 2-11 seconds, with the service status page taking around 9-11 seconds to load all of the data.

My goal for 1.9 was to eliminate unnecessary processing and optimize functions that were inefficient or relied on slower PHP built-in functions. The differences in 1.9 are substantial. Without any caching enabled at all, I was able to reduce the average page load time to 9-14 seconds, which alone is 40-50% faster. Once the code was optimized, I reworked the APC caching functionality. If PHP’s APC caching package is installed and enabled on the web server, V-Shell caches the objects.cache file until it detects a change in that file, while the data from the status.dat file is cached based on a TTL (time to live) config option that is new in 1.9. Once the data is cached in APC, page load times throughout the interface averaged 4-5 seconds for all pages, a 75% decrease in load time on average.
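
The two cache rules above can be illustrated with a short sketch. V-Shell implements them in PHP with APC; this bash version only models the invalidation decisions, the function names are hypothetical, and stat -c %Y assumes GNU coreutils.

```bash
#!/usr/bin/env bash
# Bash illustration only: V-Shell itself does this in PHP with APC.

# Rule 1 (objects.cache): the cached copy stays valid until the file changes.
# Returns success (0) when the cache must be rebuilt.
objects_cache_stale() {
    local src="$1" cached_mtime="$2"
    # Compare the file's current mtime (epoch seconds) with the one recorded at caching time
    [ "$(stat -c %Y "$src")" != "$cached_mtime" ]
}

# Rule 2 (status.dat): the cached copy expires after a fixed TTL in seconds.
# Optional third argument overrides "now" (useful for testing).
status_cache_stale() {
    local cached_at="$1" ttl="$2" now="${3:-$(date +%s)}"
    [ $(( now - cached_at )) -ge "$ttl" ]
}
```

The split makes sense because objects.cache only changes when Nagios restarts or reloads, so a modification-time check is cheap and exact, while status.dat is rewritten every few seconds (per status_update_interval in nagios.cfg); an mtime check would almost never hit, so a TTL trades a little staleness for far fewer reparses.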

My goal for the next version of V-Shell is to add support for mklivestatus and ndoutils for backend data, which will eliminate the need to parse the objects.cache file and status.dat files for systems with those backends.  This should further improve performance for larger installations.

Download Nagios V-Shell 1.9

CHANGELOG

Bash and Python NRDP Clients for Nagios

Two new clients are now available for sending passive check results to a Nagios Remote Data Processor (NRDP) server.

We have just released:
  • send_nrdp.sh: Bash NRDP client
  • send_nrdp.py: Python NRDP client

You no longer need to install PHP or Perl on your client machines to submit passive checks with NRDP. Both implementations can accept results piped from STDIN, and you can change the delimiters to whatever you like.

Results on STDIN should use the following field order. For host checks:

HOSTNAME    STATE    OUTPUT

For service checks:

HOSTNAME    SERVICENAME    STATE    OUTPUT
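
Producing correctly delimited input is the only fiddly part. The sketch below assumes a tab delimiter; check each client's usage text for its actual default and for the options that set the NRDP URL and token. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that print one passive result per line, tab-delimited,
# in the field order the NRDP clients read from STDIN.
nrdp_host_line() {      # HOSTNAME STATE OUTPUT
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}
nrdp_service_line() {   # HOSTNAME SERVICENAME STATE OUTPUT
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}
```

For example, { nrdp_host_line web01 0 "OK - host up"; nrdp_service_line web01 Load 1 "WARNING - load 5.20"; } can be piped into send_nrdp.sh along with its URL and token options.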

Additionally, the bash version can take an XML file of check results formatted like so:

<?xml version='1.0'?>
<checkresults>
<checkresult type="host" checktype="1">
  <hostname>YOUR_HOSTNAME</hostname>
  <state>0</state>
  <output>OK|perfdata=1.00;5;10;0</output>
</checkresult>
<checkresult type="service" checktype="1">
  <hostname>YOUR_HOSTNAME</hostname>
  <servicename>YOUR_SERVICENAME</servicename>
  <state>0</state>
  <output>OK|perfdata=1.00;5;10;0</output>
</checkresult>
</checkresults>
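
If you generate such files from scripts, plugin output containing &, < or > must be escaped or the XML will not parse. This is a minimal sketch that writes a single service result in the same layout as the example above; the function names are hypothetical, and checktype="1" is copied from the sample rather than derived here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape &, < and > for use inside XML text nodes.
xml_escape() {
    # Escape & first so the entities added afterwards are not double-escaped
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Hypothetical helper: print a one-result <checkresults> document for a
# passive service check.
# Args: hostname, service name, state (0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN), output
nrdp_service_xml() {
    printf "<?xml version='1.0'?>\n<checkresults>\n"
    printf '<checkresult type="service" checktype="1">\n'
    printf '  <hostname>%s</hostname>\n' "$(xml_escape "$1")"
    printf '  <servicename>%s</servicename>\n' "$(xml_escape "$2")"
    printf '  <state>%s</state>\n' "$3"
    printf '  <output>%s</output>\n' "$(xml_escape "$4")"
    printf '</checkresult>\n</checkresults>\n'
}
```

Redirect the output to a file and pass that file to the Bash client.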

Nagios SNMP Trap Interface (NSTI) Available

Nagios SNMP Trap Interface (NSTI) has been uploaded to the Nagios project SVN repository. Its goal is to make it easier to see which traps have arrived and to provide a sane way to keep track of SNMP traps.

Nagios Mobile 1.0

Nagios Mobile is a lightweight, PHP-based web interface designed for mobile and touch-screen devices. It is based on the Teeny Nagios project by Hirose Masaaki.

Key Features:
- User-level authorization for hosts, services, and commands, matching Nagios Core
- Filtered lists to quickly identify and respond to unhandled problems
- Acknowledge problems, disable/enable notifications, or schedule downtime for authorized hosts and services
- Works with any Nagios 3.x installation
- Support for APC data caching for faster page loads
- Support for both WebKit and non-WebKit devices

My favorite development projects always end up being on the front end, but I certainly can’t claim much credit for the interface design of this one; that goes to community member Hirose Masaaki, who built it with the jQuery Mobile framework. We loved the front-end design he came up with for the Teeny Nagios project, so we revised the server-side code underneath to allow for host and service filtering by state, more complex permissions, data caching, and improved scalability for larger installations. We also added code that lets Nagios Mobile work from essentially any mobile browser.

Download Nagios Mobile.

Nagios V-Shell 1.8 Release

Over the past few years, there’s been a strong outpouring of requests for an updated interface for Nagios. We released Nagios V-Shell just about a year ago, and we’re happy to see that it currently stands as the most popular item on the Nagios Exchange, with over 100,000 views! I don’t usually post to Labs every time I update V-Shell, but I thought this release was worth mentioning. I’ve spent the last few weeks doing a major overhaul of the permissions to mirror the permissions scheme that people are used to in Nagios Core. Initially, V-Shell had only limited user-level control over permissions, but as of v1.8 I’m pleased to say I’ve finally crossed that major TODO off my list. V-Shell now supports user-level access as well as read-only access, matching the permissions scheme of Nagios Core. Feel free to check out V-Shell 1.8 on the Nagios Exchange.

Monitoring Linux/Unix Machines Using SSH or NRPE

We’ve had a number of customer requests for new Nagios XI wizards that make it easy to monitor Linux/Unix machines either by SSH (using check_by_ssh) or NRPE. This is often useful in environments where Nagios admins have already installed the Nagios plugins and/or NRPE on machines in order to monitor them with Nagios Core.

In response to these requests, we whipped together two new wizards that help with this: the SSH Proxy and NRPE wizards.
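
The difference between the two approaches is easiest to see in the command lines Nagios ends up running. This sketch only prints them; it assumes the plugins live in /usr/local/nagios/libexec on both machines, that the nagios user has key-based SSH access for check_by_ssh, and that the remote nrpe.cfg defines a check_load command. The function name and thresholds are illustrative, not what the wizards generate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the load check command for each transport.
# Args: ssh|nrpe, target host. LIBEXEC overrides the plugin directory.
remote_load_check() {
    local method="$1" host="$2"
    local libexec="${LIBEXEC:-/usr/local/nagios/libexec}"
    case "$method" in
        ssh)  # Nagios logs in over SSH and runs the plugin on the target
              printf '%s/check_by_ssh -H %s -C "%s/check_load -w 5,4,3 -c 10,8,6"\n' \
                  "$libexec" "$host" "$libexec" ;;
        nrpe) # The NRPE daemon on the target runs its locally defined command
              printf '%s/check_nrpe -H %s -c check_load\n' "$libexec" "$host" ;;
        *)    echo "usage: remote_load_check ssh|nrpe HOST" >&2
              return 1 ;;
    esac
}
```

With SSH the thresholds live on the Nagios side; with NRPE they live in the target's nrpe.cfg, which is why admins who already run NRPE tend to prefer that route.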

And lest I forget, we also had a great community member (thanks Joshua!) document and test instructions on monitoring AIX over NRPE. We worked with Joshua to develop the NRPE wizard in a way that would work with his AIX/NRPE setup. BTW: Would you believe using Nagios to monitor AIX could save $300k+ on Tivoli licenses? :-)

Nagios Montage

There are two issues that have always been present with Nagios, one technical and the other social.  The first is that Nagios is a fairly sophisticated and complicated piece of software, which can make it difficult for new users to get up and running with it quickly, as they have to deal with the hardest part about using it – installation and configuration – first, before being able to play with its abilities.  The second is that “Nagios” does not only refer to a single piece of software, but rather an extensive software ecosystem of community contributions around a common framework.  As such, it’s common for new users to not even be aware of all of the things Nagios can do, as they initially only see what the core engine can do, without any of those great extensions.

Cool Nagios Use – Monitoring Radiation Levels With Nagios

Some engineers in Japan who use Nagios managed to hook up their Geiger counter to Nagios, so they could monitor radiation levels outside their office in Tokyo. This is a great example of how flexible Nagios can be, although it is a bit unnerving. Our thoughts go out to everyone in Japan who is suffering from the recent earthquake and tsunami, as well as the ensuing radiation problems. Stay safe!

See the Geiger counter graphs at Denphone.

Cool Nagios Use – Detecting Silence In Audio Streams

Nagios is an extremely flexible monitoring system that is capable of monitoring just about anything you need.  A great example of a “non-standard” use of Nagios is that of monitoring audio levels and silence in audio streams.  James Harrison wrote a nice article on using Nagios to monitor audio silence with SilentJack.  Read his article here.