About a month ago we came across an issue where a user was losing data in a distributed/passive checks setup. On closer investigation we found that all of the passive checks were being executed every 5 minutes from servers that were all synced to the same time server. The result? Hundreds of check results were arriving within a few seconds of each other, putting a heavy load on Nagios, while the server sat virtually idle for the other 4 minutes and 50 seconds.

After some discussion we decided to make use of a tool built into Nagios, nagiostats, and create a wizard that monitors Nagios itself so you can see how checks are coming in and being processed. Several nagiostats checks have been written in the past, but this new wizard lets you quickly create several checks against the nagiostats binary at once, so you can monitor the monitoring environment itself.

We’ve just released version 1.0 of this wizard and we’re curious to know what users think of it. Feel free to give it a try and send us your feedback!
Nagiostats Wizard on Exchange
Wizard Preview
Graphs from the Nagiostats Wizard
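If you want to see the kind of data the wizard's checks are built on, here is a minimal Python sketch that queries the nagiostats binary in its MRTG-compatible mode and returns the values as a dictionary. This is not the wizard's own code. The binary path is an assumption (a typical source-install location), the helper names `parse_mrtg_output` and `query_nagiostats` are made up for illustration, and the sketch assumes nagiostats prints one value per requested variable, in the order requested.

```python
import subprocess

# Assumed install location; adjust to wherever nagiostats lives on your system.
NAGIOSTATS = "/usr/local/nagios/bin/nagiostats"


def parse_mrtg_output(variables, output):
    """Pair each requested variable name with the value printed for it.

    Assumes one whitespace-separated value per variable, in request order.
    Numeric values become floats; anything else is kept as a string.
    """
    values = output.split()
    if len(values) != len(variables):
        raise ValueError(
            f"expected {len(variables)} values, got {len(values)}"
        )
    parsed = {}
    for name, raw in zip(variables, values):
        try:
            parsed[name] = float(raw)
        except ValueError:
            parsed[name] = raw
    return parsed


def query_nagiostats(variables, binary=NAGIOSTATS):
    """Run nagiostats in MRTG mode for the given variables (hypothetical helper)."""
    cmd = [binary, "--mrtg", "--data=" + ",".join(variables)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_mrtg_output(variables, result.stdout)


if __name__ == "__main__":
    # Passive service checks received in the last 1 and 5 minutes.
    # Nearly equal values would suggest the checks arrive in one burst.
    wanted = ["NUMPSVSVCCHECKS1M", "NUMPSVSVCCHECKS5M"]
    print(query_nagiostats(wanted))
```

Comparing the 1-minute and 5-minute passive check counts over time is one rough way to spot the bunching described above: if the checks were spread evenly, the 1-minute count would usually be about a fifth of the 5-minute count.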