[Nagios-devel] Future of Nagios
mgagne at iweb.com
Thu May 7 19:21:31 UTC 2009
On 5/6/09 4:47 PM, Andreas Ericsson wrote:
> Well, restarting or just reloading the configuration doesn't really make
> a difference to what kind of monitoring is happening during the reload.
> Even if Nagios were to reload the configuration without requiring a
> restart, no network monitoring would happen during the reloading.
>> If we reload Nagios too often, it would simply pass the majority of its
>> time exporting configuration/status to NDOutils and scheduling checks
>> without doing any real work at all. Too seldom and new monitoring would
>> take too much time before being scheduled.
>> Any future plan regarding this aspect?
> Well, I've experimented a little bit. It seems to be several orders of
> magnitude faster to do the configuration parsing in two passes. One to
> find out how many objects there are of each type and sort them into a
> two-dimensional table of and then doing a binary search on that table,
> as opposed to creating fixed-sized hash tables and pre-insert objects
> into it. This is especially true for huge configurations, and appears
> to be caused by far more beneficial memory access patterns and the
> ability to only parse most objects a single time since we know that
> all hosts have been parsed by the time services are parsed, fe.
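For readers who want the quoted idea in concrete terms, here is a rough sketch of two-pass loading followed by binary-search lookup. It is not Andreas's code and not how Nagios itself is written: the object tuples, `build_tables` and `find` are hypothetical names chosen for illustration. It is in Python for brevity, so it shows only the shape of the algorithm, not the memory-access gains a C implementation would see.

```python
import bisect

# Hypothetical, already-tokenized config objects: (object_type, name, attrs).
RAW_OBJECTS = [
    ("service", "HTTP", {"host_name": "web01"}),
    ("host", "web01", {"address": "10.0.0.1"}),
    ("host", "db01", {"address": "10.0.0.2"}),
    ("service", "MySQL", {"host_name": "db01"}),
]

def build_tables(raw_objects):
    # Pass 1: count objects of each type so every row can be sized exactly.
    counts = {}
    for obj_type, _name, _attrs in raw_objects:
        counts[obj_type] = counts.get(obj_type, 0) + 1

    # Pass 2: fill the pre-sized rows (one row per object type).
    rows = {t: [None] * n for t, n in counts.items()}
    next_slot = dict.fromkeys(counts, 0)
    for obj_type, name, attrs in raw_objects:
        rows[obj_type][next_slot[obj_type]] = (name, attrs)
        next_slot[obj_type] += 1

    # Sort each row by name once; keep a parallel list of names for bisect.
    tables = {}
    for obj_type, row in rows.items():
        row.sort(key=lambda item: item[0])
        tables[obj_type] = ([name for name, _ in row], row)
    return tables

def find(tables, obj_type, name):
    # Binary search in the sorted row for this object type.
    names, row = tables.get(obj_type, ([], []))
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return row[i][1]
    return None

tables = build_tables(RAW_OBJECTS)

# All hosts are in place before services are resolved, so each service
# can look up its host in a single step.
resolved = {
    svc_name: find(tables, "host", attrs["host_name"])
    for svc_name, attrs in tables["service"][1]
}
```
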
The main goal for us was to retrieve status information as fast as
possible in a centralized way. (because we have multiple Nagios servers)
NDOutils was the solution we chose to meet our needs, for the
following reasons:
1) There's no way that I know of to retrieve status information directly
from the daemon. It has to be exported to a file (status.dat)
2) Parsing status.dat takes too much time (I tried with Perl and PHP)
3) Writing a CGI script to export the status in XML using Nagios
functions isn't faster since it still relies on status.dat
4) Mounting a tmpfs folder and moving status.dat into it doesn't help
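To make point 2 concrete, here is a minimal sketch of what parsing status.dat involves (in Python rather than the Perl and PHP I tried). It assumes the usual block layout of a type name followed by `{`, then `key=value` lines, then a closing `}`. The sample values and the `parse_status` name are made up for illustration. Even a parser this simple has to re-read and re-split the whole file on every poll, which is where the time goes on a large installation.

```python
from collections import defaultdict

# Sample text in the status.dat block layout (values are invented).
SAMPLE = """\
info {
    created=1241724091
    }

hoststatus {
    host_name=web01
    current_state=0
    plugin_output=PING OK - rta=0.5ms
    }

servicestatus {
    host_name=web01
    service_description=HTTP
    current_state=2
    }
"""

def parse_status(text):
    """Return {block_type: [dict, ...]} for each 'type { key=value ... }' block."""
    blocks = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            # Start of a new block, e.g. "hoststatus {".
            current = {}
            blocks[line[:-1].strip()].append(current)
        elif line == "}":
            current = None
        elif current is not None and "=" in line:
            # Split on the first '=' only; values may contain '=' themselves.
            key, value = line.split("=", 1)
            current[key] = value
    return dict(blocks)

status = parse_status(SAMPLE)
```

All values come back as strings, so `status["servicestatus"][0]["current_state"]` is `"2"`, not the integer 2.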
Unfortunately, the main "problem" with NDOutils is that it re-exports the
configuration and status at every reload. Clearing the "old" information
and exporting the *exact* same information is very time consuming and no
I found a patch which could improve/fix this behavior:
=> Do not resend retained status to NDO
The only problem is that deleted hosts/services would never be removed
from MySQL if we apply the patch.
To conclude, the real problem isn't with the Nagios restart process
itself but with:
- NDOutils' inefficiency at managing retention data
- The fact that we can't access status information in a fast and efficient way.
So I was hoping for some improvements in this area, maybe by using
IPC/shared memory or a similar solution to access the status
information directly from the daemon's memory.