[Nagios-users] Master and slave servers for Nagios

Wheeler, JF (Jonathan) J.F.Wheeler at rl.ac.uk
Wed Apr 25 13:34:12 UTC 2007


> -----Original Message-----
> From: Jason Qualkenbush [mailto:jqualkenbush at iso-ne.com] 
> Sent: 25 April 2007 11:55
> 
> Wheeler, JF (Jonathan) wrote:
> > As I have reported in the past, I have 2 slave servers and a master
> > server; all checks should be run from the slave servers and passed back
> > to the master server.  I have recently been trying to understand why
> > the master server still has kernel "Out of memory" problems, such that
> > the kernel starts killing active processes and, in some cases, panics
> > because there are no more processes to kill (this happens perhaps once
> > or twice per week, usually around 4:50 - 5:10 in the morning).  As part
> > of my investigations I have noticed that for a typical host 40% of tests
> > are reported from the slave and 60% are run by the master.  I can tell
> > this because 40% of messages for this typical host in /var/log/nagios on
> > the master server begin "EXTERNAL_COMMAND" and 60% of messages begin
> > "Warning:".  My question is: why should this be?  Here is a copy of
> > nagios.log from the master server for one test of one host for today (so
> > far):
> 
> Sounds like this has more to do with the freshness of the passive
> check.  If the master server thinks the check isn't fresh, it will then
> run an active check to see for itself.  I'd tune the freshness threshold,
> and keep in mind the scheduling of the checks.  If you configure your
> freshness to expire at five minutes, and the slave server schedules that
> check for once every six minutes, you are going to get the behaviour you
> mentioned.
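
Jason's point can be checked with a small simulation. This is a hypothetical, simplified model and not Nagios's actual scheduler. Time advances in one-second steps, and the slave submits a result every `slave_interval` seconds. Every `freshness_check_interval` seconds, the master forces an active check if its newest result is older than `threshold`. The function name, its parameters and the `lost` set (slave submissions that never reach the master) are illustrative assumptions.

```python
def simulate_freshness(slave_interval, threshold, horizon,
                       freshness_check_interval=1, lost=frozenset()):
    """Hypothetical model: count slave results vs. forced active checks."""
    last_result = 0   # time of the newest result the master holds
    passive = 0       # slave results that reached the master
    forced = 0        # active checks the master ran itself
    for now in range(1, horizon + 1):
        # The slave submits every `slave_interval` seconds, unless this
        # submission is in `lost` (it never reaches the master).
        if now % slave_interval == 0 and now not in lost:
            last_result = now
            passive += 1
        # Periodically the master compares the result's age with the
        # threshold; a stale result triggers an active check, whose
        # own result then counts as fresh.
        if now % freshness_check_interval == 0 and now - last_result > threshold:
            last_result = now
            forced += 1
    return passive, forced


# Jason's example: 5-minute threshold, slave checks every 6 minutes.
print(simulate_freshness(360, 300, 3600))
```

Consider Jason's numbers: a 300-second threshold and a 360-second slave interval. In this model, the master forces one active check per slave cycle. Raising the threshold above the slave interval removes the forced checks. A single lost slave result still produces a forced check, even when the threshold is generous.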

Thanks for your reply.  However, the tests are scheduled to run every 30
minutes on both the master and slave servers (confirmed by checking the
retention.dat file).  If you look at the original message you will see
that the master server is correctly running the command via freshness
checking ("Warning" messages) every 30 minutes, but the slave results
("EXTERNAL" messages) arrive at longer intervals, roughly at multiples of
30 minutes.
What could cause results from commands issued by the slave to get lost?
And why are OK results not recorded in the slave server logs?
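
The 40%/60% split and the "multiples of 30 minutes" pattern can both be measured from the master's nagios.log. Below is a minimal sketch that makes several assumptions:

- Each log line looks like "[<unix time>] <message>".
- A slave result contains ";<host>;".
- A freshness warning contains "host '<host>'".

The two message prefixes are parameters to be adjusted to whatever the log actually shows. The function name `summarise` is hypothetical.

```python
import re

# Assumed log layout: "[<unix time>] <message>"
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")


def summarise(lines, host, interval=1800,
              passive_prefix="EXTERNAL COMMAND",
              stale_prefix="Warning:"):
    """Count slave results vs. freshness warnings for one host (sketch)."""
    passive = stale = 0
    passive_times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, msg = int(m.group(1)), m.group(2)
        # Assumed host markers: ";host;" in a passive result,
        # "host 'host'" in a staleness warning.
        if msg.startswith(passive_prefix) and f";{host};" in msg:
            passive += 1
            passive_times.append(ts)
        elif msg.startswith(stale_prefix) and f"host '{host}'" in msg:
            stale += 1
    # Gap between consecutive slave results, rounded to whole check intervals.
    gaps = [round((b - a) / interval)
            for a, b in zip(passive_times, passive_times[1:])]
    return {"passive": passive, "stale": stale, "gaps": gaps}
```

A gap of 2 or more means that one or more slave results, due once per interval, never reached the master. That is exactly the loss the question above asks about.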

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



