[Nagios-users] Using two nagios servers...

Chris Beattie cbeattie at geninfo.com
Fri Oct 15 19:23:00 UTC 2010

Wow, I completely forgot that I’d responded to this.  This is what I do.  If you use this script, change the notification e-mail address: that is where it sends a message when the failover server decides it needs to take over, and again when it yields because the primary has come back online.



Failover Configuration


On the failover server, install the same OS the same way it's installed on the primary monitoring server, but set Nagios to not start in runlevels 3 and 5.  Otherwise the failover checking script will generate e-mail notifications whenever the failover server is rebooted: Nagios will start before the failover server notices that Nagios is running on the primary, and the message will come when the failover server shuts the failover Nagios down.


On the failover server, generate a public/private key pair.  This is necessary in order to avoid having to type in a password every time the state of the Nagios process on the primary server is checked:


       # ssh-keygen -t rsa


Take the default name and location (/root/.ssh/id_rsa and id_rsa.pub).  Do not enter a passphrase.


Copy id_rsa.pub to the primary server:


       # rsync -avzu id_rsa.pub primaryserverhostname:/root/.ssh/


On the primary server, append id_rsa.pub to authorized_keys2:


       cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys2
       chmod 0600 $HOME/.ssh/authorized_keys2
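
Before relying on the cron job, it may be worth confirming that the key really allows a login without a password.  The following is a minimal sketch, not part of the original setup: check_passwordless_ssh is a hypothetical helper name, and the 10-second timeout is an assumption.  BatchMode=yes makes ssh fail instead of prompting, which is how it will behave under cron.

```shell
# check_passwordless_ssh HOST
# Hypothetical helper: succeeds only if HOST accepts the key without prompting.
check_passwordless_ssh() {
    # BatchMode=yes disables password and passphrase prompts.
    # ConnectTimeout=10 is an assumed value to keep a dead host from hanging.
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}
```

Run on the failover server as root, something like `check_passwordless_ssh primaryserverhostname && echo "key login works"` should print the message once the key is in place.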


Download, compile, and install Nagios on the failover server the same way it's installed on the primary server.


Create a script named nagios_check.sh in /root/:


       #!/bin/bash

       alertaddress='you@yourdomain'
       # Inferred from the /usr/local/nagios paths used further down.
       nagiospath=/usr/local/nagios
       # Assumed value: the original setting is missing from the archived copy.
       maxfaillimit=3

       touch failed_nagios_checks
       failedchecks=$(cat failed_nagios_checks)
       failedchecks=${failedchecks:-0}

       if [[ -z "${1}" ]]
       then
              echo Usage: nagios_check hostname
              exit 1
       fi

       nagiosstatusnow=$(${nagiospath}/libexec/check_by_ssh -H "${1}" --command='/usr/local/nagios/libexec/check_nagios --filename=/usr/local/nagios/var/status.dat --expires=1 --command=nagios')
       # Reconstructed: keep only the text before the first colon, e.g. "NAGIOS OK".
       nagiosstatus=${nagiosstatusnow%%:*}

       nagiosrunninglocally=$(/etc/init.d/nagios status)

       if [[ "${nagiosstatus}" = "NAGIOS OK" ]]
       then
              # Primary is healthy: stop the failover Nagios if needed, then sync.
              echo -ne "[`date`] ${nagiosstatus} on ${1}. "
              if [[ "${nagiosrunninglocally%% *}" = "nagios" ]]
              then
                     echo -e "Nagios is currently running on the failover server, and needs to be stopped."
                     /etc/init.d/nagios stop
                     /usr/bin/printf "%b" "[`date`] Nagios recovery on ${1} detected.  Stopping failover Nagios.\n\n${nagiosstatusnow}" | /bin/mail -s "Nagios recovery on ${1}" "${alertaddress}"
              fi
              echo -e "Failed ${failedchecks} checks: synchronizing files.  Status: ${nagiosstatusnow} "
              echo 0 > failed_nagios_checks
              rsync --quiet --archive --compress --delete-during --exclude='var/spool/checkresults/*' --exclude='var/archives/*' --exclude='*~' --exclude=nagios.lock --exclude=nagios.cmd "${1}:${nagiospath}" /usr/local
       else
              # Primary check failed: count it, and take over once the limit is reached.
              failedchecks=$((${failedchecks} + 1))
              echo ${failedchecks} > failed_nagios_checks
              if [[ "${failedchecks}" -lt "${maxfaillimit}" ]]
              then
                     echo -e "[`date`] Uh-oh! Failed ${failedchecks} out of ${maxfaillimit} checks.  Status: ${nagiosstatusnow} "
              fi
              if [[ "${failedchecks}" -ge "${maxfaillimit}" ]]
              then
                     echo -ne "[`date`] ${nagiosstatus} on ${1}. "
                     if [[ "${nagiosrunninglocally%% *}" = "No" ]]
                     then
                            echo -e " Failed ${failedchecks} checks, and needs to be started on the failover server. "
                            /etc/init.d/nagios start
                            /usr/bin/printf "%b" "[`date`] Nagios on ${1} has failed ${failedchecks} checks.  Starting Nagios on failover server.\n\n${nagiosstatusnow}" | /bin/mail -s "Nagios failure on ${1}" "${alertaddress}"
                     else
                            echo -e "Failed ${failedchecks} checks, but is already running on the failover server. "
                     fi
              fi
       fi
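
The script's tests rely on shell suffix stripping to reduce command output to a short keyword.  The sketch below shows how that works on sample strings.  The three sample outputs are assumptions for illustration, as is the idea that nagiosstatus is taken from the plugin output by cutting at the first colon; actual output depends on the plugin and init script versions.

```shell
# Sample plugin and init-script output (assumed, for illustration only).
plugin_output='NAGIOS OK: 1 process, status log updated 2 seconds ago'
init_running='nagios (pid 1234) is running...'
init_stopped='No lock file found in /usr/local/nagios/var/nagios.lock'

# "%%:*" removes the longest suffix that starts at a colon,
# so everything from the first colon onward is dropped.
nagiosstatus=${plugin_output%%:*}

# "%% *" removes the longest suffix that starts at a space,
# leaving only the first word.
first_running=${init_running%% *}
first_stopped=${init_stopped%% *}

echo "status=${nagiosstatus} running=${first_running} stopped=${first_stopped}"
```

With these samples the first words are "nagios" and "No", which are the two values the script compares against.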







Make it executable by root:


       chmod u+x nagios_check.sh


Run crontab -e as root and add this line:


       * * * * * /root/nagios_check.sh primaryserverhostname >> /var/log/nagios_check.log 2>&1


The *s set it to run every minute.  The output is appended to a log file, and the 2>&1 sends STDERR to the same place as STDOUT.


At the top of every minute, the failover server will now obtain a current replica of the primary server's Nagios state (comments, acknowledgements, downtime, configuration files, etc.).


Add a file in /etc/logrotate.d called nagios_check:



       /var/log/nagios_check.log {
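
The directives inside this stanza are not preserved in this copy of the message.  As a rough sketch only, a file along the following lines would keep the log from growing without bound.  Every directive shown (weekly, rotate 4, compress, missingok, notifempty) is an assumption rather than the original setting, and write_logrotate_conf is a hypothetical helper name.

```shell
# write_logrotate_conf TARGET
# Hypothetical helper: writes an assumed logrotate stanza to TARGET.
write_logrotate_conf() {
    cat > "$1" <<'EOF'
/var/log/nagios_check.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF
}
```

As root, `write_logrotate_conf /etc/logrotate.d/nagios_check` would create the file; `logrotate -d /etc/logrotate.d/nagios_check` then does a dry run without rotating anything.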








From: quanta [mailto:quanta.linux at gmail.com] 
Sent: Wednesday, October 13, 2010 7:17 AM
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Using two nagios servers...


Try something like this:


RETURN_STATUS=$(/usr/local/nagios/libexec/check_nrpe -H <primary_host> -c check_nagios | awk -F: '{ print $1 }' | awk '{ print $2 }')
if [ "$RETURN_STATUS" != "OK" ]; then
    # Primary is not healthy: let this server run checks and send notifications.
    sed -i 's/enable_notifications=0/enable_notifications=1/' /usr/local/nagios/etc/nagios.cfg
    sed -i 's/execute_service_checks=0/execute_service_checks=1/' /usr/local/nagios/etc/nagios.cfg
else
    # Primary is healthy: keep this server passive.
    sed -i 's/enable_notifications=1/enable_notifications=0/' /usr/local/nagios/etc/nagios.cfg
    sed -i 's/execute_service_checks=1/execute_service_checks=0/' /usr/local/nagios/etc/nagios.cfg
fi
sudo /etc/init.d/nagios reload

Note: the nagios user must be allowed to run this command through sudo without a password prompt (for example, with a NOPASSWD entry in sudoers).
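
The toggle can also be written so that it sets both options to a known value regardless of what they were before.  This is a sketch under assumptions, not the method from the message: set_nagios_mode and the mode names "active" and "standby" are hypothetical, and it rewrites the file through a temporary copy instead of using sed -i.

```shell
# set_nagios_mode FILE MODE   (MODE is "active" or "standby"; names are hypothetical)
set_nagios_mode() {
    cfg=$1
    case $2 in
        active)  value=1 ;;
        standby) value=0 ;;
        *) echo "usage: set_nagios_mode FILE active|standby" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Force both options to the chosen value, whatever they were before.
    sed -e "s/^enable_notifications=.*/enable_notifications=${value}/" \
        -e "s/^execute_service_checks=.*/execute_service_checks=${value}/" \
        "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, `set_nagios_mode /usr/local/nagios/etc/nagios.cfg active` followed by the reload would take the place of the first branch.  Writing back with cat keeps the file's existing owner and permissions.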

On 08/16/2010 02:44 PM, ravishankar.gundlapali at wipro.com wrote: 



I also run Nagios on virtual machines.


Please let me know where I can find help with running a cron job on my secondary Nagios server to monitor the Nagios service on the primary Nagios server.



Ravi G


From: Chris Beattie [mailto:cbeattie at geninfo.com] 
Sent: Monday, August 16, 2010 6:51 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Using two nagios servers...


Your servers will probably be fine servicing the extra Nagios polling, unless they are overloaded already.


Since I run Nagios on virtual machines, however, I tried to keep the load on my failover Nagios server minimized.  My failover Nagios server runs a cron job that uses the check_nagios plugin to monitor the state of the primary Nagios server.  If the primary server is up and running, the failover server will just rsync the state and configuration files from the primary.  If the primary server becomes unavailable, the cron job will start the Nagios service on the failover server and keep it running until it detects the primary has recovered.
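
The behaviour described above boils down to a small decision table.  The sketch below is illustrative only: decide_action, its argument order, and the action names are hypothetical, and the failure limit is assumed to work as a count of consecutive failed checks.

```shell
# decide_action PRIMARY_OK LOCAL_RUNNING FAILED MAX
# PRIMARY_OK and LOCAL_RUNNING are "yes" or "no"; FAILED is the number of
# consecutive failed checks so far, MAX the limit before taking over.
decide_action() {
    primary_ok=$1
    local_running=$2
    failed=$3
    max=$4
    if [ "$primary_ok" = yes ]; then
        # Primary is healthy: yield if we had taken over, then sync its files.
        if [ "$local_running" = yes ]; then
            echo stop-and-sync
        else
            echo sync
        fi
    elif [ "$failed" -lt "$max" ]; then
        # Not enough consecutive failures yet: wait for the next run.
        echo wait
    elif [ "$local_running" = yes ]; then
        echo keep-running
    else
        echo start
    fi
}
```

For instance, `decide_action no no 3 3` prints "start", while `decide_action yes yes 0 3` prints "stop-and-sync".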


From: ravishankar.gundlapali at wipro.com [mailto:ravishankar.gundlapali at wipro.com] 
Sent: Monday, August 16, 2010 7:45 AM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Using two nagios servers...


Hi All,


I am planning to monitor all the servers in my client's environment from two Nagios servers (in two different locations) so that one can act as a backup.


Please let me know whether the monitored servers will be overloaded, since two Nagios servers will be polling them.




Ravi G
