[Nagios-users] Nagios kept from restarting after reboot by lock file

Mike Lindsey mike-nagios at 5dninja.net
Tue Dec 21 06:37:48 UTC 2010

On 12/20/10 8:16 AM, eric.berg at barclayscapital.com wrote:
> Alternatively, could you recommend a good system/resource monitoring tool that would be able to let me know if nagios is down and restart it automatically?
Add a cron job that runs every five minutes (or whatever interval you're 
comfortable with), similar to:

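LOCKFILE=/home/nagios/nagios/var/nagios.lock
PID=`cat "${LOCKFILE}" 2>/dev/null`

# kill -0 sends no signal; its exit status only says whether the PID exists
# and can be signalled. PID 0 is checked separately because kill -0 0 tests
# the caller's own process group and always succeeds.
if [ "${PID}" = "0" ] || ! kill -0 "${PID}" 2>/dev/null
then
     rm -f "${LOCKFILE}"
     # Start Nagios here with your usual init script or command.
     echo "Killed Lockfile and restarted Nagios" | mail -s "Nagios restart `hostname`" your-email@here.com
fi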

Just be aware that the if block will also trigger if nagios is running 
under a different username than the cron job, since kill -0 fails when 
you lack permission to signal the process.  You can check for that by 
doing some tests in the script with ps and grep.
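
For the ps and grep test, something along these lines might do.  It is 
only a sketch: the function names are made up for illustration, and it 
assumes your ps accepts the POSIX -e and -o options and reports the 
command name as plain "nagios" (some systems print the full path instead).

```shell
#!/bin/sh
# Hypothetical helper: reads "user command" pairs, one per line, on stdin
# and prints each distinct user that owns a process named exactly "nagios".
nagios_owners() {
    grep -E '^ *[^ ]+ +nagios$' | awk '{ print $1 }' | sort -u
}

# Hypothetical wrapper: succeeds if any user on the system is running nagios.
# Assumption: ps supports -e (all processes) and -o user=,comm= (no headers).
nagios_running_anywhere() {
    [ -n "$(ps -e -o user= -o comm= | nagios_owners)" ]
}
```

Call nagios_running_anywhere before removing the lock file, and skip the 
cleanup when it succeeds.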

> _____________________________________________
> From:   Berg, Eric: IT (NYK)
> Sent:   Monday, December 20, 2010 11:03 AM
> To:     'nagios-users at lists.sourceforge.net'
> Subject:        Nagios kept from restarting after reboot by lock file
> Gee, this seems like an annoying newbie problem, but if Nagios crashes or is killed (as on system reboot), it leaves a lock file around that prevents it from starting again until the lock file is manually removed.
> I see this on Monday mornings after weekend reboots on a Red Hat Linux box:
> nagios: Lockfile '/home/nagios/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 0).  Bailing out...
Sounds like something in the shutdown process is throwing a 0 into the 
pid file, or the startup in the rc script is.

Either way, you should never have a 0 in there.  Either the rc script is 
putting the wrong data in there, or it's reporting incorrectly.
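
To see what actually ended up in the lock file, a small check like this 
could run from the rc script or the cron job.  It is a sketch only; 
lock_pid_is_valid is a hypothetical helper name, not something Nagios ships.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if its argument looks like a real PID,
# i.e. a string of digits with a value greater than zero.
lock_pid_is_valid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 0 ]                 # rejects 0, which is never a real PID
}
```

For example, lock_pid_is_valid "`cat /home/nagios/nagios/var/nagios.lock`" 
fails for the PID 0 case in the error message quoted above.
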
> Does anyone know if there's a config option or something else that obviates the need to write a wrapper script to check to see if Nagios is really running and remove the lock file (looks like Nagios already knows it's not running by virtue of the value of the PID in this very message!) so that it can cleanly start up again?

Mike Lindsey
