[Nagios-users] Nagios kept from restarting after reboot by lock file

On 12/20/10 8:16 AM, eric.berg at barclayscapital.com wrote:
> Alternatively, could you recommend a good system/resource monitoring tool that would be able to let me know if nagios is down and restart it automatically?
Add a cron job that runs every five minutes (or whatever interval you're 
comfortable with), similar to:

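LOCKFILE=/home/nagios/nagios/var/nagios.lock
PID=`cat "${LOCKFILE}" 2>/dev/null`

# kill -0 sends no signal; its exit status says whether the PID can be signalled.
# An empty PID or a PID of 0 is treated as stale too.
if [ -z "${PID}" ] || [ "${PID}" -le 0 ] 2>/dev/null || ! kill -0 "${PID}" 2>/dev/null
then
     rm -f "${LOCKFILE}"
     # Restart Nagios here using your init script (command depends on your install).
     echo "Killed Lockfile and restarted Nagios" | mail -s "Nagios restart `hostname`" your-email@here.com
fi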

Just be aware that the if block will also trigger when Nagios is running 
under a different username than the cron job, because kill -0 fails if 
you lack permission to signal the process.  You can guard against that 
with some ps and grep tests in the script.
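
As a rough sketch of the ps and grep tests mentioned above, something 
like the following could replace the kill -0 check. The function names 
pid_is_running and pid_is_nagios are made up for illustration, and it 
assumes a ps that supports the POSIX -p and -o options (procps on Red 
Hat does):

```shell
# Hypothetical helper: succeed if a process with the given PID exists,
# regardless of which user owns it.
pid_is_running() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: not a valid PID
    esac
    [ "$1" -gt 0 ] || return 1     # 0 is never a real process ID
    # ps -p exits non-zero when no such process exists.
    ps -p "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeed only if that PID exists and its command
# name contains "nagios" (guards against a recycled PID).
pid_is_nagios() {
    pid_is_running "$1" || return 1
    ps -p "$1" -o comm= | grep -q nagios
}
```

In the cron script you would then only remove the lock file when 
pid_is_nagios "${PID}" fails. Unlike kill -0, ps does not need 
permission to signal the process, so it works across usernames.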

> Gee, this seems like an annoying newbie problem, but if Nagios crashes or is killed (as on system reboot), it leaves a lock file around that prevents it from starting again until the lock file is manually removed.
> I see this on Monday mornings after weekend reboots on a Red Hat Linux box:
> nagios: Lockfile '/home/nagios/nagios/var/nagios.lock' looks like its already held by another instance of Nagios (PID 0).  Bailing out...
Sounds like something in the shutdown process is writing a 0 into the 
PID file, or the startup in the rc script is.

Either way, you should never have a 0 in there: either the rc script is 
writing the wrong data, or the PID is being reported incorrectly.
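
To narrow down which it is, you could look at what is actually in the 
file right after shutdown and again after the failed start. A minimal 
sketch of such a check, assuming the lock file holds just the PID on its 
first line (read_lock_pid is a made-up name, not part of Nagios):

```shell
# Hypothetical helper: print the PID stored in a lock file, or fail if the
# file is missing, empty, non-numeric, or holds 0.
read_lock_pid() {
    [ -r "$1" ] || return 1
    # Take the first line and strip stray whitespace.
    pid=`head -n 1 "$1" | tr -d ' \t\r'`
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely digits
    esac
    [ "$pid" -gt 0 ] || return 1   # a PID of 0 is never valid
    echo "$pid"
}
```

Something like read_lock_pid /home/nagios/nagios/var/nagios.lock || echo 
"bad lock file" run from the rc script at the right points should show 
whether the 0 is really on disk.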
> Does anyone know if there's a config option or something else that obviates the need to write a wrapper script to check to see if Nagios is really running and remove the lock file (looks like Nagios already knows it's not running by virtue of the value of the PID in this very message!) so that it can cleanly start up again?

