[Nagios-users] Nagios kept from restarting after reboot by lockfile

eric.berg at barclayscapital.com
Tue Dec 21 00:58:47 UTC 2010


We reboot all of our hosts on a weekly basis.  I used to pride myself on keeping my boxes up as long as possible, but having spent years now supporting mission-critical financial production applications, I'm on board with the weekly reboots.  It lets you know early if some system or app change is problematic.

The reboot is being done via the standard reboot command.

I've looked around for rc scripts that might address this issue, but haven't found any.  Got any pointers?

Regarding the rc.local solution, a) I'd prefer to solve the problem, not just address the symptoms, and b) elsewhere in this thread I've described the roadblocks that we face in doing anything at a system level.  Yep, that's right, boys, we survive in the app developer layer, within which we do not have root on these boxes.  It's a tedious, time-consuming, frustrating, productivity-killing endeavor to get anything done that you can't do yourself.

So... got any sample rc scripts, or command-line params to nagios, to make it smart enough to know that the PID in its PID file isn't an active process?
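
To make the question concrete, here is a rough sketch of the kind of wrapper check I mean.  It is not anything that ships with Nagios: clear_stale_lock is a made-up helper name, and matching on a command name containing "nagios" is my own assumption about how to guard against a recycled PID.  It needs no root as long as it runs as the user that owns the lock file.

```shell
#!/bin/sh
# Sketch only: clear_stale_lock is a hypothetical helper, not part of Nagios.
# It removes the lock file when the PID recorded in it does not belong to a
# running process whose command name contains "nagios".
clear_stale_lock() {
    lockfile=$1

    # Nothing to do if there is no lock file.
    [ -f "$lockfile" ] || return 0

    # Read the first line of the file as the recorded PID.
    pid=$(head -n 1 "$lockfile" 2>/dev/null | tr -d '[:space:]')

    # An empty or non-numeric value is treated the same as PID 0.
    case $pid in
        ''|*[!0-9]*) pid=0 ;;
    esac

    # Assumption: a live owner shows "nagios" in its command name.
    if [ "$pid" -gt 0 ] &&
       ps -p "$pid" -o comm= 2>/dev/null | grep -q nagios; then
        # A running Nagios process holds the lock; leave it alone.
        return 1
    fi

    # The recorded PID is invalid or dead, so the lock is stale.
    rm -f "$lockfile"
}
```

The idea would be to call clear_stale_lock /home/nagios/nagios/var/nagios.lock just before starting the daemon, and to start it only if the function returns 0.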

Thanks.

Eric

> -----Original Message-----
> From: Daniel Wittenberg [mailto:daniel.wittenberg.r0ko at statefarm.com] 
> Sent: Monday, December 20, 2010 11:56 AM
> To: Nagios Users List
> Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
> 
> Couple questions
> 1)  Why do you have to reboot your monitoring server weekly?
> 2) How is the reboot being done?
> 
> The reason I ask 2) is that the standard rc script will remove the
> lockfile when nagios is told to stop.  So if you are having this
> problem, it sounds like you are not doing a clean shutdown and
> something could be wrong.
> 
> Either way, I guess in the worst case one way to work around this
> would be to put something like this in your /etc/rc.d/rc.local:
> rm -f /var/lock/subsys/nagios
> 
> Assuming that's where your lockfile is. 
> 
> Dan
> 
> 
> -----Original Message-----
> From: eric.berg at barclayscapital.com
> [mailto:eric.berg at barclayscapital.com] 
> Sent: Monday, December 20, 2010 10:16 AM
> To: eric.berg at barclayscapital.com; nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by lockfile
> 
> Alternatively, could you recommend a good system/resource monitoring
> tool that would be able to let me know if nagios is down and restart
> it automatically?
> 
> _____________________________________________
> From:   Berg, Eric: IT (NYK)
> Sent:   Monday, December 20, 2010 11:03 AM
> To:     'nagios-users at lists.sourceforge.net'
> Subject:        Nagios kept from restarting after reboot by lock file
> 
> Gee, this seems like an annoying newbie problem, but if Nagios crashes
> or is killed (as on system reboot), it leaves a lock file around that
> prevents it from starting again until the lock file is manually
> removed.
> 
> I see this on Monday mornings after weekend reboots on a Red Hat Linux
> box:
> 
> nagios: Lockfile '/home/nagios/nagios/var/nagios.lock' looks like its
> already held by another instance of Nagios (PID 0).  Bailing out...
> 
> Does anyone know if there's a config option or something else that
> obviates the need to write a wrapper script to check whether Nagios
> is really running and remove the lock file, so that it can cleanly
> start up again?  (It looks like Nagios already knows it's not running,
> by virtue of the value of the PID in this very message!)
> 
> Thanks.
> 
> Eric
> 
> ------------------------------------------------------------------------------
> Lotusphere 2011
> Register now for Lotusphere 2011 and learn how
> to connect the dots, take your collaborative environment
> to the next level, and enter the era of Social Business.
> http://p.sf.net/sfu/lotusphere-d2d
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> 


