[Nagios-devel] Nagios sometimes shows wrong status

Michael Prochaska michael at prochas.net
Wed May 27 08:52:32 UTC 2009


Hi!

I've seen some strange behavior from Nagios with a very simple check script.

The relevant part of the script:
#########################################################################
MAINTCNT="`/usr/sbin/metastat |grep -i maint |wc -l`"
RESYNCNT="`/usr/sbin/metastat |grep -i resync |wc -l`"

NOTOK=0
status=$STATE_UNKNOWN

if [ $RESYNCNT -gt 0 ]; then
        NOTOK=1
        TEXT="WARNING - One or more disks are in resync state. "
        status=$STATE_WARNING
fi

if [ $MAINTCNT -gt 0 ]; then
        NOTOK=1
        TEXT="CRITICAL - One or more disks are in maintenance state."
        status=$STATE_CRITICAL
fi


if [ $NOTOK -eq 1 ]; then
        echo $TEXT
        datum=`date`
        echo $datum $status >> /tmp/svm.debug
        exit $status
fi

echo "OK - There is no maintenance necessary!"
exit $STATE_OK

#########################################################################
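For discussion, here is a minimal sketch of the same decision logic as a function that reads the metastat output once from stdin, so it can be exercised without a real metadevice. The name classify_metastat and the default values for the STATE_* variables are assumptions for illustration (the excerpt above does not show where those variables are set). It is not a fix for the problem described in this mail.

```shell
#!/bin/sh
# Plugin return codes. Assumption: the real script gets these from elsewhere
# (the excerpt does not show it); defaults make the sketch self-contained.
STATE_OK=${STATE_OK:-0}
STATE_WARNING=${STATE_WARNING:-1}
STATE_CRITICAL=${STATE_CRITICAL:-2}
STATE_UNKNOWN=${STATE_UNKNOWN:-3}

# classify_metastat (hypothetical helper): reads metastat output on stdin,
# prints one status line and returns the matching plugin code.
# Maintenance takes priority over resync, as in the original script.
classify_metastat() {
    ms_out=`cat`

    if printf '%s\n' "$ms_out" | grep -i maint >/dev/null; then
        echo "CRITICAL - One or more disks are in maintenance state."
        return "$STATE_CRITICAL"
    fi

    if printf '%s\n' "$ms_out" | grep -i resync >/dev/null; then
        echo "WARNING - One or more disks are in resync state."
        return "$STATE_WARNING"
    fi

    echo "OK - There is no maintenance necessary!"
    return "$STATE_OK"
}

# Usage in the plugin itself (not executed in this sketch):
#   /usr/sbin/metastat | classify_metastat
#   exit $?
```

Because the function only looks at its stdin, a saved copy of the metastat output can be piped through it to check which state it reports.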

When I execute the script from the command line, the return code is always 2
and the output is always "CRITICAL - One or more disks are in maintenance
state." (because there is one dead disk), which is correct.

When Nagios executes the script, the output is always "CRITICAL - One or
more disks are in maintenance state.", but the return code is sometimes 0
and sometimes 2, which is wrong.
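One way to narrow this down might be to record the exit code as a parent process sees it, with an environment closer to a daemon's than to an interactive shell. The sketch below is only an illustration: run_and_record is a hypothetical helper, the stripped-down PATH is an assumption, and it does not reproduce how Nagios itself launches checks.

```shell
#!/bin/sh
# run_and_record (hypothetical helper): runs the given command with an
# almost empty environment and prints "<exit code>|<first output line>".
run_and_record() {
    rr_rc=0
    # env -i clears the environment; the PATH value is an assumption.
    rr_out=`env -i PATH=/usr/bin:/bin "$@" 2>&1` || rr_rc=$?
    rr_first=`printf '%s\n' "$rr_out" | sed 1q`
    printf '%s|%s\n' "$rr_rc" "$rr_first"
}

# Example (the plugin path is a placeholder, not taken from this mail):
#   i=0
#   while [ "$i" -lt 20 ]; do
#       run_and_record /path/to/check_svm
#       i=`expr "$i" + 1`
#   done
```

If every line printed this way starts with 2, the script itself is probably returning the right code and the 0 is introduced somewhere between the script and the Nagios log entry.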

Snippet from nagios.log:
[1243410051] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL -
One or more disks are in maintenance state.
[1243410063] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410061
[1243410071] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or
more disks are in maintenance state.
[1243410083] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410081
[1243410091] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL -
One or more disks are in maintenance state.
[1243410124] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410122
[1243410131] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or
more disks are in maintenance state.
[1243411031] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL -
One or more disks are in maintenance state.
[1243411316] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or
more disks are in maintenance state.
[1243411323] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411320
[1243411326] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL -
One or more disks are in maintenance state.
[1243411363] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411361
[1243411366] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or
more disks are in maintenance state.
[1243411370] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411368
[1243411376] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL -
One or more disks are in maintenance state.
[1243411391] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411389
[1243411396] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;2;CRITICAL -
One or more disks are in maintenance state.
[1243411398] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411396
[1243411406] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;3;CRITICAL -
One or more disks are in maintenance state.
[1243411407] EXTERNAL COMMAND:
SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411405



/tmp/svm.debug confirms the command-line result:
> cat /tmp/svm.debug
Wed May 27 08:21:33 GMT 2009 2
Wed May 27 08:22:28 GMT 2009 2
Wed May 27 08:22:39 GMT 2009 2
Wed May 27 08:22:46 GMT 2009 2
Wed May 27 08:23:00 GMT 2009 2
Wed May 27 08:23:11 GMT 2009 2
Wed May 27 08:23:46 GMT 2009 2
Wed May 27 08:24:01 GMT 2009 2
Wed May 27 08:27:09 GMT 2009 2
Wed May 27 08:27:19 GMT 2009 2
Wed May 27 08:27:35 GMT 2009 2
Wed May 27 08:27:50 GMT 2009 2
Wed May 27 08:27:56 GMT 2009 2
Wed May 27 08:29:01 GMT 2009 2
Wed May 27 08:32:55 GMT 2009 2
Wed May 27 08:34:01 GMT 2009 2
Wed May 27 08:37:55 GMT 2009 2
Wed May 27 08:39:01 GMT 2009 2
Wed May 27 08:39:55 GMT 2009 2
Wed May 27 08:44:01 GMT 2009 2
Wed May 27 08:44:55 GMT 2009 2

and so on.
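To line up the epoch timestamps in nagios.log with the GMT times in /tmp/svm.debug, the time of day can be derived with plain shell arithmetic. This is a small sketch assuming a POSIX shell with $(( )) support; epoch_to_utc_hms is a made-up name.

```shell
#!/bin/sh
# epoch_to_utc_hms (hypothetical helper): prints the UTC time of day
# (HH:MM:SS) for a Unix timestamp such as the ones in nagios.log.
epoch_to_utc_hms() {
    secs=$(( $1 % 86400 ))          # seconds since midnight UTC
    printf '%02d:%02d:%02d\n' \
        $(( secs / 3600 )) \
        $(( secs % 3600 / 60 )) \
        $(( secs % 60 ))
}

# Example: the first SERVICE ALERT entry in the log snippet
epoch_to_utc_hms 1243410051
```

For the first log entry this prints 07:40:51, which can then be compared directly with the times recorded by the debug line in the script.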

Any ideas what is going wrong here?


best regards,
michael






