[Nagios-devel] [Nagios-users] external commands and segfault -- again

Andurin andurin at process-zero.de
Wed Jan 31 11:54:18 UTC 2007


Hi List,

I'm afraid there seems to be another (or a new) bug in the downtime
handling of Nagios 2.7 (non-CVS).

I've created a downtime for a parent host with the option "Schedule
triggered downtime for all child hosts" from:

today 12:00:00 up to 12:10:00

and after this a second downtime from

today 12:09:00 up to 12:20:00

So these two downtimes for one parent with a few child hosts overlap
each other by one minute.

After the first downtime ends, Nagios dies with a segfault.

More bad news:
I tried running the unstripped binary under the GNU debugger to catch
the faulty lines... but then the segfault does not occur.

I know that a colleague of mine has the same problem on his Nagios server.

Here is a snippet from my nagios.log:

[1170240406] EXTERNAL COMMAND: SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME;vvcdo-atm-R1;1170240379;1170240900;1;0;7200;baeckerh;Testing Downtimes
[1170240406] HOST DOWNTIME ALERT: vvcdo-atm-R1;STARTED; Host has entered a period of scheduled downtime
[1170240406] HOST DOWNTIME ALERT: child-berlin;STARTED; Host has entered a period of scheduled downtime
.... more log entries for the child hosts...

[1170240426] EXTERNAL COMMAND: SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME;vvcdo-atm-R1;1170240840;1170241080;1;0;7200;baeckerh;Testing Downtimes 2
[1170240436] Auto-save of retention data completed successfully.
[1170240496] Auto-save of retention data completed successfully.
[1170240556] Auto-save of retention data completed successfully.
[1170240616] Auto-save of retention data completed successfully.
[1170240676] Auto-save of retention data completed successfully.
[1170240736] Auto-save of retention data completed successfully.
[1170240796] Auto-save of retention data completed successfully.
[1170240856] Auto-save of retention data completed successfully.

BANG!

[1170240922] Nagios 2.7 starting... (PID=27191)
[1170240922] LOG VERSION: 2.0

How can I get further information about why Nagios segfaults when
neither the unstripped binary nor gdb catches the segfault?

Kind regards
Hendrik

Ethan Galstad wrote:
> Andreas Ericsson wrote:
>   
>> bobi at netshel.net wrote:
>>     
>> <snip> many great error descriptions
>>
>>     
>
> Hmmmm... this is not good.  I just looked through the source code and 
> found a bug that looks like it could be the cause of the problem. There 
> are actually two potential segfault scenarios that I found, and they have 
> been around for a long time...
>
> 1. If a scheduled downtime entry is manually deleted/cancelled, the 
> corresponding event in the event queue is not removed.  The event item 
> still contains a pointer to the (now deleted) downtime entry.  This can 
> cause a segfault.
>
> 2. There was another code segment in downtime.c where when a downtime 
> entry was deleted, it was deleted and then later referenced when Nagios 
> searched through other downtime entries to see if they were triggered by 
> the original (deleted) downtime.  Why this hasn't caused segfaults every 
> time a downtime entry is deleted is beyond me.
>
> At any rate, I have just posted a patch to the 2.x branch of CVS.  The 
> patch changes the way scheduled downtime is referenced from the event 
> queue.  Instead of storing a pointer to the downtime data struct, the 
> downtime id number is now used instead.  The timed event handler will 
> search for a downtime entry matching the id before it does anything.  If 
> the downtime was already deleted, it's okay.  Give it a try and see if 
> things improve.
>
> Unfortunately, this patch will now break the ndoutils addon (yesterday's 
> release, as well as earlier revisions).  I'll get a patch in CVS shortly 
> to fix this.  Thanks for the great problem description!
>
>
>
> Ethan Galstad,
> Nagios Developer
> ---
> Email: nagios at nagios.org
> Website: http://www.nagios.org
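
For readers following the thread, the idea in the quoted patch description
(keep a downtime id in the event queue and look the entry up before use,
instead of keeping a raw pointer) can be sketched roughly as below. This is
only an illustration, not the actual Nagios source: the struct layouts and
the names downtime_entry, timed_event, find_downtime_by_id, add_downtime,
delete_downtime and handle_downtime_event are all hypothetical, ids are
assumed to start at 1, and removing triggered entries together with their
trigger is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real Nagios structs. */
typedef struct downtime_entry {
    unsigned long id;            /* unique id, assumed to start at 1 */
    unsigned long triggered_by;  /* id of the triggering downtime, 0 if none */
    struct downtime_entry *next;
} downtime_entry;

typedef struct timed_event {
    unsigned long downtime_id;   /* the event stores an id, not a pointer */
} timed_event;

/* Return the entry with the given id, or NULL if it no longer exists. */
static downtime_entry *find_downtime_by_id(downtime_entry *list, unsigned long id)
{
    for (downtime_entry *d = list; d != NULL; d = d->next)
        if (d->id == id)
            return d;
    return NULL;
}

/* Prepend a new entry; returns it, or NULL if allocation fails. */
static downtime_entry *add_downtime(downtime_entry **list, unsigned long id,
                                    unsigned long triggered_by)
{
    assert(list != NULL);
    downtime_entry *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->id = id;
    d->triggered_by = triggered_by;
    d->next = *list;
    *list = d;
    return d;
}

/* Free the entry with this id and, recursively, every entry triggered by it.
 * Returns the number of entries removed. */
static int delete_downtime(downtime_entry **list, unsigned long id)
{
    assert(list != NULL);
    if (id == 0)
        return 0;                /* 0 means "no trigger", never a real id */

    int removed = 0;
    for (downtime_entry **pp = list; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            downtime_entry *dead = *pp;
            *pp = dead->next;    /* unlink before freeing */
            free(dead);
            removed = 1;
            break;
        }
    }
    if (!removed)
        return 0;

    /* From here on only the saved id is used; the freed entry is never
     * dereferenced. Rescan from the head after each removal because the
     * list has changed. */
    for (;;) {
        downtime_entry *child = NULL;
        for (downtime_entry *d = *list; d != NULL; d = d->next) {
            if (d->triggered_by == id) {
                child = d;
                break;
            }
        }
        if (child == NULL)
            break;
        removed += delete_downtime(list, child->id);
    }
    return removed;
}

/* Timed event handler: returns 1 if the downtime was found and handled,
 * 0 if it was already deleted (a stale event, which is harmless). */
static int handle_downtime_event(downtime_entry *list, const timed_event *ev)
{
    downtime_entry *d = find_downtime_by_id(list, ev->downtime_id);
    if (d == NULL)
        return 0;
    printf("handling downtime %lu\n", d->id);
    return 1;
}
```

The lookup costs a list scan per event, but an event that outlives its
downtime entry then simply finds nothing instead of following a dangling
pointer. Copying the id before freeing the entry addresses the second
scenario in the same way.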