[Nagios-checkins] SF.net SVN: nagios:[2035] nagioscore/trunk/base/utils.c

ageric at users.sourceforge.net ageric at users.sourceforge.net
Thu Aug 2 00:44:52 UTC 2012


Revision: 2035
          http://nagios.svn.sourceforge.net/nagios/?rev=2035&view=rev
Author:   ageric
Date:     2012-08-02 00:44:52 +0000 (Thu, 02 Aug 2012)
Log Message:
-----------
core: Fix deleting too old check result files

Even under pretty normal circumstances, the check result spool dir
can fill up with a tremendous number of check result files, which kills
Nagios' performance completely.

The problem is reloads, where old checks may be abandoned if they
take too long to finish. In that case, half the check result file
is stashed in the spool directory (the other half is only written when
the check returns). With a huge number of checks and semi-frequent
restarts, these partial files start to accumulate and Nagios spends
more and more time scanning a huge directory where very few of
the check result files have ".ok" files accompanying them, leading to
a ton of cache misses when we try to stat() the ".ok" file.

This patch fixes it by using the mtime from the stat call earlier in
the chain so even check results without an ".ok" file can be deleted.

Signed-off-by: Andreas Ericsson <ae at op5.se>

Modified Paths:
--------------
    nagioscore/trunk/base/utils.c

Modified: nagioscore/trunk/base/utils.c
===================================================================
--- nagioscore/trunk/base/utils.c	2012-08-02 00:44:36 UTC (rev 2034)
+++ nagioscore/trunk/base/utils.c	2012-08-02 00:44:52 UTC (rev 2035)
@@ -2055,11 +2055,14 @@
 			if (!S_ISREG(stat_buf.st_mode))
 				continue;
 
+			/* at this point we have a regular file... */
 
+			/* if the file is too old, we delete it */
+			if (stat_buf.st_mtime + max_check_result_file_age < time(NULL)) {
+				delete_check_result_file(dirfile->d_name);
+				continue;
 				}
 
-			/* at this point we have a regular file... */
-
 			/* can we find the associated ok-to-go file ? */
 			asprintf(&temp_buffer, "%s.ok", file);
 			result = stat(temp_buffer, &ok_stat_buf);
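
For readers who want to try the age test outside of Nagios, the following is a minimal standalone sketch of the idea in the hunk above: decide staleness from the st_mtime of the stat() result already in hand, without looking for an ".ok" companion first. The names result_file_too_old() and reap_stale_result_files() are hypothetical and do not exist in utils.c, and the sketch calls unlink() directly where the patch calls delete_check_result_file(), so it should be read as an illustration under those assumptions and not as the actual Nagios code.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not in utils.c): nonzero when a file last
 * modified at `mtime` is more than `max_age` seconds old at `now`.
 * Same comparison as the patch: mtime + max_age < now. */
int result_file_too_old(time_t mtime, time_t max_age, time_t now)
{
	return mtime + max_age < now;
}

/* Hypothetical stand-in for the spool scan: removes every regular
 * file in `spool_dir` that is older than `max_age` seconds, whether
 * or not an ".ok" file accompanies it.
 * Returns the number of files removed, or -1 if the directory
 * cannot be opened. */
int reap_stale_result_files(const char *spool_dir, time_t max_age)
{
	DIR *dirp = opendir(spool_dir);
	struct dirent *dirfile;
	struct stat stat_buf;
	char path[4096];
	time_t now = time(NULL);
	int removed = 0;

	if (dirp == NULL)
		return -1;

	while ((dirfile = readdir(dirp)) != NULL) {
		int len = snprintf(path, sizeof(path), "%s/%s",
		                   spool_dir, dirfile->d_name);

		/* skip names that do not fit in the path buffer */
		if (len < 0 || (size_t)len >= sizeof(path))
			continue;

		/* one stat() per entry; skip anything but regular files */
		if (stat(path, &stat_buf) != 0 || !S_ISREG(stat_buf.st_mode))
			continue;

		/* the mtime from that same stat() decides staleness */
		if (result_file_too_old(stat_buf.st_mtime, max_age, now)) {
			if (unlink(path) == 0)
				removed++;
		}
	}

	closedir(dirp);
	return removed;
}
```

A caller would pass the spool directory path and the maximum age in seconds, for example reap_stale_result_files(dir, 3600). Keeping the comparison in its own small function makes the boundary easy to check: a file whose mtime plus the maximum age equals the current time is kept, and only strictly older files are removed.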
