[Nagios-devel] Latencies with process-perfdata command

Hendrik Baecker andurin at process-zero.de
Tue Jan 30 19:11:49 UTC 2007


Hi List,

I have found that I run into latency problems when I use a
process-perfdata command to handle plugin performance data with external
scripts.

So I am asking myself what the best way to handle the perfdata would be
(with a view to not getting into latency trouble).

My external processing scripts (Perl) run quite fast, but I think that
every process Nagios has to fork is pure ballast for performance.

For testing I wrote a small C program that only checks that it was
called correctly (right settings for the command line options), forks
itself and, after a successful fork, kills the parent process. The
forked child process then does its job.
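
The fork-and-exit wrapper described above can be sketched roughly as
follows. This is a Python stand-in for illustration, not the original C
program; `spawn_detached` and the `job` callback are hypothetical names.
Redirecting the child's standard streams to /dev/null is an added
assumption (common daemon practice), not something the original program
is stated to do.

```python
import os


def spawn_detached(job):
    """Fork once. Return the child's PID in the parent; run job() in the child.

    The parent gets control back immediately, so a wrapper script can exit
    right away while the child keeps working in the background.
    """
    pid = os.fork()
    if pid > 0:
        return pid  # parent: nothing left to do, the caller may exit now

    # Child process from here on.
    status = 1
    try:
        os.setsid()  # detach from the caller's session
        # Assumption: drop inherited stdin/stdout/stderr so the child does
        # not keep any pipe to the caller open while it works.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        job()
        status = 0
    finally:
        os._exit(status)  # never fall back into the parent's code path
```

In a real wrapper script the parent would call `sys.exit(0)` directly
after `spawn_detached(...)` returns, so the caller sees a near-instant
exit while the child does the actual perfdata processing.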

I thought it would be better for Nagios if the external command exits
as fast as possible. This little (and yes, slightly dirty) program does
the job quite well and my latencies are almost gone.
But not in the way I had expected.

So I am now at the point of saying that the execution time of external
scripts and the Nagios check latencies are correlated in some way.

No perfdata processing at all => no latency
Running an external Perl script directly => up to horrible latency
Faking a fast exit to Nagios by forking => nicer latency

Currently I am thinking about solving this issue like this:

Don't use an external command; instead let Nagios write the perfdata
files as described in the Nagios documentation and nagios.cfg, and write
a small daemon that reads the file and processes the perfdata as "usual"
(writing RRD files).
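
The reading side of such a daemon could look roughly like the sketch
below. It assumes that Nagios appends one line per check to the perfdata
file and that the file template is tab-separated as
`$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEPERFDATA$`; that template,
and the names `parse_perfdata` and `read_new_records`, are assumptions
made for illustration. Writing the RRD files is left out on purpose.

```python
import re

# One perfdata item looks like: label=value[unit];warn;crit;min;max
# Only the label and the numeric value are extracted here.
_PERF_ITEM = re.compile(r"('[^']+'|[^\s=]+)=([-+]?\d*\.?\d+)")


def parse_perfdata(text):
    """Return {label: value} for every label=value item in a perfdata string."""
    return {
        match.group(1).strip("'"): float(match.group(2))
        for match in _PERF_ITEM.finditer(text)
    }


def read_new_records(path, offset=0):
    """Read complete lines appended to the file since the given byte offset.

    Returns (records, new_offset). A half-written last line is left alone
    and picked up on the next call.
    """
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
    end = chunk.rfind(b"\n") + 1  # 0 if no complete line has arrived yet
    records = []
    for line in chunk[:end].decode("utf-8", "replace").splitlines():
        fields = line.split("\t")
        # Assumed layout: timestamp, host, service, perfdata
        if len(fields) == 4 and fields[0].isdigit():
            timet, host, service, perf = fields
            records.append((int(timet), host, service, parse_perfdata(perf)))
    return records, offset + end
```

A daemon would call `read_new_records` in a loop with a short sleep,
carry the returned offset into the next call, and hand each record to
whatever updates the RRD files.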

So what I want to discuss is the best way to process performance data
from compatible plugins, and I want to understand why it seems to be bad
for Nagios to execute external commands with an execution time of a few
seconds.

AFAIK, when Nagios executes an external command, it forks a shell to do
so and keeps control of this child so that it can kill it if it needs
more time than the given timeout. But where is the problem if I fork
within my external command and exit the parent process as soon as
possible to keep the execution time seen by Nagios as small as possible?
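
One general effect that may be relevant to this question: a caller that
reads a command's output through a pipe only sees end-of-file once every
process holding the write end has closed it, including a forked child.
Whether Nagios reads the output of the perfdata command this way is an
assumption here, not something confirmed in this post. The sketch below
only demonstrates the pipe behaviour itself; `timed_run` is a
hypothetical helper, and it needs a POSIX shell with `sleep`.

```python
import subprocess
import time


def timed_run(shell_command):
    """Run a shell command with stdout captured; return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(shell_command, shell=True, stdout=subprocess.PIPE)
    return time.monotonic() - start


# The shell exits at once, but the background child inherits the stdout
# pipe, so the caller keeps waiting for end-of-file until it finishes.
held = timed_run("sleep 1 &")

# Here the background child gives up the pipe, so the caller gets
# end-of-file as soon as the shell itself exits.
released = timed_run("sleep 1 >/dev/null 2>&1 &")

print(f"pipe inherited: {held:.2f}s, pipe released: {released:.2f}s")
```

If this applies, a wrapper that forks but leaves its standard streams
untouched would shorten the visible execution time less than expected.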

I hope someone understands my question and my problem.

Btw: I am not really looking for help with solving my latencies within
Nagios via normal_check_interval and other performance tweaks. I am
running multiple Nagios instances on the same hardware and the system
has an average load of 10. Yes, I know that latencies are to be expected
with such high loads. But my check latencies go down to one or two
seconds if I fully disable the process_perfdata command in nagios.cfg.

Thanks in advance for every kind of answer (except NDRs and
auto-replies ;o) )

Regards,
Hendrik



