[Nagios-devel] [PATCH] Re: alternative scheduler

Fredrik Thulin ft at it.su.se
Fri Dec 3 13:05:52 UTC 2010

On Fri, 2010-12-03 at 12:55 +0100, Andreas Ericsson wrote:
> > No, actually not. Erlang is a soft real time system. My approach was to
> > ask the Erlang VM to send me a tick every N ms (N = 300s * 1000 / number
> > of checks). So if N is 50, the VM will signal me once every 50 ms, very
> > precisely and without any drift.
> > 
> If N is constant, it can't be the lvalue of the above expression.

I meant to say that N is calculated when the list of checks is
(re)loaded. As I don't even try to have retry_intervals and such, a
steady tick interval works great as long as I can finish initiating
another service check in between ticks. 

Note that I say initiate, not complete - I have more cores that can
finish the job of starting the check.

Applying back pressure to the spawner when there in fact *aren't* enough
system resources to start checks is an interesting topic, and I don't
have any clear ideas about how to do it. My naive implementation was to
never queue tick signals, but rather skip them if I couldn't finish
processing a tick before additional ticks arrived.
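
The skip-rather-than-queue behaviour can be sketched as a small deterministic simulation. This is an illustration in Python under stated assumptions, not the Erlang implementation: `simulate_ticks` and its arguments are hypothetical names, ticks are assumed to arrive at exact multiples of the interval, and `work_ms[k]` is the assumed cost of handling tick k if it is handled at all.

```python
def simulate_ticks(interval_ms, work_ms):
    """Simulate a spawner that drops ticks arriving while it is busy.

    Tick k arrives at time k * interval_ms. If the spawner is still
    processing an earlier tick at that moment, the tick is skipped
    (never queued); otherwise it is handled and keeps the spawner
    busy for work_ms[k] milliseconds.
    Returns (handled, skipped) as lists of tick indices.
    """
    handled, skipped = [], []
    busy_until = 0
    for k, cost in enumerate(work_ms):
        arrival = k * interval_ms
        if arrival < busy_until:
            skipped.append(k)      # still busy: drop the tick
        else:
            handled.append(k)      # idle: start another check
            busy_until = arrival + cost
    return handled, skipped


# One slow tick (120 ms) with a 50 ms interval causes two skipped ticks.
handled, skipped = simulate_ticks(50, [10, 120, 10, 10, 10])
print(handled, skipped)
```

Skipping keeps the backlog bounded at zero, at the price that checks scheduled on dropped ticks start late or not at all during overload.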

> > I then just had to finish starting another check command in =<  49 ms,
> > and go back to sleep. All handling of check results is done completely
> > asynchronous to this starting of new checks.
> > 
> > This is all in src/npers_spawner.erl if anyone is interested in the
> > details.
> >
> That's still "doing more than you did before", on a system level, so the
> previous implementation must have been buggy somehow. Perhaps erlang
> blocked a few signals when the signal handler was already running, or
> perhaps you didn't start enough checks per tick?

I agree it is more work for the scheduler, but that is better than
having under-utilized additional CPUs/cores, right?

> If the above expression was correct (N is not constant), this algorithm
> makes the cost for running a single check exponential with the number of
> checks to run. Ie, the more checks you have, the more expensive each check
> will become. The curve will converge on (infinity - 1) faster with a larger
> exponent. In this case, the exponent is ticks/sec, so reducing the ticktime
> means you're effectively reducing performance unless there are other
> factors involved that shaves enough cycles to make this change disappear
> in the noise.

Sorry, you lost me here. Perhaps I just failed to explain what N was?

