[Nagios-devel] Test Please: Buffer Slots Variable CVS Code

Ethan Galstad nagios at nagios.org
Wed Jan 3 03:12:33 UTC 2007


Ton Voon wrote:
> 
> On 22 Dec 2006, at 01:50, Ethan Galstad wrote:
> 
>> Based on the recent thread about hanging Nagios processes, I have 
>> removed the COMMAND_BUFFER_SLOTS and SERVICE_BUFFER_SLOTS definitions 
>> out to config file variables:
>>
>> external_command_buffer_slots=4096
>> check_result_buffer_slots=4096
>>
>> I have also updated nagiostats to report the avail/used number of slots 
>> for graphing in MRTG.  Could folks try out the latest 2.x CVS code and 
>> give it some testing?
> 
> Ethan,
> 
> Thanks for applying to CVS. Several comments:
> 
> - external_command_buffer_slots and check_result_buffer_slots only need 
> to be ints, as the circular_buffer struct only uses an int for items
> 
> - in xsddefault.c, when you print out external_command_buffer.items, I 
> think this is not thread-safe. My thread knowledge is pretty limited, so 
> please correct me if I am wrong. The main nagios process writes the 
> status data via xsddefault_save_status_data, which needs to read the 
> external_command_buffer variable. However, this variable is written to 
> by the command_file_worker_thread. So I think the 
> xsddefault_save_status_data routine needs a thread lock on 
> external_command_buffer before it can read the items data, otherwise 
> there is the potential for corrupt data. Note, there is a cost to that, 
> especially if the status data is being written with 
> aggregate_status_updates = 0.
> 
> - your output to status.dat is different from mine. You are outputting 
> max_external_command_buffer_slots (the value defined in nagios.cfg) and 
> used_external_command_buffer_slots (the current number of items in the 
> buffer). In my patch, I had a different definition: 
> max_command_buffer_items meant the "maximum number of items that has 
> been in the buffer". 
> 
> (I would prefer used_external_command_buffer_slots be changed to 
> current_external_command_buffer_slots because it more accurately 
> describes "this is the number I have now".)
> 
> From now on, I'll call it high_external_command_buffer_items, as it can 
> also be the "high water mark of the number of items in the buffer". This 
> is a useful statistic as it tells you what the 
> max_external_command_buffer_slots should be to get no holdups.
> 
> Also, it probably makes sense to put the high water mark within the 
> circular_buffer struct.
> 
> Please find a patch attached with these changes.
> 
> On my small test system, the used_check_result_buffer_slots is usually 
> 0. When I introduce 1 fake slave (128 results per 10 seconds), 
> used_check_result_buffer_slots fluctuates from 0 to 20s to 30s. Introducing a 
> 2nd fake slave, the high mark moves up to 100s. A 3rd slave moves the 
> high mark to 192.
> 
> If I introduce NDO into the system, I get a large iowait time (in the 
> 80%s), presumably from database writes. The status file is not updated as 
> regularly (one instance of 60 seconds between writes), but when it is, 
> the high_* values jump up to the 200-300s. This is a poorly 
> configured database, so I'm guessing that there are delays due to the 
> main nagios process passing data to the broker module.
> 
> At the moment with 2 slaves sending 128 packets per 10 seconds, I'm 
> getting high values of 983 for external commands and 1405 for check results.
> 
> I think these recent changes help with seeing if there are bottlenecks 
> at the reading of the command pipe, but I think there are possibly other 
> slowdowns further down the chain (which Nagios 3 may aid with).
> 
> Ton


Good suggestions.  I have applied almost identical patches to CVS based 
on your comments.  Thanks!
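
For anyone following the thread, the two ideas discussed above (keeping 
a high water mark inside the buffer struct, and taking a lock before 
reading the item count from another thread) could be sketched roughly 
as below. This is only an illustration and not the code that went into 
CVS: the struct, its field names, and the sketch_* functions are all 
hypothetical, and the real circular_buffer struct holds the queued 
items as well.

```c
#include <pthread.h>

/* Hypothetical sketch: names are illustrative, not taken from the Nagios source. */
typedef struct sketch_circular_buffer {
    int items;             /* number of items currently in the buffer */
    int high;              /* high water mark: most items ever held at once */
    int slots;             /* capacity, e.g. from external_command_buffer_slots */
    pthread_mutex_t lock;  /* guards items and high */
} sketch_circular_buffer;

void sketch_buffer_init(sketch_circular_buffer *cb, int slots)
{
    cb->items = 0;
    cb->high = 0;
    cb->slots = slots;
    pthread_mutex_init(&cb->lock, NULL);
}

/* Writer side (e.g. a worker thread). Returns 0 on success, -1 if full. */
int sketch_buffer_add(sketch_circular_buffer *cb)
{
    int rc = -1;

    pthread_mutex_lock(&cb->lock);
    if (cb->items < cb->slots) {
        cb->items++;
        if (cb->items > cb->high)
            cb->high = cb->items;   /* record the new high water mark */
        rc = 0;
    }
    pthread_mutex_unlock(&cb->lock);
    return rc;
}

/* Consumer side. Returns 0 on success, -1 if the buffer is empty. */
int sketch_buffer_remove(sketch_circular_buffer *cb)
{
    int rc = -1;

    pthread_mutex_lock(&cb->lock);
    if (cb->items > 0) {
        cb->items--;
        rc = 0;
    }
    pthread_mutex_unlock(&cb->lock);
    return rc;
}

/* Reader side (e.g. a status writer): copy both counters under the lock
 * so they are read as a consistent pair. */
void sketch_buffer_snapshot(sketch_circular_buffer *cb, int *items, int *high)
{
    pthread_mutex_lock(&cb->lock);
    *items = cb->items;
    *high = cb->high;
    pthread_mutex_unlock(&cb->lock);
}
```

The snapshot function holds the lock only long enough to copy two 
ints, which keeps the cost small even when status data is written 
frequently. The high water mark only ever grows, so comparing it 
against the configured slot count shows how close the buffer has come 
to filling up.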



Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org



