False Positives in WhatsUp Gold


Have you ever gotten an alert in the middle of the night, only to log in and find that everything is good and happy? Then, just as you’re about to log off, WhatsUp Gold labels the device ‘Up’ again? As a former system administrator, I had to deal with that on just one occasion, and that is when I learned about the timeout/retry values on active monitors within WhatsUp Gold. The problem is that, by default, many of the timeout/retry values on the active monitors are too aggressive. For example, the ‘Ping’ active monitor (one of the worst offenders) has a default timeout of 1 second with 1 retry.

So let me lay out a scenario. WhatsUp Gold polls active monitors every 60 seconds by default. Let’s say your action policy is set to e-mail you immediately when a device goes down. If the system being pinged drops 2 packets when the poll is sent (the initial request and its single retry), it will be labeled down and you will get an e-mail. The device stays labeled down until a successful polling cycle occurs, which could be the next cycle or a later one. Now suppose the active monitor’s timeout and retry values are set higher; what I typically use is a timeout of 8 seconds with 2 retries. Under the same scenario, the system drops a couple of ICMP requests but remains labeled ‘Up’, because it answered a later request within the longer timeout and extra retry.
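To get a feel for why one extra retry matters so much, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is not taken from WhatsUp Gold itself: each ping attempt independently goes unanswered (within the timeout) with some probability `p`, and a monitor with `retries` retries makes `retries + 1` attempts per polling cycle before declaring the device down. A longer timeout effectively lowers `p`, because slow replies stop counting as lost. The 5% loss rate below is a hypothetical number chosen only for illustration.

```python
# Simplified model (assumption): each ping attempt is lost independently with
# probability `loss_rate`, and a monitor with `retries` retries makes
# retries + 1 attempts per polling cycle before labeling the device down.

POLL_INTERVAL_S = 60                              # default polling interval
POLLS_PER_DAY = 24 * 60 * 60 // POLL_INTERVAL_S   # 1440 polling cycles per day


def false_down_probability(loss_rate: float, retries: int) -> float:
    """Chance that every attempt in a single polling cycle goes unanswered."""
    return loss_rate ** (retries + 1)


def expected_false_alerts_per_day(loss_rate: float, retries: int) -> float:
    """Expected number of spurious 'down' cycles per day under this model."""
    return POLLS_PER_DAY * false_down_probability(loss_rate, retries)


if __name__ == "__main__":
    loss = 0.05  # hypothetical: 5% of echo requests get no reply within the timeout
    for retries in (1, 2):
        rate = expected_false_alerts_per_day(loss, retries)
        print(f"retries={retries}: {rate:.2f} false alerts/day")
```

Under these made-up numbers, going from 1 retry to 2 cuts the expected false alerts from about 3.6 per day to about 0.18 per day, because each extra attempt multiplies the chance of a false ‘down’ by `p` again.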

What is important to note is that every monitor within WhatsUp Gold (excluding WMI-based monitors) has adjustable timeout and retry values. Now, don’t go crazy and adjust them all if you don’t have to! Simply adjust the offending active monitor. To verify which one is at fault, when a monitor goes down, open the ‘Device Status’ page and click the ‘General’ tab. There you will see the ‘State Change Log’ for that device. If the monitor message shows ‘Timeout’ as the problem, go ahead and adjust the timeout for that monitor. Note that the value is adjusted within the active monitor library, so the new timeout/retry applies to *ALL* devices that have that monitor applied. In my experience, the monitors that most often need adjusting are ping, interface, and power supply. Set them to an 8-second timeout with 2 retries, as recommended above. If you still see the issue, increase the values a bit more.
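Before raising the values further, it can help to sanity-check how long a single poll might take in the worst case. The sketch below assumes (this is not confirmed WhatsUp Gold behavior) that each of the `retries + 1` attempts can wait the full timeout, and treats the 60-second default polling interval as the budget. The helper names and the 20-second/3-retry setting are hypothetical, chosen only to show a value that would overrun the interval.

```python
# Assumption: each of the retries + 1 attempts may wait the full timeout,
# so one polling cycle can take up to timeout * (retries + 1) seconds.

POLL_INTERVAL_S = 60  # WhatsUp Gold's default polling interval


def worst_case_poll_seconds(timeout_s: int, retries: int) -> int:
    """Longest time one polling cycle could spend waiting on a dead device."""
    return timeout_s * (retries + 1)


def fits_poll_interval(timeout_s: int, retries: int,
                       interval_s: int = POLL_INTERVAL_S) -> bool:
    """True if the worst case finishes strictly before the next poll is due."""
    return worst_case_poll_seconds(timeout_s, retries) < interval_s


if __name__ == "__main__":
    settings = [(1, 1), (8, 2), (20, 3)]  # default, recommended, hypothetical bump
    for timeout_s, retries in settings:
        total = worst_case_poll_seconds(timeout_s, retries)
        ok = fits_poll_interval(timeout_s, retries)
        print(f"timeout={timeout_s}s retries={retries}: "
              f"up to {total}s per poll, fits={ok}")
```

Under that assumption, the recommended 8 seconds with 2 retries waits at most 24 seconds, comfortably inside a 60-second cycle, while something like 20 seconds with 3 retries could run past the next scheduled poll.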
