Over the last few days, EventLogging has been sending validation alerts that report a slight difference between the number of raw and validated events. These alerts normally last about 2 minutes.
We should look into this to determine the root cause, even though the alerts do not seem to be critical.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Duplicate | | None | T106495 Change Icinga graphite alert for EventLogging delay
Resolved | | mforns | T105167 Troubleshoot EventLogging validation alerts {oryx} [3 pts]
Event Timeline
I listed all alert timestamps and durations, but could not find any matching anomaly in Graphite, the logs, or the database.
Dan suggested that Icinga may be getting only partially up-to-date metrics from Graphite. I have often seen Graphite metrics update in odd ways, usually taking seconds or minutes until the values are consistent. Icinga is probably querying Graphite for the last minute's metrics while those are still incomplete, which would generate false alerts.
The fact that all alerts have approximately the same duration of 2 minutes also makes me think that this theory is correct.
As there is no sign of a malfunction at the time of the alerts, I will move this task to done.
Could we, if possible, make Icinga query the metrics with a 1-minute lag?
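A minimal sketch of the lagged query idea, assuming the check reads data through Graphite's render API with its `from` and `until` relative-time parameters. The host name, the metric name, and both helper functions are hypothetical and only illustrate the approach; this is not how the actual Icinga check is configured.

```python
from urllib.parse import urlencode


def lagged_render_url(base_url, target, window_minutes=1, lag_minutes=1):
    """Build a Graphite render URL for a window that ends `lag_minutes` ago.

    Skipping the most recent minute avoids reading buckets that Graphite
    may not have finished filling yet.
    """
    params = {
        "target": target,
        "from": f"-{window_minutes + lag_minutes}min",
        "until": f"-{lag_minutes}min",
        "format": "json",
    }
    return f"{base_url}/render?{urlencode(params)}"


def relative_difference(raw_points, valid_points):
    """Relative raw/validated difference over buckets present in both series.

    Each series is a list of (value, timestamp) pairs, as in Graphite's
    JSON output. Buckets where either value is None are skipped, since a
    missing value means "not written yet", not "zero events".
    """
    raw_total = 0.0
    valid_total = 0.0
    for (raw, _), (valid, _) in zip(raw_points, valid_points):
        if raw is None or valid is None:
            continue
        raw_total += raw
        valid_total += valid
    if raw_total == 0:
        return 0.0
    return abs(raw_total - valid_total) / raw_total


# Hypothetical host and metric name, for illustration only.
url = lagged_render_url("https://graphite.example.org", "eventlogging.raw.rate")
```

With the defaults, the query covers the window from two minutes ago to one minute ago, so the check never looks at the newest, possibly incomplete, minute.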
Verifying that this is done: it is not an EventLogging problem.
New task created to adjust the Icinga alerts: T106495