
Move wikidata lag checks off Icinga
Closed, Resolved, Public

Description

This is a follow-up in the context of migrating Wikidata alerts to AlertManager (T287741). The remaining Icinga checks are mostly about Wikidata lag: they fetch https://www.wikidata.org//w/api.php?action=query&meta=siteinfo&format=json&siprop=statistics and compare the lag value to a threshold (in a shell script).
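To make the existing check concrete, here is a minimal Python sketch of what it does: fetch the siteinfo statistics and compare the median lag against a threshold. The real check is a shell script. The JSON path `query.statistics.dispatch.median.lag`, the function names, and the use of 600s/4000s as warning/critical thresholds (borrowed from the alerts discussed in this task) are assumptions for illustration only.

```python
import json
import urllib.request

# URL taken verbatim from the task description.
SITEINFO_URL = (
    "https://www.wikidata.org//w/api.php"
    "?action=query&meta=siteinfo&format=json&siprop=statistics"
)

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_siteinfo(url: str = SITEINFO_URL) -> dict:
    """Fetch and decode the siteinfo statistics JSON (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def median_lag(payload: dict) -> int:
    """Extract the median lag in seconds.

    ASSUMPTION: hypothetical path query.statistics.dispatch.median.lag;
    it is illustrative and not copied from the original script.
    """
    return int(payload["query"]["statistics"]["dispatch"]["median"]["lag"])


def check_lag(lag: int, warn: int = 600, crit: int = 4000) -> int:
    """Map a lag value (seconds) to an Icinga-style status code."""
    if lag >= crit:
        return CRITICAL
    if lag >= warn:
        return WARNING
    return OK
```

Calling `check_lag(median_lag(fetch_siteinfo()))` would perform the live request and yield an exit code suitable for `sys.exit`. Keeping the parsing and threshold logic separate from the HTTP fetch makes it testable without network access.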

It was decided to move these checks to Grafana and drop the current Icinga checks.

Event Timeline

After talking to @Addshore, it seems all the metrics we have for Wikidata can actually be accessed in Grafana, so we can simply migrate these checks to Grafana. That seems better, less scattered, and much easier to do.

I'll do it.

I would be happy to help with T240685 regardless.

Change 715772 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] Absent wikidata alerts

https://gerrit.wikimedia.org/r/715772

We have three alerts there:

  • Checking that the median lag is below 600s
    • It was flaky, so we muted it (or reduced its severity) and never looked at it again; let's just nuke it
  • Checking that the median lag is below 4000s
    • After the previous alert turned out to be flaky we decided on an hour, but (I kid you not) due to the nature of these kinds of alerts we had to go with 4000s
    • You want to know more? Look at the regex here (in ./modules/icinga/files/check_wikidata_crit: --ereg '"median":[^}]*"lag":([1-3]?[0-9]?[0-9]?[0-9]),')
  • Checking that the median lag is below 4000s for test wikidata
    • I actually don't know about this one: we don't collect that info for statsd, though I think it would be easy to do. I nuked it for now and will add the dispatch to the script.
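The regex explains the odd 4000s threshold: `[1-3]?[0-9]?[0-9]?[0-9]` accepts one to four digits with a leading 1-3 when there are four, so it matches lag values 0 through 3999. Expressing "below 3600" would presumably need alternation rather than simple digit classes. The sketch below exercises the quoted pattern with Python's `re`, assuming the check treats a match as OK (the usual behaviour of Nagios-style `--ereg` options). The sample JSON shape and the helper names are hypothetical.

```python
import re

# The --ereg pattern quoted from check_wikidata_crit.
CRIT_PATTERN = re.compile(r'"median":[^}]*"lag":([1-3]?[0-9]?[0-9]?[0-9]),')


def ereg_ok(body: str) -> bool:
    """True if the pattern matches, i.e. the check would report OK (assumed)."""
    return CRIT_PATTERN.search(body) is not None


def sample_body(lag: int) -> str:
    """Hypothetical response fragment; lag must be followed by another key."""
    return f'"median":{{"lag":{lag},"pending":0}}'


def max_accepted_lag(limit: int = 10000) -> int:
    """Brute-force the largest integer lag below `limit` the pattern accepts."""
    return max(n for n in range(limit) if ereg_ok(sample_body(n)))
```

Note also that because of the trailing `,` in the pattern, a response where `lag` is the last key of the `median` object would never match, regardless of its value.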

fgiunchedi renamed this task from "Collect wikidata/siteinfo in Prometheus" to "Move wikidata lag checks off Icinga". (Sep 1 2021, 8:10 AM)
fgiunchedi updated the task description.

Thank you @Addshore and @Ladsgroup! Going with Grafana is much easier for now. I've retitled/repurposed the task, and thanks for your help on T240685!

Change 715772 merged by Filippo Giunchedi:

[operations/puppet@production] Drop wikidata alerts

https://gerrit.wikimedia.org/r/715772

Change 715961 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] Clean up absented files and unused configs

https://gerrit.wikimedia.org/r/715961

Change 715961 merged by Filippo Giunchedi:

[operations/puppet@production] Clean up absented files and unused configs

https://gerrit.wikimedia.org/r/715961

lmata subscribed.

Closing, please reopen if something differs from my assessment.