
Retire WatchMouse (CA DX APP)
Closed, Resolved, Public

Authored on Jan 13 2022, 5:11 PM


The o11y team discussed this internally and decided to sunset Watchmouse, as we do not seem to be getting much value from it as external monitoring. See T292603 for recent examples.

Historically, the tool was used for the public status page, to expose some KPIs for external availability, and as an external uptime checker. Additional background: T81454, T85829, T89877, T79416.

StatusPage recently replaced Watchmouse in effect (see T202061 and T285769), and the new stack is ready for production.

Our existing Icinga checks and external testing infrastructure provide enough redundancy to sunset this tool without losing core functionality, while reducing the number of technologies we have to support. We also aim to improve external monitoring as part of the alerting plans and roadmap, but this decommission will go ahead this quarter without waiting for that new solution to be implemented.

Checks currently defined in watchmouse / CA App Synthetic Monitor:

Note: "ops-critical-phone" is actually an email contact for "noc@" and "watchmouse@" (and watchmouse@ is further aliased to maint-announce@).

| Enabled (Y default) | Name | Tags | URL | Type | Contact | Replacement | Retirement Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
|  | API |  |  |  |  | check in icinga | ✅ deactivated in watchmouse |
|  | DNS | wiki_platform | wikipedia.org | DNS | ops-critical-phone | existing check in icinga | ✅ deactivated in watchmouse |
|  | Dumps download |  |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | Gerrit |  |  |  | ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | https content - commons |  |  |  | ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | https services - commons | wiki_platform |  |  |  | check in icinga | ✅ deactivated in watchmouse |
|  | https services - foundationwiki | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | https services - loginwiki | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | https services - mediawiki | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | https services - wikibooks | wiki_platform |  |  |  | check in icinga | ✅ deactivated in watchmouse |
|  | https services - wikidata | wiki_platform |  |  |  | check in icinga | ✅ deactivated in watchmouse |
|  | https services - wikinews | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | https services - wikipedia | wiki_platform |  |  |  | check in icinga | ✅ deactivated in watchmouse |
|  | https services - wikiquote | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | https services - wikisource | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | https services - wikiversity | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | https services - wikivoyage | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | https services - wiktionary | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
| N | Icinga (disabled) |  | http://icinga.wikimedia.org | HTTP | icinga | retire | ✅ deactivated in watchmouse |
|  | Images & media (HTTPS) | wiki_platform |  |  |  | check in icinga | ✅ deactivated in watchmouse |
|  | Images & media | wiki_platform |  |  |  | of above | ✅ deactivated in watchmouse |
|  | IRC |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | Mail (SMTP) | public | mx1001.wikimedia.org | SMTP | ops-critical-phone | existing check in icinga | ✅ deactivated in watchmouse |
|  | Mobile site | wiki_platform |  |  | ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | Phabricator |  |  |  |  | check in icinga | ✅ deactivated in watchmouse |
|  | Static assets (CSS/JS) | wiki_platform |  |  | ops | retire | ✅ deactivated in watchmouse |
|  | Static assets (HTTPS - CSS/JS) | wiki_platform |  |  | ops | added to watchrat | ✅ deactivated in watchmouse |
|  | Wiki commons (s4) | wiki_platform |  |  |  | check in icinga | ✅ deactivated in watchmouse |
| N | Wiki commons (s4) - UNCACHED | wiki_platform |  |  |  |  | ✅ deactivated in watchmouse |
|  | Wiki platform [[w:de:Main Page]] (s5) | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
| N | Wiki platform [[w:de:Main Page]] (s5) - UNCACHED | wiki_platform |  |  |  |  | ✅ deactivated in watchmouse |
|  | Wiki platform [[w:dsb:Main Page]] (s3) | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
| N | Wiki platform [[w:dsb:Main Page]] (s3) - UNCACHED | wiki_platform |  |  |  |  | ✅ deactivated in watchmouse |
|  | Wiki platform [[w:en:Main Page]] (s1) | wiki_platform |  |  |  | check in icinga | ✅ deactivated in watchmouse |
| N | Wiki platform [[w:en:Main Page]] (s1) - UNCACHED | wiki_platform |  |  |  |  | ✅ deactivated in watchmouse |
|  | Wiki platform [[w:en:Special:Random]] | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | Wiki platform [[w:fi:Main Page]] (s2) | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
| N | Wiki platform [[w:fi:Main Page]] (s2) - UNCACHED | wiki_platform |  |  |  |  | ✅ deactivated in watchmouse |
|  | Wiki platform [[w:fr:Main Page]] (s6) | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
| N | Wiki platform [[w:fr:Main Page]] (s6) - UNCACHED | wiki_platform |  |  |  |  | ✅ deactivated in watchmouse |
|  | Wiki platform [[w:uk:Main Page]] (s7) | wiki_platform |  |  |  | to watchrat | ✅ deactivated in watchmouse |
| N | Wiki platform [[w:uk:Main Page]] (s7) - UNCACHED | wiki_platform |  |  |  |  | ✅ deactivated in watchmouse |
|  | Wikimedia blog |  |  |  |  | check in icinga | ✅ deactivated in watchmouse |
|  | wikimedia foundation mainpage | wiki_platform |  |  |  | (dupe of above) | ✅ deactivated in watchmouse |
|  | donate http |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | donate https |  |  |  |  | to watchrat (with email alert routing to fr-tech) | ✅ deactivated in watchmouse |
|  | frdata http |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | frdata https |  |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | payments https |  |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | payments listener |  |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | banner load testing chrome8 |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | banner load testing FF3.6 |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | banner load testing ie6 |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | banner load testing ie7 |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | banner load testing ie8 |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | banner load testing safari4 |  |  |  |  |  | ✅ deactivated in watchmouse |
|  | mr1-codfw OOB |  | mr1-codfw.oob.wikimedia.org | PING | wikimedia ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | mr1-eqiad OOB |  | mr1-eqiad.oob.wikimedia.org | PING | wikimedia ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | mr1-eqsin OOB |  | mr1-eqsin.oob.wikimedia.org | PING | wikimedia ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | mr1-esams OOB |  | mr1-esams.oob.wikimedia.org | PING | wikimedia ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | mr1-ulsfo OOB |  | mr1-ulsfo.oob.wikimedia.org | PING | wikimedia ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | Ping offload text-lb.codfw |  | text-lb.codfw.wikimedia.org | PING | wikimedia ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | Ping offload text-lb.eqiad |  | text-lb.eqiad.wikimedia.org | PING | wikimedia ops | existing check in icinga | ✅ deactivated in watchmouse |
|  | secure.wikimedia.org |  |  |  |  | to watchrat | ✅ deactivated in watchmouse |
|  | shop.wikimedia.org | public |  |  |  |  | ✅ deactivated in watchmouse |
|  | en wiki login -script |  |  | Script | ops-non-critical-mail | retire | ✅ deactivated in watchmouse |
|  | icinga https port |  |  |  | Danis | retire | ✅ deactivated in watchmouse |
|  | icinga-https |  |  |  | Danis | retire | ✅ deactivated in watchmouse |
|  | - script |  |  | Script | ops-non-critical-mail | retire | ✅ deactivated in watchmouse |

High level todo list:

  • set up replacement static check capability via prometheus blackbox exporter (nicknamed "watchrat")
  • export/import relevant watchmouse checks, audit checks and move needed checks to watchrat config
  • enable alerting
  • disable watchmouse checks
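
The "move needed checks to watchrat config" step can be pictured as turning a plain URL list into a target group for the Prometheus blackbox exporter. The following is only a minimal sketch, not the actual puppet change: the function name `build_file_sd`, the `module` label, the `http_2xx` module name, and the example URLs are assumptions chosen for illustration. The output shape follows Prometheus file-based service discovery (a list of objects with `targets` and `labels`).

```python
import json


def build_file_sd(urls, module="http_2xx"):
    """Build one file_sd target group from a list of URLs.

    Hypothetical helper: blank entries are dropped and duplicates are
    collapsed, so URLs exported from the old tool can be passed as-is.
    The "module" label name and its default value are assumptions.
    """
    cleaned = sorted({u.strip() for u in urls if u.strip()})
    return [{"targets": cleaned, "labels": {"module": module}}]


if __name__ == "__main__":
    # Illustrative URLs only; the real list lives in the watchrat config.
    exported = [
        "https://shop.wikimedia.org",
        " https://donate.wikimedia.org ",
        "https://shop.wikimedia.org",
        "",
    ]
    print(json.dumps(build_file_sd(exported), indent=2))
```

Deduplicating at this stage mirrors the audit step in the list above: several Watchmouse entries were duplicates of one another, and only checks not already covered by Icinga need to be carried over.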

Event Timeline

After an internal discussion with the team, we are aiming for a target retirement at the end of Jan 2022.

Change 747550 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] prometheus: add blackbox generic "watchrat" http/s static check support

Change 747550 merged by Herron:

[operations/puppet@production] prometheus: add blackbox generic "watchrat" http/s static check support

Change 759297 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] watchrat: check URLs from watchmouse not already covered by icinga

herron updated the task description.
herron updated the task description.

Change 759302 had a related patch set uploaded (by Herron; author: Herron):

[operations/alerts@master] initial sketch of watchrat alert

Change 759297 merged by Herron:

[operations/puppet@production] watchrat: check URLs from watchmouse not already covered by icinga

Change 759302 merged by Herron:

[operations/alerts@master] watchrat: add http probe alerting with warning severity

herron updated the task description.

Change 761064 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] watchrat: route alerts to irc and noc@

Change 761403 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] watchrat: route donate.wm.o alerts to fr-ircmail

Change 761064 merged by Herron:

[operations/puppet@production] watchrat: route alerts to irc and noc@

Change 761403 merged by Herron:

[operations/puppet@production] watchrat: route donate.wm.o alerts to fr-ircmail

herron triaged this task as Medium priority. Feb 10 2022, 5:43 PM
herron updated the task description.

Change 761715 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] watchrat: add shop.wm.o to url list

Change 761715 merged by Herron:

[operations/puppet@production] watchrat: add shop.wm.o to url list

herron claimed this task.
herron subscribed.

All checks are now deactivated, resolving!

Change 771009 had a related patch set uploaded (by Herron; author: Herron):

[operations/alerts@master] watchrat: require 3+ sites to agree on error status before alerting

Change 771009 merged by jenkins-bot:

[operations/alerts@master] watchrat: require 3+ sites to agree on error status before alerting
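
The idea behind change 771009 (alert only when 3 or more sites agree that a URL is failing) can be sketched in a few lines. This is an illustration of the quorum logic under stated assumptions, not the merged alert rule: the function name `should_alert`, the `min_failing_sites` parameter, and the input format (a mapping of probing site to probe success) are hypothetical. The site names are taken from the hosts listed in the table above.

```python
def should_alert(success_by_site, min_failing_sites=3):
    """Return True when enough probing sites report a failure.

    success_by_site maps a site name to True (probe succeeded) or
    False (probe failed). A single site with a local network problem
    therefore cannot trigger an alert on its own.
    """
    failing = [site for site, ok in success_by_site.items() if not ok]
    return len(failing) >= min_failing_sites


if __name__ == "__main__":
    # Two sites failing: below the threshold of three, so no alert.
    partial = {"eqiad": True, "codfw": False, "esams": False,
               "ulsfo": True, "eqsin": True}
    # Three sites failing: the threshold is met.
    widespread = {"eqiad": False, "codfw": False, "esams": False,
                  "ulsfo": True, "eqsin": True}
    print(should_alert(partial), should_alert(widespread))
```

Requiring agreement between sites trades a little detection speed for fewer false positives caused by problems between one probing site and the target.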