
Fail over alert1001 to alert2001
Closed, Resolved, Public

Description

To avoid a monitoring outage during the maintenance described in the parent task, we need to fail the services on alert1001 over to alert2001 before the maintenance begins.

Event Timeline

Change 899629 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] alerting_host: failover icinga and alertmanger from eqiad to codfw

https://gerrit.wikimedia.org/r/899629

herron triaged this task as High priority (Mar 29 2023, 6:14 PM).

Change 904614 had a related patch set uploaded (by Herron; author: Herron):

[operations/dns@master] dns: repoint alert host services to alert2001

https://gerrit.wikimedia.org/r/904614

Disabled Icinga meta-monitoring on wikitech-static.

Change 899629 merged by Herron:

[operations/puppet@production] alerting_host: failover icinga and alertmanger from eqiad to codfw

https://gerrit.wikimedia.org/r/899629

Change 904614 merged by Herron:

[operations/dns@master] dns: repoint alert host services to alert2001

https://gerrit.wikimedia.org/r/904614

Re-enabled Icinga meta-monitoring on wikitech-static.

Resolving, as the alerting host services (Icinga and Alertmanager) are now running from alert2001.