Tracking task to fail services back to alert1001 reasonably soon after switch maintenance has completed
Description
Details
| Status | Assigned | Task |
|---|---|---|
| Resolved | MoritzMuehlenhoff | T253824 planned upstream deprecation of the ssh-rsa signing algorithm (RSA with SHA-1) |
| Resolved | ayounsi | T254013 all network devices must run OpenSSH >= 7.2p1 but != 7.4p1 |
| Resolved | ayounsi | T317175 Junos: resolve DNS through mgmt_junos |
| Resolved | ayounsi | T327862 Use mgmt_junos on all network devices |
| Restricted Task | | |
| Resolved | ayounsi | T316539 Upgrade network devices to Junos 20+ |
| Resolved | ayounsi | T327248 eqiad/codfw virtual-chassis upgrades |
| Resolved | Clement_Goubert | T327920 March 2023 Datacenter Switchover |
| Resolved | ayounsi | T331882 eqiad row C switches upgrade |
| Resolved | herron | T333478 failover alert1001 to alert2001 |
| Resolved | herron | T333837 failover alert2001 to alert1001 |
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2023-04-24T14:07:09Z] <herron> beginning alert host failover from alert2001 to alert1001 T333837
Mentioned in SAL (#wikimedia-operations) [2023-04-24T14:07:43Z] <herron> disabled icinga meta monitoring on wikitech-static T333837
Change 910878 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] Revert "alerting_host: failover icinga and alertmanger from eqiad to codfw"
Change 910879 had a related patch set uploaded (by Herron; author: Herron):
[operations/dns@master] Revert "dns: repoint alert host services to alert2001"
Change 910878 merged by Herron:
[operations/puppet@production] Revert "alerting_host: failover icinga and alertmanger from eqiad to codfw"
Change 910879 merged by Herron:
[operations/dns@master] Revert "dns: repoint alert host services to alert2001"
Mentioned in SAL (#wikimedia-operations) [2023-04-24T14:31:43Z] <herron> re-enabled icinga meta monitoring on wikitech-static T333837