Common information
- dashboard: TODO
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- alertname: ManagementSSHDown
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- severity: task
- site: esams
- source: prometheus
- team: dcops
Firing alerts
- dashboard: TODO
- description: The management interface at re0.cr2-esams:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for re0.cr2-esams:22
- alertname: ManagementSSHDown
- instance: re0.cr2-esams:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: OE14
- severity: task
- site: esams
- source: prometheus
- team: dcops
- Source
- dashboard: TODO
- description: The management interface at re1.cr2-esams:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for re1.cr2-esams:22
- alertname: ManagementSSHDown
- instance: re1.cr2-esams:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: OE14
- severity: task
- site: esams
- source: prometheus
- team: dcops
- Source
- dashboard: TODO
- description: The management interface at asw2-esams.mgmt:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for asw2-esams.mgmt:22
- alertname: ManagementSSHDown
- instance: asw2-esams.mgmt:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: OE16
- severity: task
- site: esams
- source: prometheus
- team: dcops
- Source
- dashboard: TODO
- description: The management interface at re0.cr3-esams:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for re0.cr3-esams:22
- alertname: ManagementSSHDown
- instance: re0.cr3-esams:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: OE16
- severity: task
- site: esams
- source: prometheus
- team: dcops
- Source
- dashboard: TODO
- description: The management interface at re1.cr3-esams:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for re1.cr3-esams:22
- alertname: ManagementSSHDown
- instance: re1.cr3-esams:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: OE16
- severity: task
- site: esams
- source: prometheus
- team: dcops
- Source