Common information
- dashboard: TODO
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- alertname: ManagementSSHDown
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- severity: task
- site: eqsin
- source: prometheus
- team: dcops
Firing alerts
- dashboard: TODO
- description: The management interface at cr3-eqsin.mgmt:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for cr3-eqsin.mgmt:22
- alertname: ManagementSSHDown
- instance: cr3-eqsin.mgmt:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: 603
- severity: task
- site: eqsin
- source: prometheus
- team: dcops
- dashboard: TODO
- description: The management interface at asw1-eqsin.mgmt:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for asw1-eqsin.mgmt:22
- alertname: ManagementSSHDown
- instance: asw1-eqsin.mgmt:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: 604
- severity: task
- site: eqsin
- source: prometheus
- team: dcops
- dashboard: TODO
- description: The management interface at cr2-eqsin.mgmt:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for cr2-eqsin.mgmt:22
- alertname: ManagementSSHDown
- instance: cr2-eqsin.mgmt:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: 604
- severity: task
- site: eqsin
- source: prometheus
- team: dcops