Common information
- dashboard: TODO
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- alertname: ManagementSSHDown
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- severity: task
- site: ulsfo
- source: prometheus
- team: dcops
Firing alerts
- dashboard: TODO
- description: The management interface at asw2-ulsfo.mgmt:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for asw2-ulsfo.mgmt:22
- alertname: ManagementSSHDown
- instance: asw2-ulsfo.mgmt:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: 103.02.22
- severity: task
- site: ulsfo
- source: prometheus
- team: dcops
- Source
- dashboard: TODO
- description: The management interface at cr3-ulsfo.mgmt:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for cr3-ulsfo.mgmt:22
- alertname: ManagementSSHDown
- instance: cr3-ulsfo.mgmt:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: 103.02.22
- severity: task
- site: ulsfo
- source: prometheus
- team: dcops
- Source
- dashboard: TODO
- description: The management interface at cr4-ulsfo.mgmt:22 has been unresponsive for multiple hours.
- runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
- summary: Unresponsive management for cr4-ulsfo.mgmt:22
- alertname: ManagementSSHDown
- instance: cr4-ulsfo.mgmt:22
- job: probes/mgmt
- module: ssh_banner
- prometheus: ops
- rack: 103.02.23
- severity: task
- site: ulsfo
- source: prometheus
- team: dcops
- Source
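Manual check (sketch): the three instances above can be re-probed by hand to confirm whether the management interfaces are still unresponsive. The Python below is a minimal sketch, not the actual probe configuration. It assumes the `ssh_banner` module amounts to a TCP connect followed by a check that the target sends a banner beginning with `SSH-2.0-`. It also assumes the short `.mgmt` names resolve from the host where it runs. The helper names `split_instance`, `looks_like_ssh_banner`, and `probe_ssh_banner` are hypothetical.

```python
import socket


def split_instance(instance: str) -> tuple[str, int]:
    """Split an alert 'instance' label such as 'cr3-ulsfo.mgmt:22' into host and port."""
    host, _, port = instance.rpartition(":")
    return host, int(port)


def looks_like_ssh_banner(data: bytes) -> bool:
    """Assumption: a healthy target greets with an SSH protocol 2.0 identification string."""
    return data.startswith(b"SSH-2.0-")


def probe_ssh_banner(instance: str, timeout: float = 5.0) -> bool:
    """Return True if the target accepts a TCP connection and sends an SSH banner."""
    host, port = split_instance(instance)
    try:
        # The timeout applies to the connect and to the subsequent recv.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return looks_like_ssh_banner(sock.recv(256))
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Instance labels copied from the firing alerts above.
    for instance in ("asw2-ulsfo.mgmt:22", "cr3-ulsfo.mgmt:22", "cr4-ulsfo.mgmt:22"):
        state = "responding" if probe_ssh_banner(instance) else "unresponsive"
        print(f"{instance}: {state}")
```

A result of "unresponsive" from a host that normally reaches the management network would be consistent with the alert. It does not distinguish a dead management interface from a network path problem, so the runbook linked above remains the reference for next steps.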