Page MenuHomePhabricator

ManagementSSHDown
Closed, ResolvedPublic

Description

Common information

  • alertname: ManagementSSHDown
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • severity: task
  • site: ulsfo
  • source: prometheus
  • team: dcops

Firing alerts


  • dashboard: TODO
  • description: The management interface at asw2-ulsfo.mgmt:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
  • summary: Unresponsive management for asw2-ulsfo.mgmt:22
  • alertname: ManagementSSHDown
  • instance: asw2-ulsfo.mgmt:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: 103.02.22
  • severity: task
  • site: ulsfo
  • source: prometheus
  • team: dcops
  • Source

  • dashboard: TODO
  • description: The management interface at cr3-ulsfo.mgmt:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
  • summary: Unresponsive management for cr3-ulsfo.mgmt:22
  • alertname: ManagementSSHDown
  • instance: cr3-ulsfo.mgmt:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: 103.02.22
  • severity: task
  • site: ulsfo
  • source: prometheus
  • team: dcops
  • Source

  • dashboard: TODO
  • description: The management interface at cr4-ulsfo.mgmt:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
  • summary: Unresponsive management for cr4-ulsfo.mgmt:22
  • alertname: ManagementSSHDown
  • instance: cr4-ulsfo.mgmt:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: 103.02.23
  • severity: task
  • site: ulsfo
  • source: prometheus
  • team: dcops
  • Source

Event Timeline