Page MenuHomePhabricator

ManagementSSHDown
Closed, ResolvedPublic

Description

Common information

  • alertname: ManagementSSHDown
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • severity: task
  • site: esams
  • source: prometheus
  • team: dcops

Firing alerts


  • dashboard: TODO
  • description: The management interface at re0.cr2-esams:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
  • summary: Unresponsive management for re0.cr2-esams:22
  • alertname: ManagementSSHDown
  • instance: re0.cr2-esams:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: OE14
  • severity: task
  • site: esams
  • source: prometheus
  • team: dcops
  • Source

  • dashboard: TODO
  • description: The management interface at re1.cr2-esams:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
  • summary: Unresponsive management for re1.cr2-esams:22
  • alertname: ManagementSSHDown
  • instance: re1.cr2-esams:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: OE14
  • severity: task
  • site: esams
  • source: prometheus
  • team: dcops
  • Source

  • dashboard: TODO
  • description: The management interface at asw2-esams.mgmt:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
  • summary: Unresponsive management for asw2-esams.mgmt:22
  • alertname: ManagementSSHDown
  • instance: asw2-esams.mgmt:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: OE16
  • severity: task
  • site: esams
  • source: prometheus
  • team: dcops
  • Source

  • dashboard: TODO
  • description: The management interface at re0.cr3-esams:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
  • summary: Unresponsive management for re0.cr3-esams:22
  • alertname: ManagementSSHDown
  • instance: re0.cr3-esams:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: OE16
  • severity: task
  • site: esams
  • source: prometheus
  • team: dcops
  • Source

  • dashboard: TODO
  • description: The management interface at re1.cr3-esams:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
  • summary: Unresponsive management for re1.cr3-esams:22
  • alertname: ManagementSSHDown
  • instance: re1.cr3-esams:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: OE16
  • severity: task
  • site: esams
  • source: prometheus
  • team: dcops
  • Source