Common information
- alertname: SystemdUnitDown
- cluster: wmcs
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
Firing alerts
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephmon1001
- description: Unit prometheus-node-kernel-panic.service on node cloudcephmon1001 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1001 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephmon1001:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephmon1002
- description: Unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1002 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephmon1002:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephmon1003
- description: Unit prometheus-node-kernel-panic.service on node cloudcephmon1003 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephmon1003 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephmon1003:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1012
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1012 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1012 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1012:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1013
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1013 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1013 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1013:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1014
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1014 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1014 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1014:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1015
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1015 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1015 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1015:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1016
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1016 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1016 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1016:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1017
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1017 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1017 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1017:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1018
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1018 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1018 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1018:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1019
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1019 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1019 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1019:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1020
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1020 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1020 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1020:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1022
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1022 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1022 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1022:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1023
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1023 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1023 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1023:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs
- dashboard: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcephosd1024
- description: Unit prometheus-node-kernel-panic.service on node cloudcephosd1024 has been down for long.
- runbook: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown
- summary: The systemd unit prometheus-node-kernel-panic.service on node cloudcephosd1024 has been failing for more than two hours.
- alertname: SystemdUnitDown
- cluster: wmcs
- instance: cloudcephosd1024:9100
- job: node
- name: prometheus-node-kernel-panic.service
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- state: failed
- team: wmcs