This ticket was created automatically by a new type of monitoring.
We received these first alerts while still testing it. They were false positives: not actual service problems, but bugs in the configuration of the monitoring check.
We are still working on that and have agreed to deactivate the check for the inspiration week. It will be re-enabled and used in production afterwards.
The fact that this ticket was auto-created is the nice part, as it proves that automatic ticket creation is working.
We are now reusing this ticket as a more general "monitoring for VRTS" ticket in order to finish that work.
The original automatic ticket text is below:
Common information
- dashboard: https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All
- runbook: https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown
- alertname: ProbeDown
- instance: otrs1001:1443
- job: probes/custom
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- team: serviceops-collab
Firing alerts
- dashboard: https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All
- description: otrs1001:1443 failed when probed by http_ticket_wikimedia_org_ip4 from eqiad. Availability is 0%.
- logs: https://logstash.wikimedia.org/app/dashboards#/view/f3e709c0-a5f8-11ec-bf8e-43f1807d5bc2?_g=(filters:!((query:(match_phrase:(service.name:http_ticket_wikimedia_org_ip4)))))
- runbook: https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown
- summary: Service otrs1001:1443 has failed probes (http_ticket_wikimedia_org_ip4)
- address: 10.64.16.39
- alertname: ProbeDown
- family: ip4
- instance: otrs1001:1443
- job: probes/custom
- module: http_ticket_wikimedia_org_ip4
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- team: serviceops-collab
- dashboard: https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All
- description: otrs1001:1443 failed when probed by http_ticket_wikimedia_org_ip6 from eqiad. Availability is 0%.
- logs: https://logstash.wikimedia.org/app/dashboards#/view/f3e709c0-a5f8-11ec-bf8e-43f1807d5bc2?_g=(filters:!((query:(match_phrase:(service.name:http_ticket_wikimedia_org_ip6)))))
- runbook: https://wikitech.wikimedia.org/wiki/Network_monitoring#ProbeDown
- summary: Service otrs1001:1443 has failed probes (http_ticket_wikimedia_org_ip6)
- address: 2620:0:861:102:10:64:16:39
- alertname: ProbeDown
- family: ip6
- instance: otrs1001:1443
- job: probes/custom
- module: http_ticket_wikimedia_org_ip6
- prometheus: ops
- severity: critical
- site: eqiad
- source: prometheus
- team: serviceops-collab