
Move Cloud VPS control plane alerting to alertmanager
Open, Needs Triage, Public

Description

Move the following checks from Icinga to Alertmanager, or replace them with a better solution:

  • Galera
  • HAProxy
  • API endpoint checks
  • Instance spreadcheck
  • Nova flavor property check
  • Designate alias-dump check
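
As a rough illustration of what moving a check such as "API endpoint checks" to a Prometheus-based setup can involve, the sketch below probes a set of HTTP endpoints and renders the results in the Prometheus text exposition format, so that an alerting rule could fire when a value is 0. It is not taken from the patches linked in this task, and a real deployment would more likely reuse an existing exporter. The metric name `example_api_endpoint_up`, the service names, and the URLs are all hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def render_probe_metrics(
    endpoints: dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> str:
    """Probe each endpoint and return gauge samples in text exposition format.

    The metric name is a hypothetical placeholder, not one used by WMCS.
    """
    lines = [
        "# HELP example_api_endpoint_up 1 if the endpoint answered with a 2xx status.",
        "# TYPE example_api_endpoint_up gauge",
    ]
    for service, url in sorted(endpoints.items()):
        up = 1 if 200 <= fetch(url) < 300 else 0
        lines.append(f'example_api_endpoint_up{{service="{service}"}} {up}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical endpoints; replace with real API URLs before use.
    print(render_probe_metrics({
        "keystone": "http://keystone.example.invalid:5000/",
        "nova": "http://nova.example.invalid:8774/",
    }))
```

The `fetch` parameter is injectable so the rendering logic can be exercised without network access. An alerting rule would then compare the gauge against 0 for some duration before notifying.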

Event Timeline

Change 953727 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/alerts@master] team-wmcs: Add Galera checks

https://gerrit.wikimedia.org/r/953727

Change 954001 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] Move WMCS haproxy scrapes to WMCS prometheus instance

https://gerrit.wikimedia.org/r/954001

Change 954001 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] Move WMCS haproxy scrapes to WMCS prometheus instance

https://gerrit.wikimedia.org/r/954001

Change 954052 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/alerts@master] team-wmcs: Add CloudLB backend status checks

https://gerrit.wikimedia.org/r/954052

Change 954052 merged by jenkins-bot:

[operations/alerts@master] team-wmcs: Add CloudLB backend status checks

https://gerrit.wikimedia.org/r/954052

Change 954102 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] openstack: Remove a bunch of Icinga checks

https://gerrit.wikimedia.org/r/954102

Change 953727 merged by jenkins-bot:

[operations/alerts@master] team-wmcs: Add Galera checks

https://gerrit.wikimedia.org/r/953727

Change 954102 merged by Majavah:

[operations/puppet@production] openstack: Remove a bunch of Icinga checks

https://gerrit.wikimedia.org/r/954102

Change 960612 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:openstack::galera: drop nrpe process check

https://gerrit.wikimedia.org/r/960612

Change 960612 merged by Majavah:

[operations/puppet@production] P:openstack::galera: drop nrpe process check

https://gerrit.wikimedia.org/r/960612

Change 978476 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:wmcs: disable systemd icinga alerts

https://gerrit.wikimedia.org/r/978476

Change 978476 merged by Majavah:

[operations/puppet@production] P:wmcs: disable systemd icinga alerts

https://gerrit.wikimedia.org/r/978476

Change 979056 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] openstack: spreadcheck: remove in favour of server groups

https://gerrit.wikimedia.org/r/979056

Change 979056 merged by Majavah:

[operations/puppet@production] openstack: spreadcheck: remove in favour of server groups

https://gerrit.wikimedia.org/r/979056

Change 992162 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] openstack: remove spreadcheck, absented

https://gerrit.wikimedia.org/r/992162

Change 992162 merged by Filippo Giunchedi:

[operations/puppet@production] openstack: remove spreadcheck, absented

https://gerrit.wikimedia.org/r/992162