- modules/profile/manifests/etherpad.pp: monitoring::service { 'etherpad-lite-http':
- modules/profile/manifests/etherpad.pp: nrpe::monitor_service { 'etherpad-lite-proc':
- modules/profile/manifests/releases/common.pp: monitoring::service { 'https_releases':
- modules/profile/manifests/releases/mediawiki.pp: monitoring::service { 'http_releases_jenkins':
- modules/profile/manifests/gerrit/proxy.pp: monitoring::service { 'https':
- modules/profile/manifests/microsites/peopleweb.pp: monitoring::service { 'https-peopleweb':
- modules/profile/manifests/microsites/peopleweb.pp: monitoring::service { 'https-peopleweb-expiry':
- modules/profile/manifests/microsites/static_codereview.pp: monitoring::service { 'static-codereview-http':
- modules/profile/manifests/microsites/static_rt.pp: monitoring::service { 'static-rt-https':
- modules/profile/manifests/vrts.pp: monitoring::service { 'smtp':
- modules/profile/manifests/phabricator/main.pp: monitoring::service { 'smtp':
- modules/profile/manifests/gerrit.pp: monitoring::service { 'gerrit_ssh':
Related Objects
Mentioned In
- T329587: http::blackbox monitoring for all https services (let all serviceops-collab alertmanager alerts create tickets)
- T334250: sre-collab/releng: convert or remove all nrpe::monitor_service checks
- T316022: Clean up check_ssl checks from puppet also covered by blackbox prober
Mentioned Here
- T334250: sre-collab/releng: convert or remove all nrpe::monitor_service checks
- T329587: http::blackbox monitoring for all https services (let all serviceops-collab alertmanager alerts create tickets)
Event Timeline
The goal is to go through Icinga, look at all sre-collab-owned hosts and services, and identify any custom checks that are NOT base checks that every host has (disk space, CPU, etc.) and that have NOT already been replaced by the recently added blackbox::http checks.
If there are none, this is done.
If some have already been replaced by blackbox checks, remove them.
If there is anything else special, ask how it can be converted to Alertmanager.
(sprint week related)
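The triage rules above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Check` type, the `BASE_CHECKS` set, the host names, and the `triage` function are all hypothetical and not part of Icinga, Puppet, or Alertmanager. In practice the inputs would come from the Icinga UI or from grepping the Puppet manifests.

```python
from dataclasses import dataclass

# Hypothetical names for the base checks every host has (assumption, not the real list).
BASE_CHECKS = frozenset({"disk_space", "cpu"})


@dataclass(frozen=True)
class Check:
    host: str  # hypothetical host name
    name: str  # Icinga service name, e.g. 'etherpad-lite-http'


def triage(checks, blackbox_covered):
    """Split custom checks into those to remove and those to ask about.

    checks: iterable of Check objects found in Icinga for the team's hosts.
    blackbox_covered: set of check names already replaced by blackbox checks.
    """
    remove, ask = [], []
    for check in checks:
        if check.name in BASE_CHECKS:
            continue  # base checks are out of scope for this task
        if check.name in blackbox_covered:
            remove.append(check)  # already replaced: the Icinga check can go
        else:
            ask.append(check)  # special case: ask how to convert it
    # The task is done once nothing is left in either bucket.
    return {"remove": remove, "ask": ask, "done": not remove and not ask}


if __name__ == "__main__":
    found = [
        Check("host1", "disk_space"),
        Check("host1", "etherpad-lite-http"),
        Check("host2", "gerrit_ssh"),
    ]
    result = triage(found, blackbox_covered={"etherpad-lite-http"})
    for bucket in ("remove", "ask"):
        print(bucket, [c.name for c in result[bucket]])
```

Keeping "remove" and "ask" as separate buckets mirrors the two follow-up actions described in the comment; the task only counts as done when both are empty.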
One check was found and replaced with a Prometheus alert: the one for https://static-codereview.wikimedia.org
Icinga alerts to convert:
Convert to blackbox::http check for collab team:
- modules/profile/manifests/etherpad.pp: monitoring::service { 'etherpad-lite-http':
- modules/profile/manifests/releases/common.pp: monitoring::service { 'https_releases':
- modules/profile/manifests/releases/mediawiki.pp: monitoring::service { 'http_releases_jenkins':
- modules/profile/manifests/gerrit/proxy.pp: monitoring::service { 'https':
- modules/profile/manifests/microsites/peopleweb.pp: monitoring::service { 'https-peopleweb':
- modules/profile/manifests/microsites/peopleweb.pp: monitoring::service { 'https-peopleweb-expiry':
- modules/profile/manifests/microsites/static_codereview.pp: monitoring::service { 'static-codereview-http':
- modules/profile/manifests/microsites/static_rt.pp: monitoring::service { 'static-rt-https':
serviceops:
- modules/noc/manifests/init.pp: monitoring::service { 'https-noc':
- modules/noc/manifests/init.pp: monitoring::service { 'https-noc-ssl-expiry':
Not sure yet how to replace:
- modules/profile/manifests/vrts.pp: monitoring::service { 'smtp':
- modules/profile/manifests/phabricator/main.pp: monitoring::service { 'smtp':
- modules/profile/manifests/gerrit.pp: monitoring::service { 'gerrit_ssh':
Change 902783 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] etherpad: replace Icinga with Prometheus monitoring
Change 902785 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] releases: remove Icinga monitoring
Change 902788 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] releases-jenkins: replace Icinga with Prometheus monitoring
Change 902799 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] gerrit: replace Icinga with Prometheus monitoring
Change 902801 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] peopleweb: replace Icinga with Prometheus monitoring
Change 902802 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] miscweb/static_rt: replace Icinga with Prometheus monitoring
Change 902802 merged by Dzahn:
[operations/puppet@production] miscweb/static_rt: replace Icinga with Prometheus monitoring
Change 903318 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] alertmanager: send sre-collab alerts to -operations and -sre-collab
Change 903318 merged by Dzahn:
[operations/puppet@production] alertmanager: send sre-collab alerts to -operations and -sre-collab
Change 902801 merged by Dzahn:
[operations/puppet@production] peopleweb: replace Icinga with Prometheus monitoring
Change 902785 merged by Dzahn:
[operations/puppet@production] releases: remove Icinga monitoring
Change 902783 merged by Dzahn:
[operations/puppet@production] etherpad: replace Icinga with Prometheus monitoring
Change 902788 merged by Dzahn:
[operations/puppet@production] releases-jenkins: replace Icinga with Prometheus monitoring
Change 903801 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] noc: replace Icinga with Prometheus monitoring
Change 903805 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] vrts: replace Icinga with Prometheus for SMTP monitoring
Change 903826 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] phabricator: replace Icinga with Prometheus for SMTP monitoring
Change 904173 had a related patch set uploaded (by Jelto; author: Jelto):
[operations/puppet@production] releases: rename new blackbox check for jenkins login page
Change 904173 merged by Dzahn:
[operations/puppet@production] releases: rename new blackbox check for jenkins login page
Change 903826 merged by Dzahn:
[operations/puppet@production] phabricator: replace Icinga with Prometheus for SMTP monitoring
Change 903801 merged by Dzahn:
[operations/puppet@production] noc: replace Icinga with Prometheus monitoring
Change 903805 merged by Dzahn:
[operations/puppet@production] vrts: replace Icinga with Prometheus for SMTP monitoring
Change 904856 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] etherpad: remove process monitoring
Change 904857 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] gerrit: replace Icinga monitoring with Prometheus, ssh port 29418
Change 905178 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):
[operations/puppet@production] noc: Fix alertmanager severity
Change 905178 merged by Clément Goubert:
[operations/puppet@production] noc: Fix alertmanager severity
Change 904856 merged by Dzahn:
[operations/puppet@production] etherpad: remove process monitoring
The Gerrit monitoring switch is still in discussion/review but will be done as part of T329587.
This task is done, but the work continues in T334250 for a different class of monitoring checks.
Change 904857 merged by Dzahn:
[operations/puppet@production] gerrit: replace Icinga monitoring with Prometheus, ssh port 29418
Change 902799 merged by Dzahn:
[operations/puppet@production] gerrit: add Prometheus blackbox https monitoring
Change 913262 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] gerrit: follow_redirects in blackbox::http monitoring
Change 913262 merged by Dzahn:
[operations/puppet@production] gerrit: follow_redirects in blackbox::http monitoring
Change 913272 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] gerrit: accept http status 404 in blackbox http monitor, for now
Change 913272 merged by Dzahn:
[operations/puppet@production] gerrit: accept http status 404 in blackbox http monitor, for now
Change 913273 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] gerrit: accept 200 in addition to 302 and 404 in monitoring
Change 913273 merged by Dzahn:
[operations/puppet@production] gerrit: accept 200 in addition to 302 and 404 in monitoring
Change 913275 had a related patch set uploaded (by Dzahn; author: Dzahn):
[operations/puppet@production] gerrit: do not monitor the replica
Change 913275 merged by Dzahn:
[operations/puppet@production] gerrit: do not monitor the replica