
Configure prometheus monitoring for Ceph
Closed, Resolved, Public

Description

The Ceph manager has a plugin that exposes metrics to Prometheus: https://docs.ceph.com/docs/master/mgr/prometheus/
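The plugin serves metrics in the Prometheus text exposition format, one `name{labels} value` sample per line with `# HELP` / `# TYPE` comment lines. Below is a minimal parser sketch for that format, useful when checking by hand what a manager returns. It is not the parser Prometheus uses: it skips timestamps and escaped quotes in label values. The `SAMPLE` payload is hypothetical and illustrative only. It was not captured from the cloudcephmon hosts.

```python
import re

# One sample line of the text exposition format:
#   metric_name{optional="labels"} value
# Timestamps and escaped quotes inside label values are not handled here.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical payload shaped like the manager's /metrics output.
SAMPLE = """\
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 0.0
ceph_osd_up{ceph_daemon="osd.0"} 1.0
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```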

The Ceph manager is configured to run on all three monitor hosts (cloudcephmon100[1-3]), but only one instance is active at any given time. We'll need to configure a load balancer with all three cloudcephmon100[1-3] hosts as backends on TCP port 9283. Alternatively, the current version of Ceph allows all manager hosts to be added as Prometheus scrape targets.
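Adding every manager host as a scrape target means expanding cloudcephmon100[1-3] into three `host:9283` targets under one Prometheus `scrape_configs` entry (`job_name` plus `static_configs`/`targets`). Below is a sketch that builds that entry as a Python dict. The job name `ceph_mgr` is hypothetical. The hosts are left without a domain suffix because the task doesn't give FQDNs. This is not the scrape config merged for this task. How a standby manager answers a scrape depends on the Ceph release, and the sketch does not model it.

```python
import json
import re


def expand_hosts(pattern):
    """Expand one numeric range such as 'cloudcephmon100[1-3]'.

    Zero-padded ranges like '[01-03]' lose their padding in this sketch.
    """
    match = re.fullmatch(
        r"(?P<prefix>[^\[]*)\[(?P<lo>\d+)-(?P<hi>\d+)\](?P<suffix>.*)", pattern
    )
    if match is None:
        return [pattern]  # no range present: return the name unchanged
    lo, hi = int(match.group("lo")), int(match.group("hi"))
    return [f"{match.group('prefix')}{n}{match.group('suffix')}" for n in range(lo, hi + 1)]


def build_scrape_config(pattern, port=9283, job_name="ceph_mgr"):
    """Return one scrape_configs entry listing every manager host (job_name is hypothetical)."""
    targets = [f"{host}:{port}" for host in expand_hosts(pattern)]
    return {"job_name": job_name, "static_configs": [{"targets": targets}]}


if __name__ == "__main__":
    print(json.dumps(build_scrape_config("cloudcephmon100[1-3]"), indent=2))
```

Listing all three hosts avoids a dependency on the load balancer. The trade-off is that Prometheus also scrapes the two standby managers.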

  • Enable the Prometheus plugin
  • Build or find a Grafana dashboard
  • Identify metrics for alerting
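A natural first alerting metric is `ceph_health_status`. The mgr prometheus module reports it as 0 for HEALTH_OK, 1 for HEALTH_WARN and 2 for HEALTH_ERR; verify this against the deployed release. Below is a sketch of how that value could be mapped onto standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is one candidate check for illustration. It does not reproduce the Icinga alerts that were added for this task, and the function name is hypothetical.

```python
# Encoding the mgr prometheus module uses for ceph_health_status.
HEALTH_NAMES = {0: "HEALTH_OK", 1: "HEALTH_WARN", 2: "HEALTH_ERR"}

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_FOR_STATUS = {0: OK, 1: WARNING, 2: CRITICAL}
STATE_LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_health(value):
    """Map a scraped ceph_health_status value to (exit_code, message)."""
    if value is None:
        return UNKNOWN, "UNKNOWN: ceph_health_status sample not found"
    # 0.0 == 0 and hashes equally, so float samples match the int keys.
    if value not in HEALTH_NAMES:
        return UNKNOWN, f"UNKNOWN: unexpected ceph_health_status value {value}"
    status = int(value)
    code = STATE_FOR_STATUS[status]
    return code, f"{STATE_LABELS[code]}: cluster reports {HEALTH_NAMES[status]}"


if __name__ == "__main__":
    for sample in (0.0, 1.0, 2.0, None):
        print(check_health(sample))
```

A missing sample maps to UNKNOWN rather than OK, so a broken exporter does not look like a healthy cluster.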

Event Timeline

Grafana dashboards that work with the Ceph Prometheus plugin can be found at https://github.com/ceph/ceph/tree/master/monitoring/grafana

Change 558707 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/dns@master] add forward and reverse for cloudcephmgr.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/558707

Change 559110 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] lvs ceph: add cloudceph service and cluster

https://gerrit.wikimedia.org/r/559110

Change 558707 merged by Jhedden:
[operations/dns@master] add forward and reverse for cloudceph.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/558707

Change 560410 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: allow lvs traffic to manager exporter

https://gerrit.wikimedia.org/r/560410

Change 560410 merged by Jhedden:
[operations/puppet@production] ceph: allow lvs traffic to manager exporter

https://gerrit.wikimedia.org/r/560410

Change 559110 merged by Jhedden:
[operations/puppet@production] lvs ceph: add cloudceph service and cluster

https://gerrit.wikimedia.org/r/559110

Mentioned in SAL (#wikimedia-operations) [2020-01-07T17:13:22Z] <vgutierrez> restarting pybal on lvs1016 - T240715

Mentioned in SAL (#wikimedia-operations) [2020-01-07T17:18:04Z] <vgutierrez> restarting pybal on lvs1015 - T240715

Change 562574 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: Update ceph role system::role name

https://gerrit.wikimedia.org/r/562574

Change 562574 merged by Jhedden:
[operations/puppet@production] ceph: Update ceph role desc and lvs pool map

https://gerrit.wikimedia.org/r/562574

Change 562610 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: update ferm for prometheus exporter

https://gerrit.wikimedia.org/r/562610

Change 562610 merged by Jhedden:
[operations/puppet@production] ceph: update ferm for prometheus exporter

https://gerrit.wikimedia.org/r/562610

Change 562616 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: add prometheus servers to ferm rules

https://gerrit.wikimedia.org/r/562616

Change 562616 merged by Jhedden:
[operations/puppet@production] ceph: add prometheus servers to ferm rules

https://gerrit.wikimedia.org/r/562616

Change 562637 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] lvs: update cloudceph proxy check url

https://gerrit.wikimedia.org/r/562637

Change 562979 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: add prometheus scrape config

https://gerrit.wikimedia.org/r/562979

Change 562637 abandoned by Jhedden:
lvs: update cloudceph proxy check url

https://gerrit.wikimedia.org/r/562637

Change 562979 abandoned by Jhedden:
ceph: add prometheus scrape config

https://gerrit.wikimedia.org/r/562979

Change 563190 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: add wmcs prometheus scrape config

https://gerrit.wikimedia.org/r/563190

Change 563190 merged by Jhedden:
[operations/puppet@production] ceph: add wmcs prometheus scrape config

https://gerrit.wikimedia.org/r/563190

JHedden triaged this task as Medium priority.
JHedden updated the task description.

Change 572355 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] icinga: add cloudvps ceph alerts

https://gerrit.wikimedia.org/r/572355

Change 572355 merged by Dzahn:
[operations/puppet@production] icinga: add cloudvps ceph alerts

https://gerrit.wikimedia.org/r/572355