
Configure prometheus monitoring for Ceph
Closed, Resolved · Public

Description

The Ceph manager has a plugin that exposes metrics to Prometheus: https://docs.ceph.com/docs/master/mgr/prometheus/

The Ceph manager is configured to run on all 3 monitor hosts (cloudcephmon100[1-3]), but it is only active on one host at any given time. One way to handle this is a load balancer with all 3 cloudcephmon100[1-3] hosts as backends on TCP port 9283. The current version of Ceph also allows all manager hosts to be added directly as Prometheus scrape targets.
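As a rough illustration of the "add all manager hosts to the scrape target" option, the following Python sketch builds a scrape job listing all three hosts on port 9283. The job name `ceph_mgr`, the helper `build_scrape_job`, and the use of bare hostnames are assumptions made for this example and are not taken from the merged puppet patches. `honor_labels` is included because the Ceph prometheus module documentation recommends it.

```python
import json

# Hosts and port as named in the task description.
# Bare hostnames are an assumption; real targets would likely be FQDNs.
MGR_HOSTS = ["cloudcephmon1001", "cloudcephmon1002", "cloudcephmon1003"]
EXPORTER_PORT = 9283


def build_scrape_job(hosts, port, job_name="ceph_mgr"):
    """Return a Prometheus scrape job (as a dict) covering every manager host.

    job_name is a hypothetical value chosen for this sketch.
    """
    targets = [f"{host}:{port}" for host in hosts]
    return {
        "job_name": job_name,
        # Keep the labels exported by the Ceph manager module.
        "honor_labels": True,
        "static_configs": [{"targets": targets}],
    }


job = build_scrape_job(MGR_HOSTS, EXPORTER_PORT)
# JSON is valid YAML, so this output can be pasted under scrape_configs.
print(json.dumps(job, indent=2))
```

Listing every manager host keeps the scrape configuration independent of which host currently holds the active manager.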

  • Enable the prometheus plugin
  • Build / Find grafana dashboard
  • Identify metrics for alerting
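For the "identify metrics for alerting" item, one possible starting point is to list the metric names the exporter exposes and review them by hand. The sketch below parses Prometheus text exposition output and returns the distinct metric names. The helper `metric_names` and the sample payload are illustrative assumptions; the sample is not captured from the cloudceph cluster, and the task does not state which metrics were chosen.

```python
def metric_names(exposition_text):
    """Return the sorted, distinct metric names in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace (value).
        name = line.split("{", 1)[0].split()[0]
        names.add(name)
    return sorted(names)


# Hypothetical sample payload, for illustration only.
SAMPLE = """\
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 0.0
ceph_osd_up{ceph_daemon="osd.0"} 1.0
ceph_osd_up{ceph_daemon="osd.1"} 1.0
"""

print(metric_names(SAMPLE))
```

In practice the input would be the body returned by a manager's exporter on port 9283 rather than a hard-coded string.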

Details

Related Gerrit Patches:
[operations/puppet@production] ceph: add wmcs prometheus scrape config
[operations/puppet@production] ceph: add prometheus scrape config
[operations/puppet@production] lvs: update cloudceph proxy check url
[operations/puppet@production] ceph: add prometheus servers to ferm rules
[operations/puppet@production] ceph: update ferm for prometheus exporter
[operations/puppet@production] ceph: Update ceph role desc and lvs pool map
[operations/puppet@production] lvs ceph: add cloudceph service and cluster
[operations/puppet@production] ceph: allow lvs traffic to manager exporter
[operations/dns@master] add forward and reverse for cloudceph.svc.eqiad.wmnet

Event Timeline

JHedden created this task. Dec 13 2019, 8:11 PM

Grafana dashboards that work with the Ceph Prometheus plugin can be found at https://github.com/ceph/ceph/tree/master/monitoring/grafana

JHedden updated the task description. Dec 17 2019, 10:15 PM
JHedden updated the task description. Dec 17 2019, 10:18 PM

Change 558707 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/dns@master] add forward and reverse for cloudcephmgr.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/558707

Change 559110 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] lvs ceph: add cloudceph service and cluster

https://gerrit.wikimedia.org/r/559110

Change 558707 merged by Jhedden:
[operations/dns@master] add forward and reverse for cloudceph.svc.eqiad.wmnet

https://gerrit.wikimedia.org/r/558707

Change 560410 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: allow lvs traffic to manager exporter

https://gerrit.wikimedia.org/r/560410

Change 560410 merged by Jhedden:
[operations/puppet@production] ceph: allow lvs traffic to manager exporter

https://gerrit.wikimedia.org/r/560410

Change 559110 merged by Jhedden:
[operations/puppet@production] lvs ceph: add cloudceph service and cluster

https://gerrit.wikimedia.org/r/559110

Mentioned in SAL (#wikimedia-operations) [2020-01-07T17:13:22Z] <vgutierrez> restarting pybal on lvs1016 - T240715

Mentioned in SAL (#wikimedia-operations) [2020-01-07T17:18:04Z] <vgutierrez> restarting pybal on lvs1015 - T240715

Change 562574 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: Update ceph role system::role name

https://gerrit.wikimedia.org/r/562574

Change 562574 merged by Jhedden:
[operations/puppet@production] ceph: Update ceph role desc and lvs pool map

https://gerrit.wikimedia.org/r/562574

Change 562610 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: update ferm for prometheus exporter

https://gerrit.wikimedia.org/r/562610

Change 562610 merged by Jhedden:
[operations/puppet@production] ceph: update ferm for prometheus exporter

https://gerrit.wikimedia.org/r/562610

Change 562616 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: add prometheus servers to ferm rules

https://gerrit.wikimedia.org/r/562616

Change 562616 merged by Jhedden:
[operations/puppet@production] ceph: add prometheus servers to ferm rules

https://gerrit.wikimedia.org/r/562616

Change 562637 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] lvs: update cloudceph proxy check url

https://gerrit.wikimedia.org/r/562637

Change 562979 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: add prometheus scrape config

https://gerrit.wikimedia.org/r/562979

Change 562637 abandoned by Jhedden:
lvs: update cloudceph proxy check url

https://gerrit.wikimedia.org/r/562637

JHedden updated the task description. Wed, Jan 8, 10:36 PM
JHedden updated the task description. Wed, Jan 8, 10:39 PM

Change 562979 abandoned by Jhedden:
ceph: add prometheus scrape config

https://gerrit.wikimedia.org/r/562979

Change 563190 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] ceph: add wmcs prometheus scrape config

https://gerrit.wikimedia.org/r/563190

Change 563190 merged by Jhedden:
[operations/puppet@production] ceph: add wmcs prometheus scrape config

https://gerrit.wikimedia.org/r/563190

JHedden updated the task description. Thu, Jan 9, 11:16 PM
JHedden closed this task as Resolved. Wed, Jan 15, 3:57 PM
JHedden triaged this task as Medium priority.
JHedden updated the task description.