
Grafana "cloud-vps-project-board" needs to be migrated from Graphite to Prometheus
Closed, Resolved, Public

Description

The dashboard at https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board is currently the most comprehensive way to look at the system health across a Cloud VPS project. This dashboard is using the legacy Graphite data collected by Diamond rather than the newer Prometheus data collection system.

T210993: Deprecate Diamond collectors in Cloud VPS has been working towards the removal of Diamond collectors for quite some time, and https://gerrit.wikimedia.org/r/c/operations/puppet/+/632471 is currently proposed to complete the removal. When this happens, the related Graphite data will stop being collected and the dashboard will become stale.
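
As a rough illustration of what the migration involves, the sketch below translates a dotted Graphite path into a PromQL selector with labels. The path layout (`<project>.<instance>.<metric...>`), the label names, and the metric-name mapping are all assumptions made for this example, not the dashboard's actual queries.

```python
# Hypothetical mapping from Diamond metric suffixes to Prometheus metric names.
# These pairs are illustrative assumptions, not the real dashboard queries.
METRIC_MAP = {
    "loadavg.01": "node_load1",
    "memory.MemFree": "node_memory_MemFree_bytes",
}


def graphite_to_promql(path: str) -> str:
    """Translate '<project>.<instance>.<metric...>' into a PromQL selector."""
    parts = path.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected <project>.<instance>.<metric>, got {path!r}")
    project, instance, suffix = parts[0], parts[1], ".".join(parts[2:])
    try:
        metric = METRIC_MAP[suffix]
    except KeyError:
        raise ValueError(f"no Prometheus equivalent known for {suffix!r}") from None
    # Assumed label names; a real setup may label instances differently.
    return f'{metric}{{project="{project}", instance="{instance}"}}'
```

Each dashboard panel would need its own entry in such a mapping, and some Diamond metrics may have no direct Prometheus counterpart.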

Related Objects

Status    Assigned
Resolved  fgiunchedi
Resolved  colewhite
Resolved  MoritzMuehlenhoff
Resolved  taavi
Resolved  taavi
Resolved  dcaro
Resolved  taavi
Resolved  taavi
Resolved  taavi
Resolved  JHedden
Resolved  JHedden
Resolved  Bstorm
Resolved  bd808
Resolved  Andrew
Declined  None
Resolved  nskaggs
Resolved  taavi
Resolved  jbond
Resolved  taavi
Resolved  taavi
Resolved  taavi
Resolved  taavi
Resolved  dcaro
Resolved  Andrew

Event Timeline

The code at https://gerrit.wikimedia.org/r/c/operations/puppet/+/632570/3/modules/profile/manifests/wmcs/instance.pp talks about checking "puppet freshness".

The dashboard linked here, https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board?orgId=1, has some "Puppet agent" panels, but only 2, and they show no data.

Is that the same thing or unrelated?

If those puppet freshness checks don't exist or are not used, we could maybe still remove that part from instances.

> Is that the same thing or unrelated?

See T266050: Build Prometheus service for use by all Cloud VPS projects and their instances. TL;DR: we need to build a Prometheus cluster that can scrape all ~700 Cloud VPS instances before the puppet freshness dashboard can move to Prometheus-backed data.
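
To make "puppet freshness" concrete: it can be read as the age of the last Puppet agent run on each instance. A minimal sketch, assuming a per-instance last-run timestamp has already been collected; the function name, the input shape, and the one-hour threshold are illustrative assumptions, not what the Cloud VPS tooling actually uses.

```python
import time


def stale_instances(last_run, max_age_seconds=3600, now=None):
    """Return sorted names of instances whose last Puppet run is too old.

    last_run maps instance name -> Unix timestamp of the last run
    (hypothetical input shape for this sketch).
    """
    if now is None:
        now = time.time()
    return sorted(
        name for name, ts in last_run.items() if now - ts > max_age_seconds
    )
```

With Prometheus, the same comparison would normally live in a query or alert rule rather than in client code, which is why every instance has to be scraped first.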

Andrew triaged this task as Medium priority. Jan 12 2021, 5:08 PM
taavi changed the task status from Stalled to Open. Dec 24 2022, 12:22 PM