Migrate cloudmetrics workload from cloudmetrics100[1-2] to cloudmetrics100[3-4]
Closed, Resolved (Public)

Description

Ideally we would not erase history when doing this.

  1. What/where are the historic metrics that we should preserve?

Probably just Carbon and Prometheus. The Carbon data is already being rsynced; Prometheus needs some manual work:

https://wikitech-static.wikimedia.org/wiki/Prometheus#Sync_data_from_an_existing_Prometheus_host

  2. What/where are the UI endpoints that we should check to confirm proper function of this service?

https://graphite-labs.wikimedia.org/ and https://grafana-labs.wikimedia.org/?orgId=1
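A minimal sketch of how the two endpoints above could be checked after the switchover. The helper names (`http_status`, `check_endpoints`, `all_healthy`) are hypothetical and not part of any existing tooling, and a 2xx/3xx response only shows that the web frontend answers, not that metrics are actually flowing.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# The two UI endpoints named in the task description.
ENDPOINTS = (
    "https://graphite-labs.wikimedia.org/",
    "https://grafana-labs.wikimedia.org/?orgId=1",
)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET on url, or 0 if the host is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, and so on.
        return 0


def check_endpoints(
    urls: Iterable[str], fetch: Callable[[str], int] = http_status
) -> Dict[str, int]:
    """Map each URL to its status code; fetch is injectable for offline testing."""
    return {url: fetch(url) for url in urls}


def all_healthy(results: Dict[str, int]) -> bool:
    """Treat any 2xx or 3xx status as healthy."""
    return all(200 <= code < 400 for code in results.values())
```

Calling `check_endpoints(ENDPOINTS)` before and after the DNS change gives a quick comparison; confirming that historic graphs still render remains a manual check in the UIs themselves.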

Event Timeline

Andrew updated the task description.

Change 747174 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/dns@master] Replace cloudmetrics1001 with cloudmetrics1003

https://gerrit.wikimedia.org/r/747174

Change 747174 merged by Andrew Bogott:

[operations/dns@master] Replace cloudmetrics1001 with cloudmetrics1003

https://gerrit.wikimedia.org/r/747174

Mentioned in SAL (#wikimedia-operations) [2022-11-03T15:17:51Z] <Emperor> comment out www-data crontab on cloudmetrics100{1,2} T297712

Andrew claimed this task.

Unfortunately, there are a bunch of direct references to these machine names, rather than pointing to prometheus-labmon.eqiad.wmnet as I suppose we're meant to?

https://codesearch-beta.wmcloud.org/search/?q=cloudmetrics100%281%7C2%29&files=&excludeFiles=&repos=
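The query in the link above decodes to the regular expression `cloudmetrics100(1|2)`. As a rough sketch, the same pattern can be run against a local checkout to list the direct references that still need repointing. The function name `find_direct_hits` is hypothetical, and this is only a local approximation of the hosted code search.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Same pattern as the code search query (URL-decoded).
OLD_HOSTS = re.compile(r"cloudmetrics100(1|2)")


def find_direct_hits(root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for each line under root naming an old host."""
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside a .git directory.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if OLD_HOSTS.search(line):
                yield str(path), lineno, line.strip()
```

For example, `for path, lineno, line in find_direct_hits("."): print(f"{path}:{lineno}: {line}")` prints one match per line for the current directory.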

Change 857763 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [Beta Cluster] Point statsd service to prometheus-labmon, cloudmetrics1001 decom'ed

https://gerrit.wikimedia.org/r/857763

Change 857765 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] changeprop: Point Beta Cluster metrics to prometheus-labmon, cloudmetrics1002 is gone

https://gerrit.wikimedia.org/r/857765

Change 857765 merged by Andrew Bogott:

[operations/deployment-charts@master] changeprop: Point Beta Cluster metrics to prometheus-labmon, cloudmetrics1002 is gone

https://gerrit.wikimedia.org/r/857765

Change 857763 merged by Andrew Bogott:

[operations/mediawiki-config@master] [Beta Cluster] Point statsd service to prometheus-labmon, cloudmetrics1001 decom'ed

https://gerrit.wikimedia.org/r/857763

Hm, I merged https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/857763 thinking it was a puppet patch :/ Clearly it's not a harmful patch, but I'm sorry for jumping the gun without fully understanding the wmf-config deployment process.

No worries!