Clean up ORES metrics
Closed, Resolved, Public

Description

The initial rollout of the statsd exporter configuration created a lot of noisy metrics. They need to be cleaned up.

Event Timeline

@fgiunchedi, would you mind having a quick look at P9701? I'd like to run it on production.

LGTM! Thanks for taking care of this

We'll also need to temporarily pass --web.enable-admin-api to prometheus for DELETE to work. Listing every metric explicitly works, but you can also match on the metric name with a regular expression. I'm assuming ores_<hostname> is the offender in this case, so {__name__=~"^ores_ores[12].*"} should work as the query for DELETE.
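For illustration, here is a minimal Python sketch of how the delete request for that selector could be built. It assumes the Prometheus 2.x TSDB admin API, where series deletion is documented as a POST (or PUT) to /api/v1/admin/tsdb/delete_series with a match[] parameter. The base URL and the sample metric names are hypothetical, and the script only builds and prints the URL; it does not send anything.

```python
import re
from urllib.parse import urlencode

# Assumption: address of the Prometheus instance. The real host, port and
# path prefix on the production servers may differ.
BASE_URL = "http://localhost:9090"

# Selector quoted from the comment above.
SELECTOR = '{__name__=~"^ores_ores[12].*"}'


def delete_series_url(base_url: str, selector: str) -> str:
    """Build the URL for the TSDB admin delete_series endpoint."""
    query = urlencode({"match[]": selector})
    return f"{base_url}/api/v1/admin/tsdb/delete_series?{query}"


def name_matches(metric_name: str) -> bool:
    """Approximate the name matcher locally.

    Prometheus anchors regex matchers at both ends, so fullmatch is the
    closest equivalent in Python.
    """
    return re.fullmatch(r"^ores_ores[12].*", metric_name) is not None


if __name__ == "__main__":
    print(delete_series_url(BASE_URL, SELECTOR))
    # Hypothetical metric names, used only to exercise the pattern.
    for name in ("ores_ores1001_example_total", "ores_example_total"):
        print(name, name_matches(name))
```

Checking the pattern against a few known metric names before deleting is a cheap way to confirm that it selects only the ores_<hostname> series and not the metrics that should stay.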

Mentioned in SAL (#wikimedia-operations) [2019-11-22T03:49:27Z] <shdubsh> restart prometheus@ops on prometheus1003 T238807

The initial cleanup is done. The last step is to clean the tombstones.
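As a sketch of the tombstone step, assuming the Prometheus 2.x admin API endpoint /api/v1/admin/tsdb/clean_tombstones, which removes the deleted data from disk, the request could be prepared like this. The base URL is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.request import Request

# Assumption: address of the Prometheus instance; adjust for the real host.
BASE_URL = "http://localhost:9090"


def clean_tombstones_request(base_url: str) -> Request:
    """Prepare (but do not send) the POST that cleans up tombstones."""
    return Request(
        f"{base_url}/api/v1/admin/tsdb/clean_tombstones",
        method="POST",
    )


if __name__ == "__main__":
    req = clean_tombstones_request(BASE_URL)
    print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it. That should only succeed while Prometheus is running with --web.enable-admin-api.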

Mentioned in SAL (#wikimedia-operations) [2019-11-22T16:22:06Z] <shdubsh> clean tombstones on prometheus1003 - T238807

Mentioned in SAL (#wikimedia-operations) [2019-11-22T17:09:13Z] <shdubsh> restart prometheus on prometheus1004 - T238807

Mentioned in SAL (#wikimedia-operations) [2019-11-22T17:30:33Z] <shdubsh> clean tombstones on prometheus1004 - T238807

Mentioned in SAL (#wikimedia-operations) [2019-11-22T18:02:57Z] <shdubsh> restore prometheus services default settings - T238807

This needs to be done in codfw as well.

Mentioned in SAL (#wikimedia-operations) [2019-12-09T21:48:52Z] <shdubsh> restart prometheus on prometheus2003 -- T238807

Mentioned in SAL (#wikimedia-operations) [2019-12-09T22:54:40Z] <shdubsh> restart prometheus on prometheus2004 -- T238807