Perform a statsd and Graphite switch
Closed, Invalid (Public)

Description

I couldn't find a task about doing a switchover for Graphite and Statsd, nor about making them active-active in some way.

This task is about switching graphite1001.eqiad and statsd.eqiad from Eqiad to Codfw and back, along with any preparation work that makes such a switch possible or easier.

Event Timeline

Change 467239 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] errorpages: Use service discovery for statsd in hhvm-fatal-error.php

https://gerrit.wikimedia.org/r/467239

fgiunchedi subscribed.

The most similar task is likely T88997: Improve graphite failover and related. As far as graphite goes, sending carbon line-oriented traffic is already active-active, in the sense that traffic can be sent to any graphite frontend in codfw/eqiad and it will be mirrored to the other datacenter.
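For illustration, here is a minimal sketch of what "carbon line-oriented traffic" looks like from a client's point of view. It assumes the standard carbon plaintext format (`path value timestamp`, newline-terminated) on carbon's default plaintext port 2003. The helper names and the metric path are hypothetical and not taken from any existing Wikimedia code.

```python
import socket
import time

# Assumption: the frontend listens on carbon's default plaintext port.
CARBON_PLAINTEXT_PORT = 2003


def format_carbon_line(path, value, timestamp=None):
    """Build one carbon plaintext line: '<path> <value> <unix timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(frontend_host, path, value, timestamp=None, timeout=2.0):
    """Send a single metric line over TCP to one graphite frontend.

    Which frontend is used (codfw or eqiad) should not matter if the
    frontends mirror traffic to each other as described above.
    """
    line = format_carbon_line(path, value, timestamp)
    address = (frontend_host, CARBON_PLAINTEXT_PORT)
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling `send_metric(host, "test.switchover.heartbeat", 1)` against a frontend in either datacenter would then be one way to check that the mirroring behaves as expected, with `host` being whichever frontend name applies.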

For turning statsd active/active there is a caveat: for services using "global aggregation", active/active isn't possible, because we rely on the aggregation provided by a single statsd daemon on statsd.eqiad.wmnet. Services that do not need to aggregate metrics globally can already send statsd traffic to either datacenter, though we haven't done that, since having all hosts point to a single CNAME for statsd seems simpler.
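To make the caveat concrete, here is a small illustrative simulation (not statsd's actual code). It assumes a timer-style metric whose mean is computed per flush, and made-up sample values. When two daemons each aggregate only their own datacenter's samples, combining their outputs afterwards does not generally reproduce what a single daemon seeing all samples would report.

```python
from statistics import mean

# Hypothetical timing samples (ms) received during one flush interval.
eqiad_samples = [10, 20, 30, 40]
codfw_samples = [100]

# Single daemon: every sample reaches one aggregator.
global_mean = mean(eqiad_samples + codfw_samples)

# One daemon per datacenter: each reports its own mean, and a
# consumer later averages the two reported values.
per_dc_means = [mean(eqiad_samples), mean(codfw_samples)]
mean_of_means = mean(per_dc_means)

print(f"single daemon: {global_mean}")
print(f"per-DC daemons, averaged afterwards: {mean_of_means}")
```

The two results differ (40 versus 62.5 here) because the per-datacenter means discard the sample counts. Percentiles have the same problem, which is why metrics that need a global view have to pass through one aggregator.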

Hope that helps!

jijiki triaged this task as Medium priority. Oct 23 2018, 2:57 PM

Change 467239 merged by jenkins-bot:
[operations/mediawiki-config@master] errorpages: Use service discovery for statsd in hhvm-fatal-error.php

https://gerrit.wikimedia.org/r/467239

Resolving in favor of T88997: Improve graphite failover, though please reopen if needed!