
Create a visual representation of where each service is active from, at any given time
Open, Needs Triage, Public

Description

During a recent incident, one of the issues that came up (though not an old issue) was that we had no immediate visibility into where each of our services in discovery is served from. That information is easily obtainable via confctl, but when everything is on fire, the following output is not easy for a human to consume:

{"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=eventgate-analytics-external"}
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=kartotherian"}
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=parsoid-php"}
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=mw-web"}
{"codfw": {"pooled": true, "references": [], "ttl": 10}, "tags": "dnsdisc=api-ro"}
{"codfw": {"pooled": true, "references": [], "ttl": 10}, "tags": "dnsdisc=appservers-ro"}
{"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=echostore"}
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=mw-api-ext"}
{"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=eventstreams-internal"}
{"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=recommendation-api"}

<snip>

Moreover, the above data does not tell us whether a service is active-active or active-passive.

One solution to this could be polling confd for that information and representing it in Grafana.
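As a rough illustration of the readability problem, the sketch below regroups JSON lines of the shape shown above into one row per service, listing the DCs where it is pooled. It only assumes the line format visible in this task (one DC key per line plus a `tags` string containing `dnsdisc=<name>`). The `eqiad` entries in the sample are made up for the example, and the function names `summarize` and `render` are hypothetical. It does not infer active-active versus active-passive, since that is not present in this data.

```python
import json
from collections import defaultdict

# Sample input in the same shape as the output quoted above.
# The eqiad lines are invented for illustration only.
SAMPLE = """
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=kartotherian"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=kartotherian"}
{"codfw": {"pooled": true, "references": [], "ttl": 10}, "tags": "dnsdisc=api-ro"}
{"eqiad": {"pooled": true, "references": [], "ttl": 10}, "tags": "dnsdisc=api-ro"}
{"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=mw-web"}
"""


def service_name(tags):
    """Extract the value of the dnsdisc tag from a 'key=value,...' string."""
    for tag in tags.split(","):
        key, _, value = tag.partition("=")
        if key.strip() == "dnsdisc":
            return value.strip()
    raise ValueError(f"no dnsdisc tag in {tags!r}")


def summarize(text):
    """Return {service: {dc: pooled}} built from JSON lines."""
    summary = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        service = service_name(record.pop("tags"))
        # Every remaining key is a datacenter name.
        for dc, state in record.items():
            summary[service][dc] = bool(state["pooled"])
    return dict(summary)


def render(summary):
    """Format one line per service with the DCs where it is pooled."""
    rows = []
    for service in sorted(summary):
        pooled_in = sorted(dc for dc, ok in summary[service].items() if ok)
        where = ", ".join(pooled_in) if pooled_in else "(not pooled anywhere)"
        rows.append(f"{service:<35} {where}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(summarize(SAMPLE)))
```

The same per-service mapping could be exported as metrics for a dashboard rather than printed, but that part is left out here.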

Event Timeline

Change 886069 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] configmaster: Remove disc_desired_state.py

https://gerrit.wikimedia.org/r/886069

Change 886069 merged by Clément Goubert:

[operations/puppet@production] configmaster: Remove disc_desired_state.py

https://gerrit.wikimedia.org/r/886069

Change 886839 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] configmaster: Cleanup disc_desired_state

https://gerrit.wikimedia.org/r/886839

Just to add to the available options: listing the services, their A/A or A/P status, and the DCs in which they are pooled is also easily achievable with a cookbook using https://doc.wikimedia.org/spicerack/master/api/index.html#spicerack.Spicerack.service_catalog

Change 886839 merged by Clément Goubert:

[operations/puppet@production] configmaster: Cleanup disc_desired_state

https://gerrit.wikimedia.org/r/886839

Removed references to disc_desired_state from wikitech LVS and SwitchDC docs