
DNS: dynamically generate entries for service discovery
Closed, Resolved, Public

Description

As stated in the parent task (T149617), we're going to automatically generate the DNS configuration for the configured services so that they can be discovered by querying DNS directly. The proposed solution follows.

When querying services that are present in multiple datacenters, the response will be generated according to the following schema:

| Service capability | Query | Default response | Response if default is DOWN |
| --- | --- | --- | --- |
| active/active | RO endpoint | Local DC endpoint IP | Remote DC endpoint IP |
| active/active | RW endpoint | Local DC endpoint IP | Remote DC endpoint IP |
| active/passive | RO endpoint | Local DC endpoint IP | Remote DC endpoint IP |
| active/passive | RW endpoint | Active (from etcd) DC endpoint IP | Local failover IP |
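
As a rough illustration, the schema above can be written as a small lookup. This is only a sketch: the `Capability` and `Query` enums and the `select_answer` function are hypothetical names, and the returned strings are labels taken from the table, not real addresses.

```python
from enum import Enum


class Capability(Enum):
    ACTIVE_ACTIVE = "active/active"
    ACTIVE_PASSIVE = "active/passive"


class Query(Enum):
    RO = "RO endpoint"
    RW = "RW endpoint"


def select_answer(capability: Capability, query: Query, default_is_up: bool) -> str:
    """Return a label for the address the schema says to answer with."""
    single_writer = capability is Capability.ACTIVE_PASSIVE and query is Query.RW
    if single_writer:
        # Writes to an active/passive service go only to the active DC
        # (as recorded in etcd); there is no remote endpoint to fall back to.
        return "active DC endpoint IP" if default_is_up else "local failover IP"
    # Every other combination prefers the local DC and falls back to a remote one.
    return "local DC endpoint IP" if default_is_up else "remote DC endpoint IP"
```

Only the active/passive RW row differs from the others, which is why it is the single special case in the function.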

The DNS configuration will be generated with all the endpoints. For an active/passive service, when the RW endpoint is queried, only the active endpoint will be in an UP state; the other(s) will be in a DOWN state.

The mechanism to update the state file monitored by gdnsd must ensure that at any given time only one endpoint is UP if the service capability is active/passive.
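
One way to keep that invariant while moving the active DC is to write the states in two steps: everything DOWN first, then the new DC UP. The sketch below assumes this ordering and uses hypothetical names (`switch_active`, `up_count`); it models states as a plain mapping and does not attempt to reproduce the actual syntax of the state file read by gdnsd.

```python
from typing import Dict, List


def up_count(states: Dict[str, str]) -> int:
    """Number of datacenters currently marked UP."""
    return sum(1 for state in states.values() if state == "UP")


def switch_active(states: Dict[str, str], new_active: str) -> List[Dict[str, str]]:
    """Return the successive states to write when moving the active DC.

    All endpoints are marked DOWN first and only then is the new one
    marked UP, so no intermediate state ever has two endpoints UP.
    """
    if new_active not in states:
        raise KeyError(new_active)
    all_down = {dc: "DOWN" for dc in states}
    final = dict(all_down)
    final[new_active] = "UP"
    return [all_down, final]
```

For example, `switch_active({"dc-a": "UP", "dc-b": "DOWN"}, "dc-b")` yields an all-DOWN step followed by a step with only `dc-b` UP. The cost of this ordering is a short window with no endpoint UP, during which the "default is DOWN" answer from the schema would apply.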

As a failover mechanism, a valid IP will always be returned, even when no endpoint is available. This avoids problems with clients that misbehave when no DNS answer is returned, and with DNS negative caching.
The failover IP will simply respond with a 503 to any request. There will be one per DC, so that the answer is always the local one.
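
The "always answer with something" rule can be sketched as a preference list with the local failover IP as the last resort. Everything here is illustrative: the datacenter names, the `answer` function, and the addresses (taken from the RFC 5737 documentation ranges) are assumptions, not values from the actual configuration.

```python
from typing import Dict, List, Set

# Hypothetical service endpoints and per-DC failover addresses.
ENDPOINTS: Dict[str, str] = {"dc-a": "192.0.2.10", "dc-b": "198.51.100.10"}
FAILOVER: Dict[str, str] = {"dc-a": "192.0.2.254", "dc-b": "198.51.100.254"}


def answer(client_dc: str, preference: List[str], up: Set[str]) -> str:
    """Return the first UP endpoint in preference order.

    If none is UP, return the failover address of the client's own DC,
    so the query never goes unanswered.
    """
    for dc in preference:
        if dc in up:
            return ENDPOINTS[dc]
    return FAILOVER[client_dc]
```

With `preference=["dc-a", "dc-b"]` and only `dc-b` UP, a client in `dc-a` gets the `dc-b` endpoint; with nothing UP, it gets the `dc-a` failover address, which would serve 503s.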

Event Timeline

Change 331789 had a related patch set uploaded (by Volans):
[WIP] DNS: service discovery

https://gerrit.wikimedia.org/r/331789

We should probably divorce the RO/RW distinction from the core design here. Not all services will have an RW/RO distinction (I would expect most not to), and the ones that do are cases we should try to eliminate with better (active/active) design over time. If a specific service needs a split into "active/passive RW + active/active RO", we can solve that by calling it two separate services at this level: foo-rw and foo-ro, with different active/passive rules and distinct failover.

> If a specific service needs a split into "active/passive RW + active/active RO", we can solve that by calling it two separate services at this level: foo-rw and foo-ro, with different active/passive rules and distinct failover.

+1 to using names throughout. Plain A/CNAME lookups are a lot easier to use, and encoding the ro/rw bit in the name calls out the distinction (where needed) clearly.
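
To make the naming idea concrete, here is a minimal sketch of how one logical service could expand into separately named discovery entries. The `DiscoveryService` type and `split_by_access` function are hypothetical, and the sketch covers only the name suffixes, not how the records are actually generated.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiscoveryService:
    name: str
    active_active: bool  # False means active/passive


def split_by_access(name: str, ro_active_active: bool, rw_active_active: bool) -> List[DiscoveryService]:
    """Expand a logical service into the entries DNS would need to know about.

    If reads and writes follow the same rule, one entry is enough;
    otherwise the service is published as <name>-ro and <name>-rw.
    """
    if ro_active_active == rw_active_active:
        return [DiscoveryService(name, ro_active_active)]
    return [
        DiscoveryService(f"{name}-ro", ro_active_active),
        DiscoveryService(f"{name}-rw", rw_active_active),
    ]
```

With this shape, the DNS layer only ever sees a name plus an active/active or active/passive rule, and the RO/RW distinction lives entirely in the name.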

Change 340154 had a related patch set uploaded (by BBlack; owner: BBlack):
geo config structure changes for svc discovery

https://gerrit.wikimedia.org/r/340154

Change 340156 had a related patch set uploaded (by BBlack; owner: BBlack):
authdns: re-structure prep for discovery

https://gerrit.wikimedia.org/r/340156

Change 340156 merged by BBlack:
authdns: re-structure prep for discovery

https://gerrit.wikimedia.org/r/340156

Change 340154 merged by BBlack:
geo config structure changes for discovery

https://gerrit.wikimedia.org/r/340154

Change 331789 merged by BBlack:
[operations/puppet] DNS: service discovery

https://gerrit.wikimedia.org/r/331789

Change 341564 had a related patch set uploaded (by bblack):
[operations/puppet] authdns lint support for full puppetized config

https://gerrit.wikimedia.org/r/341564

Change 341573 had a related patch set uploaded (by bblack):
[operations/dns] linting: remove config-geo-test

https://gerrit.wikimedia.org/r/341573

Change 341574 had a related patch set uploaded (by bblack):
[operations/dns] add first discovery records

https://gerrit.wikimedia.org/r/341574

Change 341564 merged by BBlack:
[operations/puppet] authdns lint support for full puppetized config

https://gerrit.wikimedia.org/r/341564

Change 341573 abandoned by BBlack:
linting: remove config-geo-test

https://gerrit.wikimedia.org/r/341573

Change 341574 merged by BBlack:
[operations/dns] add first discovery records mock lint data

https://gerrit.wikimedia.org/r/341574

Change 343926 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/puppet] restbase: use the dns discovery host for citoid

https://gerrit.wikimedia.org/r/343926

Change 343926 merged by Giuseppe Lavagetto:
[operations/puppet] restbase: use the dns discovery host for citoid

https://gerrit.wikimedia.org/r/343926