DNS: dynamically generate entries for service discovery
Closed, Resolved (Public)

Assigned To

None

Authored By

	Volans
	Jan 24 2017, 5:16 AM

Description

As stated in the parent task (T149617), we are going to automatically generate the DNS configuration for the configured services so that they can be discovered by querying DNS directly. The proposed solution follows.

When querying services that are present in multiple datacenters, the response will be generated according to the following schema:

Service capability	Query	Default Response	Response if default is DOWN
active/active	RO endpoint	Local DC endpoint IP	Remote DC endpoint IP
active/active	RW endpoint	Local DC endpoint IP	Remote DC endpoint IP
active/passive	RO endpoint	Local DC endpoint IP	Remote DC endpoint IP
active/passive	RW endpoint	Active (from etcd) DC endpoint IP	Local failover IP
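
The table above can be read as a small decision function. The following is only a sketch of that reading, not the gdnsd configuration itself: the `Service` class, the `resolve` function, the DC names and the IPs (taken from documentation ranges) are all hypothetical, and the final fallback to the local failover IP when every DC is DOWN is taken from the failover paragraph further down.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Service:
    capability: str            # "active/active" or "active/passive"
    endpoints: Dict[str, str]  # DC name -> endpoint IP
    active_dc: str = ""        # active/passive only: the DC marked active (e.g. in etcd)


def resolve(service: Service, query: str, local_dc: str,
            failover_ips: Dict[str, str], down: Set[str]) -> str:
    """Return the IP to answer with; `down` holds the DCs whose endpoint is DOWN."""
    if service.capability == "active/passive" and query == "rw":
        # RW on an active/passive service: only the active DC may answer.
        if service.active_dc not in down:
            return service.endpoints[service.active_dc]
        return failover_ips[local_dc]
    # Every other row: local DC first, then any remote DC that is UP.
    if local_dc not in down:
        return service.endpoints[local_dc]
    for dc, ip in service.endpoints.items():
        if dc != local_dc and dc not in down:
            return ip
    # Nothing is UP: still return a valid IP, the local failover one.
    return failover_ips[local_dc]
```

For example, with a hypothetical active/passive service whose active DC is `dc-b`, a client in `dc-a` asking for the RW endpoint gets the `dc-b` endpoint IP, and the `dc-a` failover IP if `dc-b` is DOWN.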

The DNS configuration will be generated with all the endpoints. For an active/passive service, when the RW endpoint is queried, only the active endpoint will be in an UP state; the other(s) will be in a DOWN state.

The mechanism that updates the state file monitored by gdnsd must ensure that, for an active/passive service, only one endpoint is UP at any given time.
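
One way to satisfy this requirement is to compute the full set of states in memory, validate the invariant, and only then swap the file into place atomically, so that a reader never observes an intermediate state with two endpoints UP. The sketch below assumes this approach. The function names are hypothetical, and the `name => STATE` line format is a placeholder, not necessarily the syntax gdnsd expects.

```python
import os
import tempfile
from typing import Dict, Iterable


def rw_states(datacenters: Iterable[str], active_dc: str) -> Dict[str, str]:
    """Mark exactly one DC as UP for an active/passive RW endpoint."""
    dcs = list(datacenters)
    if dcs.count(active_dc) != 1:
        raise ValueError(f"active DC {active_dc!r} must appear exactly once in {dcs}")
    return {dc: ("UP" if dc == active_dc else "DOWN") for dc in dcs}


def write_state_file(path: str, service: str, states: Dict[str, str]) -> None:
    """Write all states to a temporary file, then atomically rename it over `path`."""
    if sum(1 for state in states.values() if state == "UP") > 1:
        raise ValueError("an active/passive service must not have more than one UP endpoint")
    # Placeholder line format (an assumption), NOT gdnsd's real state-file syntax.
    body = "".join(f"{service}/{dc} => {state}\n" for dc, state in sorted(states.items()))
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same directory, so the rename stays atomic
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp)
        raise
```

Switching the active DC is then a single call, for example `write_state_file(path, "foo-rw", rw_states(["dc-a", "dc-b"], "dc-b"))`, rather than two separate UP and DOWN updates.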

As a fallback for when no endpoint is available, a valid IP will still be returned. This avoids problems with clients that misbehave when no DNS answer is returned, and with DNS negative caching.
The failover IP will simply respond with a 503 to any request. There will be one failover IP per DC, so that the response always contains the local one.
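
To illustrate what sits behind a failover IP, here is a minimal sketch of an HTTP responder that answers every request with a 503, using only the Python standard library. It is an illustration under stated assumptions, not the service that was actually deployed: the class name, the `make_server` helper, the bind address and the port are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Always503(BaseHTTPRequestHandler):
    """Answer every request with 503 Service Unavailable."""

    def _reply(self):
        body = b"Service Unavailable\n"
        self.send_response(503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if self.command != "HEAD":
            self.wfile.write(body)

    # Route the common HTTP methods to the same handler.
    do_GET = do_HEAD = do_POST = do_PUT = do_DELETE = do_PATCH = _reply

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Build the server; port 0 lets the OS choose a free port."""
    return HTTPServer((host, port), Always503)
```

Running `make_server(port=8080).serve_forever()` would serve 503s on a local port. A real deployment would listen on the failover IP of each DC.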

Details

Subject	Repo	Branch	Lines +/-
restbase: use the dns discovery host for citoid	operations/puppet	production	+1 -1
add first discovery records + mock lint data	operations/dns	master	+18 -0
linting: remove config-geo-test	operations/dns	master	+0 -10
authdns lint support for full puppetized config	operations/puppet	production	+160 -112
DNS: service discovery	operations/puppet	production	+144 -2
geo config structure changes for discovery	operations/dns	master	+276 -272
authdns: re-structure prep for discovery	operations/puppet	production	+69 -29


Related Objects

Status	Assigned	Task
Resolved	Qgil	T153007 Technical Collaboration annual plan FY2017-18
Resolved	Qgil	T159313 Draft WMF annual plan program about technical events
Resolved	Qgil	T149300 Future of the Wikimedia Developer Summit
Resolved	• Rfarrand	T153996 Wikimedia Developer Summit 2017: Feedback Survey
Resolved	• Rfarrand	T141926 Wikimedia Developer Summit 2017
Resolved	Qgil	T141938 Prepare a program for Wikimedia Developer Summit 2017 to effectively address current high level movement needs
Resolved	greg	T147937 Facilitate Wikidev'17 main topic "How to manage our technical debt"
Resolved	Joe	T154658 Prepare and improve the datacenter switchover procedure
Resolved	Joe	T149617 Integrating MediaWiki (and other services) with dynamic configuration
Resolved	None	T156100 DNS: dynamically generate entries for service discovery
Resolved	Volans	T160178 MediaWiki Datacenter Switchover automation
Resolved	jcrespo	T161007 Decouple Mariadb semi-sync replication from $::mw_primary
Resolved	Volans	T160994 Create the failoid service as fallback for the DNS discovery

Event Timeline

Volans created this task. Jan 24 2017, 5:16 AM

Change 331789 had a related patch set uploaded (by Volans):
[WIP] DNS: service discovery

https://gerrit.wikimedia.org/r/331789

gerritbot added a project: Patch-For-Review. Jan 24 2017, 6:20 AM

Krinkle moved this task from "Inbox, needs triage" to "Radar" on the Performance-Team board. Jan 26 2017, 10:28 PM

We should probably divorce the RO/RW distinction from the core design here. Not all services will have an RW/RO distinction (I would expect most not to), and those that do will be things we try to eliminate with better (active/active) design over time. If a specific service needs a split into "active/passive RW + active/active RO", we can solve that by calling it two separate services at this level: foo-rw and foo-ro, with different active/passive rules and distinct failover.

If a specific service needs a split into "active/passive RW + active/active RO", we can solve that by calling it two separate services at this level: foo-rw and foo-ro, with different active/passive rules and distinct failover.

+1 to using names throughout. Plain A/CNAME lookups are a lot easier to use, and encoding the ro/rw bit in the name calls out the distinction (where needed) clearly.
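
The naming proposal can be sketched as follows: a service with an RW/RO split is registered as two separate discovery entries, each with its own active/passive rule, and clients pick one by name with a plain lookup. The `DiscoveryRecord` class and `discovery_records` function are hypothetical. Only the `foo-rw` / `foo-ro` naming and the two modes come from the discussion above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiscoveryRecord:
    name: str  # plain DNS label that clients look up
    mode: str  # "active/active" or "active/passive"


def discovery_records(service: str, mode: str, split_rw: bool = False) -> List[DiscoveryRecord]:
    """Return the discovery entries for a service.

    With split_rw=True the service becomes two independent entries,
    "<service>-rw" (active/passive) and "<service>-ro" (active/active),
    and the `mode` argument is ignored.
    """
    if split_rw:
        return [
            DiscoveryRecord(f"{service}-rw", "active/passive"),
            DiscoveryRecord(f"{service}-ro", "active/active"),
        ]
    return [DiscoveryRecord(service, mode)]
```

With this shape the core design only ever deals with "a service and its mode". The RO/RW distinction lives entirely in the name.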

Change 340154 had a related patch set uploaded (by BBlack; owner: BBlack):
geo config structure changes for svc discovery

https://gerrit.wikimedia.org/r/340154

Change 340156 had a related patch set uploaded (by BBlack; owner: BBlack):
authdns: re-structure prep for discovery

https://gerrit.wikimedia.org/r/340156

Addshore unsubscribed. Feb 28 2017, 5:53 PM

Krinkle unsubscribed. Feb 28 2017, 8:24 PM

Change 340156 merged by BBlack:
authdns: re-structure prep for discovery

https://gerrit.wikimedia.org/r/340156

Change 340154 merged by BBlack:
geo config structure changes for discovery

https://gerrit.wikimedia.org/r/340154

Change 331789 merged by BBlack:
[operations/puppet] DNS: service discovery

https://gerrit.wikimedia.org/r/331789

Change 341564 had a related patch set uploaded (by bblack):
[operations/puppet] authdns lint support for full puppetized config

https://gerrit.wikimedia.org/r/341564

Change 341573 had a related patch set uploaded (by bblack):
[operations/dns] linting: remove config-geo-test

https://gerrit.wikimedia.org/r/341573

Change 341574 had a related patch set uploaded (by bblack):
[operations/dns] add first discovery records

https://gerrit.wikimedia.org/r/341574

Change 341564 merged by BBlack:
[operations/puppet] authdns lint support for full puppetized config

https://gerrit.wikimedia.org/r/341564

Joe mentioned this in T160178: MediaWiki Datacenter Switchover automation. Mar 10 2017, 3:33 PM

Joe added a subtask: T160178: MediaWiki Datacenter Switchover automation.

Change 341573 abandoned by BBlack:
linting: remove config-geo-test

https://gerrit.wikimedia.org/r/341573

Change 341574 merged by BBlack:
[operations/dns] add first discovery records + mock lint data

https://gerrit.wikimedia.org/r/341574

Volans created subtask T160994: Create the failoid service as fallback for the DNS discovery. Mar 21 2017, 1:52 PM

Change 343926 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/puppet] restbase: use the dns discovery host for citoid

https://gerrit.wikimedia.org/r/343926

Change 343926 merged by Giuseppe Lavagetto:
[operations/puppet] restbase: use the dns discovery host for citoid

https://gerrit.wikimedia.org/r/343926

Volans closed subtask T160994: Create the failoid service as fallback for the DNS discovery as Resolved. Mar 23 2017, 2:05 PM

Joe closed this task as Resolved. Apr 3 2017, 6:38 AM

• mmodell awarded a token. Apr 4 2017, 2:27 AM

Volans closed subtask T160178: MediaWiki Datacenter Switchover automation as Resolved. May 3 2017, 5:20 PM

Krinkle edited projects, added Multiple-active-datacenters archived; removed Wikimedia-Multiple-active-datacenters. May 3 2017, 7:46 PM

Krinkle edited projects, added Sustainability (MediaWiki-MultiDC); removed Multiple-active-datacenters archived. May 3 2017, 7:58 PM
