Cloud DNS: proposal for new DNS service names
Closed, Resolved (Public)

Description

I think we should rename / establish our DNS service names as follows:

DNS auth for eqiad1:

  • ns0.openstack.eqiad1.wikimediacloud.org (currently cloud-ns0.wikimedia.org which is an additional IPv4 address usually assigned to cloudservices1003.wikimedia.org)
  • ns1.openstack.eqiad1.wikimediacloud.org (currently cloud-ns1.wikimedia.org which is an additional IPv4 address usually assigned to cloudservices1004.wikimedia.org)

DNS rec for eqiad1:

  • ns-recursor0.openstack.eqiad1.wikimediacloud.org (currently cloud-recursor0.wikimedia.org which resolves to the same IPv4 assigned to cloudservices1003.wikimedia.org)
  • ns-recursor1.openstack.eqiad1.wikimediacloud.org (currently cloud-recursor1.wikimedia.org which resolves to the same IPv4 assigned to cloudservices1004.wikimedia.org)

DNS auth for codfw1dev:

  • ns0.openstack.codfw1dev.wikimediacloud.org (currently codfw1dev-ns0.wikimedia.org which resolves to the same addresses assigned to cloudservices2002-dev.wikimedia.org)
  • ns1.openstack.codfw1dev.wikimediacloud.org (new service)

DNS rec for codfw1dev:

  • ns-recursor0.openstack.codfw1dev.wikimediacloud.org (currently codfw1dev-recursor0.wikimedia.org)
  • ns-recursor1.openstack.codfw1dev.wikimediacloud.org (currently codfw1dev-recursor0.wikimedia.org)

Rationale:
We have the domain wikimediacloud.org to hold internal APIs and other openstack-core services, a name that better reflects the nature of the services being provided. Designate (the actual service doing the DNS work) is part of openstack, so a service FQDN living in the same DNS namespace as other core openstack components brings some additional coherence. As the current situation shows, we don't have a common way of naming things between deployments (cloud-recursor0.wikimedia.org vs codfw1dev-recursor0.wikimedia.org), so moving the service FQDNs to the new domain seems like a great opportunity to introduce such homogenization.
More info: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/DNS_domain_usage#Resolution
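The naming pattern proposed above can be sketched as a small helper. This is only an illustration: the function name service_fqdn, its parameters, and the "auth"/"rec" role labels are hypothetical and not part of any existing tooling; only the resulting FQDNs come from the lists above.

```python
# Hypothetical helper illustrating the proposed naming pattern.
# Names and parameters here are assumptions made for the example.
DOMAIN = "wikimediacloud.org"

# Host-label prefix per service role, as used in the proposal.
PREFIXES = {"auth": "ns", "rec": "ns-recursor"}


def service_fqdn(role: str, index: int, deployment: str) -> str:
    """Build a DNS service FQDN such as ns0.openstack.eqiad1.wikimediacloud.org."""
    if role not in PREFIXES:
        raise ValueError(f"unknown role: {role!r}")
    if index < 0:
        raise ValueError("index must be non-negative")
    return f"{PREFIXES[role]}{index}.openstack.{deployment}.{DOMAIN}"


if __name__ == "__main__":
    # Print the eight service names listed in the proposal.
    for deployment in ("eqiad1", "codfw1dev"):
        for role in ("auth", "rec"):
            for index in (0, 1):
                print(service_fqdn(role, index, deployment))
```

The point of the sketch is that both deployments share one pattern, so only the deployment label changes between eqiad1 and codfw1dev.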

Proposed plan:

  1. Do all the required tests with the ns0.openstack.codfw1dev.wikimediacloud.org FQDN, which is new and doesn't exist currently (i.e., introducing it won't break anything).
  2. Introduce the FQDNs into the WMF DNS servers and have them resolve properly. Play with them and make sure everything makes sense.
  3. Make sure Designate can work with the new names.
  4. Update delegations (NS records) to include both (old and new) service FQDNs.
  5. At this point, Designate should handle requests for both FQDNs (it shouldn't matter, it is really IPv4-based anyway).
  6. Introduce the FQDNs to virtual machines in eqiad1. Let's have both FQDNs (old and new) coexist for a while to make sure we don't break anything. Be ready to roll back in case of failures.
  7. If we are happy, clean up etc. We probably need to rebuild base images for VMs?
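While old and new names coexist, one possible sanity check is to confirm that each old/new pair resolves to the same IPv4 addresses. The sketch below assumes that each new eqiad1 name is meant to point at the same address as the old one; the helper names (resolve_ipv4, compare_names) and the PAIRS list are made up for the example, and the resolver is injectable so the comparison logic can be exercised without touching real DNS.

```python
import socket
from typing import Callable, Dict, Iterable, Set, Tuple

Resolver = Callable[[str], Set[str]]


def resolve_ipv4(name: str) -> Set[str]:
    """Return the IPv4 addresses a name resolves to (empty set if it does not resolve)."""
    try:
        infos = socket.getaddrinfo(
            name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def compare_names(
    pairs: Iterable[Tuple[str, str]], resolver: Resolver = resolve_ipv4
) -> Dict[Tuple[str, str], bool]:
    """Map each (old, new) pair to True when both resolve to the same non-empty address set."""
    results = {}
    for old, new in pairs:
        old_addrs = resolver(old)
        new_addrs = resolver(new)
        results[(old, new)] = bool(old_addrs) and old_addrs == new_addrs
    return results


# Old/new pairs taken from the eqiad1 lists in the description.
PAIRS = [
    ("cloud-ns0.wikimedia.org", "ns0.openstack.eqiad1.wikimediacloud.org"),
    ("cloud-ns1.wikimedia.org", "ns1.openstack.eqiad1.wikimediacloud.org"),
    ("cloud-recursor0.wikimedia.org", "ns-recursor0.openstack.eqiad1.wikimediacloud.org"),
    ("cloud-recursor1.wikimedia.org", "ns-recursor1.openstack.eqiad1.wikimediacloud.org"),
]

# Example usage against real DNS (requires network access):
#   for (old, new), ok in compare_names(PAIRS).items():
#       print(f"{old} -> {new}: {'OK' if ok else 'MISMATCH'}")
```

A pair where either name fails to resolve is reported as a mismatch, which seems like the safer default during the coexistence period.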

Note1: Introducing an auth FQDN in codfw1dev (i.e., ns0.openstack.codfw1dev.wikimediacloud.org) should help unblock T243556: Fix internal TLD in use in codfw1dev
Note2: Since designate is serving both auth/rec queries, I don't think we really need to split between auth/rec service names, but that's probably a debate for another iteration.

Event Timeline

Change 567453 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] cloud: introduce delegation for codfw1dev.wikimedia.cloud

https://gerrit.wikimedia.org/r/567453

Change 567453 merged by Arturo Borrero Gonzalez:
[operations/dns@master] cloud: introduce delegation for codfw1dev.wikimedia.cloud

https://gerrit.wikimedia.org/r/567453

Mentioned in SAL (#wikimedia-cloud) [2020-01-28T10:03:54Z] <arturo> delegated codfw1dev.wmcloud.org to designate @ codfw1dev ns0.openstack.codfw1dev.wikimediacloud.org (T242976 and T243766)

Mentioned in SAL (#wikimedia-cloud) [2020-01-28T10:11:00Z] <arturo> [codfw1dev] root@cloudcontrol2001-dev:~# openstack zone create --description "main DNS domain for public addresses" --email "root@wmflabs.org" --type PRIMARY --ttl 3600 codfw1dev.wmcloud.org. (T242976 and T243766)

I just discovered that codfw1dev-ns0.wikimedia.org does indeed exist.

Change 567986 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] wikimediacloud.org: fix address of ns0.openstack.codfw1dev.wikimediacloud.org

https://gerrit.wikimedia.org/r/567986

Change 567986 merged by Arturo Borrero Gonzalez:
[operations/dns@master] wikimediacloud.org: fix address of ns0.openstack.codfw1dev.wikimediacloud.org

https://gerrit.wikimedia.org/r/567986

Mentioned in SAL (#wikimedia-cloud) [2020-01-28T17:24:16Z] <arturo> [codfw1dev] root@cloudcontrol2001-dev:~# designate server-create --name ns0.openstack.codfw1dev.wikimediacloud.org. (T243766)

aborrero moved this task from Needs discussion to Doing on the cloud-services-team (Kanban) board.

This was accepted by the WMCS team.

Will create some patches to implement this soon.

aborrero triaged this task as Medium priority. Feb 13 2020, 11:52 AM

Change 572023 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] wikimediacloud.org: introduce new service names for DNS

https://gerrit.wikimedia.org/r/572023

Change 572023 merged by Arturo Borrero Gonzalez:
[operations/dns@master] wikimediacloud.org: introduce new service names for DNS

https://gerrit.wikimedia.org/r/572023

Mentioned in SAL (#wikimedia-cloud) [2020-02-14T10:32:22Z] <arturo> running root@cloudcontrol1004:~# designate server-create --name ns0.openstack.eqiad1.wikimediacloud.org. (T243766)

Mentioned in SAL (#wikimedia-cloud) [2020-02-14T10:32:30Z] <arturo> running root@cloudcontrol1004:~# designate server-create --name ns1.openstack.eqiad1.wikimediacloud.org. (T243766)

Mentioned in SAL (#wikimedia-cloud) [2020-02-14T10:35:10Z] <arturo> running root@cloudcontrol2001-dev:~# designate server-create --name ns1.openstack.codfw1dev.wikimediacloud.org. (T243766)

Change 572209 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] eqiad1.wikimedia.cloud: introduce delegation to the new Designate FQDNs

https://gerrit.wikimedia.org/r/572209

Change 572209 merged by Arturo Borrero Gonzalez:
[operations/dns@master] eqiad1.wikimedia.cloud: introduce delegation to the new Designate FQDNs

https://gerrit.wikimedia.org/r/572209

Change 572213 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloud: refresh names for DNS servers in eqiad1/codfw1dev

https://gerrit.wikimedia.org/r/572213

Change 572213 merged by Andrew Bogott:
[operations/puppet@production] cloud: refresh names for DNS servers in eqiad1/codfw1dev

https://gerrit.wikimedia.org/r/572213

Change 574395 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hieradata: fix typo in cloud recursor FQDNs

https://gerrit.wikimedia.org/r/574395

Change 574395 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hieradata: fix typo in cloud recursor FQDNs

https://gerrit.wikimedia.org/r/574395

Given that this proposal was accepted, I think this task can be closed. We can follow up in the subtasks.