I think we should rename / establish our DNS service names as follows:
DNS auth for eqiad1:
- ns0.openstack.eqiad1.wikimediacloud.org (currently cloud-ns0.wikimedia.org which is an additional IPv4 address usually assigned to cloudservices1003.wikimedia.org)
- ns1.openstack.eqiad1.wikimediacloud.org (currently cloud-ns1.wikimedia.org which is an additional IPv4 address usually assigned to cloudservices1004.wikimedia.org)
DNS rec for eqiad1:
- ns-recursor0.openstack.eqiad1.wikimediacloud.org (currently cloud-recursor0.wikimedia.org which resolves to the same IPv4 assigned to cloudservices1003.wikimedia.org)
- ns-recursor1.openstack.eqiad1.wikimediacloud.org (currently cloud-recursor1.wikimedia.org which resolves to the same IPv4 assigned to cloudservices1004.wikimedia.org)
DNS auth for codfw1dev:
- ns0.openstack.codfw1dev.wikimediacloud.org (currently codfw1dev-ns0.wikimedia.org which resolves to the same addresses assigned to cloudservices2002-dev.wikimedia.org)
- ns1.openstack.codfw1dev.wikimediacloud.org (new service)
DNS rec for codfw1dev:
- ns-recursor0.openstack.codfw1dev.wikimediacloud.org (currently codfw1dev-recursor0.wikimedia.org)
- ns-recursor1.openstack.codfw1dev.wikimediacloud.org (currently codfw1dev-recursor0.wikimedia.org)
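The naming scheme in the lists above can be sketched as a small helper. This is only an illustration: the function name `service_fqdn`, the role labels `auth` / `rec`, and the `EQIAD1_RENAMES` mapping are hypothetical and not part of any existing tooling.

```python
BASE_DOMAIN = "wikimediacloud.org"

# Hostname prefix per DNS role, following the proposed names above.
# The role keys ("auth", "rec") are labels made up for this sketch.
ROLE_PREFIX = {"auth": "ns", "rec": "ns-recursor"}


def service_fqdn(deployment: str, role: str, index: int) -> str:
    """Build a proposed service FQDN for a deployment, role and index."""
    if role not in ROLE_PREFIX:
        raise ValueError(f"unknown role: {role!r}")
    if index < 0:
        raise ValueError("index must be non-negative")
    return f"{ROLE_PREFIX[role]}{index}.openstack.{deployment}.{BASE_DOMAIN}"


# Old-to-new mapping for eqiad1, taken from the lists above.
EQIAD1_RENAMES = {
    "cloud-ns0.wikimedia.org": service_fqdn("eqiad1", "auth", 0),
    "cloud-ns1.wikimedia.org": service_fqdn("eqiad1", "auth", 1),
    "cloud-recursor0.wikimedia.org": service_fqdn("eqiad1", "rec", 0),
    "cloud-recursor1.wikimedia.org": service_fqdn("eqiad1", "rec", 1),
}
```

With this pattern, every deployment gets the same shape of name, which is the homogenization the proposal is after.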
Rationale:
We have the domain wikimediacloud.org to hold internal APIs and other openstack-core services, a name that better reflects the nature of the services being provided. Designate (the actual service doing the DNS work) is part of openstack, so a service FQDN living in the same DNS namespace as other core openstack components brings some additional coherence. As the current situation shows, we don't have a common way of naming things between deployments (cloud-recursor0.wikimedia.org vs codfw1dev-recursor0.wikimedia.org), so moving the service FQDNs to the new domain seems like a great opportunity to introduce such homogenization.
More info: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/DNS_domain_usage#Resolution
Proposed plan:
- Do all the required tests with the ns0.openstack.codfw1dev.wikimediacloud.org FQDN, which is new and doesn't exist currently (i.e., introducing it won't break anything).
- Introduce the FQDNs into WMF DNS servers and have them properly resolve. Play with them and make sure everything makes sense.
- Make sure designate can work with the new names.
- Update delegations (NS records) to include both (old and new) service FQDNs.
- At this point, Designate should handle requests for both FQDNs (it shouldn't matter, since it is really IPv4-based anyway).
- Introduce the FQDNs to virtual machines in eqiad1. Let's have both FQDNs (old and new) coexist for a while to make sure we don't break anything. Be ready to roll back in case of failures.
- If we are happy, clean up, etc. We probably need to rebuild base images for VMs?
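While old and new FQDNs coexist, one sanity check could be to confirm that each pair resolves to the same IPv4 addresses. The following is a minimal sketch of that idea using only the Python standard library; the function names `resolve_ipv4` and `find_mismatches` are hypothetical, and it queries whatever resolver the local system is configured with rather than the authoritative servers directly.

```python
import socket
from typing import Callable, Dict, FrozenSet

Resolver = Callable[[str], FrozenSet[str]]


def resolve_ipv4(name: str) -> FrozenSet[str]:
    """Return the IPv4 addresses the system resolver gives for name (empty on failure)."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return frozenset()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (address, port).
    return frozenset(info[4][0] for info in infos)


def find_mismatches(
    renames: Dict[str, str], resolve: Resolver = resolve_ipv4
) -> Dict[str, str]:
    """Return the old->new pairs that do not resolve to the same non-empty address set."""
    bad = {}
    for old, new in renames.items():
        old_addrs, new_addrs = resolve(old), resolve(new)
        if not old_addrs or old_addrs != new_addrs:
            bad[old] = new
    return bad
```

For example, `find_mismatches({"cloud-ns0.wikimedia.org": "ns0.openstack.eqiad1.wikimediacloud.org"})` would return an empty dict once both names point at the same address. The resolver is passed in as a parameter so the check can be exercised without network access.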
Note1: Introducing an auth FQDN in codfw1dev (i.e., ns0.openstack.codfw1dev.wikimediacloud.org) should help unblock T243556: Fix internal TLD in use in codfw1dev
Note2: Since designate is serving both auth/rec queries, I don't think we really need to split between auth/rec service names, but that's probably a debate for another iteration.