I'm opening this task as a follow-up to a chat that happened on IRC in #wikimedia-serviceops on Nov. 24th. The goal is to find the best source of truth for the .svc.$dc.wmnet DNS records: one flexible enough to support the various current use cases without requiring the same information to be repeated across multiple data sources.
The current options were discussed, without any clear winner so far.
Data requirements
- Netbox: allocated IPs for IPAM purposes, preferably decorated with their DNS name
- DNS: IP <-> DNS name mapping (both forward and reverse), ideally auto-generated from some data source, plus some CNAMEs
- Puppet: IPs <-> DNS mapping with additional metadata that is needed to define the service in Puppet
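To make the data requirements above concrete, here is a minimal Python sketch, assuming a hypothetical `ServiceIP` entry in a single source of truth from which the DNS, Netbox and Puppet views are all derived. The class, the `netbox_view`/`puppet_view` helpers, their field names, the `foo` service and its `port` metadata are illustrative only and do not reflect the real Netbox or Puppet schemas.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ServiceIP:
    """Hypothetical single-source-of-truth entry for one service IP."""
    name: str  # service label, e.g. "foo" -> foo.svc.eqiad.wmnet
    dc: str    # datacenter, e.g. "eqiad"
    ip: str    # allocated service IP
    puppet: Dict[str, object] = field(default_factory=dict)  # metadata only Puppet needs

    @property
    def fqdn(self) -> str:
        # Trailing dot: fully qualified name as written in a zone file.
        return f"{self.name}.svc.{self.dc}.wmnet."


def dns_records(entry: ServiceIP) -> Tuple[str, str]:
    """Return the forward (A) and reverse (PTR) records in zone-file syntax."""
    ptr_owner = ipaddress.ip_address(entry.ip).reverse_pointer + "."
    forward = f"{entry.fqdn} IN A {entry.ip}"
    reverse = f"{ptr_owner} IN PTR {entry.fqdn}"
    return forward, reverse


def netbox_view(entry: ServiceIP) -> Dict[str, str]:
    """IPAM view: the allocated IP decorated with its DNS name (illustrative fields)."""
    return {"ip": entry.ip, "dns_name": entry.fqdn.rstrip(".")}


def puppet_view(entry: ServiceIP) -> Dict[str, object]:
    """Puppet view: the IP <-> name mapping plus Puppet-only metadata."""
    return {"name": entry.name, "ip": {entry.dc: entry.ip}, **entry.puppet}


foo = ServiceIP("foo", "eqiad", "10.2.2.42", puppet={"port": 443})
print(dns_records(foo))
```

Because every view is computed from the same entry, adding a service in this model means editing a single record rather than three data sources.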
Functional requirements
- Aim to have a single source of truth from which it's easy to check existing records and add new ones
- Enforce the use of the same last octet in both the eqiad and codfw subnets for a given service IP
- Support one-DC-only records, where the IP is allocated in the DC that is primary for that service, while the IP with the same last octet in the other DC is only reserved and no DNS record is created for it.
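The two allocation rules above can be sketched in Python as follows, assuming one /24 service subnet per DC so that "last octet" is well defined. The prefixes in `SVC_SUBNETS`, the `plan_allocation` and `same_last_octet` helpers and the `active`/`reserved` status strings are placeholders chosen for illustration, not values taken from Netbox.

```python
import ipaddress
from typing import Dict, List, Optional

# Placeholder per-DC service subnets (assumed /24, so "last octet" is well defined).
SVC_SUBNETS = {
    "eqiad": ipaddress.ip_network("10.2.2.0/24"),
    "codfw": ipaddress.ip_network("10.2.1.0/24"),
}


def plan_allocation(name: str, octet: int,
                    primary_dc: Optional[str] = None) -> List[Dict[str, Optional[str]]]:
    """Plan the same last octet in every DC for one service.

    primary_dc=None    -> the service gets a DNS record in every DC.
    primary_dc="eqiad" -> only eqiad gets a DNS record; the same octet in the
                          other DCs is marked reserved so no other service takes it.
    """
    if primary_dc is not None and primary_dc not in SVC_SUBNETS:
        raise ValueError(f"unknown DC: {primary_dc!r}")
    if not 1 <= octet <= 254:
        raise ValueError("last octet must be between 1 and 254 in a /24")
    plan = []
    for dc, net in SVC_SUBNETS.items():
        active = primary_dc is None or dc == primary_dc
        plan.append({
            "dc": dc,
            "ip": str(net.network_address + octet),
            "status": "active" if active else "reserved",
            # Reserved entries block the IP but get no DNS name.
            "dns_name": f"{name}.svc.{dc}.wmnet" if active else None,
        })
    return plan


def same_last_octet(ips: List[str]) -> bool:
    """Check an existing allocation: do all these IPv4 addresses share a last octet?"""
    return len({ipaddress.ip_address(ip).packed[-1] for ip in ips}) <= 1


for entry in plan_allocation("foo", 42, primary_dc="eqiad"):
    print(entry)
```

Keeping the reservation as an explicit entry, rather than an implicit gap, is what lets a validator like `same_last_octet` catch a conflicting allocation in the other DC.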
Current use cases (as of Sep. 2021)
- Normal IP <-> DNS name 1:1 mappings for most services. Those could be automatically managed by Netbox with the current automation.
- Special IP <-> DNS name many:1 mappings for some k8s-related services (notably staging), where round-robin DNS is used for load balancing. These cases aren't supported in Netbox unless we develop a custom plugin that holds the additional data needed for this use case.
- CNAMEs:
  - swift -> ms-fe (@fgiunchedi ?)
  - kubestagemaster -> kubestagemaster1001 / kubestagemaster2001 (@akosiaris ?)
  - staging -> kubestage1001 / kubestage2001
  - termbox-test -> staging
  - prometheus -> prometheus3001 / prometheus4001 / prometheus5001 (@fgiunchedi ?)
- SVC records that point to host IPs outside of the SVC subnets. Are those really required, or are they just tech debt that should be fixed?
  - ganeti01 (@MoritzMuehlenhoff ?)
  - ganeti-test01 (@MoritzMuehlenhoff ?)
  - nfs-tools-project
- DNS records with non-standard TTL:
  - prometheus CNAMEs: 5M (@fgiunchedi ?)
  - oresrdb: 5M TTL instead of the default 1H. This is not currently supported by the Netbox automation because there is no proper place to store that information. This record also points to a host IP rather than a service IP.
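The non-1:1 use cases listed above (many:1 round-robin names, CNAMEs and per-record TTL overrides) could mostly be covered by a few optional fields in the data model. A minimal Python sketch, assuming a hypothetical `Record` type and `render` helper; the names `foo`, `bar` and the CNAME target `host1001.eqiad.wmnet.` are made up, and the 3600-second default mirrors the 1H TTL mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_TTL = 3600  # the default 1H TTL, in seconds


@dataclass
class Record:
    """Hypothetical record for one name in one DC."""
    name: str
    dc: str
    ips: Optional[List[str]] = None  # one IP for 1:1, several for round-robin many:1
    cname: Optional[str] = None      # CNAME target (FQDN) instead of IPs
    ttl: Optional[int] = None        # per-record override, e.g. 300 for 5M

    def __post_init__(self) -> None:
        # A name either has addresses or is an alias, never both or neither.
        if bool(self.ips) == (self.cname is not None):
            raise ValueError("set exactly one of ips or cname")


def render(rec: Record) -> List[str]:
    """Render a record as zone-file lines with an explicit TTL."""
    owner = f"{rec.name}.svc.{rec.dc}.wmnet."
    ttl = rec.ttl if rec.ttl is not None else DEFAULT_TTL
    if rec.cname is not None:
        return [f"{owner} {ttl} IN CNAME {rec.cname}"]
    # Several A records for the same owner name give round-robin DNS.
    return [f"{owner} {ttl} IN A {ip}" for ip in rec.ips]


rr = Record("foo", "eqiad", ips=["10.2.2.10", "10.2.2.11"])
alias = Record("bar", "eqiad", cname="host1001.eqiad.wmnet.", ttl=300)
for line in render(rr) + render(alias):
    print(line)
```

In this model, per-DC CNAME targets (such as kubestagemaster1001 / kubestagemaster2001) would simply be one `Record` per DC, and records pointing at host IPs outside the SVC subnets fit the same `ips` field, which is exactly why that case is worth deciding on rather than silently supporting.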