Remove IPV6 dns records from new database hosts
Closed, Resolved (Public)

Description

When installing new hosts, we specifically ask in the template that they not get AAAA records.
However, I saw that the hosts installed in T355350 and T355269 have them.
The affected hosts are:
db2196-db2220
es10[35-40]

The hosts in T355343, es[2035-2040], are not yet installed or provisioned, so those are OK. Please DO NOT add AAAA records there. cc @Jhancock.wm

I've talked to @Volans and he'll kindly clean up those records for us, but in the future please double-check that the template we fill out does indeed say AAAA records: N

Event Timeline

Marostegui renamed this task from Remove IPV6 dns records from new hosts to Remove IPV6 dns records from new database hosts.Feb 27 2024, 2:50 PM
Marostegui added a project: Data-Persistence.

I got the list of affected hosts with nodeset -S '","' -e "db[2196-2220],es10[35-40]" on a cumin host, then ran the following code on Netbox:

>>> import uuid
>>> request_id = uuid.uuid4()
>>> user = User.objects.get(username='volans')
>>> def update(d):
...     ip = d.primary_ip6
...     log = ip.to_objectchange('update')
...     log.request_id = request_id
...     log.user = user
...     ip.dns_name = ""
...     ip.save()
...     log.save()
...
>>> devices = Device.objects.filter(name__in=["db2196","db2197","db2198","db2199","db2200","db2201","db2202","db2203","db2204","db2205","db2206","db2207","db2208","db2209","db2210","db2211","db2212","db2213","db2214","db2215","db2216","db2217","db2218","db2219","db2220","es1035","es1036","es1037","es1038","es1039","es1040"])
>>> len(devices)
31
>>> [d.name for d in devices]
['db2196', 'db2197', 'db2198', 'db2199', 'db2200', 'db2201', 'db2202', 'db2203', 'db2204', 'db2205', 'db2206', 'db2207', 'db2208', 'db2209', 'db2210', 'db2211', 'db2212', 'db2213', 'db2214', 'db2215', 'db2216', 'db2217', 'db2218', 'db2219', 'db2220', 'es1035', 'es1036', 'es1037', 'es1038', 'es1039', 'es1040']
>>> for device in devices:
...     update(device)
...
>>>
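
For readers without ClusterShell at hand, the host list pasted into name__in above can be reproduced with a few lines of plain Python. This is only a minimal sketch: expand and expand_all are hypothetical helper names, not part of nodeset or Netbox, and they only handle comma-separated prefix[start-end] patterns like the one used here.

```python
import re


def expand(pattern):
    """Expand one 'prefix[start-end]suffix' pattern into a list of names.

    Patterns without a bracketed range are returned unchanged.
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if m is None:
        return [pattern]
    prefix, start, end, suffix = m.groups()
    width = len(start)  # keep zero padding if the range start has any
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


def expand_all(spec):
    """Expand a comma-separated list of patterns (no commas inside brackets)."""
    hosts = []
    for part in spec.split(","):
        hosts.extend(expand(part))
    return hosts


hosts = expand_all("db[2196-2220],es10[35-40]")
# Same separator that was passed to nodeset with -S
joined = '","'.join(hosts)
```

Comparing len(hosts) with len(devices) from the Netbox query is a cheap way to spot a typo in the range before anything is changed.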

The changes can be seen in https://netbox.wikimedia.org/extras/changelog/?request_id=e909b6ad-50e3-406a-b092-036e4742347b
I then ran the sre.dns.netbox cookbook to propagate the deleted records to the DNS.

I've also run the sre.dns.wipe-cache cookbook to clear the DNS recursors' cache with:

while read line; do sudo cookbook sre.dns.wipe-cache "$line.codfw.wmnet"; sleep 1; done <<< $(nodeset -S '\n' -e "db[2196-2220]")

# and

while read line; do sudo cookbook sre.dns.wipe-cache "$line.eqiad.wmnet"; sleep 1; done <<< $(nodeset -S '\n' -e "es10[35-40]")

[N.B. this is fairly spammy on SAL, 2 logs for each host...]

Cleanup completed, leaving the task open for DCOps to prevent this from happening.

I've checked all the devices with names starting with db and es, and the only ones with IPv6 AAAA records are dbprov1004 and dbprov2004.

Those should be fine, they are used for backups.
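
A similar check can also be made from the DNS side rather than in Netbox. The sketch below is an assumption about how one might do it, not the check that was actually run: has_aaaa and hosts_with_aaaa are hypothetical helpers built on the standard library's socket.getaddrinfo, which goes through the local resolver (and its cache, and /etc/hosts), so results may lag behind the authoritative DNS.

```python
import socket


def has_aaaa(fqdn, resolver=socket.getaddrinfo):
    """Return True if fqdn resolves to at least one IPv6 address.

    The resolver argument exists only so the lookup can be replaced in tests.
    """
    try:
        # Restricting the family to AF_INET6 asks only for IPv6 results.
        return bool(resolver(fqdn, None, socket.AF_INET6))
    except socket.gaierror:
        # Raised when the name has no address of the requested family.
        return False


def hosts_with_aaaa(fqdns, resolver=socket.getaddrinfo):
    """Return the subset of fqdns that still resolve over IPv6."""
    return [fqdn for fqdn in fqdns if has_aaaa(fqdn, resolver)]
```

For example, hosts_with_aaaa(["db2196.codfw.wmnet", "es1035.eqiad.wmnet"]) should come back empty once the records are gone and the caches have been wiped.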

RobH claimed this task.

I've emailed our team list about this to ensure future compliance with the racking details (our next team meeting isn't until next week). Apologies for the confusion and the extra steps this caused!