Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Andrew | T380679 Drop support for VMs with .wmflabs FQDNs |
| Resolved | | taavi | T380678 Re-create deployment-cumin |
Event Timeline
Change #1095189 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] P:toolforge: mail: Drop support for .wmflabs VM names
Change #1095190 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] puppet_compiler: Drop support for .wmflabs VM names
Change #1095191 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] P:cumin: Drop support for .wmflabs VM names
Change #1095192 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] openstack: admin_scripts: Remove support for .wmflabs VM names
Change #1095193 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] openstack: puppet: Drop support for .wmflabs names
Change #1095189 merged by Majavah:
[operations/puppet@production] P:toolforge: mail: Drop support for .wmflabs VM names
I think the last VM with a .wmflabs A record is gone.
root@cloudcontrol1007:~# openstack recordset list --sudo-project-id noauth-project 114f1333-c2c1-44d3-beb4-ebed1a91742b
+--------------------------------------+----------------------------------+-------+--------------------------------------------------+--------+--------+
| id | name | type | records | status | action |
+--------------------------------------+----------------------------------+-------+--------------------------------------------------+--------+--------+
| 15bd9777-2146-47bc-b195-9cd9cff64910 | eqiad.wmflabs. | SOA | ns0.openstack.eqiad1.wikimediacloud.org. | ACTIVE | NONE |
| | | | root.wmflabs.org. 1732918283 3600 600 86400 3600 | | |
| 70f41f1a-e51e-47ec-b387-925774688c95 | eqiad.wmflabs. | NS | ns1.openstack.eqiad1.wikimediacloud.org. | ACTIVE | NONE |
| | | | ns0.openstack.eqiad1.wikimediacloud.org. | | |
| 4f5412e9-3872-46e1-8065-91d1fc253951 | tools-redis.tools.eqiad.wmflabs. | CNAME | redis.svc.tools.eqiad1.wikimedia.cloud. | ACTIVE | NONE |
| 67d7a5c6-f1be-4f37-afab-e1034d26a4d2 | tools-redis.eqiad.wmflabs. | CNAME | redis.svc.tools.eqiad1.wikimedia.cloud. | ACTIVE | NONE |
| f5f147bc-3828-4f75-b187-cf560b8bff36 | tools-db.tools.eqiad.wmflabs. | CNAME | tools.db.svc.eqiad.wmflabs. | ACTIVE | NONE |
+--------------------------------------+----------------------------------+-------+--------------------------------------------------+--------+--------+
This leaves us with those service addresses to unpack, and a few dozen references to the old domain in puppet, most of them for deployment-prep.
Change #1113468 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] deployment-prep hiera: remove uses of .eqiad.wmflabs tld
Change #1113468 merged by Andrew Bogott:
[operations/puppet@production] deployment-prep hiera: remove uses of .eqiad.wmflabs tld
Change #1113855 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] dsh: remove librenms group entirely
Change #1113855 merged by Andrew Bogott:
[operations/puppet@production] dsh: remove librenms group entirely
Change #1118151 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] cloud-vps resolv.conf: remove .eqiad.wmflabs
Change #1095192 abandoned by Andrew Bogott:
[operations/puppet@production] openstack: admin_scripts: Remove support for .wmflabs VM names
Reason:
no longer needed thanks to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1118127 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/1117630
Change #1095190 merged by Majavah:
[operations/puppet@production] puppet_compiler: Drop support for .wmflabs VM names
Change #1095191 merged by Majavah:
[operations/puppet@production] P:cumin: Drop support for .wmflabs VM names
I now think that the service names for .eqiad.wmflabs are good, because they will allow us to remove the resolv.conf entry for eqiad.wmflabs without breaking anything. Indeed, I think we should add two more.
I have a little test script to see how things behave with and without the resolv.conf entry:
andrew@abogott-nstesting:~$ cat nstest.sh
#!/bin/bash
host tools-redis
host tools-db
host enwiki.labsdb
host enwiki.web.db.svc.eqiad.wmflabs
host tools-redis.tools.eqiad.wmflabs
Currently, it produces this output:
andrew@abogott-nstesting:~$ ./nstest.sh
tools-redis.tools.eqiad.wmflabs is an alias for redis.svc.tools.eqiad1.wikimedia.cloud.
redis.svc.tools.eqiad1.wikimedia.cloud has address 172.16.2.46
tools-db.tools.eqiad.wmflabs is an alias for tools.db.svc.eqiad.wmflabs.
tools.db.svc.eqiad.wmflabs is an alias for tools.db.svc.wikimedia.cloud.
tools.db.svc.wikimedia.cloud has address 172.16.0.168
enwiki.labsdb is an alias for s1.analytics.db.svc.wikimedia.cloud.
s1.analytics.db.svc.wikimedia.cloud has address 172.20.255.2
enwiki.web.db.svc.eqiad.wmflabs is an alias for s1.web.db.svc.wikimedia.cloud.
s1.web.db.svc.wikimedia.cloud has address 172.20.255.10
tools-redis.tools.eqiad.wmflabs is an alias for redis.svc.tools.eqiad1.wikimedia.cloud.
redis.svc.tools.eqiad1.wikimedia.cloud has address 172.16.2.46
If I remove tools.eqiad.wmflabs from the resolv.conf search line, we get this:
Host tools-redis not found: 3(NXDOMAIN)
Host tools-db not found: 3(NXDOMAIN)
enwiki.labsdb is an alias for s1.analytics.db.svc.wikimedia.cloud.
s1.analytics.db.svc.wikimedia.cloud has address 172.20.255.2
enwiki.web.db.svc.eqiad.wmflabs is an alias for s1.web.db.svc.wikimedia.cloud.
s1.web.db.svc.wikimedia.cloud has address 172.20.255.10
tools-redis.tools.eqiad.wmflabs is an alias for redis.svc.tools.eqiad1.wikimedia.cloud.
redis.svc.tools.eqiad1.wikimedia.cloud has address 172.16.2.46
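The NXDOMAIN results follow from how the resolver expands short names with the search list. Here is a minimal sketch of that expansion, assuming the resolv.conf default of ndots:1 and a two-entry search list for the tools project. The actual resolv.conf is not shown in this task, and `candidates` is a hypothetical helper, not part of nstest.sh. `host` may differ from this simplified model in detail.

```sh
#!/bin/sh
# Print, in order, the FQDNs a stub resolver would try for NAME.
# Simplified model: assumes ndots:1 and ignores all other resolver options.
# Usage: candidates NAME SEARCH_SUFFIX...
candidates() {
    name=$1
    shift
    case $name in
        *.)  printf '%s\n' "$name"; return ;;  # trailing dot: already absolute
        *.*) printf '%s.\n' "$name" ;;         # contains a dot: tried as-is first
    esac
    for suffix in "$@"; do
        printf '%s.%s.\n' "$name" "$suffix"    # then each search suffix in order
    done
    case $name in
        *.*) ;;                                # dotted name was already tried above
        *)   printf '%s.\n' "$name" ;;         # no dots: bare name is tried last
    esac
}

# Assumed search list before the change (two entries):
candidates tools-redis tools.eqiad1.wikimedia.cloud tools.eqiad.wmflabs
echo "--"
# Assumed search list after dropping the .wmflabs entry:
candidates tools-redis tools.eqiad1.wikimedia.cloud
```

Under these assumptions, the second run no longer produces tools-redis.tools.eqiad.wmflabs., the only candidate that had a record at this point, so the short name fails unless a matching record exists under eqiad1.wikimedia.cloud.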
So I propose to add cnames for tools-redis.tools.eqiad1.wikimedia.cloud -> redis.svc.tools.eqiad1.wikimedia.cloud and tools-db.tools.eqiad1.wikimedia.cloud -> tools.db.svc.eqiad.wmflabs
Are there other edge cases I'm missing?
note to self: turn on dns query logging and see if those domains are actually being used.
So I propose to add cnames for tools-redis.tools.eqiad1.wikimedia.cloud -> redis.svc.tools.eqiad1.wikimedia.cloud and tools-db.tools.eqiad1.wikimedia.cloud -> tools.db.svc.eqiad.wmflabs
That seems wrong, we should skip the middleman and do
tools-db.tools.eqiad1.wikimedia.cloud -> tools.db.svc.wikimedia.cloud
I've put a trace on the recursor to see if people are still using the short names 'tools-redis' and 'tools-db'. Because of how resolv.conf search lists work, I can't literally detect queries for those standalone names, but I can trace the names the resolver sends once the first search suffix is appended.
rec_control trace-regex '^tools-redis.tools.eqiad1.wikimedia.cloud\.$|^tools-db.tools.eqiad1.wikimedia.cloud\.$'
...so all I'm really detecting is whether users are looking up 'tools-redis' or 'tools-db', OR the full names 'tools-redis.tools.eqiad1.wikimedia.cloud' and 'tools-db.tools.eqiad1.wikimedia.cloud'. Anyone using the latter two would be getting NXDOMAIN, though, so they probably aren't doing that a lot.
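A trace pattern like this can be checked offline before it is applied. The sketch below uses `grep -E` as a stand-in matcher. That is an assumption, since the recursor's regex dialect may differ in detail. It also escapes every literal dot. The pattern as typed above leaves most dots unescaped, so each one matches any character. That is harmless here but slightly looser than intended. `matches` is a hypothetical helper.

```sh
#!/bin/sh
# Same two names as the trace, with every literal dot escaped.
pattern='^tools-redis\.tools\.eqiad1\.wikimedia\.cloud\.$|^tools-db\.tools\.eqiad1\.wikimedia\.cloud\.$'

# Succeeds (exit 0) when the query name matches the pattern.
matches() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Sample query names: two that should be traced, two that should not.
for qname in \
    tools-redis.tools.eqiad1.wikimedia.cloud. \
    tools-db.tools.eqiad1.wikimedia.cloud. \
    tools-redis.tools.eqiad.wmflabs. \
    tools-redisXtools.eqiad1.wikimedia.cloud.
do
    if matches "$qname"; then
        echo "trace  $qname"
    else
        echo "ignore $qname"
    fi
done
```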
OK! I can already report that people are using those short names quite a lot. I see queries incoming from a variety of k8s workers. I also see queries for tools-redis coming in from tools-k8s-control-8.tools.eqiad1.wikimedia.cloud which suggests that we have infra code also using the short names.
So I'm voting to put in a shim and keep those short names working rather than spend a year hunting and eliminating uses.
I propose to add cnames for tools-redis.tools.eqiad1.wikimedia.cloud -> redis.svc.tools.eqiad1.wikimedia.cloud
SGTM.
we should skip the middleman and do
tools-db.tools.eqiad1.wikimedia.cloud -> tools.db.svc.wikimedia.cloud
I was thinking the same, but in the end it doesn't make much difference.
I've added those two new cnames. Now my test script looks like this:
andrew@abogott-nstesting:~$ sh ./nstest.sh
tools-redis.tools.eqiad1.wikimedia.cloud is an alias for redis.svc.tools.eqiad1.wikimedia.cloud.
redis.svc.tools.eqiad1.wikimedia.cloud has address 172.16.2.46
tools-db.tools.eqiad1.wikimedia.cloud is an alias for tools.db.svc.wikimedia.cloud.
tools.db.svc.wikimedia.cloud has address 172.16.0.168
enwiki.labsdb is an alias for s1.analytics.db.svc.wikimedia.cloud.
s1.analytics.db.svc.wikimedia.cloud has address 172.20.255.2
enwiki.web.db.svc.eqiad.wmflabs is an alias for s1.web.db.svc.wikimedia.cloud.
s1.web.db.svc.wikimedia.cloud has address 172.20.255.10
tools-redis.tools.eqiad.wmflabs is an alias for redis.svc.tools.eqiad1.wikimedia.cloud.
redis.svc.tools.eqiad1.wikimedia.cloud has address 172.16.2.46
and without eqiad.wmflabs in resolv.conf:
tools-redis.tools.eqiad1.wikimedia.cloud is an alias for redis.svc.tools.eqiad1.wikimedia.cloud.
redis.svc.tools.eqiad1.wikimedia.cloud has address 172.16.2.46
tools-db.tools.eqiad1.wikimedia.cloud is an alias for tools.db.svc.wikimedia.cloud.
tools.db.svc.wikimedia.cloud has address 172.16.0.168
enwiki.labsdb is an alias for s1.analytics.db.svc.wikimedia.cloud.
s1.analytics.db.svc.wikimedia.cloud has address 172.20.255.2
enwiki.web.db.svc.eqiad.wmflabs is an alias for s1.web.db.svc.wikimedia.cloud.
s1.web.db.svc.wikimedia.cloud has address 172.20.255.10
tools-redis.tools.eqiad.wmflabs is an alias for redis.svc.tools.eqiad1.wikimedia.cloud.
redis.svc.tools.eqiad1.wikimedia.cloud has address 172.16.2.46
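For reference, two CNAMEs of this shape could be created with `openstack recordset create`. The sketch below is a dry run that only prints the commands. It is an assumption about how such records might be added, not a record of what was actually run. `cname_cmd` is a hypothetical helper, the zone ID is the one for eqiad1.wikimedia.cloud. shown in the zone listing later in this task, and any project or auth flags are deliberately left out.

```sh
#!/bin/sh
# Dry run: print (do not execute) the command that would create one CNAME.
# Usage: cname_cmd ZONE_ID RECORD_NAME TARGET
cname_cmd() {
    zone_id=$1
    name=$2
    target=$3
    printf 'openstack recordset create --type CNAME --record %s %s %s\n' \
        "$target" "$zone_id" "$name"
}

# Zone ID of eqiad1.wikimedia.cloud. as listed elsewhere in this task.
ZONE=67603ef4-3d64-40d6-90d3-5b7776a99034

cname_cmd "$ZONE" tools-redis.tools.eqiad1.wikimedia.cloud. redis.svc.tools.eqiad1.wikimedia.cloud.
cname_cmd "$ZONE" tools-db.tools.eqiad1.wikimedia.cloud. tools.db.svc.wikimedia.cloud.
```

Printing rather than executing keeps the sketch safe to run anywhere. The printed lines would still need whatever project-scoping options the local setup requires.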
I've added those two new cnames.
Where did you add them? I was expecting to find them in a tools.eqiad1.wikimedia.cloud zone.
For historical reasons (and also coding simplicity) most <something>.<project>.eqiad1.wikimedia.cloud records actually belong in eqiad1.wikimedia.cloud rather than in <project>.eqiad1.wikimedia.cloud. I am surprised that that works, but it does!
root@cloudcontrol1006:~# openstack zone list --all-projects | grep " eqiad1.wikimedia.cloud"
| 67603ef4-3d64-40d6-90d3-5b7776a99034 | cloudinfra | eqiad1.wikimedia.cloud. | PRIMARY | 1739379080 | ACTIVE | NONE |
root@cloudcontrol1006:~# openstack recordset list --all-projects 67603ef4-3d64-40d6-90d3-5b7776a99034 | grep CNAME
| 752d1b78-3be5-49b2-a1d4-deb2dfa68cf8 | cloudinfra | k8s.toolsbeta.eqiad1.wikimedia.cloud. | CNAME | k8s.svc.toolsbeta.eqiad1.wikimedia.cloud. | ACTIVE | NONE |
| 87d897f8-0125-46df-ad36-7c18a31a3456 | cloudinfra | k8s.tools.eqiad1.wikimedia.cloud. | CNAME | k8s.svc.tools.eqiad1.wikimedia.cloud. | ACTIVE | NONE |
| b80705d0-d0e6-483a-bf02-dac4114cb282 | cloudinfra | tools-redis.tools.eqiad1.wikimedia.cloud. | CNAME | redis.svc.tools.eqiad1.wikimedia.cloud. | ACTIVE | NONE |
| 8fbe0767-9598-45ce-982e-4eef86fd023e | cloudinfra | tools-db.tools.eqiad1.wikimedia.cloud. | CNAME | tools.db.svc.wikimedia.cloud. | ACTIVE | NONE |
proposed announcement email:
tl;dr: Minor change to DNS resolution[0] for toolforge and cloud-vps services on Monday. Should have no effect but please yell if you see things break.

The whole story:

Back in 2020[1] we stopped associating new VMs with the .wmflabs top-level domain; since then all VMs have been accessed via .wikimedia.cloud instead. As of a few weeks ago, the last remaining .wmflabs VM was deleted, so we're now cleaning up code and config that supported that domain.

Right now if you try to resolve a stand-alone hostname (e.g. by typing "ping mycoolserver") the resolver will try three different fqdns: first 'mycoolserver.<projectname>.eqiad1.wikimedia.cloud.' and, failing that, 'mycoolserver.<projectname>.eqiad.wmflabs.' and, failing that, the simple name 'mycoolserver.'. After Monday, that second fallback won't happen, so it'll either be 'mycoolserver.<projectname>.eqiad1.wikimedia.cloud.' or 'mycoolserver.'.

Most people don't use simple hostnames anyway; for those users this change will have no effect. Some toolforge applications still use the old-fashioned 'tools-redis' or 'tools-db' hostnames; we already have changes in place to resolve those correctly after the change. If there are single hostname edge cases that I don't know about and can't find, their behavior may change or break in surprising ways.

Note that fully qualified service names (for example gurwiki.analytics.db.svc.eqiad.wmflabs) are unaffected by this update. I would love to eliminate them too, but it's unclear how to identify and remove all uses, so that cleanup will be left for another day.

[0] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1118151
[1] https://wikitech.wikimedia.org/wiki/News/2020_Phasing_out_the_.wmflabs_domain
For historical reasons (and also coding simplicity) most <something>.<project>.eqiad1.wikimedia.cloud records actually belong in eqiad1.wikimedia.cloud rather than in <project>.eqiad1.wikimedia.cloud. I am surprised that that works, but it does!
Ah I see, thanks!
proposed announcement email:
LGTM!
Change #1095193 merged by Majavah:
[operations/puppet@production] openstack: puppet: Drop support for .wmflabs names
Change #1118151 merged by Andrew Bogott:
[operations/puppet@production] cloud-vps resolv.conf: remove .eqiad.wmflabs
No, more likely that just means that one of the CoreDNS pods has been scheduled on a control node.
Mentioned in SAL (#wikimedia-cloud-feed) [2025-02-18T15:01:37Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 (T380679)
Mentioned in SAL (#wikimedia-cloud-feed) [2025-02-18T15:03:47Z] <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 (T380679)
Mentioned in SAL (#wikimedia-cloud-feed) [2025-02-18T15:04:14Z] <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 (T380679)