As discussed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/566735, currently hostnames in this deployment look like:
$vmname.$project.codfw1dev.cloud
There should be a wikimedia label between codfw1dev and cloud.
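As an illustration of the requested rename, here is a minimal Python sketch that rewrites a hostname from the current `$vmname.$project.codfw1dev.cloud` form to the proposed `$vmname.$project.codfw1dev.wikimedia.cloud` form. The function name `to_new_fqdn` and the two constants are hypothetical and exist only for this example; nothing like this is part of the actual Puppet or OpenStack change.

```python
# Hypothetical helper, for illustration only: not part of any WMCS tooling.
OLD_SUFFIX = "codfw1dev.cloud"
NEW_SUFFIX = "codfw1dev.wikimedia.cloud"


def to_new_fqdn(fqdn: str) -> str:
    """Insert the 'wikimedia' label between 'codfw1dev' and 'cloud'."""
    name = fqdn.rstrip(".")  # tolerate a trailing root dot
    if name.endswith("." + NEW_SUFFIX):
        return name  # already in the new form
    if not name.endswith("." + OLD_SUFFIX):
        raise ValueError(f"not a codfw1dev hostname: {fqdn!r}")
    # Drop the old suffix and append the new one, keeping vmname and project.
    return name[: -len(OLD_SUFFIX)] + NEW_SUFFIX


if __name__ == "__main__":
    print(to_new_fqdn("myvm.admin.codfw1dev.cloud"))
```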
Status | Assigned | Task
---|---|---
Resolved | aborrero | T217891 CloudVPS: rework codfw deployments
Resolved | None | T229441 CloudVPS: codfw1dev: missing bits
Resolved | Krenair | T242607 Create in-cloud puppetmaster for codfw1dev
Resolved | aborrero | T243556 Fix internal TLD in use in codfw1dev
Event Timeline
I had a look at this today.
The Puppet change seems rather simple, a couple of hiera keys:
profile::openstack::codfw1dev::designate::domain_id_internal_forward: '8d67f102-5984-4e4d-a15a-e94ccd6d9eaf'
profile::openstack::codfw1dev::nova::dhcp_domain: 'codfw1dev.cloud'
The current domain in use is
root@cloudcontrol2001-dev:~# designate domain-get 8d67f102-5984-4e4d-a15a-e94ccd6d9eaf --all-tenants
+-------------+--------------------------------------+
| Field       | Value                                |
+-------------+--------------------------------------+
| description | None                                 |
| created_at  | 2019-09-06T20:06:14.000000           |
| updated_at  | 2020-01-14T00:08:24.000000           |
| email       | abogott@wikimedia.org                |
| ttl         | 3600                                 |
| serial      | 1578960500                           |
| id          | 8d67f102-5984-4e4d-a15a-e94ccd6d9eaf |
| name        | codfw1dev.cloud.                     |
+-------------+--------------------------------------+
The change could be to simply introduce the new domain, but @Andrew commented to me that we need a special hack to introduce the new domain for it to belong to the noauth tenant.
I decided that I didn't want this special hack and wanted the new codfw1dev.wikimedia.cloud domain to belong to the admin tenant (or wmflabsdotorg or whatever). But no special hacks.
Then @Andrew commented that we would need some updates to designate-sink to be able to work with a zone that belongs to a project rather than to noauth.
Since this is something we would eventually do to eqiad1 as well, my proposal is to:
- identify what changes are required for designate-sink to work with zones that belong to proper projects rather than noauth.
- evaluate zones with noauth and consider moving them to a proper project. Some options:
- use admin project to hold all base domains. May not be read by novaobserver, so this option may be wrong.
- use wmflabsdotorg, which makes me nervous because it holds more than just the wmflabs.org domain which eventually will go away anyway.
- create a new infra project to hold base domain names. Naming things is hard, but perhaps something like the dns-infra project can hold all the new base domains <deployment>.wikimedia.cloud and .wmcloud.org.
- evaluate what it would take to implement these changes (especially downtime and user impact). Do them first in codfw1dev.
I've re-read the code a bit, and it's not obvious to me that we need to specify a tenant to sink (outside of the implicit tenant associated with the zone id from the config). So for starters let's just try swapping in a tenant-owned zone and see if everything just works.
I agree let's not use the admin tenant, as IIRC it already has some special meaning, and novaobserver would not work (unless we fixed novaobserver to work in that tenant, don't recall why it doesn't there...)
Let's not expand the wmflabsdotorg tenant either.
We could use cloudinfra, which is already existing and used for things with a cloud-wide impact, and isn't a special case at a technical OpenStack level. Alternatively, yes, we could make a new one and transfer things over if necessary - dnsinfra (without the hyphen) sounds fine.
Mentioned in SAL (#wikimedia-cloud) [2020-01-27T12:44:31Z] <arturo> [codfw1dev] root@cloudcontrol2001-dev:~# openstack zone create --description "main DNS domain for VMs" --email "root@wmflabs.org" --type PRIMARY --ttl 3600 codfw1dev.wikimedia.cloud. T243556
Mentioned in SAL (#wikimedia-cloud) [2020-01-27T12:45:49Z] <arturo> [codfw1dev] manually move the new domain to the cloudinfra-codfw1dev project clouddb2001-dev: [designate]> update zones set tenant_id='cloudinfra-codfw1dev' where id = '4c75410017904858a5839de93c9e8b3d'; T243556
Change 567453 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] cloud: introduce delegation for codfw1dev.wikimedia.cloud
Change 567454 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] wmcs: codfw1dev: fix .cloud domain
Change 567453 merged by Arturo Borrero Gonzalez:
[operations/dns@master] cloud: introduce delegation for codfw1dev.wikimedia.cloud
Change 567454 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] wmcs: codfw1dev: fix .cloud domain
Ok, I got to this point:
- the new domain codfw1dev.wikimedia.cloud has been delegated to designate @ cloudservices2002-dev.wikimedia.org. I did this by means of the proposal at T243766 (ns0.openstack.codfw1dev.wikimediacloud.org)
- the new domain belongs to the cloudinfra-codfw1dev project
- with this patch https://gerrit.wikimedia.org/r/567454 openstack is instructed to issue FQDNs in the new domain for new VMs
- DNS records on instance creation/deletion seem to work:
root@arturo-new-domain-test:~# host 172.16.128.12
12.128.16.172.in-addr.arpa domain name pointer arturo-new-domain-test.admin.codfw1dev.wikimedia.cloud.
root@arturo-new-domain-test:~# host arturo-new-domain-test.admin.codfw1dev.wikimedia.cloud.
arturo-new-domain-test.admin.codfw1dev.wikimedia.cloud has address 172.16.128.12
root@arturo-new-domain-test:~# host arturo-new-domain-test.codfw1dev.wikimedia.cloud
arturo-new-domain-test.codfw1dev.wikimedia.cloud has address 172.16.128.12
- what doesn't work is puppet certs. Somehow the VM is requesting a certificate for the wrong FQDN:
root@labtestpuppetmaster2001:~# puppet cert list
  "arturo-new-domain-test.admin.wikimedia.cloud" (SHA256) 29:CC:18:B3:86:29:2B:1F:D5:6A:AE:67:C4:97:C9:40:2D:EC:DD:F7:F2:F3:34:C0:46:87:25:EC:D6:FA:87:10
^^^ NOTE missing .codfw1dev. in there ^^^
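The mismatch above can be expressed as a small consistency check. The following Python sketch builds the FQDN a VM is expected to have in the new domain, compares a requested certificate name against it, and derives the reverse-lookup name for the VM address. The helper names (`expected_fqdn`, `certname_matches`, `ptr_name`) are hypothetical and only illustrate the reasoning; this is not how the puppetmaster or the ENC validates names.

```python
import ipaddress

# Assumption for this sketch: the deployment domain introduced in this task.
DEPLOYMENT_DOMAIN = "codfw1dev.wikimedia.cloud"


def expected_fqdn(vmname: str, project: str, domain: str = DEPLOYMENT_DOMAIN) -> str:
    """Return $vmname.$project.$domain, the form issued for new VMs."""
    return f"{vmname}.{project}.{domain}"


def certname_matches(certname: str, vmname: str, project: str,
                     domain: str = DEPLOYMENT_DOMAIN) -> bool:
    """True if the requested cert name equals the expected FQDN."""
    # DNS names are case-insensitive and may carry a trailing root dot.
    return certname.rstrip(".").lower() == expected_fqdn(vmname, project, domain)


def ptr_name(ip: str) -> str:
    """Return the in-addr.arpa (or ip6.arpa) name used for the reverse lookup."""
    return ipaddress.ip_address(ip).reverse_pointer


if __name__ == "__main__":
    vm, project = "arturo-new-domain-test", "admin"
    # The name seen in 'puppet cert list' lacks the codfw1dev label.
    print(certname_matches("arturo-new-domain-test.admin.wikimedia.cloud", vm, project))
    print(certname_matches(expected_fqdn(vm, project), vm, project))
    print(ptr_name("172.16.128.12"))
```

With the values from the outputs above, the certificate name requested by the VM fails the comparison because the `codfw1dev` label is missing, while the name returned by the reverse DNS lookup passes.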
Change 568473 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] CloudVPS: set domain for VM instances using nova-api metadata
Change 568493 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: puppet-enc: allow hostnames from the new domain
Change 568493 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: puppet-enc: allow hostnames from the new domain
Change 568473 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] CloudVPS: set domain for VM instances using nova-api metadata
Change 568535 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hieradata: openstack: introduce basic keys for codfw1dev
Change 568535 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hieradata: openstack: introduce basic keys for codfw1dev
Things are far better now:
- create a VM
- accept the puppet cert in the puppetmaster (labtestpuppetmaster2001)
- do the first puppet run
- ssh to the VM!
Basically, this is solved. The process is very similar to what we have in eqiad1. Auto-signing, if appropriate, would be another improvement, but I'll leave that for another Phab task.