deployment-etcd-01.deployment-prep.eqiad.wmflabs is running Jessie and needs to be replaced with a Buster machine.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
betacluster: switch etcd to deployment-etcd02 | operations/mediawiki-config | master | +1 -1
Status | Subtype | Assigned | Task
---|---|---|---
Invalid | | None | T197804 Puppet: forbid new Python2 code
Open | | None | T218426 Upgrade various Cloud VPS Python 2 scripts to Python 3
Resolved | BUG REPORT | Bstorm | T218423 Add python 3 packages to openstack::clientpackages::common
Resolved | | MoritzMuehlenhoff | T232677 Remove support for Debian Jessie in Cloud Services
Duplicate | | None | T236575 "deployment-prep" Cloud VPS project jessie deprecation
Resolved | | None | T218729 Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster
Resolved | | taavi | T276462 Replace deployment-etcd-01 with a Buster host
Event Timeline
The current instance is running etcd 2.2.1. Buster has 3.2.26 available.
etcdctl is convinced that it's running on wikimedia.cloud and refuses to connect, even when specifying a .wmflabs address:
root@deployment-etcd-01:~# etcdctl -C https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379 ls
Error: client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate is valid for deployment-etcd-01.deployment-prep.eqiad.wmflabs, deployment-etcd-01, etcd.deployment-prep.eqiad.wmflabs, etcd.deployment-prep.eqiad.wmflabs, not deployment-etcd-01.deployment-prep.eqiad1.wikimedia.cloud
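The error above is a name mismatch: the host name the client ends up connecting to is not among the names the certificate lists. A minimal Python sketch of that comparison, using the names from the error message. The helper `name_in_certificate` is hypothetical and only does exact, case-insensitive matching (no wildcard handling), so it illustrates the check rather than reproducing what etcdctl does internally.

```python
def name_in_certificate(hostname: str, cert_names: list[str]) -> bool:
    """Return True if hostname exactly matches one of the certificate names."""
    wanted = hostname.rstrip(".").lower()
    return any(wanted == name.rstrip(".").lower() for name in cert_names)


# Names copied from the x509 error message (duplicate entry dropped).
CERT_NAMES = [
    "deployment-etcd-01.deployment-prep.eqiad.wmflabs",
    "deployment-etcd-01",
    "etcd.deployment-prep.eqiad.wmflabs",
]

# The .wmflabs name is covered by the certificate, the .wikimedia.cloud name is not.
print(name_in_certificate("deployment-etcd-01.deployment-prep.eqiad.wmflabs", CERT_NAMES))
print(name_in_certificate("deployment-etcd-01.deployment-prep.eqiad1.wikimedia.cloud", CERT_NAMES))
```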
Looking at it, it has conftool data:
root@deployment-etcd-01:~# curl -L https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys
{"action":"get","node":{"dir":true,"nodes":[{"key":"/conftool","dir":true,"modifiedIndex":5,"createdIndex":5},{"key":"/test","value":"1","modifiedIndex":4,"createdIndex":4}]}}
root@deployment-etcd-01:~# curl -L https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys/conftool
{"action":"get","node":{"key":"/conftool","dir":true,"nodes":[{"key":"/conftool/v1","dir":true,"modifiedIndex":5,"createdIndex":5}],"modifiedIndex":5,"createdIndex":5}}
root@deployment-etcd-01:~# curl -L https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys/conftool/v1
{"action":"get","node":{"key":"/conftool/v1","dir":true,"nodes":[{"key":"/conftool/v1/mediawiki-config","dir":true,"modifiedIndex":48,"createdIndex":48},{"key":"/conftool/v1/services","dir":true,"modifiedIndex":5,"createdIndex":5}],"modifiedIndex":5,"createdIndex":5}}
root@deployment-etcd-01:~# curl -L https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys/conftool/v1/services
{"action":"get","node":{"key":"/conftool/v1/services","dir":true,"nodes":[{"key":"/conftool/v1/services/cache_maps","dir":true,"modifiedIndex":19,"createdIndex":19},{"key":"/conftool/v1/services/scb","dir":true,"modifiedIndex":5,"createdIndex":5},{"key":"/conftool/v1/services/swift","dir":true,"modifiedIndex":23,"createdIndex":23},{"key":"/conftool/v1/services/thumbor","dir":true,"modifiedIndex":32,"createdIndex":32},{"key":"/conftool/v1/services/appserver","dir":true,"modifiedIndex":27,"createdIndex":27},{"key":"/conftool/v1/services/cache_text","dir":true,"modifiedIndex":11,"createdIndex":11},{"key":"/conftool/v1/services/dns","dir":true,"modifiedIndex":34,"createdIndex":34},{"key":"/conftool/v1/services/imagescaler","dir":true,"modifiedIndex":35,"createdIndex":35},{"key":"/conftool/v1/services/maps","dir":true,"modifiedIndex":46,"createdIndex":46},{"key":"/conftool/v1/services/parsoid","dir":true,"modifiedIndex":22,"createdIndex":22},{"key":"/conftool/v1/services/prometheus","dir":true,"modifiedIndex":38,"createdIndex":38},{"key":"/conftool/v1/services/videoscaler","dir":true,"modifiedIndex":14,"createdIndex":14},{"key":"/conftool/v1/services/cache_misc","dir":true,"modifiedIndex":6,"createdIndex":6},{"key":"/conftool/v1/services/eventbus","dir":true,"modifiedIndex":13,"createdIndex":13},{"key":"/conftool/v1/services/jobrunner","dir":true,"modifiedIndex":37,"createdIndex":37},{"key":"/conftool/v1/services/pdf","dir":true,"modifiedIndex":31,"createdIndex":31},{"key":"/conftool/v1/services/restbase","dir":true,"modifiedIndex":10,"createdIndex":10},{"key":"/conftool/v1/services/sca","dir":true,"modifiedIndex":16,"createdIndex":16},{"key":"/conftool/v1/services/testserver","dir":true,"modifiedIndex":28,"createdIndex":28},{"key":"/conftool/v1/services/api_appserver","dir":true,"modifiedIndex":47,"createdIndex":47},{"key":"/conftool/v1/services/aqs","dir":true,"modifiedIndex":36,"createdIndex":36},{"key":"/conftool/v1/services/cache_upload","dir":true,"modifiedIndex":9,"createdIndex":9},{"key":"/conftool/v1/services/elasticsearch","dir":true,"modifiedIndex":18,"createdIndex":18},{"key":"/conftool/v1/services/phabricator","dir":true,"modifiedIndex":24,"createdIndex":24}],"modifiedIndex":5,"createdIndex":5}}
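The raw responses are hard to read, so here is a small Python sketch that pulls the child keys out of one etcd v2 `/v2/keys` directory response. It assumes the response shape shown in the curl output (a `node` object with an optional `nodes` list); the function name `child_keys` is hypothetical, and the sample is the `/conftool/v1` response copied from above rather than a live query.

```python
import json


def child_keys(response_text: str) -> list[str]:
    """Return the sorted keys of the direct children in an etcd v2 directory listing."""
    body = json.loads(response_text)
    node = body.get("node", {})
    # Leaf nodes and empty directories carry no "nodes" list.
    return sorted(child["key"] for child in node.get("nodes", []))


# Response for /v2/keys/conftool/v1, copied from the transcript.
SAMPLE = (
    '{"action":"get","node":{"key":"/conftool/v1","dir":true,"nodes":['
    '{"key":"/conftool/v1/mediawiki-config","dir":true,"modifiedIndex":48,"createdIndex":48},'
    '{"key":"/conftool/v1/services","dir":true,"modifiedIndex":5,"createdIndex":5}'
    '],"modifiedIndex":5,"createdIndex":5}}'
)

for key in child_keys(SAMPLE):
    print(key)
```

Feeding it the saved output of each curl call in turn gives a readable outline of what needs to be carried over to a replacement host.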
Beta does not use etcd for db config. Is it still used for other services?
The etcd v2 -> v3 migration looks annoying. The cluster first needs to be upgraded to 2.3, then to 3.0, then to 3.1, and only after that to 3.2.
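To make the hop sequence concrete, here is a minimal Python sketch that lists the intermediate release lines between the installed version and the target. The ordered list `RELEASE_LINES` encodes only the steps named in this task (2.2 through 3.2), and the helper names are hypothetical; it does not perform any upgrade.

```python
# Release lines in the order they have to be visited, as described in this task.
RELEASE_LINES = ["2.2", "2.3", "3.0", "3.1", "3.2"]


def minor_line(version: str) -> str:
    """Reduce a full version such as '3.2.26' to its release line '3.2'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def upgrade_path(current: str, target: str) -> list[str]:
    """Return every release line to step through, ending with the target's line."""
    start = RELEASE_LINES.index(minor_line(current))
    end = RELEASE_LINES.index(minor_line(target))
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")
    return RELEASE_LINES[start + 1 : end + 1]


# From the Jessie host (2.2.1) to the version packaged in Buster (3.2.26).
print(upgrade_path("2.2.1", "3.2.26"))
```

Four sequential hops on a single-node beta cluster is a lot of work, which is one reason to consider standing up a fresh host and importing the data instead.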
Caller survey:
taavi@deployment-etcd-01:/var/log/nginx$ sudo cat etcd_access.log | cut -d ' ' -f1 | sort | uniq -c | sort
1098 172.16.4.16
13 172.16.5.46
142 172.16.1.115
3122 172.16.4.119
628 172.16.4.18
968 172.16.4.98
taavi@deployment-etcd-01:/var/log/nginx$ sudo cat etcd_access.log | cut -d ' ' -f1 | sort | uniq | sort | xargs -L 1 host
115.1.16.172.in-addr.arpa domain name pointer deployment-parsoid11.deployment-prep.eqiad1.wikimedia.cloud.
119.4.16.172.in-addr.arpa domain name pointer deployment-mediawiki-07.deployment-prep.eqiad1.wikimedia.cloud.
16.4.16.172.in-addr.arpa domain name pointer deployment-mwmaint01.deployment-prep.eqiad1.wikimedia.cloud.
18.4.16.172.in-addr.arpa domain name pointer deployment-deploy01.deployment-prep.eqiad1.wikimedia.cloud.
98.4.16.172.in-addr.arpa domain name pointer deployment-jobrunner03.deployment-prep.eqiad1.wikimedia.cloud.
46.5.16.172.in-addr.arpa domain name pointer deployment-etcd-01.deployment-prep.eqiad1.wikimedia.cloud.
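Note that the counts above are ordered lexically (1098 before 13) because the final `sort` has no `-n`. The same survey can be sketched in Python with the busiest callers first. This assumes, as the `cut -d ' ' -f1` in the pipeline does, that the client address is the first space-separated field of each log line; the function name `count_clients` and the sample log lines are made up for illustration.

```python
from collections import Counter


def count_clients(log_lines):
    """Count requests per client address, busiest first, ties broken by address."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative lines only; the real file is /var/log/nginx/etcd_access.log.
SAMPLE_LOG = [
    '172.16.4.119 - - "GET /v2/keys/conftool/v1/mediawiki-config HTTP/1.1" 200',
    '172.16.4.16 - - "GET /v2/keys/conftool/v1/mediawiki-config HTTP/1.1" 200',
    '172.16.4.119 - - "GET /v2/keys/conftool/v1/mediawiki-config HTTP/1.1" 200',
    "",
]

for address, requests in count_clients(SAMPLE_LOG):
    print(requests, address)
```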
Might be related: found some DNS records, _etcd._tcp.beta.wmflabs.org. and _etcd_server._tcp.beta.wmflabs.org., that are pointing to a non-existent instance that was deleted in early 2019: T218729#5140552.
Mentioned in SAL (#wikimedia-releng) [2021-03-05T13:40:20Z] <Majavah> create deployment-etcd02 and sign its puppet certificate T276462
Mentioned in SAL (#wikimedia-releng) [2021-03-05T17:50:13Z] <Majavah> switch deployment-prep hiera key etcd_host to use deployment-etcd02 ref T276462
Change 668751 had a related patch set uploaded (by Majavah; owner: Majavah):
[operations/mediawiki-config@master] betacluster: switch etcd to deployment-etcd02
progress update:
- deployment-etcd02 is now running etcd v3 and has had the conftool data imported from deployment-etcd-01
- I switched over deployment-prep global hiera key etcd_host to the new host.
- There is a mediawiki-config patch, but it has not been merged or deployed yet.
Change 668751 merged by jenkins-bot:
[operations/mediawiki-config@master] betacluster: switch etcd to deployment-etcd02
Mentioned in SAL (#wikimedia-releng) [2021-03-05T19:14:59Z] <Majavah> beta cluster etcd was switched from deployment-etcd-01 to deployment-etcd02 ref T276462
Mentioned in SAL (#wikimedia-releng) [2021-03-05T19:30:30Z] <Majavah> shutdown deployment-etcd-01 to see if anything breaks, will delete if nothing has broken during next week T276462
Mentioned in SAL (#wikimedia-releng) [2021-03-11T16:49:07Z] <Majavah> delete deployment-etcd-01 T276462