Replace deployment-etcd-01 with a Buster host
Closed, Resolved, Public

Description

deployment-etcd-01.deployment-prep.eqiad.wmflabs is running Jessie and needs to be replaced with a Buster machine.

Event Timeline

taavi triaged this task as Medium priority. Mar 4 2021, 2:34 PM
taavi created this task.

The current instance is running etcd 2.2.1. Buster has 3.2.26 available.

etcdctl is convinced that it's running on wikimedia.cloud and refuses to connect, even when specifying a .wmflabs address:

root@deployment-etcd-01:~# etcdctl -C https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379 ls
Error:  client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate is valid for deployment-etcd-01.deployment-prep.eqiad.wmflabs, deployment-etcd-01, etcd.deployment-prep.eqiad.wmflabs, etcd.deployment-prep.eqiad.wmflabs, not deployment-etcd-01.deployment-prep.eqiad1.wikimedia.cloud

Looking at the instance, it holds conftool data:

root@deployment-etcd-01:~# curl -L https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys
{"action":"get","node":{"dir":true,"nodes":[{"key":"/conftool","dir":true,"modifiedIndex":5,"createdIndex":5},{"key":"/test","value":"1","modifiedIndex":4,"createdIndex":4}]}}
root@deployment-etcd-01:~# curl -L https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys/conftool
{"action":"get","node":{"key":"/conftool","dir":true,"nodes":[{"key":"/conftool/v1","dir":true,"modifiedIndex":5,"createdIndex":5}],"modifiedIndex":5,"createdIndex":5}}
root@deployment-etcd-01:~# curl -L https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys/conftool/v1
{"action":"get","node":{"key":"/conftool/v1","dir":true,"nodes":[{"key":"/conftool/v1/mediawiki-config","dir":true,"modifiedIndex":48,"createdIndex":48},{"key":"/conftool/v1/services","dir":true,"modifiedIndex":5,"createdIndex":5}],"modifiedIndex":5,"createdIndex":5}}
root@deployment-etcd-01:~# curl -L https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys/conftool/v1/services
{"action":"get","node":{"key":"/conftool/v1/services","dir":true,"nodes":[{"key":"/conftool/v1/services/cache_maps","dir":true,"modifiedIndex":19,"createdIndex":19},{"key":"/conftool/v1/services/scb","dir":true,"modifiedIndex":5,"createdIndex":5},{"key":"/conftool/v1/services/swift","dir":true,"modifiedIndex":23,"createdIndex":23},{"key":"/conftool/v1/services/thumbor","dir":true,"modifiedIndex":32,"createdIndex":32},{"key":"/conftool/v1/services/appserver","dir":true,"modifiedIndex":27,"createdIndex":27},{"key":"/conftool/v1/services/cache_text","dir":true,"modifiedIndex":11,"createdIndex":11},{"key":"/conftool/v1/services/dns","dir":true,"modifiedIndex":34,"createdIndex":34},{"key":"/conftool/v1/services/imagescaler","dir":true,"modifiedIndex":35,"createdIndex":35},{"key":"/conftool/v1/services/maps","dir":true,"modifiedIndex":46,"createdIndex":46},{"key":"/conftool/v1/services/parsoid","dir":true,"modifiedIndex":22,"createdIndex":22},{"key":"/conftool/v1/services/prometheus","dir":true,"modifiedIndex":38,"createdIndex":38},{"key":"/conftool/v1/services/videoscaler","dir":true,"modifiedIndex":14,"createdIndex":14},{"key":"/conftool/v1/services/cache_misc","dir":true,"modifiedIndex":6,"createdIndex":6},{"key":"/conftool/v1/services/eventbus","dir":true,"modifiedIndex":13,"createdIndex":13},{"key":"/conftool/v1/services/jobrunner","dir":true,"modifiedIndex":37,"createdIndex":37},{"key":"/conftool/v1/services/pdf","dir":true,"modifiedIndex":31,"createdIndex":31},{"key":"/conftool/v1/services/restbase","dir":true,"modifiedIndex":10,"createdIndex":10},{"key":"/conftool/v1/services/sca","dir":true,"modifiedIndex":16,"createdIndex":16},{"key":"/conftool/v1/services/testserver","dir":true,"modifiedIndex":28,"createdIndex":28},{"key":"/conftool/v1/services/api_appserver","dir":true,"modifiedIndex":47,"createdIndex":47},{"key":"/conftool/v1/services/aqs","dir":true,"modifiedIndex":36,"createdIndex":36},{"key":"/conftool/v1/services/cache_upload","dir":true,"modifiedIndex":9,"createdIndex":9},{"key":"/conftool/v1/services/elasticsearch","dir":true,"modifiedIndex":18,"createdIndex":18},{"key":"/conftool/v1/services/phabricator","dir":true,"modifiedIndex":24,"createdIndex":24}],"modifiedIndex":5,"createdIndex":5}}
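
Walking the tree one directory at a time gets tedious. A minimal sketch for pulling out every key path from a response, assuming the etcd v2 keys API on this version accepts `?recursive=true` and that key names contain no escaped quotes. The `etcd_keys` helper name is made up for illustration:

```bash
# Print every "key" path found in an etcd v2 JSON response read on stdin.
# This is plain text matching, not a JSON parser: it assumes keys never
# contain an escaped double quote.
etcd_keys() {
    grep -o '"key":"[^"]*"' | cut -d '"' -f 4 | LC_ALL=C sort
}

# Hypothetical usage on the etcd host (recursive=true is an assumption here):
#   curl -sL "https://deployment-etcd-01.deployment-prep.eqiad.wmflabs:2379/v2/keys/conftool?recursive=true" | etcd_keys
```

A real JSON tool would be more robust if one is installed; the grep version only needs coreutils.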

Beta does not use etcd for db config. Is it still used for other services?

The etcd v2 -> v3 migration looks annoying. The cluster first needs to be upgraded to 2.3, then to 3.0, then to 3.1, and only after that to 3.2.
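
To keep track of where the cluster is on that path, the sequence can be written down as a small helper. This is only a sketch of the ordering described in this comment: `ETCD_UPGRADE_PATH` and `remaining_hops` are illustrative names, and the version list is taken from the comment, not checked against the etcd upgrade guides.

```bash
# Ordered minor versions on the upgrade path described in this comment.
ETCD_UPGRADE_PATH="2.2 2.3 3.0 3.1 3.2"

# Print the hops still needed to get from the given minor version to the
# end of the path, one "from -> to" pair per line. Prints nothing if the
# version is unknown or already the last one.
remaining_hops() {
    local current="$1" prev="" seen=0 v
    for v in $ETCD_UPGRADE_PATH; do
        if [ "$seen" -eq 1 ]; then
            echo "$prev -> $v"
        fi
        if [ "$v" = "$current" ]; then
            seen=1
        fi
        prev="$v"
    done
}

# Example: remaining_hops 2.2
```

For the current instance (2.2.1) that is four separate hops, which is why importing the data into a fresh host may be less work than upgrading in place.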

Caller survey:

taavi@deployment-etcd-01:/var/log/nginx$ sudo cat etcd_access.log | cut -d ' ' -f1 | sort | uniq -c  | sort 
   1098 172.16.4.16
     13 172.16.5.46
    142 172.16.1.115
   3122 172.16.4.119
    628 172.16.4.18
    968 172.16.4.98

taavi@deployment-etcd-01:/var/log/nginx$ sudo cat etcd_access.log | cut -d ' ' -f1 | sort | uniq | sort | xargs -L 1 host
115.1.16.172.in-addr.arpa domain name pointer deployment-parsoid11.deployment-prep.eqiad1.wikimedia.cloud.
119.4.16.172.in-addr.arpa domain name pointer deployment-mediawiki-07.deployment-prep.eqiad1.wikimedia.cloud.
16.4.16.172.in-addr.arpa domain name pointer deployment-mwmaint01.deployment-prep.eqiad1.wikimedia.cloud.
18.4.16.172.in-addr.arpa domain name pointer deployment-deploy01.deployment-prep.eqiad1.wikimedia.cloud.
98.4.16.172.in-addr.arpa domain name pointer deployment-jobrunner03.deployment-prep.eqiad1.wikimedia.cloud.
46.5.16.172.in-addr.arpa domain name pointer deployment-etcd-01.deployment-prep.eqiad1.wikimedia.cloud.
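
Note that the final plain `sort` in the first command orders the counts as text, so the busiest callers do not end up at the top. A small sketch that sorts numerically instead, assuming the client address is the first space-separated field of each log line. `count_callers` is a hypothetical helper name:

```bash
# Count requests per client IP from an access log read on stdin and list
# the busiest callers first. Assumes the client address is the first
# space-separated field of every line.
count_callers() {
    cut -d ' ' -f1 | sort | uniq -c | sort -rn
}

# Hypothetical usage on the etcd host:
#   sudo cat /var/log/nginx/etcd_access.log | count_callers
```

With the numbers above this would put 172.16.4.119 (deployment-mediawiki-07) first.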

Possibly related: I found some DNS records, _etcd._tcp.beta.wmflabs.org. and _etcd_server._tcp.beta.wmflabs.org., that still point to an instance that was deleted in early 2019: T218729#5140552.

Mentioned in SAL (#wikimedia-releng) [2021-03-05T13:40:20Z] <Majavah> create deployment-etcd02 and sign its puppet certificate T276462

Mentioned in SAL (#wikimedia-releng) [2021-03-05T17:50:13Z] <Majavah> switch deployment-prep hiera key etcd_host to use deployment-etcd02 ref T276462

Change 668751 had a related patch set uploaded (by Majavah; owner: Majavah):
[operations/mediawiki-config@master] betacluster: switch etcd to deployment-etcd02

https://gerrit.wikimedia.org/r/668751

progress update:

  • deployment-etcd02 is now running etcd v3 and has had the conftool data imported from deployment-etcd-01.
  • I switched the deployment-prep global hiera key etcd_host over to the new host.
  • There is a mediawiki-config patch, but it has not been merged or deployed yet.

Change 668751 merged by jenkins-bot:
[operations/mediawiki-config@master] betacluster: switch etcd to deployment-etcd02

https://gerrit.wikimedia.org/r/668751

Mentioned in SAL (#wikimedia-releng) [2021-03-05T19:14:59Z] <Majavah> beta cluster etcd was switched from deployment-etcd-01 to deployment-etcd02 ref T276462

Mentioned in SAL (#wikimedia-releng) [2021-03-05T19:30:30Z] <Majavah> shutdown deployment-etcd-01 to see if anything breaks, will delete if nothing has broken during next week T276462

Mentioned in SAL (#wikimedia-releng) [2021-03-11T16:49:07Z] <Majavah> delete deployment-etcd-01 T276462