Upgrade cloudservices nodes to Debian Buster
Closed, Resolved (Public)

Description

Designate doesn't store any state on these boxes, and is an HA cluster, so that part should be simple.

PDNS has a local database which will need to be copied over to any newly-built host:

  • On old host: mysqldump pdns > pdns-dump.sql
  • Stand up mysql on the new host:
    1. cd /opt/wmf-mariadb104/
    2. ./scripts/mysql_install_db
  • Import data on new host (the database must exist before the import):
    1. mysql> CREATE DATABASE pdns;
    2. mysql -p pdns < pdns-dump.sql  # empty password for this
  • Bootstrap the IP aliaser (this file will get filled in later):
    1. echo '{}' > /var/cache/labsaliaser/labs-ip-aliases.json
  • Grants:
    1. GRANT ALL PRIVILEGES ON pdns.* TO 'pdns'@'localhost' IDENTIFIED BY '<password>';
    2. GRANT ALL PRIVILEGES ON pdns.* TO 'pdns'@'ipv4' IDENTIFIED BY '<password>';
    3. GRANT ALL PRIVILEGES ON pdns.* TO 'pdns'@'ipv6' IDENTIFIED BY '<password>';
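
The three grants above differ only in the host part, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the original procedure: the function name grant_statements, the example addresses (taken from the documentation-only ranges 192.0.2.0/24 and 2001:db8::/32) and the password are all hypothetical. It only builds the SQL text; running it against MariaDB is left to the operator.

```python
def grant_statements(hosts, password, user="pdns", database="pdns"):
    """Return one GRANT statement per host, mirroring the grants listed above.

    The helper name and its parameters are illustrative assumptions; only the
    shape of the GRANT statement comes from the task description.
    """

    def quote(value):
        # Escape backslashes and single quotes so the value is safe inside
        # a single-quoted SQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return [
        f"GRANT ALL PRIVILEGES ON {database}.* TO {quote(user)}@{quote(host)} "
        f"IDENTIFIED BY {quote(password)};"
        for host in hosts
    ]


if __name__ == "__main__":
    # Hypothetical addresses standing in for the new host's real IPv4/IPv6.
    example_hosts = ["localhost", "192.0.2.10", "2001:db8::10"]
    for statement in grant_statements(example_hosts, "<password>"):
        print(statement)
```

The printed statements can be pasted into a mysql session on the new host once the pdns database has been imported.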

Event Timeline

Change 599133 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] codfw1dev: add cloudservices2003-dev to the designate host list

https://gerrit.wikimedia.org/r/599133

Change 599133 merged by Andrew Bogott:
[operations/puppet@production] codfw1dev: add cloudservices2003-dev to the designate host list

https://gerrit.wikimedia.org/r/599133

Change 599137 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] M5 grants: allow designate access via ipv6

https://gerrit.wikimedia.org/r/599137

Mentioned in SAL (#wikimedia-cloud) [2020-05-28T00:33:15Z] <andrewbogott> shutting down cloudservices2002-dev to see if we can live without it. This is in anticipation of rebuilding it entirely for T253780

Change 599324 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Prepare cloudservices2002-dev for debian Buster

https://gerrit.wikimedia.org/r/599324

Change 599137 merged by Andrew Bogott:
[operations/puppet@production] M5 grants: allow designate access via ipv6

https://gerrit.wikimedia.org/r/599137

Change 599324 merged by Andrew Bogott:
[operations/puppet@production] Prepare cloudservices2002-dev for debian Buster

https://gerrit.wikimedia.org/r/599324

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

cloudservices2002-dev.wikimedia.org

The log can be found in /var/log/wmf-auto-reimage/202005281401_andrew_62878_cloudservices2002-dev_wikimedia_org.log.

Completed auto-reimage of hosts:

['cloudservices2002-dev.wikimedia.org']

Of which those FAILED:

['cloudservices2002-dev.wikimedia.org']

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

cloudservices2002-dev.wikimedia.org

The log can be found in /var/log/wmf-auto-reimage/202005281404_andrew_65222_cloudservices2002-dev_wikimedia_org.log.

Completed auto-reimage of hosts:

['cloudservices2002-dev.wikimedia.org']

and were ALL successful.

Change 599354 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] labs-ip-alias-dump.py: upgrade to python3

https://gerrit.wikimedia.org/r/599354

Change 599354 merged by Andrew Bogott:
[operations/puppet@production] labs-ip-alias-dump.py: upgrade to python3

https://gerrit.wikimedia.org/r/599354

Change 599448 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] designate pools.yaml: Fix some hard-coded eqiad things

https://gerrit.wikimedia.org/r/599448

Change 599449 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Designate: designate doesn't need to write to the pdns db anymore

https://gerrit.wikimedia.org/r/599449

Change 599450 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] designate.conf.erb: remove pdns pool config

https://gerrit.wikimedia.org/r/599450

Change 599448 merged by Andrew Bogott:
[operations/puppet@production] designate pools.yaml: Fix some hard-coded eqiad things

https://gerrit.wikimedia.org/r/599448

Change 599450 merged by Andrew Bogott:
[operations/puppet@production] designate.conf.erb: remove pdns pool config

https://gerrit.wikimedia.org/r/599450

Change 599449 merged by Andrew Bogott:
[operations/puppet@production] Designate: designate doesn't need to write to the pdns db anymore

https://gerrit.wikimedia.org/r/599449

Change 600017 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] designate: allow mdns to listen on ipv6

https://gerrit.wikimedia.org/r/600017

Change 600095 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Designate: have mdns use tcp rather than udp for axfr

https://gerrit.wikimedia.org/r/600095

Change 600017 merged by Andrew Bogott:
[operations/puppet@production] designate: allow mdns to listen on ipv6

https://gerrit.wikimedia.org/r/600017

Change 600095 merged by Andrew Bogott:
[operations/puppet@production] Designate: have mdns use tcp rather than udp for axfr

https://gerrit.wikimedia.org/r/600095

Change 601551 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Rocky/Buster/Designate: a few live hacks to get things working on Buster

https://gerrit.wikimedia.org/r/601551

Change 601551 merged by Andrew Bogott:
[operations/puppet@production] Rocky/Buster/Designate: a few live hacks to get things working on Buster

https://gerrit.wikimedia.org/r/601551

Change 601711 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs resolv.conf: reduce timeout to 1s

https://gerrit.wikimedia.org/r/601711

Change 601714 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] wmcs vms: stop using ns1 for resolving

https://gerrit.wikimedia.org/r/601714

Andrew triaged this task as Medium priority. Jun 2 2020, 4:15 PM
Andrew moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

Change 601711 merged by Andrew Bogott:
[operations/puppet@production] wmcs resolv.conf: reduce timeout to 1s

https://gerrit.wikimedia.org/r/601711

Change 601714 merged by Andrew Bogott:
[operations/puppet@production] wmcs vms: stop using ns1 for resolving

https://gerrit.wikimedia.org/r/601714

Change 604010 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Horizon: put into maintenance mode for cloudservices rebuilds

https://gerrit.wikimedia.org/r/604010

Mentioned in SAL (#wikimedia-cloud) [2020-06-09T14:01:21Z] <arturo> icinga downtime everything cloud* lab* for 2h (T253780)

Change 604010 merged by Andrew Bogott:
[operations/puppet@production] Horizon: put into maintenance mode for cloudservices rebuilds

https://gerrit.wikimedia.org/r/604010

I'm pretty sure we can get an adequate dump with just

    mysqldump pdns > pdns-dump.sql

Mentioned in SAL (#wikimedia-cloud) [2020-06-09T14:09:51Z] <andrewbogott> stopping puppet, all designate services and all pdns services on cloudservices1004 for T253780

Change 604019 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move cloudservices1003/1004 to Debian Buster

https://gerrit.wikimedia.org/r/604019

Change 604019 merged by Andrew Bogott:
[operations/puppet@production] Move cloudservices1003/1004 to Debian Buster

https://gerrit.wikimedia.org/r/604019

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

cloudservices1004.wikimedia.org

The log can be found in /var/log/wmf-auto-reimage/202006091416_andrew_126076_cloudservices1004_wikimedia_org.log.

Completed auto-reimage of hosts:

['cloudservices1004.wikimedia.org']

and were ALL successful.

Change 604049 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloud-vps resolve.conf: move traffic from recursor0 to recursor1

https://gerrit.wikimedia.org/r/604049

Mentioned in SAL (#wikimedia-cloud) [2020-06-09T15:25:30Z] <arturo> icinga downtime everything cloud* lab* for 2h more (T253780)

Change 604049 merged by Andrew Bogott:
[operations/puppet@production] cloud-vps resolve.conf: move traffic from recursor0 to recursor1

https://gerrit.wikimedia.org/r/604049

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

cloudservices1003.wikimedia.org

The log can be found in /var/log/wmf-auto-reimage/202006091617_andrew_222628_cloudservices1003_wikimedia_org.log.

Completed auto-reimage of hosts:

['cloudservices1003.wikimedia.org']

Of which those FAILED:

['cloudservices1003.wikimedia.org']

Change 604084 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloud-vps resolv.conf: restore use of both recursors

https://gerrit.wikimedia.org/r/604084

Change 604084 merged by Andrew Bogott:
[operations/puppet@production] cloud-vps resolv.conf: restore use of both recursors

https://gerrit.wikimedia.org/r/604084