Restart pending mysql hosts with old TLS cert
Closed, Resolved, Public

Description

db1063.eqiad.wmnet
db1054.eqiad.wmnet
db1067.eqiad.wmnet
db1036.eqiad.wmnet
db1015.eqiad.wmnet
db1021.eqiad.wmnet
db1022.eqiad.wmnet

db2059.codfw.wmnet
db2035.codfw.wmnet
db2051.codfw.wmnet
db2061.codfw.wmnet
db2044.codfw.wmnet
db2052.codfw.wmnet
db2058.codfw.wmnet
db2066.codfw.wmnet
db2064.codfw.wmnet
db2065.codfw.wmnet
db2053.codfw.wmnet
db2041.codfw.wmnet
db2050.codfw.wmnet
db2045.codfw.wmnet
db2037.codfw.wmnet
db2046.codfw.wmnet
db2063.codfw.wmnet
db2036.codfw.wmnet
db2067.codfw.wmnet
db2060.codfw.wmnet
db2043.codfw.wmnet
db2054.codfw.wmnet
db2039.codfw.wmnet
jcrespo created this task. Dec 2 2016, 8:56 AM
Restricted Application added a subscriber: Aklapper. Dec 2 2016, 8:56 AM

Change 324862 had a related patch set uploaded (by Jcrespo):
Depool db1076 for maintenance

https://gerrit.wikimedia.org/r/324862

Change 324862 merged by Jcrespo:
Depool db1076 for maintenance

https://gerrit.wikimedia.org/r/324862

Mentioned in SAL (#wikimedia-operations) [2016-12-02T09:18:14Z] <jynus> mysql restart and upgrade for db1076 T152188
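Each "Mentioned in SAL" entry in this log has the same shape: a UTC timestamp in brackets, the IRC nick in angle brackets, then a free-text message that ends with the Phabricator task ID (T152188). The following is a minimal Python sketch for extracting those fields from one such line. It assumes only the format visible in this log. `parse_sal` and `SalEntry` are hypothetical names and are not part of any Wikimedia tool.

```python
import re
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional


class SalEntry(NamedTuple):
    when: datetime      # UTC timestamp of the log entry
    nick: str           # IRC nick that logged it
    message: str        # free-text message
    tasks: List[str]    # Phabricator task IDs mentioned (e.g. "T152188")


# Hypothetical pattern based on the entries in this task:
# "[YYYY-MM-DDTHH:MM:SSZ] <nick> message"
SAL_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\] <([^>]+)> (.*)$")
# Phabricator task references: "T" followed by digits, as a whole word.
TASK_RE = re.compile(r"\bT\d+\b")


def parse_sal(line: str) -> Optional[SalEntry]:
    """Return the parsed SAL entry, or None if the line has no SAL entry."""
    m = SAL_RE.search(line)
    if m is None:
        return None
    when = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    message = m.group(3)
    return SalEntry(when, m.group(2), message, TASK_RE.findall(message))
```

For example, the db1076 entry above yields nick `jynus` and tasks `["T152188"]`. Lines such as Gerrit notices return `None`.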

Change 324867 had a related patch set uploaded (by Jcrespo):
Repool db1076 with low load after maintenance

https://gerrit.wikimedia.org/r/324867

Change 324867 merged by Jcrespo:
Repool db1076 with low load after maintenance

https://gerrit.wikimedia.org/r/324867

Change 324869 had a related patch set uploaded (by Jcrespo):
Depool db1074 for maintenance and upgrade

https://gerrit.wikimedia.org/r/324869

Change 324869 merged by Jcrespo:
Depool db1074 for maintenance and upgrade

https://gerrit.wikimedia.org/r/324869

Change 324874 had a related patch set uploaded (by Jcrespo):
Repool db1076 with full load

https://gerrit.wikimedia.org/r/324874

Mentioned in SAL (#wikimedia-operations) [2016-12-02T10:28:44Z] <jynus> mysql restart and upgrade for db1074 T152188

Change 324874 merged by jenkins-bot:
Repool db1076 with full load

https://gerrit.wikimedia.org/r/324874
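The db1076 changes above follow a staged cycle: depool for maintenance, restart and upgrade, repool with low load, then repool with full load once the buffer pool has warmed up. The following minimal Python sketch computes the load weight for each stage. The full weight and the stage fractions (0%, 25%, 100%) are assumptions for illustration only. They are not the values used in the actual Wikimedia configuration, and `repool_schedule` is a hypothetical helper.

```python
from typing import List, Sequence


def repool_schedule(full_weight: int,
                    fractions: Sequence[float] = (0.0, 0.25, 1.0)) -> List[int]:
    """Return the load weight to set at each stage of a depool/warmup/repool cycle.

    Stage 0 is the depooled state (weight 0). Later stages ramp up to
    full_weight. The default fractions are hypothetical.
    """
    if full_weight < 0:
        raise ValueError("full_weight must be non-negative")
    weights = []
    for f in fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"fraction out of range: {f}")
        weights.append(round(full_weight * f))
    # A warmup should never reduce load between stages.
    if weights != sorted(weights):
        raise ValueError("fractions must be non-decreasing")
    return weights
```

With a hypothetical full weight of 200, the stages are `[0, 50, 200]`: depooled, "low load after maintenance", and "full load after warmup".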

Change 324879 had a related patch set uploaded (by Jcrespo):
Repool db1074 with low load after maintenance

https://gerrit.wikimedia.org/r/324879

Change 324879 merged by jenkins-bot:
Repool db1074 with low load after maintenance

https://gerrit.wikimedia.org/r/324879

Change 324885 had a related patch set uploaded (by Jcrespo):
Depool db1060 for maintenance

https://gerrit.wikimedia.org/r/324885

Change 324885 merged by Jcrespo:
Depool db1060 for maintenance

https://gerrit.wikimedia.org/r/324885

Change 324886 had a related patch set uploaded (by Jcrespo):
Pool db1074 with full load after warmup

https://gerrit.wikimedia.org/r/324886

Change 324888 had a related patch set uploaded (by Jcrespo):
Really depool db1060, unlike 2 patches ago

https://gerrit.wikimedia.org/r/324888

Change 324886 merged by jenkins-bot:
Pool db1074 with full load after warmup

https://gerrit.wikimedia.org/r/324886

Change 324888 merged by jenkins-bot:
Really depool db1060, unlike 2 patches ago

https://gerrit.wikimedia.org/r/324888

jcrespo claimed this task. Dec 2 2016, 11:54 AM
jcrespo moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2016-12-02T11:56:38Z] <jynus> mysql restart for db1060 T152188

Change 324894 had a related patch set uploaded (by Jcrespo):
Repool db1060 after maintenance

https://gerrit.wikimedia.org/r/324894

Change 324894 merged by jenkins-bot:
Repool db1060 after maintenance

https://gerrit.wikimedia.org/r/324894

Change 324899 had a related patch set uploaded (by Jcrespo):
Enable new TLS certs on labsdb hosts

https://gerrit.wikimedia.org/r/324899

Change 324899 merged by Jcrespo:
Enable new TLS certs on labsdb hosts

https://gerrit.wikimedia.org/r/324899

jcrespo lowered the priority of this task from Normal to Low. Dec 2 2016, 2:06 PM
jcrespo moved this task from In progress to Backlog on the DBA board.

Change 324908 had a related patch set uploaded (by Jcrespo):
mariadb: Update dbstores to use the latest TLS certificate

https://gerrit.wikimedia.org/r/324908

Change 324908 merged by Jcrespo:
mariadb: Update dbstores to use the latest TLS certificate

https://gerrit.wikimedia.org/r/324908

Change 325273 had a related patch set uploaded (by Jcrespo):
Renew expired TLS certificate for eventlogging hosts

https://gerrit.wikimedia.org/r/325273

Change 325273 merged by Jcrespo:
Renew expired TLS certificate for eventlogging hosts

https://gerrit.wikimedia.org/r/325273
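Renewing an expired certificate raises the question of how much time a given certificate has left. The following minimal Python sketch turns a certificate's notAfter date into days remaining. It assumes the date was obtained with `openssl x509 -noout -enddate -in <cert>`, which prints a line of the form `notAfter=<date>`. `days_until_expiry` is a hypothetical helper. The date parsing relies on the standard library's `ssl.cert_time_to_seconds`.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date (negative if already expired).

    Accepts either "notAfter=<date>" (openssl x509 -enddate output) or the bare
    date, e.g. "May  9 00:00:00 2007 GMT".
    """
    value = enddate_line.strip()
    prefix = "notAfter="
    if value.startswith(prefix):
        value = value[len(prefix):]
    expires = ssl.cert_time_to_seconds(value)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY
```

Passing `now` explicitly keeps the check deterministic, for example when comparing against a planned maintenance date rather than the current time.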

Mentioned in SAL (#wikimedia-operations) [2016-12-07T10:03:42Z] <jynus> restart and upgrade of dbstore1001 T152188

Change 325759 had a related patch set uploaded (by Jcrespo):
Fixes to the predump and bpipe mysql method of backups

https://gerrit.wikimedia.org/r/325759

Mentioned in SAL (#wikimedia-operations) [2016-12-07T14:19:05Z] <jynus> restart and upgrade of dbstore200[12] T152188

Change 325759 merged by Jcrespo:
backups: Fix & uniform predump and bpipe mysql method of backups

https://gerrit.wikimedia.org/r/325759

Backups are finally up and running again, one day late. Reminder: check that all of them completed successfully.

Mentioned in SAL (#wikimedia-operations) [2016-12-09T13:47:27Z] <jynus> disable puppet on db1047, db1046 and dbstore1002 in preparation for restarts T152188

Mentioned in SAL (#wikimedia-operations) [2016-12-09T14:06:17Z] <jynus> restarting db1046 T152188

Change 326122 had a related patch set uploaded (by Jcrespo):
analytics-mariadb: Enable new certificates on eventlogging servers

https://gerrit.wikimedia.org/r/326122

Change 326122 merged by Jcrespo:
analytics-mariadb: Enable new certificates on eventlogging servers

https://gerrit.wikimedia.org/r/326122

Change 326123 had a related patch set uploaded (by Jcrespo):
eventlogging-mariadb: Add new TLS certs to eventlogging servers

https://gerrit.wikimedia.org/r/326123

Change 326123 merged by Jcrespo:
eventlogging-mariadb: Add new TLS certs to eventlogging servers

https://gerrit.wikimedia.org/r/326123

Mentioned in SAL (#wikimedia-operations) [2016-12-09T14:39:23Z] <jynus> restarting db1047 T152188

Mentioned in SAL (#wikimedia-operations) [2016-12-09T14:49:39Z] <jynus> restarting dbstore1002 T152188

Pending hosts:

db1063.eqiad.wmnet
db1054.eqiad.wmnet
db1067.eqiad.wmnet
db1036.eqiad.wmnet
db1015.eqiad.wmnet
db1021.eqiad.wmnet
db1022.eqiad.wmnet

db2059.codfw.wmnet
db2035.codfw.wmnet
db2051.codfw.wmnet
db2061.codfw.wmnet
db2044.codfw.wmnet
db2052.codfw.wmnet
db2058.codfw.wmnet
db2066.codfw.wmnet
db2064.codfw.wmnet
db2065.codfw.wmnet
db2053.codfw.wmnet
db2041.codfw.wmnet
db2050.codfw.wmnet
db2045.codfw.wmnet
db2037.codfw.wmnet
db2046.codfw.wmnet
db2063.codfw.wmnet
db2036.codfw.wmnet
db2067.codfw.wmnet
db2060.codfw.wmnet
db2043.codfw.wmnet
db2054.codfw.wmnet
db2039.codfw.wmnet
jcrespo updated the task description. Dec 14 2016, 6:21 PM

Pending hosts:

db1036.eqiad.wmnet: cacert
db1021.eqiad.wmnet: cacert
db1022.eqiad.wmnet: cacert
db1015.eqiad.wmnet: cacert
db2050.codfw.wmnet: cacert
db2045.codfw.wmnet: cacert
db2046.codfw.wmnet: cacert
db2036.codfw.wmnet: cacert
db2064.codfw.wmnet: cacert
db2043.codfw.wmnet: cacert

Mentioned in SAL (#wikimedia-operations) [2017-02-06T18:16:21Z] <jynus> preparing to reimage db2050 T152188

Mentioned in SAL (#wikimedia-operations) [2017-02-06T19:13:58Z] <jynus> preparing to reimage db2045 T152188

Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db2050.codfw.wmnet', 'db2045.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201702062131_jynus_29631.log.

Completed auto-reimage of hosts:

['db2050.codfw.wmnet', 'db2045.codfw.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2017-02-07T10:34:38Z] <jynus> preparing db2046 for reimage T152188

Change 336391 had a related patch set uploaded (by Jcrespo):
mariadb: Depool db1036 for a quick reboot

https://gerrit.wikimedia.org/r/336391

Change 336391 merged by jenkins-bot:
mariadb: Depool db1036 for a quick reboot

https://gerrit.wikimedia.org/r/336391

Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db2046.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201702071149_jynus_5780.log.

Completed auto-reimage of hosts:

['db2046.codfw.wmnet']

and were ALL successful.

Change 336402 had a related patch set uploaded (by Jcrespo):
mariadb: Depool db1021 for a quick reboot

https://gerrit.wikimedia.org/r/336402

Change 336402 merged by jenkins-bot:
mariadb: Depool db1021 for a quick reboot

https://gerrit.wikimedia.org/r/336402

Mentioned in SAL (#wikimedia-operations) [2017-02-07T14:56:24Z] <jynus> preparing db2036 for reimage T152188

Mentioned in SAL (#wikimedia-operations) [2017-02-07T16:08:17Z] <jynus> restarting and upgrading db2064 T152188

Change 336429 had a related patch set uploaded (by Jcrespo):
mariadb: Depool db1022 for maintenance

https://gerrit.wikimedia.org/r/336429

Change 336429 merged by Jcrespo:
mariadb: Depool db1022 for maintenance

https://gerrit.wikimedia.org/r/336429

Mentioned in SAL (#wikimedia-operations) [2017-02-07T17:07:47Z] <jynus> restarting and upgrading db1022 T152188

Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db2036.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201702071725_jynus_1180.log.

Completed auto-reimage of hosts:

['db2036.codfw.wmnet']

Of which those FAILED:

set(['db2036.codfw.wmnet'])
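The two possible outcomes reported above ("and were ALL successful." versus a list of FAILED hosts) amount to a simple partition of per-host results. The following minimal Python sketch produces a summary of that shape from a host-to-success mapping. It is a hypothetical illustration of the reporting logic and is not the code of wmf_auto_reimage. The `summarize_reimage` name and its input format are assumptions.

```python
from typing import Dict


def summarize_reimage(results: Dict[str, bool]) -> str:
    """Build a completion summary for a batch of reimages (hypothetical helper).

    `results` maps each host FQDN to True if its reimage succeeded.
    """
    hosts = sorted(results)
    failed = sorted(h for h, ok in results.items() if not ok)
    lines = ["Completed auto-reimage of hosts:", str(hosts)]
    if failed:
        lines += ["Of which those FAILED:", str(failed)]
    else:
        lines.append("and were ALL successful.")
    return "\n".join(lines)
```

Keeping the failed hosts as an explicit list makes it easy to retry only those. In this task, db2036 was the single failed host in its batch.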

Change 336453 had a related patch set uploaded (by Jcrespo):
mariadb: depool db1015 for maintenance

https://gerrit.wikimedia.org/r/336453

Change 336453 merged by jenkins-bot:
mariadb: depool db1015 for maintenance

https://gerrit.wikimedia.org/r/336453

Mentioned in SAL (#wikimedia-operations) [2017-02-07T18:55:55Z] <jynus> restarting and upgrading db1015 T152188

Mentioned in SAL (#wikimedia-operations) [2017-02-07T18:58:18Z] <jynus> preparing db2043 for reimage T152188

Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db2043.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201702072134_jynus_5655.log.

Completed auto-reimage of hosts:

['db2043.codfw.wmnet']

and were ALL successful.

jcrespo closed this task as Resolved. Feb 8 2017, 12:24 AM

All hosts with the old, expiring cert have been reimaged or, if scheduled for decommission, restarted:

sudo salt --output=txt -C 'G@cluster:mysql' cmd.run 'mysql -BN --skip-ssl -e "SELECT @@ssl_ca"' | grep cacert
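The `| grep cacert` filter above keeps only the hosts whose `@@ssl_ca` still points at the old CA file. The following minimal Python sketch does the same filtering and also groups the remaining hosts by datacenter, as in the earlier pending list. It assumes that salt's txt outputter prints one `host: value` line per minion and that the datacenter is the second label of the FQDN (eqiad, codfw). `hosts_with_old_ca` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def hosts_with_old_ca(salt_txt_output: str, marker: str = "cacert") -> Dict[str, List[str]]:
    """Group hosts whose reported @@ssl_ca contains `marker`, keyed by datacenter.

    Assumes one "host: value" line per minion (salt --output=txt).
    """
    by_dc: Dict[str, List[str]] = defaultdict(list)
    for line in salt_txt_output.splitlines():
        host, sep, value = line.partition(":")
        if not sep or marker not in value:
            continue  # not a "host: value" line, or already on the new CA
        host = host.strip()
        labels = host.split(".")
        dc = labels[1] if len(labels) > 1 else "unknown"
        by_dc[dc].append(host)
    return {dc: sorted(hosts) for dc, hosts in sorted(by_dc.items())}
```

An empty result corresponds to the closing state of this task, where the grep returns no hosts.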

The old, non-puppetized certs can now be removed and the puppetization changed accordingly. This will be done in T111654.