db1063.eqiad.wmnet
db1054.eqiad.wmnet
db1067.eqiad.wmnet
db1036.eqiad.wmnet
db1015.eqiad.wmnet
db1021.eqiad.wmnet
db1022.eqiad.wmnet
db2059.codfw.wmnet
db2035.codfw.wmnet
db2051.codfw.wmnet
db2061.codfw.wmnet
db2044.codfw.wmnet
db2052.codfw.wmnet
db2058.codfw.wmnet
db2066.codfw.wmnet
db2064.codfw.wmnet
db2065.codfw.wmnet
db2053.codfw.wmnet
db2041.codfw.wmnet
db2050.codfw.wmnet
db2045.codfw.wmnet
db2037.codfw.wmnet
db2046.codfw.wmnet
db2063.codfw.wmnet
db2036.codfw.wmnet
db2067.codfw.wmnet
db2060.codfw.wmnet
db2043.codfw.wmnet
db2054.codfw.wmnet
db2039.codfw.wmnet
Description
Details
Status | Assigned | Task
---|---|---
Restricted Task | |
Open | None | T120532 Use user-specific passwords for accessing Analytics MariaDB replica databases
Resolved | aaron | T88445 MediaWiki active/active datacenter investigation and work (tracking)
Resolved | Krinkle | T270223 FY2021-2022: Enable basic Multi-DC operations for read traffic (tracking)
Resolved | tstarling | T134809 App servers <=> mariadb SSL/TLS for cross-datacenter writes
Resolved | LSobanski | T111653 Encrypt all the things
Resolved | jcrespo | T111654 Set up TLS for MariaDB replication
Resolved | jcrespo | T152188 Restart pending mysql hosts with old TLS cert
Resolved | Marostegui | T152364 db1047 out of disk space, eventlogging_sync spam
Event Timeline
Change 324862 had a related patch set uploaded (by Jcrespo):
Depool db1076 for maintenance
Mentioned in SAL (#wikimedia-operations) [2016-12-02T09:18:14Z] <jynus> mysql restart and upgrade for db1076 T152188
Change 324867 had a related patch set uploaded (by Jcrespo):
Repool db1076 with low load after maintenance
Change 324869 had a related patch set uploaded (by Jcrespo):
Depool db1074 for maintenance and upgrade
Change 324874 had a related patch set uploaded (by Jcrespo):
Repool db1076 with full load
Mentioned in SAL (#wikimedia-operations) [2016-12-02T10:28:44Z] <jynus> mysql restart and upgrade for db1074 T152188
Change 324879 had a related patch set uploaded (by Jcrespo):
Repool db1074 with low load after maintenance
Change 324885 had a related patch set uploaded (by Jcrespo):
Depool db1060 for maintenance
Change 324886 had a related patch set uploaded (by Jcrespo):
Pool db1074 with full load after warmup
Change 324888 had a related patch set uploaded (by Jcrespo):
Really depool db1060, unlike 2 patches ago
Mentioned in SAL (#wikimedia-operations) [2016-12-02T11:56:38Z] <jynus> mysql restart for db1060 T152188
Change 324894 had a related patch set uploaded (by Jcrespo):
Repool db1060 after maintenance
Change 324899 had a related patch set uploaded (by Jcrespo):
Enable new TLS certs on labsdb hosts
Change 324908 had a related patch set uploaded (by Jcrespo):
mariadb: Update dbstores to use the latest TLS certificate
Change 324908 merged by Jcrespo:
mariadb: Update dbstores to use the latest TLS certificate
Change 325273 had a related patch set uploaded (by Jcrespo):
Renew expired TLS certificate for eventlogging hosts
Change 325273 merged by Jcrespo:
Renew expired TLS certificate for eventlogging hosts
Mentioned in SAL (#wikimedia-operations) [2016-12-07T10:03:42Z] <jynus> restart and upgrade of dbstore1001 T152188
Change 325759 had a related patch set uploaded (by Jcrespo):
Fixes to the predump and bpipe mysql method of backups
Mentioned in SAL (#wikimedia-operations) [2016-12-07T14:19:05Z] <jynus> restart and upgrade of dbstore200[12] T152188
Change 325759 merged by Jcrespo:
backups: Fix & uniform predump and bpipe mysql method of backups
The backups are finally up and running again, one day behind schedule. Reminder: check that they all complete OK.
Mentioned in SAL (#wikimedia-operations) [2016-12-09T13:47:27Z] <jynus> disable puppet on db1047, db1046 and dbstore1002 in preparation for restarts T152188
Mentioned in SAL (#wikimedia-operations) [2016-12-09T14:06:17Z] <jynus> restarting db1046 T152188
Change 326122 had a related patch set uploaded (by Jcrespo):
analytics-mariadb: Enable new certificates on eventlogging servers
Change 326122 merged by Jcrespo:
analytics-mariadb: Enable new certificates on eventlogging servers
Change 326123 had a related patch set uploaded (by Jcrespo):
eventlogging-mariadb: Add new TLS certs to eventlogging severs
Change 326123 merged by Jcrespo:
eventlogging-mariadb: Add new TLS certs to eventlogging severs
Mentioned in SAL (#wikimedia-operations) [2016-12-09T14:39:23Z] <jynus> restarting db1047 T152188
Mentioned in SAL (#wikimedia-operations) [2016-12-09T14:49:39Z] <jynus> restarting dbstore1002 T152188
Pending hosts:
db1063.eqiad.wmnet
db1054.eqiad.wmnet
db1067.eqiad.wmnet
db1036.eqiad.wmnet
db1015.eqiad.wmnet
db1021.eqiad.wmnet
db1022.eqiad.wmnet
db2059.codfw.wmnet
db2035.codfw.wmnet
db2051.codfw.wmnet
db2061.codfw.wmnet
db2044.codfw.wmnet
db2052.codfw.wmnet
db2058.codfw.wmnet
db2066.codfw.wmnet
db2064.codfw.wmnet
db2065.codfw.wmnet
db2053.codfw.wmnet
db2041.codfw.wmnet
db2050.codfw.wmnet
db2045.codfw.wmnet
db2037.codfw.wmnet
db2046.codfw.wmnet
db2063.codfw.wmnet
db2036.codfw.wmnet
db2067.codfw.wmnet
db2060.codfw.wmnet
db2043.codfw.wmnet
db2054.codfw.wmnet
db2039.codfw.wmnet
Pending hosts:
db1036.eqiad.wmnet: cacert
db1021.eqiad.wmnet: cacert
db1022.eqiad.wmnet: cacert
db1015.eqiad.wmnet: cacert
db2050.codfw.wmnet: cacert
db2045.codfw.wmnet: cacert
db2046.codfw.wmnet: cacert
db2036.codfw.wmnet: cacert
db2064.codfw.wmnet: cacert
db2043.codfw.wmnet: cacert
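Comparing the two pending-host snapshots above shows which hosts were handled in between. The following is a minimal sketch in Python, assuming the lists are pasted in as plain strings. The function names `parse_hosts` and `completed_between` are hypothetical and not part of any existing tooling, and the sample uses only three of the hosts for brevity.

```python
def parse_hosts(text):
    """Return the set of host names found in a whitespace-separated list.

    Accepts bare names ("db1036.eqiad.wmnet") as well as
    "host: value" pairs ("db1036.eqiad.wmnet: cacert").
    """
    hosts = set()
    for token in text.split():
        name = token.rstrip(":")
        # Keep only tokens that look like fully qualified host names.
        if name.endswith(".wmnet"):
            hosts.add(name)
    return hosts


def completed_between(earlier, later):
    """Hosts pending in the earlier snapshot but no longer in the later one."""
    return sorted(parse_hosts(earlier) - parse_hosts(later))


# Shortened sample taken from the two snapshots (illustration only).
earlier = "db1063.eqiad.wmnet db1036.eqiad.wmnet db2050.codfw.wmnet"
later = "db1036.eqiad.wmnet: cacert db2050.codfw.wmnet: cacert"

print(completed_between(earlier, later))
```

Set difference is used because the order of hosts differs between the two snapshots.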
Mentioned in SAL (#wikimedia-operations) [2017-02-06T18:16:21Z] <jynus> preparing to reimage db2050 T152188
Mentioned in SAL (#wikimedia-operations) [2017-02-06T19:13:58Z] <jynus> preparing to reimage db2045 T152188
Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['db2050.codfw.wmnet', 'db2045.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201702062131_jynus_29631.log.
Completed auto-reimage of hosts:
['db2050.codfw.wmnet', 'db2045.codfw.wmnet']
and were ALL successful.
Mentioned in SAL (#wikimedia-operations) [2017-02-07T10:34:38Z] <jynus> preparing db2046 for reimage T152188
Change 336391 had a related patch set uploaded (by Jcrespo):
mariadb: Depool db1036 for a quick reboot
Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['db2046.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201702071149_jynus_5780.log.
Change 336402 had a related patch set uploaded (by Jcrespo):
mariadb: Depool db1021 for a quick reboot
Mentioned in SAL (#wikimedia-operations) [2017-02-07T14:56:24Z] <jynus> preparing db2036 for reimage T152188
Mentioned in SAL (#wikimedia-operations) [2017-02-07T16:08:17Z] <jynus> restarting and upgrading db2064 T152188
Change 336429 had a related patch set uploaded (by Jcrespo):
mariadb: Depool db1022 for maintenance
Mentioned in SAL (#wikimedia-operations) [2017-02-07T17:07:47Z] <jynus> restarting and upgrading db1022 T152188
Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['db2036.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201702071725_jynus_1180.log.
Completed auto-reimage of hosts:
['db2036.codfw.wmnet']
Of which those FAILED:
set(['db2036.codfw.wmnet'])
Change 336453 had a related patch set uploaded (by Jcrespo):
mariadb: depool db1015 for maintenance
Mentioned in SAL (#wikimedia-operations) [2017-02-07T18:55:55Z] <jynus> restarting and upgrading db1015 T152188
Mentioned in SAL (#wikimedia-operations) [2017-02-07T18:58:18Z] <jynus> preparing db2043 for reimage T152188
Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:
['db2043.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201702072134_jynus_5655.log.
All hosts with the old expiring cert have been reimaged or (if scheduled for decommission) restarted:
sudo salt --output=txt -C 'G@cluster:mysql' cmd.run 'mysql -BN --skip-ssl -e "SELECT @@ssl_ca"' | grep cacert
Old, non-puppet certs can be removed and puppetization changed. To be done at T111654.
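The check above pipes the salt output through `grep cacert`. The same filter can be sketched in Python for cases where the output has been saved to a file or needs further processing. This is only an illustration: it assumes the `--output=txt` format of one `host: value` line per minion, the function name `hosts_with_old_ca` is hypothetical, and the certificate paths in the sample are made up.

```python
def hosts_with_old_ca(salt_txt_output, marker="cacert"):
    """Return hosts whose reported @@ssl_ca value still contains `marker`.

    Expects one "host: value" line per minion, as produced by
    salt's txt outputter.
    """
    pending = []
    for line in salt_txt_output.splitlines():
        host, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "host: value" pairs
        if marker in value:
            pending.append(host.strip())
    return sorted(pending)


# Hypothetical sample output; the paths are placeholders, not real ones.
sample = (
    "db1036.eqiad.wmnet: /hypothetical/path/cacert.pem\n"
    "db2050.codfw.wmnet: /hypothetical/path/new_ca.pem\n"
)

print(hosts_with_old_ca(sample))
```

An empty result corresponds to the `grep` returning no lines, meaning no host still reports the old CA file.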