Upgrade Cumin masters to stretch
Closed, Resolved, Public

Description

Now that Salt is gone, the Cumin masters should be reimaged to stretch.

Once that has happened, we can also deploy a backport of OpenSSH 7.6, which lets ssh-keygen sign certificates with a CA key held in ssh-agent:

ssh-keygen(1): allow ssh-keygen to use a key held in ssh-agent as
a CA when signing certificates. bz#2377

Event Timeline

Looking at Racktables, the warranty for both neodymium and sarin expired in January 2016, so they are close to our usual five-year lifespan. I think it makes sense not to reimage the existing servers but to set up replacement hardware with stretch instead.

Mentioned in SAL (#wikimedia-operations) [2018-08-29T08:42:01Z] <volans> uploaded cumin_3.0.2-2+deb9u1 to apt.wikimedia.org stretch-wikimedia - T177385

Mentioned in SAL (#wikimedia-operations) [2018-08-29T08:56:45Z] <volans> uploaded python{,3}-conftool_1.0.2-1{,+deb9u1} to apt.wikimedia.org {jessie,stretch}-wikimedia - T177385

Mentioned in SAL (#wikimedia-operations) [2018-08-29T09:03:50Z] <volans> uploaded spicerack_0.0.2-1{,+deb9u1} to apt.wikimedia.org {jessie,stretch}-wikimedia - T177385

Change 460321 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Switch role for cumin2001 to role::cluster::management

https://gerrit.wikimedia.org/r/460321

Change 460323 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Enable cumin2001 as mysql maintenance client

https://gerrit.wikimedia.org/r/460323

Change 460321 merged by Muehlenhoff:
[operations/puppet@production] Switch role for cumin2001 to role::cluster::management

https://gerrit.wikimedia.org/r/460321

Change 460908 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] cumin: fix puppetdb query

https://gerrit.wikimedia.org/r/460908

Change 460908 merged by Volans:
[operations/puppet@production] cumin: fix puppetdb query

https://gerrit.wikimedia.org/r/460908

Change 461364 had a related patch set uploaded (by Volans; owner: Volans):
[operations/puppet@production] cumin: re-disable the urllib3 warning

https://gerrit.wikimedia.org/r/461364

Script wmf-auto-reimage was launched by jmm on cumin2001.codfw.wmnet for hosts:

['mw1298.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201809191200_jmm_12091.log.

Completed auto-reimage of hosts:

['mw1298.eqiad.wmnet']

and were ALL successful.

Change 461364 merged by Volans:
[operations/puppet@production] cumin: re-disable the urllib3 warning

https://gerrit.wikimedia.org/r/461364

Change 460323 merged by Jcrespo:
[operations/puppet@production] Enable cumin1001 and cumin2001 as mysql maintenance clients

https://gerrit.wikimedia.org/r/460323

I deployed the grants to the new hosts, but 1) they need to change for labs and 2) some hosts had errors:

root@neodymium:~/software/dbtools$ /usr/local/sbin/mysql.py -BN -h db1115 zarcillo -e "SELECT instances.name FROM instances" | while read host; do echo "Deploying to $host..."; mysql.py -h $host < ~/new_grants.txt; done 
Deploying to db1061...
Deploying to db1062...
Deploying to db1063...
Deploying to db1064...
Deploying to db1065...
Deploying to db1066...
Deploying to db1067...
Deploying to db1068...
Deploying to db1069...
Deploying to db1070...
Deploying to db1071...
Deploying to db1072...
Deploying to db1073...
Deploying to db1074...
Deploying to db1075...
Deploying to db1076...
Deploying to db1077...
Deploying to db1078...
Deploying to db1079...
Deploying to db1080...
Deploying to db1081...
Deploying to db1082...
Deploying to db1083...
Deploying to db1084...
Deploying to db1085...
Deploying to db1086...
Deploying to db1087...
Deploying to db1088...
Deploying to db1089...
Deploying to db1090:3312...
Deploying to db1090:3317...
Deploying to db1091...
Deploying to db1092...
Deploying to db1093...
Deploying to db1094...
Deploying to db1095:3312...
Deploying to db1095:3313...
Deploying to db1096:3315...
Deploying to db1096:3316...
Deploying to db1097:3314...
Deploying to db1097:3315...
Deploying to db1098:3316...
Deploying to db1098:3317...
Deploying to db1099:3311...
Deploying to db1099:3318...
Deploying to db1100...
Deploying to db1101:3317...
Deploying to db1101:3318...
Deploying to db1102:3314...
Deploying to db1102:3315...
Deploying to db1103:3312...
Deploying to db1103:3314...
Deploying to db1104...
Deploying to db1105:3311...
Deploying to db1105:3312...
Deploying to db1106...
Deploying to db1107...
ERROR 1045 (28000) at line 5: Access denied for user 'root'@'10.%' (using password: YES)
Deploying to db1108...
Deploying to db1109...
Deploying to db1110...
Deploying to db1111...
Deploying to db1112...
Deploying to db1113:3315...
Deploying to db1113:3316...
Deploying to db1114...
Deploying to db1115...
Deploying to db1117:3321...
Deploying to db1117:3322...
Deploying to db1117:3323...
Deploying to db1117:3325...
Deploying to db1118...
ERROR 1064 (42000) at line 5: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IDENTIFIED BY X'
    WITH GRANT OPTION' at line 3
Deploying to db1119...
Deploying to db1120...
Deploying to db1121...
Deploying to db1122...
Deploying to db1123...
Deploying to db1124:3311...
Deploying to db1124:3313...
Deploying to db1124:3315...
Deploying to db1124:3318...
Deploying to db1125:3312...
Deploying to db1125:3314...
Deploying to db1125:3316...
Deploying to db1125:3317...
Deploying to db2033...
Deploying to db2034...
Deploying to db2035...
Deploying to db2036...
Deploying to db2037...
Deploying to db2038...
Deploying to db2039...
Deploying to db2040...
Deploying to db2041...
Deploying to db2042...
Deploying to db2043...
Deploying to db2044...
Deploying to db2045...
Deploying to db2046...
Deploying to db2047...
Deploying to db2048...
Deploying to db2049...
Deploying to db2050...
Deploying to db2051...
Deploying to db2052...
Deploying to db2053...
Deploying to db2054...
Deploying to db2055...
Deploying to db2056...
Deploying to db2057...
Deploying to db2058...
Deploying to db2059...
Deploying to db2060...
Deploying to db2061...
Deploying to db2062...
Deploying to db2063...
Deploying to db2065...
Deploying to db2066...
Deploying to db2067...
Deploying to db2068...
Deploying to db2069...
Deploying to db2070...
Deploying to db2071...
Deploying to db2072...
Deploying to db2073...
Deploying to db2074...
Deploying to db2075...
Deploying to db2076...
Deploying to db2077...
Deploying to db2078:3321...
Deploying to db2078:3322...
Deploying to db2078:3323...
Deploying to db2078:3325...
Deploying to db2079...
Deploying to db2080...
Deploying to db2081...
Deploying to db2082...
Deploying to db2083...
Deploying to db2084:3314...
Deploying to db2084:3315...
Deploying to db2085:3311...
Deploying to db2085:3318...
Deploying to db2086:3317...
Deploying to db2086:3318...
Deploying to db2087:3316...
Deploying to db2087:3317...
Deploying to db2088:3311...
Deploying to db2088:3312...
Deploying to db2089:3315...
Deploying to db2089:3316...
Deploying to db2090...
Deploying to db2091:3312...
Deploying to db2091:3314...
Deploying to db2092...
Deploying to db2093...
Deploying to db2094:3311...
Deploying to db2094:3313...
Deploying to db2094:3315...
Deploying to db2094:3318...
Deploying to db2095:3312...
Deploying to db2095:3314...
Deploying to db2095:3316...
Deploying to db2095:3317...
Deploying to dbstore1001:3311...
Deploying to dbstore1002...
Deploying to dbstore2001:3312...
Deploying to dbstore2001:3315...
Deploying to dbstore2001:3316...
Deploying to dbstore2001:3317...
Deploying to dbstore2001:3318...
Deploying to dbstore2002:3311...
Deploying to dbstore2002:3312...
Deploying to dbstore2002:3313...
Deploying to dbstore2002:3314...
Deploying to dbstore2002:3320...
Deploying to es1011...
Deploying to es1012...
Deploying to es1013...
Deploying to es1014...
Deploying to es1015...
Deploying to es1016...
Deploying to es1017...
Deploying to es1018...
Deploying to es1019...
Deploying to es2011...
Deploying to es2012...
Deploying to es2013...
Deploying to es2014...
Deploying to es2015...
Deploying to es2016...
Deploying to es2017...
Deploying to es2018...
Deploying to es2019...
Deploying to labsdb1004...
Deploying to labsdb1005...
Deploying to labsdb1009...
Deploying to labsdb1010...
Deploying to labsdb1011...
Deploying to labservices1001...
ERROR 2005 (HY000): Unknown MySQL server host 'labservices1001.eqiad.wmnet' (-2)
Deploying to labservices1002...
ERROR 2005 (HY000): Unknown MySQL server host 'labservices1002.eqiad.wmnet' (-2)
Deploying to labtestservices2001...
ERROR 2005 (HY000): Unknown MySQL server host 'labtestservices2001.codfw.wmnet' (-2)
Deploying to pc1004...
Deploying to pc1005...
Deploying to pc1006...
Deploying to pc2004...
Deploying to pc2005...
Deploying to pc2006...

db1107
db1118 (because it is MySQL 8.0 and has a different syntax)

labservices1001
labservices1002
labtestservices2001

(because all three are in the wikimedia.org domain, so the .wmnet names that were tried do not resolve)
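
The failing instances above were picked out of the loop output by eye. As a rough sketch only, the same list could be extracted from a saved copy of that output, assuming stdout and stderr were captured together so that each ERROR line follows the "Deploying to" line it belongs to. The function name failed_hosts is hypothetical and not part of dbtools.

```shell
# Hypothetical helper: print each instance whose "Deploying to <host>..."
# line is followed by an ERROR line before the next "Deploying to" line.
# Reads the saved loop output from the given files, or from stdin.
failed_hosts() {
    awk '
        # Remember the most recent instance name, minus the trailing "..."
        /^Deploying to / { host = $3; sub(/\.\.\.$/, "", host); next }
        # Report that instance once, on the first ERROR line that follows it
        /^ERROR /        { if (host != "") { print host; host = "" } }
    ' "$@"
}
```

Usage would be along the lines of `failed_hosts deploy.log`, where deploy.log is an assumed file name for the captured output. Continuation lines of a multi-line error, such as the one for db1118, are ignored because they start with neither prefix.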

CC @Marostegui

Also, the root password needs to be changed back on the labs hosts (labsdb*).

This should be mostly fixed. It exposed some bugs in the implementation that may need to be fixed later, but for now the MySQL instances can be queried from cumin1001 and cumin2001.

Volans claimed this task.
Volans removed a project: Patch-For-Review.

The migration has been completed and cumin[12]001 have been fully in service for a few weeks. Resolving.