
Install Debian Buster on Hadoop
Open, Medium, Public, 0 Estimated Story Points

Description

The upgrade to Debian Buster for the Hadoop cluster(s) might be a bit more complicated than we thought, because openjdk-8 is not available on Debian Buster. In T229347 Andrew was able to install it on stat1005 since openjdk-8 was present in Buster before its final release, but it is not anymore (so if we reimage a host, for example, we won't find it).

The above becomes problematic due to the following constraints:

  1. Spark 2.3 (our current version) doesn't support Java 11 (see also T229347#5394326). IIUC this is due to the Scala version used (2.11), which doesn't support Java 11 (https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html)
  2. Support for Java 11 in Scala 2.12+ is still incomplete - https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html#jdk-11-compatibility-notes
  3. Spark 2.4 comes with Scala 2.12, which offers experimental support for Java 11

Also, in stretch-backports we do have openjdk-11: https://packages.debian.org/stretch-backports/openjdk-11-jdk
Last but not least, we also need to make sure that the HDFS/Yarn daemons work correctly on Buster and Java 11. CDH of course supports Java 11 only from 6.3 onward: https://www.cloudera.com/documentation/enterprise/upgrade/topics/ug_jdk8.html

It is also true that CDH 6.3 ships with Spark 2.4, so either they support Java 11 as an experimental feature or there is a way to make Spark 2.4 work: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_63_packaging.html

Considerations:

  • I am not a Scala/Spark expert, so what I wrote above might not be accurate; please double check and correct me if needed :)
  • Backporting openjdk-8 to Buster is possible, but it would require a big effort from the SRE team. The last backport of openjdk-8 for Cassandra on Debian Jessie still needs to be maintained (applying patches for Debian Security Advisories, etc.), so it would be preferable not to go down that road again.

Event Timeline


CDH of course supports Java 11 only from 6.3 onward: https://www.cloudera.com/documentation/enterprise/upgrade/topics/ug_jdk8.html

Oh, if that's true, this could be a problem. We'd have to upgrade to CDH 6.3 first?

elukey added a comment.Sep 9 2019, 5:20 PM

In my mind, there are three major things that we'd need to do for Hadoop:

  • Complete the work on Kerberos, roll out the new config and handle the fallout of problems that we didn't test/take into account. Even if we did a ton of testing, there will be a lot of people to train and use cases to fix, requiring a lot of time.
  • Test Hadoop 3 in the test cluster, and see if it can work with our code (refinery) and, if not, what changes would need to be made. It seems easy, but it will require a lot of time and effort.
  • Migrate to Buster, which also means migrating to Java 11. Another long task that will require extensive testing and resources from the Analytics team.

The last two steps must also take into account that testing will not involve only the Hadoop code, but all the dependent systems (Druid, Notebooks, Spark, etc..) and also our own code (most notably, the Analytics refinery with our jobs etc..).

Given the current size of our team, I see only two of the above goals as doable; three would be overcommitting in my opinion. Even if I'd love to test Java 11 as soon as possible, a realistic plan could be:

  1. Port, if possible, openjdk-8 to wikimedia-buster and establish how much effort is needed by SRE to maintain the package(s). Co-ownership with Analytics could also be possible to split the pain. I volunteer to maintain the openjdk-8 package(s) if needed (with supervision :).
  2. Unblock the Buster migration of the Hadoop cluster, and at the same time allow the testing of Hadoop 3 (CDH6) on the testing cluster.

Eventually, when the above goals (and Kerberos) are done, we'll be able to decide/plan for Java 11 (likely next FY in my opinion).

Nuria added a comment.Sep 9 2019, 5:35 PM

Agreed with @elukey, and priority-wise I think we cannot test any Hadoop upgrades until we have rolled out Kerberos.

@MoritzMuehlenhoff what do you think about our plan? Would it be reasonable to consider backporting openjdk-8 to Buster? The added value would not be limited to Hadoop: all Java-based systems like Kafka/Druid/Zookeeper/etc. would be able to migrate to Buster without testing Java 11 first (sadly a non-trivial step).

For Kafka I see https://issues.apache.org/jira/browse/KAFKA-7264, which was fixed in 2.1.0. In theory we could migrate to Buster this year with openjdk-8, and then think about the Kafka 2.x upgrade plus Java 11 next fiscal year. Same path for other systems.

So, let me summarise to make sure I got this correctly. We have the following two options:

  1. Upgrade to CDH 6.3 on Stretch which provides Hadoop and Scala supporting both Java 8 and 11 and then reimage each server from "CDH 6.3/Stretch" to "CDH 6.3/Buster"
  2. Build Java 8 for Buster and install the current CDH 5 packages on Buster (do we know if they are supported on Buster, though?) and then migrate from "CDH 5 + Java 8 /Buster" to "CDH 6.3+Java 11/Buster" later

Is that correct? With the additional constraint that we want to run the GPU stuff, which is Buster-only, on a stat host with Hadoop access, right?

Then 2. is the only feasible option, so let's do it. We might run into similar migration issues with Elastic as well, so maybe that work is also useful on a wider scale.

I have to note though that these temporary things always tend to stick around; I built Java 8 for jessie something like four years ago as a Cassandra performance enhancement for Restbase (which used Java 7 at the time) and to this date we still have to keep it updated :-)

A tricky part about Java upgrades is that, from what I can tell, inter-JVM communication seems to fail between different Java versions. So Hadoop <-> Hadoop traffic will fail if the processes are running different JVM versions, which means we'd have to take a full cluster downtime to upgrade to Java 11. I'm not 100% sure this is always true, it is just something I've noticed from trying. I'm not sure if this is true of Kafka. If it is... I'm not sure what we are gonna do! :)

So, let me summarise to make sure I got this correctly. We have the following two options:

  1. Upgrade to CDH 6.3 on Stretch which provides Hadoop and Scala supporting both Java 8 and 11 and then reimage each server from "CDH 6.3/Stretch" to "CDH 6.3/Buster"

Yes, correct, with two caveats: 1) as Andrew mentioned, all hosts will need to be migrated at once; 2) we (as Analytics) have to test and port a ton of code that uses core Hadoop functions to the new major version.

  2. Build Java 8 for Buster and install the current CDH 5 packages on Buster (do we know if they are supported on Buster, though?) and then migrate from "CDH 5 + Java 8 /Buster" to "CDH 6.3+Java 11/Buster" later

We don't know yet; my plan was to start testing a Buster node on the Testing cluster as soon as the Kerberos work is in a good state. In theory it shouldn't be a problem, but in practice we'll need time to test. If Java 8 is available on Buster we'll also be able to convert a couple of Hadoop Analytics nodes (not test-cluster ones, I mean) and observe them for a couple of weeks to spot anomalies (and either attempt to fix them or just roll everything back; the worst that can happen is a few failed jobs).

Is that correct? With the additional constraint that we want to run the GPU stuff, which is Buster-only, on a stat host with Hadoop access, right?

Yep!

Then 2. is the only feasible option, so let's do it. We might run into similar migration issues with Elastic as well, so maybe that work is also useful on a wider scale.

I also have another early candidate for Java 8 on Buster, namely the new Zookeeper Analytics nodes T217057. Zookeeper clients use Java libraries to contact the cluster (as opposed to using a more agnostic protocol like HTTP, for example), so running Java 11 on the servers and 8 on the clients might end up in serialization issues (the same ones Andrew mentioned).

I have to note though that these temporary things always tend to stick around; I built Java 8 for jessie something like four years ago as a Cassandra performance enhancement for Restbase (which used Java 7 at the time) and to this date we still have to keep it updated :-)

I completely agree, and my team is committed to testing Hadoop 3 + Java 11 as soon as possible. To share the pain I also offered to help maintain Java 8 on Buster if needed :)

T233604 tracks the work to import the openjdk-8 package to a special component for Debian Buster, thanks Moritz!

Change 538844 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::java::analytics: deploy openjdk-8 on Buster

https://gerrit.wikimedia.org/r/538844

Change 538844 merged by Elukey:
[operations/puppet@production] profile::java::analytics: deploy openjdk-8 on Buster

https://gerrit.wikimedia.org/r/538844

T214364 has to be taken into consideration since it lists the missing dependencies that we had to create for CDH on stretch.

elukey added a comment.Jan 3 2020, 2:52 PM

Given that the Java 8 vs 11 problem has been resolved, I'd say we can concentrate on the Hadoop workers for the moment, leaving aside other corner cases like Hue (which can stay on Stretch for longer, there is no real rush).

Change 561869 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set Buster for analytics1031

https://gerrit.wikimedia.org/r/561869

Change 561869 merged by Elukey:
[operations/puppet@production] Set Buster for analytics1031

https://gerrit.wikimedia.org/r/561869

elukey added a comment.Jan 3 2020, 4:55 PM

I forgot that libssl1.0.0 is also a dependency of hadoop-* packages, following up in T214364 to see how to solve the problem.

elukey changed the task status from Open to Stalled.Feb 18 2020, 2:36 PM

The current idea is to move to BigTop first (on Stretch) and then wait for the upcoming 1.5 release that should natively support Buster.

Marking this task as stalled until T244499 is completed.

elukey changed the task status from Stalled to Open.Thu, Feb 18, 7:29 AM

Change 665005 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hadoop: set Buster for all worker nodes

https://gerrit.wikimedia.org/r/665005

Change 665005 merged by Elukey:
[operations/puppet@production] hadoop: set Buster for all worker nodes

https://gerrit.wikimedia.org/r/665005

Change 665049 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] install_server: add custom reuse recipes for Hadoop test

https://gerrit.wikimedia.org/r/665049

Change 665049 merged by Elukey:
[operations/puppet@production] install_server: add custom reuse recipes for Hadoop test

https://gerrit.wikimedia.org/r/665049

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

an-test-worker1003.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102180807_elukey_11734_an-test-worker1003_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-test-worker1003.eqiad.wmnet']

and were ALL successful.

elukey added a comment (edited). Thu, Feb 18, 10:46 AM

The reimage of an-test-worker1003 (preserving the /srv/hadoop dir) went fine! Bigtop works fine on Buster, and the host was re-added to HDFS nicely.

One problem keeps re-occurring though, namely that the hdfs/yarn/etc. system users are created by the Hadoop packages, and hence they don't get a fixed uid/gid combination. In the an-test-worker1003 case the datanode directory under /srv ended up owned by something else instead of hdfs:hdfs, and the datanode refused to start until a chown -R hdfs:hdfs was issued. It is not something super horrible, but a little annoying to do for all the workers that we have, and I am afraid we won't be able to skip it.

What we can do is see if we can set fixed uids in puppet and apply them during the upgrade to Buster, so that we'll be set for the Bullseye upgrade! :)

The reimage of an-test-worker1003 (preserving the /srv/hadoop dir) went fine! Bigtop works fine on Buster, and the host was re-added to HDFS nicely.

One problem keeps re-occurring though, namely that the hdfs/yarn/etc. system users are created by the Hadoop packages, and hence they don't get a fixed uid/gid combination. In the an-test-worker1003 case the datanode directory under /srv ended up owned by something else instead of hdfs:hdfs, and the datanode refused to start until a chown -R hdfs:hdfs was issued. It is not something super horrible, but a little annoying to do for all the workers that we have, and I am afraid we won't be able to skip it.

What we can do is see if we can set fixed uids in puppet and apply them during the upgrade to Buster, so that we'll be set for the Bullseye upgrade! :)

We recently added a new mechanism to configure system UIDs/GIDs via data.yaml, see the reprepro examples already in there. We need to research/test what happens if the user/group already exists with a different GID/UID, but this could be rolled out to the existing Stretch hosts, and then with the reimage they would simply gain the new settings upfront.

elukey added a subscriber: razzi.Thu, Feb 18, 11:27 AM

@razzi this is an interesting problem, I am going to add some context in here :)

At the moment we rely on the Bigtop deb packages for the creation of users like hdfs, yarn, etc. When the packages are installed, they allocate new users using the first available uid, not a specific one. For example:

elukey@cumin1001:~$ sudo cumin 'A:hadoop-worker' 'id hdfs'
59 hosts will be targeted:
an-worker[1078-1116].eqiad.wmnet,analytics[1058-1077].eqiad.wmnet
Confirm to continue [y/n]? y
===== NODE GROUP =====                                                                                                                            
(21) an-worker1078.eqiad.wmnet,analytics[1058-1077].eqiad.wmnet                                                                                   
----- OUTPUT of 'id hdfs' -----                                                                                                                   
uid=117(hdfs) gid=123(hdfs) groups=123(hdfs),120(hadoop)                                                                                          
===== NODE GROUP =====                                                                                                                            
(38) an-worker[1079-1116].eqiad.wmnet                                                                                                             
----- OUTPUT of 'id hdfs' -----                                                                                                                   
uid=116(hdfs) gid=122(hdfs) groups=122(hdfs),119(hadoop)

In the case of the Hadoop cluster, it seems that we have some hosts with uid 116 and others with 117 (gids are different too). At the cluster level this is not really important, since authentication etc. works on names, but in the case of an OS reinstall the issue can be sneaky. For example, say that we reimage an-worker1078 preserving the datanode dirs (where the HDFS blocks are stored, owned by user hdfs with uid 117). If the first puppet run on the new OS (installing the Hadoop packages) ends up creating the hdfs user with uid 130, then all files in the datanode dirs will have incorrect ownership (since files are owned by uid, and the uid is then mapped to a name). In the case that I brought up above, the reimage of an-test-worker1003 led to the HDFS Datanode daemon not starting, due to permission denied errors on some files. The fix is easy, namely doing a chown -R hdfs:hdfs, and we'll need to do it for all the reimaged hosts I am afraid, but what we are discussing here is introducing a way to force the creation of some system users with fixed uids, so that the next OS upgrades will be less annoying.
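
To make the failure mode and the fix a bit more concrete, a minimal sketch (the /srv/hadoop path is the one from the test host above and is just an example, it differs per host):

# List anything under the preserved datanode dirs that is no longer owned by
# hdfs after the reimage, then fix the ownership with the chown -R mentioned above.
find /srv/hadoop -xdev \( ! -user hdfs -o ! -group hdfs \) -print | head
chown -R hdfs:hdfs /srv/hadoop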

Filippo documented a similar issue for Swift in https://phabricator.wikimedia.org/T123918, where he ended up adding a special use case to late_command.sh (the script used after the end of debian installs) to create swift users with fixed uid/gids.

I had a chat with Moritz on IRC and he suggested another alternative approach, namely having a package called wmf-analytics-users, that simply ships hdfs/yarn/presto/etc.. users on hosts with fixed uid/gids. It would need to be installed before anything else, but it would be a nice option as well.

Long term, if Bigtop ships systemd configs, we'll be able to apply overrides via puppet, but for the moment we have to choose one of the above solutions :)

@Ottomata @razzi thoughts/preferences?
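
For reference, a rough sketch of what the wmf-analytics-users option could boil down to (the uid/gid values below are just example placeholders, and this assumes the Bigtop package maintainer scripts skip user creation when the user already exists):

# Pre-create the Hadoop system users/groups with fixed ids before any Bigtop
# package is installed; 450/451 are example values, not the real reservations.
getent group hadoop >/dev/null || groupadd --system --gid 451 hadoop
getent group hdfs   >/dev/null || groupadd --system --gid 450 hdfs
getent passwd hdfs  >/dev/null || useradd --system --uid 450 --gid hdfs \
    --groups hadoop --home-dir /var/lib/hadoop-hdfs --shell /usr/sbin/nologin hdfs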

I am trying to figure out if there is a quick way to do this in puppet, but the main problem is that if we try to declare a user with a specific uid/gid in puppet then puppet will override it, if already present, during the first puppet run.

the main problem is that if we try to declare a user with a specific uid/gid in puppet then puppet will override it, if already present, during the first puppet run.

If we are reinstalling these, could we declare the user in puppet, and then make the bigtop package installation depend on that first? Assuming bigtop won't recreate the user if the user already exists before it is installed. E.g.

if $os == 'buster' {
  user { 'hdfs': ... }
}

package { 'bigtop-hdfs': ..., require => User['hdfs'] }

Hm, I guess we'd have to deal with the existing users on the client nodes that are already on Buster.

@Ottomata this may work, even if we have some hosts already on Buster (say stat100x, an-launcher, etc., but we can fix those manually in theory).

In order to use the require => User['hdfs'] we should have a corresponding user { 'hdfs': } anyway, so possibly in the Stretch use case we should just create the user without a fixed uid/gid?

anyway, so possibly in the Stretch use case we should just create the user without a fixed uid/gid?

Oh ya that could work. Alternatively we could just do user { 'hdfs': ..., before => Package['bigtop-hdfs'] }, but I like your idea better.

Luca from the past already added hdfs/yarn/mapred users to puppet! Completely forgot about it.. Of course we didn't set any specific uid/gid

Some cumin magic:

elukey@cumin1001:~$ sudo cumin 'P{R:User=hdfs} and P{F:lsbdistcodename=buster}'
31 hosts will be targeted:
an-airflow1001.eqiad.wmnet,an-druid[1001-1002].eqiad.wmnet,an-launcher1002.eqiad.wmnet,an-presto[1001-1005].eqiad.wmnet,an-test-client1001.eqiad.wmnet,an-test-druid1001.eqiad.wmnet,an-test-presto1001.eqiad.wmnet,an-test-ui1001.eqiad.wmnet,an-test-worker1003.eqiad.wmnet,an-tool[1008-1009].eqiad.wmnet,druid[1001-1008].eqiad.wmnet,labstore[1006-1007].wikimedia.org,stat[1004-1008].eqiad.wmnet
DRY-RUN mode enabled, aborting

We could do something like this:

  1. Add specific require => User[..] to package resources in core hadoop classes (should be one/two maximum, easy)
  2. Introduce a conditional in puppet to have the fixed uid/gid for hdfs/yarn/mapred only for Buster
  3. On the above nodes, right after running puppet, the hdfs/yarn/mapred gids will change. So we'll save the original uid/gids somewhere and then run
find / -group Y -exec chgrp -h hdfs {} \;
find / -user X -exec chown -h hdfs {} \;

After this we should be able to just reimage new nodes with the fixed gids, and get consistency once all hosts have been migrated to Buster.

@MoritzMuehlenhoff does it make sense? It could be an alternative solution to the package one, lemme know :)

After this we should be able to just reimage new nodes with the fixed gids, and get consistency once all hosts have been migrated to Buster.

@MoritzMuehlenhoff does it make sense? It could be an alternative solution to the package one, lemme know :)

Sounds good to me. Make sure to grab GIDs in the < 500 range, since 500-1000 can clash with system users managed via data.yaml.
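
For what it's worth, a quick way to check that a candidate uid/gid is still free on the target hosts could be something like the following (490 is an example id; the host alias is the one used earlier in this task):

# Look up an example uid/gid on all Hadoop workers; no passwd/group entry
# anywhere means the id is free to reserve.
sudo cumin 'A:hadoop-worker' 'getent passwd 490 || echo "uid 490 free"; getent group 490 || echo "gid 490 free"'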

Change 665360 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] bigtop: require hadoop users before installing daemon packages

https://gerrit.wikimedia.org/r/665360

Change 665360 merged by Elukey:
[operations/puppet@production] bigtop: require hadoop users before installing daemon packages

https://gerrit.wikimedia.org/r/665360

A high level script to change the user hdfs could be:

#!/bin/bash

set -ex

OLD_UID=$(id -u hdfs)
OLD_GID=$(id -g hdfs)

usermod -u 200 hdfs
groupmod -g 200 hdfs

find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -user $OLD_UID -exec chown hdfs {} \;
find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -group $OLD_GID -exec chgrp hdfs {} \;

For Hadoop workers, the idea is to run the above right before the reimage: stop the yarn/hdfs daemons and let the worker drain, then manually change the hdfs/yarn/etc. users, and then kick off the reimage (a rough sketch of this sequence follows below). This should avoid issues when the node comes back up on Buster and puppet runs for the first time (potentially starting the Hadoop daemons).
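
A rough sketch of that per-worker sequence (the service names are the ones I'd expect from the Bigtop packages, and the script name is just a placeholder for the uid/gid script above):

# Disable puppet, stop the Hadoop daemons and let the worker drain,
# remap the ids, then kick off the reimage as usual.
puppet agent --disable "uid/gid remap before Buster reimage"
systemctl stop hadoop-yarn-nodemanager hadoop-hdfs-datanode
bash ./change-hadoop-uids.sh    # placeholder name for the script above
# ...then launch wmf-auto-reimage for the host from cumin.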

We can repeat it for multiple users/groups, which I think should be:

  • group hadoop
  • user/group analytics
  • user/group analytics-search
  • user/group analytics-privatedata
  • user/group analytics-product
  • user/group druid

Basically all the ones in profile::analytics::cluster::users. There are other ones like presto/superset/etc.. but I would concentrate only on the ones present on the Hadoop workers.

Some considerations:

  • I tried to reduce the amount of chown/chgrp work by reviewing the current uid/gid usage across the nodes, but sadly the same uid/gid numbers are used for different things on different hosts (like uid 116 mapping to hdfs on some nodes, and to presto on others).
  • The final idea that I have is to just pick new uids/gids not used anywhere, allocate them for all the above users/groups and apply the fix. With the above find commands it shouldn't take long.
  • Adding system users to data.yaml (to reserve uid/gids) means that our users would be deployed fleet-wide, which seems too much. The idea that Moritz and I had was to limit the scope of the deployment of these users to Analytics-land, but now I am wondering if it may become a problem in the future. We have deployed our users to systems like the labstore nodes, which are outside our realm, and we could have the same use case on different systems in the future. If we enforce a fixed gid/uid for some users and it then clashes with other ones, we might need to apply some hacks/conditionals to make everything work. Maybe we could reserve uid/gids in puppet anyway, with a comment to prevent people from using them (but without forcing their deployment)? @MoritzMuehlenhoff would it make sense?

Change 666092 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] bigtop: add the hadoop group to the catalog

https://gerrit.wikimedia.org/r/666092

  • Adding system users to data.yaml (to reserve uid/gids) means that our users would be deployed fleet-wide, which seems too much. The idea that Moritz and I had was to limit the scope of the deployment of these users to Analytics-land, but now I am wondering if it may become a problem in the future. We have deployed our users to systems like the labstore nodes, which are outside our realm, and we could have the same use case on different systems in the future. If we enforce a fixed gid/uid for some users and it then clashes with other ones, we might need to apply some hacks/conditionals to make everything work.

Sure, if we see other use cases where the Hadoop-related users may be used outside of Analytics-land, why not!

Maybe we could reserve uid/gids in puppet anyway, with a comment to prevent people from using them (but without forcing their deployment)? @MoritzMuehlenhoff would it make sense?

People won't use them for unrelated tasks; the names are descriptive enough. I mean, no one is currently using the reprepro group either :-)

Change 666133 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: reserve gid/uid for various Hadoop daemons

https://gerrit.wikimedia.org/r/666133

Change 666134 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] bigtop: set uid/gid for yarn/hdfs/mapred/hadoop user/groups for Buster

https://gerrit.wikimedia.org/r/666134

Change 666135 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] druid::bigtop::hadoop::user: add fixed uid/gid from Buster onward

https://gerrit.wikimedia.org/r/666135

Change 666092 merged by Elukey:
[operations/puppet@production] bigtop: add the hadoop/hdfs/mapred/yarn groups to the catalog

https://gerrit.wikimedia.org/r/666092

Change 666133 merged by Elukey:
[operations/puppet@production] admin: reserve gid/uid for various Hadoop daemons

https://gerrit.wikimedia.org/r/666133

elukey added a comment (edited). Tue, Feb 23, 8:48 AM

Updated script after uid/gid reservation (the script can be refactored in 100 ways but I prefer to keep it simple and clear):

#!/bin/bash

set -x

change_uid() {
    # $1 new uid
    # $2 username
    if id "$2" &>/dev/null
    then
        OLD_UID=$(id -u $2)
        usermod -u $1 $2
        find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -user $OLD_UID -print0 | xargs -0 chown $1
    fi
}

change_gid() {
    # $1 new gid
    # $2 username
    if getent group $2 &>/dev/null
    then
        OLD_GID=$(getent group $2 | cut -d ":" -f 3)
        groupmod -g $1 $2
        find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -group $OLD_GID -print0  | xargs -0 chgrp $1
    fi
}

## hdfs


change_uid 903 hdfs
change_gid 903 hdfs

## yarn

change_uid 904 yarn
change_gid 904 yarn

## mapred

change_uid 905 mapred
change_gid 905 mapred

## analytics

change_uid 906 analytics
change_gid 906 analytics

## druid

change_uid 907 druid
change_gid 907 druid

## hadoop

change_gid 908 hadoop

I tried to apply the above on an-test-worker1003 (already on Buster) doing the following:

  1. Stop all hadoop daemons + puppet disabled
  2. run the script
  3. enable + run puppet

Some notes:

  • find + chown is incredibly slow; going through the hdfs datanode dirs/files takes many minutes. It may be unfeasible for production workers, so I'll try chown -R --from or something similar (only for the datanode hdfs dirs); see the sketch after this list.
  • the chown step is not completely safe, since it may remove the setuid bit from executables. More specifically, it happened to /usr/lib/hadoop-yarn/bin/container-executor, leading to the Yarn NodeManager refusing to start. It shouldn't be a problem for us since we'll reimage all the workers, but it is something to keep in mind.
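
The chown -R --from idea from the first note, as a hedged sketch (the old/new uid:gid pairs and the datanode path are example values):

# Only walk the datanode dirs, and only touch files still owned by the old ids;
# GNU chown's --from flag leaves everything else untouched.
for dir in /var/lib/hadoop/data/*; do
    chown -R --from=116:122 903:903 "$dir"
done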

Tobias reviewed my horrible code and suggested some changes:

  • avoid using the name of the user/group in find to avoid unnecessary calls to getent
  • use find -print0 | xargs -0 to avoid the slow find/exec model and batch groups of chown/chgrp calls together.

The script now runs like 100 times faster, now the timings are acceptable :D
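
To make the difference concrete, the change is roughly of this shape (illustrative commands, not the exact diff; 903 is the new hdfs uid from the reservation above):

# Before: one chown process forked per matching file.
find / -user "$OLD_UID" -exec chown 903 {} \;
# After: matching paths are batched, so chown runs only a handful of times.
find / -user "$OLD_UID" -print0 | xargs -0 chown 903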

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

an-test-worker1002.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102231401_elukey_6277_an-test-worker1002_eqiad_wmnet.log.

I am testing something on an-test-worker1002, but the next step is to merge the change to enforce uid/gid for Buster nodes and see how the reimages go. Before that, we need to manually fix these hosts:

elukey@cumin1001:~$ sudo cumin 'P{R:User=hdfs or R:User=analytics or R:User=yarn or R:User=mapred or R:Group=hadoop} and P{F:lsbdistcodename=buster}'
32 hosts will be targeted:
an-airflow1001.eqiad.wmnet,an-druid[1001-1002].eqiad.wmnet,an-launcher1002.eqiad.wmnet,an-presto[1001-1005].eqiad.wmnet,an-test-client1001.eqiad.wmnet,an-test-druid1001.eqiad.wmnet,an-test-presto1001.eqiad.wmnet,an-test-ui1001.eqiad.wmnet,an-test-worker[1002-1003].eqiad.wmnet,an-tool[1008-1009].eqiad.wmnet,druid[1001-1008].eqiad.wmnet,labstore[1006-1007].wikimedia.org,stat[1004-1008].eqiad.wmnet
DRY-RUN mode enabled, aborting

I have already done the -test nodes, and the Druid ones can be done on a later step. The remaining ones are:

an-airflow1001.eqiad.wmnet,an-launcher1002.eqiad.wmnet,an-presto[1001-1005].eqiad.wmnet,an-tool[1008-1009].eqiad.wmnet,labstore[1006-1007].wikimedia.org,stat[1004-1008].eqiad.wmnet

Completed auto-reimage of hosts:

['an-test-worker1002.eqiad.wmnet']

and were ALL successful.

Change 666134 merged by Elukey:
[operations/puppet@production] bigtop: set uid/gid for hadoop user/groups for Buster

https://gerrit.wikimedia.org/r/666134

Change 666135 merged by Elukey:
[operations/puppet@production] druid::bigtop::hadoop::user: add fixed uid/gid from Buster onward

https://gerrit.wikimedia.org/r/666135

The druid/mapred/yarn/hdfs/analytics users all have fixed uid/gids on Buster nodes now. But I realized that we forgot a few, namely:

  • analytics-privatedata
  • analytics-product
  • analytics-search

The main reason is the following:

root@an-worker1080:/home/elukey# ls -l /var/lib/hadoop/data/b/yarn/local/usercache
total 132
drwxr-s--- 4 aikochou              yarn 4096 Feb 17 17:18 aikochou
drwxr-s--- 4 analytics             yarn 4096 Feb 16 21:30 analytics
drwxr-s--- 4 analytics-privatedata yarn 4096 Feb 16 22:00 analytics-privatedata
drwxr-s--- 4 analytics-product     yarn 4096 Feb 17 00:22 analytics-product
drwxr-s--- 4 analytics-search      yarn 4096 Feb 16 22:25 analytics-search
[..]

Regular users already have fixed uids/gids, but not the three above. Since we use containers in Yarn and they execute as the user who launched the job, the log files etc. will be owned by the user running the container as well. If we reimage without fixed gid/uids for all these users, we risk running into trouble later on.

Change 666657 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Allocate fixed uid/gid for analytics-related system daemons

https://gerrit.wikimedia.org/r/666657

elukey added a comment (edited). Wed, Feb 24, 6:30 PM

https://gerrit.wikimedia.org/r/666657 needs some follow up on the following nodes first:

elukey@cumin1001:~$ sudo cumin 'P{c:profile::analytics::cluster::users} and P{F:lsbdistcodename=buster}'
10 hosts will be targeted:
an-airflow1001.eqiad.wmnet,an-launcher1002.eqiad.wmnet,an-test-client1001.eqiad.wmnet,an-test-worker[1002-1003].eqiad.wmnet,stat[1004-1008].eqiad.wmnet
DRY-RUN mode enabled, aborting

More precisely:

#!/bin/bash

set -x

change_uid() {
    # $1 new uid
    # $2 username
    if id "$2" &>/dev/null
    then
        OLD_UID=$(id -u $2)
        usermod -u $1 $2
        find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -user $OLD_UID -print0 | xargs -0 chown $1
    fi
}

change_gid() {
    # $1 new gid
    # $2 username
    if getent group $2 &>/dev/null
    then
        OLD_GID=$(getent group $2 | cut -d ":" -f 3)
        groupmod -g $1 $2
        find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -group $OLD_GID -print0  | xargs -0 chgrp $1
    fi
}

change_uid 909 analytics-privatedata
change_gid 909 analytics-privatedata

change_uid 910 analytics-product
change_gid 910 analytics-product

change_uid 911 analytics-search
change_gid 911 analytics-search

Change 666657 merged by Elukey:
[operations/puppet@production] Allocate fixed uid/gid for analytics-related system daemons

https://gerrit.wikimedia.org/r/666657

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['an-worker1117.eqiad.wmnet', 'an-worker1118.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202102251023_elukey_19345.log.

Completed auto-reimage of hosts:

['an-worker1117.eqiad.wmnet', 'an-worker1118.eqiad.wmnet']

and were ALL successful.

Change 666865 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add an-worker111[7,8] to the Analytics Hadoop cluster

https://gerrit.wikimedia.org/r/666865

Change 666865 merged by Elukey:
[operations/puppet@production] Add an-worker111[7,8] to the Analytics Hadoop cluster

https://gerrit.wikimedia.org/r/666865

Change 666872 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] bigtop::hadoop: create system users before installing hadoop-client

https://gerrit.wikimedia.org/r/666872

Change 666872 merged by Elukey:
[operations/puppet@production] bigtop::hadoop: create system users before installing hadoop-client

https://gerrit.wikimedia.org/r/666872

Change 666875 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] bigtop::hadoop: avoid a dependency between hadoop-client and users

https://gerrit.wikimedia.org/r/666875

Change 666875 merged by Elukey:
[operations/puppet@production] bigtop::hadoop: avoid a dependency between hadoop-client and users

https://gerrit.wikimedia.org/r/666875

Change 666880 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] bigtop::hadoop: hdfs/mapred/yarn system users needs to be in grp hadoop

https://gerrit.wikimedia.org/r/666880

Change 666880 merged by Elukey:
[operations/puppet@production] bigtop::hadoop: hdfs/mapred/yarn system users needs to be in grp hadoop

https://gerrit.wikimedia.org/r/666880

All right, an-worker111[7,8] (previously in the backup cluster) were bootstrapped fine with the new fixed gid/uids; I will proceed with the rest of the backup cluster in T274795 before moving on to the reimage of the current Analytics Hadoop nodes.

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1058.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202102260748_elukey_30366.log.

Change 667122 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add specific settings for Hadoop workers on Buster with GPUs

https://gerrit.wikimedia.org/r/667122

Final script to use for workers:

#!/bin/bash

set -x

change_uid() {
    # $1 new uid
    # $2 username
    if id "$2" &>/dev/null
    then
        OLD_UID=$(id -u $2)
        usermod -u $1 $2
        find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -user $OLD_UID -print0 | xargs -0 chown -h $1
    fi
}

change_gid() {
    # $1 new gid
    # $2 username
    if getent group $2 &>/dev/null
    then
        OLD_GID=$(getent group $2 | cut -d ":" -f 3)
        groupmod -g $1 $2
        find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -group $OLD_GID -print0  | xargs -0 chgrp -h $1
    fi
}

## hdfs


change_uid 903 hdfs
change_gid 903 hdfs

## yarn

change_uid 904 yarn
change_gid 904 yarn

## mapred

change_uid 905 mapred
change_gid 905 mapred

## analytics

change_uid 906 analytics
change_gid 906 analytics

## druid

change_uid 907 druid
change_gid 907 druid

## hadoop

change_gid 908 hadoop

change_uid 909 analytics-privatedata
change_gid 909 analytics-privatedata

change_uid 910 analytics-product
change_gid 910 analytics-product

change_uid 911 analytics-search
change_gid 911 analytics-search

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['an-worker1096.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202102261000_elukey_6199.log.

Change 667122 merged by Elukey:
[operations/puppet@production] Add specific settings for Hadoop workers on Buster with GPUs

https://gerrit.wikimedia.org/r/667122

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['an-worker1096.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202102261103_elukey_20564.log.

Completed auto-reimage of hosts:

['an-worker1096.eqiad.wmnet']

and were ALL successful.

Change 667180 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] install_server: switch to partman's reuse-parts.cfg for hadoop workers

https://gerrit.wikimedia.org/r/667180

Change 667180 merged by Elukey:
[operations/puppet@production] install_server: switch to partman's reuse-parts.cfg for hadoop workers

https://gerrit.wikimedia.org/r/667180

elukey added a comment (edited). Fri, Feb 26, 4:42 PM

Current status:

  • reimaged analytics1058 (regular hadoop worker, 12 disks) - all good! (the reuse partman recipe preserved the datanode dirs)
  • reimaged an-worker1096 (GPU worker, 24 disks) - sort of good, see T275896 (but the reuse partman recipe preserved the datanode dirs)
  • added an-worker111[7,8] from the old Hadoop Backup cluster (new worker nodes) - all good