
Refresh 16 nodes in the Hadoop Analytics cluster
Closed, Resolved, Public

Description

This task is blocked until the related rack/setup/deploy one is completed.

Nodes to refresh: analytics10[42-57]
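
The bracket shorthand used for host names in this task can be expanded mechanically. The following is a minimal sketch, not an existing tool: `expand_hosts` is a hypothetical helper, and it assumes that a dash inside the brackets denotes an inclusive numeric range (as in `analytics10[42-57]`) while a dash-free group lists individual digits (as in `an-worker110[45]`).

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand a single bracket group in a host pattern (hypothetical helper)."""
    match = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", pattern)
    if match is None:
        # No bracket group: the pattern is already a plain host name.
        return [pattern]
    prefix, body, suffix = match.groups()
    if "-" in body:
        # Assumption: "42-57" means every number from 42 to 57 inclusive.
        start, end = body.split("-", 1)
        width = len(start)
        values = [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
    else:
        # Assumption: "45" means the individual digits 4 and 5.
        values = list(body)
    return [f"{prefix}{value}{suffix}" for value in values]


old_nodes = expand_hosts("analytics10[42-57]")
print(len(old_nodes), old_nodes[0], old_nodes[-1])
```

Under these assumptions the pattern above expands to the 16 hosts analytics1042 through analytics1057.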

Details

Project             Branch       Lines +/-
operations/puppet   production   +5 -93
operations/puppet   production   +9 -5
operations/puppet   production   +6 -2
operations/puppet   production   +1 -1
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +9 -5
operations/puppet   production   +10 -10
operations/puppet   production   +14 -2
operations/puppet   production   +2 -2
operations/puppet   production   +2 -2
operations/puppet   production   +1 -5
operations/puppet   production   +2 -2
operations/puppet   production   +9 -1
operations/puppet   production   +21 -0

Event Timeline

Change 630991 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Update rack settings for new Analytics Hadoop nodes in hiera

https://gerrit.wikimedia.org/r/630991

Change 630991 merged by Elukey:
[operations/puppet@production] Update rack settings for new Analytics Hadoop nodes in hiera

https://gerrit.wikimedia.org/r/630991

elukey triaged this task as High priority.
elukey added a project: Analytics-Kanban.

Change 631391 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add hadoop worker node role to an-worker1103

https://gerrit.wikimedia.org/r/631391

Change 631391 merged by Elukey:
[operations/puppet@production] Add hadoop worker node role to an-worker1103

https://gerrit.wikimedia.org/r/631391

Change 631434 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set an-worker110[45] as Hadoop workers

https://gerrit.wikimedia.org/r/631434

Change 631434 merged by Elukey:
[operations/puppet@production] Set an-worker110[45] as Hadoop workers

https://gerrit.wikimedia.org/r/631434

The plan is to add the 16 new nodes (expanding the cluster) progressively, and then remove the 16 old ones (shrinking the cluster) later on.
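
As an illustration of why the order matters, here is a small sketch of the expand-then-shrink sequence. It only tracks cluster membership; the node names and batch sizes are hypothetical and are not taken from the actual patches.

```python
def rollout_sizes(old_nodes, new_nodes, add_batch=4, remove_batch=1):
    """Return the cluster size after every step of an expand-then-shrink refresh."""
    members = list(old_nodes)
    sizes = [len(members)]
    # Phase 1: expand the cluster by adding the new nodes in batches.
    for i in range(0, len(new_nodes), add_batch):
        members.extend(new_nodes[i:i + add_batch])
        sizes.append(len(members))
    # Phase 2: shrink the cluster by removing the old nodes a few at a time.
    for i in range(0, len(old_nodes), remove_batch):
        for node in old_nodes[i:i + remove_batch]:
            members.remove(node)
        sizes.append(len(members))
    return sizes, members


# Hypothetical placeholder names, 16 old and 16 new nodes as in this task.
old = [f"old-{n:02d}" for n in range(16)]
new = [f"new-{n:02d}" for n in range(16)]
sizes, final_members = rollout_sizes(old, new)
print(min(sizes), max(sizes))
```

With this ordering the node count never drops below the original 16, whereas removing old nodes first would temporarily reduce the cluster's capacity.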

Change 631764 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set an-worker110[6-9] as Hadoop workers

https://gerrit.wikimedia.org/r/631764

Change 631764 merged by Elukey:
[operations/puppet@production] Set an-worker110[6-9] as Hadoop workers

https://gerrit.wikimedia.org/r/631764

Change 632202 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set an-worker111[02] as Hadoop workers

https://gerrit.wikimedia.org/r/632202

Change 632202 merged by Elukey:
[operations/puppet@production] Set an-worker111[02] as Hadoop workers

https://gerrit.wikimedia.org/r/632202

Change 632294 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set an-worker111[5-7] as Hadoop workers

https://gerrit.wikimedia.org/r/632294

Change 632294 merged by Elukey:
[operations/puppet@production] Set an-worker111[5-7] as Hadoop workers

https://gerrit.wikimedia.org/r/632294

All nodes are now in Hadoop, and I just closed the rack/setup/deploy task. I am going to update the docs on adding worker nodes, since they probably need a refresh.

The next step is to remove analytics10[42-57] in small steps.

During the first puppet run, datanode and nodemanager fail for different reasons:

2020-10-06 09:17:19,422 FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Failed to initialize spark_shuffle
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.spark.network.yarn.YarnShuffleService not found
2020-10-06 09:17:10,065 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.FileNotFoundException: /etc/hadoop/conf.analytics-hadoop/ssl/server.p12 (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)

These seem to be two missing require/dependency relationships in puppet :)
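
To spot this kind of first-run failure quickly, the FATAL entries can be pulled out of the daemon logs. This is a minimal sketch that assumes log lines shaped like the ones quoted above; `fatal_entries` is a hypothetical helper, not part of Hadoop or puppet.

```python
import re

# Matches lines such as:
# 2020-10-06 09:17:10,065 FATAL org.apache...DataNode: Exception in secureMain
FATAL_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"FATAL (?P<logger>[\w.$]+): (?P<msg>.*)$"
)


def fatal_entries(log_text):
    """Return (short logger name, message) for every FATAL line in log_text."""
    entries = []
    for line in log_text.splitlines():
        match = FATAL_LINE.match(line.strip())
        if match:
            short_name = match["logger"].rsplit(".", 1)[-1]
            entries.append((short_name, match["msg"]))
    return entries


sample = """\
2020-10-06 09:17:19,422 FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Failed to initialize spark_shuffle
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.spark.network.yarn.YarnShuffleService not found
2020-10-06 09:17:10,065 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.FileNotFoundException: /etc/hadoop/conf.analytics-hadoop/ssl/server.p12 (No such file or directory)
"""

for component, message in fatal_entries(sample):
    print(component, "->", message)
```

The stack-trace lines that follow each FATAL entry are skipped on purpose, so only the failing component and its top-level message are reported.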

Change 632650 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1042 from Analytics Hadoop

https://gerrit.wikimedia.org/r/632650

Change 632653 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop::master::standby: improve hiera lookups

https://gerrit.wikimedia.org/r/632653

Change 632650 merged by Elukey:
[operations/puppet@production] Decommission analytics1042 from Analytics Hadoop

https://gerrit.wikimedia.org/r/632650

Change 632653 merged by Elukey:
[operations/puppet@production] profile::hadoop::master::standby: improve hiera lookups

https://gerrit.wikimedia.org/r/632653

Change 633140 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1044 from Hadoop

https://gerrit.wikimedia.org/r/633140

Change 633140 merged by Elukey:
[operations/puppet@production] Decommission analytics1044 from Hadoop

https://gerrit.wikimedia.org/r/633140

Change 633296 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove analytics1045 from the Hadoop cluster

https://gerrit.wikimedia.org/r/633296

Change 633296 merged by Elukey:
[operations/puppet@production] Remove analytics1045 from the Hadoop cluster

https://gerrit.wikimedia.org/r/633296

Change 633350 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1046 from Hadoop

https://gerrit.wikimedia.org/r/633350

Change 633350 merged by Elukey:
[operations/puppet@production] Decommission analytics1046 from Hadoop

https://gerrit.wikimedia.org/r/633350

Change 633385 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decom analytics1047 from the Hadoop cluster

https://gerrit.wikimedia.org/r/633385

Change 633385 merged by Elukey:
[operations/puppet@production] Decom analytics1047 from the Hadoop cluster

https://gerrit.wikimedia.org/r/633385

Change 633605 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove analytics1048 from the Hadoop cluster

https://gerrit.wikimedia.org/r/633605

Change 633605 merged by Elukey:
[operations/puppet@production] Remove analytics1048 from the Hadoop cluster

https://gerrit.wikimedia.org/r/633605

Change 633864 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1049 from the Hadoop cluster

https://gerrit.wikimedia.org/r/633864

Change 633864 merged by Elukey:
[operations/puppet@production] Decommission analytics1049 from the Hadoop cluster

https://gerrit.wikimedia.org/r/633864

Change 634145 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1050 from the Hadoop cluster

https://gerrit.wikimedia.org/r/634145

Change 634145 merged by Elukey:
[operations/puppet@production] Decommission analytics1050 from the Hadoop cluster

https://gerrit.wikimedia.org/r/634145

Change 634474 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1051 from the Hadoop cluster

https://gerrit.wikimedia.org/r/634474

Change 634474 merged by Elukey:
[operations/puppet@production] Decommission analytics1051 from the Hadoop cluster

https://gerrit.wikimedia.org/r/634474

Change 634673 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1053 from the Hadoop cluster

https://gerrit.wikimedia.org/r/634673

Change 634673 merged by Elukey:
[operations/puppet@production] Decommission analytics1053 from the Hadoop cluster

https://gerrit.wikimedia.org/r/634673

Change 634766 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decom analytics1054 from Hadoop

https://gerrit.wikimedia.org/r/634766

Change 634766 merged by Elukey:
[operations/puppet@production] Decom analytics1054 from Hadoop

https://gerrit.wikimedia.org/r/634766

Change 634905 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove analytics1055 from the hadoop cluster

https://gerrit.wikimedia.org/r/634905

Change 634905 merged by Elukey:
[operations/puppet@production] Remove analytics1055 from the hadoop cluster

https://gerrit.wikimedia.org/r/634905

Change 635235 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1056 from the Hadoop cluster

https://gerrit.wikimedia.org/r/635235

Change 635235 merged by Elukey:
[operations/puppet@production] Decommission analytics1056 from the Hadoop cluster

https://gerrit.wikimedia.org/r/635235

Change 635507 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove analytics1052 from Hadoop HDFS Journal nodes

https://gerrit.wikimedia.org/r/635507

Change 635507 merged by Elukey:
[operations/puppet@production] Remove analytics1052 from Hadoop HDFS Journal nodes

https://gerrit.wikimedia.org/r/635507

Change 635521 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decommission analytics1052 from the Hadoop cluster

https://gerrit.wikimedia.org/r/635521

Change 635521 merged by Elukey:
[operations/puppet@production] Decommission analytics1052 from the Hadoop cluster

https://gerrit.wikimedia.org/r/635521

Change 635742 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Decom analytics1057 from the Hadoop cluster

https://gerrit.wikimedia.org/r/635742

Change 635742 merged by Elukey:
[operations/puppet@production] Decom analytics1057 from the Hadoop cluster

https://gerrit.wikimedia.org/r/635742

Change 635750 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hadoop: final clean up after the decommission of old nodes

https://gerrit.wikimedia.org/r/635750

Change 635750 merged by Elukey:
[operations/puppet@production] hadoop: final clean up after the decommission of old nodes

https://gerrit.wikimedia.org/r/635750

All old nodes were removed from Hadoop!