Set up an Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one.
Closed, Resolved · Public · 13 Estimated Story Points

Description

Set up an Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. This cluster will be used to test Kerberos settings/configurations.

Proposed configuration for analytics1028->41:

  • analytics1028/9 -> masters
  • analytics1030 -> coordinator
  • analytics1039 -> client/UI (one broken disk among the datanode ones)
  • analytics10[31-38, 40,41] -> workers
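As a sanity check on the layout above, the following minimal sketch expands the bracketed worker range and confirms that every host from analytics1028 to analytics1041 gets exactly one role. The names `expand_hosts` and `ROLES` are hypothetical helpers for illustration only; they are not part of the puppet configuration referenced in this task.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand a pattern such as 'analytics10[31-38, 40,41]' into host names."""
    match = re.fullmatch(r"(\w+)\[([^\]]+)\]", pattern)
    if match is None:
        # No bracket expression: treat the pattern as a single host name.
        return [pattern]
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive numeric range, e.g. '31-38'.
            start, end = (int(x) for x in part.split("-"))
            hosts.extend(f"{prefix}{n}" for n in range(start, end + 1))
        else:
            hosts.append(f"{prefix}{int(part)}")
    return hosts


# Role assignment as proposed in the task description.
ROLES = {
    "masters": ["analytics1028", "analytics1029"],
    "coordinator": ["analytics1030"],
    "client/UI": ["analytics1039"],
    "workers": expand_hosts("analytics10[31-38, 40,41]"),
}

# Flatten the mapping to check coverage of the whole host range.
ALL_HOSTS = sorted(host for hosts in ROLES.values() for host in hosts)
```

Comparing `ALL_HOSTS` against the full analytics1028->41 range is a quick way to catch a host that was left out or assigned twice.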

Details

Project                 Branch       Lines +/-   Subject
operations/puppet       production   +6 -6
operations/puppet       production   +9 -9
operations/puppet       production   +1 -0
operations/puppet       production   +4 -0
operations/puppet       production   +6 -6
operations/puppet       production   +3 -0
operations/puppet       production   +0 -1
operations/puppet       production   +1 -1
operations/puppet       production   +1 -1
operations/puppet       production   +2 -0
operations/puppet       production   +5 -0
operations/puppet       production   +201 -1
operations/puppet       production   +2 -2
operations/puppet       production   +14 -11
operations/puppet/cdh   master       +6 -1
operations/puppet       production   +0 -30
operations/puppet       production   +18 -0
operations/puppet       production   +4 -0
operations/puppet       production   +2 -2
operations/puppet       production   +1 -0
operations/puppet       production   +100 -7
operations/puppet       production   +426 -0
operations/puppet       production   +4 -0
operations/puppet       production   +6 -0
operations/puppet       production   +6 -1

Event Timeline

Restricted Application added a subscriber: Aklapper. · Dec 18 2018, 9:31 PM
Milimetric triaged this task as Medium priority. · Jan 3 2019, 6:36 PM
Milimetric raised the priority of this task from Medium to High.
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.

Change 482613 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hive::client: add jdbc parameters

https://gerrit.wikimedia.org/r/482613

Change 482613 merged by Elukey:
[operations/puppet@production] profile::hive::client: add jdbc parameters

https://gerrit.wikimedia.org/r/482613

Change 482617 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::oozie::server: add jdbc parameter

https://gerrit.wikimedia.org/r/482617

Change 482617 merged by Elukey:
[operations/puppet@production] profile::oozie::server: add jdbc parameter

https://gerrit.wikimedia.org/r/482617

Change 482618 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hue: allow password definition via hiera

https://gerrit.wikimedia.org/r/482618

Change 482618 merged by Elukey:
[operations/puppet@production] profile::hue: allow password definition via hiera

https://gerrit.wikimedia.org/r/482618

elukey claimed this task. · Jan 7 2019, 11:27 AM
elukey added a project: Analytics-Kanban.
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 482645 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Introduce role::analytics_test_cluster

https://gerrit.wikimedia.org/r/482645

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1028.eqiad.wmnet', 'analytics1029.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901140908_elukey_156893.log.

Completed auto-reimage of hosts:

['analytics1028.eqiad.wmnet', 'analytics1029.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1030.eqiad.wmnet', 'analytics1031.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901140946_elukey_167120.log.

Completed auto-reimage of hosts:

['analytics1031.eqiad.wmnet', 'analytics1030.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1032.eqiad.wmnet', 'analytics1033.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901141026_elukey_178718.log.

Completed auto-reimage of hosts:

['analytics1032.eqiad.wmnet']

Of which those FAILED:

['analytics1032.eqiad.wmnet']

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1032.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901141108_elukey_189559.log.

Completed auto-reimage of hosts:

['analytics1032.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1033.eqiad.wmnet', 'analytics1034.eqiad.wmnet', 'analytics1035.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901141358_elukey_227412.log.

Completed auto-reimage of hosts:

['analytics1033.eqiad.wmnet', 'analytics1034.eqiad.wmnet', 'analytics1035.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1036.eqiad.wmnet', 'analytics1037.eqiad.wmnet', 'analytics1038.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901141450_elukey_242764.log.

Completed auto-reimage of hosts:

['analytics1036.eqiad.wmnet', 'analytics1038.eqiad.wmnet', 'analytics1037.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1039.eqiad.wmnet', 'analytics1040.eqiad.wmnet', 'analytics1041.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901141528_elukey_254836.log.

Completed auto-reimage of hosts:

['analytics1039.eqiad.wmnet']

Of which those FAILED:

['analytics1039.eqiad.wmnet']

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1039.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901150744_elukey_206278.log.

Completed auto-reimage of hosts:

['analytics1039.eqiad.wmnet']

and were ALL successful.

elukey updated the task description. · Jan 15 2019, 8:42 AM
elukey updated the task description. · Jan 15 2019, 8:47 AM

Change 482645 merged by Elukey:
[operations/puppet@production] Introduce role::analytics_test_cluster

https://gerrit.wikimedia.org/r/482645

elukey updated the task description. · Jan 15 2019, 9:24 AM

Change 484374 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Configure analytics1028->41 as Hadoop Analytics test cluster

https://gerrit.wikimedia.org/r/484374

Today I created all the partitions on the old hosts (wiping the old content), so we are now ready to bootstrap the cluster by merging https://gerrit.wikimedia.org/r/484374

Change 484374 merged by Elukey:
[operations/puppet@production] Configure analytics1028->41 as Hadoop Analytics test cluster

https://gerrit.wikimedia.org/r/484374

Change 485000 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop:monitoring::journalnode: use contain

https://gerrit.wikimedia.org/r/485000

Change 485000 abandoned by Elukey:
profile::hadoop:monitoring::journalnode: use contain

Reason:
This does not seem to work with defines

https://gerrit.wikimedia.org/r/485000

Change 485006 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Fix typos in Hadoop Test cluster configuration

https://gerrit.wikimedia.org/r/485006

Change 485006 merged by Elukey:
[operations/puppet@production] Fix typos in Hadoop Test cluster configuration

https://gerrit.wikimedia.org/r/485006

Change 485012 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] network::constants: add Hadoop testing masters

https://gerrit.wikimedia.org/r/485012

Change 485012 merged by Elukey:
[operations/puppet@production] network::constants: add Hadoop testing masters

https://gerrit.wikimedia.org/r/485012

Change 485036 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::prometheus::analytics: add Hadoop test cluster metrics

https://gerrit.wikimedia.org/r/485036

Change 485036 merged by Elukey:
[operations/puppet@production] role::prometheus::analytics: add Hadoop test cluster metrics

https://gerrit.wikimedia.org/r/485036

Change 485061 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove host specific overrides for analytics1029

https://gerrit.wikimedia.org/r/485061

Change 485061 merged by Elukey:
[operations/puppet@production] Remove host specific overrides for analytics1029

https://gerrit.wikimedia.org/r/485061

Change 485070 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet/cdh@master] check_hdfs_active_namenode: find cluster name in the config

https://gerrit.wikimedia.org/r/485070

Change 485070 merged by Elukey:
[operations/puppet/cdh@master] check_hdfs_active_namenode: find cluster name in the config

https://gerrit.wikimedia.org/r/485070

Change 485167 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Assign the Hadoop coordinator role to analytics1030

https://gerrit.wikimedia.org/r/485167

Change 485167 merged by Elukey:
[operations/puppet@production] Assign the Hadoop coordinator role to analytics1030

https://gerrit.wikimedia.org/r/485167

Side notes related to the first puppet run of a coordinator:

Change 485190 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Assign role::analytics_test_cluster::hadoop::ui to analytics1039

https://gerrit.wikimedia.org/r/485190

Change 485190 merged by Elukey:
[operations/puppet@production] Assign role::analytics_test_cluster::hadoop::ui to analytics1039

https://gerrit.wikimedia.org/r/485190

Side notes related to the Hadoop UI role:

  • The libssl1.0.0 dummy package does not work properly: hue cannot start because it tries to read libssl1.0.0's shared library at startup.
  • libmysqlclient18 needs to be uploaded to the CDH component.

We could, in theory, get the Cloudera hue package, modify the dependencies and re-upload to override it. This would allow us to avoid workarounds.

This is done. The only thing missing is deciding how camus should be run, but that is probably better suited for T212259. I am also going to create separate tasks for the puppet issues found.

elukey set the point value for this task to 13. · Jan 22 2019, 11:26 AM
elukey moved this task from In Progress to Done on the Analytics-Kanban board.

Change 486030 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Create a testing Analytics Druid cluster

https://gerrit.wikimedia.org/r/486030

Change 486030 merged by Elukey:
[operations/puppet@production] Create a testing Analytics Druid cluster

https://gerrit.wikimedia.org/r/486030

Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts:

['analytics1041.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901230935_elukey_255883.log.

Change 486041 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add cluster config for druid_test_analytics

https://gerrit.wikimedia.org/r/486041

Change 486041 merged by Elukey:
[operations/puppet@production] Add cluster config for druid_test_analytics

https://gerrit.wikimedia.org/r/486041

Change 486050 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add druid_test_analytics_eqiad to monitoring_groups

https://gerrit.wikimedia.org/r/486050

Change 486050 merged by Elukey:
[operations/puppet@production] Add druid_test_analytics_eqiad to monitoring_groups

https://gerrit.wikimedia.org/r/486050

Change 486053 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::test_analytics::worker: use zookeeper 3.4.9-3+deb9u1

https://gerrit.wikimedia.org/r/486053

Change 486053 merged by Elukey:
[operations/puppet@production] role::druid::test_analytics::worker: use zookeeper 3.4.9-3+deb9u1

https://gerrit.wikimedia.org/r/486053

Completed auto-reimage of hosts:

['analytics1041.eqiad.wmnet']

and were ALL successful.

Change 486057 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::test_analytics::worker: fix druid cluster

https://gerrit.wikimedia.org/r/486057

Change 486057 merged by Elukey:
[operations/puppet@production] role::druid::test_analytics::worker: fix druid cluster

https://gerrit.wikimedia.org/r/486057

Change 486061 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::test_analytics::worker: disable zookeeper monitoring

https://gerrit.wikimedia.org/r/486061

Change 486061 merged by Elukey:
[operations/puppet@production] role::druid::test_analytics::worker: disable zookeeper monitoring

https://gerrit.wikimedia.org/r/486061

elukey moved this task from Backlog to Done on the User-Elukey board. · Feb 4 2019, 2:19 PM

Change 488260 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::hadoop::worker: deploy analytics users

https://gerrit.wikimedia.org/r/488260

Change 488260 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::hadoop::worker: deploy analytics users

https://gerrit.wikimedia.org/r/488260

Change 488265 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Fix system::role for analytics_test_cluster's roles

https://gerrit.wikimedia.org/r/488265

Change 488265 merged by Elukey:
[operations/puppet@production] Fix system::role for analytics_test_cluster's roles

https://gerrit.wikimedia.org/r/488265

Change 488266 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::hadoop::worker: deploy analytics-search-users

https://gerrit.wikimedia.org/r/488266

Change 488266 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::hadoop::worker: deploy analytics-search-users

https://gerrit.wikimedia.org/r/488266

Change 488367 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::hadoop::worker: add hadoop users

https://gerrit.wikimedia.org/r/488367

Change 488367 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::hadoop::worker: add hadoop users

https://gerrit.wikimedia.org/r/488367

Change 488375 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::hadoop: remove ssh keys for analytics-search users

https://gerrit.wikimedia.org/r/488375

Change 488375 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::hadoop: remove ssh keys for analytics-search users

https://gerrit.wikimedia.org/r/488375

Change 488382 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_cluster::hadoop::master|standby: remove more ssh keys

https://gerrit.wikimedia.org/r/488382

Change 488382 merged by Elukey:
[operations/puppet@production] role::analytics_cluster::hadoop::master|standby: remove more ssh keys

https://gerrit.wikimedia.org/r/488382

Nuria closed this task as Resolved. · Feb 14 2019, 5:09 AM