
Decommission analytics-tool1001 and all the CDH leftovers
Closed, Resolved (Public)

Description

Now that an-tool1009 serves hue-next.wikimedia.org we should:

Event Timeline

Change 679847 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/dns@master] Decommission hue-next.wikimedia.org

https://gerrit.wikimedia.org/r/679847

Change 679847 merged by Elukey:

[operations/dns@master] Decommission hue-next.wikimedia.org

https://gerrit.wikimedia.org/r/679847

Change 679892 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Decommission analytics-tool1001

https://gerrit.wikimedia.org/r/679892

Change 679892 merged by Elukey:

[operations/puppet@production] Decommission analytics-tool1001

https://gerrit.wikimedia.org/r/679892

cookbooks.sre.hosts.decommission executed by elukey@cumin1001 for hosts: analytics-tool1001.eqiad.wmnet

  • analytics-tool1001.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox

Also removed the hue keytab from krb1001 and puppetmaster1001.

Change 680167 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Remove cloudera-related packages

https://gerrit.wikimedia.org/r/680167

Change 680167 merged by Elukey:

[operations/puppet@production] Remove cloudera-related packages

https://gerrit.wikimedia.org/r/680167

root@apt1001:/srv/wikimedia# reprepro --delete clearvanished
Deleting vanished identifier 'buster-wikimedia|component/cloudera|amd64'.
Deleting vanished identifier 'buster-wikimedia|thirdparty/cloudera|amd64'.
Deleting vanished identifier 'buster-wikimedia|thirdparty/cloudera|i386'.
Deleting vanished identifier 'buster-wikimedia|thirdparty/cloudera|source'.
Deleting vanished identifier 'jessie-wikimedia|thirdparty/cloudera|amd64'.
Deleting vanished identifier 'jessie-wikimedia|thirdparty/cloudera|source'.
Deleting vanished identifier 'stretch-wikimedia|thirdparty/cloudera|amd64'.
Deleting vanished identifier 'stretch-wikimedia|thirdparty/cloudera|i386'.
Deleting vanished identifier 'stretch-wikimedia|thirdparty/cloudera|source'.
Deleting files no longer referenced...

The last step is to decide whether to keep hue_next as the database name for hue or to rename it. The current hue database should be dropped from all db hosts and removed from the backup config.

Change 683786 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::analytics_cluster::hadoop::ui: use the 'hue' db

https://gerrit.wikimedia.org/r/683786

The plan is:

  • downtime + disable-puppet + stop hue on an-tool1009
  • merge https://gerrit.wikimedia.org/r/683786
  • sudo mysqldump hue > hue_30042021.sql on an-coord1001
  • sudo mysqldump hue_next > hue_next_30042021.sql on an-coord1001
  • on an-coord1001's mysql:
    • DROP DATABASE hue
    • CREATE DATABASE hue DEFAULT CHARACTER SET utf8
  • sudo mysql hue < hue_next_30042021.sql on an-coord1001
  • check database, check replication on an-coord1002/db1108
  • run puppet on an-tool1009 and check that hue works as expected
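
The database part of the plan above (dump both databases, recreate hue, load the hue_next dump into it) can be sketched as a dry-run helper. This is only a sketch: the function name print_hue_db_plan and its arguments are hypothetical, and running the SQL statements through `mysql -e` is an assumption, since the plan only says they are run in an-coord1001's mysql. The script prints the commands and executes nothing.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the database steps, does not run them.
print_hue_db_plan() {
    old_db="$1"   # database to be replaced, e.g. hue
    new_db="$2"   # database whose contents are kept, e.g. hue_next
    stamp="$3"    # suffix for the dump file names, e.g. 30042021

    # Dump both databases before anything destructive happens.
    echo "sudo mysqldump ${old_db} > ${old_db}_${stamp}.sql"
    echo "sudo mysqldump ${new_db} > ${new_db}_${stamp}.sql"
    # Recreate the target database empty (mysql -e is an assumption here).
    echo "sudo mysql -e 'DROP DATABASE ${old_db}'"
    echo "sudo mysql -e 'CREATE DATABASE ${old_db} DEFAULT CHARACTER SET utf8'"
    # Load the data that should survive into the recreated database.
    echo "sudo mysql ${old_db} < ${new_db}_${stamp}.sql"
}

print_hue_db_plan hue hue_next 30042021
```

Printing the commands first makes it easy to review the order (both dumps before the DROP) before running anything on an-coord1001.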

Change 683786 merged by Elukey:

[operations/puppet@production] role::analytics_cluster::hadoop::ui: use the 'hue' db

https://gerrit.wikimedia.org/r/683786

Everything looks good! I also dropped the hue_next database, so it is less confusing when inspecting what we run on the various db nodes (we now have only the hue database).

This also works with the regular backups run by Data Persistence, since they take the hue database from the meta instance on db1108.

I think that the cleanup is finally done!