
Decommission labsdb1001 and labsdb1003
Closed, Resolved (Public)

Description

labsdb1001 and labsdb1003 are ready to be wiped, unracked, and sent back to Cisco.

labsdb1001:

  • Ops steps
    • All system services confirmed offline from production use (MySQL is now stopped)
    • Set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
    • Remove system from all lvs/pybal active configuration
    • Any service group puppet/hiera/dsh config removed
    • Update site.pp with role::spare::system (https://gerrit.wikimedia.org/r/#/c/404323/)
  • DC ops
    • Disable puppet on host
    • Remove all remaining puppet references (including role::spare::system)
    • Power down host
    • Disable switch port
    • Switch port assignment noted on this task (for later removal)
    • Remove production dns entries
    • Puppet node clean, puppet node deactivate
  • Decommission
    • System disks wiped (by onsite)
    • System unracked and decommissioned (by onsite), update racktables with result
    • Switch port configuration removed from switch once system is unracked.
    • Mgmt dns entries removed.

labsdb1003:

  • Ops steps
    • All system services confirmed offline from production use (MySQL is now stopped)
    • Set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
    • Remove system from all lvs/pybal active configuration
    • Any service group puppet/hiera/dsh config removed
    • Update site.pp with role::spare::system (https://gerrit.wikimedia.org/r/#/c/404323/)
  • DC ops
    • Disable puppet on host
    • Remove all remaining puppet references (including role::spare::system)
    • Power down host
    • Disable switch port
    • Switch port assignment noted on this task (for later removal)
    • Remove production dns entries
    • Puppet node clean, puppet node deactivate (see the command sketch after these checklists)
  • Decommission
    • System disks wiped (by onsite)
    • System unracked and decommissioned (by onsite), update racktables with result
    • Switch port configuration removed from switch once system is unracked.
    • Mgmt dns entries removed.

See also: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reclaim_to_Spares_OR_Decommission
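
For reference, the remote-capable DC ops steps above (disable puppet, then puppet node clean / puppet node deactivate) correspond roughly to the commands below. This is a minimal sketch assuming the stock Puppet CLI and the hosts' production FQDNs (labsdb1001.eqiad.wmnet and labsdb1003.eqiad.wmnet); WMF's actual decommission tooling may wrap these differently.

  # On each host, stop further puppet runs before it is powered down:
  puppet agent --disable "decommission, see T184832"

  # On the puppetmaster, remove the certificate and deactivate the node so
  # stored configs and exported resources get cleaned up:
  for host in labsdb1001.eqiad.wmnet labsdb1003.eqiad.wmnet; do
      puppet node clean "$host"
      puppet node deactivate "$host"
  done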

Event Timeline

chasemp added projects: SRE, DC-Ops.

Change 404323 had a related patch set uploaded (by BryanDavis; owner: Jcrespo):
[operations/puppet@production] mariadb: Set as spares labsdb1001 and labsdb1003

https://gerrit.wikimedia.org/r/404323

Mentioned in SAL (#wikimedia-operations) [2018-01-17T06:40:21Z] <marostegui> Stop MySQL on labsdb1001 (already dead) and labsdb1003 - T184832

Change 404323 merged by Marostegui:
[operations/puppet@production] mariadb: Set as spares labsdb1001 and labsdb1003

https://gerrit.wikimedia.org/r/404323

Mentioned in SAL (#wikimedia-operations) [2018-01-17T06:47:17Z] <marostegui> Remove labsdb1001 and labsdb1003 from tendril - T184832

Marostegui added a subscriber: Cmjohnson.

I believe this is now ready for @Cmjohnson to proceed.

Change 405275 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] mariadb: Remove references to labsdb1001 and labsdb1003

https://gerrit.wikimedia.org/r/405275

Change 405275 merged by Jcrespo:
[operations/software@master] mariadb: Remove references to labsdb1001 and labsdb1003

https://gerrit.wikimedia.org/r/405275

RobH subscribed.

I believe this is now ready for @Cmjohnson to proceed.

Ideally, these should come to me until they are ready for the on-site steps. I didn't notice this until now, so I'm stealing the task. (Chris can totally do these, but he typically has on-site-specific stuff in his queue, so I try to do the remote-accessible decom steps for all sites when possible.)

Once I finish all the remote capable steps, I'll assign it over to Chris.

So while these hosts appear not to be in use, the puppet repo was NOT cleared of references before escalation to DC ops, as the lifecycle steps state it should be.

modules/mariadb/files/check_mariadb.py: if host.startswith('labsdb1001') or host.startswith('labsdb1003'):
modules/role/files/mariadb/check_private_data_report:if [ "$HOSTNAME" == "labsdb1001" ] || [ "$HOSTNAME" == "labsdb1003" ]
modules/role/files/prometheus/mysql-labs_eqiad.yaml: - labsdb1001:9104
modules/toollabs/files/toolschecker.py:@check('/labsdb/labsdb1001')
modules/toollabs/files/toolschecker.py:def labsdb_check_labsdb1001():
modules/toollabs/files/toolschecker.py: return db_query_check('labsdb1001.eqiad.wmnet')
modules/toollabs/files/toolschecker.py:@check('/labsdb/labsdb1001rw')
modules/toollabs/files/toolschecker.py:def labsdb_check_labsdb1001rw():
modules/toollabs/files/toolschecker.py: return db_read_write_check('labsdb1001.eqiad.wmnet', 's52524__rwtest')
modules/toollabs/manifests/checker.pp: 'labsdb_labsdb1001' => {
modules/toollabs/manifests/checker.pp: path => '/labsdb/labsdb1001',
modules/toollabs/manifests/checker.pp: 'labsdb_labsdb1001rw' => {
modules/toollabs/manifests/checker.pp: path => '/labsdb/labsdb1001rw',

A quick grep of the repo shows the above for labsdb1001. I assume that since @Marostegui was decommissioning the host before escalation to dc-ops, he is likely the person knowledgeable about what these references need to change to? (Or perhaps @bd808.)

I'll continue with the decom, but that step was skipped and those items should be cleaned up so they don't just add to the cruft in the repo.
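
For anyone repeating this check, the list above can be reproduced with an ordinary recursive grep from the root of an operations/puppet checkout (a sketch; the exact directories searched here are an assumption):

  # Find any remaining references to the two hosts in the puppet repo:
  grep -r 'labsdb100[13]' modules/ manifests/ hieradata/

Re-running the same grep after the cleanup patches are merged is a quick way to confirm nothing was missed.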

Change 408446 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom of labsdb100[13]

https://gerrit.wikimedia.org/r/408446

Change 408448 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom of labsdb100[13] production dns

https://gerrit.wikimedia.org/r/408448

Change 408446 merged by RobH:
[operations/puppet@production] decom of labsdb100[13]

https://gerrit.wikimedia.org/r/408446

Change 408448 merged by RobH:
[operations/dns@master] decom of labsdb100[13] production dns

https://gerrit.wikimedia.org/r/408448

RobH updated the task description.
RobH removed a project: Patch-For-Review.

Ok, now ready for onsite wipe. It looks like labsdb1003 may also have a disk shelf; please ensure all disks are wiped.
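
For illustration only, wiping every attached disk (including a disk shelf) could look something like the sketch below. This is not the documented onsite procedure, just a hedged example using GNU shred from a rescue or wipe environment:

  # Enumerate whole-disk block devices (a disk shelf will simply add more of
  # them) and overwrite each once, finishing with a pass of zeros.
  for dev in $(lsblk -dn -o NAME,TYPE | awk '$2 == "disk" {print "/dev/" $1}'); do
      shred -v -n 1 -z "$dev"
  done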

Change 408469 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] mariadb: remove labsdb1001 & labsdb1003 special behavior

https://gerrit.wikimedia.org/r/408469

Change 408470 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolschecker: remove labsdb1001 and labsdb1003

https://gerrit.wikimedia.org/r/408470

Change 408471 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] prometheus: remove labsdb1001 and labsdb1003

https://gerrit.wikimedia.org/r/408471

Change 408471 merged by Madhuvishy:
[operations/puppet@production] prometheus: remove labsdb1001 and labsdb1003

https://gerrit.wikimedia.org/r/408471

Change 408470 merged by Madhuvishy:
[operations/puppet@production] toolschecker: remove labsdb1001 and labsdb1003

https://gerrit.wikimedia.org/r/408470

Thanks for taking care of those references, Rob.
I thought that was all done by the cloud-services-team. From the start it was a bit unclear who was responsible for decommissioning these hosts (or at least it was unclear to me).
DBAs did some of the stuff we normally do for decommissioning databases, but I assumed the rest was done by the cloud-services-team, so there was clearly a misunderstanding there.
Thanks again for taking care of all those pending things.

No worries, I just didn't want to remove all the old references directly, since I wasn't sure which needed removal and which needed updating to point at new hosts. BD went ahead and pulled them out though, so all good!

Change 408469 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] mariadb: remove labsdb1001 & labsdb1003 special behavior

https://gerrit.wikimedia.org/r/408469

BTW, I can still see a labsdb1002-array1 in racktables - not sure if that is a mistake in the application or the array really is still there, but it should be removed too (along with labsdb1001/3-array).

As per the steps completed above, it looks like labsdb1001 and labsdb1003 are down but not yet unracked.

Change 453152 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing mgmt dns for decom host labsdb1001-3

https://gerrit.wikimedia.org/r/453152

Change 453152 merged by Cmjohnson:
[operations/dns@master] Removing mgmt dns for decom host labsdb1001-3

https://gerrit.wikimedia.org/r/453152
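
Once that change is deployed, a quick sanity check that the old records are gone could look like this (hypothetical check; the <host>.mgmt.eqiad.wmnet naming for the management interfaces is an assumption):

  # Both lookups should now fail to resolve:
  host labsdb1001.eqiad.wmnet
  host labsdb1001.mgmt.eqiad.wmnet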

Cmjohnson updated the task description.