
Decom db2001-db2009
Closed, Resolved · Public

Description

db2001-db2009 are unused; decide what to do with them (probably decommission them).

Current state:

No longer in use; used to contain private information
depooled from mediawiki
not present in puppet site.pp
puppet keys revoked
salt keys revoked
dns/install entries pending removal

I noticed this ticket when checking for db servers without base::firewall enabled: Summarising:

  • db2008/db2009 were removed from mediawiki in https://gerrit.wikimedia.org/r/#/c/288945/, the change to remove them from site.pp is pending in https://gerrit.wikimedia.org/r/#/c/286172/
  • db2007 is currently used for tests by Daniel
  • db2006 is not present in site.pp, puppet or salt, but the box is currently still powered on. As such, it can probably simply be unracked and decommissioned.
  • db2001-db2005 are not present in site.pp, but are managed via puppet/salt. They are also listed in wmf-config.

decom checklist (a sketch of the puppet/salt cleanup commands follows the per-host lists below):

db2001:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked

db2002:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked

db2003:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked

db2004:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked

db2005:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked

db2006:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked

db2007:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked

db2008:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked

db2009:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process)

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - system unracked and decommissioned (by onsite), update racktables with result
  • - switch port configuration removed from switch once system is unracked
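
A minimal sketch of the "puppet node clean, puppet node deactivate, salt key removed" step above, using db2001.codfw.wmnet as an example host. These are standard puppet/salt CLI invocations run on the relevant masters, not a transcript of what was actually executed:

# on the host itself, before starting the non-interruptible steps
puppet agent --disable "decommissioning, T125827"

# on the puppetmaster: revoke the host certificate and deactivate the node,
# which removes it from stored configs (the "nuke from stored configs / icinga"
# step mentioned in the IRC log further down this task)
puppet node clean db2001.codfw.wmnet
puppet node deactivate db2001.codfw.wmnet

# on the salt master: delete the minion key
salt-key -d db2001.codfw.wmnet -y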

Event Timeline


@RobH I need a second confirmation that these servers are "assigned to me" administratively, and not to fundraising or someone else (sometimes the same names get reused). I made a first check and they seem to be idle and not in use according to my monitoring. If that is true, I will start decommissioning them and reusing their parts (please also confirm that they are ours and not leases/donations).

@RobH @mark I think there is a mistake in the 5-year planning. I made a comment on the spreadsheet. Luckily, most of these do not need replacement.

RobH reassigned this task from RobH to jcrespo. Mar 9 2016, 10:09 PM (edited)

@RobH @mark I think there is a mistake in the 5-year planning. I made a comment on the spreadsheet. Luckily, most of these do not need replacement.

Do you mean they don't need replacement because we already have enough capacity and don't need these old systems?

Comment on the sheet from @jcrespo:

Are you sure this data is right? It makes no sense that we bought db2001 two years after db2012, and Papaul told me they are out of warranty.

The sheet had an incorrect date: it listed the hardware warranty expiration in the purchase-date column. It's been fixed. However, racktables has the correct info on these, linking to the old purchase ticket https://rt.wikimedia.org/Ticket/Display.html?id=1600 (which was an order for 11 systems).

db2001-db2011 are old PowerEdge R510s, originally purchased for Tampa. At the time of the migration, they were still under warranty, and thus were shipped to codfw.

As such, these are well past their warranty expiration.

I'm not sure who else would ever have db systems assigned to them; as far as I know these are for your use in the db cluster. If you don't have a use for them, I'd think we would decommission them given their age, but we would need to check with @mark.

Assigning back to Jaime for his input (I imagine he'll advise we decommission them). If so, should we expand this to all the db R510s or just db2001-db2008?

Do you mean they don't need replacement because we already have enough capacity and don't need these old systems?

Yes.

I would like to decommission the first 8 for parts (mainly disks); their use has been shifted to newer machines.

I have in use:
db2009: x1 remote server (there are more replicas in the same datacenter)
db2010: m1 remote server
db2011: m2 remote server

for misc/non-core services. As these 3 are backup servers, and parts are available for them, I am not in a hurry to replace them (they do not really need performance, and we can use dbstore for them if needed).

It doesn't have to be all of them, just enough to avoid purchasing new disks. Having out-of-warranty spare servers can be helpful for non-critical roles. These 3 services will have to be replaced eventually (no hurry).

I'm editing this task because I'm taking db2008.codfw.wmnet back into use for T130098, so it must not be decommissioned.

Volans renamed this task from Investigate/decom db2001-db2008 to Investigate/decom db2001-db2007. Mar 16 2016, 10:27 AM
Volans updated the task description.

Maybe edit site.pp so that the actually unused ones are removed from puppet but the one still in use stays in it. Then there is less ambiguity, and we can move forward with the decom by revoking puppet certs and salt keys and shutting them down to save energy.

Dzahn added a subscriber: Papaul.

Change 278338 had a related patch set uploaded (by Dzahn):
remove db200[1-7] from DHCP

https://gerrit.wikimedia.org/r/278338

"the actually unused ones are removed from puppet"

It was like that until Moritz re-added a bunch of them to get security updates.

Please wait until I see the final destination of all of these.

Moritz said he is applying the updates because the servers are up. This might be a catch-22.

So what's the actual blocker? Is there really one, since Moritz says he only re-added them so they get updates?

I need to know the destination of the disks so that we have at least complete working servers before the failover.

These and some es2 hosts have to be checked to try to solve the codfw disk issues (I cannot remember the ticket numbers).

Got it, thanks for explaining.

Change 278338 abandoned by Dzahn:
remove db200[1-7] from DHCP

https://gerrit.wikimedia.org/r/278338

Andrew triaged this task as Medium priority. Apr 14 2016, 7:58 PM
Andrew removed a project: Patch-For-Review.
jcrespo renamed this task from Investigate/decom db2001-db2007 to Investigate/decom db2001-db2009. Apr 29 2016, 3:41 PM
jcrespo updated the task description.

db2008 and db2009 are in theory still in use, but they are ready to be decommed, as they have been replaced by the larger db2033.

Change 286172 had a related patch set uploaded (by Jcrespo):
Retire db2008 and db2009 as x1 nodes

https://gerrit.wikimedia.org/r/286172

Change 288945 had a related patch set uploaded (by Jcrespo):
Remove all mentions to db1027, db2008 and db2009 from mediawiki

https://gerrit.wikimedia.org/r/288945

Change 288945 merged by Jcrespo:
Remove all mentions to db1027, db2008 and db2009 from mediawiki

https://gerrit.wikimedia.org/r/288945

After we talked on IRC, I am using db2007 to test upgrading RT (T119112), which involves a schema change. It's in Icinga as a host, but mariadb/mysql was already removed and it is not part of the cluster in any way. The mariadb-server currently on it was installed by me, and I will remove it again as well.

Change 289725 had a related patch set uploaded (by Dzahn):
temp. setup to use db2007 for RT upgrade test

https://gerrit.wikimedia.org/r/289725

Change 289725 merged by Dzahn:
temp. setup to use db2007 for RT upgrade test

https://gerrit.wikimedia.org/r/289725

I noticed this ticket when checking for db servers without base::firewall enabled: Summarising:

  • db2008/db2009 were removed from mediawiki in https://gerrit.wikimedia.org/r/#/c/288945/, the change to remove them from site.pp is pending in https://gerrit.wikimedia.org/r/#/c/286172/
  • db2007 is currently used for tests by Daniel
  • db2006 is not present in site.pp, puppet or salt, but the box is currently still powered on. As such, it can probably simply be unracked and decommissioned.
  • db2001-db2005 are not present in site.pp, but are managed via puppet/salt. They are also listed in wmf-config.

I would remove them all when Daniel finishes his work.

Change 292397 had a related patch set uploaded (by Dzahn):
remove db2007 from site.pp, done with testing

https://gerrit.wikimedia.org/r/292397

Change 292397 merged by Dzahn:
remove db2007 from site.pp, done with testing

https://gerrit.wikimedia.org/r/292397

11:23 < mutante> !log db2007 shutdown, schedule eternal downtime
11:24 < mutante> !log db2007, revoke puppet cert, delete salt key, nuke from stored configs / icinga

Change 286172 merged by Jcrespo:
Retire db2008 and db2009 as x1 nodes

https://gerrit.wikimedia.org/r/286172

jcrespo renamed this task from Investigate/decom db2001-db2009 to Decom db2001-db2009. Aug 11 2016, 3:17 PM
jcrespo edited projects, added ops-codfw; removed Patch-For-Review.
jcrespo updated the task description.

@Papaul, @RobH These servers are ready to go; they have been wiped from icinga/puppet/salt. DNS and tftpboot entries are still active.

I leave the decision on their final destination to you.
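
A quick way to double-check that state (a sketch only; db2001 is an example host and these are standard CLI checks, not a record of what was run):

# DNS still resolves at this point; the entries are removed further down in this task
host db2001.codfw.wmnet

# on the puppetmaster: no signed cert should remain
puppet cert list --all | grep db2001 || echo "no cert, as expected"

# on the salt master: no minion key should remain
salt-key -L | grep db2001 || echo "no key, as expected"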

I'm re-assigning this to @mark for his approval to decommission db2001-db2009. All 9 of these systems had their warranties expire on 2014-11-10. These are old Dell PowerEdge R510 systems, shipped from their initial use in our Tampa deployment.

All 9 of these systems are located in rack a6-codfw. This will free up 18U of space.

Please advise whether we can decommission these entirely or whether we need to reclaim them to spares, and assign this back to me; I'll handle triage and next steps from there.

Thanks!

Switch ports disabled; diff below, since the port info will be needed once these systems are unracked. (A sketch of how such ports are typically disabled follows the diff.)

[edit interfaces ge-6/0/0]
-  enable;
+  disable;
[edit interfaces ge-6/0/1]
-  enable;
+  disable;
[edit interfaces ge-6/0/2]
-  enable;
+  disable;
[edit interfaces ge-6/0/3]
-  enable;
+  disable;
[edit interfaces ge-6/0/4]
-  enable;
+  disable;
[edit interfaces ge-6/0/5]
-  enable;
+  disable;
[edit interfaces ge-6/0/6]
-  enable;
+  disable;
[edit interfaces ge-6/0/7]
-  enable;
+  disable;
[edit interfaces ge-6/0/8]
-  enable;
+  disable;
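
For reference, a sketch of the configuration-mode session that would produce a diff like the one above (the exact commands used are not recorded on this task):

robh@asw-a-codfw# set interfaces ge-6/0/0 disable
robh@asw-a-codfw# set interfaces ge-6/0/1 disable
robh@asw-a-codfw# set interfaces ge-6/0/2 disable
robh@asw-a-codfw# set interfaces ge-6/0/3 disable
robh@asw-a-codfw# set interfaces ge-6/0/4 disable
robh@asw-a-codfw# set interfaces ge-6/0/5 disable
robh@asw-a-codfw# set interfaces ge-6/0/6 disable
robh@asw-a-codfw# set interfaces ge-6/0/7 disable
robh@asw-a-codfw# set interfaces ge-6/0/8 disable
robh@asw-a-codfw# show | compare
robh@asw-a-codfw# commit comment T125827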

Change 341582 had a related patch set uploaded (by robh):
[operations/puppet] decom of db2001-db2009

https://gerrit.wikimedia.org/r/341582

Change 341582 merged by RobH:
[operations/puppet] decom of db2001-db2009

https://gerrit.wikimedia.org/r/341582

Change 341585 had a related patch set uploaded (by robh):
[operations/dns] decom of db2001-db2009

https://gerrit.wikimedia.org/r/341585

Change 341585 merged by RobH:
[operations/dns] decom of db2001-db2009

https://gerrit.wikimedia.org/r/341585

RobH updated the task description.
RobH removed projects: Patch-For-Review, DBA.

Ok, this is now ready for on-site disk wipes of all the systems. Assigning to @Papaul for followup.

Change 342841 had a related patch set uploaded (by Papaul):
[operations/dns] DNS/Decom: Remove DNS entries for db200[1-9]

https://gerrit.wikimedia.org/r/342841

Switch port information (all servers are in row A, rack A6):
db2001 ge-6/0/0
db2002 ge-6/0/1
db2003 ge-6/0/2
db2004 ge-6/0/3
db2005 ge-6/0/4
db2006 ge-6/0/5
db2007 ge-6/0/6
db2008 ge-6/0/7
db2009 ge-6/0/8

robh@asw-a-codfw# show | compare
[edit interfaces ge-6/0/0]
-  description db2001;
[edit interfaces ge-6/0/1]
-  description db2002;
[edit interfaces ge-6/0/2]
-  description db2003;
[edit interfaces ge-6/0/3]
-  description db2004;
[edit interfaces ge-6/0/4]
-  description db2005;
[edit interfaces ge-6/0/5]
-  description db2006;
[edit interfaces ge-6/0/6]
-  description db2007;
[edit interfaces ge-6/0/7]
-  description db2008;
[edit interfaces ge-6/0/8]
-  description db2009;

{master:2}[edit]
robh@asw-a-codfw# commit comment T125827

switch port description removal done.
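
For completeness, a sketch of the configuration-mode commands that would produce the compare output above (again, not a transcript of the actual session):

robh@asw-a-codfw# delete interfaces ge-6/0/0 description
robh@asw-a-codfw# delete interfaces ge-6/0/1 description
robh@asw-a-codfw# delete interfaces ge-6/0/2 description
robh@asw-a-codfw# delete interfaces ge-6/0/3 description
robh@asw-a-codfw# delete interfaces ge-6/0/4 description
robh@asw-a-codfw# delete interfaces ge-6/0/5 description
robh@asw-a-codfw# delete interfaces ge-6/0/6 description
robh@asw-a-codfw# delete interfaces ge-6/0/7 description
robh@asw-a-codfw# delete interfaces ge-6/0/8 description
robh@asw-a-codfw# show | compare
robh@asw-a-codfw# commit comment T125827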

Change 342841 merged by RobH:
[operations/dns] DNS/Decom: Remove DNS entries for db200[1-9]

https://gerrit.wikimedia.org/r/342841

RobH updated the task description.