Reduce the number of appservers we're using in eqiad
Closed, Resolved (Public)

Description

We must decommission more of the older appservers in order to:

  • Free rack space in eqiad
  • Retire hardware that is more than 5 years old
  • Test with a total number of cores and amount of memory equal to what we have in codfw

Event Timeline

Joe claimed this task.
Joe raised the priority of this task from to Medium.
Joe updated the task description.
Joe added subscribers: jcrespo, gerritbot, Joe and 2 others.
Joe renamed this task from "Reduce the number of appservers we're using in eqiad preparing for decommission" to "Reduce the number of appservers we're using in eqiad". Feb 8 2016, 5:54 PM
Joe set Security to None.
Joe added a subscriber: ori.

I depooled mw1025-1050 for now, setting all of them to 'inactive'.

I'll wait until tomorrow to merge the patches that make this permanent.

I just depooled mw1051-69 as well; the cluster still seems unimpressed...

Change 275374 had a related patch set uploaded (by Giuseppe Lavagetto):
appservers: decommission permanently mw1026-69

https://gerrit.wikimedia.org/r/275374

Change 275374 merged by Giuseppe Lavagetto:
appservers: decommission permanently mw1026-69

https://gerrit.wikimedia.org/r/275374

I have removed every reference to mw1026-1069 from puppet and conftool, and shut down the machines. I'm also opening a separate ticket for decommissioning.

Change 275383 had a related patch set uploaded (by Giuseppe Lavagetto):
scap: remove decommissioned appservers from the scap dsh group

https://gerrit.wikimedia.org/r/275383

Change 275383 merged by Giuseppe Lavagetto:
scap: remove decommissioned appservers from the scap dsh group

https://gerrit.wikimedia.org/r/275383

Joe changed the task status from Open to Stalled. Mar 7 2016, 10:38 AM

I think we can reduce the pool size further, but it's already smaller than the current pool in codfw.

Change 275756 had a related patch set uploaded (by Giuseppe Lavagetto):
Remove decommissioned appservers

https://gerrit.wikimedia.org/r/275756

Change 275756 abandoned by Giuseppe Lavagetto:
Remove decommissioned appservers

Reason:
Already done in I040c2e27b750ea1906b989b2380a10bbd23f7906

https://gerrit.wikimedia.org/r/275756

Joe changed the task status from Stalled to Open. Apr 1 2016, 6:41 AM

I will adjust the weights in various clusters, and start removing more servers today, up to the point where I don't feel comfortable removing more.

I want to bring all the mw* clusters up to an average utilization of at least around 20%.
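As a rough illustration of the arithmetic behind such a target: if the total load stays constant and is spread evenly over identical servers, average utilization scales inversely with pool size. The sketch below is only that, a sketch; the function name and the example numbers are hypothetical and are not measurements from these clusters.

```python
def max_pool_size_for_target(current_servers, current_utilization_pct, target_utilization_pct):
    """Largest pool size whose average utilization is still at least the target.

    Assumptions (not from the task): identical servers, constant total load,
    and load spread evenly across the pool. Utilization is given in percent.
    """
    if current_servers <= 0:
        raise ValueError("current_servers must be positive")
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target_utilization_pct must be in (0, 100]")
    # Total load expressed in "server-percent" units.
    total_load = current_servers * current_utilization_pct
    # Floor division: any larger pool would fall below the target.
    return int(total_load // target_utilization_pct)


# Hypothetical example: 100 servers averaging 12% utilization, target 20%.
keep = max_pool_size_for_target(100, 12, 20)
print(f"keep at most {keep} servers, i.e. remove at least {100 - keep}")
```

In practice the weights differ per server and load is not constant, so a figure like this is only a starting point for the incremental removal described above.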

Mentioned in SAL [2016-04-01T07:00:27Z] <_joe_> depooling mw1070-89 from the appserver cluster. T126242

I am waiting until we switch mediawiki back from codfw before I permanently decommission the last batch of appservers I removed.

Change 285604 had a related patch set uploaded (by Giuseppe Lavagetto):
mediawiki: remove decommissioned appservers

https://gerrit.wikimedia.org/r/285604

Change 285605 had a related patch set uploaded (by Giuseppe Lavagetto):
dhcp: remove entries for decommissioned appservers

https://gerrit.wikimedia.org/r/285605

Mentioned in SAL [2016-04-27T08:40:50Z] <_joe_> stopping puppet on mw10[7-8][0-9] and mw112[1-9]/mw1130 for T126242

Change 285604 merged by Giuseppe Lavagetto:
mediawiki: remove decommissioned appservers

https://gerrit.wikimedia.org/r/285604

Mentioned in SAL [2016-04-27T09:56:30Z] <_joe_> clean puppet certs and facts on mw10[7-8][0-9] and mw112[1-9]/mw1130 for T126242

Mentioned in SAL [2016-04-27T10:01:34Z] <_joe_> shutting down mw10[7-8][0-9] and mw112[1-9]/mw1130 for T126242
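The SAL entries above use a bracket shorthand for host ranges (mw10[7-8][0-9], mw112[1-9]). A minimal sketch of how that shorthand expands into individual host names follows. It assumes each bracket holds a single-digit range of the form a-b; the function name expand_hosts is hypothetical, and this is not the tool that was actually used to run the commands.

```python
import itertools
import re

# Matches one bracket group holding a single-digit range, e.g. "[7-8]".
_BRACKET = re.compile(r"\[(\d)-(\d)\]")


def expand_hosts(pattern):
    """Expand a pattern such as 'mw10[7-8][0-9]' into a list of host names."""
    # With two capture groups, split() yields:
    #   literal, low, high, literal, low, high, ..., literal
    parts = _BRACKET.split(pattern)
    literals = parts[0::3]
    ranges = [
        range(int(low), int(high) + 1)
        for low, high in zip(parts[1::3], parts[2::3])
    ]
    hosts = []
    # product() of zero ranges yields one empty tuple, so a pattern
    # without brackets expands to itself.
    for digits in itertools.product(*ranges):
        name = literals[0]
        for digit, literal in zip(digits, literals[1:]):
            name += str(digit) + literal
        hosts.append(name)
    return hosts


batch = (
    expand_hosts("mw10[7-8][0-9]")
    + expand_hosts("mw112[1-9]")
    + expand_hosts("mw1130")
)
print(len(batch), batch[0], batch[-1])
```

Under these assumptions the three patterns together cover 30 hosts: mw1070 through mw1089, mw1121 through mw1129, and mw1130.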

I think I can close this task as resolved; the subtasks aren't real blockers, more like "related tickets".

Change 285605 merged by Cmjohnson:
dhcp: remove entries for decommissioned appservers

https://gerrit.wikimedia.org/r/285605