eqiad: Physical moves for MediaWiki servers
Closed, Resolved | Public | Request

Description

Hi @wkandek - in preparation for an upcoming install of some very heavy ms-be servers, @Cmjohnson is looking to free up some rack space at the bottom of our cabinets to install them. Could you designate someone from Service-Ops to work with Chris on moving the following servers to a different rack location at eqiad?

  • mw1281
  • mw1282
  • mw1283
  • mw1313
  • mw1314
  • mw1315
  • mw1316
  • mw1317
  • mw1318
  • mw1267
  • mw1268

The initial timeframe we're thinking of is sometime during the first week of November, but that can be adjusted based on availability. Much appreciated in advance.

Thanks,
Willy

Event Timeline

wiki_willy added a parent task: Unknown Object (Task). Oct 21 2020, 5:33 PM

Looking at mw1267/1268 in rack A7: Can we move them from the bottom of the rack (above the ms-be) up to unit 20 and 27, staying in the same rack? That would be easiest for us.

https://netbox.wikimedia.org/dcim/racks/7/

Similarly for mw1313-1318, can they be moved into the free units higher up in the same rack? https://netbox.wikimedia.org/dcim/racks/15/

++ @Cmjohnson for his feedback to @Dzahn 's question on the MW server moves

Change 637572 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] scap: replace proxy for eqiad A7, mw1268->mw1269

https://gerrit.wikimedia.org/r/637572

Mentioned in SAL (#wikimedia-operations) [2020-10-29T22:21:36Z] <mutante> replacing scap proxy for rack A7 eqiad because mw1268 needs to move physically (T266164)

Change 637572 merged by Dzahn:
[operations/puppet@production] scap: replace proxy for eqiad A7, mw1268->mw1269

https://gerrit.wikimedia.org/r/637572

We talked on IRC and:

  • mw1267,mw1268 need to move out of A7 completely
  • we picked A8 as the destination
  • there are too many servers to depool all at once, so we are starting with the 2 servers in A7 and will deal with the other rack later
  • mw1267 is a regular appserver and depooled already
  • mw1268 is special because it's a scap proxy. That needed the extra change above; I just moved the role to mw1269, so mw1269 needs to stay where it is
  • mw1268 also depooled now

Tomorrow Chris will move them to A8 and there will be another change needed to update comments in the repo and then we can pool them again.

mw13* will be later, tbd.
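
The plan above (move in small groups, and run the same steps for each group) could be sketched roughly as follows. This is only an illustration: the step names are plain labels taken from this task, not real commands, and the batch size of 2 and the helper names `batches` and `move_plan` are assumptions.

```python
from typing import Iterator, List, Tuple

# Steps as described in this task. These are labels only, not real commands.
MOVE_STEPS = [
    "depool",
    "physical move",
    "update location comments in the repo",
    "repool",
]


def batches(hosts: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def move_plan(hosts: List[str], size: int) -> List[Tuple[List[str], List[str]]]:
    """Return one (batch, steps) entry per batch, in order."""
    return [(batch, list(MOVE_STEPS)) for batch in batches(hosts, size)]


if __name__ == "__main__":
    remaining = [f"mw{n}" for n in range(1313, 1319)]  # mw1313..mw1318
    for batch, steps in move_plan(remaining, size=2):  # size=2 is an assumption
        print(", ".join(batch), "->", " / ".join(steps))
```

Keeping the batch small means only a few appservers are out of rotation at any one time.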

Change 637576 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: move mw1267,mw1268 from rack A7 to rack A8

https://gerrit.wikimedia.org/r/637576

Mentioned in SAL (#wikimedia-operations) [2020-10-30T14:14:16Z] <cmjohnson1> moving mw1267 and mw1268 to rack A8 eqiad T266164

@Dzahn I moved mw1267 and 1268 to rack A8 and confirmed they're up. Updated netbox

Mentioned in SAL (#wikimedia-operations) [2020-10-30T20:59:17Z] <mutante> mw1267,mw1268 - scap pull and repool - back to prod - T266164

Change 637576 merged by Dzahn:
[operations/puppet@production] site: move mw1267,mw1268 from rack A7 to rack A8

https://gerrit.wikimedia.org/r/637576

One rack is done; we will continue with the other rack from the week of Nov 16.

Krinkle renamed this task from eqiad: Physical Moves for MediaWiki Servers to eqiad: Physical moves for MediaWiki servers. Nov 3 2020, 4:17 AM

@Cmjohnson icinga reports that mw1267's mgmt is down, can you check? It also reports that PS redundancy is not good :(

@Cmjohnson I'm back. We can continue with the second group of servers, ideally in 2 batches. Let's talk about what times are best.

@Cmjohnson So the servers at the bottom of B7 (mw1313-mw1318), should we move them to B4? That has the space and no other mw servers yet. Would that work for your purposes?

@Dzahn I would need to move them to a 1G rack, (B1,B3,B5,B6 and B8)
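
As a small illustration of the constraint above, a destination check could look like this. The rack list is the one given in the comment; the constant and function names are hypothetical.

```python
# 1G racks as listed in the comment above.
ONE_GIG_RACKS = frozenset({"B1", "B3", "B5", "B6", "B8"})


def is_valid_destination(rack: str) -> bool:
    """Return True if `rack` is one of the 1G racks named in this task."""
    return rack.strip().upper() in ONE_GIG_RACKS


if __name__ == "__main__":
    for candidate in ("B4", "B5"):
        print(candidate, is_valid_destination(candidate))
```

By this check the earlier suggestion of B4 would not qualify, while B5 would.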

Mentioned in SAL (#wikimedia-operations) [2020-11-18T20:15:30Z] <mutante> mw1317,mw1318 - downtimed and depooled - they are physically moving from B7 to B5 (T266164)

Dzahn updated the task description.

Change 641834 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: update locations for mw1317, mw1318

https://gerrit.wikimedia.org/r/641834

Change 641834 merged by Dzahn:
[operations/puppet@production] site: update locations for mw1317, mw1318

https://gerrit.wikimedia.org/r/641834

@Dzahn

Is it possible to move mw1281, 82 and 83? I need this space for the an-workers on 10G. I can move them to A8.

I can definitely help on this @Dzahn, lemme know if you need a pair of extra hands :)

Icinga downtime for 4:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with reason: move_to_other_rack

mw1281.eqiad.wmnet

Icinga downtime for 4:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with reason: move_to_other_rack

mw1282.eqiad.wmnet

Icinga downtime for 4:00:00 set by dzahn@cumin1001 on 1 host(s) and their services with reason: move_to_other_rack

mw1283.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2020-12-09T18:05:49Z] <mutante> mw1281,mw1282,mw1283 shut down for T266164

@Cmjohnson Yes. I just depooled mw1281-1283, downtimed them and then shut them down physically. You can move them.

@Dzahn completed the move and mw1281-83 are up

@Cmjohnson Thank you. Repooled and receiving traffic again. Monitoring looks good.

Change 647326 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: update comment about location of mw1281-mw1283

https://gerrit.wikimedia.org/r/647326

Change 647326 merged by Dzahn:
[operations/puppet@production] site: update comment about location of mw1281-mw1283

https://gerrit.wikimedia.org/r/647326

@Cmjohnson Should this stay open for mw1313-mw1316 or did we solve the issue by moving other servers now?

Assigning to you to find out if we need to keep this open or not. If it's done, feel free to close as resolved. If we still need to keep it open and move more servers, just assign it back. Thanks

Per Willy, the remaining ones are also listed on T267065, which is a wider task about the same thing. Suggesting we call this resolved, since the remaining ones are duplicates.