
Audit eqiad row B WMCS servers
Closed, Resolved · Public

Description

Audit the full set of servers in eqiad row B to ensure that any and all hardware we are ready to decomm is actively marked as such, with Phabricator tasks flagged for the DC Ops folks.

Event Timeline

bd808 triaged this task as High priority. · Mar 24 2020, 5:51 PM

For scope here, the only racks that are currently 10G-enabled are B2, B4, and B7. Immediate decomms in other racks will not help with the current situation.

I have confirmed already that in those racks, all of the WMCS hardware is in service at this time (or at least not ready to be decommed until replaced).

From the top of B2:
I've confirmed U34-20 are all in service (cloudvirts, ms-be1020, logstash1027).
Below that is db1099 (in service).
ms-be1047 and ms-be1058, in service.
Some cloud and analytics servers, all in service.
cp1080 is in service
cp1079 also in service
Lastly, a hadoop worker that is in service (and not on the current analytics refresh ticket).

@bd808 If you think it would be useful, I can make a spreadsheet with all of this instead of summaries, but I strongly suspect there are no quick wins here based on my checking so far. Moving cloudvirts in there did a good job of efficiently clearing old hardware out of those racks.

That said, at the top of rack B4, ruthenium is decommissioned and still in the rack (T216062).

I'll spreadsheet it out in case there are more.
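
(For anyone re-running this kind of audit: below is a minimal sketch of scripting the per-rack dump against a Netbox inventory rather than building the spreadsheet by hand. The instance URL, the NETBOX_TOKEN environment variable, and the helper functions are illustrative assumptions for the example, not details from this task.)

```
#!/usr/bin/env python3
"""Sketch: dump per-rack device inventory from Netbox to CSV.

Illustrative only: the Netbox URL, token env var, and rack names
are assumptions for the example, not details from this task.
"""
import csv
import os
import sys

import requests

NETBOX_URL = "https://netbox.example.org"   # hypothetical Netbox instance
HEADERS = {"Authorization": f"Token {os.environ['NETBOX_TOKEN']}"}
RACKS = ["B2", "B4", "B7"]                  # the 10G-enabled eqiad row B racks


def get_json(path, **params):
    """GET one page from the Netbox REST API and return the parsed body."""
    resp = requests.get(f"{NETBOX_URL}/api/{path}", params=params,
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()


def rack_devices(rack_name):
    """Yield all devices in the named rack, following API pagination."""
    rack = get_json("dcim/racks/", name=rack_name)["results"][0]
    page = get_json("dcim/devices/", rack_id=rack["id"], limit=50)
    while True:
        yield from page["results"]
        if page["next"] is None:
            break
        resp = requests.get(page["next"], headers=HEADERS, timeout=30)
        resp.raise_for_status()
        page = resp.json()


writer = csv.writer(sys.stdout)
writer.writerow(["rack", "position", "device", "status"])
for name in RACKS:
    for dev in rack_devices(name):
        writer.writerow([name, dev.get("position"), dev["name"],
                         dev["status"]["label"]])
```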

And that's done. There are two servers of note: one is decommed but not yet removed (ruthenium), and the other has an open decom ticket waiting on DC Ops (rhodium).
Our servers are all in service. However, cloudelastic1002 may not need to be racked where it is. According to T194186, it was racked in row B only for redundancy. The redundancy is good to have, but it does seem like a server that could be moved to another row to free up ports in a pinch.

I believe all the others need to remain in row B and are currently in service (or, in the case of the cloudvirts, are to be replaced by the upcoming purchases).

bd808 added a subscriber: wiki_willy.

@wiki_willy, @Bstorm did the audit that was mentioned in our meeting on 2020-03-24. The good/bad news is that there is not much we can find to move out of row B, but there are 2 decom-ready hosts from the core SRE group. She also spotted one host that could move out of row B entirely if there is another place for it to land.

@bd808 - we did a rack space, power, and 10G port check in row B, and I think we should be good installing these from a data center standpoint... as long as we're able to free up the 12x 10G ports from Jason's idea. Since we only need 33x 10G ports to install these, we should have ~38x 10G ports free once the 12x are released back. Here's the available rack space in these 10G racks, which will cover the 21u needed (a quick arithmetic check follows the list):

B2 - 11u
B4 - 9u
B7 - 6u
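
(A quick sanity check of the figures above, as a sketch; every number is copied from the comment itself, and tying the 12 freed ports to T248425 is inferred from the reply further down.)

```
# Sanity check of the capacity figures quoted above; all numbers come
# from the comment itself, none are new data.

ports_needed = 33            # 10G ports needed for the planned installs
ports_released = 12          # freed if the trunked-interface idea (T248425) works
ports_free_after = 38        # ~38 free once the 12 are released back
ports_free_now = ports_free_after - ports_released   # ~26 free today

space_free_u = {"B2": 11, "B4": 9, "B7": 6}   # free rack units per 10G rack
space_needed_u = 21

assert ports_free_after >= ports_needed              # 38 >= 33: ports fit
assert sum(space_free_u.values()) >= space_needed_u  # 26 >= 21: space fits
print(f"ports: {ports_free_after} free vs {ports_needed} needed; "
      f"space: {sum(space_free_u.values())}u free vs {space_needed_u}u needed")
```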

With that said, we'll still need to figure out a better long-term solution for anything that needs to be installed after the Cloudvirt refresh and Ceph expansion.

Thanks,
Willy

> we did a rack space, power, and 10G port check in row B, and I think we should be good installing these from a data center standpoint... as long as we're able to free up the 12x 10G ports from Jason's idea.

In that case, I really hope that T248425: Test using trunked interfaces for cloudvirts works out ok!

bd808 reassigned this task from wiki_willy to Bstorm.