We'll need to plug in the SSDs that were in graphite2001 before they were replaced with newer/bigger SSDs (the new SSDs were received in T157153).
Papaul, assuming the old SSDs are not wiped yet, can we connect the four SSDs to a spare system and boot it? It should work as-is, I think; none of those SSDs in codfw have been found faulty yet.
A potential issue is that when the spare system is booted, it will attempt to use the IP of the existing graphite2001 system, so the checklist below should be followed. Since the old system (graphite2001) was in row B and the spare system is in row C, we should be fairly safe from an IP conflict: the network port will be disabled, and the row switch stack the spare system is on cannot route row B IP addresses.
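The routing argument can be illustrated with a minimal sketch. The subnets and host address below are placeholders taken from the RFC 5737 documentation ranges, not the real codfw row B or row C networks, and `is_routable_in_row` is a hypothetical helper introduced only for this example.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT the real codfw subnets.
ROW_SUBNETS = {
    "row-b": ipaddress.ip_network("192.0.2.0/24"),
    "row-c": ipaddress.ip_network("198.51.100.0/24"),
}


def is_routable_in_row(address: str, row: str) -> bool:
    """Return True if the address belongs to the subnet served by the given row."""
    return ipaddress.ip_address(address) in ROW_SUBNETS[row]


# Hypothetical address standing in for the one still configured on the old SSDs.
old_host_ip = "192.0.2.10"

print(is_routable_in_row(old_host_ip, "row-b"))
print(is_routable_in_row(old_host_ip, "row-c"))
```

Under these assumptions, an address from the row B range is simply outside the row C subnet, which is why a stale configuration on the old SSDs should not collide with the live host even if the port were enabled by mistake.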
- - select spare system WMF6406 for use, as it has SFF bays.
- - install extra drive trays into WMF6406; you may have to use spare onsite trays, or take them from another spare system (like WMF6407). Please note where they come from, since taking trays from the other spare system will tie up both systems.
- - install SSDs (all 4) into spare system.
- - determine whether the SSD installation was correct by attempting to boot - this will take a few attempts, and the system will NOT have a network connection enabled, just the mgmt connection.
Basically, we need to set up the old SSDs in a spare system and try to boot from them. Once the system boots and the RAID is assembled, we can then attempt to copy over the data.
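One way to confirm that the RAID has assembled cleanly before copying data is to inspect `/proc/mdstat`, assuming the old system used Linux software RAID (md); the ticket does not say which RAID type was in use. The sketch below parses mdstat-style text and reports arrays that are inactive or missing members. The sample text, device names, and the `degraded_arrays` helper are hypothetical.

```python
import re
from typing import List


def degraded_arrays(mdstat_text: str) -> List[str]:
    """Return names of md arrays that are inactive or have missing members."""
    problems: List[str] = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header, e.g. "md0 : active raid1 sda1[0] sdb1[1]"
        header = re.match(r"^(md\d+)\s*:\s*(\w+)", line)
        if header:
            current = header.group(1)
            if header.group(2) != "active":
                problems.append(current)
            continue
        # Member status, e.g. "[UU]" (all up) or "[U_]" (one member missing)
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            if current not in problems:
                problems.append(current)
    return problems


# Hypothetical sample; device names and sizes are made up for illustration.
SAMPLE = """\
Personalities : [raid1] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      48794624 blocks super 1.2 [2/2] [UU]

md1 : active raid10 sda2[0] sdb2[1] sdc2[2]
      1465144320 blocks super 1.2 512K chunks 2 near-copies [4/3] [UUU_]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On the booted spare system, the same function could be fed the contents of `/proc/mdstat` over the serial console; an empty list would suggest that all four SSDs were detected and every array is complete.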
Once SSDs are working, the following steps must be taken:
- - add new DNS entries for the temp use of graphite2003 (this spare system with the old SSDs) - https://gerrit.wikimedia.org/r/#/c/345177/
The rest of the checklist wasn't done, since we never enabled networking or let this system come fully online. Instead, we copied the data via USB memory stick and serial console. Since the system was never fully online, it has now been reclaimed as part of T162900.