setup backup1001.eqiad.wmnet
Closed, Invalid · Public

Description

This task will set up a new backup system in eqiad, backup1001.

This host will use the spare server wmf4750, along with parts from helium.

Helium currently has an H800 external RAID array controller, plus an MD1200 disk shelf (helium-array). We'll need the H800 removed from helium and installed in wmf4750.

Downtime/decom of helium will need to be scheduled with @akosiaris: as soon as we yank out the H800, the old system is useless.

  • Apply hostname label to wmf4750 as backup1001.
  • Unrack backup1001 (d3-eqiad) and prepare to move it into the space currently occupied by helium (a8-eqiad).
  • @Cmjohnson to coordinate with @akosiaris on when to decom/yank parts from helium.
  • Migrate helium out of its current rack space and move it anywhere there is room to wipe its disks (location doesn't matter), then place backup1001 in helium's old spot. (It is easier to move the 1U server than to move the 2U MD1200 disk shelf and find space for it in the rack where backup1001 currently resides.)
  • @Cmjohnson pulls the H800 controller out of helium, connects the disk shelves, and ensures the disks are detected and no data loss happens (the config stored on the H800 should allow it to simply build and use the existing RAID array in its shelf).
  • OS installation of backup1001 on its internal disks, NOT the MD1200 disks.
  • Hand off to @akosiaris for deployment of the backup service.

Event Timeline

RobH triaged this task as Medium priority. Mar 15 2018, 5:27 PM
RobH created this task.
Cmjohnson updated the task description. Mar 19 2018, 6:48 PM

Moved this server to U11 in A8. Once @akosiaris and I figure out a day/time to make the move, I will relocate the helium array to U9/10.

RobH mentioned this in Unknown Object (Task). Mar 20 2018, 4:31 PM

Change 421295 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Introduce backup1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/421295

Change 421295 merged by Alexandros Kosiaris:
[operations/dns@master] Introduce backup1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/421295

Unfortunately, wmf4750 will not do after all. After we powered off and unracked helium, we found that the RAID card was too big for the space available in the R430. We need either a different server from the spares or a new server :-(

Change 421301 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Revert "Introduce backup1001.eqiad.wmnet"

https://gerrit.wikimedia.org/r/421301

Change 421301 merged by Alexandros Kosiaris:
[operations/dns@master] Revert "Introduce backup1001.eqiad.wmnet"

https://gerrit.wikimedia.org/r/421301

akosiaris closed this task as Invalid. Sep 25 2018, 7:58 AM

This turned out to be impossible; a new box was procured in T196478.