
(Need By: 2021-04-30) rack/setup/install backup200[4-7]
Status: Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of backup200[4-7].

Hostname / Racking / Installation Details

hostnames: backup2004 through backup2007
Racking Proposal: Anywhere on 10G racks (if full system). The hosts must not share a rack and should ideally not share a row.
Networking/Subnet/VLAN/IP: 10G, production-codfw-network.
Partitioning/Raid: Software RAID 1 across the 2 OS SSDs and hardware RAID 6 with write-back caching across the 24 HDDs. The partman recipe is the same as on all other backup hosts: custom/backup-format.cfg
OS Distro: Buster
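For sizing, a rough sketch of the usable capacity implied by the partitioning line above. It assumes all disks within each array are the same size, no hot spares, and ignores filesystem and controller overhead; none of this is stated in the task, and the symbols $S_{\mathrm{HDD}}$ and $S_{\mathrm{SSD}}$ are hypothetical placeholders for the per-disk capacities. RAID 6 spends two disks' worth of space on parity (and survives any two disk failures), while RAID 1 mirrors the two SSDs:

```latex
% n = number of HDDs in the RAID 6 array (24 per this task)
% S_HDD, S_SSD = per-disk capacities (hypothetical; not given in the task)
\begin{aligned}
C_{\mathrm{data}} &= (n - 2)\,S_{\mathrm{HDD}} = (24 - 2)\,S_{\mathrm{HDD}} = 22\,S_{\mathrm{HDD}},\\
\eta_{\mathrm{data}} &= \frac{n - 2}{n} = \frac{22}{24} \approx 91.7\%,\\
C_{\mathrm{OS}} &= S_{\mathrm{SSD}} \quad \text{(RAID 1 mirror of 2 SSDs)}.
\end{aligned}
```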

Per host setup checklist

backup2004: rack A4, U1/2, port xe-4/0/0

  • - receive in system on procurement task T264672 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location; state: planned)
  • - BIOS/DRAC/serial setup and testing
  • - add mgmt DNS (asset tag and hostname) and production DNS entries in netbox, then run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end of on-site specific steps
  • - update BIOS and iDRAC firmware to the latest revision
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup), or role(insetup::nofirm) for cp systems.
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

backup2005:

  • - receive in system on procurement task T264672 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location; state: planned)
  • - BIOS/DRAC/serial setup and testing
  • - add mgmt DNS (asset tag and hostname) and production DNS entries in netbox, then run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end of on-site specific steps
  • - update BIOS and iDRAC firmware to the latest revision
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup), or role(insetup::nofirm) for cp systems.
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

backup2006: rack C2, U1/2, port xe-2/0/0

  • - receive in system on procurement task T264672 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location; state: planned)
  • - BIOS/DRAC/serial setup and testing
  • - add mgmt DNS (asset tag and hostname) and production DNS entries in netbox, then run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end of on-site specific steps
  • - update BIOS and iDRAC firmware to the latest revision
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup), or role(insetup::nofirm) for cp systems.
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

backup2007: rack D7, port xe-7/0/2

  • - receive in system on procurement task T264672 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location; state: planned)
  • - BIOS/DRAC/serial setup and testing
  • - add mgmt DNS (asset tag and hostname) and production DNS entries in netbox, then run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end of on-site specific steps
  • - update BIOS and iDRAC firmware to the latest revision
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup), or role(insetup::nofirm) for cp systems.
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged
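The racking proposal asks that no two hosts share a rack and, preferably, a row. A minimal sketch of that check, using the rack locations recorded in the per-host headers above. It assumes a codfw rack name is a row letter followed by a rack number (so "A4" is rack 4 in row A); backup2005 is left out because no location is recorded for it here. `check_diversity` and `row_of` are hypothetical helpers, not part of any WMF tooling.

```python
from __future__ import annotations

# Rack locations as recorded in the checklist headers of this task.
# backup2005 has no recorded location, so it is omitted.
LOCATIONS = {
    "backup2004": "A4",
    "backup2006": "C2",
    "backup2007": "D7",
}


def row_of(rack: str) -> str:
    """Return the row letter of a rack name such as 'A4' (assumed naming)."""
    return rack[0]


def check_diversity(locations: dict[str, str]) -> tuple[bool, bool]:
    """Return (no_shared_rack, no_shared_row) for a host -> rack mapping."""
    racks = list(locations.values())
    rows = [row_of(rack) for rack in racks]
    return len(set(racks)) == len(racks), len(set(rows)) == len(rows)


if __name__ == "__main__":
    racks_ok, rows_ok = check_diversity(LOCATIONS)
    print(f"no shared rack: {racks_ok}; no shared row: {rows_ok}")
```

Once backup2005's rack is known it can be added to the mapping; the rack condition is the hard requirement, while the row condition is only preferred.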

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH renamed this task from (Need By: TBD) rack/setup/install backup200[4-7] to (Need By: 2021-04-30) rack/setup/install backup200[4-7]. (Mar 12 2021, 5:49 PM)
RobH assigned this task to Papaul.
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH edited subscribers, added: jcrespo, LSobanski; removed: RobH.
RobH added a parent task: Unknown Object (Task). (Mar 12 2021, 5:51 PM)
RobH mentioned this in Unknown Object (Task).

I am planning to move some servers in racks B4 and C4 to free up space at the bottom of those racks, so that backup2005 and backup2006 can be racked there.

Change 681117 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mediabackup: Setup the storage hosts

https://gerrit.wikimedia.org/r/681117

Change 681777 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add new backup node MAC address, partman recipe, role insetup

https://gerrit.wikimedia.org/r/681777

Change 681777 merged by Papaul:

[operations/puppet@production] Add new backup node MAC address, partman recipe, role insetup

https://gerrit.wikimedia.org/r/681777

@jcrespo backup2004 and backup2007 are ready for OS installation; you can take over.

Papaul updated the task description.

Change 683762 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] DHCP Add MAC address for backup200[5,6]

https://gerrit.wikimedia.org/r/683762

Change 683762 merged by Papaul:

[operations/puppet@production] DHCP Add MAC address for backup200[5,6]

https://gerrit.wikimedia.org/r/683762

@jcrespo all 4 nodes are ready for OS installation. Good luck!

Change 681117 merged by Jcrespo:

[operations/puppet@production] mediabackup: Setup the storage hosts

https://gerrit.wikimedia.org/r/681117

Change 693856 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] reimage: Avoid automatic reimage of backup[12]00[123]

https://gerrit.wikimedia.org/r/693856

Change 693856 merged by Jcrespo:

[operations/puppet@production] reimage: Avoid automatic reimage of backup[12]00[123]

https://gerrit.wikimedia.org/r/693856
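The commit subjects in this task use bracket shorthand for host sets: backup200[4-7], backup200[5,6] and backup[12]00[123]. A minimal sketch of how those patterns expand, assuming each bracket lists single characters, ranges such as 4-7, or comma-separated characters (a shell-glob-like reading, under which backup[12]00[123] means backup1001-1003 and backup2001-2003). Note that in ClusterShell NodeSet syntax a bracket such as [12] denotes the single number 12, so this is deliberately not a NodeSet parser; `expand_hosts` and `expand_chars` are hypothetical helpers.

```python
from __future__ import annotations

import itertools
import re


def expand_chars(body: str) -> list[str]:
    """Expand one bracket body into single characters.

    '4-7' -> ['4', '5', '6', '7'], '5,6' -> ['5', '6'], '123' -> ['1', '2', '3'].
    """
    chars: list[str] = []
    for part in body.split(","):
        i = 0
        while i < len(part):
            # An "x-y" triple is an inclusive range of single characters.
            if i + 2 < len(part) and part[i + 1] == "-":
                chars.extend(chr(c) for c in range(ord(part[i]), ord(part[i + 2]) + 1))
                i += 3
            else:
                chars.append(part[i])
                i += 1
    return chars


def expand_hosts(pattern: str) -> list[str]:
    """Expand every bracket group in pattern and return all combinations."""
    # With a capturing group, re.split alternates: literal, bracket body, literal, ...
    pieces = re.split(r"\[([^\]]*)\]", pattern)
    options = [[p] if i % 2 == 0 else expand_chars(p) for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*options)]


if __name__ == "__main__":
    for pat in ("backup200[4-7]", "backup200[5,6]", "backup[12]00[123]"):
        print(pat, "->", ", ".join(expand_hosts(pat)))
```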

Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts:

backup2004.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202105241015_jynus_29846_backup2004_codfw_wmnet.log.

Completed auto-reimage of hosts:

['backup2004.codfw.wmnet']

Of which those FAILED:

['backup2004.codfw.wmnet']

These hosts have the same issue backup2003 had (T274185#6883969): the first SSD has to be made bootable manually for the host to boot after a reimage. @Papaul, I confirmed this was needed on backup2004 and have already fixed it there; could you help me do the same on backup2005, backup2006 and backup2007? (F2 > Devices > iDRAC > physical disks > operation: Make bootable > Go.) Then I can take over and do a batch reimage.

What is the best place to document this step for the future? It likely adds little extra time on top of setting up the full RAID configuration.

Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts:

backup2004.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202105241613_jynus_12901_backup2004_codfw_wmnet.log.

I've documented the requirement at https://wikitech.wikimedia.org/wiki/Raid_setup, but please let me know if there is a better or preferred location.

Completed auto-reimage of hosts:

['backup2004.codfw.wmnet']

and were ALL successful.

@jcrespo the first SSD is now set as bootable.

Thank you very much! This will help me speed up the reimages.

Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts:

['backup2005.codfw.wmnet', 'backup2006.codfw.wmnet', 'backup2007.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202105250759_jynus_8094.log.
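The two log paths reported above suggest a naming scheme: a YYYYMMDDHHMM start time, the user, a numeric ID, and, for single-host runs, the host FQDN with dots replaced by underscores (the batch run omits the host). A minimal sketch that parses names of that shape; the scheme is inferred from only these two examples, the meaning of the numeric ID (perhaps a process ID) is an assumption, and `parse_log_name` and `ReimageLog` are hypothetical names, not part of wmf-auto-reimage.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Inferred shape: <YYYYMMDDHHMM>_<user>_<id>[_<host with dots as underscores>].log
_LOG_RE = re.compile(r"^(\d{12})_([^_]+)_(\d+)(?:_(.+))?\.log$")


@dataclass
class ReimageLog:
    started: datetime
    user: str
    run_id: int  # assumed meaning; possibly a process ID
    host: Optional[str]  # None for multi-host (batch) runs


def parse_log_name(name: str) -> ReimageLog:
    """Parse a log file name of the inferred shape, raising ValueError otherwise."""
    m = _LOG_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised log name: {name}")
    ts, user, run_id, host = m.groups()
    return ReimageLog(
        started=datetime.strptime(ts, "%Y%m%d%H%M"),
        user=user,
        run_id=int(run_id),
        # Hostnames contain no underscores, so mapping them back to dots is safe.
        host=host.replace("_", ".") if host else None,
    )


if __name__ == "__main__":
    print(parse_log_name("202105241015_jynus_29846_backup2004_codfw_wmnet.log"))
    print(parse_log_name("202105250759_jynus_8094.log"))
```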

Completed auto-reimage of hosts:

['backup2005.codfw.wmnet', 'backup2006.codfw.wmnet', 'backup2007.codfw.wmnet']

and were ALL successful.

jcrespo updated the task description.
jcrespo added a subscriber: Volans.

Thank you very much, @Papaul and @Volans; your help managing these servers saved me hours of work preparing them! CC @LSobanski