
(Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes)
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of ms-be2057.codfw.wmnet.

Need By: Please note this is a try-and-buy test of the new R740xd2, and it blocks the purchase of MANY ms-be systems in Q1 (both codfw and eqiad). Getting it racked and remotely accessible is therefore a high priority, so that OS installation, partitioning, and system testing can take place ASAP.

Test Server Notes: Please do NOT throw out the box for this host, as it is a try-and-buy system. We expect to keep this host, but until we are sure, the box should stay in storage.

Hostname / Racking / Installation Details

Hostnames: ms-be2057
Racking Proposal: 10G rack; no other restrictions, though ideally a rack shared with as few other ms-be systems as possible.
Networking/Subnet/VLAN/IP: 10G, internal VLAN
Partitioning/Raid: Set up identically to the existing ms-be hosts, i.e. each disk in its own single-disk RAID0 array. The SFF/SSD disks must be the first two arrays (i.e. sda/sdb from the OS perspective).
OS Distro: Buster
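The sda/sdb requirement above can be spot-checked after installation. A minimal sketch, assuming the check is done from sysfs: on Linux, /sys/block/<dev>/queue/rotational reads 0 for non-rotational (SSD) devices and 1 for spinning disks. The helper names are hypothetical, and automating this check is not part of the original task. Note that behind a hardware RAID controller the kernel may not see the true media type of each RAID0 virtual disk, so treat the result as a hint rather than proof.

```python
from pathlib import Path


def read_rotational_flags(sys_block: Path = Path("/sys/block")) -> dict[str, bool]:
    """Map each sd* block device to True if the kernel reports it as rotational."""
    flags = {}
    for dev in sorted(sys_block.glob("sd*")):
        rotational = dev / "queue" / "rotational"
        if rotational.exists():
            flags[dev.name] = rotational.read_text().strip() == "1"
    return flags


def ssds_are_first_two(flags: dict[str, bool]) -> bool:
    """True if exactly sda and sdb are non-rotational and every other disk is rotational."""
    ssds = {name for name, is_rotational in flags.items() if not is_rotational}
    return ssds == {"sda", "sdb"}
```

On the host itself this would be called as `ssds_are_first_two(read_rotational_flags())`. Keeping the check separate from the sysfs read lets it be tested against a hand-written mapping.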

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

ms-be2057: B4 U 12/13 xe-4/0/10

  • receive in system on procurement task <enter task # here> & in coupa
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan)
    • end on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:spare)
  • host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH created this task. Aug 11 2020, 6:07 PM
Restricted Application added a project: Operations. Aug 11 2020, 6:07 PM
Restricted Application added a subscriber: Aklapper.
RobH added a parent task: Unknown Object (Task). Aug 11 2020, 6:08 PM
wiki_willy renamed this task from (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet to (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes). Aug 11 2020, 6:09 PM
RobH assigned this task to fgiunchedi. (Edited) Aug 11 2020, 6:09 PM
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH added subscribers: Papaul, fgiunchedi.

@fgiunchedi: What racking restrictions and what OS did you have for this incoming test system? Please comment and then reassign this from you to @Papaul, as he will be racking this host when it arrives.

I just gave it the next ms-be name in sequence.

RobH updated the task description. Aug 11 2020, 6:10 PM
RobH updated the task description. Aug 11 2020, 6:12 PM
fgiunchedi reassigned this task from fgiunchedi to Papaul. Aug 12 2020, 8:27 AM
fgiunchedi updated the task description.

Thanks @RobH, naming and racking plan look good to me! I've updated the task description, all yours @Papaul. Thanks!

Papaul updated the task description. Aug 17 2020, 5:34 PM
RobH updated the task description. Aug 17 2020, 5:35 PM
Papaul updated the task description. Thu, Aug 27, 5:09 PM
Papaul updated the task description. Thu, Aug 27, 5:18 PM
Papaul updated the task description. Thu, Aug 27, 10:30 PM
 member ge-1/0/21 { ... }
+    member xe-4/0/10;
[edit interfaces]
+   xe-4/0/10 {
+       description ms-be2057;
+   }
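The diff above appears to add xe-4/0/10 to the same group as ge-1/0/21 (the enclosing stanza name is not shown in the pasted output) and to give the port a description. A minimal sketch of equivalent Junos `set` commands, generated in Python. It assumes that group is a Junos interface-range; `RANGE_NAME` is a hypothetical placeholder, not the real range name.

```python
def port_setup_commands(port: str, hostname: str, interface_range: str) -> list[str]:
    """Return Junos set commands that label a switch port and add it to an interface-range."""
    return [
        # Add the port to the interface-range that carries the host's VLAN.
        f"set interfaces interface-range {interface_range} member {port}",
        # Label the port with the host it is cabled to.
        f"set interfaces {port} description {hostname}",
    ]


for cmd in port_setup_commands("xe-4/0/10", "ms-be2057", "RANGE_NAME"):
    print(cmd)
```

Junos interfaces are enabled unless a `disable` statement is configured; if the port had been administratively disabled, `delete interfaces xe-4/0/10 disable` would also be needed.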
Papaul updated the task description. Thu, Aug 27, 10:39 PM

Change 622903 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add production DNS for ms-be2057

https://gerrit.wikimedia.org/r/622903

Change 622903 merged by Papaul:
[operations/dns@master] DNS: Add production DNS for ms-be2057

https://gerrit.wikimedia.org/r/622903
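Production DNS for a host normally means a forward (A) record plus a matching reverse (PTR) record. A minimal sketch, assuming generic BIND zone-file syntax rather than the exact layout used in the operations/dns repository. The task does not list the assigned IP, so 192.0.2.57 (an RFC 5737 documentation address) stands in as a hypothetical placeholder.

```python
import ipaddress


def dns_records(fqdn: str, address: str) -> tuple[str, str]:
    """Return a matching forward (A) and reverse (PTR) record in BIND zone-file syntax."""
    ip = ipaddress.IPv4Address(address)
    forward = f"{fqdn}. IN A {ip}"
    # reverse_pointer yields the in-addr.arpa name, e.g. 57.2.0.192.in-addr.arpa
    reverse = f"{ip.reverse_pointer}. IN PTR {fqdn}."
    return forward, reverse


for record in dns_records("ms-be2057.codfw.wmnet", "192.0.2.57"):
    print(record)
```

Deriving both records from the same inputs keeps the forward and reverse entries from drifting apart.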

Papaul updated the task description. Thu, Aug 27, 10:49 PM

Change 622905 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for ms-be2057

https://gerrit.wikimedia.org/r/622905

Change 622905 merged by Papaul:
[operations/puppet@production] DHCP: Add MAC address for ms-be2057

https://gerrit.wikimedia.org/r/622905
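The DHCP change ties the server's MAC address to its hostname so the installer can PXE boot it. A minimal sketch that renders an ISC dhcpd `host` declaration; whether the puppet repository stores entries in exactly this form is an assumption. The MAC 00:00:5e:00:53:01 is from the RFC 7042 documentation range, not the server's real address.

```python
import re

# Colon-separated, lowercase 48-bit MAC address.
MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}")


def dhcp_host_stanza(hostname: str, fqdn: str, mac: str) -> str:
    """Render an ISC dhcpd host declaration pinning a MAC address to a host name."""
    mac = mac.lower()
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"not a colon-separated MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {fqdn};\n"
        "}\n"
    )


print(dhcp_host_stanza("ms-be2057", "ms-be2057.codfw.wmnet", "00:00:5E:00:53:01"), end="")
```

Validating the MAC up front catches copy/paste mistakes from the iDRAC before they reach the DHCP server.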

Change 622908 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Add ms-be2057 to site.pp with role insetup

https://gerrit.wikimedia.org/r/622908

Change 622908 abandoned by Papaul:
[operations/puppet@production] Add ms-be2057 to site.pp with role insetup

Reason:
ms-be2057 is already part of site.pp

https://gerrit.wikimedia.org/r/622908

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

ms-be2057.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202008272331_pt1979_32345_ms-be2057_codfw_wmnet.log.
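The log file name appears to encode the start time (YYYYMMDDHHMM), the launching user, a numeric ID, and the host FQDN with dots replaced by underscores. A minimal parsing sketch based only on this one example; the field meanings are inferred, and the numeric ID being a process ID is an assumption.

```python
import re
from datetime import datetime

# Inferred layout: <YYYYMMDDHHMM>_<user>_<numeric id>_<fqdn with dots as underscores>.log
LOG_NAME_RE = re.compile(r"(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)_(?P<host>.+)\.log")


def parse_reimage_log_name(name: str) -> dict:
    """Split a wmf-auto-reimage log file name into its (inferred) fields."""
    m = LOG_NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    return {
        "started": datetime.strptime(m["stamp"], "%Y%m%d%H%M"),
        "user": m["user"],
        "numeric_id": int(m["num"]),  # assumed to be a process ID; not confirmed
        # Hostnames cannot contain underscores, so mapping them back to dots is safe.
        "host": m["host"].replace("_", "."),
    }
```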

Papaul updated the task description. Fri, Aug 28, 12:03 AM

The install is failing at the first Puppet run because the server doesn't have the role spare::system.

00:50:16 | ms-be2057.codfw.wmnet | Still waiting for a succesful Puppet run after 35.0 minutes.Either it has not finished yet or the puppet run had errors. You may have to fix the puppet role or reinstall with spare::system first. Check the log file. The path to it was printed at the start of the script.

Adding the server back to site.pp with the role insetup.
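Change 622908 was first abandoned because ms-be2057 was "already part of site.pp", which suggests an existing node definition (for example a regex covering the ms-be hosts) already matched it, so the first Puppet run applied a role that failed on the fresh install. In Puppet, a node statement naming the host exactly takes precedence over regex node definitions, which is why an explicit insetup entry changes the outcome. A simplified Python model of that precedence; the regex and the role strings are hypothetical stand-ins, and real Puppet has extra rules (such as hostname truncation) that this omits.

```python
import re
from typing import Optional


def select_node(
    fqdn: str,
    exact_nodes: dict[str, str],
    regex_nodes: list[tuple[str, str]],
    default: Optional[str] = None,
) -> Optional[str]:
    """Simplified model of Puppet node matching: exact name, then regex, then default.

    Real Puppet does not guarantee which of several matching regexes wins;
    this sketch simply takes the first match.
    """
    if fqdn in exact_nodes:
        return exact_nodes[fqdn]
    for pattern, body in regex_nodes:
        if re.search(pattern, fqdn):
            return body
    return default


# Hypothetical regex node covering the codfw ms-be hosts.
REGEX_NODES = [(r"^ms-be20[0-9]{2}\.codfw\.wmnet$", "role(existing_ms_be_role)")]

before = select_node("ms-be2057.codfw.wmnet", {}, REGEX_NODES)
after = select_node(
    "ms-be2057.codfw.wmnet",
    {"ms-be2057.codfw.wmnet": "role(insetup)"},
    REGEX_NODES,
)
print(before)  # role(existing_ms_be_role)
print(after)   # role(insetup)
```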

Change 622908 restored by Papaul:
[operations/puppet@production] Add ms-be2057 to site.pp with role insetup

https://gerrit.wikimedia.org/r/622908

Change 622908 merged by Papaul:
[operations/puppet@production] Add ms-be2057 to site.pp with role insetup

https://gerrit.wikimedia.org/r/622908

Completed auto-reimage of hosts:

['ms-be2057.codfw.wmnet']

and were ALL successful.

Papaul closed this task as Resolved. Fri, Aug 28, 1:26 AM
Papaul updated the task description.

@fgiunchedi All yours, have fun!