
(Need By: TBD) rack/setup/install moss-be100[12]
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of moss-be1001.eqiad.wmnet and moss-be1002.eqiad.wmnet.

Hostname / Racking / Installation Details

Hostnames: moss-be100[12]
Racking Proposal: As row- and rack-diverse as possible
Networking/Subnet/VLAN/IP: 10G, private VLAN
Partitioning/Raid: Same partitioning recipe and RAID configuration as the existing ms-be hosts
OS Distro: Buster
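The host shorthand moss-be100[12] used throughout this task reads like a glob-style character class: the reimage runs in the timeline act on moss-be1001.eqiad.wmnet and moss-be1002.eqiad.wmnet. As a minimal sketch (the function name is hypothetical, and it only handles a single bracket group of literal characters, not ranges or negation), the expansion looks like:

```python
import re


def expand_host_pattern(pattern: str) -> list[str]:
    """Expand a hostname with one glob-style character class, e.g. 'moss-be100[12]'.

    Assumption: the brackets hold a plain set of single characters. Patterns that
    do not contain exactly one such bracket group are returned unchanged.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        return [pattern]
    prefix, chars, suffix = match.groups()
    # One hostname per character inside the brackets, in the order written.
    return [f"{prefix}{c}{suffix}" for c in chars]


# The eqiad.wmnet domain is taken from the reimage logs in this task.
hosts = [f"{h}.eqiad.wmnet" for h in expand_host_pattern("moss-be100[12]")]
print(hosts)  # ['moss-be1001.eqiad.wmnet', 'moss-be1002.eqiad.wmnet']
```

Note that tools with their own range syntax may read the brackets differently (for example as a numeric range), so this only mirrors how the shorthand is used in this task.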

Per host setup checklist

moss-be1001:

  • receive in system on procurement task T275177 & in Coupa
  • rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • BIOS/DRAC/serial setup/testing
  • add mgmt DNS (asset tag and hostname) and production DNS entries in Netbox, run the sre.dns.netbox cookbook
  • network port setup via Netbox, run Homer to commit
  • firmware update: iDRAC, BIOS, RAID, network
  • operations/puppet update: this should include updates to install_server DHCP and netboot, and site.pp with role(insetup) (cp systems use role(insetup::nofirm)). https://gerrit.wikimedia.org/r/c/operations/puppet/+/697990
  • OS installation & initial Puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • host state in Netbox set to staged

moss-be1002:

  • receive in system on procurement task T275177 & in Coupa
  • rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • BIOS/DRAC/serial setup/testing
  • add mgmt DNS (asset tag and hostname) and production DNS entries in Netbox, run the sre.dns.netbox cookbook
  • network port setup via Netbox, run Homer to commit
  • firmware update: iDRAC, BIOS, RAID, network
  • operations/puppet update: this should include updates to install_server DHCP and netboot, and site.pp with role(insetup) (cp systems use role(insetup::nofirm)). https://gerrit.wikimedia.org/r/c/operations/puppet/+/697990
  • OS installation & initial Puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • host state in Netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.
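The resolution rule amounts to an all-hosts, all-steps check over the per-host checklists. A minimal sketch, with hypothetical class and function names and step labels abbreviated from the checklist; it does not talk to Phabricator or Netbox:

```python
from dataclasses import dataclass, field

# Step labels abbreviated from the per-host checklist (hypothetical wording).
STEPS = (
    "receive in system (T275177, Coupa)",
    "rack system and update Netbox (state: planned)",
    "BIOS/DRAC/serial setup and testing",
    "mgmt and production DNS in Netbox, run sre.dns.netbox",
    "network port setup in Netbox, commit with Homer",
    "firmware update: iDRAC, BIOS, RAID, network",
    "operations/puppet update (DHCP, netboot, site.pp)",
    "OS installation and initial Puppet run",
    "Netbox state set to staged",
)


@dataclass
class HostChecklist:
    hostname: str
    done: set[str] = field(default_factory=set)

    def complete(self, step: str) -> None:
        # Reject labels that are not on the checklist to catch typos early.
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.done.add(step)

    def remaining(self) -> list[str]:
        # Report outstanding steps in checklist order.
        return [s for s in STEPS if s not in self.done]


def can_resolve(hosts: list[HostChecklist]) -> bool:
    """The task may be resolved only when every host has every step checked."""
    return all(not h.remaining() for h in hosts)


hosts = [HostChecklist("moss-be1001"), HostChecklist("moss-be1002")]
for h in hosts:
    for step in STEPS[:-1]:
        h.complete(step)
print(can_resolve(hosts))  # False: neither host is staged yet
for h in hosts:
    h.complete(STEPS[-1])
print(can_resolve(hosts))  # True
```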

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH mentioned this in Unknown Object (Task).
RobH removed a subscriber: RobH.

moss-be1001 B4 U30 port13 id5345
moss-be1002 C2 U21 port33 id5344
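These placements can be checked against the racking proposal ("as row- and rack-diverse as possible"). A minimal sketch, assuming the location field is a rack name made of a row letter plus rack number (B4 = row B, rack 4), U30 is the rack unit and port13 the switch port; the meaning of the id field is not stated in the task, so it is only carried along. All names are hypothetical:

```python
import re
from typing import NamedTuple


class Placement(NamedTuple):
    host: str
    row: str    # assumed: leading letter of the rack name, e.g. "B" in "B4"
    rack: str   # full rack name, e.g. "B4"
    unit: int   # rack unit, e.g. 30 for "U30"
    port: int   # switch port number as posted in the comment
    ident: int  # the "idNNNN" value; its meaning is not stated in the task


LINE_RE = re.compile(r"(\S+) ([A-Z])(\d+) U(\d+) port(\d+) id(\d+)")


def parse_placement(line: str) -> Placement:
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised racking line: {line!r}")
    host, row, rack_num, unit, port, ident = m.groups()
    return Placement(host, row, row + rack_num, int(unit), int(port), int(ident))


def is_row_and_rack_diverse(placements: list[Placement]) -> bool:
    """True if no two hosts share a row, which also means no two share a rack."""
    rows = [p.row for p in placements]
    return len(set(rows)) == len(rows)


posted = ["moss-be1001 B4 U30 port13 id5345", "moss-be1002 C2 U21 port33 id5344"]
placements = [parse_placement(line) for line in posted]
print(is_row_and_rack_diverse(placements))  # True: rows B and C
```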

Jclark-ctr updated the task description.
Jclark-ctr added a subscriber: Jclark-ctr.

@Jclark-ctr the moss-be1001 cables are wrong: the ports you connected them to are already labeled for cloudcephosd1016. However, that server is not connected to the switch and is listed as decommissioning in Netbox (I'm unsure about that status as well). I am confused about what's going on with this switch and its available ports. Can you let me know which ports are available?

Cmjohnson updated the task description.
Cmjohnson added a subscriber: RobH.

@RobH if you have time to do the installs, that would be great; assign it back to me if you're busy.

Change 697990 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] moss-be100[12] setup info

https://gerrit.wikimedia.org/r/697990

Change 697990 merged by RobH:

[operations/puppet@production] moss-be100[12] setup info

https://gerrit.wikimedia.org/r/697990

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['moss-be1001.eqiad.wmnet', 'moss-be1002.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106031527_robh_2852.log.

The first imaging attempt failed because the disks had not been set up in advance as individual RAID0 volumes. I'm fixing that now; I'm only posting this so that the script launch above, which has no completion message, has some context.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['moss-be1001.eqiad.wmnet', 'moss-be1002.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106031710_robh_31308.log.

Completed auto-reimage of hosts:

['moss-be1001.eqiad.wmnet', 'moss-be1002.eqiad.wmnet']

Of which those FAILED:

['moss-be1001.eqiad.wmnet', 'moss-be1002.eqiad.wmnet']
17:23:22 | moss-be1001.eqiad.wmnet | Unable to run wmf-auto-reimage-host: The host moss-be1001.eqiad.wmnet should have rebooted into the newly installed Operating System but appears to have rebooted instead into the Debian installer again. Manual intervention required.
17:25:52 | moss-be1002.eqiad.wmnet | Unable to run wmf-auto-reimage-host: The host moss-be1002.eqiad.wmnet should have rebooted into the newly installed Operating System but appears to have rebooted instead into the Debian installer again. Manual intervention required.
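The failure summary above has a regular time | host | message shape, which makes it easy to pick out the hosts that need manual intervention. A minimal sketch, assuming that line format holds for other runs (it is inferred from this one output, not from the script's documentation); the function names are hypothetical:

```python
import re

# Line format inferred from the summary above: "HH:MM:SS | <fqdn> | <message>"
FAILURE_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) \| (\S+) \| (.*)")


def failed_hosts(summary: str) -> dict[str, str]:
    """Map each host named on a failure line to its error message."""
    failures = {}
    for line in summary.splitlines():
        m = FAILURE_RE.fullmatch(line.strip())
        if m:
            _time, host, message = m.groups()
            failures[host] = message
    return failures


def rebooted_into_installer(message: str) -> bool:
    # The failure seen in this task: the host came back up in the Debian installer.
    return "rebooted instead into the Debian installer" in message


excerpt = (
    "Of which those FAILED:\n"
    "17:25:52 | moss-be1002.eqiad.wmnet | Unable to run wmf-auto-reimage-host: "
    "The host moss-be1002.eqiad.wmnet should have rebooted into the newly installed "
    "Operating System but appears to have rebooted instead into the Debian installer "
    "again. Manual intervention required.\n"
)
for host, msg in failed_hosts(excerpt).items():
    print(host, rebooted_into_installer(msg))  # moss-be1002.eqiad.wmnet True
```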

Fixing and will rerun.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['moss-be1001.eqiad.wmnet', 'moss-be1002.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106031731_robh_16029.log.

Completed auto-reimage of hosts:

['moss-be1001.eqiad.wmnet', 'moss-be1002.eqiad.wmnet']

and were ALL successful.

RobH updated the task description.
RobH added a subscriber: fgiunchedi.

@fgiunchedi these are now ready for your use!