
(Need By: TBD) rack/setup/install maps10[05-10].eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of maps10[05-10].eqiad.wmnet. Four of these hosts were purchased to replace maps1001-1004, and two as expansion.
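The bracket notation above is shorthand for six hostnames. A minimal sketch of how such a range could be expanded, assuming a single zero-padded numeric range per pattern; `expand_hosts` is a hypothetical helper written for illustration, not a tool referenced in this task:

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracketed numeric range, e.g. 'maps10[05-10].eqiad.wmnet'."""
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        return [pattern]  # no bracketed range, nothing to expand
    start, end = match.group(1), match.group(2)
    width = len(start)  # preserve zero padding: '05' means width 2
    prefix = pattern[: match.start()]
    suffix = pattern[match.end():]
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    for host in expand_hosts("maps10[05-10].eqiad.wmnet"):
        print(host)
```

The same helper handles the single-digit form used in the racking proposal, such as maps100[5-8].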

Hostname / Racking / Installation Details

Hostnames: maps10[05-10]
Racking Proposal: Four of these hosts replace existing servers and two expand the maps footprint in eqiad. Please place maps100[5-8] one server per row, then place maps10[09-10] in any two different rows. No rack may hold more than one of the maps10[05-10] hosts. (End result: two rows with one maps host each and two rows with two maps hosts each, with no rack holding more than one maps host.)
Networking/Subnet/VLAN/IP: 1G networking, single port connection, production VLAN
Partitioning/Raid: Hardware RAID
OS Distro: Stretch
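The racking constraints above can be checked mechanically once a placement is proposed. The following is a rough sketch only: `check_racking` is a hypothetical helper, and the row and rack values in the example are invented for illustration, not the actual locations of these hosts.

```python
from collections import Counter


def check_racking(placement: dict[str, tuple[str, str]]) -> list[str]:
    """Return a list of violations; placement maps hostname -> (row, rack)."""
    problems = []

    # Constraint: no rack holds more than one maps host.
    for (row, rack), count in Counter(placement.values()).items():
        if count > 1:
            problems.append(f"rack {row}{rack} holds {count} maps hosts")

    # Constraint: maps1005-1008 each sit in a different row.
    first_four_rows = [placement[f"maps100{n}"][0] for n in range(5, 9)]
    if len(set(first_four_rows)) != len(first_four_rows):
        problems.append("maps100[5-8] do not have one server per row")

    # Constraint: maps1009 and maps1010 are in two different rows.
    if placement["maps1009"][0] == placement["maps1010"][0]:
        problems.append("maps1009 and maps1010 share a row")

    return problems


if __name__ == "__main__":
    # Invented example placement (rows A-D, arbitrary rack numbers).
    example = {
        "maps1005": ("A", "3"),
        "maps1006": ("B", "5"),
        "maps1007": ("C", "2"),
        "maps1008": ("D", "4"),
        "maps1009": ("A", "6"),
        "maps1010": ("B", "7"),
    }
    print(check_racking(example) or "placement satisfies the proposal")
```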

Per host setup checklist

maps1005:

  • - receive in system on procurement task T257950 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

maps1006:

  • - receive in system on procurement task T257950 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

maps1007:

  • - receive in system on procurement task T257950 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

maps1008:

  • - receive in system on procurement task T257950 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

maps1009:

  • - receive in system on procurement task T257950 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

maps1010:

  • - receive in system on procurement task T257950 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH added a parent task: Unknown Object (Task). (Aug 12 2020, 5:29 PM)
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.

Change 631512 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding production dns for maps servers

https://gerrit.wikimedia.org/r/631512

Change 631512 merged by Cmjohnson:
[operations/dns@master] Adding production dns for maps servers

https://gerrit.wikimedia.org/r/631512

Cmjohnson added a subscriber: wiki_willy.

These are mostly ready to turn over, but the h/w RAID has not been set up. I am not sure which RAID configuration is needed. @wiki_willy, can you track down the service owner and update the task, please?

Hi @Cmjohnson - I think @RKemper might be the owner of these machines.

Chatted with Ryan on IRC: RAID 10 is needed. I will get that set up and ready for the initial install and puppet run.

Change 632511 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Add maps1005-1010 to site.pp and add mac addresses to dhcpd file

https://gerrit.wikimedia.org/r/632511

Change 632511 merged by Cmjohnson:
[operations/puppet@production] Add maps1005-1010 to site.pp and add mac addresses to dhcpd file

https://gerrit.wikimedia.org/r/632511

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

maps1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202010061825_cmjohnson_13741_maps1005_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

['maps1006.eqiad.wmnet', 'maps1007.eqiad.wmnet', 'maps1008.eqiad.wmnet', 'maps1009.eqiad.wmnet', 'maps1010.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202010061830_cmjohnson_31197.log.
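The two log paths above appear to follow a timestamp, user, number, and optional host layout. A small sketch for pulling those fields apart, assuming that layout (inferred only from the paths in this task) and assuming the timestamp is YYYYMMDDHHMM; the numeric field is called `run_id` here because its meaning is not stated in the task:

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Layout inferred from the paths in this task; not a documented format.
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[a-z0-9]+)_(?P<run_id>\d+)(?:_(?P<host>.+))?$"
)


def parse_log_path(path: str) -> dict:
    """Split a wmf-auto-reimage log path into its apparent components."""
    stem = PurePosixPath(path).stem  # file name without the '.log' suffix
    match = LOG_NAME.match(stem)
    if match is None:
        raise ValueError(f"unrecognised log name: {path}")
    host = match.group("host")
    return {
        "started": datetime.strptime(match.group("stamp"), "%Y%m%d%H%M"),
        "user": match.group("user"),
        "run_id": int(match.group("run_id")),
        # Dots in the hostname appear as underscores in the file name.
        "host": host.replace("_", ".") if host else None,
    }


if __name__ == "__main__":
    print(parse_log_path(
        "/var/log/wmf-auto-reimage/"
        "202010061825_cmjohnson_13741_maps1005_eqiad_wmnet.log"
    ))
```

Multi-host runs, like the second one above, carry no hostname in the file name, so `host` comes back as `None` for them.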

Completed auto-reimage of hosts:

['maps1005.eqiad.wmnet']

Of which those FAILED:

['maps1005.eqiad.wmnet']

Completed auto-reimage of hosts:

['maps1006.eqiad.wmnet', 'maps1007.eqiad.wmnet', 'maps1008.eqiad.wmnet', 'maps1009.eqiad.wmnet', 'maps1010.eqiad.wmnet']

Of which those FAILED:

['maps1006.eqiad.wmnet', 'maps1007.eqiad.wmnet', 'maps1008.eqiad.wmnet', 'maps1009.eqiad.wmnet', 'maps1010.eqiad.wmnet']
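Since the completion reports in this timeline share one shape, per-host results can be tallied with a few lines of code. This is a sketch under the assumption that every report looks like the ones pasted in this task (a header, a Python-style list, then either a FAILED list or the "ALL successful" line); `tally_report` is a hypothetical helper:

```python
import ast


def tally_report(report: str) -> dict[str, bool]:
    """Map each host in a completion report to True (ok) or False (failed)."""
    lines = [line.strip() for line in report.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].startswith("Completed auto-reimage"):
        raise ValueError("not a completion report")
    completed = ast.literal_eval(lines[1])  # the list of hosts that finished
    failed = []
    if len(lines) > 3 and lines[2].startswith("Of which those FAILED"):
        failed = ast.literal_eval(lines[3])
    return {host: host not in failed for host in completed}


if __name__ == "__main__":
    report = """
    Completed auto-reimage of hosts:

    ['maps1006.eqiad.wmnet', 'maps1007.eqiad.wmnet']

    Of which those FAILED:

    ['maps1006.eqiad.wmnet', 'maps1007.eqiad.wmnet']
    """
    print(tally_report(report))
```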

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

maps1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202010071403_cmjohnson_7589_maps1005_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['maps1005.eqiad.wmnet']

Of which those FAILED:

['maps1005.eqiad.wmnet']

@RobH, can you look into these, please? I can get them through the initial install, but then I get an error I haven't seen before.

IPMI Password:
14:03:20 | maps1005.eqiad.wmnet | Removed from Puppet
14:03:20 | maps1005.eqiad.wmnet | WARNING: Unable to remove from Debmonitor, got: 404
14:03:20 | maps1005.eqiad.wmnet | Set Boot Device to pxe
14:03:21 | maps1005.eqiad.wmnet | Power cycling
14:03:22 | maps1005.eqiad.wmnet | Chassis Power Control: Cycle
14:07:29 | maps1005.eqiad.wmnet | Still waiting for reboot after 5.0 minutes
14:07:29 | maps1005.eqiad.wmnet | Uptime checked
14:07:29 | maps1005.eqiad.wmnet | Host up (Debian installer)
14:12:02 | maps1005.eqiad.wmnet | Still waiting for reboot after 5.0 minutes
14:14:04 | maps1005.eqiad.wmnet | Uptime checked
14:14:04 | maps1005.eqiad.wmnet | Host up
14:14:04 | maps1005.eqiad.wmnet | Unable to run wmf-auto-reimage-host: Unable to find certificate fingerprint in:
sh: puppet: not found
14:14:04 | maps1005.eqiad.wmnet | REIMAGE END | retcode=2
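When comparing several failed runs, it can help to reduce a log like the one above to its events and final return code. A minimal sketch, assuming every event line has the `time | host | message` shape seen here and that the return code appears as `retcode=N`; `parse_reimage_log` is a hypothetical helper, not part of wmf-auto-reimage:

```python
import re

# 'HH:MM:SS | host | message', as seen in the log pasted above.
EVENT_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \| (\S+) \| (.*)$")


def parse_reimage_log(text: str):
    """Return (events, retcode); events are (time, host, message) tuples."""
    events = []
    retcode = None
    for raw in text.splitlines():
        match = EVENT_LINE.match(raw)
        if match is None:
            continue  # prompts and raw command output, e.g. 'sh: puppet: not found'
        time, host, message = match.groups()
        events.append((time, host, message))
        found = re.search(r"retcode=(\d+)", message)
        if found:
            retcode = int(found.group(1))
    return events, retcode


if __name__ == "__main__":
    sample = """IPMI Password:
14:03:21 | maps1005.eqiad.wmnet | Power cycling
14:14:04 | maps1005.eqiad.wmnet | Unable to run wmf-auto-reimage-host: Unable to find certificate fingerprint in:
sh: puppet: not found
14:14:04 | maps1005.eqiad.wmnet | REIMAGE END | retcode=2
"""
    events, retcode = parse_reimage_log(sample)
    print(len(events), "events, retcode", retcode)
```

Lines that do not match the event shape are skipped rather than attached to the previous event, which keeps the sketch short but drops the `sh: puppet: not found` detail.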

Can anything be done to help unblock this task? This capacity is needed for the cluster as it's quite short on resources.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

maps1006.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202010292022_robh_4060_maps1006_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['maps1006.eqiad.wmnet']

Of which those FAILED:

['maps1006.eqiad.wmnet']

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

maps1006.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202010292038_robh_19239_maps1006_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['maps1006.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

maps1009.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202010292105_robh_13925_maps1009_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['maps1009.eqiad.wmnet']

Of which those FAILED:

['maps1009.eqiad.wmnet']

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

maps1009.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202010292133_robh_6670_maps1009_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['maps1009.eqiad.wmnet']

Of which those FAILED:

['maps1009.eqiad.wmnet']

All but maps1009 are now staged and ready for their actual roles to be applied in puppet. I'm still working on maps1009.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

maps1009.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011061953_robh_28745_maps1009_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['maps1009.eqiad.wmnet']

and were ALL successful.

All hosts are installed and set to staged.

Change 644603 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] maps: remove no-longer-accurate insetup role

https://gerrit.wikimedia.org/r/644603

Change 644603 merged by Hnowlan:
[operations/puppet@production] maps: remove no-longer-accurate insetup role

https://gerrit.wikimedia.org/r/644603