
(Need By: TBD) rack/setup/install dumpsdata100[45]
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of dumpsdata100[45].
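The `[45]` in the hostname reads like a shell-glob character class: it matches a single `4` or `5`, so it names exactly two hosts. The sketch below illustrates that reading; `expand_char_class` is a hypothetical helper (not part of any Wikimedia tooling) that handles only one simple bracket class, and `fnmatch` from the Python standard library is used to confirm the match. Note that tools using ClusterShell-style node sets would instead read `[45]` as the number 45, where the equivalent would be written `dumpsdata100[4-5]`.

```python
from fnmatch import fnmatch

# Character-class pattern from the task title: [45] matches one '4' or one '5'.
PATTERN = "dumpsdata100[45]"


def expand_char_class(pattern: str) -> list[str]:
    """Expand a pattern containing exactly one [...] character class.

    Hypothetical helper for illustration only: it does not handle ranges
    such as [4-5], negation such as [!4], or more than one class.
    """
    start = pattern.index("[")
    end = pattern.index("]", start)
    prefix, chars, suffix = pattern[:start], pattern[start + 1:end], pattern[end + 1:]
    return [prefix + c + suffix for c in chars]


hosts = expand_char_class(PATTERN)
print(hosts)  # ['dumpsdata1004', 'dumpsdata1005']

# fnmatch confirms each expanded name matches the glob, and a neighbour does not.
assert all(fnmatch(h, PATTERN) for h in hosts)
assert not fnmatch("dumpsdata1003", PATTERN)
```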

Hostname / Racking / Installation Details

Hostnames: dumpsdata100[45]
Racking Proposal: Each in its own rack, in separate racks from the other three dumpsdata hosts
Networking/Subnet/VLAN/IP: Internal vlan, single 10G connection.
Partitioning/Raid: Hardware RAID, with one 12-disk RAID 10 volume for data and two disks in RAID 1 for the OS. The partman recipe should be dumpsdata100X.cfg for the initial install, and should be reset afterwards to dumpsdata100X-no-data-format.cfg once there is data on the arrays.
OS Distro: Buster
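The storage plan above can be sketched in two parts: how much usable space each RAID level leaves, and which partman recipe applies at each stage. The task does not state disk sizes, so the 4 TB figure in the usage lines is purely hypothetical, and the helper names are illustrative. Only the two recipe filenames come from the task itself.

```python
# Hypothetical helpers sketching the dumpsdata100[45] storage plan.
# Disk sizes are NOT given in the task; any value passed in is illustrative.


def raid10_usable(disks: int, disk_tb: float) -> float:
    """RAID 10 mirrors pairs of disks and stripes across the pairs,
    so usable capacity is half of the raw total."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disks // 2 * disk_tb


def raid1_usable(disks: int, disk_tb: float) -> float:
    """RAID 1 mirrors every disk, so usable capacity is one disk."""
    if disks < 2:
        raise ValueError("RAID 1 needs at least 2 disks")
    return disk_tb


def partman_recipe(array_has_data: bool) -> str:
    """Recipe choice described in the task: format the arrays on the
    initial install, then switch to the recipe that leaves data alone."""
    if array_has_data:
        return "dumpsdata100X-no-data-format.cfg"
    return "dumpsdata100X.cfg"


# Usage with a hypothetical 4 TB disk size:
print(raid10_usable(12, 4.0))  # 24.0 (12 disks -> 6 mirrored pairs)
print(raid1_usable(2, 4.0))    # 4.0
print(partman_recipe(False))   # dumpsdata100X.cfg
print(partman_recipe(True))    # dumpsdata100X-no-data-format.cfg
```

The point of the second recipe is that a later reimage (for example, after an OS upgrade) must not wipe dump data that already lives on the RAID 10 volume.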

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

dumpsdata1004:

  • receive in system on procurement task T280149 & in coupa
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • bios/drac/serial setup/testing
  • add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • network port setup via netbox, run homer to commit
  • firmware update (idrac, bios, network, raid controller)
  • operations/puppet update: this should include updates to install_server dhcp and netboot, and site.pp with role(insetup) (cp systems use role(insetup::nofirm)).
  • OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • host state in netbox set to staged

dumpsdata1005:

  • receive in system on procurement task T280149 & in coupa
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • bios/drac/serial setup/testing
  • add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • network port setup via netbox, run homer to commit
  • firmware update (idrac, bios, network, raid controller)
  • operations/puppet update: this should include updates to install_server dhcp and netboot, and site.pp with role(insetup) (cp systems use role(insetup::nofirm)).
  • OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.
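The resolution rule is simple: every host must have every checklist step done. A minimal sketch of that rule follows, assuming a hypothetical `HostChecklist` type; the step names are abbreviated from the checklists above, and nothing here reflects how Phabricator actually stores checkboxes.

```python
from dataclasses import dataclass, field

# Step names abbreviated from the per-host checklist in this task.
STEPS = [
    "receive in system (T280149, coupa)",
    "rack system and update netbox (state: planned)",
    "bios/drac/serial setup/testing",
    "mgmt and production dns in netbox, run sre.dns.netbox",
    "network port setup via netbox, run homer",
    "firmware update (idrac, bios, network, raid controller)",
    "operations/puppet update (dhcp, netboot, site.pp)",
    "OS installation and initial puppet run",
    "netbox state set to staged",
]


@dataclass
class HostChecklist:
    """Hypothetical per-host checklist tracker."""
    hostname: str
    done: set[str] = field(default_factory=set)

    def complete(self, step: str) -> None:
        if step not in STEPS:
            raise KeyError(f"unknown step: {step}")
        self.done.add(step)

    def pending(self) -> list[str]:
        """Steps not yet done, in checklist order."""
        return [s for s in STEPS if s not in self.done]


def task_resolvable(hosts: list[HostChecklist]) -> bool:
    """The task can be resolved only when no host has pending steps."""
    return all(not h.pending() for h in hosts)


hosts = [HostChecklist("dumpsdata1004"), HostChecklist("dumpsdata1005")]
for step in STEPS:
    hosts[0].complete(step)
print(task_resolvable(hosts))  # False: dumpsdata1005 still has pending steps
for step in STEPS:
    hosts[1].complete(step)
print(task_resolvable(hosts))  # True
```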


Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH mentioned this in Unknown Object (Task).
RobH unsubscribed.
RobH added a subscriber: wiki_willy.

Please note the original ask for networking was:

Networking/Subnet/VLAN/IP: Internal vlan, 10G for one host and 1G (for now) for the other. If the 10G connected host fails we'll need to be able to switch to 10G on the other.

I have changed this: moving a host from 1G to 10G is not a practical failover plan, so this request requires both systems to be on 10G.

Networking/Subnet/VLAN/IP: Internal vlan, single 10G connection.

I'm pinging @wiki_willy so he is aware that I've upgraded this from what @ArielGlenn originally requested. If this needs to shift back to one host on 10G and one on 1G, failing over from the primary host would mean shuffling hosts between racks, which is a lot of onsite DC overhead.
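The argument above reduces to one comparison: a standby can take over in place only if its link is at least as fast as the primary's. The sketch below is a deliberate simplification under that assumption; the function name is hypothetical, and real failover also depends on switch capacity and the dump workload, which the task does not quantify.

```python
def can_fail_over_in_place(primary_gbps: int, standby_gbps: int) -> bool:
    """Hypothetical check: True if the standby's link can carry what the
    primary's link carried, so no recabling or rack move is needed."""
    return standby_gbps >= primary_gbps


# Original ask: one host on 10G, the other on 1G "for now".
print(can_fail_over_in_place(10, 1))   # False: needs onsite work at failover
# Revised ask: both hosts on a single 10G connection.
print(can_fail_over_in_place(10, 10))  # True
```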

dumpsdata1004: rack A2, U11, id #11002, port 31
dumpsdata1005: rack C2, U12, id #11003, port 35
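The racking proposal asks for each host in its own rack. A small sketch that records the placements above and checks that requirement follows. Reading `id#` as an asset ID and `port` as a switch port number is an assumption, as is the `Placement` type itself. The racks of the other three dumpsdata hosts are not given here, so the sketch cannot check separation from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of one racked host."""
    host: str
    rack: str
    unit: int
    asset_id: int     # assumed meaning of "id#" in the racking comment
    switch_port: int  # assumed meaning of "port" in the racking comment


# Values as recorded in the racking comment.
PLACEMENTS = [
    Placement("dumpsdata1004", "A2", 11, 11002, 31),
    Placement("dumpsdata1005", "C2", 12, 11003, 35),
]


def racks_are_distinct(placements: list[Placement]) -> bool:
    """True if no two hosts share a rack."""
    racks = [p.rack for p in placements]
    return len(racks) == len(set(racks))


print(racks_are_distinct(PLACEMENTS))  # True
```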

These need iDRAC setup and should be completed by early next week (week of 9 Aug).

Both iDRACs are set up and accessible. They still need firmware updates and the non-data-center-specific work.

Change 711182 had a related patch set uploaded (by Cmjohnson; author: Cmjohnson):

[operations/puppet@production] setup dumpsdata1004-5, dhpd, site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/711182

Change 711182 merged by Cmjohnson:

[operations/puppet@production] setup dumpsdata1004-5, dhpd, site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/711182

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

dumpsdata1004.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202108101810_cmjohnson_11269_dumpsdata1004_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

dumpsdata1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202108101823_cmjohnson_12221_dumpsdata1005_eqiad_wmnet.log.
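The log filenames above appear to follow the pattern `<YYYYMMDDHHMM>_<user>_<number>_<fqdn with dots as underscores>.log`. The sketch below parses them on that assumption, which is inferred only from the four filenames in this task. Treating the number as a process ID is a guess, the timestamp's time zone is not stated, and `ReimageLog`/`parse_log_name` are hypothetical names. This makes it easy, for example, to see that the second pair of runs started at 18:48, after the partman recipe change.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # time zone not stated in the task
    user: str
    run_id: int        # assumed to be a process ID; not confirmed
    fqdn: str


def parse_log_name(path: str) -> ReimageLog:
    """Split an auto-reimage log filename into its apparent fields.

    Assumes the user name contains no underscores; everything after the
    third underscore is taken as the host, with '_' mapped back to '.'.
    """
    stem = PurePosixPath(path).stem  # drop the directory and ".log"
    stamp, user, run_id, host = stem.split("_", 3)
    return ReimageLog(
        started=datetime.strptime(stamp, "%Y%m%d%H%M"),
        user=user,
        run_id=int(run_id),
        fqdn=host.replace("_", "."),
    )


log = parse_log_name(
    "/var/log/wmf-auto-reimage/202108101810_cmjohnson_11269_dumpsdata1004_eqiad_wmnet.log"
)
print(log.started, log.user, log.fqdn)
# 2021-08-10 18:10:00 cmjohnson dumpsdata1004.eqiad.wmnet
```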

Change 711185 had a related patch set uploaded (by Cmjohnson; author: Cmjohnson):

[operations/puppet@production] update partman receipe used for dumpsdtat1004 and 1005 to partman/custom/dumpsdata100X.cfg

https://gerrit.wikimedia.org/r/711185

Change 711185 merged by Cmjohnson:

[operations/puppet@production] update partman receipe used for dumpsdtat1004/1005 to dumpsdata100X.cfg

https://gerrit.wikimedia.org/r/711185

Completed auto-reimage of hosts:

['dumpsdata1004.eqiad.wmnet']

Of which those FAILED:

['dumpsdata1004.eqiad.wmnet']

Completed auto-reimage of hosts:

['dumpsdata1005.eqiad.wmnet']

Of which those FAILED:

['dumpsdata1005.eqiad.wmnet']

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

dumpsdata1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202108101848_cmjohnson_15934_dumpsdata1005_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

dumpsdata1004.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202108101848_cmjohnson_15916_dumpsdata1004_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['dumpsdata1004.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['dumpsdata1005.eqiad.wmnet']

and were ALL successful.

Cmjohnson updated the task description.

All tasks completed.