
(Need by: TBD) rack/setup/install wdqs101[123].eqiad.wmnet
Closed, Resolved · Public

Description

This task will track the racking, setup, and OS installation of wdqs101[123] ordered via T227755.
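The bracket shorthand in the task title, wdqs101[123].eqiad.wmnet, stands for three individual hostnames. A minimal sketch of that expansion follows. The helper name expand_hosts is hypothetical, and it handles only a single bracket group of literal characters or simple ranges; it is not a full host-range parser.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracket group, e.g. 'wdqs101[123]' or 'wdqs101[1-3]'.

    Hypothetical helper for illustration: only a single group of literal
    characters or simple 'a-b' ranges is supported.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        return [pattern]  # no bracket group: treat as one literal hostname
    prefix, body, suffix = match.groups()
    chars: list[str] = []
    i = 0
    while i < len(body):
        # 'a-b' is a range; any other character is taken literally.
        if i + 2 < len(body) and body[i + 1] == "-":
            start, end = ord(body[i]), ord(body[i + 2])
            chars.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return [f"{prefix}{c}{suffix}" for c in chars]


hosts = expand_hosts("wdqs101[123].eqiad.wmnet")
```

Because the bracket replaces only the final digit, the prefix must be wdqs101 for the result to be wdqs1011, wdqs1012 and wdqs1013.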

Hostname / Racking / Installation Details

Hostnames: wdqs1011, wdqs1012, wdqs1013
Racking Proposal:

WDQS is composed of two independent clusters; nodes in each cluster should be spread across rows as much as possible.

Current configuration:

  • public cluster:
    • wdqs1004: A6
    • wdqs1005: D3
    • wdqs1006: A1
  • private cluster:
    • wdqs1003: A4 (to be decommissioned)
    • wdqs1007: B1
    • wdqs1008: D1

Proposed racking for new servers:

  • wdqs1011: public cluster, row B or C
  • wdqs1012: private cluster, row A
  • wdqs1013: private cluster, row C
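The row-spreading goal can be checked with a short sketch. The rows below are taken from the current configuration and the proposal above, with two assumptions: wdqs1003 is left out because it is to be decommissioned, and wdqs1011 is placed in row B (the proposal allows B or C). The names CLUSTERS, rows_per_cluster and shared_rows are hypothetical.

```python
from collections import Counter

# Row letter per host, from the description above.
# Assumptions: wdqs1003 omitted (to be decommissioned); wdqs1011 put in row B.
CLUSTERS = {
    "public": {"wdqs1004": "A", "wdqs1005": "D", "wdqs1006": "A", "wdqs1011": "B"},
    "private": {"wdqs1007": "B", "wdqs1008": "D", "wdqs1012": "A", "wdqs1013": "C"},
}


def rows_per_cluster(clusters):
    """Count how many nodes of each cluster sit in each row."""
    return {name: Counter(hosts.values()) for name, hosts in clusters.items()}


def shared_rows(clusters):
    """Return, per cluster, the rows that hold more than one node."""
    return {
        name: sorted(row for row, count in counts.items() if count > 1)
        for name, counts in rows_per_cluster(clusters).items()
    }


report = shared_rows(CLUSTERS)
```

Under these assumptions the private cluster ends up with one node per row, while the public cluster still has two existing nodes (wdqs1004 and wdqs1006) in row A.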

Networking/Subnet/IP: 1G NIC is good enough, same subnet as current servers (private eqiad)
Partitioning/Raid: RAID10: raid10-gpt-srv-lvm-ext4-8disks.cfg

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

wdqs1011:

  • - receive in system on procurement task T227755
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

wdqs1012:

  • - receive in system on procurement task T227755
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

wdqs1013:

  • - receive in system on procurement task T227755
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH created this task. Feb 27 2020, 3:45 PM
Restricted Application added a project: Operations. Feb 27 2020, 3:45 PM
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board. Feb 27 2020, 3:45 PM
RobH updated the task description.
RobH added a parent task: Unknown Object (Task).
RobH renamed this task from (Due by: TBD) rack/setup/install wdqs101[123].eqiad.wmnet to (Need by: TBD) rack/setup/install wdqs101[123].eqiad.wmnet. Feb 27 2020, 6:20 PM
Jclark-ctr updated the task description. Mar 2 2020, 11:08 PM
RobH removed a subscriber: RobH. Mar 2 2020, 11:47 PM
Jclark-ctr closed this task as Resolved. Mar 2 2020, 11:49 PM
Jclark-ctr reassigned this task from Jclark-ctr to Cmjohnson.
Jclark-ctr added a subscriber: Jclark-ctr.

Hostname, Rack, Switch port

wdqs1011: B5, 34
wdqs1012: A5, 19
wdqs1013: C5, 39

Jclark-ctr updated the task description. Mar 2 2020, 11:50 PM
Gehel reopened this task as Open. Mar 5 2020, 7:03 PM
Gehel added a subscriber: Gehel.

Re-opening since it appears (from the checklist above and from the status in netbox) that this isn't completed yet.

These were racked out of order; fixing the physical locations and corresponding port numbers:

wdqs1011: A5, 19
wdqs1012: B5, 34
wdqs1013: C5, 39
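To see which hosts the correction affects, the two rack and switch-port listings in this task can be parsed and compared. This is a minimal sketch; the names parse_assignments and changed are hypothetical, and the pattern assumes a rack label of one capital letter followed by digits, with any spacing around the separators.

```python
import re

# host: rack, port   (spacing around ':' and ',' may vary)
LINE = re.compile(
    r"^\s*(?P<host>[\w.-]+)\s*:\s*(?P<rack>[A-Z]\d+)\s*,\s*(?P<port>\d+)\s*$"
)


def parse_assignments(text):
    """Map hostname -> (rack, switch port) for every line that matches."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            result[m["host"]] = (m["rack"], int(m["port"]))
    return result


def changed(before, after):
    """Return hosts whose (rack, port) differs between the two listings."""
    common = sorted(before.keys() & after.keys())
    return {h: (before[h], after[h]) for h in common if before[h] != after[h]}


first_report = parse_assignments("wdqs1011: B5 ,34\nwdqs1012: A5, 19\nwdqs1013:C5 , 39")
corrected = parse_assignments("wdqs1011: A5, 19\nwdqs1012: B5 ,34\nwdqs1013:C5 , 39")
diff = changed(first_report, corrected)
```

With the values from this task, diff contains wdqs1011 and wdqs1012, which swap rack and port; wdqs1013 is unchanged.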

Cmjohnson updated the task description. Mar 6 2020, 8:05 PM

Change 577692 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns wdqs10[1-3]

https://gerrit.wikimedia.org/r/577692

Change 577692 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns wdqs10[1-3]

https://gerrit.wikimedia.org/r/577692

Cmjohnson updated the task description. Mar 10 2020, 11:51 AM
Cmjohnson updated the task description. Mar 10 2020, 11:58 AM

Change 578509 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Add production dns for wdqs10[123]

https://gerrit.wikimedia.org/r/578509

Change 578509 merged by Cmjohnson:
[operations/dns@master] Add production dns for wdqs10[123]

https://gerrit.wikimedia.org/r/578509

Change 578514 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] updating dhcpd file with new wdqs10[123]

https://gerrit.wikimedia.org/r/578514

Cmjohnson updated the task description. Mar 10 2020, 12:29 PM

Change 578515 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding wdqs101[1-3] to site.pp role:spare

https://gerrit.wikimedia.org/r/578515

Change 578514 merged by Cmjohnson:
[operations/puppet@production] updating dhcpd file with new wdqs10[123]

https://gerrit.wikimedia.org/r/578514

Change 578515 merged by Cmjohnson:
[operations/puppet@production] Adding wdqs101[1-3] to site.pp role:spare

https://gerrit.wikimedia.org/r/578515

Cmjohnson updated the task description. Mar 10 2020, 12:37 PM

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

wdqs1011.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003101300_cmjohnson_212342_wdqs1011_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

wdqs1012.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003101301_cmjohnson_212505_wdqs1012_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

wdqs1013.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003101302_cmjohnson_212618_wdqs1013_eqiad_wmnet.log.
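The three log paths above appear to share one layout, which makes them easy to split into fields when cross-checking a run. The sketch below is based only on those observed paths and is not documented behaviour of wmf-auto-reimage: it assumes a 12-digit YYYYMMDDHHMM timestamp, a username without underscores, a numeric identifier (called run_id here, its meaning is not stated in this task), and the hostname with dots written as underscores. The function name parse_log_path is hypothetical.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed layout, inferred from the paths in this task:
#   <YYYYMMDDHHMM>_<user>_<number>_<host with '.' replaced by '_'>.log
NAME = re.compile(r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<run_id>\d+)_(?P<host>.+)$")


def parse_log_path(path):
    """Split a reimage log path into its assumed components."""
    stem = PurePosixPath(path).stem  # file name without the '.log' suffix
    m = NAME.match(stem)
    if m is None:
        raise ValueError(f"unexpected log file name: {path}")
    return {
        "started": datetime.strptime(m["stamp"], "%Y%m%d%H%M"),
        "user": m["user"],
        "run_id": int(m["run_id"]),
        "host": m["host"].replace("_", "."),
    }


info = parse_log_path(
    "/var/log/wmf-auto-reimage/202003101300_cmjohnson_212342_wdqs1011_eqiad_wmnet.log"
)
```

A username or hostname label that itself contains an underscore would be split incorrectly, so treat the result as a convenience rather than a reliable parse.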

Completed auto-reimage of hosts:

['wdqs1012.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['wdqs1013.eqiad.wmnet']

and were ALL successful.

Cmjohnson closed this task as Resolved. Mar 10 2020, 1:27 PM
Cmjohnson updated the task description.

@Gehel I am resolving this task; if there are any issues, please re-open and ping me.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

stat1008.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003101907_cmjohnson_17764_stat1008_eqiad_wmnet.log.

Change 578996 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] use new role(insetup) on a few hosts in setup

https://gerrit.wikimedia.org/r/578996

Change 578996 merged by Dzahn:
[operations/puppet@production] use new role(insetup) on a few hosts in setup

https://gerrit.wikimedia.org/r/578996