
rack/setup/install cp30[50-65].esams.wmnet
Closed, Resolved (Public)

Description

This task will track the racking/setup/installation of 16 new cp nodes in esams, ordered via T230619.

Common Specifications

Hostnames: cp3050+
Racking Proposal: These will be racked according to the layout detailed on this google sheet.
Networking/Subnet/VLAN/IP: 10G single connection, internal subnet.
Partitioning/Raid: These will have dual SSDs for the OS and a single NVMe SSD for caching. This will likely require writing a new partman recipe to support this layout.
Initial Puppet Role: Traffic will need to advise if we use role::spare or a specific role for these for initial installation.
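
For reference, the bracket range in the task title, cp30[50-65].esams.wmnet, can be expanded into the 16 individual hostnames. The sketch below is only an illustration; `expand_range` is a hypothetical helper and not part of any tooling mentioned in this task.

```python
import re


def expand_range(pattern: str) -> list[str]:
    """Expand one numeric [start-end] range inside a hostname pattern."""
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        return [pattern]  # nothing to expand
    start, end = match.group(1), match.group(2)
    width = len(start)  # keep zero padding if the range uses it
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        f"{prefix}{str(n).zfill(width)}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


hosts = expand_range("cp30[50-65].esams.wmnet")
print(len(hosts), hosts[0], hosts[-1])
# prints: 16 cp3050.esams.wmnet cp3065.esams.wmnet
```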

Individual Server Checklists

cp3050

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3051

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3052

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3053

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3054

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3055

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3056

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3057

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3058

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3059

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3061

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3062

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3060

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3063

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3064

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

cp3065

  • - receive in system on procurement task T230619
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - disable embedded NIC
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare or perhaps with actual role, up to traffic to comment above)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox
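
Each checklist above moves a host through three netbox states: planned (at racking), staged (after the initial puppet run), and active (set by the service implementer). A minimal sketch of that ordering follows. The `HostState` enum and `advance` function are hypothetical names used only for illustration; they are not the netbox API.

```python
from enum import Enum


class HostState(Enum):
    PLANNED = "planned"  # racked and recorded in netbox
    STAGED = "staged"    # OS installed, initial puppet run done
    ACTIVE = "active"    # set by the service implementer


# Allowed forward transitions, in the order the checklist lists them.
NEXT_STATE = {
    HostState.PLANNED: HostState.STAGED,
    HostState.STAGED: HostState.ACTIVE,
}


def advance(state: HostState) -> HostState:
    """Return the next state, or raise if the host is already active."""
    if state not in NEXT_STATE:
        raise ValueError(f"no transition defined from {state.value!r}")
    return NEXT_STATE[state]


state = HostState.PLANNED
state = advance(state)  # planned -> staged
state = advance(state)  # staged -> active
print(state.value)      # prints: active
```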

Related Objects

Status    Subtype  Assigned    Task
Resolved           wiki_willy
Resolved           Papaul

Event Timeline

There are a very large number of changes, so older changes are hidden.

@BBlack here is the information for the CP servers in rack 15

cp3055: xe-5/0/15
cp3056: xe-5/0/16
cp3057: xe-5/0/17
cp3058: xe-5/0/18
cp3059: xe-5/0/19
cp3060: xe-5/0/20
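
The rack 15 port list above can be turned into structured data before it is used for switch port descriptions. The sketch below assumes the usual Junos interface naming of `type-fpc/pic/port`; `parse_ports` is a hypothetical helper, and the input text is copied from the comment above.

```python
import re

# Port assignments for rack 15, as posted in the comment above.
RACK15 = """
cp3055 : xe-5/0/15
cp3056: xe-5/0/16
cp3057: xe-5/0/17
cp3058: xe-5/0/18
cp3059: xe-5/0/19
cp3060: xe-5/0/20
"""

# Tolerates optional whitespace around the colon.
LINE = re.compile(
    r"^(?P<host>cp\d+)\s*:\s*xe-(?P<fpc>\d+)/(?P<pic>\d+)/(?P<port>\d+)$"
)


def parse_ports(text: str) -> dict[str, tuple[int, int, int]]:
    """Map each hostname to its (fpc, pic, port) tuple."""
    result = {}
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        result[match["host"]] = (
            int(match["fpc"]), int(match["pic"]), int(match["port"])
        )
    return result


ports = parse_ports(RACK15)
for host, (fpc, pic, port) in ports.items():
    print(f"{host}: xe-{fpc}/{pic}/{port}")
```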

Change 545658 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Basic install for new esams hosts

https://gerrit.wikimedia.org/r/545658

Change 545662 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] basic DNS entries for new esams hosts

https://gerrit.wikimedia.org/r/545662

Change 545662 merged by BBlack:
[operations/dns@master] basic DNS entries for new esams hosts

https://gerrit.wikimedia.org/r/545662

Change 545691 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] cp30[56][0-9]: add hiera/conftool data

https://gerrit.wikimedia.org/r/545691
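
As a quick check of the `cp30[56][0-9]` pattern in the change title above: read as a regular expression (an assumption about how it is meant), it covers all 16 hosts in this task, and would also match cp3066 through cp3069, which are not part of this batch.

```python
import re

# Assumption: the pattern from the change title is a regular expression.
pattern = re.compile(r"cp30[56][0-9]")

# The 16 hosts tracked by this task: cp3050 .. cp3065.
batch = [f"cp30{n}" for n in range(50, 66)]
all_covered = all(pattern.fullmatch(host) for host in batch)

# Names the pattern matches that are outside this batch.
candidates = [f"cp30{n}" for n in range(50, 70)]
matched_extra = [
    host for host in candidates
    if host not in batch and pattern.fullmatch(host)
]

print(all_covered)    # prints: True
print(matched_extra)  # prints: ['cp3066', 'cp3067', 'cp3068', 'cp3069']
```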

Change 545693 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] acme_chief: Grant new esams cp hosts access to the unified certificate

https://gerrit.wikimedia.org/r/545693

Change 545693 merged by Vgutierrez:
[operations/puppet@production] acme_chief: Grant new esams cp hosts access to the unified certificate

https://gerrit.wikimedia.org/r/545693

Change 545658 merged by BBlack:
[operations/puppet@production] Basic install for new esams hosts

https://gerrit.wikimedia.org/r/545658

Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts:

['cp3055.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910240419_vgutierrez_126191.log.
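
The reimage log names in this timeline appear to follow a `<timestamp>_<user>_<number>.log` layout. The sketch below splits a path on that assumption; reading the first field as `YYYYMMDDHHMM` is a guess based on the dates in this task, and the meaning of the trailing number is not stated here, so it is returned as an opaque string. `parse_reimage_log` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_reimage_log(path: str) -> tuple[datetime, str, str]:
    """Split a log path into (timestamp, user, trailing number)."""
    stem = PurePosixPath(path).stem          # drop directory and ".log"
    stamp, rest = stem.split("_", 1)         # timestamp is the first field
    user, suffix = rest.rsplit("_", 1)       # number is the last field
    # Assumption: the timestamp is YYYYMMDDHHMM (timezone unknown).
    return datetime.strptime(stamp, "%Y%m%d%H%M"), user, suffix


when, user, suffix = parse_reimage_log(
    "/var/log/wmf-auto-reimage/201910240419_vgutierrez_126191.log"
)
print(when.isoformat(), user, suffix)
# prints: 2019-10-24T04:19:00 vgutierrez 126191
```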

Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts:

['cp3056.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910240445_vgutierrez_131290.log.

Change 545701 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] install_server: Fix MAC addresses for new esams boxes

https://gerrit.wikimedia.org/r/545701

Change 545701 merged by Vgutierrez:
[operations/puppet@production] install_server: Fix MAC addresses for new esams boxes

https://gerrit.wikimedia.org/r/545701

Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts:

['cp3055.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910240704_vgutierrez_158045.log.

Completed auto-reimage of hosts:

['cp3055.esams.wmnet']

Of which those FAILED:

['cp3055.esams.wmnet']

Change 545706 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] hiera: Provide storage configuration for ats-backend on cp3055

https://gerrit.wikimedia.org/r/545706

Change 545706 merged by Vgutierrez:
[operations/puppet@production] hiera: Provide storage configuration for ats-backend on cp3055

https://gerrit.wikimedia.org/r/545706

Change 545711 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] hiera: Provide ats storage config for new esams upload hosts

https://gerrit.wikimedia.org/r/545711

Change 545712 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] hiera: Provide varnish storage config for new cp text hosts

https://gerrit.wikimedia.org/r/545712

Change 545711 merged by Vgutierrez:
[operations/puppet@production] hiera: Provide ats storage config for new esams upload hosts

https://gerrit.wikimedia.org/r/545711

Change 545712 merged by Vgutierrez:
[operations/puppet@production] hiera: Provide varnish storage config for new cp text hosts

https://gerrit.wikimedia.org/r/545712

Change 545691 merged by Vgutierrez:
[operations/puppet@production] cp30[56][0-9]: add hiera/conftool data

https://gerrit.wikimedia.org/r/545691

Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts:

['cp3055.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910240912_vgutierrez_187246.log.

Completed auto-reimage of hosts:

['cp3055.esams.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3060.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241036_ema_206514.log.

Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts:

['cp3057.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241041_vgutierrez_207264.log.

Completed auto-reimage of hosts:

['cp3060.esams.wmnet']

Of which those FAILED:

['cp3060.esams.wmnet']

Change 545814 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cp3060: add to cache::nodes

https://gerrit.wikimedia.org/r/545814

Change 545814 merged by Ema:
[operations/puppet@production] cp3060: add to cache::nodes

https://gerrit.wikimedia.org/r/545814

Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts:

['cp3059.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241109_vgutierrez_214246.log.

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3060.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241109_ema_214315.log.

Completed auto-reimage of hosts:

['cp3060.esams.wmnet']

Of which those FAILED:

['cp3060.esams.wmnet']

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3060.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241110_ema_214404.log.

Completed auto-reimage of hosts:

['cp3057.esams.wmnet']

and were ALL successful.

Change 545819 had a related patch set uploaded (by Vgutierrez; owner: Vgutierrez):
[operations/puppet@production] hiera: Add cp305[5,7,9] to cache::nodes

https://gerrit.wikimedia.org/r/545819

Completed auto-reimage of hosts:

['cp3059.esams.wmnet']

and were ALL successful.

Change 545820 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] esams cp nodes: size storage correctly

https://gerrit.wikimedia.org/r/545820

Change 545819 merged by Vgutierrez:
[operations/puppet@production] hiera: Add cp305[5,7,9] to cache::nodes

https://gerrit.wikimedia.org/r/545819

Change 545820 merged by BBlack:
[operations/puppet@production] esams cp nodes: size storage correctly

https://gerrit.wikimedia.org/r/545820

Mentioned in SAL (#wikimedia-operations) [2019-10-24T12:25:20Z] <ema> cp3060: powercycle -- NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [charon:1226] T233242

Completed auto-reimage of hosts:

['cp3060.esams.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-10-24T13:17:50Z] <ema> set ats-be weights on new esams upload nodes T233242

Change 545852 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cp3056: add to cache::nodes

https://gerrit.wikimedia.org/r/545852

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3056.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241350_ema_246247.log.

Change 545852 merged by Ema:
[operations/puppet@production] cp3056: add to cache::nodes

https://gerrit.wikimedia.org/r/545852

Change 545857 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] Add new esams cp hosts to cache::nodes

https://gerrit.wikimedia.org/r/545857

Change 545857 merged by Ema:
[operations/puppet@production] Add new esams cp hosts to cache::nodes

https://gerrit.wikimedia.org/r/545857

Mentioned in SAL (#wikimedia-operations) [2019-10-24T14:16:31Z] <ema> power-cycle cp3056, stuck rebooting into d-i T233242

Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts:

['cp3058.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241418_ema_251786.log.

@BBlack here is the information for the CP servers in rack 16

cp3061: xe-6/0/15
cp3062: xe-6/0/16
cp3063: xe-6/0/17
cp3064: xe-6/0/18
cp3065: xe-6/0/19

Completed auto-reimage of hosts:

['cp3058.esams.wmnet']

and were ALL successful.

Change 545880 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] esams: mgmt dns for rack 16

https://gerrit.wikimedia.org/r/545880

Change 545880 merged by BBlack:
[operations/dns@master] esams: mgmt dns for rack 16

https://gerrit.wikimedia.org/r/545880

Change 545893 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Add dhcp macaddrs for esams rack 16 hosts

https://gerrit.wikimedia.org/r/545893

Change 545893 merged by BBlack:
[operations/puppet@production] Add dhcp macaddrs for esams rack 16 hosts

https://gerrit.wikimedia.org/r/545893

Change 545895 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/dns@master] esams mgmt dns for rack 14

https://gerrit.wikimedia.org/r/545895

Change 545895 merged by BBlack:
[operations/dns@master] esams mgmt dns for rack 14

https://gerrit.wikimedia.org/r/545895

Change 545908 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] esams: macaddrs for all new rack 14 hosts

https://gerrit.wikimedia.org/r/545908

Change 545908 merged by BBlack:
[operations/puppet@production] esams: macaddrs for all new rack 14 hosts

https://gerrit.wikimedia.org/r/545908

Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts:

['cp3061.esams.wmnet', 'cp3062.esams.wmnet', 'cp3063.esams.wmnet', 'cp3064.esams.wmnet', 'cp3065.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241835_bblack_44406.log.

Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts:

['cp3050.esams.wmnet', 'cp3051.esams.wmnet', 'cp3052.esams.wmnet', 'cp3053.esams.wmnet', 'cp3054.esams.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201910241835_bblack_44412.log.

Completed auto-reimage of hosts:

['cp3062.esams.wmnet', 'cp3063.esams.wmnet', 'cp3061.esams.wmnet', 'cp3064.esams.wmnet', 'cp3065.esams.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['cp3053.esams.wmnet', 'cp3050.esams.wmnet', 'cp3051.esams.wmnet', 'cp3052.esams.wmnet', 'cp3054.esams.wmnet']

and were ALL successful.
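
The completion messages in this timeline come in two shapes: a host list followed by "and were ALL successful.", or a host list followed by "Of which those FAILED:" and a second list. A minimal sketch for turning either shape into a per-host result is below; `parse_completion` is a hypothetical helper that assumes the message text is exactly as shown in this task.

```python
import ast


def parse_completion(message: str) -> dict[str, bool]:
    """Map each host in a completion message to True (ok) or False (failed)."""
    # Host lists are printed as Python list literals, one per line.
    lists = [
        ast.literal_eval(line.strip())
        for line in message.splitlines()
        if line.strip().startswith("[")
    ]
    if not lists:
        raise ValueError("no host list found in message")
    hosts = lists[0]
    # A second list, when present, names the hosts that failed.
    failed = set(lists[1]) if len(lists) > 1 else set()
    return {host: host not in failed for host in hosts}


ok_msg = """Completed auto-reimage of hosts:

['cp3057.esams.wmnet']

and were ALL successful."""

failed_msg = """Completed auto-reimage of hosts:

['cp3060.esams.wmnet']

Of which those FAILED:

['cp3060.esams.wmnet']"""

print(parse_completion(ok_msg))      # prints: {'cp3057.esams.wmnet': True}
print(parse_completion(failed_msg))  # prints: {'cp3060.esams.wmnet': False}
```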

BBlack updated the task description.

As a batch, these servers are complete. Note that cp3056 had an early hardware issue that prevented progress; this is tracked separately in T236497.