
(Need By: TBD) rack/setup/install rdb20[09|10]
Closed, Resolved · Public

Description

This task will track the racking, setup, and OS installation of rdb2009 & rdb2010. These hosts will replace rdb200[56].

Hostname / Racking / Installation Details

Hostnames: rdb2009, rdb2010
Racking Proposal: rdb2009: C3, rdb2010: D5
Networking/Subnet/VLAN/IP: rdb2009: 10.192.32.0/24, rdb2010: 10.192.48.0/24
Partitioning/RAID: Is this hardware or software RAID and what RAID levels should be applied to each disk? What are the partitioning requirements and is there an existing partman recipe?
OS Distro: Stretch
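The subnet allocations above can be sanity-checked with Python's standard `ipaddress` module. This is a minimal sketch that assumes the last four groups of a host's IPv6 address spell out its IPv4 address in decimal, which is the pattern visible in the address `2620:0:860:103:10:192:32:8` reported for rdb2009 in the reimage log later in this task. That convention and the helper name `embedded_ipv4` are assumptions for illustration only.

```python
import ipaddress

# Subnets from the task description (one per host).
SUBNETS = {
    "rdb2009": ipaddress.ip_network("10.192.32.0/24"),
    "rdb2010": ipaddress.ip_network("10.192.48.0/24"),
}


def embedded_ipv4(v6: str) -> ipaddress.IPv4Address:
    """Hypothetical helper: read the last four IPv6 groups as decimal IPv4 octets.

    ASSUMPTION: the IPv4 address is spelled out in the IPv6 interface ID,
    as in the rdb2009 address seen in the log. Raises ValueError otherwise.
    """
    groups = ipaddress.IPv6Address(v6).exploded.split(":")[-4:]
    if not all(g.isdigit() for g in groups):
        raise ValueError(f"{v6} does not embed a decimal IPv4 address")
    # IPv4Address() rejects octets above 255, so no extra range check is needed.
    return ipaddress.IPv4Address(".".join(str(int(g)) for g in groups))


v4 = embedded_ipv4("2620:0:860:103:10:192:32:8")
print(v4, v4 in SUBNETS["rdb2009"], v4 in SUBNETS["rdb2010"])  # 10.192.32.8 True False
```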

Per host setup checklist

rdb2009: Row C, rack C3, U17, ge-3/0/15

  • receive in system on procurement task T264354 & in Coupa
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location, state of planned)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:spare)
  • host state in NetBox set to staged

rdb2010: Row D, rack D5, U5, ge-5/0/4

  • receive in system on procurement task T264354 & in Coupa
  • rack system with proposed racking plan (see above) & update NetBox (include all system info plus location, state of planned)
  • BIOS/DRAC/serial setup/testing
  • mgmt DNS entries added for both asset tag and hostname
  • network port setup (description, enable, VLAN)
    • end on-site specific steps
  • production DNS entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:spare)
  • host state in NetBox set to staged
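The switch ports listed for each host use the Junos `type-fpc/pic/port` interface naming (for example `ge-3/0/15`: type `ge`, FPC 3, PIC 0, port 15). The sketch below parses those names and cross-checks them against the rack. The rule that the FPC (switch member) number equals the rack number within the row is only an assumption inferred from the two entries in this task, not a documented convention, and the function names are hypothetical.

```python
import re

# Junos physical interface names follow type-fpc/pic/port, e.g. "ge-3/0/15".
JUNOS_IFACE = re.compile(r"(?P<type>[a-z]+)-(?P<fpc>\d+)/(?P<pic>\d+)/(?P<port>\d+)")


def parse_iface(name: str) -> dict:
    """Split a Junos interface name into its type, FPC, PIC and port fields."""
    m = JUNOS_IFACE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Junos type-fpc/pic/port name: {name!r}")
    d = m.groupdict()
    return {"type": d["type"], "fpc": int(d["fpc"]), "pic": int(d["pic"]), "port": int(d["port"])}


def fpc_matches_rack(rack: str, iface: str) -> bool:
    # ASSUMPTION: switch member (FPC) number == rack number inside the row,
    # as in both entries of this task (C3 -> ge-3/..., D5 -> ge-5/...).
    return parse_iface(iface)["fpc"] == int(rack[1:])


for host, rack, iface in [("rdb2009", "C3", "ge-3/0/15"), ("rdb2010", "D5", "ge-5/0/4")]:
    print(host, parse_iface(iface), fpc_matches_rack(rack, iface))
```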

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

Restricted Application added a subscriber: Aklapper.

@jijiki is it okay to rack rdb2009 in C3 and not in C5?

Change 641441 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: ADD production DNS for rdb2009 and rdb2010

https://gerrit.wikimedia.org/r/641441

Change 641441 merged by Papaul:
[operations/dns@master] DNS: ADD production DNS for rdb2009 and rdb2010

https://gerrit.wikimedia.org/r/641441
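Production DNS for a host normally means forward records plus matching reverse (PTR) records. As a sketch, Python's `ipaddress` module can derive the reverse names via `reverse_pointer`. The IPv6 address is the one the reimage log reports for rdb2009; `10.192.32.8` is only an assumption derived from it, and the records actually added in change 641441 are not reproduced here.

```python
import ipaddress

# Addresses assumed for rdb2009 (see lead-in); not copied from change 641441.
addresses = ["10.192.32.8", "2620:0:860:103:10:192:32:8"]

for addr in addresses:
    ip = ipaddress.ip_address(addr)
    # reverse_pointer is the in-addr.arpa / ip6.arpa name the PTR record lives under.
    print(f"{ip.reverse_pointer}. IN PTR rdb2009.codfw.wmnet.")
```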

Change 641475 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for rdb200[9][10]

https://gerrit.wikimedia.org/r/641475

Change 641475 merged by Papaul:
[operations/puppet@production] DHCP: Add MAC address for rdb200[9][10]

https://gerrit.wikimedia.org/r/641475
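The DHCP change pins each host's MAC address so the install server can hand it an address at PXE boot. Assuming an ISC dhcpd-style `host` declaration (the exact file and layout used in operations/puppet are not shown in this task), a minimal generator with basic MAC validation could look like this. The helper name is hypothetical and the MAC addresses are placeholders, not the real values.

```python
import re

# Six lowercase hex pairs separated by colons.
MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


def dhcp_host_entry(hostname: str, fqdn: str, mac: str) -> str:
    """Render an ISC dhcpd-style host declaration (hypothetical helper)."""
    mac = mac.lower()
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"invalid MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {fqdn};\n"
        f"}}\n"
    )


# Placeholder MACs (locally administered range), NOT the real rdb2009/rdb2010 values.
for host, mac in [("rdb2009", "02:00:00:00:20:09"), ("rdb2010", "02:00:00:00:20:10")]:
    print(dhcp_host_entry(host, f"{host}.codfw.wmnet", mac))
```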

Change 641478 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Add rdb200[9][10] to site.pp

https://gerrit.wikimedia.org/r/641478

Change 641478 merged by Papaul:
[operations/puppet@production] Add rdb200[9][10] to site.pp

https://gerrit.wikimedia.org/r/641478
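The change titles use bracket shorthand such as `rdb200[9][10]` for the two hosts. Read literally as a regular expression, that shorthand would match `rdb20090` and `rdb20091` rather than `rdb2010`, so a site.pp node regex has to be written differently. The pattern below is a hypothetical example (the content of change 641478 is not shown in this task); Puppet node regexes are Ruby regexes, but this simple pattern behaves the same under Python's `re`.

```python
import re

# Hypothetical site.pp-style node regex for the two new hosts.
NODE_RE = re.compile(r"^rdb20(09|10)\.codfw\.wmnet$")
# The commit-title shorthand, interpreted literally as a regex, for comparison.
TITLE_RE = re.compile(r"^rdb200[9][10]$")

for name in ["rdb2009.codfw.wmnet", "rdb2010.codfw.wmnet", "rdb2005.codfw.wmnet"]:
    print(name, bool(NODE_RE.match(name)))

# Only the unintended names match the literal shorthand.
print([n for n in ["rdb2009", "rdb2010", "rdb20090", "rdb20091"] if TITLE_RE.match(n)])
```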

Change 641480 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] FIX typo on rdb2010

https://gerrit.wikimedia.org/r/641480

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

rdb2009.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011171852_pt1979_12694_rdb2009_codfw_wmnet.log.

Change 641480 merged by Papaul:
[operations/puppet@production] FIX typo on rdb2010

https://gerrit.wikimedia.org/r/641480

@Volans another error on the auto-reimage:

2020-11-17 19:35:48 [ERROR] (pt1979) wmf-auto-reimage::check_uptime: Unable to determine uptime of host 'rdb2009.codfw.wmnet': Warning: Permanently added the ECDSA host key for IP address '2620:0:860:103:10:192:32:8' to the list of known hosts.
1234.86 59152.42
19:17:39 | rdb2009.codfw.wmnet | Still waiting for reboot after 5.0 minutes
19:22:41 | rdb2009.codfw.wmnet | Still waiting for reboot after 10.0 minutes
19:27:43 | rdb2009.codfw.wmnet | Still waiting for reboot after 15.0 minutes
19:32:46 | rdb2009.codfw.wmnet | Still waiting for reboot after 20.0 minutes
19:37:48 | rdb2009.codfw.wmnet | Still waiting for reboot after 25.0 minutes
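The `check_uptime` error above includes the raw command output: an SSH host-key warning followed by `1234.86 59152.42`, which looks like the two fields of `/proc/uptime` (seconds since boot, and idle seconds summed over all CPUs). As a sketch of how an uptime check can tolerate interleaved warnings, the code below keeps only the last line that parses as two floats, and treats a host as rebooted when its uptime is shorter than the time elapsed since the reboot was requested. This is not wmf-auto-reimage's actual implementation, and the function names are hypothetical.

```python
def parse_uptime(output: str) -> float:
    """Return seconds since boot from /proc/uptime-style output.

    Scans from the last line upwards and skips anything (e.g. SSH warnings)
    that is not exactly two floating-point fields.
    """
    for line in reversed(output.strip().splitlines()):
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            uptime, _idle = (float(f) for f in fields)
        except ValueError:
            continue
        return uptime
    raise ValueError("no /proc/uptime line found in output")


def has_rebooted(uptime_s: float, seconds_since_reboot_request: float) -> bool:
    # A host that has been up for less time than has passed since the reboot
    # was requested must have booted after the request.
    return uptime_s < seconds_since_reboot_request


raw = (
    "Warning: Permanently added the ECDSA host key for IP address "
    "'2620:0:860:103:10:192:32:8' to the list of known hosts.\n"
    "1234.86 59152.42\n"
)
up = parse_uptime(raw)
print(up, round(up / 60, 1))  # 1234.86 20.6
```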

Completed auto-reimage of hosts:

['rdb2009.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

rdb2010.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011171941_pt1979_25569_rdb2010_codfw_wmnet.log.

Completed auto-reimage of hosts:

['rdb2010.codfw.wmnet']

and were ALL successful.

@Papaul thanks for letting me know. Unfortunately it seems that at that time the queue size on PuppetDB in codfw was huge; see Grafana:

[Screenshot 2020-11-19 at 17.25.26.png: Grafana graph of the PuppetDB queue size in codfw]

cc @jbond, as it could be related to the Puppet changes made the same day, although at first sight I don't think the timings correlate.