
(Need By: TBD) rack/setup/install rdb20[09|10]
Closed, Resolved · Public

Description

This task will track the racking, setup, and OS installation of rdb2009 & rdb2010. These hosts will replace rdb200[56].

Hostname / Racking / Installation Details

Hostnames: rdb2009, rdb2010
Racking Proposal: rdb2009: C3, rdb2010: D5
Networking/Subnet/VLAN/IP: rdb2009: 10.192.32.0/24, rdb2010: 10.192.48.0/24
Partitioning/Raid: Is this hardware or software raid and what raid levels should be applied to each disk? What are the partitioning requirements and is there an existing partman recipe?
OS Distro: Stretch
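
The subnet assignments above can be sanity-checked with Python's standard `ipaddress` module. This is only a sketch: the IPv6 address used in the usage note is the one reported for rdb2009 in the reimage log later in this task, and reading its last four groups as decimal IPv4 octets is an assumption about the addressing convention, not something this task states. The helper name `embedded_ipv4` is hypothetical.

```python
import ipaddress

# Subnets as listed in the task description.
SUBNETS = {
    "rdb2009": ipaddress.ip_network("10.192.32.0/24"),
    "rdb2010": ipaddress.ip_network("10.192.48.0/24"),
}


def embedded_ipv4(ipv6_text):
    """Read the last four groups of an IPv6 address as decimal IPv4 octets.

    Assumption: the groups are written to look like the IPv4 octets
    (for example ...:10:192:32:8 -> 10.192.32.8). A group containing
    hex letters raises ValueError, since it cannot be a decimal octet.
    """
    groups = ipaddress.ip_address(ipv6_text).exploded.split(":")
    octets = [str(int(group)) for group in groups[-4:]]
    return ipaddress.ip_address(".".join(octets))


def subnet_owner(address):
    """Return the hostname whose subnet contains the address, or None."""
    for hostname, network in SUBNETS.items():
        if address in network:
            return hostname
    return None
```

For example, `subnet_owner(embedded_ipv4("2620:0:860:103:10:192:32:8"))` returns `"rdb2009"`, which matches the host named in that log line.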

Per host setup checklist

rdb2009: row C, rack C3, U17, ge-3/0/15

  • receive in system on procurement task T264354 & in coupa
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan)
    • end on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:spare)
  • host state in netbox set to staged

rdb2010: row D, rack D5, U5, ge-5/0/4

  • receive in system on procurement task T264354 & in coupa
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan)
    • end on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:spare)
  • host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.
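
The resolution rule above (every checklist step done on every host) can be sketched as a small data structure. This is an illustration only: the step names paraphrase the checklist in this task, and `HostChecklist` and `task_can_be_resolved` are hypothetical names, not part of any existing tooling.

```python
from dataclasses import dataclass, field

# Step names paraphrase the per-host checklist in this task.
STEPS = (
    "receive in system",
    "rack system and update netbox",
    "bios/drac/serial setup",
    "mgmt dns entries",
    "network port setup",
    "production dns entries",
    "operations/puppet update",
    "OS installation",
    "puppet accept/initial run",
    "netbox state set to staged",
)


@dataclass
class HostChecklist:
    """Tracks which checklist steps are done for one host (hypothetical)."""

    hostname: str
    done: set = field(default_factory=set)

    def complete(self, step):
        # Reject typos so a misspelled step cannot silently count as done.
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.done.add(step)

    def remaining(self):
        # Preserve checklist order when reporting what is left.
        return [step for step in STEPS if step not in self.done]


def task_can_be_resolved(checklists):
    """True only if there is at least one host and none has steps left."""
    return bool(checklists) and all(not c.remaining() for c in checklists)
```

Calling `remaining()` on a host returns the outstanding steps in checklist order, which makes it easy to see what is blocking resolution.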

Event Timeline

RobH created this task. Oct 28 2020, 10:07 PM
Restricted Application added a project: SRE. Oct 28 2020, 10:07 PM
Restricted Application added a subscriber: Aklapper.
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board. Oct 28 2020, 10:07 PM
RobH removed a subscriber: RobH.
Papaul claimed this task. Nov 9 2020, 4:43 PM
Papaul added a subscriber: jijiki. Nov 9 2020, 4:46 PM

@jijiki is it okay to rack rdb2009 in C3 and not in C5?

jijiki added a comment. Nov 9 2020, 6:40 PM

@Papaul That is fine, thank you!

Papaul updated the task description. Nov 9 2020, 7:20 PM
Papaul updated the task description. Nov 16 2020, 6:43 PM
Papaul updated the task description. Nov 16 2020, 8:28 PM

Change 641441 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: ADD production DNS for rdb2009 and rdb2010

https://gerrit.wikimedia.org/r/641441

Change 641441 merged by Papaul:
[operations/dns@master] DNS: ADD production DNS for rdb2009 and rdb2010

https://gerrit.wikimedia.org/r/641441

Papaul updated the task description. Nov 17 2020, 3:58 PM
Papaul updated the task description. Nov 17 2020, 4:37 PM
Papaul updated the task description. Nov 17 2020, 5:56 PM

Change 641475 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for rdb200[9][10]

https://gerrit.wikimedia.org/r/641475

Change 641475 merged by Papaul:
[operations/puppet@production] DHCP: Add MAC address for rdb200[9][10]

https://gerrit.wikimedia.org/r/641475

Change 641478 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Add rdb200[9][10] to site.pp

https://gerrit.wikimedia.org/r/641478

Change 641478 merged by Papaul:
[operations/puppet@production] Add rdb200[9][10] to site.pp

https://gerrit.wikimedia.org/r/641478

Papaul updated the task description. Nov 17 2020, 6:44 PM

Change 641480 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] FIX typo on rdb2010

https://gerrit.wikimedia.org/r/641480

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

rdb2009.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011171852_pt1979_12694_rdb2009_codfw_wmnet.log.
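
The log file name above appears to encode a start time, the launching user, a number, and the host name with dots replaced by underscores. A minimal sketch of pulling those fields apart, assuming that layout holds and that user names contain no underscores; the meaning of the numeric field is not stated in this task, so it is kept as an opaque `number`. The function name is hypothetical.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed layout: <YYYYmmddHHMM>_<user>_<number>_<host with "_" for ".">.log
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<number>\d+)_(?P<host>.+)\.log$"
)


def parse_reimage_log_path(path):
    """Split a reimage log path into its apparent components."""
    name = PurePosixPath(path).name
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised log file name: {name}")
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "number": int(match["number"]),  # meaning not stated in the task
        "host": match["host"].replace("_", "."),
    }
```

Applied to the path above, the `host` field comes back as `rdb2009.codfw.wmnet` and `started` as 2020-11-17 18:52, consistent with the surrounding timeline.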

Change 641480 merged by Papaul:
[operations/puppet@production] FIX typo on rdb2010

https://gerrit.wikimedia.org/r/641480

Papaul added a subscriber: Volans. Nov 17 2020, 7:38 PM

@Volans another error on the auto-reimage

2020-11-17 19:35:48 [ERROR] (pt1979) wmf-auto-reimage::check_uptime: Unable to determine uptime of host 'rdb2009.codfw.wmnet': Warning: Permanently added the ECDSA host key for IP address '2620:0:860:103:10:192:32:8' to the list of known hosts.
1234.86 59152.42
19:17:39 | rdb2009.codfw.wmnet | Still waiting for reboot after 5.0 minutes
19:22:41 | rdb2009.codfw.wmnet | Still waiting for reboot after 10.0 minutes
19:27:43 | rdb2009.codfw.wmnet | Still waiting for reboot after 15.0 minutes
19:32:46 | rdb2009.codfw.wmnet | Still waiting for reboot after 20.0 minutes
19:37:48 | rdb2009.codfw.wmnet | Still waiting for reboot after 25.0 minutes
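
The error output above looks like an SSH host-key warning followed by a line in the two-field format of `/proc/uptime` (uptime seconds, idle seconds). A minimal sketch, assuming that reading is correct, of extracting the uptime while ignoring unrelated lines and deriving a boot time from it. This is not the wmf-auto-reimage implementation; `parse_uptime` and `boot_time` are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta

# A line consisting of exactly two non-negative decimal numbers.
UPTIME_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$")


def parse_uptime(output):
    """Return the uptime in seconds from noisy command output.

    Lines that do not match the two-number format (such as SSH
    warnings) are skipped instead of causing a failure.
    """
    for line in output.splitlines():
        match = UPTIME_LINE.match(line)
        if match:
            return float(match.group(1))
    raise ValueError("no uptime line found in output")


def boot_time(checked_at, uptime_seconds):
    """Estimate when the host booted, given when the uptime was read."""
    return checked_at - timedelta(seconds=uptime_seconds)


if __name__ == "__main__":
    noisy = (
        "Warning: Permanently added the ECDSA host key for IP address "
        "'2620:0:860:103:10:192:32:8' to the list of known hosts.\n"
        "1234.86 59152.42\n"
    )
    uptime = parse_uptime(noisy)
    print(boot_time(datetime(2020, 11, 17, 19, 35, 48), uptime))
```

If the first number really is the uptime in seconds at 19:35:48, the host would have booted at about 19:15:13, shortly before the first "Still waiting for reboot" message.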

Completed auto-reimage of hosts:

['rdb2009.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

rdb2010.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011171941_pt1979_25569_rdb2010_codfw_wmnet.log.

Papaul updated the task description. Nov 17 2020, 7:44 PM

Completed auto-reimage of hosts:

['rdb2010.codfw.wmnet']

and were ALL successful.

Papaul updated the task description. Nov 17 2020, 8:22 PM
Papaul closed this task as Resolved. Nov 17 2020, 8:27 PM

@jijiki this is complete

Volans added a subscriber: jbond. Nov 19 2020, 4:27 PM

@Volans another error on the auto-reimage

2020-11-17 19:35:48 [ERROR] (pt1979) wmf-auto-reimage::check_uptime: Unable to determine uptime of host 'rdb2009.codfw.wmnet': Warning: Permanently added the ECDSA host key for IP address '2620:0:860:103:10:192:32:8' to the list of known hosts.
1234.86 59152.42
19:17:39 | rdb2009.codfw.wmnet | Still waiting for reboot after 5.0 minutes
19:22:41 | rdb2009.codfw.wmnet | Still waiting for reboot after 10.0 minutes
19:27:43 | rdb2009.codfw.wmnet | Still waiting for reboot after 15.0 minutes
19:32:46 | rdb2009.codfw.wmnet | Still waiting for reboot after 20.0 minutes
19:37:48 | rdb2009.codfw.wmnet | Still waiting for reboot after 25.0 minutes

@Papaul thanks for letting me know. Unfortunately, it seems that the PuppetDB queue in codfw was very large at that time, see grafana:

cc @jbond, as it could be related to the Puppet changes made the same day, although at first sight I don't think the times correlate.