
(Need by: TBD) rack/setup/install kubernetes10[07-14].eqiad.wmnet
Closed, ResolvedPublic

Description

Please note these systems were ordered on T233639, and DC-Ops needs feedback from serviceops on the split between mw and kubernetes systems ordered on that task.

serviceops: Please update this task to denote how many of the 37 total systems are to be used for eqiad kubernetes, and how many will be used for eqiad mw systems. Please also fill out the racking proposal, detailing how you want these new systems split across the datacenter. (Are these replacing existing hosts or adding to the cluster?)

Racking Proposal:
1 host in each of the following racks: A 1, A 4, B 3, B 5, C 3, C 7, D 1, D 6

Hostnames: kubernetes10[07-14].eqiad.wmnet
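
For anyone scripting against this range, here is a minimal sketch of how the bracket notation above expands into individual hostnames. It is plain Python written for illustration; the function name `expand_hosts` is hypothetical and this is not the tooling used on this task.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one zero-padded numeric range such as 'name10[07-14].domain'."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no bracket range: treat as a single hostname
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep the zero padding used in the pattern
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


hosts = expand_hosts("kubernetes10[07-14].eqiad.wmnet")
print(len(hosts), hosts[0], hosts[-1])
```

The sketch only handles a single bracket range per pattern, which is all the hostname line above needs.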

This checklist should be duplicated for EVERY SINGLE HOST:

  • - receive in system on procurement task T233639
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - add to site.pp role(insetup)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - resolve this task once all tasks are completed
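
Since the checklist is meant to be duplicated for every host, a short script can do the copying. The following is a rough sketch only: the step text is taken from the list above, while the `[ ]` marker, the `checklist_for` helper, and the output layout are assumptions made for illustration.

```python
# Steps copied from the task description; the "[ ]" marker is an assumption.
CHECKLIST = [
    "receive in system on procurement task T233639",
    "rack system with proposed racking plan (see above) & update netbox",
    "bios/drac/serial setup/testing",
    "mgmt dns entries added for both asset tag and hostname",
    "network port setup (description, enable, vlan)",
    "production dns entries added",
    "operations/puppet update (install_server at minimum)",
    "add to site.pp role(insetup)",
    "OS installation",
    "puppet accept/initial run (with role:spare)",
    "host state in netbox set to staged",
    "resolve this task once all tasks are completed",
]


def checklist_for(host: str) -> str:
    """Return one copy of the checklist, headed by the hostname."""
    lines = [f"{host}:"] + [f"  [ ] {step}" for step in CHECKLIST]
    return "\n".join(lines)


# The eight hosts covered by this task: kubernetes1007 through kubernetes1014.
hosts = [f"kubernetes10{n:02d}.eqiad.wmnet" for n in range(7, 15)]
print("\n\n".join(checklist_for(host) for host in hosts))
```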

Event Timeline

RobH triaged this task as Medium priority. Jan 3 2020, 5:09 PM
RobH created this task.
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
wiki_willy renamed this task from rack/setup/install new eqiad kubenetes systems to (No Need By Date Provided) rack/setup/install new eqiad kubenetes systems. Jan 3 2020, 7:01 PM
jijiki renamed this task from (No Need By Date Provided) rack/setup/install new eqiad kubenetes systems to (No Need By Date Provided) rack/setup/install kubernetes10[08-15].eqiad.wmnet. Jan 9 2020, 2:46 PM
jijiki updated the task description.
jijiki updated the task description.
jijiki renamed this task from (No Need By Date Provided) rack/setup/install kubernetes10[08-15].eqiad.wmnet to (No Need By Date Provided) rack/setup/install kubernetes10[07-14].eqiad.wmnet. Jan 9 2020, 3:47 PM
jijiki updated the task description.
jijiki updated the task description. Jan 9 2020, 3:53 PM
wiki_willy reassigned this task from Joe to Jclark-ctr. Jan 9 2020, 5:09 PM
jijiki updated the task description. Jan 10 2020, 3:36 PM

@Jclark-ctr Can you provide a date that is convenient for you for racking these? Thank you!

++ @Cmjohnson / @Jclark-ctr - just following up on Effie's previous comment, can you guys decide on a doable turnover date for this one? Thanks, Willy

RobH renamed this task from (No Need By Date Provided) rack/setup/install kubernetes10[07-14].eqiad.wmnet to (Need by: TBD) rack/setup/install kubernetes10[07-14].eqiad.wmnet. Feb 24 2020, 9:10 PM

Hi @jijiki - chatted with John on this a bit earlier today. He'll prioritize getting these racked, along with a couple other installs, in early March. Thanks, Willy

RobH removed a subscriber: RobH. Mar 3 2020, 6:00 PM

name rack_name position switchport
kubernetes1007 A3 8 16
kubernetes1008 A5 17 20
kubernetes1009 B3 18 32
kubernetes1010 B5 34 35
kubernetes1011 C3 7 12
kubernetes1012 C5 6 5
kubernetes1013 D3 11 7
kubernetes1014 D6 30 29
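
As a quick sanity check on the placement above, the table can be parsed and tallied per rack and per row. This is a minimal sketch under the assumption that the first character of the rack name is the row; the data is copied from the comment above and the variable names are illustrative.

```python
from collections import Counter

# Rows copied from the racking table: name, rack_name, position, switchport.
TABLE = """\
kubernetes1007 A3 8 16
kubernetes1008 A5 17 20
kubernetes1009 B3 18 32
kubernetes1010 B5 34 35
kubernetes1011 C3 7 12
kubernetes1012 C5 6 5
kubernetes1013 D3 11 7
kubernetes1014 D6 30 29
"""

rows = []
for line in TABLE.splitlines():
    name, rack, position, switchport = line.split()
    rows.append(
        {
            "name": name,
            "rack": rack,
            "position": int(position),
            "switchport": int(switchport),
        }
    )

hosts_per_rack = Counter(row["rack"] for row in rows)
# Assumption: the leading letter of the rack name identifies the row.
hosts_per_row = Counter(row["rack"][0] for row in rows)

print(dict(hosts_per_rack))
print(dict(hosts_per_row))
```

With this data no rack holds more than one of the new hosts, and each of rows A through D holds two.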

Jclark-ctr updated the task description. Mar 7 2020, 3:53 PM
wiki_willy added a subscriber: Christopher.

Change 589068 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Add mgmt dns for kubernetes100[7-14]

https://gerrit.wikimedia.org/r/589068

Change 589068 merged by Cmjohnson:
[operations/dns@master] Add mgmt dns for kubernetes100[7-14]

https://gerrit.wikimedia.org/r/589068

Cmjohnson updated the task description. Apr 15 2020, 6:24 PM
Cmjohnson updated the task description. Mon, May 4, 3:55 PM

Change 594224 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding production dns ipv4 only kubernetes1007-1014

https://gerrit.wikimedia.org/r/594224

Change 594224 merged by Cmjohnson:
[operations/dns@master] Adding production dns ipv4 only kubernetes1007-1014

https://gerrit.wikimedia.org/r/594224

Cmjohnson updated the task description. Mon, May 4, 4:16 PM

Change 594236 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding new kubernetes servers kubernetes100[7-9]|101[0-4] to site.pp

https://gerrit.wikimedia.org/r/594236

Change 594237 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Add netboot.cfg and dhcpd file for kubernetes100[7-9]|1010-14

https://gerrit.wikimedia.org/r/594237

Change 594236 abandoned by Cmjohnson:
Adding new kubernetes servers kubernetes100[7-9]|101[0-4] to site.pp

https://gerrit.wikimedia.org/r/594236

Change 594237 abandoned by Cmjohnson:
Add netboot.cfg and dhcpd file for kubernetes100[7-9]|1010-14

https://gerrit.wikimedia.org/r/594237

Change 594927 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding kubernetes1007=1014 to netboot.cfg and dhcpd file

https://gerrit.wikimedia.org/r/594927

Change 594927 merged by Cmjohnson:
[operations/puppet@production] Adding kubernetes1007=1014 to netboot.cfg and dhcpd file

https://gerrit.wikimedia.org/r/594927

Change 594928 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding new kubernetes nodes to site.pp role insetup

https://gerrit.wikimedia.org/r/594928

Change 594928 merged by Cmjohnson:
[operations/puppet@production] Adding new kubernetes nodes to site.pp role insetup

https://gerrit.wikimedia.org/r/594928

Cmjohnson updated the task description. Thu, May 7, 11:29 AM
Cmjohnson updated the task description. Thu, May 7, 12:32 PM

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1007.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071424_cmjohnson_150680_kubernetes1007_eqiad_wmnet.log.
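
The log filenames in these comments appear to follow a fixed pattern: a 12-digit timestamp, the user, a number, and the hostname with dots replaced by underscores. A minimal sketch for pulling those fields apart is below. The pattern is inferred from the filenames in this thread only; the meaning of the numeric field is not stated here, so it is kept under the hypothetical name `run_id`.

```python
import re
from datetime import datetime

# Pattern inferred from filenames in this task, not from any documentation.
LOG_NAME = re.compile(r"(\d{12})_([^_]+)_(\d+)_(.+)\.log")


def parse_log_name(filename: str) -> dict:
    """Split a wmf-auto-reimage log filename into its apparent fields."""
    match = LOG_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognised log name: {filename}")
    stamp, user, run_id, host = match.groups()
    return {
        "started": datetime.strptime(stamp, "%Y%m%d%H%M"),
        "user": user,
        "run_id": int(run_id),  # meaning not stated in the task
        # Assumption: underscores in the host part stand in for dots.
        "host": host.replace("_", "."),
    }


info = parse_log_name(
    "202005071424_cmjohnson_150680_kubernetes1007_eqiad_wmnet.log"
)
print(info["started"], info["user"], info["host"])
```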

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1008.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071425_cmjohnson_150788_kubernetes1008_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1009.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071426_cmjohnson_150891_kubernetes1009_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1010.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071435_cmjohnson_153022_kubernetes1010_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1011.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071439_cmjohnson_154505_kubernetes1011_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['kubernetes1009.eqiad.wmnet']

Of which those FAILED:

['kubernetes1009.eqiad.wmnet']

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1009.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071442_cmjohnson_157086_kubernetes1009_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['kubernetes1008.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['kubernetes1010.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['kubernetes1011.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['kubernetes1009.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1012.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071502_cmjohnson_161792_kubernetes1012_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1013.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071503_cmjohnson_161861_kubernetes1013_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1014.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071503_cmjohnson_161889_kubernetes1014_eqiad_wmnet.log.

Cmjohnson updated the task description. Thu, May 7, 3:13 PM

Completed auto-reimage of hosts:

['kubernetes1012.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['kubernetes1013.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1014.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005071533_cmjohnson_168475_kubernetes1014_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['kubernetes1014.eqiad.wmnet']

Of which those FAILED:

['kubernetes1014.eqiad.wmnet']

All but 1014 have been imaged. I think I have a bad network cable for 1014, so I have scheduled a quick trip to the data center this afternoon to take care of it.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

kubernetes1014.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202005081636_cmjohnson_84350_kubernetes1014_eqiad_wmnet.log.

The cable was bad and the port was not getting a link light. I swapped the network cable and am imaging 1014 now.

Completed auto-reimage of hosts:

['kubernetes1014.eqiad.wmnet']

and were ALL successful.
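
To make the reimage history easier to follow, here is a small sketch that tallies the launch and completion comments in this thread. The two lists were transcribed by hand from the comments above (hosts are abbreviated to their number), so treat the output as a reading aid rather than an authoritative record.

```python
# Launch comments in thread order, transcribed by hand from this task.
LAUNCHES = [1007, 1008, 1009, 1010, 1011, 1009, 1012, 1013, 1014, 1014, 1014]

# Completion comments in thread order: (host, succeeded).
COMPLETIONS = [
    (1009, False),
    (1008, True),
    (1010, True),
    (1011, True),
    (1009, True),
    (1012, True),
    (1013, True),
    (1014, False),
    (1014, True),
]

# The last completion reported for a host overrides earlier ones.
final_status = {}
for host, succeeded in COMPLETIONS:
    final_status[host] = succeeded

retried = sorted(h for h in set(LAUNCHES) if LAUNCHES.count(h) > 1)
no_completion_comment = sorted(set(LAUNCHES) - set(final_status))

print("retried:", retried)
print("no completion comment:", no_completion_comment)
print("all reported hosts succeeded:", all(final_status.values()))
```

A host with no completion comment is not necessarily a failed install; it only means the thread carries no automated report for it.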

Cmjohnson closed this task as Resolved. Fri, May 8, 5:26 PM
Cmjohnson updated the task description.

these are ready! resolving