
(Need by: TBD) rack/setup/install kubernetes20[07-14].codfw.wmnet and kubestage200[1-2].codfw.wmnet.
Closed, Resolved (Public)

Description

Split off from T241852.

Of the remaining 20 nodes left from T241852, 10 need to become kubernetes hosts. Those will be kubernetes20[07-14].codfw.wmnet and kubestage200[1-2].codfw.wmnet.
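The bracketed ranges above can be expanded into individual hostnames. The following is a minimal sketch; `expand_hosts` is a hypothetical helper written for this task's notation only, and it does not claim to match the range syntax of any particular tool.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand 'prefix[LOW-HIGH]suffix' into hostnames.

    Numbers are zero-padded to the width of LOW, so '20[07-14]'
    yields 2007 through 2014.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No range in the pattern: treat it as a single hostname.
        return [pattern]
    prefix, low, high, suffix = match.groups()
    width = len(low)
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(low), int(high) + 1)
    ]


hosts = expand_hosts("kubernetes20[07-14].codfw.wmnet") + expand_hosts(
    "kubestage200[1-2].codfw.wmnet"
)
```

With the two patterns from this task, `hosts` holds the 10 hostnames the description refers to.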

Racking proposals:

kubernetes20[07-14].codfw.wmnet
These are 1G hosts. We want 2 per rack row; the specific racks are unimportant, so feel free to pick whichever rack suits you best.
kubernetes2007 Rack A6U21 ge-6/0/23
kubernetes2008 Rack A6U29 ge-6/0/13
kubernetes2009 Rack B6U17 ge-6/0/18
kubernetes2010 Rack B6U18 ge-6/0/21
kubernetes2011 Rack C1U29 ge-1/0/21
kubernetes2012 Rack C1U30 ge-1/0/22
kubernetes2013 Rack D6U2 ge-6/0/1
kubernetes2014 Rack D6U3 ge-6/0/2
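The "2 per rack row" requirement can be checked against the rack assignments listed above. This is a small sketch, assuming the row is the first letter of the rack name (A6 is in row A); `RACKING` and `hosts_per_row` are illustrative names, not part of any existing tooling.

```python
from collections import Counter

# Rack names copied from the list above (rack unit numbers omitted).
RACKING = {
    "kubernetes2007": "A6",
    "kubernetes2008": "A6",
    "kubernetes2009": "B6",
    "kubernetes2010": "B6",
    "kubernetes2011": "C1",
    "kubernetes2012": "C1",
    "kubernetes2013": "D6",
    "kubernetes2014": "D6",
}


def hosts_per_row(racking: dict[str, str]) -> Counter:
    """Count hosts per row, taking the row to be the rack's first letter."""
    return Counter(rack[0] for rack in racking.values())


rows = hosts_per_row(RACKING)
# The proposal asks for exactly two hosts in each row.
balanced = all(count == 2 for count in rows.values())
```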

  • - receive in system on procurement task T231255
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing :
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

kubestage200[1-2].codfw.wmnet Rack A1U28 and rack B1U5
These are 1G hosts as well: one in row A and one in row B, in whichever rack suits you best.
kubestage2001 Rack A1U28 ge-1/0/2
kubestage2002 Rack B1U5 ge-1/0/21

  • - receive in system on procurement task T231255
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing :
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

Event Timeline

Papaul triaged this task as Medium priority. May 8 2020, 2:32 PM

Change 597403 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt and production DNS for kubestage200[1-2], kubernetes200[7-14]

https://gerrit.wikimedia.org/r/597403

Change 597403 merged by Papaul:
[operations/dns@master] DNS: Add mgmt and production DNS for kubestage200[1-2], kubernetes200[7-14]

https://gerrit.wikimedia.org/r/597403

Change 597595 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Add kubestage200[1-2] , kubernetes200[7-14] MAC entries and to site.pp with role insetup

https://gerrit.wikimedia.org/r/597595

Change 597595 merged by Papaul:
[operations/puppet@production] Add kubestage200[1-2] , kubernetes200[7-14] MAC entries and to site.pp

https://gerrit.wikimedia.org/r/597595

Change 597815 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Partman: Add kubestsge200[1-2] kubernetes200[7-14]

https://gerrit.wikimedia.org/r/597815

Change 597815 merged by Papaul:
[operations/puppet@production] Partman: Add kubestsge200[1-2] kubernetes200[7-14]

https://gerrit.wikimedia.org/r/597815

Change 597856 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Fix typo for kubernetes2007 to kubernetes2014

https://gerrit.wikimedia.org/r/597856

Change 597856 merged by Papaul:
[operations/puppet@production] Fix typo for kubernetes2007 to kubernetes2014

https://gerrit.wikimedia.org/r/597856

@akosiaris please see below what I am getting from kubestage2001 and kubernetes2007:

You may use the whole volume group for guided partitioning, or part
of it. If you use only part of it, or if you add more disks later,
then you will be able to grow logical volumes later using the LVM
tools, so using a smaller part of the volume group at installation
time may offer more flexibility.

The minimum size of the selected partitioning recipe is 1.0 GB (or
0%); please note that the packages you choose to install may require
more space than this. The maximum available size is 930.1 GB.
Hint: "max" can be used as a shortcut to specify the maximum size, or
enter a percentage (e.g. "20%") to use that percentage of the maximum

930.1 GB

Change 598069 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] install: Switch kubernetes/kubestage2XXX to stretch

https://gerrit.wikimedia.org/r/598069

Change 598069 merged by Alexandros Kosiaris:
[operations/puppet@production] install: Switch kubernetes/kubestage2XXX to stretch

https://gerrit.wikimedia.org/r/598069

We debugged this with @Papaul; the patch above resolves it. While at it, I want to do a full reimage test with these hosts to make sure we can reimage/set up a new host with 0 operator intervention, so I am grabbing this task. @Papaul, thanks!

Change 598254 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] kubernetes2007-2014: Move to role kubernetes::worker

https://gerrit.wikimedia.org/r/598254

Change 598254 merged by Alexandros Kosiaris:
[operations/puppet@production] kubernetes2007-2014: Move to role kubernetes::worker

https://gerrit.wikimedia.org/r/598254

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['kubernetes2007.codfw.wmnet', 'kubernetes2008.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2012.codfw.wmnet', 'kubernetes2013.codfw.wmnet', 'kubernetes2014.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202005241415_akosiaris_7184.log.

Completed auto-reimage of hosts:

['kubernetes2007.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2008.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2014.codfw.wmnet', 'kubernetes2013.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2012.codfw.wmnet']

Of which those FAILED:

['kubernetes2007.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2008.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2014.codfw.wmnet', 'kubernetes2013.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2012.codfw.wmnet']

Change 605233 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] kubernetes: Add more nodes to the 2 clusters

https://gerrit.wikimedia.org/r/605233

Change 605233 merged by Alexandros Kosiaris:
[operations/puppet@production] kubernetes: Add more nodes to the 2 clusters

https://gerrit.wikimedia.org/r/605233

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['kubernetes2007.codfw.wmnet', 'kubernetes2008.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2012.codfw.wmnet', 'kubernetes2013.codfw.wmnet', 'kubernetes2014.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006121529_akosiaris_235519.log.

Completed auto-reimage of hosts:

['kubernetes2013.codfw.wmnet', 'kubernetes2012.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2014.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2008.codfw.wmnet']

Of which those FAILED:

['kubernetes2013.codfw.wmnet', 'kubernetes2012.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2014.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2008.codfw.wmnet']
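The launched and FAILED lists in the bot output above can be compared with a set difference. This is only a sketch: `not_reported_failed` is a hypothetical helper, and treating "launched but not listed as FAILED" as "did not fail" is an assumption about the tool's output, not documented behaviour.

```python
def not_reported_failed(launched: list[str], failed: list[str]) -> list[str]:
    """Return the launched hosts that do not appear in the FAILED list."""
    return sorted(set(launched) - set(failed))


# Host lists from the 202006121529 run above, rebuilt from their numbers.
launched = [f"kubernetes{n}.codfw.wmnet" for n in range(2007, 2015)]
failed = [f"kubernetes{n}.codfw.wmnet" for n in range(2008, 2015)]

remaining = not_reported_failed(launched, failed)
```

For this run the only host left over is kubernetes2007, which is consistent with the later comment that it reimaged successfully.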

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['kubernetes2008.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2012.codfw.wmnet', 'kubernetes2013.codfw.wmnet', 'kubernetes2014.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006140754_akosiaris_145692.log.

Change 605400 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] install: Switch kubernetes2004 to stretch

https://gerrit.wikimedia.org/r/605400

kubernetes2007 has been reimaged successfully. It seems that kubernetes2008 to kubernetes2014 require networking configuration on the switch side.

Netbox was alerting with

kubernetes2007 unexpected state for physical device: Planned in netbox

I changed it to active based on the comment above.

Change 605400 merged by Alexandros Kosiaris:
[operations/puppet@production] install: Switch kubernetes2014 to stretch

https://gerrit.wikimedia.org/r/605400

Change 605841 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Add IPv6 address to all kubernetes nodes

https://gerrit.wikimedia.org/r/605841

Change 605841 merged by Alexandros Kosiaris:
[operations/dns@master] Add IPv6 address to all kubernetes nodes

https://gerrit.wikimedia.org/r/605841

Change 605848 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/homer/public@master] Add kubernetes[12]007-kubernetes[12]014 to BGP

https://gerrit.wikimedia.org/r/605848

[edit interfaces interface-range vlan-private1-a-codfw]
     member ge-5/0/8 { ... }
+    member ge-6/0/13;
[edit interfaces interface-range disabled]
-    member ge-6/0/13;
[edit interfaces]
+   ge-6/0/13 {
+       description kubernetes2008;
+   }
[edit interfaces interface-range vlan-private1-b-codfw]
     member ge-1/0/22 { ... }
+    member ge-6/0/18;
+    member ge-6/0/21;
[edit interfaces interface-range disabled]
-    member ge-6/0/21;
-    member ge-6/0/18;
[edit interfaces]
+   ge-6/0/18 {
+       description kubernetes2009;
+   }
+   ge-6/0/21 {
+       description kubernetes2010;
+   }
[edit interfaces interface-range vlan-private1-c-codfw]
     member ge-3/0/2 { ... }
+    member ge-1/0/21;
+    member ge-1/0/22;
[edit interfaces interface-range disabled]
-    member ge-1/0/21;
-    member ge-1/0/22;
[edit interfaces]
+   ge-1/0/21 {
+       description kubernetes2011;
+   }
+   ge-1/0/22 {
+       description kubernetes2012;
+   }
[edit interfaces interface-range vlan-private1-d-codfw]
     member xe-7/0/0 { ... }
+    member ge-6/0/1;
+    member ge-6/0/2;
[edit interfaces interface-range disabled]
-    member ge-6/0/2;
-    member ge-6/0/1;
[edit interfaces]
+   ge-6/0/1 {
+       description kubernetes2013;
+   }
+   ge-6/0/2 {
+       description kubernetes2014;
+   }
[edit interfaces interface-range vlan-private1-b-codfw]
     member ge-6/0/21 { ... }
+    member ge-1/0/21;
[edit interfaces interface-range disabled]
-    member ge-1/0/21;
[edit interfaces]
+   ge-1/0/21 {
+       description kubestage2002;
+   }
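To cross-check a diff like the one above against the racking plan, the added port descriptions can be pulled out with a regular expression. This is a minimal sketch; `PORT_BLOCK` and `port_descriptions` are illustrative names, and `DIFF` is a short excerpt of the output above. It returns (port, description) pairs rather than a dictionary because the pasted diff spans several switches and the same port name can recur (ge-1/0/21 is used for both kubernetes2011 and kubestage2002).

```python
import re

# Excerpt of the switch diff above, limited to added interface blocks.
DIFF = """\
[edit interfaces]
+   ge-6/0/13 {
+       description kubernetes2008;
+   }
[edit interfaces]
+   ge-6/0/18 {
+       description kubernetes2009;
+   }
+   ge-6/0/21 {
+       description kubernetes2010;
+   }
"""

# An added interface block: "+ <port> {" followed by "+ description <host>;".
PORT_BLOCK = re.compile(
    r"^\+\s+(\S+) \{\n\+\s+description (\S+);", re.MULTILINE
)


def port_descriptions(diff: str) -> list[tuple[str, str]]:
    """Return (port, description) pairs for every added interface block."""
    return PORT_BLOCK.findall(diff)


pairs = port_descriptions(DIFF)
```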

@akosiaris switch configuration done for all servers.

Change 605848 merged by jenkins-bot:
[operations/homer/public@master] Add kubernetes[12]007-kubernetes[12]014 to BGP

https://gerrit.wikimedia.org/r/605848

Script wmf-auto-reimage was launched by akosiaris on cumin2001.codfw.wmnet for hosts:

['kubernetes2008.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2012.codfw.wmnet', 'kubernetes2013.codfw.wmnet', 'kubernetes2014.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006171245_akosiaris_25622.log.

Completed auto-reimage of hosts:

['kubernetes2013.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2008.codfw.wmnet', 'kubernetes2014.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2012.codfw.wmnet']

Of which those FAILED:

['kubernetes2013.codfw.wmnet', 'kubernetes2011.codfw.wmnet', 'kubernetes2008.codfw.wmnet', 'kubernetes2014.codfw.wmnet', 'kubernetes2010.codfw.wmnet', 'kubernetes2009.codfw.wmnet', 'kubernetes2012.codfw.wmnet']

Change 647728 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] kubestage2*: Assign role

https://gerrit.wikimedia.org/r/647728

Script wmf-auto-reimage was launched by akosiaris on cumin1001.eqiad.wmnet for hosts:

['kubestage2001.codfw.wmnet', 'kubestage2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202012110938_akosiaris_1276.log.

Completed auto-reimage of hosts:

['kubestage2002.codfw.wmnet']

Of which those FAILED:

['kubestage2001.codfw.wmnet']

Change 647728 merged by Alexandros Kosiaris:
[operations/puppet@production] kubestage2*: Assign role

https://gerrit.wikimedia.org/r/647728

akosiaris updated the task description.

And these hosts are finally in use now. Resolving this.