rack/setup/install kubestage100[12]
Closed, Resolved (Public)

Description

This task will track the receiving, racking, and setup of the two new kubernetes staging hosts ordered for eqiad on T163459.

Hostname proposal: There are already kubernetes1XXX systems in eqiad, as well as the conf1XXX systems running conftool for kubernetes. Staging and development are often done on the same box; would it be acceptable to call these kubernetes-dev1XXX?

Racking proposal: Since none of these boxes currently exist in eqiad, the current suggestion is to place them in two different racks/rows, but otherwise in any available 1Gbps network racks.

kubestage1001:

  • - receive in system on procurement task T163459
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - production dns entries added (internal subnet)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/#/c/357890/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

kubestage1002:

  • - receive in system on procurement task T163459
  • - rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - production dns entries added (internal subnet)
  • - network port setup (description, enable, internal vlan)
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/#/c/357890/
  • - OS installation
  • - puppet/salt accept/initial run
  • - handoff for service implementation

Event Timeline

Since this initially was requested by Alex, I've assigned it to him for feedback on the hostname and racking proposals. Please provide feedback, and assign to @Cmjohnson for followup.

Thanks!

Actually conf1XXX are for conftool, NOT kubernetes. etcd1XXX are for kubernetes, but are going to be renamed to kubetcd1XXX (codfw is already naming them like that) to have the functionality more clearly designated in the hostname.

Now as to the actual naming proposal, I'd rather we did not do kubernetes-dev1XXX, as staging and development are going to be separate in this. Staging in the context we are aiming for here is the last step right before deployment to production. Also, the staging cluster is going to be the one integrated with CI (at least in the current vision). I'd prefer a hostnaming scheme of kubernetes-staging1XXX to more clearly designate that.

As far as the racking proposal goes, it sounds fine to me. Note that since this is the staging cluster we have way more leeway as it will not be critical in any way, but still it would be nice if it could survive a rack row failure.

@akosiaris Would it be possible to shorten kubernetes-staging1XXX? I can't fit all of that on a label. If you prefer it, that's fine; I will abbreviate it on the server itself. It seems like a lot to type, imho. Maybe kstage1xxx or staging1xxx?

Also, I am assuming you want these in 2 separate rows?

kubestage1xxx? It's one letter shorter than kubernetes1xxx, which is what the production boxes use, and still conveys the meaning IMHO.

And yes if possible 2 separate rows please.
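The "survive a rack row failure" requirement can be checked mechanically. The following is a minimal sketch, assuming a hypothetical mapping from hostname to row; the row letters below are placeholders for illustration, not the actual racking locations.

```python
from collections import Counter


def survives_row_failure(placement: dict[str, str]) -> bool:
    """Return True if losing any single row still leaves at least one host up.

    `placement` maps hostname -> row name. Both the helper and the row
    names used below are hypothetical, for illustration only.
    """
    if not placement:
        return False
    hosts_per_row = Counter(placement.values())
    # A row failure takes out every host in that row, so no single row
    # may contain all of the hosts.
    return all(count < len(placement) for count in hosts_per_row.values())


# Placeholder rows "A" and "C"; the task does not state the final locations.
print(survives_row_failure({"kubestage1001": "A", "kubestage1002": "C"}))
print(survives_row_failure({"kubestage1001": "A", "kubestage1002": "A"}))
```

With two hosts, this reduces to "the hosts must not share a row", which is what the racking proposal asks for.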

Change 355794 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns entries for kubestage1001/2 T166264

https://gerrit.wikimedia.org/r/355794

Change 355794 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns entries for kubestage1001/2 T166264

https://gerrit.wikimedia.org/r/355794

Change 357860 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding mac addresses to dhcpd file for several systems, wtp1025-1046, stat1005-1006, ganeti1005-1008, labvirt1015-1018, dumpsdata1001-1002, kubestage1001-1002, analytics1069 task #'s T165173 T165366 T166264 T165531 T165368 T165520 T162216 T166076

https://gerrit.wikimedia.org/r/357860

Change 357860 merged by Cmjohnson:
[operations/puppet@production] Adding mac addresses to dhcpd file for several systems, wtp1025-1046, stat1005-1006, ganeti1005-1008, labvirt1015-1018, dumpsdata1001-1002, kubestage1001-1002, analytics1069 task #'s T165173 T165366 T166264 T165531 T165368 T165520 T162216 T166076

https://gerrit.wikimedia.org/r/357860

Change 357870 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding production dns for several new servers, wtp1025-48, ganeti1005-1008, kubestage1001/1002, dumpsdata1001/2, labvirt1015-18 T165173 T166264 T165531 T165520 T162216 T166076

https://gerrit.wikimedia.org/r/357870

Change 357870 merged by Cmjohnson:
[operations/dns@master] Adding production dns for several new servers, wtp1025-48, ganeti1005-1008, kubestage1001/1002, dumpsdata1001/2, labvirt1015-18 and stat1005/6 T165366 T165368 T165173 T166264 T165531 T165520 T162216 T166076

https://gerrit.wikimedia.org/r/357870

Cmjohnson updated the task description.
Cmjohnson added a subscriber: akosiaris.

@RobH added mac address already

RobH renamed this task from rack/setup/instal (2)l kubernetes staging hosts to rack/setup/install kubestage100[12]. Jun 8 2017, 7:06 PM
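The new title uses the bracket shorthand kubestage100[12] for the two hosts kubestage1001 and kubestage1002. As a rough sketch of how that shorthand expands (the helper name `expand_hosts` is hypothetical, and it only handles one bracket group of single-digit alternatives, not ranges such as [1-3]):

```python
import re

# prefix, one bracket group containing only digits, suffix
_BRACKET = re.compile(r"([^\[\]]*)\[(\d+)\]([^\[\]]*)")


def expand_hosts(pattern: str) -> list[str]:
    """Expand 'kubestage100[12]' into one hostname per bracketed digit."""
    if "[" not in pattern and "]" not in pattern:
        return [pattern]  # plain hostname, nothing to expand
    match = _BRACKET.fullmatch(pattern)
    if match is None:
        raise ValueError(f"unsupported host pattern: {pattern!r}")
    prefix, digits, suffix = match.groups()
    return [f"{prefix}{digit}{suffix}" for digit in digits]


print(expand_hosts("kubestage100[12]"))
# ['kubestage1001', 'kubestage1002']
```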

Change 357874 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setting install params for kubestage100[12]

https://gerrit.wikimedia.org/r/357874

Change 357874 merged by RobH:
[operations/puppet@production] setting install params for kubestage100[12]

https://gerrit.wikimedia.org/r/357874

Change 357879 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] Revert "Adding mac addresses to dhcpd file for several systems, wtp1025-1046, stat1005-1006, ganeti1005-1008, labvirt1015-1018, dumpsdata1001-1002, kubestage1001-1002, analytics1069 task #'s T165173 T165366 T166264 T165531 T165368 T165520 T162216 T166076"

https://gerrit.wikimedia.org/r/357879

Change 357879 abandoned by RobH:
Revert "Adding mac addresses to dhcpd file for several systems, wtp1025-1046, stat1005-1006, ganeti1005-1008, labvirt1015-1018, dumpsdata1001-1002, kubestage1001-1002, analytics1069 task #'s T165173 T165366 T166264 T165531 T165368 T165520 T162216 T166076"

https://gerrit.wikimedia.org/r/357879

Ok, that large patchset had some issues that borked up dhcp. Rather than try to find the issues in the large one, we reverted it and will make smaller patchsets for individual server/task installs (not combining tasks across a patchset).

Change 357890 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setting kubestage100[12] install params

https://gerrit.wikimedia.org/r/357890

Change 357890 merged by RobH:
[operations/puppet@production] setting kubestage100[12] install params

https://gerrit.wikimedia.org/r/357890

akosiaris updated the task description.

Hosts are up and running. Taking over service implementation in T162045.

Change 359133 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] kubestage: Set the correct partman recipe

https://gerrit.wikimedia.org/r/359133

Change 359133 merged by Alexandros Kosiaris:
[operations/puppet@production] kubestage: Set the correct partman recipe

https://gerrit.wikimedia.org/r/359133

Change 403310 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Use built GUI for distribution package

https://gerrit.wikimedia.org/r/403310