
Toolforge: new k8s: initial build of the new kubernetes cluster
Open, Needs Triage, Public

Description

Following the docs at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Deploying_k8s

  • For workers, include the k8s keyword in the name, followed by a number without zero-padding: tools-k8s-worker-1, tools-k8s-worker-12
  • For control servers, introduce a new puppet prefix for control nodes, something like tools-k8s-control-1, tools-k8s-control-3
  • For haproxy servers, use tools-k8s-haproxy-x
  • For etcd servers, we are already using tools-k8s-etcd-x, so we will introduce a puppet switch: on jessie, apply the old role; on buster, apply the new role
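
As a rough illustration of the naming scheme and the etcd switch described above, here is a minimal Python sketch. The regex, the function names and the "old"/"new" labels are assumptions made for this example; the real switch lives in puppet (see the Gerrit change in the timeline) and the task does not name the old role.

```python
import re

# Component prefixes come from the task description; the regex is an assumption.
# It accepts only unpadded numbers, so "-1" and "-12" match but "-01" does not.
HOSTNAME_RE = re.compile(r"^tools-k8s-(worker|control|haproxy|etcd)-([1-9][0-9]*)$")


def parse_hostname(name):
    """Return (component, index) for a new-cluster hostname, or None."""
    match = HOSTNAME_RE.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def etcd_role_variant(codename):
    """Mirror the planned switch: jessie keeps the old role, buster gets the new one.

    The return values are plain labels, not real puppet role names.
    """
    if codename == "jessie":
        return "old"
    if codename == "buster":
        return "new"
    raise ValueError(f"unexpected Debian codename: {codename}")
```

For example, `parse_hostname("tools-k8s-worker-12")` yields `("worker", 12)`, while a zero-padded name such as `tools-k8s-etcd-04` is rejected under this reading of the convention.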

Event Timeline

aborrero created this task. · Tue, Oct 29, 5:56 PM
Restricted Application added a subscriber: Aklapper. · Tue, Oct 29, 5:56 PM

Change 546995 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] toolforge: k8s: introduce switch for etcd role

https://gerrit.wikimedia.org/r/546995

Test this first in toolsbeta!

Note for @aborrero: some hiera keys can be improved for the new k8s cluster.

I tested this in toolsbeta. Applied the role::wmcs::toolforge::k8s::etcd role to the toolsbeta-k8s-etcd- puppet prefix:

aborrero@toolsbeta-k8s-etcd-01:~$ sudo puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for toolsbeta-k8s-etcd-01.toolsbeta.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1572528732'
Notice: The LDAP client stack for this host is: classic/sudoldap
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: classic/sudoldap'
Notice: /Stage[main]/Role::Wmcs::Toolforge::K8s::Etcd/System::Role[role::wmcs::toolforge::k8s::etcd]/Motd::Script[role-wmcs::toolforge::k8s::etcd]/File[/etc/update-motd.d/05-role-wmcs--toolforge--k8s--etcd]/ensure: defined content as '{md5}87ea45c76121f03d6070a5f5998ef7f3'
Notice: Applied catalog in 22.54 seconds

Change 546995 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] toolforge: k8s: introduce switch for etcd role

https://gerrit.wikimedia.org/r/546995

Mentioned in SAL (#wikimedia-cloud) [2019-10-31T13:59:31Z] <arturo> update puppet prefix tools-k8s-etcd- to use the role::wmcs::toolforge::k8s::etcd T236826

Mentioned in SAL (#wikimedia-cloud) [2019-11-05T13:55:28Z] <arturo> created 3 new VMs: tools-k8s-etcd-[4,5,6] T236826

aborrero updated the task description. · Tue, Nov 5, 3:41 PM

Mentioned in SAL (#wikimedia-cloud) [2019-11-06T13:43:16Z] <arturo> created tools-k8s-control puppet prefix T236826

Mentioned in SAL (#wikimedia-cloud) [2019-11-06T13:50:56Z] <arturo> created 3 VMs `tools-k8s-control-[1,2,3]` (T236826)

Mentioned in SAL (#wikimedia-cloud) [2019-11-06T13:51:32Z] <arturo> created FQDN k8s.tools.eqiad1.wikimedia.cloud pointing to tools-k8s-control-1 for the initial bootstrap (T236826)

Note for later: we should investigate whether this warning matters for us:

root@tools-k8s-control-1:~# kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs
[init] Using Kubernetes version: v1.15.1
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.4. Latest validated version: 18.09
[...]

cc @Bstorm
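
To make the warning above concrete, here is a small Python sketch of the kind of comparison involved: the installed Docker version (19.03.4) is newer than the latest validated one (18.09). This is not kubeadm's actual implementation; the function names and the prefix-comparison rule are assumptions for illustration only.

```python
def parse_version(text):
    """Turn '19.03.4' into (19, 3, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def newer_than_validated(installed, latest_validated):
    """Return True when the installed version is beyond the latest validated release.

    Only as many components as latest_validated provides are compared,
    so a patch release such as '18.09.9' is not flagged against '18.09'.
    """
    wanted = parse_version(latest_validated)
    have = parse_version(installed)[: len(wanted)]
    return have > wanted
```

With the values from the transcript, `newer_than_validated("19.03.4", "18.09")` is true, which matches the warning shown above.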

Mentioned in SAL (#wikimedia-cloud) [2019-11-06T16:10:36Z] <arturo> new k8s cluster control nodes are bootstrapped (T236826)

Mentioned in SAL (#wikimedia-cloud) [2019-11-07T11:43:08Z] <arturo> create puppet prefix tools-k8s-haproxy T236826

Mentioned in SAL (#wikimedia-cloud) [2019-11-07T11:43:25Z] <arturo> create VMs tools-k8s-haproxy-[1,2] T236826

Mentioned in SAL (#wikimedia-cloud) [2019-11-07T11:54:10Z] <arturo> point k8s.tools.eqiad1.wikimedia.cloud to tools-k8s-haproxy-1 T236826

I increased the project quotas to be able to continue with this task; see T237633: Request increased quota for tools Cloud VPS project

Mentioned in SAL (#wikimedia-cloud) [2019-11-07T13:01:02Z] <arturo> creating puppet prefix tools-k8s-worker and a couple of VMs tools-k8s-worker-[1,2] T236826

Mentioned in SAL (#wikimedia-cloud) [2019-11-07T13:27:16Z] <arturo> deployed registry-admission-webhook and ingress-admission-controller into the new k8s cluster (T236826)

The initial build has been completed: 2 haproxy nodes, 3 control nodes, 2 worker nodes, 3 etcd nodes.
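
As a cross-check of that summary, the bracket notation used in the SAL entries above can be expanded and tallied with a short Python sketch. The `expand` helper and the `SAL_PATTERNS` list are assumptions written for this example; the patterns themselves are copied from the log entries in this task.

```python
import re
from collections import Counter


def expand(pattern):
    """Expand 'prefix-[1,2,3]' into ['prefix-1', 'prefix-2', 'prefix-3']."""
    match = re.fullmatch(r"(.*)\[([0-9,]+)\]", pattern)
    if match is None:
        # No bracket list present: treat the pattern as a single hostname.
        return [pattern]
    prefix, items = match.groups()
    return [prefix + item for item in items.split(",")]


# VM name patterns as they appear in the SAL entries of this task.
SAL_PATTERNS = [
    "tools-k8s-etcd-[4,5,6]",
    "tools-k8s-control-[1,2,3]",
    "tools-k8s-haproxy-[1,2]",
    "tools-k8s-worker-[1,2]",
]

hosts = [host for pattern in SAL_PATTERNS for host in expand(pattern)]

# The third dash-separated field is the component (etcd, control, ...).
counts = Counter(host.split("-")[2] for host in hosts)
```

Here `counts` comes out as 3 etcd, 3 control, 2 haproxy and 2 worker nodes, in line with the summary.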