kube-apiserver needs to reach webhooks running inside the cluster
Closed, Resolved (Public)

Description

Istio comes with two webhooks by default:

  • A mutating webhook istio-sidecar-injector that we can potentially ignore as we don't use injection
  • A validating webhook istiod-istio-system that is used to validate Istio CRD objects

We cannot ignore the latter: installing Istio already triggers a validation request, and the kube-apiserver fails to call the service backing the webhook. This is the scenario described in T285927.
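
To make the failure mode concrete, here is a minimal sketch of how a webhook's service reference translates into the in-cluster address the kube-apiserver has to reach. The service name `istiod` and the path `/validate` are assumptions for illustration, not values taken from this task; the default port of 443 matches the Kubernetes default for webhook service references.

```python
from dataclasses import dataclass


@dataclass
class ServiceReference:
    """Simplified model of the service fields in a webhook's clientConfig."""
    name: str
    namespace: str
    path: str = ""
    port: int = 443  # Kubernetes defaults the webhook service port to 443


def webhook_url(ref: ServiceReference) -> str:
    """Build the in-cluster URL that conceptually backs the webhook call."""
    return f"https://{ref.name}.{ref.namespace}.svc:{ref.port}{ref.path}"


# Hypothetical values: the service name and path are assumptions.
istio_validation = ServiceReference(
    name="istiod", namespace="istio-system", path="/validate"
)
print(webhook_url(istio_validation))
```

The address behind that name is a ClusterIP, so a kube-apiserver running on a host without routes or iptables rules for ClusterIPs has no way to deliver the request.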

After chatting with @akosiaris we came up with four possible solutions:

1. Announce Kubernetes Service IPs via BGP (calico) so they are reachable from outside the cluster

This is what we currently have in staging-codfw as part of the work done in T238909.
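
As a rough illustration of what this option depends on, the sketch below checks whether a given Service IP falls inside one of the prefixes announced via BGP. The prefix and addresses are hypothetical placeholders, not the real staging-codfw ranges.

```python
import ipaddress
from typing import Iterable


def is_announced(ip: str, announced_prefixes: Iterable[str]) -> bool:
    """Return True if ip is covered by at least one announced prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p) for p in announced_prefixes)


# Hypothetical service CIDR; the real announced ranges are not part of this task.
announced = ["10.96.0.0/16"]

print(is_announced("10.96.12.34", announced))  # inside the announced prefix
print(is_announced("10.97.0.1", announced))    # outside the announced prefix
```

A host outside the cluster, such as a master that is not a node, can only reach ClusterIPs for which this kind of check would succeed on the routers.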

Pro:

  • No additional components needed on kubernetes masters
  • Shares the traffic flow with how service traffic would reach the cluster (we're not sure about this yet)

Con:

  • Depends on calico properly announcing Kubernetes Service IPs (which we have not fully implemented yet)
  • Needs --masquerade-all on nodes (which effectively hides the real client IP from Pods, see https://github.com/kubernetes/kubernetes/issues/24224)
  • Needs the Service IPs announced by calico to be highly available. This is being worked on upstream.

2. Make Kubernetes Masters (tainted) worker nodes

This is what @elukey has implemented for ml clusters in T285927.
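
The "tainted" part is what keeps regular workloads off the masters. The following is a simplified sketch of how the scheduler matches tolerations against taints; it is a toy model, not the scheduler's code, and the taint key used in the example is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key with operator Exists matches every key
    operator: str = "Equal"   # "Equal" or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def schedulable(tolerations: Iterable[Toleration], taints: Iterable[Taint]) -> bool:
    """A Pod fits only if every NoSchedule/NoExecute taint is tolerated."""
    tolerations = list(tolerations)
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


# Hypothetical taint key; the key actually set on the masters is an assumption.
master_taints = [Taint(key="node-role.kubernetes.io/master", effect="NoSchedule")]
control_plane_pod = [
    Toleration(key="node-role.kubernetes.io/master", operator="Exists", effect="NoSchedule")
]

print(schedulable([], master_taints))                 # ordinary Pod
print(schedulable(control_plane_pod, master_taints))  # Pod with a matching toleration
```

Only workloads that carry a matching toleration (the Istio control plane, for example) would be placed on the masters.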

Pro:

  • Shares the traffic flow with other intra-cluster traffic to ClusterIPs
  • Kubernetes masters are known to the Kubernetes API (e.g. we can control access via NetworkPolicies and run dedicated workloads on them - istio control plane for example)
  • No dependency on calico announcing Kubernetes Service IPs (like with 1.)

Con:

3. Run kube-proxy on Kubernetes Masters

Just run the kube-proxy process on Kubernetes Masters, essentially providing them with the needed iptables rules to reach ClusterIP services.
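
For context on what those iptables rules do: in iptables mode, kube-proxy (to my knowledge) writes one rule per endpoint and attaches a match probability to each so that traffic to a ClusterIP is spread evenly. The sketch below reproduces only that arithmetic; the function names are made up for illustration.

```python
from fractions import Fraction
from typing import List


def rule_probabilities(n_endpoints: int) -> List[Fraction]:
    """Probability attached to the i-th rule: 1/(n-i); the last rule always matches."""
    return [Fraction(1, n_endpoints - i) for i in range(n_endpoints)]


def effective_share(n_endpoints: int) -> List[Fraction]:
    """Share of traffic each endpoint receives when rules are evaluated in order."""
    shares = []
    remaining = Fraction(1)  # fraction of traffic that reaches the current rule
    for p in rule_probabilities(n_endpoints):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print(rule_probabilities(3))
print(effective_share(3))
```

Each endpoint ends up with an equal share even though the per-rule probabilities differ, because later rules only see the traffic that earlier rules did not match.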

Pro:

  • Shares the traffic flow with other intra-cluster traffic to ClusterIPs
  • Less additional components than 2.
  • No dependency on calico announcing Kubernetes Service IPs (like with 1.)

Con:

4. Work around this issue by disabling webhooks

With the outcome of T287007#7431081, this is no longer a viable option.

As we potentially won't use Istio CRDs to configure Ingress in the first place (see the Configuration part of T287007), we could try to work around this requirement by disabling or not deploying the webhooks at all. I'm not sure if that is possible, though.

Pro:

  • No additional components on the masters
  • No dependency on calico announcing Kubernetes Service IPs (like with 1.)
  • No dependency of kube-apiserver on istiod (which serves the webhooks)

Con:

  • Significant deviation from the standard Istio setup
  • We might have to revisit this problem/decision later (for things like OPA or other alternatives to PSPs: T273507)

We're going with option 2. Todos:

  • Migrate staging-eqiad
  • Migrate codfw
  • Migrate eqiad
  • Remove unused master.pp parameter profile::kubernetes::master::expose_puppet_certs

Details

Repo                          Branch      Lines +/-
operations/puppet             production  +5 -1
operations/puppet             production  +0 -18
operations/homer/public       master      +2 -0
operations/puppet             production  +38 -19
operations/puppet             production  +1 -1
operations/homer/public       master      +2 -0
operations/puppet             production  +88 -5
operations/puppet             production  +6 -0
operations/puppet             production  +6 -5
operations/puppet             production  +3 -1
labs/private                  master      +5 -0
operations/homer/public       master      +1 -0
operations/puppet             production  +22 -20
operations/deployment-charts  master      +19 -0
operations/deployment-charts  master      +2 -14
operations/puppet             production  +2 -2
operations/homer/public       master      +1 -0
operations/puppet             production  +69 -0
operations/puppet             production  +13 -34
labs/private                  master      +3 -2
operations/puppet             production  +15 -0
operations/puppet             production  +1 -0
operations/puppet             production  +14 -2
labs/private                  master      +1 -0

Event Timeline

Change 754003 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Migrate kube-scheduler away from insecure API

https://gerrit.wikimedia.org/r/754003

Change 754006 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[labs/private@master] Add profile::kubernetes::master::scheduler_token to staging

https://gerrit.wikimedia.org/r/754006

Change 754006 merged by JMeybohm:

[labs/private@master] Add profile::kubernetes::master::scheduler_token to staging

https://gerrit.wikimedia.org/r/754006

Change 754003 merged by JMeybohm:

[operations/puppet@production] Migrate kube-scheduler away from insecure API

https://gerrit.wikimedia.org/r/754003

Change 754462 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Add missing notify on kube-scheduler config change

https://gerrit.wikimedia.org/r/754462

Change 754462 merged by JMeybohm:

[operations/puppet@production] Add missing notify on kube-scheduler config change

https://gerrit.wikimedia.org/r/754462

Change 754514 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s-apiserver: Disable insecure API on systems that no longer need it

https://gerrit.wikimedia.org/r/754514

Change 754515 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Make disabled insecure API the default on kubernetes masters

https://gerrit.wikimedia.org/r/754515

Change 754514 merged by JMeybohm:

[operations/puppet@production] k8s-apiserver: Disable insecure API on systems that no longer need it

https://gerrit.wikimedia.org/r/754514

Change 754556 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Update codfw kubernetes master to a full node

https://gerrit.wikimedia.org/r/754556

Change 754945 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/homer/public@master] Add kubestagemaster2001 to k8s_staging iBGP config

https://gerrit.wikimedia.org/r/754945

Change 755389 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[labs/private@master] Add scheduler_token to all k8s masters

https://gerrit.wikimedia.org/r/755389

Change 755389 merged by JMeybohm:

[labs/private@master] Add scheduler_token to all k8s masters

https://gerrit.wikimedia.org/r/755389

Change 754515 merged by JMeybohm:

[operations/puppet@production] Make disabled insecure API the default on kubernetes masters

https://gerrit.wikimedia.org/r/754515

Mentioned in SAL (#wikimedia-operations) [2022-01-19T14:33:36Z] <jayme> disabled insecure API on all k8s masters - T290967

Change 754556 merged by JMeybohm:

[operations/puppet@production] Update codfw kubernetes master to a full node

https://gerrit.wikimedia.org/r/754556

Change 754945 merged by jenkins-bot:

[operations/homer/public@master] Add kubestagemaster2001 to k8s_staging eBGP config

https://gerrit.wikimedia.org/r/754945

Change 755698 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Fix nrpe_check_disk_options hiera key for kubernetes staging masters

https://gerrit.wikimedia.org/r/755698

Change 755698 merged by JMeybohm:

[operations/puppet@production] Fix nrpe_check_disk_options hiera key for kubernetes staging masters

https://gerrit.wikimedia.org/r/755698

Change 755920 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Remove the hacks to around masquerade-all

https://gerrit.wikimedia.org/r/755920

Change 755920 merged by jenkins-bot:

[operations/deployment-charts@master] Remove the hacks around masquerade-all

https://gerrit.wikimedia.org/r/755920

Change 755924 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Add master IPs to main/wikikube clusters

https://gerrit.wikimedia.org/r/755924

Change 755924 merged by jenkins-bot:

[operations/deployment-charts@master] Add master IPs to main/wikikube clusters

https://gerrit.wikimedia.org/r/755924

Change 755977 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Upgrade staging-eqiad kubernetes master to a full node

https://gerrit.wikimedia.org/r/755977

Change 755978 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/homer/public@master] Add kubestagemaster1001 to k8s_staging eBGP config

https://gerrit.wikimedia.org/r/755978

Change 755977 merged by JMeybohm:

[operations/puppet@production] Upgrade staging-eqiad kubernetes master to a full node

https://gerrit.wikimedia.org/r/755977

Change 755978 merged by jenkins-bot:

[operations/homer/public@master] Add kubestagemaster1001 to k8s_staging eBGP config

https://gerrit.wikimedia.org/r/755978

Mentioned in SAL (#wikimedia-operations) [2022-01-25T08:32:43Z] <jayme> kubernetes staging migrated tainted worker node setup - T290967

Change 757407 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Enable IPv6DualStack for kubelet on staging masters

https://gerrit.wikimedia.org/r/757407

Change 757408 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Split profile::kubernetes::master_hosts by DC

https://gerrit.wikimedia.org/r/757408

Change 757433 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Enable overlayfs on kubernetes masters

https://gerrit.wikimedia.org/r/757433

Change 757434 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Upgrade codfw kubernetes masters to tainted full nodes

https://gerrit.wikimedia.org/r/757434

Change 757437 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/homer/public@master] Add k8s masters in codfw eBGP config

https://gerrit.wikimedia.org/r/757437

Change 757438 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/homer/public@master] Add k8s masters in eqiad eBGP config

https://gerrit.wikimedia.org/r/757438

Change 757441 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[labs/private@master] Add keys needed for k8s node profile to main master nodes

https://gerrit.wikimedia.org/r/757441

Change 757441 merged by JMeybohm:

[labs/private@master] Add keys needed for k8s node profile to main master nodes

https://gerrit.wikimedia.org/r/757441

Change 757407 merged by JMeybohm:

[operations/puppet@production] Enable IPv6DualStack for kubelet on staging masters

https://gerrit.wikimedia.org/r/757407

Change 757408 merged by JMeybohm:

[operations/puppet@production] Split profile::kubernetes::master_hosts by DC

https://gerrit.wikimedia.org/r/757408

Change 757433 merged by JMeybohm:

[operations/puppet@production] Enable overlayfs on kubernetes masters

https://gerrit.wikimedia.org/r/757433

Change 757615 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Upgrade eqiad kubernetes masters to tainted full nodes

https://gerrit.wikimedia.org/r/757615

Change 757631 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes::master: Remove expose_puppet_certs parameter

https://gerrit.wikimedia.org/r/757631

Change 757434 merged by JMeybohm:

[operations/puppet@production] Upgrade codfw kubernetes masters to tainted full nodes

https://gerrit.wikimedia.org/r/757434

Host rebooted by jayme@cumin1001 with reason: cgroup_enable=memory after docker install

Change 757437 merged by jenkins-bot:

[operations/homer/public@master] Add k8s masters in codfw eBGP config

https://gerrit.wikimedia.org/r/757437

Host rebooted by jayme@cumin1001 with reason: cgroup_enable=memory after docker install

Change 757658 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Fix nrpe_check_disk_options hiera key for kubernetes masters

https://gerrit.wikimedia.org/r/757658

Change 757658 merged by JMeybohm:

[operations/puppet@production] Fix nrpe_check_disk_options hiera key for kubernetes masters

https://gerrit.wikimedia.org/r/757658

Change 757615 merged by JMeybohm:

[operations/puppet@production] Upgrade eqiad kubernetes masters to tainted full nodes

https://gerrit.wikimedia.org/r/757615

Host rebooted by jayme@cumin1001 with reason: cgroup_enable=memory after docker install

Change 757438 merged by jenkins-bot:

[operations/homer/public@master] Add k8s masters in eqiad eBGP config

https://gerrit.wikimedia.org/r/757438

Host rebooted by jayme@cumin1001 with reason: cgroup_enable=memory after docker install

Change 757631 merged by JMeybohm:

[operations/puppet@production] kubernetes::master: Remove expose_puppet_certs parameter

https://gerrit.wikimedia.org/r/757631

All control planes have been migrated. I've also updated the docs at https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/New to make this the new default.

Change 759741 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Add label node-role.kubernetes.io/master to masters

https://gerrit.wikimedia.org/r/759741

Change 759741 merged by JMeybohm:

[operations/puppet@production] Add label node-role.kubernetes.io/master to masters

https://gerrit.wikimedia.org/r/759741