Allow to address Kubernetes API servers from NetworkPolicy
Open, High, Public

Description

Currently it's not possible to address Kubernetes API servers easily from NetworkPolicy, but starting with T287443 we now need to.

Right now we have the list of IPs hardcoded in various

The reason behind this is that Calico does not model the Endpoints of the k8s service kubernetes.default.svc.cluster.local as WorkloadEndpoints, as they are not backed by Pods. Calico 3.20 introduced the concept of service-based egress rules, and specifically mentions the ability to define rules for services not backed by pods.
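
For illustration, a service-based egress rule could look roughly like the sketch below. This is a minimal example and not the policy from any of the patches linked in this task: the policy name and namespace are hypothetical, and the `all()` selector is only a placeholder for whatever pods actually need API access. It assumes Calico 3.20 or later and the `projectcalico.org/v3` API.

```yaml
# Sketch only: name, namespace and selector are made-up placeholders.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-kubernetes-api-egress   # hypothetical name
  namespace: example-namespace        # hypothetical namespace
spec:
  # Apply to every pod in the namespace; narrow this down in practice.
  selector: all()
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        # Match the endpoints of the "kubernetes" service in "default",
        # even though they are not backed by pods.
        services:
          name: kubernetes
          namespace: default
```

With a rule of this shape the API server addresses are resolved from the service's endpoints, so they no longer need to be listed by hand in the chart values.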

The following charts/services will need to be refactored/adapted:

  • helmfile_istio-gateways.yaml
  • flink-kubernetes-operator
  • spark-operator
  • cert-manager
  • kserve
  • knative-serving

Event Timeline

JMeybohm triaged this task as Medium priority.Jul 27 2021, 4:00 PM
JMeybohm created this task.
JMeybohm renamed this task from Allow to address Kubernets API servers from Calico NetworkPolicy to Allow to address Kubernets API servers from NetworkPolicy.Nov 12 2021, 2:38 PM
JMeybohm updated the task description. (Show Details)
JMeybohm added a subscriber: elukey.
JMeybohm lowered the priority of this task from Medium to Low.Nov 18 2022, 4:02 PM
JMeybohm added a project: serviceops.
JMeybohm moved this task from Incoming 🐫 to ⎈Kubernetes on the serviceops board.

Change 895696 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] admin_ng/cert-manager: Remove dependency on kubernetesMasters.cidrs

https://gerrit.wikimedia.org/r/895696

I've added a CR leveraging the service selector in calico network policies: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/895696/
If that works, the other components using the manually defined kubernetesMasters.cidrs may follow:

  • helmfile_istio-gateways.yaml
  • flink-kubernetes-operator
  • knative-serving
  • kserve
  • rdf-streaming-updater (does not even use kubernetesMasters.cidrs, it has yet another copy of the master IPs)

Looking at the list, it seems as if that network policy should be part of a helm chart module.

kube-state-metrics successfully introduced the pattern of using a calico networkpolicy with service selector to match masters in https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/974158

JMeybohm raised the priority of this task from Low to High.Apr 16 2024, 11:00 AM

We should prioritize T353464: Migrate wikikube control planes to hardware nodes because of T358936: Kubernetes apiserver probe failures on restart. Would be nice to have this done to lower configuration overhead, so raising this as well.

Change #1025296 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] admin_ng/helmfile_istio-gateway: Remove dependency on kubernetesMasters.cidrs

https://gerrit.wikimedia.org/r/1025296

Change #895696 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng/cert-manager: Remove dependency on kubernetesMasters.cidrs

https://gerrit.wikimedia.org/r/895696

Change #1026495 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] admin_ng/cert-manager: Remove dependency on kubernetesMasters.cidrs

https://gerrit.wikimedia.org/r/1026495

Change #1026495 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng/cert-manager: Remove dependency on kubernetesMasters.cidrs

https://gerrit.wikimedia.org/r/1026495

I did deploy the cert-manager changes to aux, @brouberol did dse and @klausman will take care of ml clusters, thanks all

Change #1025296 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng/helmfile_istio-gateway: Remove dependency on kubernetesMasters.cidrs

https://gerrit.wikimedia.org/r/1025296

Changes to istio-gateways in all clusters have been deployed

Change #1029573 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] (WIP) flink-kubernetes-operator: Remove dependency on kubernetesMasters.cidrs

https://gerrit.wikimedia.org/r/1029573

@dcausse and I will deploy 1029573 tomorrow EU morning

Change #1029573 merged by jenkins-bot:

[operations/deployment-charts@master] flink-kubernetes-operator: Remove dependency on kubernetesMasters.cidrs

https://gerrit.wikimedia.org/r/1029573

Change #1031810 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] rdf-streaming-updater: Remove duplicate definition of k8s api-servers

https://gerrit.wikimedia.org/r/1031810

Change #1031811 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Remove kubernetesMasters definition from all wikikube values

https://gerrit.wikimedia.org/r/1031811

I will have a look at the spark-operator too. Thanks all for working on this.

Aklapper renamed this task from Allow to address Kubernets API servers from NetworkPolicy to Allow to address Kubernetes API servers from NetworkPolicy.Wed, May 15, 11:32 AM

Change #1031892 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] cirrus-streaming-updater: remove zk network policies

https://gerrit.wikimedia.org/r/1031892

Send a review my way if you want, cheers

Change #1031901 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Remove kubernetesMasters definition from staging-codfw

https://gerrit.wikimedia.org/r/1031901

Change #1031901 merged by JMeybohm:

[operations/deployment-charts@master] Remove kubernetesMasters definition from staging-codfw

https://gerrit.wikimedia.org/r/1031901

Change #1031593 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Remove kubernetesMasters definition from dse-k8s

https://gerrit.wikimedia.org/r/1031593

It turns out that this was simpler than expected because spark-operator is already using calico network policies to access the k8s API.

Now that the flink-operator has been updated, I think we can just remove the unused values: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1031593

Change #1031810 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater: Remove duplicate definition of k8s and zk

https://gerrit.wikimedia.org/r/1031810

Change #1031593 merged by Btullis:

[operations/deployment-charts@master] Remove kubernetesMasters definition from dse-k8s

https://gerrit.wikimedia.org/r/1031593

Change #1031811 merged by jenkins-bot:

[operations/deployment-charts@master] Remove kubernetesMasters definition from all wikikube values

https://gerrit.wikimedia.org/r/1031811

@klausman you can go ahead with kserve and knative-serving when you have some time, ping me for reviews etc

Will do! I'll likely get to it next week.