This is the follow-up task for T287007.
Changes already merged in this context:
- custom_deploy: Add istio manifest for main clusters
- admin_ng: Support managing of system namespaces with helmfile
- admin_ng/main: Create istio-system namespace
Those changes, together with a WIP one, made it possible to install Istio on staging-codfw in general, but there are still some open questions/things to fix:
- Decide how we want the kube-apiserver to reach webhooks running inside of the cluster, see: T290967
- Figure out how to deal with the internal CA that Istio manages. It is by default used to secure communication with istiod as well as to establish trust between the Ingress-Gateway and services.
  - We can leave that alone, as it is fully managed by Istio itself and only used for istiod<->istio-ingressgateway communication in our setup
- Make the Ingress-Gateway trust Puppet-CA (e.g. tls-proxy) certificates: 730591
- Make Prometheus scrape istiod and the Ingress-Gateway (see the scrape config sketch after this list)
- Decide on how we want to run the Ingress-Gateway and ultimately how we want PyBal to healthcheck it/the k8s nodes. See "On running the Istio-Ingressgateway"
  - We will run the ingressgateway as a DaemonSet with an externalTrafficPolicy: Local Service in front
- Provision a default ingress gateway for staging clusters (serving staging.svc.<DC>.discovery.wmnet)
  - Nothing we can easily do without changing the HTTP routing compared to production.
- Create an active/passive LVS for staging and make it accessible: T300740
- Implement something to provision k8s Secret objects (in the istio-system namespace) for service certificates (currently generated via cergen): T294560 (see the Secret sketch after this list)
- Write docs and hold a training session for SRE (https://wikitech.wikimedia.org/wiki/Kubernetes/Ingress)
- Deploy all the things to wikikube clusters
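Regarding the Prometheus item above: istiod exposes its metrics on port 15014 (/metrics) and the gateway Pods expose Envoy metrics on port 15090 (/stats/prometheus); those are Istio defaults. A minimal, illustrative scrape config sketch (job names and relabeling are made up here; the real setup would go through our usual Prometheus Kubernetes service discovery config in Puppet):

```yaml
# Illustrative scrape jobs only: ports/paths are Istio defaults,
# job names and relabeling are made up for this sketch.
scrape_configs:
  - job_name: istiod
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [istio-system]
    relabel_configs:
      # Keep only istiod Pods; istiod serves /metrics on 15014.
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: istiod
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: $1:15014
  - job_name: istio-ingressgateway
    metrics_path: /stats/prometheus
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [istio-system]
    relabel_configs:
      # Keep only gateway Pods; Envoy serves /stats/prometheus on 15090.
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: istio-ingressgateway
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: $1:15090
```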
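Regarding the Secret provisioning item: the objects to provision would presumably be plain kubernetes.io/tls Secrets in istio-system, which Istio Gateway resources can reference via credentialName. A sketch with placeholder name and certificate material:

```yaml
# Sketch of a TLS Secret as the Ingressgateway would consume it
# (referenced as credentialName in a Gateway resource);
# the name and certificate material are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: foo-tls-certificate   # hypothetical name
  namespace: istio-system
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTi...  # base64 certificate (truncated placeholder)
  tls.key: LS0tLS1CRUdJTi...  # base64 private key (truncated placeholder)
```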
I'm keeping some additional, unordered notes at https://wikitech.wikimedia.org/wiki/User:JMeybohm/Kubernetes/Ingress
On running the Istio-Ingressgateway
Regardless of the way we'll be deploying the ingressgateway, connections to it will happen via LVS -> NodePort. See what the ML team did.
We can use the PyBal IdleConnection monitor; ProxyFetch is not an option, as the Ingressgateway HTTP health endpoint is exposed on a dedicated port and PyBal can only do ProxyFetch on the service port (not a different one).
We could potentially patch PyBal to allow a different port (maybe per ProxyFetch URL) as well [1]
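For context: the dedicated health port in question is the gateway's status port (15021 by default in current Istio), serving /healthz/ready; the gateway Pod's own readinessProbe uses the same endpoint. A sketch, assuming Istio defaults:

```yaml
# The gateway Pod's own readiness check (Istio defaults); an external
# health checker would have to fetch the same URL via a NodePort
# mapped to 15021.
readinessProbe:
  httpGet:
    path: /healthz/ready
    port: 15021
  initialDelaySeconds: 1
  periodSeconds: 2
```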
Autoscaling
By default, Istio configures the Ingressgateway Deployment (and the control plane) with autoscaling enabled (an HPA on targetAverageUtilization).
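For illustration, this is roughly the HPA object the upstream defaults render (API version as used by Istio at the time; min/max and the 80% target are chart defaults, not values we have chosen):

```yaml
# Sketch based on upstream chart defaults, not our configuration.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: istio-ingressgateway
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 80
```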
Pro
- Run only as many ingressgateways as we need (potentially)
Con
- Potential extra network hop from one Node to another (the one running an ingressgateway Pod)
- PyBal can't differentiate between Ingressgateway down and Node down. If no ingressgateway is available, the NodePort won't accept connections and PyBal would see all Nodes as down (not sure if that's actually a problem).
- We have no experience with HPA
Daemonset
In this scenario we run the Ingressgateway as a DaemonSet (i.e. on each Node) and set its Service's externalTrafficPolicy=Local (this ensures a connection to a Node's NodePort will be answered by the Ingressgateway Pod on the same Node).
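A minimal sketch of the Service side of this setup; name and selector match the stock chart, the NodePort numbers are hypothetical:

```yaml
# Sketch of the gateway Service; the NodePort numbers are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  type: NodePort
  # Only answer on a Node's NodePort if a gateway Pod runs on that Node.
  externalTrafficPolicy: Local
  selector:
    app: istio-ingressgateway
  ports:
    - name: https
      port: 443
      targetPort: 8443
      nodePort: 30443   # hypothetical; what LVS would target
    - name: status-port
      port: 15021
      targetPort: 15021
      nodePort: 31021   # hypothetical; health endpoint for external checks
```

With externalTrafficPolicy: Local, kube-proxy never forwards to Pods on other Nodes, which is what makes checking a NodePort equivalent to checking that Node's gateway Pod.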
Pro
- No extra network hop after LVS -> NodePort (traffic is always handled by the local Ingressgateway Pod)
- Health checking an Ingressgateway effectively health checks its Node as well (in contrast to some Ingressgateway potentially running on a different Node)
Con
- Waste of resources, as we run one Ingressgateway per Node (we would need to figure out how much that is; more Gateways will also add more load to the Control Plane)