
Evaluate istio as an ingress for production usage
Closed, Resolved (Public)

Description

While we don't use istio as a full service mesh, we make ample use of it as a gateway in the ml cluster for knative. So, even if istio would be too complex as a mere ingress on its own, given we're already using it we should indeed evaluate it as an alternative ingress.

Specifically, for each ingress we're answering the following questions:

  • What is the general architecture?
  • How can we deploy it on bare metal?
  • Do we need to build and maintain docker images ourselves?
  • How can it be configured to proxy various services with easy parametrization? Three different ways, actually; see Configuration below.
  • How do we operate it?
  • Is it easy to collect metrics?

    Metrics can easily be collected with prometheus - in fact, istio ships with the correct annotations and thus should easily be picked up by our prometheus without adding any new rule.
  • How do we collect logs?

    Access logs and other logs are easy to collect as well. See:

Thankfully, given we already use istio for ML, we know the answer to at least the first few questions.

Configuration

There are three ways to configure istio-ingressgateway(s), with different levels of granularity and different feature sets:

Kubernetes Ingress (Simple)

https://istio.io/v1.9/docs/tasks/traffic-management/ingress/kubernetes-ingress/

  • Uses Kubernetes default Ingress objects for configuration: https://people.wikimedia.org/~jayme/k8s-docs/v1.16/docs/concepts/services-networking/ingress/
  • But supports sharing of hostnames between namespaces (due to the central nature of istio-ingressgateway)
  • L7 (HTTP(S)) only
  • TLS certificates need to be placed in the namespace of istio-ingressgateway
  • Just plain host and path prefix matching
  • No advanced features like weight, header matching
  • No "role model" or binding restrictions (e.g. allowing namespaces to use a specific hostname only, enforce default policies ...)
  • Least number of API objects involved
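
For illustration, a rough sketch of what this looks like (hostname and service name are made up; the v1beta1 Ingress schema matches the 1.16-era kubernetes docs linked above):

  apiVersion: networking.k8s.io/v1beta1
  kind: Ingress
  metadata:
    name: foo
    namespace: foo
    annotations:
      kubernetes.io/ingress.class: istio
  spec:
    rules:
      - host: foo.example.org
        http:
          paths:
            - path: /foo            # plain prefix matching, also matches /foo/bar
              backend:
                serviceName: foo
                servicePort: 8080

Note the tls block is omitted; per the above, the referenced secret would have to live in the istio-ingressgateway namespace, not alongside the Ingress.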

Kubernetes Gateway API (Medium)

https://istio.io/v1.9/docs/tasks/traffic-management/ingress/gateway-api/

  • Uses the "new standard" Kubernetes API which is deemed the successor of Ingress
  • L7 and L4 routing
  • TLS certificates need to be placed in the namespace of istio-ingressgateway
  • Host and path (prefix and exact) matching
  • Advanced features like weight, header matching and modification,...
  • "Role model" or binding restrictions (e.g. allowing namespaces to use a specific hostname only, enforce default policies ...).
  • API still in v1alpha1 (3rd release), soon to be v1alpha2
  • Needs to be added (CRD) to the cluster
  • May be subject to breaking changes in the future (not completely clear to me, but I think we should assume so)
  • Already implemented by some other popular ingress controllers (like ambassador, contour, traefik and HAProxy)
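
For comparison, a minimal sketch of the same routing via the Gateway API (names and namespaces are hypothetical, and the exact apiVersion and field names depend on which alpha/beta release of the CRDs is installed, so treat this as illustrative only):

  apiVersion: gateway.networking.k8s.io/v1beta1
  kind: HTTPRoute
  metadata:
    name: foo
    namespace: foo                  # the route lives with the service ...
  spec:
    parentRefs:
      - name: istio-gateway         # ... and attaches to a central gateway,
        namespace: istio-system     # which can restrict who may bind ("role model")
    hostnames:
      - foo.example.org
    rules:
      - matches:
          - path:
              type: Exact           # exact matching, unlike plain Ingress
              value: /foo
        backendRefs:
          - name: foo
            port: 8080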

Ingress Gateways (Advanced)

https://istio.io/v1.9/docs/tasks/traffic-management/ingress/ingress-control/

  • Uses Istio specific API
  • L7 and L4 routing
  • TLS certificates can be placed anywhere
  • Host and path (prefix and exact) matching
  • No "role model" or binding restrictions (e.g. allowing namespaces to use a specific hostname only, enforce default policies ...).
  • Advanced features like weight, header matching and modification,...
  • Very advanced features like fault injection, circuit breaking, mirroring etc.
  • Needs to be added (CRD) to the cluster
  • Specific to istio
  • High number of API objects involved
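
And a sketch with the istio CRDs (again with made-up names), including exact matching and a weighted split as an example of the advanced features:

  apiVersion: networking.istio.io/v1alpha3
  kind: Gateway
  metadata:
    name: foo-gateway
    namespace: foo
  spec:
    selector:
      istio: ingressgateway         # bind to the shared istio-ingressgateway pods
    servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
          - foo.example.org
  ---
  apiVersion: networking.istio.io/v1alpha3
  kind: VirtualService
  metadata:
    name: foo
    namespace: foo
  spec:
    hosts:
      - foo.example.org
    gateways:
      - foo-gateway
    http:
      - match:
          - uri:
              exact: /healthz       # exact match
        route:
          - destination:
              host: foo
              port:
                number: 8080
      - route:                      # default route, weighted between two backends
          - destination:
              host: foo
              port:
                number: 8080
            weight: 90
          - destination:
              host: foo-canary
              port:
                number: 8080
            weight: 10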

Event Timeline

Istio can be configured with native ingress resources, using the annotation:

kubernetes.io/ingress.class: istio

see https://istio.io/latest/docs/tasks/traffic-management/ingress/kubernetes-ingress/

The other quirk is that any secret referenced by the ingress will need to be located in the istio namespace (which isn't that surprising, but a deviation from the norm, and also means we need root intervention whenever we want to add a TLS-terminated ingress).
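
Concretely, that means a secret along these lines (name, namespace and the base64 payloads are illustrative; the namespace just has to be the one istio-ingressgateway runs in):

  apiVersion: v1
  kind: Secret
  metadata:
    name: foo-example-org-tls
    namespace: istio-system     # must be where istio-ingressgateway runs
  type: kubernetes.io/tls
  data:
    tls.crt: <base64-encoded cert>
    tls.key: <base64-encoded key>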

Of course, istio also offers its own custom resource definitions for a richer configuration:

the istio gateway (https://istio.io/latest/docs/reference/config/networking/gateway/) provides what looks like a simplified configuration layer for envoy, more or less, using two main types of CRDs:

  • Gateway to expose the ports we want to listen to
  • VirtualService to indicate the routing prefixes

but I don't think we really need it at this point in time; we might as well wait for the new Gateway API in newer kubernetes installations, which will probably add support for more features.

Metrics can easily be collected with prometheus - in fact, istio ships with the correct annotations and thus should easily be picked up by our prometheus without adding any new rule.
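
For reference, these are the usual prometheus.io annotations; the port and path shown are what istio commonly uses for its merged metrics endpoint, so treat the exact values as illustrative:

  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "15020"
      prometheus.io/path: "/stats/prometheus"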

Overall istio looks a lot like the other envoy-based ingress I evaluated, just quite a bit more complicated, because istio can do much, much more than just being an api gateway.

The advantages, though, are the reason why I'm not ruling it out as a viable alternative:

  • no unknown unknowns
  • Reduce the number of technologies we run on top of kubernetes
  • An industry standard that will be maintained for the foreseeable future
  • Lower cost of startup for us than any other ingress technology, including nginx.

As it stands, I think we should pick istio as an ingress, piggybacking on the work done by the machine-learning team.

Change 719265 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] custom_deploy: Add istio manifest for main clusters

https://gerrit.wikimedia.org/r/719265

Change 719265 merged by jenkins-bot:

[operations/deployment-charts@master] custom_deploy: Add istio manifest for main clusters

https://gerrit.wikimedia.org/r/719265

A few questions/points:

  • I see a bullet point "TLS certificates need to be placed in the namespace of istio-ingressgateway" and a comment by @Joe above, but it doesn't clarify it for me. Does this mean that certs AND private keys get placed in the namespace as Secrets/Configmaps?
  • Care to add an example of "Just plain host and path prefix matching" (1st case) vs "Host and path (prefix and exact) matching" (2nd and 3rd case)?
  • Interesting that the "Medium" has "Role model" for binding restrictions while "Advanced" doesn't. Is that correct?
  • Piggybacking on the work already done by the ML-team is indeed a huge contributing factor for choosing istio over other ingresses that have way smaller complexity (e.g. linkerd, contour, etc).
  • We've talked about this already in some channels, but stating it here as well for completeness. My take is that we go as simple as possible on this one. Debugging on kubernetes can already be a difficult task due to the complexities of the architecture, and adding istio will only add to that complexity. Let's keep the extra things added as lean as possible. So, I guess that means choice #1 for now.

A few questions/points:

  • I see a bullet point "TLS certificates need to be placed in the namespace of istio-ingressgateway" and a comment by @Joe above, but it doesn't clarify it for me. Does this mean that certs AND private keys get placed in the namespace as Secrets/Configmaps?

Yes, exactly that. Cert and private key for all hostnames we want to use via Ingress need to be placed in the istio-ingressgateway Namespace as secrets. We already have those as ConfigMaps in the namespaces of the services, though.

  • Care to add an example of "Just plain host and path prefix matching" (1st case) vs "Host and path (prefix and exact) matching" (2nd and 3rd case)?

For the 1st case, a definition like path: /foo will always be treated as path: /foo.*. For the 2nd and 3rd, we may choose to have an exact match instead (path: /foo != path: /foo.+).
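
Roughly, in istio VirtualService terms (hypothetical fragment, routes omitted):

  http:
    - match:
        - uri:
            prefix: /foo    # 1st case: prefix semantics, also matches /foo/bar
    - match:
        - uri:
            exact: /foo     # 2nd/3rd case: matches /foo and nothing else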

  • Interesting that the "Medium" has "Role model" for binding restrictions while "Advanced" doesn't. Is that correct?

I was surprised as well, but I did not find anything regarding that in Istio.

  • We've talked about this already in some channels, but stating it here as well for completeness. My take is that we go as simple as possible on this one. Debugging on kubernetes can already be a difficult task due to the complexities of the architecture, and adding istio will only add to that complexity. Let's keep the extra things added as lean as possible. So, I guess that means choice #1 for now.

Agreed. It's also important to add that we could mix and match (assuming we add the required CRDs for the 2nd option) if we see the need. Although I would propose not to do that too much, as it would make the way services are configured diverge quite heavily (adding even more complexity).