
Install Istio on ml-serve cluster
Open, Needs Triage, Public

Description

For the Lift-Wing proof of concept, we want to install KFServing.

Istio is the primary dependency of both KFServing & Knative.

We should be able to install via helm:
https://istio.io/latest/docs/setup/install/helm/
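A rough sketch of what that could look like, assuming we use the charts shipped inside the Istio release tarball (the chart paths below follow the 1.8 layout and may differ between versions):

kubectl create namespace istio-system
# Base chart: CRDs and cluster-wide resources
helm install istio-base manifests/charts/base -n istio-system
# Istiod: the control plane (still needed without sidecar injection, it programs the ingress gateway)
helm install istiod manifests/charts/istio-control/istio-discovery -n istio-system
# The ingress gateway itself
helm install istio-ingress manifests/charts/gateways/istio-ingress -n istio-system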

Event Timeline

After talking with @elukey last week, we both seemed to agree that we should install Istio without the full service mesh (sidecar injection) for our proof of concept. We do not need the full mesh network at this point and it introduces considerable overhead to the MVP.

The KFServing docs also mention this as a quick way to get started: https://github.com/kubeflow/kfserving#prerequisites

Today I followed up on the Kubeflow Slack (there is a kfserving channel) and got a couple of interesting links:

https://github.com/kubeflow/kfserving/blob/master/hack/quick_install.sh
https://github.com/ajinkya933/Kubeflow-Serving

From the quick install script, it seems that the bare minimum config to make everything work is:

  1. An istio namespace
  2. Some basic config for Ingress
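Putting those two together, a rough sketch of what that would look like (file name and gateway settings are my guess, not taken verbatim from the script):

kubectl create namespace istio-system

cat > istio-minimal-operator.yaml <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
  values:
    global:
      proxy:
        autoInject: disabled
EOF

# istioctl ships in the release tarball under bin/
istioctl manifest apply -y -f istio-minimal-operator.yaml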

The steps outlined in https://istio.io/latest/docs/setup/install/helm/ seem geared towards a more complete setup. All the helm charts are available in the release tarball, so hopefully it will not be super hard to test them (I still need to figure out how to helm install only on our cluster for tests without making a mess elsewhere).

From another angle: https://knative.dev/docs/install/installing-istio/#installing-istio. Our dear Knative needs Istio as well, and it seems better to use 1.8.2 (last upstream is 1.9).

Both approaches (except the helm one) use the istioctl command, which IIUC is a binary shipped with the release that automates some of the manual work. There is also a mention of istiod, which should be the istio daemon acting as the control plane when using the service mesh, but we shouldn't need it for now.

How to make everything work is still a bit unclear to me, but I'll keep the task updated :D

Hey @elukey, this is the script we're using for e2e testing in the kfserving community. It uses the most recent versions of Istio and Knative, with their operators, and with sidecar injection disabled as per the requirement above.

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      proxy:
        autoInject: disabled

@Theofpa thanks a lot for following up! I have a generic question about what istio setting we should try to pursue. My understanding is that a service mesh is not strictly needed to serve simple models via KFServing, but as soon as the complexity increases a bit, the Istio control plane and Envoy sidecars are needed (to allow service -> service communication). Should we start from the beginning with a full service mesh (since it will surely be needed) or do you think that it is not worth it as a first step?

Service to service communication can be enhanced with a service mesh if we require, for example, security policies across the services of that cluster. With Istio sidecar injection enabled in a namespace, each pod gets an extra container running the Envoy proxy, which brings access control, logging, tracing, etc. to the services that these pods provide.

So, we need to answer the following question:

What type of workloads are we going to have in this cluster?

Model serving only? Model serving AND other services which communicate with each other via access-control?

In the case of model serving only, we don't need to use a service mesh; it would only be overhead. We can just use the Istio ingress gateway and the Istio VirtualServices managed by the kfserving reconciler.

In case we are going to host other services as well (for example public services which we want to restrict from accessing the model services), we can benefit from the service mesh. We can have Istio manage the communication across the namespaces and their services based on roles, and track this communication with metrics and tracing in Istio's Prometheus & Jaeger.

It looks like this is a cluster dedicated to model serving, and any incoming traffic will be managed by northbound interfaces. So my recommendation would be to keep the sidecar injection disabled.
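To make the "ingress gateway + VirtualServices, no mesh" option a bit more concrete, the kind of VirtualService that the kfserving reconciler manages looks roughly like this (all names below are hypothetical):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: example-model          # hypothetical; normally generated by the reconciler
  namespace: default
spec:
  hosts:
    - example-model.ml.example.org        # hypothetical external hostname
  gateways:
    - istio-system/example-gateway        # hypothetical Gateway bound to the Istio ingress gateway
  http:
    - route:
        - destination:
            host: example-model-predictor.default.svc.cluster.local
            port:
              number: 80

Traffic enters through the Envoy running in the ingress gateway pod and is routed straight to the model service, with no sidecars on the serving pods.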

@Theofpa makes sense, our use case is surely only model serving, and the service mesh seemed like overkill to us, so it is good to have it confirmed :) My doubts were related to use cases like:

  1. Fetch the model from a storage (if not included in the Docker image) like S3/Swift (we have an internal cluster).
  2. Fetch features from cache/store/etc.. (not sure if needed for the first models but I am pretty sure the use case will come up).

My fear was that the above use cases would need dedicated micro-services, and hence the Istio service mesh. If that is not the case then I am very happy. I see that Istio offers some helm charts; it would be great to fit them into our deployment-charts repo, which is what Tobias and I are going to work on in the immediate future (trying also to fit Istio's requirements into the RBAC policies that the SRE team suggests for the kubernetes clusters).
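For use case 1, IIUC kfserving can already pull the model from object storage via its storage initializer, without any extra micro-service; a hypothetical InferenceService (model name and bucket are made up) would be along the lines of:

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: example-model
spec:
  predictor:
    sklearn:
      # hypothetical bucket; fetched by the storage initializer at pod startup
      storageUri: "s3://example-bucket/example-model"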

Having said that, please note that I have zero experience in kubernetes and ML, so I hope I haven't written anything totally off!


Today I tried to think about next steps for this task, and I have some thoughts, lemme know :)

From T278194#6964746 it seems that we should target istio 1.6.2 for our current environment. It is almost a year old, not very up to date, but until we upgrade kubernetes it seems better to follow what works best with knative 0.18 (we may have some flexibility for istio versions, so let's say a version close to 1.6.2).

So we can start from https://github.com/kubeflow/kfserving/blob/master/test/scripts/run-e2e-tests.sh#L39-L71, in which we can see a simple example of how and what to deploy to get a minimal istio config:

  • istio gateway
  • istio operator
  • istiod for the control plane

IIUC, all of the above (including pulling images from Docker Hub) is handled in the script by the istioctl binary, shipped with all releases of istio. We want to use helm 3 if possible; the first indication of how to do it was added in the 1.8 docs (https://istio.io/v1.8/docs/setup/install/helm/), but in theory we should be able to make it work on 1.6 without much trouble (famous last words).

The big missing piece at the moment is the Docker images, which we should somehow end up having in our internal Wikimedia Docker registry. I followed up with Service Ops today and they pointed me to how calico is packaged: we pick a certain release, verify it, and copy the binaries into a deb package. The next step is to figure out what docker images are needed, and whether we can create them in our docker registry.

We can also decide later on whether it is more convenient to use istioctl or helm (the latter seems to be more self-descriptive and better for documentation).

I was able to bootstrap minikube with k8s 1.20.2 (the other versions failed due to cgroup issues..)

elukey@wintermute:~/Wikimedia/scratch-dir/istio-1.6.2$ ./bin/istioctl operator init
Using operator Deployment image: docker.io/istio/operator:1.6.2
✔ Istio operator installed                                                                                           
✔ Installation complete

docker@minikube:~$ docker ps | grep istio | grep -v pause
88d852cb49d4   istio/operator         "operator server"        About a minute ago   Up About a minute             k8s_istio-operator_istio-operator-5668d5ddb-kkk9t_istio-operator_7c4aaa0f-f6a6-4f81-81f5-e47d9cb6e887_0

docker@minikube:~$ docker images | grep istio
istio/operator                            1.6.2      69540da46816   10 months ago   223MB

Then:

elukey@wintermute:~/Wikimedia/scratch-dir/istio-1.6.2$ ./bin/istioctl manifest apply -y -f ./istio-minimal-operator.yaml
✔ Istio core installed                                                                                               
✔ Istiod installed                                                                                                   
✔ Ingress gateways installed                                                                                         
✔ Addons installed                                                                                                   
✔ Installation complete  


docker@minikube:~$ docker ps | grep istio | grep -v pause
f4e15ec117a1   istio/proxyv2          "/usr/local/bin/pilo…"   19 seconds ago   Up 18 seconds             k8s_istio-proxy_prometheus-56944b6bd5-x99j8_istio-system_f2e27f41-3d04-40ab-8581-2707d361566a_0
b71a6e34a987   61bf337f2956           "/bin/prometheus --s…"   21 seconds ago   Up 21 seconds             k8s_prometheus_prometheus-56944b6bd5-x99j8_istio-system_f2e27f41-3d04-40ab-8581-2707d361566a_0
34333b758ace   14e45d814562           "/usr/local/bin/pilo…"   22 seconds ago   Up 21 seconds             k8s_discovery_istiod-c4cfbfb6c-l5mzq_istio-system_257abb97-84d2-4b45-9d98-60dd6694e620_0
db82ac640191   1162f09e0728           "/usr/local/bin/pilo…"   23 seconds ago   Up 22 seconds             k8s_istio-proxy_istio-ingressgateway-57bd88c95c-g7v66_istio-system_eb649955-8963-40c8-af27-b8ca297b0bba_0
88d852cb49d4   istio/operator         "operator server"        6 minutes ago    Up 6 minutes              k8s_istio-operator_istio-operator-5668d5ddb-kkk9t_istio-operator_7c4aaa0f-f6a6-4f81-81f5-e47d9cb6e887_0

docker@minikube:~$ docker images | grep istio
istio/proxyv2                             1.6.2      1162f09e0728   10 months ago   304MB
istio/pilot                               1.6.2      14e45d814562   10 months ago   237MB
istio/operator                            1.6.2      69540da46816   10 months ago   223MB

elukey@wintermute:~/Wikimedia/scratch-dir/istio-1.6.2$ kubectl get namespaces
NAME              STATUS   AGE
default           Active   12m
istio-operator    Active   9m40s
istio-system      Active   4m58s

elukey@wintermute:~/Wikimedia/scratch-dir/istio-1.6.2$ kubectl get pods -n istio-operator
NAME                             READY   STATUS    RESTARTS   AGE
istio-operator-5668d5ddb-kkk9t   1/1     Running   0          10m

elukey@wintermute:~/Wikimedia/scratch-dir/istio-1.6.2$ kubectl get pods -n istio-system
NAME                                    READY   STATUS    RESTARTS   AGE
istio-ingressgateway-57bd88c95c-g7v66   1/1     Running   0          4m10s
istiod-c4cfbfb6c-l5mzq                  1/1     Running   0          4m8s
prometheus-56944b6bd5-x99j8             2/2     Running   0          4m8s

Even though I used istioctl for this use case (and not helm), we should now have a complete list of Docker images to add to our internal registry. In theory the best thing would be to avoid pulling from Docker Hub directly, and https://github.com/istio/istio/blob/release-1.6/tools/istio-docker.mk looks promising.
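A hedged sketch of what I have in mind (untested; HUB and TAG are variables that istio's build system uses for image naming):

# Build only the images we need, already tagged for our registry
make docker.pilot docker.proxyv2 HUB=docker-registry.wikimedia.org/istio TAG=1.6.2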

Addendum - the istio operator pod is needed only if we want to support istioctl; it does not seem to be needed when using helm. As a starting point, we could try to import istio/proxyv2 and istio/pilot into the WMF Docker registry, and then come up with some Helm charts for Istio.

Mapping images -> pods:

elukey@wintermute:~/Wikimedia/minikube$ kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .spec.containers[*]}{.image}{", "}{end}{end}' |sort | grep istio | grep -v knative
istiod-c4cfbfb6c-j2m6k:	docker.io/istio/pilot:1.6.2, 
istio-ingressgateway-57bd88c95c-g7v66:	docker.io/istio/proxyv2:1.6.2, 
istio-operator-5668d5ddb-kkk9t:	docker.io/istio/operator:1.6.2, 
prometheus-56944b6bd5-x99j8:	docker.io/prom/prometheus:v2.15.1, docker.io/istio/proxyv2:1.6.2,

Something interesting that I found today is: https://gcsweb.istio.io/gcs/istio-build/dev/1.6-alpha.3ddc57b6d1e15afebefd725e01c0dc7099f3f6dd/docker/

Istio pushes daily builds to gcsweb, which also contain the Docker images that we need. I suppose that we could build the docker dir on deneb as well, and then push the docker images to our docker registry. We could also use the above website as a source of truth for Docker images.
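Rough sketch of the retag/push dance, assuming the images have already been built locally (on deneb) and that we keep an istio/ prefix in our registry (local tag names are an assumption):

for img in pilot proxyv2; do
  docker tag istio/${img}:1.6.2 docker-registry.wikimedia.org/istio/${img}:1.6.2
  docker push docker-registry.wikimedia.org/istio/${img}:1.6.2
done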

Links to start:

https://doc.wikimedia.org/docker-pkg/
https://gerrit.wikimedia.org/r/admin/repos/operations/docker-images/production-images

Joe gave me a nice pointer in production-images, namely the loki multi-stage container example. Basically the idea is to build the Go binaries in one container first, then copy them into the official Docker image that gets pushed to the registry. If we find a way to build istio (which in theory shouldn't be super difficult) we should also be able to re-use Dockerfiles like https://github.com/istio/istio/blob/master/pilot/docker/Dockerfile.proxyv2 relatively easily (same for Knative etc..)

More info about what binaries are executed in the minikube test that I made:

docker@minikube:~$ docker ps --no-trunc | grep istio | grep -v pause | grep istio-system  | cut -d '"' -f 2
/usr/local/bin/pilot-discovery discovery --monitoringAddr=:15014 --log_output_level=default:info --domain cluster.local --trust-domain=cluster.local --keepaliveMaxServerConnectionAge 30m
/usr/local/bin/pilot-agent proxy sidecar --domain istio-system.svc.cluster.local istio-proxy-prometheus --proxyLogLevel=warning --proxyComponentLogLevel=misc:error --controlPlaneAuthPolicy NONE --trust-domain=cluster.local
/usr/local/bin/pilot-agent proxy router --domain istio-system.svc.cluster.local --proxyLogLevel=warning --proxyComponentLogLevel=misc:error --log_output_level=default:info --serviceCluster istio-ingressgateway --trust-domain=cluster.local
/bin/prometheus --storage.tsdb.retention=6h --config.file=/etc/prometheus/prometheus.yml

The above doesn't include the istio operator (which handles istioctl commands), since we may not need it if we use helm.

I then tried to clone the istio github repo, check out the 1.6.2 tag in a separate branch, and run make && make docker to see what the build process looked like. In the out/linux_amd64 dir I found:

elukey@wintermute:~/github/istio$ ls out/linux_amd64/
client  docker_build  docker_temp  envoy  istioctl  istio_is_init  logs  mixc  mixgen  mixs  node_agent  operator  pilot-agent  pilot-discovery  policybackend  release  sdsclient  server

There also seems to be a pre-baked environment/layout to build the docker images:

elukey@wintermute:~/github/istio/out/linux_amd64/docker_build$ ls
docker.app  docker.app_sidecar  docker.istioctl  docker.mixer  docker.mixer_codegen  docker.operator  docker.pilot  docker.proxyv2  docker.test_policybackend
elukey@wintermute:~/github/istio/out/linux_amd64/docker_build$ ls docker.proxyv2/
Dockerfile.proxyv2  envoy  envoy_bootstrap_v2.json  envoy_policy.yaml.tmpl  gcp_envoy_bootstrap.json  metadata-exchange-filter.wasm  pilot-agent  stats-filter.wasm

Based on the above, this is a first draft of a Dockerfile to build the istio binaries on our golang base image:

# Build stage: compile the istio binaries from the upstream 1.6.2 tag.
FROM docker-registry.wikimedia.org/golang:1.13-3 as build

ENV ISTIO_VERSION=1.6.2
ENV SOURCE_REPO=https://github.com/istio/istio.git
ENV REPO_BASE=/go/github.com/istio/istio

# Build outside of istio's own build container, targeting linux/amd64.
ENV BUILD_WITH_CONTAINER=0
ENV GOARCH=amd64
ENV GOOS=linux

WORKDIR /go

USER root
# git is needed for the clone below, in case the base image does not already ship it.
RUN apt-get update && apt-get install -y curl ca-certificates git

USER nobody
RUN mkdir -p $REPO_BASE \
  && cd $REPO_BASE \
  && git clone $SOURCE_REPO \
  && cd istio \
  && git checkout tags/$ISTIO_VERSION

WORKDIR $REPO_BASE/istio
RUN make build-linux

The above seems ok to just build the istio binaries!
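A second stage along the same lines could then assemble something close to upstream's proxyv2 image. Very rough sketch (the base image name is an assumption; the real image also needs the envoy binary, the bootstrap configs and the wasm filters listed in the docker.proxyv2 dir above):

# Runtime stage: reuse the binaries produced by the "build" stage above
FROM docker-registry.wikimedia.org/buster as proxyv2
COPY --from=build /go/github.com/istio/istio/istio/out/linux_amd64/pilot-agent /usr/local/bin/pilot-agent
# TODO: envoy binary, envoy_bootstrap_v2.json, *.wasm filters, etc.
ENTRYPOINT ["/usr/local/bin/pilot-agent"]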

Current idea:

  • multi-stage docker build to generate the images to push to our registry
  • light debian packaging for istioctl, to deploy it on the deployment server, to be able to control the istio mesh.
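For the istioctl part, the end state would be something like driving the install from the deployment server with the cluster's kubeconfig (paths and file names below are made up):

istioctl manifest apply -y \
  --kubeconfig /etc/kubernetes/ml-serve-admin.config \
  -f istio-minimal-operator.yaml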

Change 688211 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] WIP - Add istio base images build support

https://gerrit.wikimedia.org/r/688211