
Install Knative on ml-serve cluster
Open, Needs Triage, Public

Description

The Lift-Wing proof of concept requires Knative to be installed in order to run KFServing.

We need Knative Serving: v0.14.3+

The Knative docs say to install via k8s CRDs/Operators:
https://knative.dev/docs/install/any-kubernetes-cluster/
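For reference, the non-operator path from those docs boils down to applying the released YAMLs directly; a minimal sketch (the v0.18.0 pin below is only an example, see the compatibility discussion in the timeline):

# Sketch of the YAML-based install from the Knative docs.
# v0.18.0 is just an example version, not a recommendation.
kubectl apply -f https://github.com/knative/serving/releases/download/v0.18.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/v0.18.0/serving-core.yaml
# Istio networking layer for Knative Serving (artifact name per the v0.18 docs):
kubectl apply -f https://github.com/knative/net-istio/releases/download/v0.18.0/release.yaml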

There is also some prior art around creating a custom helm chart (which fits better into the WMF stack):
https://github.com/triggermesh/charts
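I have not verified the chart names, but usage would presumably look something like this (repo URL and chart name are assumptions based on the triggermesh README, treat both as unverified):

# Hypothetical invocation; repo URL and chart name need to be
# checked against the triggermesh/charts README.
helm repo add triggermesh https://storage.googleapis.com/triggermesh-charts
helm repo update
helm install knative-serving triggermesh/knative-serving --namespace knative-serving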

Note: cluster-local-gateway is required to serve cluster-internal traffic for the transformer and explainer use cases (unless we are running v0.19.0 or newer, where it is replaced by knative-local-gateway).
Please follow the instructions here to install the cluster-local-gateway.
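For context, upstream sets that gateway up as a second, ClusterIP-only Istio ingress gateway, applied via istioctl. A minimal sketch of the IstioOperator overlay (ports and labels copied from the upstream example; adjust to our Istio version):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
      # Keep the default public ingress gateway.
      - name: istio-ingressgateway
        enabled: true
      # Extra ClusterIP gateway for cluster-internal
      # (transformer/explainer) traffic.
      - name: cluster-local-gateway
        enabled: true
        label:
          istio: cluster-local-gateway
          app: cluster-local-gateway
        k8s:
          service:
            type: ClusterIP
            ports:
              - port: 15020
                name: status-port
              - port: 80
                targetPort: 8080
                name: http2
              - port: 443
                targetPort: 8443
                name: https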

Event Timeline

Little note from https://knative.dev/docs/install/any-kubernetes-cluster/#before-you-begin

Knative v0.21.0 requires a Kubernetes cluster v1.17 or newer, as well as a compatible kubectl.

We are running 1.16 at the moment; next fiscal year we'll work with SRE on 1.20, but there is no clear timeline yet.

Can we use a less recent version that is compatible with 1.16? Is that a viable path, given that Knative is relatively young and seems to release every couple of months? (see https://github.com/knative/serving/tags)

@Theofpa Any guidance from you on this would be really helpful :)

I've made a version compatibility matrix from our recent tests (kfserving#1334, kfserving#1482):

kubernetes | istio | knative
1.16       | 1.3.1 | 0.17
1.16       | 1.6.2 | 0.18
1.17       | 1.7.1 | 0.20
1.19       | 1.8.2 | 0.21

The transition from knative<=0.18 to knative>=0.19 introduced a change that impacted kfserving: the deprecation of the cluster-local-gateway.
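Concretely, on knative>=0.19 cluster-internal traffic goes through knative-local-gateway instead, and the mapping lives in Serving's config-istio ConfigMap. A sketch of what the post-migration config should look like (based on upstream defaults, not verified against our setup):

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-istio
  namespace: knative-serving
data:
  # External traffic (unchanged by the migration).
  gateway.knative-serving.knative-ingress-gateway: "istio-ingressgateway.istio-system.svc.cluster.local"
  # Pre-0.19 installs pointed this mapping at cluster-local-gateway.
  local-gateway.knative-serving.knative-local-gateway: "knative-local-gateway.istio-system.svc.cluster.local"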

I understand from T272918 that the requirement was for k8s 1.16-1.18 version and the delivered cluster is a k8s-1.16.

It looks like we have two options:

  1. Stay with k8s-1.16 and install knative-0.18
    • Problem: on a future upgrade we'll have to deal with the migration from cluster-local-gateway to knative-local-gateway (see the pinned-version operator sketch after this list).
  2. Get a k8s-1.19 and install knative-0.21
    • Problem: I assume the delivery time will be long and will impact the project delivery. Unless we can request the upgrade of that empty cluster to k8s 1.17, 1.18 or 1.19?
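For option 1, pinning the release through the operator should be a one-liner in the CR; a minimal sketch, assuming the operator's spec.version field works as documented upstream:

apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  # Pin Serving so the operator does not float to the latest release.
  # 0.18.0 is an example; use whatever 0.18.x we settle on.
  version: "0.18.0"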

I would recommend going with the most recent versions: we are on a greenfield deployment, and that way we can stay up to date for a longer period.

This is a really important set of information, thanks! I think that for the MVP we can go with 1.16 + 0.18, and then decide later what to do. IIUC our SRE team is planning to introduce k8s 1.20 later during the year, so we could possibly anticipate the need and be the first ones to test it (before going really live).

I do share the opinion that we should stay as close to upstream as possible, especially to get the latest bugfixes from knative upstream if needed. I'd be worried about ending up stuck on 0.18 with bugs whose fixes land only in later versions (and backporting patches to 0.18 is not a great idea either).

I tried to install knative + istio following https://github.com/kubeflow/kfserving/blob/master/test/scripts/run-e2e-tests.sh#L75-L102 on minikube with k8s 1.20.2 (k8s 1.16.0 doesn't seem to run ok with minikube):
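The operator.yaml applied below comes from the knative/operator release page; the equivalent fetch would be something like this (version shown only as an example, the e2e script pins its own):

# Example only; pick the operator version matching the target Serving release.
curl -sL -o operator.yaml \
  https://github.com/knative/operator/releases/download/v0.21.0/operator.yaml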

elukey@wintermute:~/Wikimedia/minikube$ kubectl apply -f operator.yaml 
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/knativeeventings.operator.knative.dev created
customresourcedefinition.apiextensions.k8s.io/knativeservings.operator.knative.dev created
configmap/config-logging created
configmap/config-observability created
deployment.apps/knative-operator created
clusterrole.rbac.authorization.k8s.io/knative-serving-operator-aggregated created
clusterrole.rbac.authorization.k8s.io/knative-serving-operator created
clusterrole.rbac.authorization.k8s.io/knative-eventing-operator-aggregated created
clusterrole.rbac.authorization.k8s.io/knative-eventing-operator created
clusterrolebinding.rbac.authorization.k8s.io/knative-serving-operator created
clusterrolebinding.rbac.authorization.k8s.io/knative-serving-operator-aggregated created
clusterrolebinding.rbac.authorization.k8s.io/knative-eventing-operator created
clusterrolebinding.rbac.authorization.k8s.io/knative-eventing-operator-aggregated created
serviceaccount/knative-operator created

elukey@wintermute:~/Wikimedia/minikube$ cat knative-serving.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
  labels:
    istio-injection: enabled
---
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
elukey@wintermute:~/Wikimedia/minikube$ kubectl apply -f knative-serving.yaml 
namespace/knative-serving created
knativeserving.operator.knative.dev/knative-serving created
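Besides peeking at the containers as below, the CR status can be checked directly; something like:

# Check that the operator reconciled the KnativeServing CR.
kubectl get pods -n knative-serving
kubectl get knativeserving knative-serving -n knative-serving \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'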

docker@minikube:~$ docker ps | grep -v pause | grep knative
9291aafdb339   gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler   "/ko-app/autoscaler"     13 seconds ago   Up 12 seconds             k8s_autoscaler_autoscaler-6cb7c9d4fb-s9qn5_knative-serving_c5a7465e-be7a-455f-bd18-8c82f4092396_0
f09372a2e006   gcr.io/knative-releases/knative.dev/serving/cmd/activator    "/ko-app/activator"      27 seconds ago   Up 26 seconds             k8s_activator_activator-666887556-qfsv9_knative-serving_11e760ce-df4b-4c59-87d8-7cb51c083a54_0
64fbbf25845b   gcr.io/knative-releases/knative.dev/operator/cmd/operator    "/ko-app/operator"       3 minutes ago    Up 3 minutes              k8s_knative-operator_knative-operator-6b6fb7bdf5-tqn94_default_8436e577-af49-4d25-8f16-90601db4c515_0

docker@minikube:~$ docker images | grep knative
gcr.io/knative-releases/knative.dev/operator/cmd/operator    <none>     c9cf5f68657a   4 months ago    70.8MB
gcr.io/knative-releases/knative.dev/serving/cmd/activator    <none>     1d721a5f82f5   5 months ago    64.2MB
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler   <none>     f736a5dbb725   5 months ago    64.3MB
gcr.io/knative-releases/knative.dev/serving/cmd/controller   <none>     514b2f906521   5 months ago    69.4MB