
Update to KServe 0.11
Closed, Resolved | Public | 5 Estimated Story Points

Description

The KServe 0.11 release is about to be published. We should try to upgrade to it in order to get its latest features (including a configurable access log format that we'll need for T333804).

Overall plan:

  • Check the release changelog to get familiar with what changed.
  • Upgrade the Docker images in the production-images repo to the new version. This includes the storage-initializer image, which is run by one of the containers in the model server pods (so we'll need to wait for the prerequisite of the next point before proceeding).
  • Roll out the new model servers.
  • Once kserve==0.11 is published on PyPI, we can start upgrading all the model server Docker images to it. The changes are not as big as in the last upgrade, so I'd expect to see some dependency changes but not much.
  • Upgrade the KServe chart in deployment-charts with the new YAML config.
  • Roll out the new control plane in k8s.
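
The plan makes the model server upgrade conditional on kserve 0.11 being published on PyPI. A minimal sketch of how that check could be scripted is below. It assumes the public PyPI JSON API endpoint for the `kserve` project; the helper names (`parse_version`, `matching_releases`, `fetch_versions`) are hypothetical and not part of any existing tooling mentioned in this task.

```python
import json
import urllib.request

# PyPI JSON API endpoint for the kserve project (assumption: public PyPI is reachable).
PYPI_URL = "https://pypi.org/pypi/kserve/json"


def parse_version(version):
    """Turn '0.11.2' into (0, 11, 2); return None for pre-releases such as '0.11.0rc1'."""
    parts = version.split(".")
    if not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def matching_releases(versions, major, minor):
    """Return the final releases in the given major.minor series, oldest first."""
    parsed = [parse_version(v) for v in versions]
    found = sorted(p for p in parsed if p is not None and p[:2] == (major, minor))
    return [".".join(str(n) for n in p) for p in found]


def fetch_versions(url=PYPI_URL):
    """Fetch every version string PyPI lists for the project (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return list(json.load(response)["releases"].keys())
```

Calling `matching_releases(fetch_versions(), 0, 11)` returns an empty list while no final 0.11.x release is available, so the result can gate the model server image upgrades. Pre-releases are skipped on purpose, since the plan waits for the published release.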

Event Timeline

Change 967451 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/docker-images/production-images@master] images: Update kserve/build to kserve v0.11.1

https://gerrit.wikimedia.org/r/967451

Change 967451 merged by Klausman:

[operations/docker-images/production-images@master] images: Update kserve/build to kserve v0.11.1

https://gerrit.wikimedia.org/r/967451

calbon set the point value for this task to 5. (Nov 2 2023, 7:12 PM)
calbon triaged this task as Medium priority. (Nov 2 2023, 7:25 PM)

0.11.2 was released recently. I will update the images to that version before proceeding.

Change 975839 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/docker-images/production-images@master] kserve: Update images to v0.11.2

https://gerrit.wikimedia.org/r/975839

Change 975839 abandoned by Klausman:

[operations/docker-images/production-images@master] kserve: Update images to v0.11.2

Reason:

Broken change due to git shenanigans

https://gerrit.wikimedia.org/r/975839

Images have been built and published:

```
# docker images
REPOSITORY                                                                       TAG                            IMAGE ID       CREATED          SIZE
docker-registry.discovery.wmnet/kserve-storage-initializer                       0.11.2-1                       caed698a1bb0   6 minutes ago    727MB
docker-registry.discovery.wmnet/kserve-storage-initializer                       latest                         caed698a1bb0   6 minutes ago    727MB
docker-registry.discovery.wmnet/kserve-agent                                     0.11.2-1                       47980b2c48db   7 minutes ago    139MB
docker-registry.discovery.wmnet/kserve-agent                                     latest                         47980b2c48db   7 minutes ago    139MB
docker-registry.discovery.wmnet/kserve-controller                                0.11.2-1                       0ba5149f28fe   7 minutes ago    137MB
docker-registry.discovery.wmnet/kserve-controller                                latest                         0ba5149f28fe   7 minutes ago    137MB
docker-registry.discovery.wmnet/kserve-build                                     0.11.2-1                       852289661bd6   8 minutes ago    2.58GB
docker-registry.discovery.wmnet/kserve-build                                     latest                         852289661bd6   8 minutes ago    2.58GB
```

@klausman I can work on this if you want, IIUC the control plane is still on 0.10. Nothing major but better to be consistent, lemme know.

Change 1007330 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] kserve: upgrade to upstream 0.11.2

https://gerrit.wikimedia.org/r/1007330

Taking over from Tobias to free some tasks from his queue since I am just getting back to work (need something technical to do :D)

Change 1007330 merged by Elukey:

[operations/deployment-charts@master] kserve: upgrade to upstream 0.11.2

https://gerrit.wikimedia.org/r/1007330

Change 1007391 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] kserve: bump Docker image default versions to 0.11.2

https://gerrit.wikimedia.org/r/1007391

Change 1007391 merged by Elukey:

[operations/deployment-charts@master] kserve: bump Docker image default versions to 0.11.2

https://gerrit.wikimedia.org/r/1007391

Change 1007400 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] kserve: upgrade all images to Bookworm

https://gerrit.wikimedia.org/r/1007400

Change 1007624 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] kserve: update default Docker images

https://gerrit.wikimedia.org/r/1007624

Change 1007400 merged by Elukey:

[operations/docker-images/production-images@master] kserve: upgrade all images to Bookworm

https://gerrit.wikimedia.org/r/1007400

Change 1007624 merged by Elukey:

[operations/deployment-charts@master] kserve: update default Docker images

https://gerrit.wikimedia.org/r/1007624

Change 1007903 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] kserve: add missing comma to kserve yaml config

https://gerrit.wikimedia.org/r/1007903

Change 1007903 merged by Elukey:

[operations/deployment-charts@master] kserve: add missing comma to kserve yaml config

https://gerrit.wikimedia.org/r/1007903

Change 1007907 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] kserve: use numeric id for nobody

https://gerrit.wikimedia.org/r/1007907

Change 1007907 merged by Elukey:

[operations/docker-images/production-images@master] kserve: use numeric id for nobody

https://gerrit.wikimedia.org/r/1007907

Change 1007915 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] kserve: bump docker image version for the storage-initializer

https://gerrit.wikimedia.org/r/1007915

Change 1007915 merged by Elukey:

[operations/deployment-charts@master] kserve: bump docker image version for the storage-initializer

https://gerrit.wikimedia.org/r/1007915

Change 1007923 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] kserve: add wmf-certificates in the right place for storage-init

https://gerrit.wikimedia.org/r/1007923

Change 1007923 merged by Elukey:

[operations/docker-images/production-images@master] kserve: add wmf-certificates in the right place for storage-init

https://gerrit.wikimedia.org/r/1007923

Change 1007931 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] kserve: bump default Docker image for storage-init

https://gerrit.wikimedia.org/r/1007931

Change 1007931 merged by Elukey:

[operations/deployment-charts@master] kserve: bump default Docker image for storage-init

https://gerrit.wikimedia.org/r/1007931

The new control plane is running in staging. I took the opportunity to upgrade all Docker images to Debian Bookworm.

New control plane deployed to ml-serve-{eqiad,codfw}. I tested whether killing a pod worked, all good.