
Make `webservice restart` (k8s backend) issue rollout restart instead of killing pod
Closed, Duplicate · Public · Feature

Description

Feature summary (what you would like to be able to do and where):
I would like the `webservice restart` command (on the Kubernetes backend) to issue the equivalent of `kubectl rollout restart deployment $TOOL_NAME`, instead of killing the pod and then waiting for Kubernetes to recreate it, as it currently does.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
By default, I believe this would not make any difference. However, for tools that configure liveness, readiness or startup probes (e.g. by adding them to the webservice-created deployment with kubectl patch), this would enable a restart with no downtime, since Kubernetes would wait until the new container is ready before sending traffic to it and deleting the old pod.
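For illustration, a probe could be added to the webservice-created deployment with a strategic merge patch along these lines. This is only a sketch: the container name `webservice`, the `/healthz` path, and port 8000 are assumptions and should be checked against the actual deployment (e.g. with `kubectl get deployment "$TOOL_NAME" -o yaml`) before patching.

```
# Sketch: add a startup probe to the existing webservice deployment.
# Container name, probe path and port are assumptions; verify them first:
#   kubectl get deployment "$TOOL_NAME" -o yaml
kubectl patch deployment "$TOOL_NAME" --type strategic -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "webservice",
            "startupProbe": {
              "httpGet": {"path": "/healthz", "port": 8000},
              "periodSeconds": 1,
              "failureThreshold": 30
            }
          }
        ]
      }
    }
  }
}'
```

With a strategic merge patch, the container entry is matched by its `name`, so the probe is merged into the existing container rather than replacing the whole container list.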

Benefits (why should this be implemented?):
Maintainers of tools that define a startup probe could then keep using the standard `webservice restart` command and get downtime-free restarts, instead of having to remember to run `kubectl rollout restart deployment $TOOL_NAME` separately.

Event Timeline

For the Wikidata Lexeme Forms tool, I’ve configured a simple readiness probe in R2362:4da7f64c4b05: Enable updates/restarts without downtime.

Edit: This turned out to be a terrible idea, because it floods ~/uwsgi.log with /healthz requests once per second. What you want is a startup probe: R2362:3c1b6e08105b: Change readinessProbe to startupProbe

(If you use flask-healthz, it tries to disable logging of the healthz HTTP requests, but that only works with gunicorn, and Toolforge uses uwsgi.)

> the equivalent of a kubectl rollout restart deployment $TOOL_NAME command

According to Stack Overflow, there isn’t really a direct API equivalent to this; kubectl just patches a restartedAt annotation into the deployment’s pod template metadata, and that change triggers a rolling restart. 🤷
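For reference, that patch can be reproduced by hand roughly as below. This is a sketch of what `kubectl rollout restart` does; the annotation value is an arbitrary timestamp and only has to change on each restart.

```
# Sketch: roughly what "kubectl rollout restart deployment $TOOL_NAME" does.
# Setting the kubectl.kubernetes.io/restartedAt annotation on the pod
# template changes the template, which triggers a rolling update.
kubectl patch deployment "$TOOL_NAME" -p "{
  \"spec\": {
    \"template\": {
      \"metadata\": {
        \"annotations\": {
          \"kubectl.kubernetes.io/restartedAt\": \"$(date -Iseconds)\"
        }
      }
    }
  }
}"
```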

Change 721989 had a related patch set uploaded (by Lucas Werkmeister; author: Lucas Werkmeister):

[operations/software/tools-webservice@master] Perform rolling restarts on kubernetes

https://gerrit.wikimedia.org/r/721989

Change 721989 abandoned by Lucas Werkmeister:

[operations/software/tools-webservice@master] Perform rolling restarts on kubernetes

Reason:

superseded by I2f3d2e7fa8

https://gerrit.wikimedia.org/r/721989

Oops, I forgot this already had a task when I created T337182.