toolforge jobs-framework-emailer: increase reliability
Closed, Resolved (Public)

Description

The daemon can die without Kubernetes noticing, see T317998.

This is an indication that we may need to introduce liveness probes:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
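One common way to make a silently dead daemon visible to a liveness probe is a heartbeat file: the main loop refreshes a file's modification time, and the probe fails once that timestamp is too old. The sketch below illustrates the idea only. It is not taken from the emailer patches, and the names `touch_heartbeat`, `is_alive`, the file path, and the age threshold are all hypothetical.

```python
import os
import time
from typing import Optional


def touch_heartbeat(path: str) -> None:
    """Record that the main loop is still making progress (hypothetical helper)."""
    # Create the file if it does not exist yet, then set its mtime to now.
    with open(path, "a"):
        pass
    os.utime(path, None)


def is_alive(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file exists and was refreshed recently."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # No heartbeat file means the daemon never reported progress.
        return False
    return (now - mtime) <= max_age_seconds
```

Under these assumptions, the daemon would call `touch_heartbeat()` on every loop iteration, and an `exec` liveness probe would run a small command that exits non-zero when `is_alive()` returns `False`, which causes Kubernetes to restart the container.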

Event Timeline

Change 842488 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-emailer@main] emailer: cfg: avoid deadlocks when reading configmap

https://gerrit.wikimedia.org/r/842488

Change 842502 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-emailer@main] emailer: introduce k8s liveness probe support

https://gerrit.wikimedia.org/r/842502

Change 842488 merged by jenkins-bot:

[cloud/toolforge/jobs-framework-emailer@main] emailer: cfg: avoid deadlocks when reading configmap

https://gerrit.wikimedia.org/r/842488

Change 842502 merged by Arturo Borrero Gonzalez:

[cloud/toolforge/jobs-framework-emailer@main] emailer: introduce k8s liveness probe support

https://gerrit.wikimedia.org/r/842502

Mentioned in SAL (#wikimedia-cloud-feed) [2022-10-18T10:18:41Z] <wm-bot2> build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-emailer:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer (64385e9) (T320405) - cookbook ran by arturo@nostromo

Change 854536 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-emailer@main] emailer: introduce decorator to factorice endless tasks management

https://gerrit.wikimedia.org/r/854536

aborrero claimed this task.

Since the latest round of updates, this hasn't failed again.

Next iteration would be to add prometheus metrics for proper monitoring and alerting. I'll leave that for another task.

Change 854536 abandoned by Arturo Borrero Gonzalez:

[cloud/toolforge/jobs-framework-emailer@main] emailer: introduce decorator to factorice endless tasks management

Reason:

not working on this at the moment

https://gerrit.wikimedia.org/r/854536