The daemon can die without Kubernetes noticing, see T317998.
This is an indication that we may need to introduce liveness probes:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
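For illustration, one common way to let Kubernetes notice a hung daemon is a heartbeat file: the main loop touches a file on every iteration, and an exec liveness probe checks that the file was modified recently. The sketch below is only an assumption about how this could look; the function names, the path handling and the age threshold are hypothetical and are not taken from the jobs-framework-emailer code or from the patches on this task.

```python
import time
from pathlib import Path
from typing import Optional


def touch_heartbeat(path: Path) -> None:
    """Record that the daemon loop is still making progress (hypothetical helper)."""
    path.touch()


def is_alive(path: Path, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file exists and was modified recently enough."""
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        # No heartbeat has ever been written: treat the daemon as not alive.
        return False
    current = time.time() if now is None else now
    return (current - mtime) <= max_age_seconds
```

Under these assumptions, the daemon would call `touch_heartbeat()` inside its main loop, and the probe command would call `is_alive()` and exit non-zero when it returns `False`, so that the kubelet restarts the container.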
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | BUG REPORT | aborrero | T317998 toolforge-jobs emails not working
Resolved | | aborrero | T320405 toolforge jobs-framework-emailer: increase reliability
Change 842488 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[cloud/toolforge/jobs-framework-emailer@main] emailer: cfg: avoid deadlocks when reading configmap
Change 842502 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[cloud/toolforge/jobs-framework-emailer@main] emailer: introduce k8s liveness probe support
Change 842488 merged by jenkins-bot:
[cloud/toolforge/jobs-framework-emailer@main] emailer: cfg: avoid deadlocks when reading configmap
Change 842502 merged by Arturo Borrero Gonzalez:
[cloud/toolforge/jobs-framework-emailer@main] emailer: introduce k8s liveness probe support
Mentioned in SAL (#wikimedia-cloud-feed) [2022-10-18T10:18:41Z] <wm-bot2> build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-emailer:latest from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer (64385e9) (T320405) - cookbook ran by arturo@nostromo
Mentioned in SAL (#wikimedia-cloud-feed) [2022-10-18T10:24:33Z] <wm-bot2> deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer (64385e9) (T320405) - cookbook ran by arturo@nostromo
Mentioned in SAL (#wikimedia-cloud-feed) [2022-10-18T10:30:19Z] <wm-bot2> deployed kubernetes component https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-emailer (64385e9) (T320405) - cookbook ran by arturo@nostromo
Change 854536 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[cloud/toolforge/jobs-framework-emailer@main] emailer: introduce decorator to factorice endless tasks management
Since the latest round of updates, this hasn't failed again.
The next iteration would be to add Prometheus metrics for proper monitoring and alerting. I'll leave that for another task.
Change 854536 abandoned by Arturo Borrero Gonzalez:
[cloud/toolforge/jobs-framework-emailer@main] emailer: introduce decorator to factorice endless tasks management
Reason:
not working on this at the moment