
2023-11-01-154653 -> 2023-11-06-164826 bump of Python function-evaluator is broken (but not JS)
Closed, Resolved (Public)

Description

https://logstash.wikimedia.org/app/dashboards#/view/d43f9bf0-17b5-11eb-b848-090a7444f26c?_g=h@15c3209&_a=h@287acde

Readiness probe failed: Get "http://10.64.75.43:6927/_info": dial tcp 10.64.75.43:6927: connect: connection refused etc.

Event Timeline

helmfile will roll back the deployment if it does not become "ready" within 10 minutes (the timeout parameter at the top of your helmfile.yaml). HelmReleaseBadStatus might fire temporarily in that case, but that's fine. Aborting helmfile is usually a bad idea, as it might leave the deployment in a bad state (e.g. not rolled back). I'll check in a minute.

Generally, a deployment not going through means that the new version of your service does not become Ready in k8s terms (e.g. the container does not start, the readinessProbe fails, or similar).
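To make the "Ready in k8s terms" point concrete, here is a minimal sketch in Python of what an HTTP readiness check boils down to. It is not the kubelet's implementation; the function name `probe_ready` and its timeout default are made up for illustration. It only mirrors the rule that a refused connection or an error status counts as "not ready".

```python
import urllib.error
import urllib.request

# Connect directly, ignoring any proxy settings from the environment
# (an assumption of this sketch: the probe talks straight to the pod).
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_ready(url: str, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if an HTTP GET to `url` succeeds.

    Kubernetes treats any status from 200 up to (but excluding) 400 as
    success. urllib follows redirects itself, so the status checked here
    is the one of the final response.
    """
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # "connection refused", timeouts, and 4xx/5xx responses
        # (HTTPError is a subclass of URLError) all mean "not ready".
        return False
```

Called with the address from the task description, `probe_ready("http://10.64.75.43:6927/_info")` would return `False` for as long as nothing in the container is listening on that port, which is the situation the "connection refused" message describes.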

The automatic rollback has completed successfully, so no issue there. You can try again at any time.
Forgot to say: the helm-releases dashboard only shows chart versions (which did not change in your case) and the time of the last deployment. So from that alone it's not possible to tell whether your new image version is running.

Aha, thanks! Will investigate at our end then!

Jdforrester-WMF renamed this task from helmfile -e staging -i apply --context 5 times out for version bump of Python function-evaluator (but not JS) to 2023-11-01-154653 -> 2023-11-06-164826 bump of Python function-evaluator is broken (but not JS). Nov 7 2023, 2:17 PM
Jdforrester-WMF updated the task description.
Jdforrester-WMF changed the status of subtask T350700: Corruption in RustPython Binary from Open to In Progress.
Jdforrester-WMF moved this task from To Triage to Backlog on the Abstract Wikipedia team board.