Readiness probe failed: Get "http://10.64.75.43:6927/_info": dial tcp 10.64.75.43:6927: connect: connection refused etc.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Jdforrester-WMF | T350685 2023-11-01-154653 -> 2023-11-06-164826 bump of Python function-evaluator is broken (but not JS) |
| Resolved | BUG REPORT | cmassaro | T350700 Corruption in RustPython Binary |
Event Timeline
helmfile will roll back the deployment if it does not become "ready" within 10 minutes (the timeout parameter at the top of your helmfile.yaml). HelmReleaseBadStatus might fire temporarily in that case, but that's fine. Aborting helmfile is usually a bad idea, as it might leave the deployment in a bad state (e.g. not rolled back). I'll take a look in a minute.
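The wait-then-roll-back behaviour described above can be modelled roughly as follows. This is a toy sketch, not helmfile's or Helm's actual code: `deploy_with_rollback`, `is_ready` and `rollback` are hypothetical names, and the 5-second poll interval is an assumption; only the 10-minute timeout comes from the comment.

```python
import time
from typing import Callable


def deploy_with_rollback(
    is_ready: Callable[[], bool],
    rollback: Callable[[], None],
    timeout_s: float = 600.0,      # 10 minutes, mirroring the helmfile timeout mentioned above
    poll_interval_s: float = 5.0,  # assumed poll interval, for illustration only
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it succeeds or timeout_s elapses; call rollback() on timeout."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_ready():
            return True  # new version became Ready: keep it
        sleep(poll_interval_s)
    # Never became Ready within the timeout: restore the previous release.
    rollback()
    return False
```

Injecting `clock` and `sleep` keeps the sketch testable without actually waiting ten minutes. It also shows why aborting midway is risky: if the process is killed inside the loop, `rollback()` is never reached.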
Generally, this (i.e. the deployment not going through) means that the new version of your service does not become Ready in Kubernetes terms (e.g. the container does not start, the readinessProbe fails, or similar).
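To make the failing readinessProbe from the task title concrete, here is a minimal sketch of what an HTTP GET readiness check does. It is an approximation, not kubelet's implementation: `http_probe` is a hypothetical helper, and the 1-second timeout is an assumption. Kubernetes treats an HTTP status in the 200–399 range as success; anything else, including "connection refused" because nothing is listening on the port yet, counts as not ready.

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout_s: float = 1.0) -> bool:
    """Return True if url answers with a status in [200, 400), like an httpGet readiness probe."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error statuses (4xx/5xx), which count as failure.
        return 200 <= err.code < 400
    except (urllib.error.URLError, OSError):
        # Covers "connect: connection refused" and timeouts: the process is not serving yet.
        return False
```

While the new container has not started its server, a call such as `http_probe("http://10.64.75.43:6927/_info")` would return `False`, which matches the "connection refused" error in the title.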
See https://logstash.wikimedia.org/goto/250702d316f57befaa341112c63fa99e for the k8s events emitted during your deployment
The automatic rollback has completed successfully, so no issue there. You can retry at any time.
Forgot to say: the helm-releases dashboard only shows chart versions (which did not change in your case) and the time of the last deployment, so from that alone it's not possible to tell whether your new image version is running.
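Because the dashboard does not show image versions, one option is to inspect the pods directly. The sketch below assumes you have the JSON output of `kubectl get pods -o json` for the relevant namespace. It lists the pods that both declare a given image in their spec and report the `Ready` condition as `True`. The helper name `ready_pods_with_image` and the image strings in the test are hypothetical.

```python
import json
from typing import List


def ready_pods_with_image(pod_list_json: str, expected_image: str) -> List[str]:
    """Names of pods that use expected_image in their spec and whose Ready condition is True."""
    pods = json.loads(pod_list_json).get("items", [])
    result = []
    for pod in pods:
        # Images declared for the pod's (non-init) containers.
        images = [c["image"] for c in pod["spec"]["containers"]]
        # A pod is Ready when its "Ready" condition has status "True".
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)
        if expected_image in images and ready:
            result.append(pod["metadata"]["name"])
    return result
```

Checking readiness as well as the image matters here: pods running the new image but failing their readinessProbe would otherwise look "deployed" even though they never receive traffic.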