Kubernetes emitting ProbeWarning events for toolhub-main container
Closed, Resolved | Public | Bug Report

Description

https://logstash.wikimedia.org/goto/5a705b89015604f6d4a7e7827b8bad53

It looks like every readiness probe against the toolhub-main container is emitting a ProbeWarning event (excerpt from Logstash below).

"k8s_event": {
      "firstTimestamp": "2021-10-19T16:49:50Z",
      "message": "Readiness probe warning: ",
      "reportingInstance": "",
      "reportingComponent": "",
      "reason": "ProbeWarning",
      "source": {
        "component": "kubelet",
        "host": "kubernetes1011.eqiad.wmnet"
      },
      "type": "Warning",
      "metadata": {
        "resourceVersion": "114064680",
        "selfLink": "/api/v1/namespaces/toolhub/events/toolhub-main-779789d6fd-9bsw6.16af7c9a12f8b1ca",
        "name": "toolhub-main-779789d6fd-9bsw6.16af7c9a12f8b1ca",
        "creationTimestamp": "2021-10-19T16:49:50Z",
        "namespace": "toolhub",
        "uid": "b3e4b4be-703c-4a59-8a69-64bbff39f7d2"
      },
      "eventTime": null,
      "lastTimestamp": "2021-10-21T22:44:30Z",
      "involvedObject": {
        "resourceVersion": "113229270",
        "name": "toolhub-main-779789d6fd-9bsw6",
        "fieldPath": "spec.containers{toolhub-main}",
        "kind": "Pod",
        "namespace": "toolhub",
        "uid": "1720e6cf-f986-4754-8680-7ef711247da1",
        "apiVersion": "v1"
      },
      "count": 19409
    },
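The timestamps and count in the excerpt above give a quick way to check the "every readiness probe" reading. The following is a minimal sketch that computes the mean spacing between events from `firstTimestamp`, `lastTimestamp`, and `count`; the helper names are illustrative and not part of any Kubernetes client.

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp as it appears in the event excerpt."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def mean_event_interval(first: str, last: str, count: int) -> float:
    """Mean seconds between consecutive events (count events span count - 1 gaps)."""
    elapsed = (parse_k8s_time(last) - parse_k8s_time(first)).total_seconds()
    return elapsed / (count - 1)


# Values copied from the k8s_event excerpt above.
interval = mean_event_interval("2021-10-19T16:49:50Z", "2021-10-21T22:44:30Z", 19409)
print(f"{interval:.1f} s between events")
```

An interval of about 10 seconds would be consistent with one warning per probe at the Kubernetes default `periodSeconds` of 10, although the probe period actually configured for this deployment is not shown in this task.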

Possibly related to this notice from the Kubernetes 1.14 release notes?

Health check (liveness & readiness) probes using an HTTPGetAction will no longer follow redirects to different hostnames from the original probe request. Instead, these non-local redirects will be treated as a Success (the documented behavior). In this case an event with reason "ProbeWarning" will be generated, indicating that the redirect was ignored. If you were previously relying on the redirect to run health checks against different endpoints, you will need to perform the healthcheck logic outside the Kubelet, for instance by proxying the external endpoint rather than redirecting to it. (#75416, @tallclair)
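The rule described in that release note can be sketched as a small decision function. This is a simplified model for illustration only, not the kubelet's actual code; the function name, the outcome labels, and the example URLs are all hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin, urlparse


def classify_http_probe(probe_url: str, status: int, location: Optional[str] = None) -> str:
    """Classify one probe response under the behavior described in the 1.14 notes."""
    if 300 <= status < 400 and location:
        # Resolve relative Location headers against the original probe URL.
        target = urlparse(urljoin(probe_url, location))
        if target.hostname != urlparse(probe_url).hostname:
            # Non-local redirect: not followed, counted as success, ProbeWarning emitted.
            return "Warning"
        return "FollowRedirect"
    if 200 <= status < 400:
        return "Success"
    return "Failure"


# Hypothetical probe against a pod IP that is redirected to a public hostname.
print(classify_http_probe("http://10.0.0.1:8000/healthz", 301, "https://example.org/healthz"))
```

Under this model a redirect to another hostname never fails the probe, which would explain why a pod can stay Ready while still producing a warning event on every check.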

Event Timeline

Possibly related to this notice from the Kubernetes 1.14 release notes?

I think that's a correct assumption. Your service currently replies with a 301 and Location: https://toolhub.wikimedia.org/healthz, which the kubelet will not follow (it emits the warning instead). I'd suggest you disable the HTTPS redirect for that particular endpoint in your service.
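One way to exempt a single endpoint from a forced HTTPS redirect is a path allowlist checked before redirecting. The sketch below assumes a pattern list in the style of Django's `SECURE_REDIRECT_EXEMPT` setting (regular expressions matched against the path without its leading slash). Whether the service uses that setting is an assumption, and the names `REDIRECT_EXEMPT` and `should_redirect_to_https` are hypothetical.

```python
import re
from typing import Sequence

# Hypothetical exemption list: only the health check path skips the redirect.
REDIRECT_EXEMPT = [r"^healthz$"]


def should_redirect_to_https(
    path: str, is_secure: bool, exempt: Sequence[str] = REDIRECT_EXEMPT
) -> bool:
    """Return True if a plain-HTTP request for this path should get a redirect."""
    if is_secure:
        return False  # Already HTTPS, nothing to do.
    relative = path.lstrip("/")
    return not any(re.search(pattern, relative) for pattern in exempt)


print(should_redirect_to_https("/healthz", is_secure=False))    # probe path is exempt
print(should_redirect_to_https("/api/tools/", is_secure=False)) # other paths still redirect
```

With an exemption like this, the probe would receive a direct response from the pod instead of a 301 to the public hostname, so there is nothing for the kubelet to warn about.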

Change 734303 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[wikimedia/toolhub@main] config: Do not force TLS for health checks

https://gerrit.wikimedia.org/r/734303

Change 734303 merged by jenkins-bot:

[wikimedia/toolhub@main] config: Do not force TLS for health checks

https://gerrit.wikimedia.org/r/734303

bd808 changed the task status from Open to In Progress. Oct 25 2021, 3:55 PM
bd808 claimed this task.
bd808 moved this task from Backlog to Review on the Toolhub board.

Change 734355 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: Bump container version to 2021-10-25-160227-production

https://gerrit.wikimedia.org/r/734355

Change 734355 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: Bump container version to 2021-10-25-160227-production

https://gerrit.wikimedia.org/r/734355

Thank you very much for pointing out the 301 redirect @JMeybohm. The ProbeWarning events stopped in each cluster as 2021-10-25-160227-production rolled out. The last event was posted at 2021-10-25T17:32:18Z.