Add /healthz system status check endpoint
Closed, ResolvedPublic

Description

Using a /healthz route as a check for service health is a common Kubernetes convention (which seems to come from Google internal practices).

There are a few Django packages intended for making this sort of health check:

Event Timeline

After some discussion on IRC with @CDanis, @Joe, @RLazarus, and @JMeybohm I have a different understanding of /healthz. Basically we should only be testing that the container's main process (whatever our WSGI container is) is alive and the app is mounted. Testing external dependencies like databases and cache systems is not desired.

@CDanis summarized the discussion like this:

[16:51] <cdanis> bd808: yeah, basically liveness should just demonstrate that your process isn't deadlocked or unresponsive.  if it does depending on external things -- especially if you have transitive chains of that -- thrashing lots of pods with restarts becomes a real concern, amongst other things
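
To make the "process is alive, nothing else" idea concrete, here is a minimal sketch using only the Python standard library. It is not the code from the Toolhub patch. The names `healthz_middleware`, `placeholder_app`, and `call` are hypothetical, and the JSON body `{"status": "OK"}` is an assumed response shape. The handler answers /healthz without contacting a database, cache, or any other external service, so a failing dependency cannot cause the liveness probe to fail.

```python
import json
from wsgiref.util import setup_testing_defaults


def healthz_middleware(app, path="/healthz"):
    """Wrap a WSGI app so that `path` returns 200 without any external calls.

    Hypothetical helper for illustration; the response body is an assumption.
    """

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO") == path:
            # Deliberately no database or cache access here: a reply only
            # shows that the process can still accept and answer requests.
            body = json.dumps({"status": "OK"}).encode("utf-8")
            start_response(
                "200 OK",
                [
                    ("Content-Type", "application/json"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Every other path is passed through to the wrapped application.
        return app(environ, start_response)

    return wrapped


def placeholder_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    body = b"main application"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call(app, path):
    """Invoke a WSGI app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    application = healthz_middleware(placeholder_app)
    print(call(application, "/healthz"))
```

In a Django project the same check would more likely be an ordinary view registered in the URL configuration, so that a successful response also shows the app is mounted. The middleware form is used here only to keep the sketch free of third-party dependencies.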

Change 708608 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[wikimedia/toolhub@main] monitoring: Add /healthz endpoint

https://gerrit.wikimedia.org/r/708608

bd808 moved this task from Backlog to Review on the Toolhub board.

Change 708608 merged by jenkins-bot:

[wikimedia/toolhub@main] monitoring: Add /healthz endpoint

https://gerrit.wikimedia.org/r/708608