
Setup monitoring for kubernetes core components.
Closed, Declined (Public)

Description

As part of https://phabricator.wikimedia.org/T130972#2178746 we realized there is no monitoring for the kubernetes core components, so we'll have to set that up before we make it live for real users.

Event Timeline

Minimum required is just to check:

  1. All the processes that should be running are running
  2. All the things that should be marked as ready are marked as ready

Not fully sure how to do this now.
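One possible shape for these two checks, as a rough sketch rather than what was actually deployed: compare a list of expected process names against what is running, and walk a pod list looking for a `Ready` condition. The pod-list layout assumed here follows the Kubernetes API (`items[].status.conditions` with `type` and `status`). The function names are made up for illustration, and how the running-process list is collected is left out.

```python
def missing_processes(expected, running):
    """Return the expected process names that are absent from `running`."""
    return sorted(set(expected) - set(running))


def pods_not_ready(pod_list):
    """Return names of pods that lack a Ready condition with status "True".

    `pod_list` is assumed to be the parsed JSON of a Kubernetes pod list.
    """
    not_ready = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        is_ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not is_ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready


if __name__ == "__main__":
    # Illustrative data only; the component and pod names are examples.
    print(missing_processes(
        ["kube-apiserver", "kube-scheduler"], ["kube-apiserver"]))
    print(pods_not_ready({"items": [
        {"metadata": {"name": "web-1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "web-2"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]}))
```

An empty result from both functions would mean the minimum check passes; anything else names what to alert on.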

Change 297575 had a related patch set uploaded (by Yuvipanda):
tools: Add a check for k8s backed webservices

https://gerrit.wikimedia.org/r/297575

Change 297575 merged by Yuvipanda:
tools: Add a check for k8s backed webservices

https://gerrit.wikimedia.org/r/297575

I'm going to do this via toolschecker...

Change 297771 had a related patch set uploaded (by Yuvipanda):
tools: Fix k8s webservice backend check

https://gerrit.wikimedia.org/r/297771

Change 297771 merged by Yuvipanda:
tools: Fix k8s webservice backend check

https://gerrit.wikimedia.org/r/297771

Change 297774 had a related patch set uploaded (by Yuvipanda):
tools: Add icinga check for kubernetes webservice

https://gerrit.wikimedia.org/r/297774

Change 297774 merged by Yuvipanda:
tools: Add icinga check for kubernetes webservice

https://gerrit.wikimedia.org/r/297774

This checks that the webservice can start and stop, which exercises the following things:

  1. Master is reachable and responsive
  2. Docker registry is reachable and responsive
  3. There's enough capacity to schedule at least one web pod
  4. kube2proxy and the whole proxying system are reachable

That's a pretty ok if convoluted check!
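For illustration, the start/stop cycle could be sketched roughly like this. This is not the toolschecker code from the patches above; `start`, `stop` and `is_up` are hypothetical callables standing in for starting the webservice, stopping it, and probing it through the proxy, and the timeout values are arbitrary.

```python
import time


def wait_until(predicate, timeout=15.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def check_webservice_cycle(start, stop, is_up,
                           timeout=15.0, interval=1.0, sleep=time.sleep):
    """Start the service, wait for it to answer, stop it, wait for it to go away.

    Returns True only if both halves of the cycle succeed.
    """
    start()
    try:
        came_up = wait_until(is_up, timeout, interval, sleep)
    finally:
        # Always try to clean up, even if probing raised.
        stop()
    went_down = wait_until(lambda: not is_up(), timeout, interval, sleep)
    return came_up and went_down
```

Passing the three operations in as callables keeps the polling logic testable without a cluster; a real check would wire them to whatever starts, stops and requests the test webservice.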

yuvipanda raised the priority of this task from Medium to High. (Jul 13 2016, 3:04 PM)
Bstorm subscribed.

At this point we do have monitoring, but we need to set up broader monitoring for toolforge in general, which is not really captured by this ticket.