Set up server monitoring
Closed, Resolved · Public

Description

Since we've had some trouble with servers in the past (T233814, T202420, T196636...) it may be a good idea to set up a server monitoring tool for our more crucial servers. This should help us notice when something goes wrong and possibly even fix some things automatically.

Event Timeline

I looked around a bit and it seems like Nagios is widely used, but Monit may be better suited for us since it's not as involved (https://serverfault.com/questions/425718/monit-versus-nagios).

Let's make this the next "server" project once Matomo is up and running and in use. It would be nice to have this set up so that things we migrate can automatically be added to it.

After looking around a bit, it looks like Monit only works locally. You can do some remote checks (e.g. https://lists.nongnu.org/archive/html/monit-general/2007-08/msg00092.html), but it seems that M/Monit is what you really want. This is a paid service with a 30-day trial available. It costs 65€ as a one-time fee to monitor up to 5 hosts (more prices). The Serverfault answer above mentions that it's not very good for multiple servers, but also mentions "a larger number of hosts", so 🤷.
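To make "remote check" concrete, here is a minimal sketch of the idea in plain Python (standard library only). It is not Monit's or M/Monit's implementation, just an illustration of probing a service from another machine; the function names `port_is_open` and `failed_checks` are made up for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def failed_checks(targets, timeout: float = 3.0):
    """Return the (host, port) pairs from targets that could not be reached."""
    return [(host, port) for host, port in targets
            if not port_is_open(host, port, timeout)]
```

Calling `failed_checks([("example.org", 443)])` with our real hosts in place of the placeholder would return the ones that did not answer. A check like this only shows that a port accepts connections, not that the service behind it is healthy.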

I found mentions of more solutions that may work for us, including Munin, Cacti and OpenNMS. We should start by specifying what our minimum requirements are and see what fits.

I've added some requirements to the description. @Lokal_Profil, can you have a look to see whether any more should be added, and whether you have opinions on the ones that I added?

I broke out the process of finding a solution to T239059. Please add the things mentioned above there.

I don't see myself having enough time to look into this in the foreseeable future.

Note that we can no longer add new services to downnotifier.

I think Influx should be able to do this. @kalle, if I remember correctly you looked at sending email for certain statuses. Did you manage to get that working?