Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Sebastian_Berlin-WMSE | T234508 Set up server monitoring |
| Declined | | None | T239043 Install Monit |
| Resolved | | Lokal_Profil | T239059 Decide on monitoring solution |
| Resolved | | Lokal_Profil | T332679 Document how we use Grafana internally |
| Resolved | | Lokal_Profil | T332681 Phase out Down notifier |
| Resolved | | Lokal_Profil | T331420 Try Grafana for alerting when pages are unreachable |
Event Timeline
I looked around a bit and it seems like Nagios is widely used, but Monit may be better suited for us since it's not as involved (https://serverfault.com/questions/425718/monit-versus-nagios).
Let's make this the next "server" project once Matomo is up and running and in use. It would be nice to have this set up so that things we migrate can automatically be added to it.
After looking around a bit, I found that Monit only works locally. You can do some remote checks (e.g. https://lists.nongnu.org/archive/html/monit-general/2007-08/msg00092.html), but it seems that M/Monit is what you really want. This is a paid service with a 30-day trial available. It costs 65€ as a one-time fee to monitor up to 5 hosts (more prices). The Serverfault answer above mentions that it's not very good for multiple servers, but also mentions "a larger number of hosts", so 🤷.
I found mentions of more solutions that may work for us, including Munin, Cacti and OpenNMS. We should start by specifying what our minimum requirements are and see what fits.
I've added some requirements to the description. @Lokal_Profil, can you have a look and see if there are any more that should be added, and/or if you have opinions on the ones that I added?
I broke out the process of finding a solution to T239059. Please add the things mentioned above there.
I think Influx should be able to do this. @kalle, if I remember correctly you looked at sending emails for certain statuses. Did you manage to get that working?