Before calling maps production-ready, we need to review the alerting scheme to ensure that we page the right people when something goes wrong and that we don't wake anyone up when things are not that bad.
Summary of current alerts:
Critical services
These services are used directly to serve content to users.
| Service | Check description | Contact groups |
|---|---|---|
| cassandra | TCP check on Cassandra port | admins,team-services |
| cassandra | Service check | admins |
| kartotherian | Service checker (Swagger based) | admins |
| maps | 5xx rate | admins |
| maps-lb | HTTP checks on HTTP, HTTPS, IPv4 and IPv6 for each DC | admins,sms,admins |
| varnish | standard varnish checks | admins |
| kartotherian LVS | LVS check | admins,sms,admins |
Non-critical services
These services are used for tile generation; they have no direct user impact.
| Service | Check description | Contact groups |
|---|---|---|
| tilerator | HTTP check | admins |
| tileratorui | HTTP check | admins |
| postgres | no check yet | |
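
To make the review easier, the two tables above can be expressed as data and queried for mismatches between criticality and paging. The following is only a minimal sketch: the `Check` class, the helper functions, and the assumption that the `sms` contact group is the one that pages people are all hypothetical and not taken from the actual monitoring configuration. Contact groups are copied as listed in the tables, and postgres is left out because it has no check yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    service: str
    description: str
    contact_groups: tuple
    critical: bool


# Assumption: membership in the "sms" contact group is what triggers a page.
PAGING_GROUPS = {"sms"}

# Rows copied from the tables above (postgres omitted: no check yet).
CHECKS = [
    Check("cassandra", "TCP check on Cassandra port", ("admins", "team-services"), True),
    Check("cassandra", "Service check", ("admins",), True),
    Check("kartotherian", "Service checker (Swagger based)", ("admins",), True),
    Check("maps", "5xx rate", ("admins",), True),
    Check("maps-lb", "HTTP checks on HTTP, HTTPS, IPv4 and IPv6 for each DC",
          ("admins", "sms", "admins"), True),
    Check("varnish", "standard varnish checks", ("admins",), True),
    Check("kartotherian LVS", "LVS check", ("admins", "sms", "admins"), True),
    Check("tilerator", "HTTP check", ("admins",), False),
    Check("tileratorui", "HTTP check", ("admins",), False),
]


def pages(check):
    """Return True if the check notifies at least one paging group."""
    return bool(PAGING_GROUPS & set(check.contact_groups))


def critical_without_page(checks):
    """Critical services for which no check pages anyone."""
    critical = {c.service for c in checks if c.critical}
    paged = {c.service for c in checks if c.critical and pages(c)}
    return sorted(critical - paged)


def noncritical_with_page(checks):
    """Non-critical services that would still wake someone up."""
    return sorted({c.service for c in checks if not c.critical and pages(c)})


if __name__ == "__main__":
    print("Critical, never pages:", critical_without_page(CHECKS))
    print("Non-critical, pages:", noncritical_with_page(CHECKS))
```

Under these assumptions, the first list shows user-facing services that would fail without paging anyone, and the second list shows tile-generation services that could wake someone up unnecessarily. Whether a critical service needs its own page, or is already covered by the maps-lb and LVS checks, remains a judgment call for the review.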