Currently we rely on manual testing and user reports to notice when an MT service is not working. This is not optimal, because a failure can go unnoticed until someone happens to report it.
There are at least three types of failures:
# External service fails on specific content
# External service is down or too slow
# External service fails because of a configuration error (e.g. expired key, exceeded quota)
Automated monitoring with alerts cannot catch type 1, but it would at least let us see immediately whether a failure is of type 2 or 3, so we can investigate further.
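To illustrate how a monitor might tell types 2 and 3 apart, here is a minimal sketch. The function name, the 5-second slowness threshold, and the set of status codes treated as configuration errors are assumptions made for illustration; the codes each MT provider really returns for an expired key or exhausted quota would have to be checked per service.

```python
from typing import Optional

# Hypothetical mapping: which HTTP statuses indicate an expired key or an
# exhausted quota differs per provider and must be verified for each one.
CONFIG_ERROR_STATUSES = {401, 403, 429}


def classify_failure(status: Optional[int], elapsed: float, slow_after: float = 5.0) -> str:
    """Return 'ok', 'down_or_slow' (type 2), 'config_error' (type 3) or 'unknown'.

    status is None when no HTTP response was received at all.
    elapsed and slow_after are in seconds.
    """
    if status is None:
        return "down_or_slow"
    if status in CONFIG_ERROR_STATUSES:
        return "config_error"
    if status >= 500 or elapsed > slow_after:
        return "down_or_slow"
    if 200 <= status < 300:
        return "ok"
    # Anything else (for example a 400 on unusual input) cannot be attributed
    # automatically and is left for manual investigation.
    return "unknown"
```

A failure of type 1 would typically end up as `unknown` here or would not be seen at all, because a fixed probe request never exercises the problematic content.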
== Current status
* Errors are logged with minimal details (HTTP status code, language pair) to ...(LogStash? local logs?)
* No alerts or overview over time
== Possible options
=== CX internal
CX could internally ping the services with a fixed request and log response time / failure state.
How to get alerts? Where to log? Can we graph it?
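A minimal sketch of such an internal probe, written in Python purely for illustration (it is not CX code). The logger name and the `send_fixed_request` callable are placeholders: the callable stands for whatever performs the fixed request against one MT service and returns the HTTP status code.

```python
import logging
import time
from typing import Callable, Dict

logger = logging.getLogger("cx.mt-probe")  # hypothetical logger name


def probe(service: str, send_fixed_request: Callable[[], int]) -> Dict[str, object]:
    """Send one fixed request and log response time and failure state.

    send_fixed_request is a placeholder: it should perform the fixed
    translation request and return the HTTP status code, raising an
    exception on network errors or timeouts.
    """
    started = time.monotonic()
    try:
        status = send_fixed_request()
        error = None
    except Exception as exc:  # the probe itself must never crash the caller
        status = None
        error = type(exc).__name__
    elapsed_ms = round((time.monotonic() - started) * 1000, 1)
    ok = status is not None and 200 <= status < 300
    result = {
        "service": service,
        "ok": ok,
        "status": status,
        "elapsed_ms": elapsed_ms,
        "error": error,
    }
    # Successful probes are logged at INFO, failed ones at ERROR.
    logger.log(logging.INFO if ok else logging.ERROR, "mt-probe %s", result)
    return result
```

Running `probe` on a timer for each service would produce a steady stream of structured log entries. Where those entries go, and how alerts and graphs are built from them, remains the open question above.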
=== CX ping-api
CX could introduce a new "ping" API that can be used to check service status without authorization. The API would only return up/down and possibly the response time.
This should be easy to integrate with existing monitoring tools, which can also provide alerts.
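As a rough sketch of what such an endpoint could return, the following uses only the Python standard library. The `/ping` path, the JSON field names, the port, and the choice to answer 503 when any service is down are all assumptions, not an agreed design.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Optional, Tuple

# Latest known state per service: (is_up, response_time_ms).
# In this sketch it is filled by hand; a background probe would update it.
STATUS: Dict[str, Tuple[bool, Optional[float]]] = {}


def ping_payload(status: Dict[str, Tuple[bool, Optional[float]]]) -> Tuple[int, str]:
    """Build the HTTP status code and JSON body for the ping response."""
    body = {
        name: {"up": up, "response_time_ms": ms}
        for name, (up, ms) in sorted(status.items())
    }
    # Answer 503 when any service is down, so plain HTTP checks can alert on it.
    code = 200 if all(up for up, _ in status.values()) else 503
    return code, json.dumps(body)


class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/ping":  # hypothetical path
            self.send_error(404)
            return
        code, body = ping_payload(STATUS)
        data = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Serve the ping endpoint on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), PingHandler).serve_forever()
```

Because the response carries no keys or translated content, it can be exposed without authorization. Most monitoring tools can poll a URL and alert on a non-200 status, which is why the overall state is mapped to the HTTP status code here.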
=== Direct endpoint monitoring
We could also try to ping the external APIs directly, but without keys we would only find out whether a service is unreachable.
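A keyless check of this kind amounts to little more than a TCP connection attempt. The sketch below assumes the services are reached over HTTPS on port 443; the function name and the timeout are illustrative.

```python
import socket
import time
from typing import Optional, Tuple


def check_reachable(host: str, port: int = 443, timeout: float = 3.0) -> Tuple[bool, Optional[float]]:
    """Return (reachable, connect_time_ms) for a plain TCP connection attempt."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; nothing is sent
    except OSError:  # covers refused connections, DNS failures and timeouts
        return False, None
    return True, round((time.monotonic() - started) * 1000, 1)
```

A positive result only shows that the host accepts connections. An expired key or exhausted quota (type 3) would stay invisible to this check.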