Change Details

Currently we rely on manual testing and user reports to notice when an MT service is not working. This is not optimal. There are at least three types of failure:

# The external service fails on specific content.
# The external service is down or too slow.
# The external service fails because of a configuration error (e.g. expired key, over quota).

Automated monitoring with alerts cannot catch type 1, but it would at least let us see immediately whether a failure is type 2 or 3 and investigate further.

== Current status

* Errors are logged with minimal details (HTTP status code, language pair) to ... (LogStash? local logs?). We can only get proper stack traces for WMF-hosted services (i.e. Apertium).
* There are no alerts and no overview over time.

== Possible options

=== CX internal

CX could internally ping the services with a fixed request and log the response time and failure state. Open questions: How do we get alerts? Where do we log? Can we graph it?

=== CX ping-api

CX could introduce a new "ping" API that can be used to check service status without authorization. The API only returns up/down and maybe the response time. This should be easy to integrate with existing monitoring tools, which can also provide alerts.

=== Direct endpoint monitoring

We could also try to ping the external APIs directly, but without keys we would only learn whether a service is unreachable.
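
As a rough illustration of the internal ping and the "ping" API options above, the sketch below times one fixed probe request per service and classifies the outcome as up, slow, down, or a configuration error. It is written in Python for brevity and is not how CX implements anything: `ping_service`, `ping_all`, `PingResult`, the 2-second threshold, and the set of status codes treated as configuration errors are all hypothetical, and each `probe` stands in for whatever fixed request would really be sent to a service.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: these HTTP codes indicate a configuration problem
# (expired key, over quota). Real services may signal this differently.
CONFIG_ERROR_CODES = {401, 403, 429}


@dataclass
class PingResult:
    service: str
    state: str        # "up", "slow", "down" or "config_error"
    elapsed_s: float  # time spent waiting for the probe


def ping_service(service: str,
                 probe: Callable[[], int],
                 slow_after_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> PingResult:
    """Run one fixed probe request and classify the outcome."""
    start = clock()
    try:
        status = probe()  # the probe returns the HTTP status code
    except Exception:
        # Network errors and timeouts raised by the probe count as "down".
        return PingResult(service, "down", clock() - start)
    elapsed = clock() - start
    if status in CONFIG_ERROR_CODES:
        state = "config_error"
    elif status != 200:
        state = "down"
    elif elapsed > slow_after_s:
        state = "slow"
    else:
        state = "up"
    return PingResult(service, state, elapsed)


def ping_all(probes: Dict[str, Callable[[], int]]) -> Dict[str, dict]:
    """Build the payload an unauthenticated "ping" endpoint could return."""
    payload = {}
    for name, probe in probes.items():
        result = ping_service(name, probe)
        payload[name] = {
            "state": result.state,
            "elapsed_s": round(result.elapsed_s, 3),
        }
    return payload
```

A monitoring tool could poll the payload from `ping_all` and alert on any state other than "up". Separating "config_error" from "down" mirrors failure types 3 and 2 above. Failure type 1 (bad output for specific content) stays out of scope, because a fixed probe request cannot detect it.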