In T362503 we found that some requests seem to have caused high processing time in preprocess(), ending in a total failure of the isvc (clients hung for several seconds before failing). This is likely due to a bug in revscoring, and since we don't log the JSON payload of every request landing on our isvcs, we don't have a good way to find requests that reproduce the problem.
We should think about adding one of the following (or both?):
- Log the JSON payload of every request landing on every isvc. This seems the easiest option, but it will produce a lot of logs and may make reviewing our access logs harder (more data is not always better). Also, if we ever pass very long or complex strings in the JSON payload (say a string that is 100 lines long), we may not want them printed every time. We shouldn't have these use cases yet, but it is worth mentioning.
- Log the request's JSON payload verbosely only if preprocess() or process() fails or takes too long to complete (say more than X seconds).
If we had had the second option in place before the ruwiki outage, we would have seen slow-request entries in the isvcs' access logs (or on logstash), giving us a clear way to reproduce the problem.