To debug and monitor the health of the rendering service, we should log failed requests.
Checklist:
- How often is the request timeout hit?
- Do we get responses from the service with a status other than 200?
Notes
- It might make sense to ask the Campsite people how to do this. Right now it is unclear what is technically feasible.
- The respective exceptions are caught here & here
- See: https://www.mediawiki.org/wiki/Manual:Structured_logging and https://wikitech.wikimedia.org/wiki/Logstash for logging
- https://wikitech.wikimedia.org/wiki/Graphite and https://wikitech.wikimedia.org/wiki/Statsd for possible statistics on number of failures
- Consider using separate statistics "buckets" for the MediaWiki client reading the termbox SSR and for the SSR service itself (e.g. wikibase.termbox.mwclient and wikibase.termbox.server)
- To find existing logging calls, grep for e.g. $this->logger->debug(
- Consider adding a reviewer with experience in this area
Decisions Made
Logs are not specially namespaced beyond Wikibase, but they mention the method name so that they are easy to find in Logstash. This seems to follow precedent.
Metrics will go in wikibase.repo.TermboxRemoteRenderer.unsuccessfulResponse and wikibase.repo.TermboxRemoteRenderer.requestTimeout.
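
Illustrative sketch
The decisions above could look roughly like the following. This is a minimal, language-neutral sketch written in Python for illustration only; the actual MediaWiki code is PHP. Only the two metric keys come from this task. The names get_content, fetch and InMemoryStats are hypothetical stand-ins, not existing APIs.

```python
import logging
from collections import Counter

# Metric keys as decided in this task.
METRIC_UNSUCCESSFUL = "wikibase.repo.TermboxRemoteRenderer.unsuccessfulResponse"
METRIC_TIMEOUT = "wikibase.repo.TermboxRemoteRenderer.requestTimeout"

# Logs are namespaced no further than "Wikibase".
logger = logging.getLogger("Wikibase")


class InMemoryStats:
    """Hypothetical stand-in for a StatsD client; it only counts increments."""

    def __init__(self):
        self.counts = Counter()

    def increment(self, key):
        self.counts[key] += 1


def get_content(fetch, stats, log=logger):
    """Return the rendered body, or None if the request failed.

    `fetch` is a hypothetical callable that returns (status_code, body)
    and raises TimeoutError when the request times out.
    """
    try:
        status, body = fetch()
    except TimeoutError:
        # The method name in the message makes the entry easy to search for.
        log.error("get_content: request timed out")
        stats.increment(METRIC_TIMEOUT)
        return None
    if status != 200:
        log.error("get_content: unexpected status %d", status)
        stats.increment(METRIC_UNSUCCESSFUL)
        return None
    return body
```

Passing the fetch callable and the stats object in as arguments keeps both failure paths easy to exercise in a unit test without a running rendering service.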