Improve production monitoring so we know when something is wrong. Right now we only know whether the service is up, what the network access rate is, and how much hardware resource it consumes. This gap is currently preventing us from debugging render/parser slowness.
https://wikitech.wikimedia.org/wiki/Wikifunctions/Performance_observability
Definition of Done
- we are able to pinpoint the source of performance issues in our features [this blocks later caching work]
- we can attribute slowness to individual components independently of each other, e.g. Python vs. JS
- we can spot when an edge case occurs that’s not covered by the end-to-end tests (bonus: alerting on this)
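
As a rough illustration of the first two points, here is a minimal sketch of per-component timing. It is not how the service is instrumented today: the `timed` helper, the `timings` store, and the component names `python-evaluator` and `js-evaluator` are all hypothetical, and the `time.sleep` calls stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store: component name -> list of durations (seconds).
# A real setup would export these to a metrics backend instead.
timings = defaultdict(list)


@contextmanager
def timed(component):
    """Record wall-clock time spent inside the block under `component`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are still measured.
        timings[component].append(time.perf_counter() - start)


def slowest_component(samples):
    """Return the component name with the largest total recorded time."""
    return max(samples, key=lambda name: sum(samples[name]))


# Stand-ins for real work in two separately measured components.
with timed("python-evaluator"):
    time.sleep(0.02)
with timed("js-evaluator"):
    time.sleep(0.001)

print(slowest_component(timings))
```

Tagging each measurement with a component name is what lets slowness be attributed to one component rather than to the request as a whole.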