Make sure that all SLIs (see T335498) are being measured, so that we can verify that we are meeting our operational service level objectives (SLOs). This task covers metric collection only; creating the dashboards will follow in T338009.
We still need to discuss how to collect these metrics. Should we reuse the Search Satisfaction Schema? Should we use statsv (see also T315091#8311847)? The existing pageview Grafana dashboard could be a helpful example.
Summary of the chosen SLOs:
Latency
- Special:Search latency
- The amount of time it takes to return search results for a query. This includes all extra search features:
- sister search
- did you mean
- MediaSearch latency
- The amount of time it takes MediaSearch to return media results
- Autocomplete latency
- The amount of time it takes to return article suggestions based on autocompleted strings in the search bar
- Search preview latency
- The amount of time it takes for the search preview and its elements to respond to user actions
- NTH: Bot latency
- What is a reasonable amount of latency for bots? What is the best way to measure this?
Updates
- Search update lag
- The amount of time it takes for updates/edits to wikis to be reflected in search results, i.e. how long do I have to wait before I can search for something I just changed?
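To make the update lag definition concrete, here is a minimal sketch of how a single lag sample and a percentile over many samples could be computed. It is an illustration only, not the chosen instrumentation: the function names are hypothetical, and it assumes we can obtain both the time an edit was saved and the time it first became findable in search (how to observe the latter is still open).

```python
import math
from datetime import datetime, timezone


def update_lag_seconds(edit_time: datetime, searchable_time: datetime) -> float:
    """Seconds between an edit being saved and it first showing up in search.

    Both timestamps are assumed to be timezone-aware; how `searchable_time`
    is observed (e.g. by probing search after an edit) is an assumption.
    """
    return (searchable_time - edit_time).total_seconds()


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical example: an edit saved at 12:00:00 UTC is searchable 45 s later.
edit = datetime(2023, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
found = datetime(2023, 6, 1, 12, 0, 45, tzinfo=timezone.utc)
lag = update_lag_seconds(edit, found)

# Made-up lag samples in seconds, used only to show the percentile calculation.
lags = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p95 = percentile(lags, 95)
```

The same percentile helper would apply to the latency SLIs above, since an SLO is typically stated against a percentile rather than an average.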
AC:
- instrumentation exists for all SLIs defined in T335498