As a maintainer of the search infrastructure, I want to monitor the update lag of the search indices so that I can evaluate whether the system performance matches our expectations.
What we want to track here is the time needed for a change to propagate to Elasticsearch, in other words the time spent in the update pipeline.
We do not want to track the time Elasticsearch takes to refresh its data structures to make these changes visible to users (governed by the index refresh setting); that value can be extracted using the script at P17040.
The data should be aggregated by:
- the kind of update: revision-based or page refresh (i.e. in CirrusSearch terms: LinksUpdatePrioritized vs LinksUpdate)
- the target Elasticsearch cluster
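A minimal sketch of the aggregation described above, assuming each lag sample is already known as (update kind, target cluster, lag in seconds), where lag is the time from the triggering change until Elasticsearch acknowledged the write (refresh time excluded). The metric prefix `cirrussearch.update_lag`, the helper names `lag_seconds` and `to_graphite_lines`, and the choice of median/max/count as aggregates are all hypothetical; only the output line format (`<path> <value> <timestamp>`) follows the Graphite plaintext protocol.

```python
import time
from collections import defaultdict
from statistics import median

# Hypothetical metric namespace; the real Graphite path is not defined by this ticket.
PREFIX = "cirrussearch.update_lag"


def lag_seconds(event_ts, indexed_ts):
    """Time spent in the update pipeline: from the triggering change until
    Elasticsearch acknowledged the write (index refresh time is NOT included)."""
    return indexed_ts - event_ts


def to_graphite_lines(samples, now=None):
    """Group (update_kind, cluster, lag) samples and render one Graphite
    plaintext line per aggregate: '<path> <value> <timestamp>\\n'."""
    now = int(now if now is not None else time.time())
    groups = defaultdict(list)
    for kind, cluster, lag in samples:
        groups[(kind, cluster)].append(lag)

    lines = []
    # Sort so the output order is deterministic.
    for (kind, cluster), lags in sorted(groups.items()):
        base = f"{PREFIX}.{kind}.{cluster}"
        lines.append(f"{base}.median {median(lags)} {now}\n")
        lines.append(f"{base}.max {max(lags)} {now}\n")
        lines.append(f"{base}.count {len(lags)} {now}\n")
    return lines
```

Keeping the update kind and the cluster as separate path components lets a Grafana panel select either dimension with a wildcard (e.g. `cirrussearch.update_lag.*.eqiad.median`, cluster name illustrative). The resulting lines could then be written over TCP to a carbon plaintext listener (port 2003 by default).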
The update lag of the mjolnir batch update pipeline is out of scope for this ticket.
AC:
- a new set of metrics is available in Graphite
- a new Grafana dashboard is created to show the values of these metrics