
Monitor CirrusSearch update lag
Closed, Resolved · Public · 5 Estimated Story Points

Description

As a maintainer of the search infrastructure, I want to monitor the update lag of the search indices so that I can evaluate whether system performance matches our expectations.

What we want to track here is the time needed for a change to propagate to Elasticsearch, in other words the time spent in the update pipeline.
We do not want to track the time Elasticsearch takes to refresh its data structures and make these changes visible to users (the index refresh setting); that value can be extracted using the script at P17040.

The data should be aggregated on:

  • the kind of update: revision-based or page refresh (i.e. in Cirrus terms: LinksUpdatePrioritized vs. LinksUpdate)
  • the target Elasticsearch cluster
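
As an illustration of the aggregation described above, here is a minimal Python sketch. It assumes the lag is the difference between the time of the change and the time the update is written to the cluster. The metric prefix, the kind labels (`revision_based`, `page_refresh`), and the cluster name in the usage note are hypothetical and not taken from CirrusSearch.

```python
# Hypothetical mapping from Cirrus job type to an update-kind label.
# The job type names come from this task; the labels are assumptions.
UPDATE_KINDS = {
    "LinksUpdatePrioritized": "revision_based",
    "LinksUpdate": "page_refresh",
}


def update_lag_ms(event_ts: float, written_ts: float) -> int:
    """Return the lag in milliseconds between the change (event_ts) and
    the write to the search cluster (written_ts), both in epoch seconds.
    Negative values caused by clock skew are clamped to 0."""
    return max(0, int(round((written_ts - event_ts) * 1000)))


def metric_key(job_type: str, cluster: str,
               prefix: str = "cirrussearch.update_lag") -> str:
    """Build a dotted metric key aggregated by update kind and target cluster.
    The prefix is a placeholder, not the real metric name."""
    kind = UPDATE_KINDS[job_type]
    return f"{prefix}.{kind}.{cluster}"
```

For example, `metric_key("LinksUpdate", "cluster_a")` would yield one series per update kind and cluster, which is the granularity this task asks for.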

Out of scope for this ticket is the update lag for the mjolnir batch update pipeline.

AC:

  • a new set of metrics is available in Graphite
  • a new Grafana dashboard is created to show the values of these metrics
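
One common way to get timing metrics into Graphite is through a StatsD-style daemon. Whether this pipeline does so is an assumption, not something this task states. Under that assumption, a timing sample could be formatted as in the sketch below; the function name and the key in the usage note are hypothetical.

```python
def statsd_timing_line(key: str, lag_ms: int) -> str:
    """Format one timing sample in the StatsD line protocol:
    <bucket>:<value>|ms
    The key is whatever dotted metric name the caller chose."""
    if lag_ms < 0:
        raise ValueError("lag must be non-negative")
    return f"{key}:{int(lag_ms)}|ms"
```

A call such as `statsd_timing_line("some.metric.key", 2500)` produces a single line that a StatsD-compatible daemon can aggregate into percentiles for a dashboard.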

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel triaged this task as High priority. Nov 21 2022, 4:38 PM
Gehel edited projects, added Discovery-Search; removed Discovery-Search (Current work).
Gehel moved this task from needs triage to Ops / SRE on the Discovery-Search board.

Change 896165 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Monitor update lag via LinksUpdateComplete

https://gerrit.wikimedia.org/r/896165

Change 896166 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Monitor update lag from onUploadComplete

https://gerrit.wikimedia.org/r/896166

Change 896327 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Monitor page deletion lag via onArticleDeleteComplete

https://gerrit.wikimedia.org/r/896327

Change 896165 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Monitor update lag via LinksUpdateComplete

https://gerrit.wikimedia.org/r/896165

Change 896166 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Monitor update lag from onUploadComplete

https://gerrit.wikimedia.org/r/896166

Change 896327 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Monitor page deletion lag via onArticleDeleteComplete

https://gerrit.wikimedia.org/r/896327

Change 929385 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Consider api-purge LinksUpdate non-prioritized

https://gerrit.wikimedia.org/r/929385

Change 929385 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Consider api-purge LinksUpdate non-prioritized

https://gerrit.wikimedia.org/r/929385