
EPIC: Convert CirrusSearch metrics to statslib
Closed, Resolved, Public

Description

Task to track the efforts/progress made on converting CirrusSearch metrics from graphite to prometheus using statslib.

The work to do is:

The work does not have to be done all at once, but metrics used by icinga should be prioritized, according to T350597.

AC:

  • CirrusSearch no longer writes to graphite, and its operations no longer depend on graphite/icinga

Metrics initially identified:

  • MediaWiki.CirrusSearch.$cirrus_group.backend_failure.*
  • MediaWiki.CirrusSearch.$cirrus_group.backend_failure.*.rate
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.comp_suggest.p50
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.comp_suggest.p75
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.comp_suggest.p95
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.comp_suggest.p99
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.comp_suggest.sample_rate
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.full_text.p50
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.full_text.p75
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.full_text.p95
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.full_text.p99
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.full_text.sample_rate
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.more_like.p50
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.more_like.p75
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.more_like.p95
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.more_like.sample_rate
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.*.p50
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.prefix.p50
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.prefix.p75
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.prefix.p95
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.prefix.p99
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.prefix.sample_rate
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.*.sample_rate
  • MediaWiki.CirrusSearch.$cirrus_group.requestTimeMs.*.sum
  • MediaWiki.CirrusSearch.$cirrus_group.requestTime.p50
  • MediaWiki.CirrusSearch.$cirrus_group.requestTime.p75
  • MediaWiki.CirrusSearch.$cirrus_group.requestTime.p95
  • MediaWiki.CirrusSearch.$cirrus_group.requestTime.p99
  • MediaWiki.CirrusSearch.$cluster.backend_failure.failed.count
  • MediaWiki.CirrusSearch.$cluster.backend_failure.rejected.count
  • MediaWiki.CirrusSearch.$cluster.backend_failure.unknown.count
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.comp_suggest.p50
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.comp_suggest.p75
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.comp_suggest.p95
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.comp_suggest.p99
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.comp_suggest.sample_rate
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.full_text.p50
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.full_text.p75
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.full_text.p95
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.full_text.p99
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.full_text.sample_rate
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.more_like.p50
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.more_like.p75
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.more_like.p95
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.more_like.sample_rate
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.*.p50
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.prefix.p50
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.prefix.p75
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.prefix.p95
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.prefix.p99
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.prefix.sample_rate
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.*.sample_rate
  • MediaWiki.CirrusSearch.$cluster.requestTimeMs.*.sum
  • MediaWiki.CirrusSearch.$cluster.requestTime.p50
  • MediaWiki.CirrusSearch.$cluster.requestTime.p75
  • MediaWiki.CirrusSearch.$cluster.requestTime.p95
  • MediaWiki.CirrusSearch.$cluster.requestTime.p99
  • MediaWiki.CirrusSearch.$cluster.updates.all.lag.$change_type.mean
  • MediaWiki.CirrusSearch.cloudelastic.updates.all.*.rate
  • MediaWiki.CirrusSearch.codfw.requestTime.p95
  • MediaWiki.CirrusSearch.codfw.updates.all.*.rate
  • MediaWiki.CirrusSearch.codfw.updates.all.sent.rate
  • MediaWiki.CirrusSearch.codfw.updates.details.*.*.sent.rate
  • MediaWiki.CirrusSearch.eqiad.requestTime.p95
  • MediaWiki.CirrusSearch.eqiad.updates.all.lag.page_change.mean
  • MediaWiki.CirrusSearch.eqiad.updates.all.*.rate
  • MediaWiki.CirrusSearch.poolCounter.*
  • MediaWiki.CirrusSearch.poolCounter.$pool_counter.failureMs.sample_rate
  • MediaWiki.CirrusSearch.poolCounter.$pool_counter.successMs.sample_rate
  • MediaWiki.CirrusSearch.poolCounter.*.failureMs.sample_rate
  • MediaWiki.CirrusSearch.query_cache.more_like.hit.rate
  • MediaWiki.CirrusSearch.query_cache.more_like.miss.rate
  • MediaWiki.CirrusSearch.results.file_duplicates.count
  • MediaWiki.CirrusSearch.*.updates.all.doc_size.p95
  • MediaWiki.CirrusSearch.*.updates.all.doc_size.p99
  • MediaWiki.CirrusSearch.*.updates.all.doc_size.upper
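The metrics above are Graphite-style dotted paths. Dimensions such as the cluster or group and the request type are baked into the name, and each percentile (p50, p95, ...) is a separate series. In Prometheus those dimensions typically become labels on a single metric, and percentiles are computed at query time. The following minimal sketch maps two of the families listed above (requestTimeMs and backend_failure) onto a name-plus-labels form. It is illustrative only: the Prometheus metric names (`mediawiki_cirrussearch_request_time_ms`, `mediawiki_cirrussearch_backend_failure_total`), the label names (`search_cluster`, `type`, `reason`) and the helpers `to_prometheus`/`selector` are hypothetical and are not the names chosen in the actual CirrusSearch changes.

```python
import re
from typing import Dict, NamedTuple, Optional


class PromSeries(NamedTuple):
    name: str                # hypothetical Prometheus metric name
    labels: Dict[str, str]   # dimensions that were path segments in Graphite
    stat: Optional[str]      # Graphite aggregation suffix (p95, count, rate, ...)


# Hypothetical patterns: which Graphite path segments become Prometheus labels.
_REQUEST_TIME = re.compile(
    r"^MediaWiki\.CirrusSearch\.(?P<cluster>[^.]+)\.requestTimeMs"
    r"\.(?P<type>[^.]+)\.(?P<stat>[^.]+)$"
)
_BACKEND_FAILURE = re.compile(
    r"^MediaWiki\.CirrusSearch\.(?P<cluster>[^.]+)\.backend_failure"
    r"\.(?P<reason>[^.]+)\.(?P<stat>[^.]+)$"
)


def to_prometheus(path: str) -> Optional[PromSeries]:
    """Map a Graphite path to a (name, labels, stat) triple, or None if unknown."""
    m = _REQUEST_TIME.match(path)
    if m:
        return PromSeries(
            "mediawiki_cirrussearch_request_time_ms",
            {"search_cluster": m["cluster"], "type": m["type"]},
            m["stat"],
        )
    m = _BACKEND_FAILURE.match(path)
    if m:
        return PromSeries(
            "mediawiki_cirrussearch_backend_failure_total",
            {"search_cluster": m["cluster"], "reason": m["reason"]},
            m["stat"],
        )
    return None


def selector(series: PromSeries) -> str:
    """Render a PromQL-style series selector with labels in sorted order."""
    labels = ",".join(f'{k}="{v}"' for k, v in sorted(series.labels.items()))
    return f"{series.name}{{{labels}}}"


if __name__ == "__main__":
    s = to_prometheus("MediaWiki.CirrusSearch.eqiad.requestTimeMs.full_text.p95")
    print(selector(s), s.stat)
```

The design point is that the Graphite stat suffix (`p95`, `sample_rate`, `count`) is not part of the Prometheus name. In Prometheus, a latency percentile would be derived at query time from the underlying timing data instead of being stored as its own series.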

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel triaged this task as High priority. Mar 4 2024, 4:28 PM
Gehel moved this task from needs triage to Current work on the Discovery-Search board.
Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Search.
dcausse renamed this task from Convert CirrusSearch metrics to statslib to EPIC: Convert CirrusSearch metrics to statslib. Mar 4 2024, 4:52 PM
dcausse added a project: Epic.
dcausse moved this task from Incoming to Epics on the Discovery-Search (Current work) board.

Change #1047513 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Convert DataSender metrics to new Stats library

https://gerrit.wikimedia.org/r/1047513

Change #1047514 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Convert ElasticaWrite metrics to new Stats library

https://gerrit.wikimedia.org/r/1047514

Change #1047515 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Convert Pool Counter metrics to new Stats library

https://gerrit.wikimedia.org/r/1047515

Change #1047516 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Convert query cache metrics to new Stats library

https://gerrit.wikimedia.org/r/1047516

Change #1047517 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Convert saneitizer metrics to new Stats library

https://gerrit.wikimedia.org/r/1047517

Change #1047518 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Convert file duplicate metrics to new Stats library

https://gerrit.wikimedia.org/r/1047518

Change #1047519 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Convert ElasticsearchIntermediary metrics to new Stats library

https://gerrit.wikimedia.org/r/1047519

Change #1047520 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Convert deepcat sparql timing metrics to new Stats library

https://gerrit.wikimedia.org/r/1047520

Change #1047514 abandoned by DCausse:

[mediawiki/extensions/CirrusSearch@master] Convert ElasticaWrite metrics to new Stats library

Reason:

squashed

https://gerrit.wikimedia.org/r/1047514

Change #1047515 abandoned by DCausse:

[mediawiki/extensions/CirrusSearch@master] Convert Pool Counter metrics to new Stats library

Reason:

squashed

https://gerrit.wikimedia.org/r/1047515

Change #1047516 abandoned by DCausse:

[mediawiki/extensions/CirrusSearch@master] Convert query cache metrics to new Stats library

Reason:

squashed

https://gerrit.wikimedia.org/r/1047516

Change #1047517 abandoned by DCausse:

[mediawiki/extensions/CirrusSearch@master] Convert saneitizer metrics to new Stats library

Reason:

squashed

https://gerrit.wikimedia.org/r/1047517

Change #1047518 abandoned by DCausse:

[mediawiki/extensions/CirrusSearch@master] Convert file duplicate metrics to new Stats library

Reason:

squashed

https://gerrit.wikimedia.org/r/1047518

Change #1047519 abandoned by DCausse:

[mediawiki/extensions/CirrusSearch@master] Convert ElasticsearchIntermediary metrics to new Stats library

Reason:

squashed

https://gerrit.wikimedia.org/r/1047519

Change #1047520 abandoned by DCausse:

[mediawiki/extensions/CirrusSearch@master] Convert deepcat sparql timing metrics to new Stats library

Reason:

squashed

https://gerrit.wikimedia.org/r/1047520

Change #1047513 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Convert CirrusSearch metrics to new Stats library

https://gerrit.wikimedia.org/r/1047513

Change #1051682 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Fix naming of new prometheus metrics

https://gerrit.wikimedia.org/r/1051682

Change #1051682 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Fix naming of new prometheus metrics

https://gerrit.wikimedia.org/r/1051682

Change #1053825 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Re-add CirrusSearch prefix to statsd metrics

https://gerrit.wikimedia.org/r/1053825

Change #1053838 had a related patch set uploaded (by DCausse; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@wmf/1.43.0-wmf.13] Re-add CirrusSearch prefix to statsd metrics

https://gerrit.wikimedia.org/r/1053838

Change #1053825 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Re-add CirrusSearch prefix to statsd metrics

https://gerrit.wikimedia.org/r/1053825

Change #1053838 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.43.0-wmf.13] Re-add CirrusSearch prefix to statsd metrics

https://gerrit.wikimedia.org/r/1053838

Mentioned in SAL (#wikimedia-operations) [2024-07-12T09:10:56Z] <dcausse@deploy1002> Started scap sync-world: Backport for [[gerrit:1053838|Re-add CirrusSearch prefix to statsd metrics (T359033)]]

Mentioned in SAL (#wikimedia-operations) [2024-07-12T09:13:27Z] <dcausse@deploy1002> dcausse: Backport for [[gerrit:1053838|Re-add CirrusSearch prefix to statsd metrics (T359033)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-07-12T09:20:40Z] <dcausse@deploy1002> Finished scap: Backport for [[gerrit:1053838|Re-add CirrusSearch prefix to statsd metrics (T359033)]] (duration: 09m 44s)

Change #1054317 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/alerts@master] team-search-platform: migrate cirrus_cluster_checks

https://gerrit.wikimedia.org/r/1054317

Change #1054374 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/alerts@master] team-search-platform: migrate cirrus latencies & mem alert

https://gerrit.wikimedia.org/r/1054374

Change #1054647 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] elasticsearch: remove obsolete alerts

https://gerrit.wikimedia.org/r/1054647

Change #1054317 merged by jenkins-bot:

[operations/alerts@master] team-search-platform: migrate cirrus_cluster_checks

https://gerrit.wikimedia.org/r/1054317

Change #1054374 merged by jenkins-bot:

[operations/alerts@master] team-search-platform: migrate cirrus latencies & mem alert

https://gerrit.wikimedia.org/r/1054374

Change #1054647 merged by Bking:

[operations/puppet@production] elasticsearch: remove obsolete alerts

https://gerrit.wikimedia.org/r/1054647

Starting from MW 1.44.0-wmf.14 CirrusSearch should no longer push any metrics to graphite.

\o/ thanks!

dcausse claimed this task.