
Audit legacy mediawiki stats used in production dashboards
Closed, Resolved (Public)

Description

The team has proposed a value-driven migration. To support it, this task tracks the metrics that are currently in use, where "in use" means referenced by a dashboard in Grafana.

The objective is a list of dashboards that need to be converted, which will guide the migration. Subsequent tasks will link each metric to its position in the queue, and this task will serve as the prioritized list for the conversion.

  • scripted audit of dashboards using graphite datasources, emit metrics used
    • establish initial set of tracking metrics
  • identify mechanism to send metrics from script to prometheus (e.g. pushgateway)
  • create graphite metric status dashboard

Top 10 metrics used in dashboards, from a one-time audit (full list: P54396)

26 MediaWiki.timing.editResponseTime
14 mw.performance.save
 8 MediaWiki.RevisionSlider.timing.init
 7 MediaWiki.Parsoid.html2wt.setup
 7 MediaWiki.Parsoid.html2wt.selser.serialize
 7 MediaWiki.Parsoid.html2wt.selser.domDiff
 7 MediaWiki.Parsoid.html2wt.init
 6 MediaWiki.wikibase.quality.constraints.type.php.success.entities
 6 MediaWiki.Parsoid.html2wt.total
 6 MediaWiki.Parsoid.html2wt.timePerInputKB

Details

Title                     Reference                   Author  Source Branch        Dest Branch
initial debian packaging  repos/sre/python-verlib2!1  herron  packaging-wikimedia  main


Event Timeline

herron renamed this task from "Audit & convert stats in use in production to statslib" to "Audit legacy mediawiki stats used in production". Nov 9 2023, 2:46 PM
herron renamed this task from "Audit legacy mediawiki stats used in production" to "Audit legacy mediawiki stats used in production dashboards".
herron triaged this task as Medium priority.

I spent some time today experimenting with https://github.com/grafana/cortex-tools, specifically cortextool analyse grafana, which looked promising. Unfortunately it throws parse errors when it encounters a period in a metric name, which makes it unsuitable for graphite metrics.
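
To illustrate the failure mode: the classic Prometheus metric-name grammar allows only letters, digits, underscores, and colons, so a dotted graphite path is not a valid identifier under it. The sketch below checks names against that grammar. That this is the exact cause of the cortextool parse errors is an assumption, and the helper name and the underscore-separated example name are hypothetical.

```python
import re

# Classic Prometheus metric-name grammar (predates UTF-8 metric names).
PROM_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_classic_prometheus_name(name: str) -> bool:
    """Return True if the whole name fits the classic Prometheus grammar."""
    return PROM_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # The first name is a graphite path from the audit; the second is a
    # hypothetical underscore-separated equivalent.
    for name in ("MediaWiki.timing.editResponseTime",
                 "MediaWiki_timing_editResponseTime"):
        print(name, is_classic_prometheus_name(name))
```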

So instead I've been working on a simple script that walks the dashboard API looking for dashboards with a graphite datasource and outputs the metrics used. However, instead of producing a one-time, manual report here, I'm thinking we should build some ongoing status reporting.
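
A minimal sketch of the extraction step, not the actual script: it assumes dashboard JSON in the usual Grafana model (a "panels" list, optionally nested under rows, with each query under "targets" and graphite expressions in the "target" field). A real script would first fetch each dashboard over the Grafana HTTP API (for example /api/search and /api/dashboards/uid/<uid>); that part is left out here so the example runs offline. The function names and the name-matching rule for string datasources are assumptions, and the tally counts raw target expressions rather than parsed metric paths.

```python
from collections import Counter
from typing import Any, Iterator


def is_graphite(datasource: Any) -> bool:
    """Datasource is either a {"type", "uid"} object or a plain name."""
    if isinstance(datasource, dict):
        return datasource.get("type") == "graphite"
    # Assumption: string datasources are matched by name.
    return isinstance(datasource, str) and "graphite" in datasource.lower()


def iter_panels(dashboard: dict) -> Iterator[dict]:
    """Yield every panel, including panels nested inside rows."""
    stack = list(dashboard.get("panels", []))
    while stack:
        panel = stack.pop()
        stack.extend(panel.get("panels", []))
        yield panel


def graphite_targets(dashboard: dict) -> Iterator[str]:
    """Yield the graphite query expression of every graphite target."""
    for panel in iter_panels(dashboard):
        panel_is_graphite = is_graphite(panel.get("datasource"))
        for target in panel.get("targets", []):
            ds = target.get("datasource")
            # A target without its own datasource inherits the panel's.
            if is_graphite(ds) or (ds is None and panel_is_graphite):
                query = target.get("target")
                if query:
                    yield query


# Hypothetical dashboard used only to exercise the functions above.
example = {
    "title": "hypothetical dashboard",
    "panels": [
        {"datasource": {"type": "graphite", "uid": "abc"},
         "targets": [{"target": "MediaWiki.timing.editResponseTime"}]},
        {"type": "row", "panels": [
            {"datasource": "graphite",
             "targets": [{"target": "mw.performance.save"}]}]},
        {"datasource": {"type": "prometheus", "uid": "xyz"},
         "targets": [{"expr": "up"}]},
    ],
}

usage = Counter(graphite_targets(example))
```

Feeding every dashboard through graphite_targets and calling usage.most_common(10) would give a ranked list in the same shape as the top-10 table in the description.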

I think the next step is to expand the script to emit a few metrics that capture the ongoing state of graphite utilization, send them to something like the Prometheus Pushgateway, and build a status dashboard from them. With T350825 we could possibly annotate panels with relevant commits as well. I'll expand the task description to include high-level steps for that.

Very rough draft of the metric list (to be expanded/refined/clarified):

  • Dashboards using graphite datasource
  • Annotations using graphite datasource
  • Panels using graphite datasource
  • Graphite metric count
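
The draft metrics above could be computed and rendered roughly as follows. This is a sketch under stated assumptions, not the deployed exporter: the metric names and the grafana_graphite_usage prefix are hypothetical, string datasources are matched by name, panels nested inside rows are ignored for brevity, and "metric count" is taken to mean distinct graphite target expressions. The output is Prometheus text exposition format, which is what a Pushgateway accepts on /metrics/job/<job_name>.

```python
from typing import Any


def is_graphite(datasource: Any) -> bool:
    """Datasource is either a {"type", "uid"} object or a plain name."""
    if isinstance(datasource, dict):
        return datasource.get("type") == "graphite"
    return isinstance(datasource, str) and "graphite" in datasource.lower()


def summarize(dashboards: list[dict]) -> dict[str, int]:
    """Count dashboards, panels, annotations, and distinct graphite targets."""
    dash_count = panel_count = annotation_count = 0
    metrics: set[str] = set()
    for dash in dashboards:
        panels = [p for p in dash.get("panels", [])
                  if is_graphite(p.get("datasource"))]
        annotations = [a for a in dash.get("annotations", {}).get("list", [])
                       if is_graphite(a.get("datasource"))]
        if panels or annotations:
            dash_count += 1
        panel_count += len(panels)
        annotation_count += len(annotations)
        for panel in panels:
            for target in panel.get("targets", []):
                if target.get("target"):
                    metrics.add(target["target"])
    return {
        "dashboards": dash_count,
        "panels": panel_count,
        "annotations": annotation_count,
        "metrics": len(metrics),
    }


def to_exposition(counts: dict[str, int],
                  prefix: str = "grafana_graphite_usage") -> str:
    """Render the counts as gauges in Prometheus text exposition format."""
    lines = []
    for key, value in sorted(counts.items()):
        name = f"{prefix}_{key}"  # hypothetical metric name
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Gauges fit here because each value is a point-in-time count that should fall as dashboards are converted.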

Change 980048 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] grafana: add dashboard graphite usage exporter

https://gerrit.wikimedia.org/r/980048

herron updated the task description.

Change 980048 merged by Herron:

[operations/puppet@production] grafana: add dashboard datasource usage (graphite) exporter

https://gerrit.wikimedia.org/r/980048

herron closed this task as Resolved. Edited Jan 17 2024, 8:47 PM

A custom Grafana graphite datasource exporter has been deployed, along with a Grafana dashboard that uses its metrics to show current graphite datasource utilization.

This lets us track how many dashboards and panels are still actively using the legacy graphite datasource. The metrics are updated hourly.

The dashboard is located at https://grafana.wikimedia.org/d/K6DEOo5Ik/grafana-graphite-datasource-utilization

With that I think we're done here!