[Epic] Introduce metrics that help observe the impact of Recent Change related work
Open, Needs Triage, Public

Description

As a WMDE staff member working on Wikidata and Wikimedia projects integration, I want to be able to see the influence of our work on the performance of Recent Changes, so that I can understand the impact of our work and take necessary measures when needed.

Wikidata changes are likely to result in Recent Changes entries on wikis that make use of Wikidata data in their articles.

Recent Changes queries might become unnecessarily slow if there’s a high number of Wikidata-related rows that are not relevant for most Recent Changes queries.

A rule of thumb suggested by @Ladsgroup to distinguish between the “intended” and “not intended” states is to look at the ratio of entries in the Recent Changes table at a given point in time: the number of entries related to Wikidata vs. the number of entries related to MediaWiki (article) edits. There’s no absolute “good” figure, but the general expectation is to keep the ratio low. That is, a low ratio is intended, while a high ratio indicates there’s likely a high number of Recent Changes “noise” entries that are not relevant for most (or any) editors making use of Recent Changes.
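The heuristic above can be sketched as a small calculation. This is a minimal illustration, not an agreed definition: it assumes Wikibase-injected rows can be recognised by `rc_source = 'wb'` and article edits by `mw.edit` / `mw.new`, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumed rc_source values (not confirmed in this task).
WIKIBASE_SOURCE = "wb"
EDIT_SOURCES = {"mw.edit", "mw.new"}


def wikidata_to_edit_ratio(rc_sources: Iterable[str]) -> Optional[float]:
    """Ratio of Wikidata-related rows to article-edit rows.

    Returns None when there are no article-edit rows, because the
    ratio is undefined in that case.
    """
    counts = Counter(rc_sources)
    wikidata_rows = counts[WIKIBASE_SOURCE]
    edit_rows = sum(counts[source] for source in EDIT_SOURCES)
    if edit_rows == 0:
        return None
    return wikidata_rows / edit_rows
```

Rows from other sources (for example log entries) are ignored by this variant, which follows the "Wikidata vs. article edits" wording of the heuristic.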

Observing only the ratio of Recent Changes entries per wiki would provide just part of the "truth": it would describe the content of the Recent Changes table, but by itself it wouldn't say anything about how database query performance is affected. Therefore, that kind of metric should be observed together with a relevant metric signifying Recent Changes query performance/slowness.

To be defined further/refined

  • what metrics exactly should be observed (in particular, what database query metric to look at)
  • how to observe those (e.g. a dashboard)

Event Timeline

Unstructured notes from the meeting with @Neslihan_Turan_WMDE @seanleong-WMDE @SuzanneWood-WMDE and @Nicholusmuwonge_wmde on 2026-04-14

Might want to track the metrics daily (count per day, no need to monitor in a more granular way)

  • tracking different entity usages (what exactly) - they correlate with Recent Changes, but the exact relation is not known
  • ratio in recent changes per wiki (amount of RC rows coming from wikibase / amount of all RC rows)
  • amount/ratio of RC entries that would have resulted in no change in article HTML (note: requires implementing Lil Diff)
  • would be useful to be able to track how different Properties as predicates of the statements used in a wiki are represented in RC rows
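The "ratio in recent changes per wiki" item, tracked daily, could look roughly like the sketch below. It uses an in-memory-friendly `sqlite3` connection as a stand-in for a wiki's database so it stays self-contained. The `rc_source = 'wb'` filter is an assumption about how Wikibase-injected rows are marked, and the function name is hypothetical. Since each wiki has its own recentchanges table, the query would be run once per wiki.

```python
import sqlite3

# Day is taken from the first 8 characters of the MediaWiki-style
# timestamp (YYYYMMDDHHMMSS). rc_source = 'wb' is an assumption.
DAILY_SHARE_SQL = """
SELECT
    SUBSTR(rc_timestamp, 1, 8) AS day,
    SUM(CASE WHEN rc_source = 'wb' THEN 1 ELSE 0 END) AS wikibase_rows,
    COUNT(*) AS all_rows
FROM recentchanges
GROUP BY day
ORDER BY day
"""


def daily_wikibase_share(conn: sqlite3.Connection):
    """Return (day, wikibase_rows, all_rows, share) tuples, one per day."""
    rows = conn.execute(DAILY_SHARE_SQL).fetchall()
    # all_rows is never 0 here: a day only appears if it has rows.
    return [(day, wb, total, wb / total) for day, wb, total in rows]
```

Here the denominator is all RC rows, matching the wording of the list item, rather than article edits only.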

Concerns:

  • lack of baseline to compare

Tracking different entity usages

Ratio in recent changes per wiki

I discussed this a bit with @Neslihan_Turan_WMDE recently. One thing to note is that we need to be able to run PySpark-based Airflow jobs for this work to be possible with Airflow. My understanding of the situation is that MariaDB tables, including the various recentchanges tables, are not in the data lake. We'd need to use wmfdata-python for this. I'll be doing some research into deploying Python modules in the coming weeks, and WIT engineers being included in this process would be welcome. This is also likely needed for operationalizing the WD changes preference metrics, so WIT has more need for this kind of DAG (pipeline) than other teams.
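As a rough sketch of what the per-wiki step of such a pipeline might do, the function below loops over wikis and delegates the actual database access to a caller-supplied `run_query` function. Everything here is hypothetical: `run_query` stands in for whatever MariaDB access ends up being used (for example a helper from wmfdata-python), and the `rc_source = 'wb'` filter is an assumption.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical query; rc_source = 'wb' is an assumption about how
# Wikibase-injected rows are marked.
RC_COUNTS_SQL = (
    "SELECT SUM(rc_source = 'wb') AS wikibase_rows, COUNT(*) AS all_rows "
    "FROM recentchanges"
)

# run_query(sql, wiki) -> (wikibase_rows, all_rows); supplied by the caller.
QueryRunner = Callable[[str, str], Tuple[int, int]]


def collect_rc_shares(
    wikis: Iterable[str], run_query: QueryRunner
) -> Dict[str, float]:
    """Share of Wikibase-injected RC rows per wiki."""
    shares = {}
    for wiki in wikis:
        wikibase_rows, all_rows = run_query(RC_COUNTS_SQL, wiki)
        if all_rows:  # skip wikis with an empty recentchanges table
            shares[wiki] = wikibase_rows / all_rows
    return shares
```

Keeping the database access behind a plain callable means the same logic could be reused whether the job ends up in Airflow or elsewhere.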

We could also maybe do this with analytics/wmde/scripts - i.e. PHP and SQL queries? The resulting metrics would be on Grafana in this case. I'm fine to go in either direction, but it would be nice to know if there's a preference and if accessing data from recentchanges tables is possible in analytics/wmde/scripts.

Something to note: when Wikidata injects a row, it bumps a metric in MediaWiki called mediawiki_WikibaseClient_PageUpdates_InjectRCRecords_run_titles_total, which is the basis of what I use to check: https://grafana.wikimedia.org/d/000000378/ladsgroup-test?orgId=1&from=now-3d&to=now&timezone=utc&viewPanel=panel-32

It looks extremely spiky even after being averaged over an hour, so this could lead to finding some issues there.
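One way to put a number on "spiky" is a peak-to-mean ratio over the per-interval increases of the counter. The sketch below is only an illustration on plain lists of samples. It does not query Prometheus or Grafana, both function names are hypothetical, and it handles counter resets more crudely than Prometheus does.

```python
from statistics import mean
from typing import List, Sequence


def increases(counter_samples: Sequence[float]) -> List[float]:
    """Per-interval increases of a monotonically growing counter.

    A drop between two samples (e.g. a counter reset) is clamped to 0,
    which is a simplification.
    """
    return [max(b - a, 0) for a, b in zip(counter_samples, counter_samples[1:])]


def peak_to_mean(values: Sequence[float]) -> float:
    """Largest value divided by the mean; 1.0 means perfectly flat.

    Expects a non-empty sequence. Returns 0.0 when the mean is 0.
    """
    avg = mean(values)
    return max(values) / avg if avg else 0.0
```

For example, hourly samples of `[0, 10, 20, 130, 140]` give increases of `[10, 10, 110, 10]`, where a single hour carries most of the injected rows.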

Lucyfediachambers renamed this task from Introduce metrics that help observe the impact of Recent Change related work to [Epic] Introduce metrics that help observe the impact of Recent Change related work. Fri, May 15, 12:15 PM
Lucyfediachambers added a project: Epic.