
Investigate low hanging fruit for further change propagation reduction
Open, Needs Triage, Public

Assigned To
Authored By
JoelyRooke-WMDE
Nov 20 2025, 1:58 PM
Referenced Files
Restricted File
Jan 6 2026, 9:04 AM
Restricted File
Jan 6 2026, 9:04 AM
F71447932: Screenshot From 2026-01-05 15-59-32.png
Jan 6 2026, 8:56 AM
F71447928: Screenshot From 2026-01-05 11-59-54.png
Jan 6 2026, 8:56 AM
F71447886: Screenshot From 2026-01-05 11-34-20.png
Jan 6 2026, 8:56 AM
F71145872: Screenshot from 2025-12-19 17-09-58.png
Dec 19 2025, 4:14 PM

Description

In our original epic, the scheduled interventions were expected to reduce Wikibase change propagation in client wikis by 20-40%. As of 13th Nov, there are 4,176,855 fewer changes across all wikis than on 26th June, a reduction of 23.74%.

We had hoped for a greater impact, with an acceptance threshold of 30% fewer changes. However, it may be technically infeasible to achieve this. We should conduct another investigation, timeboxed to 1 week, to identify potential low-hanging fruit for further reductions in change propagation.
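As a rough back-of-the-envelope check, the figures above imply a baseline and a remaining gap to the 30% threshold. The sketch below assumes the 23.74% is measured against the 26th June total; the derived baseline is an estimate reconstructed from the rounded percentage, not a number taken from the dashboards.

```python
# Figures quoted in the description.
fewer_changes = 4_176_855   # reduction as of 13th Nov compared to 26th June
reduction_share = 0.2374    # 23.74%, as reported (rounded)
target_share = 0.30         # acceptance threshold

# Assumption: the percentage is relative to the 26th June baseline.
baseline = fewer_changes / reduction_share
target_reduction = baseline * target_share
remaining_gap = target_reduction - fewer_changes

print(f"implied baseline: ~{baseline:,.0f} changes")
print(f"reduction needed for 30%: ~{target_reduction:,.0f} changes")
print(f"still to find: ~{remaining_gap:,.0f} changes")
```

Under that assumption, roughly another 1.1 million changes would have to be removed to reach the threshold.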

One place to start could be going through wikis with a high ratio of Wikibase changes to local changes, and trying to find notifications which are irrelevant and could be removed.

Event Timeline

Some avenues for investigation:

  • LinksUpdateJob is used for doing secondary data updates in other parts of core. We would need to adjust it slightly, as I believe the current uses all operate on core databases rather than on extension databases, but one opportunity this presents is grouping multiple changes together.
    • e.g. the comment/edit summary sometimes reads "Wikidata Item updated" when multiple changes were included in one job because the updates were very close together and on the same entity. This is a pretty unhelpful message, as anything could have changed, but thanks to usage tracking we can at least assume that something relevant to the page was edited.
    • Wikidata entities often receive several atomic edits in a short period, because changes are not saved all together as on a client page, but separately for each individual section, e.g. labels, each separate statement, and sitelinks.
    • The interval for batching these changes into one job could be lengthened, e.g. to 10 minutes (there is absolutely no basis for this magic number), so that more changes are grouped together. On pages which use modules like databox this could be particularly effective: almost every part of the entity is tracked for the page, so this would reduce the changes propagated to one every 10 minutes.
  • We can investigate the pages with the most usage tracking entries and look at their updates and HTML, to see whether any usage tracking aspects don't seem relevant to parsing the page.
  • We can investigate modules like databox or infobox and see if there are optimisations to reduce usage tracking. For example, if the module code touches any part of the entity, it is registered as a usage regardless of whether it prints this info to the page. Maybe there is a way to simplify the code and thereby save on change propagation.
  • Otherwise, continue going through wikis with a high ratio of Wikibase changes to local changes, and try to find notifications which are irrelevant and could be removed.
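The batching idea in the first avenue can be sketched as follows. This is a minimal illustration only, not how LinksUpdateJob or Wikibase change dispatching actually works: the `Change` and `Batch` types, the `batch_changes` function and the 600-second default (the arbitrary 10-minute figure from above) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    entity_id: str    # e.g. "Q42"
    timestamp: float  # seconds since epoch
    aspect: str       # e.g. "label", "statement", "sitelink"


@dataclass
class Batch:
    entity_id: str
    first_seen: float
    changes: list = field(default_factory=list)


def batch_changes(changes, window_seconds=600):
    """Group changes to the same entity that fall within one window.

    A batch opens at the first change for an entity and closes
    window_seconds later; each batch would become a single job.
    """
    open_batches = {}
    closed = []
    for change in sorted(changes, key=lambda c: c.timestamp):
        batch = open_batches.get(change.entity_id)
        if batch is not None and change.timestamp - batch.first_seen >= window_seconds:
            closed.append(batch)  # window elapsed: this batch is done
            batch = None
        if batch is None:
            batch = Batch(change.entity_id, change.timestamp)
            open_batches[change.entity_id] = batch
        batch.changes.append(change)
    closed.extend(open_batches.values())
    return closed


edits = [
    Change("Q1", 0, "label"),
    Change("Q1", 60, "statement"),
    Change("Q1", 120, "sitelink"),
    Change("Q1", 700, "statement"),
    Change("Q2", 30, "label"),
]
batches = batch_changes(edits)
```

Here the five edits collapse into three batches. A fixed window that starts at the first change is only one possible choice; a window that resets on every new change would group more aggressively but could delay propagation indefinitely on busy entities.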

The dream scenario would be a way to know, before propagating a change to clients, whether the pages' HTML would be affected. However, all the ways I can imagine to do this would put a heavy load on performance (e.g. parsing pages on every update, rather than when a user reads the page) and would also require an extremely large amount of dev work. Maybe worth bearing in mind as an aspiration!

"We can investigate modules like databox or infobox and see if there are optimisations to reduce usage tracking. For example, if the module code touches any part of the entity, it is registered as a usage regardless of whether it prints this info to the page. Maybe there is a way to simplify the code and thereby save on change propagation."

It seems we already have a ticket for this one, now added as a subtask: T403008

I'd suggest we turn off the "Language link added" changes.

Screenshot from 2025-12-19 17-09-58.png (1×1 px, 953 KB)

Arguably, on enwiki for example, none of these "Language link added" changes affect how the enwiki page is rendered. They mean 'the Wikidata item linked to this enwiki page is now also linked to a page on another language wiki'. I also think they aren't relevant to enwiki page editors, because the linked page is often in a language they don't understand.

Here are some statistics from the recent changes table in enwiki, as an example of how many of these changes there currently are:
Language link added: 170341, 21.13% of total WB changes
Language link removed: 13440, 1.67% of total WB changes

Please see the related ticket below. There it was suggested to give users the option to filter these changes out, whereas I'm suggesting just turning them off.
https://phabricator.wikimedia.org/T224532
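To put the two enwiki figures above in proportion, here is a small arithmetic sketch. The implied totals are reconstructed from the rounded percentages, so they are estimates only and differ slightly between the two rows.

```python
# enwiki recent changes figures quoted above.
added, added_share = 170_341, 0.2113      # "Language link added"
removed, removed_share = 13_440, 0.0167   # "Language link removed"

# Each row implies a total number of WB changes; the two estimates
# differ a little because the shares are rounded to two decimals.
total_from_added = added / added_share
total_from_removed = removed / removed_share

combined = added + removed
combined_share = added_share + removed_share

print(f"implied total WB changes: ~{total_from_added:,.0f} / ~{total_from_removed:,.0f}")
print(f"language link changes combined: {combined:,} ({combined_share:.2%})")
```

Taken together, language link changes would account for a bit under a quarter of the Wikibase changes in enwiki's recent changes table.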

I am very wary of just removing the language links. They do have an impact on the article. They change which other language versions are linked to. Editors should at the very least have the option of tracking these changes.

I think we should at least consider an option to filter them out for users who aren't interested in them. However, this doesn't improve anything about the DB load issue.

A- Another option (which could also help with the DB) would be to not track them by default, but to track them via an entity usage if specified in the article. Some projects (https://meta.wikimedia.org/wiki/WikiProject_Women's_Health) rely on this language information. It might be important for them to see these updates, and they could add the tracking to their template (or we could provide a template for this purpose).


There are also some other logs worth considering, as they don't seem super helpful. However, we would need more time to calculate their impact if we decide they are doable in principle. To be honest, none of the ones below seemed like low-hanging fruit to me, but I'm sharing them as they might inspire one of us to find other solutions.

B- Several claim updates made by the same person within the same minute. Could they be merged into the same log, even though they are definitely not the same change?

Screenshot From 2026-01-05 11-34-20.png (656×3 px, 794 KB)

Revision history:
{F71453348}
The closest existing example is when multiple qualifiers are added. They are saved at the same time, so they really are the same change, in contrast to this scenario, where the idea is to merge different changes. They are displayed as:
{F71453394}

C- There is one more possible improvement: some language updates are displayed as a remove and an add.

Screenshot From 2026-01-05 11-59-54.png (232×3 px, 273 KB)

This happens even though there is a change type which could store it as 1 log instead of 2. Which kind of log appears could depend on whether the Wikidata editor saves the edits together or separately.
Screenshot From 2026-01-05 15-59-32.png (230×3 px, 161 KB)

The downside is that I don't think they are very common, so we won't gain much from them. Implementing this could also be complex and challenging.
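For illustration, option C could look roughly like the sketch below, which collapses a remove that is directly followed by an add into a single entry. The record fields, the `"change"` action name and the 60-second gap are hypothetical and do not reflect the actual recent changes schema or Wikibase change types.

```python
def collapse_remove_add(entries, max_gap_seconds=60):
    """Merge a 'remove' directly followed by an 'add' into one 'change'.

    Two entries are paired only if they share entity, language and user
    and are at most max_gap_seconds apart.
    """
    result = []
    for entry in sorted(entries, key=lambda e: e["timestamp"]):
        prev = result[-1] if result else None
        if (
            prev is not None
            and prev["action"] == "remove"
            and entry["action"] == "add"
            and all(prev[k] == entry[k] for k in ("entity", "language", "user"))
            and entry["timestamp"] - prev["timestamp"] <= max_gap_seconds
        ):
            # Replace the pending 'remove' with a single merged entry.
            result[-1] = {**prev, "action": "change", "timestamp": entry["timestamp"]}
        else:
            result.append(dict(entry))
    return result


log = [
    {"entity": "Q1", "language": "de", "user": "Alice", "action": "remove", "timestamp": 0},
    {"entity": "Q1", "language": "de", "user": "Alice", "action": "add", "timestamp": 5},
    {"entity": "Q2", "language": "fr", "user": "Bob", "action": "add", "timestamp": 10},
]
collapsed = collapse_remove_add(log)
```

Only directly adjacent pairs are merged here, which keeps the sketch simple but would miss pairs that are interleaved with unrelated edits.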


Summary:
A seems like the easiest to implement and the highest impact, if we can come up with an idea that also protects the functionality. B is complex to implement but still high impact. However, I am not sure the benefit would be worth breaking the one-change/one-diff-per-log structure of the RC table, or the readability of these logs. C seems both complex to implement and low impact.

Some other things we discussed in our recent brainstorming, written down here so that they are recorded somewhere (everything is arbitrary and nothing is decided yet):

  • Having a time limit, i.e. merging logs older than 20 days and removing language logs if they are older than 20 days.
  • How to merge them? Options would be in the repo before change propagation, after they reach the client, after they are in the DB, or, even more crazy, on the repo when they are created.
  • What to merge is another thing to decide, if we decide to merge. Some highlights were logs at the same time, by the same person, for the same operation, same statement edits, etc.
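As one way to picture the time-limit idea combined with a merge key, the sketch below keeps recent entries untouched, drops old language logs, and merges the remaining old entries per user, entity and operation. Nothing here is decided: the field names, the `"language-link"` operation label and the `compact_old_entries` helper are hypothetical, and 20 days is the arbitrary figure from the brainstorming.

```python
from datetime import datetime, timedelta, timezone


def compact_old_entries(entries, now, max_age=timedelta(days=20)):
    """Keep recent entries; drop old language logs; merge the other old ones."""
    cutoff = now - max_age
    recent = []
    merged = {}
    for entry in entries:
        if entry["timestamp"] >= cutoff:
            recent.append(entry)
        elif entry["operation"] == "language-link":
            continue  # old language logs are removed entirely
        else:
            key = (entry["user"], entry["entity"], entry["operation"])
            group = merged.setdefault(key, {**entry, "count": 0})
            group["count"] += 1
            # The merged entry carries the latest timestamp of its group.
            group["timestamp"] = max(group["timestamp"], entry["timestamp"])
    return recent + list(merged.values())


now = datetime(2026, 1, 6, tzinfo=timezone.utc)
entries = [
    {"user": "A", "entity": "Q1", "operation": "claim", "timestamp": now - timedelta(days=30)},
    {"user": "A", "entity": "Q1", "operation": "claim", "timestamp": now - timedelta(days=25)},
    {"user": "B", "entity": "Q2", "operation": "language-link", "timestamp": now - timedelta(days=40)},
    {"user": "B", "entity": "Q2", "operation": "language-link", "timestamp": now - timedelta(days=1)},
]
compacted = compact_old_entries(entries, now)
```

Where such a step would run (repo, client, or DB) is exactly the open question in the second bullet; the sketch is agnostic about that.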

As a separate topic, we all agreed that an option to filter out language added logs would be nice to have.

French wiki has some high numbers for accessing all statements: https://phabricator.wikimedia.org/P87630
This could be related to the Lua investigation T403008, or it could be something else.

Note that I picked French because it had a pretty high number of wd db injections; you could pick many of the other wikis I've listed in P78713 as well.