Page MenuHomePhabricator

Edits to file description pages schedule recursive refreshLinks jobs unnecessarily
Closed, ResolvedPublic

Description

Edits to file description pages schedule recursive refreshLinks jobs unnecessarily. Right here: https://gerrit.wikimedia.org/g/mediawiki/core/+/8a3a9e2162ed771c696a4b30925e7ae50541d3c8/includes/deferred/LinksUpdate/LinksUpdate.php#317 (added in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/105830)

This is done "in case the title is or was a redirect", but the majority of edits do not change redirect targets. There is probably no other case where an edit to a file description page can affect the links tables of pages using a thumbnail of that file. It can't affect cascade protection either (since for files, it only protects the file and its description page, and doesn't cascade into transcluded pages). We should only schedule them when needed.

These jobs perform lots of unnecessary work. I've found this while investigating T343859 – many of the errors in that task on Commons can be traced to these unnecessary jobs. There are some huge, slow-to-parse pages, that include hundreds of thumbnails of actively edited files, e.g. https://commons.wikimedia.org/wiki/Commons:Quality_images_candidates (and its 30 translations), and these jobs are parsing them over and over again at all times.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change 960757 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] DerivedPageDataUpdater: Only do RefreshLinks on file page redirect changes

https://gerrit.wikimedia.org/r/960757

While testing my changes, I re-discovered T238426, which is the opposite sort of bug for htmlCacheUpdate jobs.

Change 960757 merged by jenkins-bot:

[mediawiki/core@master] DerivedPageDataUpdater: Only do RefreshLinks on file page redirect changes

https://gerrit.wikimedia.org/r/960757

Before closing this, I'd like to wait for the deployment and see if it had a noticeable effect on anything measurable (like the number of jobs generated, or something like that).

By the way, I was very pleasantly surprised to find the wasRedirect() method already existing on DerivedPageDataUpdater. When I was starting this task, I thought I'd have to go on a big adventure just to ferry that piece of data from wherever it is generated to where I needed it, but it was already there! Thanks @daniel, amazing foresight :)

By the way, I was very pleasantly surprised to find the wasRedirect() method already existing on DerivedPageDataUpdater. When I was starting this task, I thought I'd have to go on a big adventure just to ferry that piece of data from wherever it is generated to where I needed it, but it was already there! Thanks @daniel, amazing foresight :)

I love it when a plan comes together ;)

Before closing this, I'd like to wait for the deployment and see if it had a noticeable effect on anything measurable (like the number of jobs generated, or something like that).

I guess this is the chart to look at: https://grafana.wikimedia.org/d/LSeAShkGz/jobqueue

image.png (1×3 px, 489 KB)

Sadly it can't be filtered by wiki, and only Commons suffers from this problem, so probably there won't even be a blip.

Should have been deployed on 5th – no obvious impact:

image.png (695×1 px, 90 KB)