
Page was not purged at edit
Closed, Duplicate | Public | BUG REPORT

Description

@Kusma pointed me to a thread on the English Wikipedia's technical village pump regarding a page (User:Bearian/Deletions, admin-only link) which appears to have taken 3 years to appear in a category (a deletion request category). The initial thinking was that the reason was not visible due to a suppressed revision or similar.

There was no suppressed revision, but checking logstash for the page title I found 5e325f4f-324a-406d-9823-1e560a619dfd, which shows Bot1058 requesting a RecursiveLinkPurge for the page, probably related to this edit going by the timestamps.

The time between the purge (Mar 28, 2022 @ 10:01) and the deletion (Mar 28, 2022 @ 11:54) suggests that this purge caused the page to appear in the category.

For some reason, a page tagged with a template (db-user) which should have added it to a category (Category:Candidates for speedy deletion) in July 2018 (diff, admin-only link) did not have its cache purged at edit, nor since, until today.

Event Timeline

2018 is too long ago to investigate today. It is possible that some JobQueue updates were lost at the time, which left the edit or its propagation unable to be saved in the database.

The passing of time between then and today is not indicative of a malfunction as there is no code in place that would run refreshLinks on any particular page merely for time having passed.

Upon viewing a page, we do re-parse it once 30 days have passed without edits to the page or any of its template links. This ensures visitors to an article see up-to-date content automatically, even in the case of temporary problems. However, we do not run refreshLinks or otherwise perform database writes during page views. refreshLinks is only run on edits, or when the page is (indirectly) purged due to a relevant event (e.g. template edit, wikidata, redlink creation, bluelink deletion, etc.).
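To make the distinction concrete, here is a toy model of the two paths described above. It is not MediaWiki code: the function names and event labels are made up for illustration, and only the 30-day figure and the list of triggers come from the comment.

```python
from datetime import datetime, timedelta

# Assumption: 30 days is the figure quoted in the comment above.
PARSER_CACHE_TTL = timedelta(days=30)

# Hypothetical labels for the triggers listed in the comment.
LINK_UPDATE_TRIGGERS = {
    "edit",
    "template_edit",
    "wikidata_change",
    "redlink_creation",
    "bluelink_deletion",
}


def reparse_on_view(last_parsed: datetime, now: datetime) -> bool:
    """A page view re-parses the page once the cached parse is too old."""
    return now - last_parsed >= PARSER_CACHE_TTL


def runs_refresh_links(event: str) -> bool:
    """Only the listed triggers write link/category rows; views never do."""
    return event in LINK_UPDATE_TRIGGERS
```

In this model a page last parsed in July 2018 would be re-parsed for a reader in 2022, yet `runs_refresh_links("page_view")` is false, so its category membership in the database would stay stale.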

If there is no such trigger, and if the last refreshLinks attempt failed in a non-recoverable way due to a technical problem, then it remains that way until a manual purge or null-edit, which anyone can do.
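For reference, a purge that also refreshes the links tables can be requested through the MediaWiki Action API (`action=purge` with `forcelinkupdate`). The following is a minimal sketch, assuming the English Wikipedia endpoint and a placeholder page title; the helper names and the User-Agent string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the wiki in question; substitute another api.php endpoint as needed.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_purge_params(title, recursive=False):
    """Build the form parameters for a purge that also updates links tables."""
    params = {
        "action": "purge",
        "titles": title,
        "format": "json",
        "forcelinkupdate": "1",
    }
    if recursive:
        # Also refresh pages that transclude or link to this one.
        params["forcerecursivelinkupdate"] = "1"
    return params


def purge(title, recursive=False, api_url=API_URL):
    """Send the purge as a POST request and return the decoded JSON reply."""
    data = urllib.parse.urlencode(build_purge_params(title, recursive)).encode("utf-8")
    request = urllib.request.Request(
        api_url,
        data=data,  # supplying a body makes urllib issue a POST
        headers={"User-Agent": "purge-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `purge("User:Example/Sandbox")` would send the request; the title here is a placeholder, and rate limits or permissions on the target wiki may apply.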

Note that, while refreshLinks can fail for any number of reasons, we have automatic retries of such failures, so a non-recoverable failure is fortunately very rare (short of a prolonged outage, or the JobQueue failing to store the event in the first place, e.g. due to T249745). There is a task from 2016 with some ideas about periodically running refreshLinks on all content. This would likely require some infrastructure consideration. Details at T135964: Force pages to be fully re-parsed occasionally.

Unless a pattern of more recent examples is known, or a way to reproduce this, I suggest we decline this task, or merge it into T135964.

Krinkle triaged this task as Medium priority. Mar 28 2022, 4:17 PM
Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.
Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.

Sounds fair enough @Krinkle!

I do wish we could react to comments instead of just the task; that thumbs up was really just meant for the comment. 😅