
Deleting a file from a wiki AFTER moving to commons: pages on the original wiki linking to the file may have broken links until the page is purged
Closed, Resolved, Public

Description

Deleting a file from a wiki AFTER moving to commons: pages on the wiki linking to the file can have broken links until the page is purged

For example, after FiveThirtyEight_Logo.svg was moved from enwiki to commons, a page that linked to the file showed a broken image link:

pasted_file_1.png (177×345 px, 14 KB)

... until the page was purged, after which the image displayed correctly:

pasted_file_2.png (172×330 px, 18 KB)

The above can be reproduced on a local MediaWiki-Vagrant instance as follows:

  1. Enable and provision 'commons' role
  2. Download a file from commons and upload it to the local wiki
  3. Disable 'commons' role
  4. Link to the file via [[File:filename|thumb]] on Page_X
  5. Verify that the File page has links to commons and to Page_X
  6. Delete the File page
  7. Reload Page_X
  8. Observe that it has a broken link to the file, unless the page is purged (and then it'll link to the file on commons)

The link is broken because pages that link to the file are not purged automatically when the file is deleted. That in turn happens because:

  1. When the file is first uploaded, its set of backlinks (obviously an empty set at that point) is cached in memcache
  2. When the file is deleted, the backlinks that need to be purged are read from memcache, and that cached set is still empty, so Page_X does not get purged

So any backlink that is missing from the backlink cache at the time the file is deleted will NOT be purged, which may be the root cause of this bug in production.
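To make the failure mode above concrete, here is a minimal toy model in Python. It is only a sketch of the sequence described in this task, not MediaWiki code: `ToyWiki`, its methods, and the `clear_cache_first` flag are all hypothetical names invented for illustration.

```python
class ToyWiki:
    """Toy model of the stale-backlink problem; NOT MediaWiki code."""

    def __init__(self):
        self.links = {}           # file title -> set of pages embedding it ("database")
        self.backlink_cache = {}  # file title -> cached snapshot of that set
        self.purged = []          # pages whose rendered output was invalidated

    def upload(self, file_title):
        self.links.setdefault(file_title, set())
        # The backlink set is cached immediately, while it is still empty.
        self.backlink_cache[file_title] = frozenset(self.links[file_title])

    def add_link(self, page, file_title):
        # The "database" is updated, but the cached snapshot is left alone.
        self.links[file_title].add(page)

    def cached_backlinks(self, file_title):
        # Fall back to the "database" only on a cache miss.
        if file_title not in self.backlink_cache:
            self.backlink_cache[file_title] = frozenset(self.links.get(file_title, ()))
        return self.backlink_cache[file_title]

    def delete_file(self, file_title, clear_cache_first=False):
        if clear_cache_first:
            # Hypothetical fix: drop the snapshot before reading backlinks.
            self.backlink_cache.pop(file_title, None)
        for page in sorted(self.cached_backlinks(file_title)):
            self.purged.append(page)


def run(clear_cache_first):
    wiki = ToyWiki()
    wiki.upload("File:Example.svg")
    wiki.add_link("Page_X", "File:Example.svg")
    wiki.delete_file("File:Example.svg", clear_cache_first=clear_cache_first)
    return wiki.purged


purged_with_stale_cache = run(clear_cache_first=False)  # [] : Page_X is missed
purged_after_clearing = run(clear_cache_first=True)     # ['Page_X']
```

With the stale snapshot, the delete step sees no backlinks and purges nothing; dropping the snapshot first makes it fall back to the real link data.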

The obvious solution would be to clear the backlink cache on file deletion, but BacklinkCache has been implemented in such a way as to make this difficult - code that calls BacklinkCache methods passes string data that is used to construct cache keys, and so it's impossible to clear the cache effectively without knowledge of the code that made the calls to populate the cache.
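A rough Python sketch of why caller-supplied key fragments make clearing hard. The key layout, the class `ToyBacklinkCache`, and the suffix strings are assumptions made up for this illustration; they are not the actual BacklinkCache key format.

```python
class ToyBacklinkCache:
    """Toy cache whose keys embed strings chosen by each caller; NOT MediaWiki code."""

    def __init__(self):
        self.store = {}

    @staticmethod
    def make_key(title, table, suffix):
        # Hypothetical key layout: the caller decides `table` and `suffix`.
        return f"backlinks:{title}:{table}:{suffix}"

    def get(self, title, table, suffix, compute):
        key = self.make_key(title, table, suffix)
        if key not in self.store:
            self.store[key] = compute()
        return self.store[key]

    def clear(self, title, table, suffix):
        # Clearing needs exactly the same strings that were used to populate.
        self.store.pop(self.make_key(title, table, suffix), None)


cache = ToyBacklinkCache()
# Two unrelated callers populate the cache with different suffix strings.
cache.get("File:Example.svg", "imagelinks", "count", lambda: 0)
cache.get("File:Example.svg", "imagelinks", "partition:100", lambda: [])

# The deletion code only knows about one of those suffixes...
cache.clear("File:Example.svg", "imagelinks", "count")

# ...so an entry for the same title survives and stays stale.
leftover = [key for key in cache.store if "File:Example.svg" in key]
```

In this toy setup `leftover` still holds the `partition:100` entry, which is the situation described above: without knowing every string each caller passed, the title cannot be cleared completely.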

The problem could be mitigated by lowering the TTL on the memcache storage, but a better solution would be to re-engineer BacklinkCache so that it's easy to cleanly clear the cache for a title.

Event Timeline

Cparle triaged this task as Medium priority. Dec 21 2017, 4:48 PM
Cparle created this task.

As this seems to be an issue with the backlink cache I'm removing the 'multimedia' tag

> The obvious solution would be to clear the backlink cache on file deletion, but BacklinkCache has been implemented in such a way as to make this difficult - code that calls BacklinkCache methods passes string data that is used to construct cache keys, and so it's impossible to clear the cache effectively without knowledge of the code that made the calls to populate the cache.

See WANObjectCache::touchCheckKey().
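For readers unfamiliar with the check-key idea, a toy Python model follows. It only sketches the general concept (a cached value counts as stale if it was generated before its check key was last touched). `ToyCheckKeyCache`, its method names, and the key strings are hypothetical; this is neither the WANObjectCache API nor necessarily what the eventual patch does.

```python
import itertools


class ToyCheckKeyCache:
    """Toy model of check-key invalidation; NOT the WANObjectCache API."""

    def __init__(self):
        self._clock = itertools.count(1)  # logical clock instead of wall time
        self._values = {}                 # key -> (value, generated_at)
        self._checks = {}                 # check key -> last touched at

    def touch_check_key(self, check_key):
        # Everything generated before this moment becomes stale.
        self._checks[check_key] = next(self._clock)

    def get_with_set(self, key, check_key, compute):
        entry = self._values.get(key)
        touched_at = self._checks.get(check_key, 0)
        if entry is not None and entry[1] > touched_at:
            return entry[0]  # still newer than the last touch
        value = compute()
        self._values[key] = (value, next(self._clock))
        return value


links = set()  # stand-in for the real link table
cache = ToyCheckKeyCache()
check = "backlinks-check:File:Example.svg"  # one check key per title (assumed)


def load():
    return sorted(links)


first = cache.get_with_set("caller-specific-key", check, load)  # [] is cached
links.add("Page_X")
stale = cache.get_with_set("caller-specific-key", check, load)  # still []
cache.touch_check_key(check)                                    # e.g. on file delete
fresh = cache.get_with_set("caller-specific-key", check, load)  # ['Page_X']
```

The point of the pattern is that the code doing the invalidation only needs the per-title check key, not the caller-specific value keys.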

Thanks @Tgr ... I think I can see how to implement it, might be back to you with questions

Change 403901 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/core@master] Clear the backlink cache on file delete

https://gerrit.wikimedia.org/r/403901

Change 403901 merged by jenkins-bot:
[mediawiki/core@master] Clear the backlink cache on file delete

https://gerrit.wikimedia.org/r/403901

@Magog_the_Ogre I'm hoping this solves at least part of the problem you were having with T162532 - could you check?

Yes, I've noticed for a week or two this has been fixed. Many thanks.

I have closed the other defect.