Page MenuHomePhabricator

File uploads do not appear to get indexed properly
Closed, ResolvedPublic5 Estimated Story Points

Description

Reported here: https://www.mediawiki.org/wiki/Topic:Xmhvuwy6efcpf1lc

Example: https://commons.wikimedia.org/w/index.php?title=File:Metal036_8K_NormalGL.png&action=history shows a revision at 785892294 but https://commons.wikimedia.org/wiki/File:Metal036_8K_NormalGL.png?action=cirrusDump reports 774815091.
Performing a null edit on the file does seem to propagate the right information.

AC:

  • the search index is properly updated after a new version of a file is uploaded

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Gehel set the point value for this task to 5.Aug 7 2023, 3:25 PM

I've been poking at this and so far I'm unable to reproduce locally, but it's clearly happening in prod. My current best guess is a race condition, I'll add a patch that forces READ_LATEST when loading metadata on a recently edited file, which is sensible but I have no firm proof that this is the problem we are facing or that it will fix the issue. It makes sense that it could, but I'm having difficulty getting proof.

After some more investigation, and some help from Timo, we think this is caused by T344285. It looks like in T278209 when we made cirrus run it's LinksUpdate jobs from not only the LinksUpdateComplete hook but also the UploadComplete hook we were papering over a deeper issue within mediawiki. Essentially it looks like mediawiki core is failing to run it's own LinksUpdate after a null edit, which means our LinksUpdateComplete is not fired after a null edit. The proper fix, which is needed not only for Cirrus correctness, will be to resolve the issue in core.

For the cirrus side one thing we can do to reduce the frequency of the error is to use lazyPush instead of push from UploadComplete (and probably all the other hooks, after review). By using push we are sending a job into the queue in the middle of the upload web request. This increases the chances that our job runs before the replicas update, or even potentially before the web request commit's it's transaction.

Change 949562 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] ChangeListener: Always use lazyPush

https://gerrit.wikimedia.org/r/949562

Attached patch should partially mitigate the issue, but it's not a strong guarantee of correctness. To get that we will need to get the subtask prioritized.

Change 949562 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] ChangeListener: Always use lazyPush

https://gerrit.wikimedia.org/r/949562