
Mime type of files in search index is out of sync with mime type as registered in MW database
Closed, Declined · Public · BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
These search results for audio list various items that are actually videos.

What should have happened instead?:
They should show up in the video matches, not in the audio matches.

This is actually a follow-up to T226311. It seems most of these files were previously misregistered and have since been corrected. They now show up correctly in the database and in API responses: https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=imageinfo&meta=&titles=File%3AHof%20vs.%20Corona%2020200319%20WebM%20ohne%20Ton%20002.webm&iiprop=timestamp%7Cuser%7Cmetadata%7Cmediatype%7Cmime

My suspicion is that the CirrusSearch index either does its own MIME detection or keeps a copy that, for some reason, was not updated after these files had a hard refresh.
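
One way to check this per file would be to compare the MIME type from `prop=imageinfo` with what the search index stores for the same title. The following is only a rough sketch, not a confirmed procedure: it assumes that `prop=cirrusdoc` returns the indexed document under a `source` key and that the indexed MIME type lives in a field named `file_mime`. The helper names are made up for illustration, and the response shapes are assumptions as well.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def imageinfo_url(title):
    """URL asking MediaWiki for the MIME type registered in the database."""
    params = {"action": "query", "format": "json", "prop": "imageinfo",
              "titles": title, "iiprop": "mediatype|mime"}
    return API + "?" + urlencode(params)


def cirrusdoc_url(title):
    """URL asking for the document as stored in the search index.
    Assumption: prop=cirrusdoc is available on the wiki."""
    params = {"action": "query", "format": "json", "prop": "cirrusdoc",
              "titles": title}
    return API + "?" + urlencode(params)


def first_page(response):
    """Return the first page object of a decoded query response, or {}."""
    pages = response.get("query", {}).get("pages", {})
    return next(iter(pages.values()), {})


def database_mime(imageinfo_response):
    info = first_page(imageinfo_response).get("imageinfo", [])
    return info[0].get("mime") if info else None


def indexed_mime(cirrusdoc_response):
    # Assumption: the indexed fields sit under "source", MIME in "file_mime".
    docs = first_page(cirrusdoc_response).get("cirrusdoc", [])
    return docs[0].get("source", {}).get("file_mime") if docs else None


def is_out_of_sync(imageinfo_response, cirrusdoc_response):
    """True only when both MIME types are known and they differ."""
    db = database_mime(imageinfo_response)
    idx = indexed_mime(cirrusdoc_response)
    return db is not None and idx is not None and db != idx
```

Fetching both URLs for one of the affected files and passing the decoded JSON to `is_out_of_sync` would show whether the index still holds the old value. If the field names differ on the live wiki, the two accessor functions are the only places that need adjusting.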

Event Timeline

Restricted Application added a subscriber: Aklapper.

Thanks for letting us know. We have a regular process (the Saneitizer) that corrects errors such as this in our indexes. This should be resolved automatically when it next runs in a few months. If you are still having issues after that, please feel free to reopen.