WAV files being uploaded with wrong MIME type
Closed, Resolved (Public)

Description

See https://commons.wikimedia.org/wiki/Commons:Village_pump/Technical#c-This,_that_and_the_other-20260108120700-WAV_files_being_uploaded_with_wrong_MIME_type:

All WAV files uploaded since 21:30, 7 January 2026 have been stored with the incorrect MIME type unknown/wav instead of the correct audio/wav, meaning that the file is not recognised as audio and the player UI is not shown. The last non-buggy file to be uploaded was File:LL-Q1860_(eng)-Knabrupt-Appalachian.wav and the first buggy file was File:LL-Q1860 (eng)-Knabrupt-caramel.wav.
The Lingua Libre tool is not to blame: File:Myhouse.wad Spoken article.wav is a non-LL file that is also buggy.
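
To illustrate why the wrong MIME type hides the player, here is a minimal sketch, assuming the UI decides by the major part of the stored MIME type. The function name and the set of playable major types are hypothetical and are not taken from MediaWiki's code.

```python
# Hypothetical sketch: decide whether to show a media player from a stored MIME type.
# PLAYABLE_MAJOR_TYPES and shows_player are illustrative names, not MediaWiki APIs.
PLAYABLE_MAJOR_TYPES = {"audio", "video"}


def shows_player(mime: str) -> bool:
    """Return True when the major part of the MIME type is a playable one."""
    major, _sep, _minor = mime.partition("/")
    return major.lower() in PLAYABLE_MAJOR_TYPES


print(shows_player("audio/wav"))    # a correctly stored file
print(shows_player("unknown/wav"))  # a file affected by this bug
```

Under that assumption, a file stored as unknown/wav is treated like any other non-media file, even though its contents are valid audio.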

The timing of the bug appears to coincide with the deployment train.

Event Timeline

Aklapper triaged this task as Unbreak Now! priority (Jan 8 2026, 12:33 PM).
Aklapper added a subscriber: Umherirrender.

I wonder if rMW5542962f2f0cdf1763ef352ed8ac94366e91d93a could have had any influence here. Maybe @Umherirrender has an idea?

(Setting UBN priority because marked as train blocker.)

git bisecting locally, that commit (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1222605) does appear to be the cause here FWICS
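
For readers unfamiliar with bisecting, the idea can be sketched as a binary search over an ordered commit history. This is only an illustration of the technique: the commit names and the is_bad check below are made up, and the sketch assumes that every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit by binary search.

    Assumes commits are ordered oldest to newest, commits[0] is known good
    and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history in which the regression was introduced in "c3".
history = ["c0", "c1", "c2", "c3", "c4", "c5"]
bad_from = history.index("c3")
print(first_bad_commit(history, lambda c: history.index(c) >= bad_from))
```

In practice the is_bad step is the manual part: check out the commit, upload a WAV file and see which MIME type gets stored.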

Change #1224675 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/core@master] MimeAnalyzer: Fix syntax error in const array

https://gerrit.wikimedia.org/r/1224675
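
The thread does not spell out what the syntax error was. As a rough illustration only, the sketch below shows one pattern that would produce the observed symptom: a fallback that rewrites any major type missing from an allowlist to "unknown". The list contents and the function are assumptions made for this example and are not MediaWiki's MimeAnalyzer code.

```python
# Hypothetical allowlist of major MIME types, for illustration only.
KNOWN_MAJOR_TYPES = ("application", "audio", "image", "text", "video")


def normalize_mime(mime: str, known_major_types=KNOWN_MAJOR_TYPES) -> str:
    """Replace an unrecognised major type with 'unknown', keeping the minor type."""
    major, sep, minor = mime.lower().partition("/")
    if not sep or not minor:
        return "unknown/unknown"
    if major not in known_major_types:
        return f"unknown/{minor}"
    return f"{major}/{minor}"


# Simulate a list in which the "audio" entry is effectively missing.
broken_list = tuple(t for t in KNOWN_MAJOR_TYPES if t != "audio")

print(normalize_mime("audio/wav"))               # with the intact list
print(normalize_mime("audio/wav", broken_list))  # with the damaged list
```

With a damaged list, every subtype under the lost major type is affected at once, not just WAV.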

A_smart_kitten changed the task status from Open to In Progress (Jan 8 2026, 1:47 PM).
A_smart_kitten claimed this task.

Change #1224679 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/core@wmf/1.46.0-wmf.10] MimeAnalyzer: Fix syntax error in MAJOR_MIME_TYPES array

https://gerrit.wikimedia.org/r/1224679

Change #1224675 merged by jenkins-bot:

[mediawiki/core@master] MimeAnalyzer: Fix syntax error in MAJOR_MIME_TYPES array

https://gerrit.wikimedia.org/r/1224675

MIME types are also uploaded to Swift as the Content-Type header, so those files might have to be corrected there too. I'm not sure whether the metadata maintenance script handles that situation.
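
A possible way to reason about the stored headers is sketched below. It only plans corrections over an in-memory listing and does not talk to Swift; the object names, the listing format and the plan_header_fixes helper are all hypothetical. The unknown/wav to audio/wav pair is the one from the task description.

```python
from typing import Dict, List, Tuple

# Wrong -> right mapping; only the WAV pair is taken from the task description.
FIXES: Dict[str, str] = {"unknown/wav": "audio/wav"}


def plan_header_fixes(
    objects: List[Tuple[str, str]], fixes: Dict[str, str] = FIXES
) -> List[Tuple[str, str, str]]:
    """Return (name, stored_type, corrected_type) for each object needing a fix."""
    return [
        (name, stored, fixes[stored])
        for name, stored in objects
        if stored in fixes
    ]


# Hypothetical listing of (object name, stored Content-Type).
listing = [
    ("a/ab/Example_one.wav", "unknown/wav"),
    ("c/cd/Example_two.wav", "audio/wav"),
    ("e/ef/Example_three.png", "image/png"),
]

for name, old, new in plan_header_fixes(listing):
    print(f"{name}: {old} -> {new}")
```

Whether any such correction is needed depends on whether refreshing the metadata also rewrites the header, which the comment above leaves open.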

Change #1224679 merged by jenkins-bot:

[mediawiki/core@wmf/1.46.0-wmf.10] MimeAnalyzer: Fix syntax error in MAJOR_MIME_TYPES array

https://gerrit.wikimedia.org/r/1224679

Mentioned in SAL (#wikimedia-operations) [2026-01-08T19:53:12Z] <zabe@deploy2002> Started scap sync-world: Backport for [[gerrit:1224679|MimeAnalyzer: Fix syntax error in MAJOR_MIME_TYPES array (T414077)]]

Mentioned in SAL (#wikimedia-operations) [2026-01-08T19:55:07Z] <zabe@deploy2002> zabe: Backport for [[gerrit:1224679|MimeAnalyzer: Fix syntax error in MAJOR_MIME_TYPES array (T414077)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-01-08T20:08:43Z] <zabe@deploy2002> Finished scap sync-world: Backport for [[gerrit:1224679|MimeAnalyzer: Fix syntax error in MAJOR_MIME_TYPES array (T414077)]] (duration: 15m 31s)

Need a script run to fix the mime type on the files (mostly for commonswiki, but could be on other wikis as well)

php maintenance/run.php refreshImageMetadata --mediatype AUDIO --mime unknown/wav --force

Current list on wiki: https://commons.wikimedia.org/wiki/Special:MIMESearch/unknown/wav

Mentioned in SAL (#wikimedia-operations) [2026-01-08T20:26:32Z] <zabe> zabe@deploy2002:~$ mwscript refreshImageMetadata.php commonswiki --mediatype AUDIO --mime unknown/wav --force # T414077

Zabe edited projects, added DBA; removed Patch-For-Review.

Also ran the script on all other wikis.

If possible, please also run it with --oldimage to fix up some old versions (the script works on one table per run). Thanks

Mentioned in SAL (#wikimedia-operations) [2026-01-08T20:33:10Z] <zabe> zabe@deploy2002:~$ foreachwiki refreshImageMetadata.php --mediatype AUDIO --mime unknown/wav --force # T414077

Mentioned in SAL (#wikimedia-operations) [2026-01-08T20:35:02Z] <zabe> zabe@deploy2002:~$ foreachwiki refreshImageMetadata.php --mediatype AUDIO --mime unknown/wav --force --oldimage # T414077

It seems we missed that unknown/flac and unknown/mpeg are suddenly also possible.
https://commons.wikimedia.org/wiki/Special:MediaStatistics
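
Since the script handles one MIME type and one table per run, covering all three types means several invocations. The sketch below only builds and prints the command lines; it does not run anything. The flags are the ones used earlier in this task. That AUDIO is the right --mediatype for unknown/flac and unknown/mpeg is an assumption, and the thread does not say whether these extra runs were carried out.

```python
import shlex
from typing import List

BASE = ["foreachwiki", "refreshImageMetadata.php"]
MIME_TYPES = ["unknown/wav", "unknown/flac", "unknown/mpeg"]


def build_commands(mime_types: List[str], media_type: str = "AUDIO") -> List[str]:
    """Build one command per MIME type for current versions, plus one with --oldimage."""
    commands = []
    for mime in mime_types:
        args = BASE + ["--mediatype", media_type, "--mime", mime, "--force"]
        commands.append(shlex.join(args))                   # current file versions
        commands.append(shlex.join(args + ["--oldimage"]))  # old file versions
    return commands


for command in build_commands(MIME_TYPES):
    print(command)
```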

There's more fallout. It seems that the files affected by the unknown MIME type issue caused lots of files to get stuck in the videoscalers or something, and performance ground to a halt (although this could be a coincidence). I purged a couple of hundred from the queue, and that seems to have improved a few things.

The queue was 10000 media files, with some 1500 in progress (which I managed to reduce to 460 [now 17] by resetting the transcodes of WAV files and putting them back on the job queue).

There are also lots of entries for 720p and 480p files in the transcode tables now, because ladsgroup removed those, but I guess the code doesn't handle that situation very well. @bvibber might know how to fix that.

I don't think we have a Grafana dashboard for the performance of the videoscalers, so I can't really tell what the actual load is, but someone should check, because I don't think it's managing to keep up.

There are also lots of entries for 720p and 480p files in the transcode tables now, because ladsgroup removed those, but I guess the code doesn't handle that situation very well. @bvibber might know how to fix that.

Should we just delete the files, or the rows from transcode? I think I can do both, but I'm not sure in which order. Probably the rows first?

I don't know what @bvibber did, but I'd delete the files first, because without the table entries you might not have an easy way to iterate over them to find them. In the end it probably doesn't matter much: they should not be in use, so they only take up space.
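
The ordering argument can be sketched as follows, using in-memory stand-ins for the transcode table and the file store. The row layout, the path format and the purge_unused_transcodes helper are hypothetical; the point is only that the rows serve as the index for finding the files, so each file is removed before its row is dropped.

```python
from typing import List, Set, Tuple

Row = Tuple[str, str]  # (file name, transcode key)


def purge_unused_transcodes(
    rows: List[Row], storage: Set[str], unused_keys: Set[str]
) -> List[Row]:
    """Delete derivative files for unused keys, then drop their rows.

    Returns the rows that remain. `storage` is modified in place.
    """
    kept = []
    for name, key in rows:
        if key in unused_keys:
            # Step 1: remove the file while the row still tells us where it is.
            storage.discard(f"{name}.{key}")
            # Step 2: drop the row by not carrying it over.
            continue
        kept.append((name, key))
    return kept


rows = [("Clip.webm", "720p.vp9.webm"), ("Clip.webm", "480p.vp9.webm")]
storage = {"Clip.webm.720p.vp9.webm", "Clip.webm.480p.vp9.webm"}

remaining = purge_unused_transcodes(rows, storage, {"720p.vp9.webm"})
print(remaining)
print(sorted(storage))
```

Doing it the other way round would leave files in storage with nothing pointing at them.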

transcode_keys that are in active use: ('1080p.vp9.webm', '480p.vp9.webm', '240p.vp9.webm', '360p.webm', '144p.mjpeg.mov', 'ogg', 'mp3' ) a total of 7313453 entries

transcode_keys that are NOT in use: ( '2160p.vp9.webm', '1440p.vp9.webm', '720p.vp9.webm', '360p.vp9.webm', '180p.vp9.webm', '480p.webm', '240p.webm', '160p.webm', '720p.video.vp9.mp4', '480p.video.vp9.mp4', '360p.video.vp9.mp4', '120p.vp9.webm', '144p.video.mjpeg.mov', '1080p.vp9.mp4', '1080p.video.vp9.mp4', '480p.vp9.mp4', '240p.video.vp9.mp4', 'stereo.audio.mp3', 'stereo.audio.opus.mp4' ) a total of 657327 entries
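
To put the two totals in proportion, a quick calculation using only the figures quoted above:

```python
# Entry counts as quoted in the two lists above.
active_entries = 7_313_453
unused_entries = 657_327

total = active_entries + unused_entries
unused_share = unused_entries / total

print(total)
print(f"{unused_share:.1%}")
```

So the unused keys account for roughly 8% of all rows in the transcode table.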