Specific revisions of multiple files missing from Swift - 404 Not Found returned
Open, MediumPublic

Description

The previous revision of https://commons.wikimedia.org/wiki/File:Clinton_power_station_1.jpg cannot be accessed or found.

https://upload.wikimedia.org/wikipedia/commons/archive/7/7d/20080122163443%21Clinton_power_station_1.jpg:

404 Not Found

The resource could not be found.

File not found: /v1/AUTH_mw/wikipedia-commons-local-public.7d/archive/7/7d/20080122163443%21Clinton_power_station_1.jpg

See also:

  • T41615 (404 error for all revisions of some files)

Event Timeline

Aklapper renamed this task from File not found (404 - The resource could not be found) to Specific revision of file on Commons triggers 404 (not found).Jan 20 2016, 1:36 PM
Poyekhali renamed this task from Specific revision of file on Commons triggers 404 (not found) to Specific revision of a file on Commons triggers 404 (not found).Apr 13 2016, 4:00 AM
Poyekhali triaged this task as High priority.
Poyekhali updated the task description.
Poyekhali subscribed.

Same with https://upload.wikimedia.org/wikipedia/en/archive/9/9a/20080218114104%21Internet_Download_Accelerator.png vs https://en.wikipedia.org/wiki/File:Internet_Download_Accelerator.png

404 Not Found

The resource could not be found.

File not found: /v1/AUTH_mw/wikipedia-en-local-public.9a/archive/9/9a/20060321114104%21Internet_Download_Accelerator.png
aaron removed aaron as the assignee of this task.Aug 2 2016, 12:13 AM
aaron added a project: Multimedia.

And first revision of https://en.wikipedia.org/wiki/File:Parisian.PNG is also missing

https://upload.wikimedia.org/wikipedia/en/archive/6/68/20070717000058%21Parisian.PNG:

File not found: /v1/AUTH_mw/wikipedia-en-local-public.68/archive/6/68/20070717000058%21Parisian.PNG

Josve05a renamed this task from Specific revision of a file on Commons triggers 404 (not found) to Specific revision of a file triggers 404 (not found).Aug 11 2016, 7:03 PM

First revision of https://en.wikipedia.org/wiki/File:Greater_London_Authority_logo.png as well

https://upload.wikimedia.org/wikipedia/en/archive/7/78/20080828203602%21Greater_London_Authority_logo.png

File not found: /v1/AUTH_mw/wikipedia-en-local-public.78/archive/7/78/20080828203602%21Greater_London_Authority_logo.png

Josve05a renamed this task from Specific revision of a file triggers 404 (not found) to Specific revision of a files triggers 404 (not found).Aug 11 2016, 8:45 PM
Josve05a renamed this task from Specific revision of a files triggers 404 (not found) to Specific revisions of multiple files triggers 404 (not found).

Image https://commons.wikimedia.org/wiki/File:Cactaceae_(1082183341).jpg is missing (there is only one revision). The file was uploaded at 01:25, 7 June 2016.

The 120px thumbnail is there, though.

Platonides renamed this task from Specific revisions of multiple files triggers 404 (not found) to Specific revisions of multiple files missing from Swift - 404 Not Found returned.Sep 16 2016, 4:46 PM

This problem seems to be increasing. There was one thread on the English VP today, and two on the German Forum.

MarkTraceur lowered the priority of this task from High to Medium.Dec 2 2016, 9:57 PM
MarkTraceur moved this task from Untriaged to Triaged on the Multimedia board.
MarkTraceur subscribed.

I don't see this as "high" priority, but I'm willing to be convinced otherwise. Since these are not current versions of files, it doesn't seem to affect most day-to-day work.

It doesn't seem like these files have anything in common. There are PNGs and JPGs, different file sizes, different resolutions, different upload dates...and different re-upload dates, too. There are even files from at least two different wikis.

Can anyone offer any insight into the similarities between these files?

First revision of https://commons.wikimedia.org/wiki/File:J.J._Burns_NSRW1-0009.jpg as well

https://upload.wikimedia.org/wikipedia/commons/archive/b/b8/20071019085735%21J.J._Burns_NSRW1-0009.jpg

File not found: /v1/AUTH_mw/wikipedia-commons-local-public.b8/archive/b/b8/20071019085735%21J.J._Burns_NSRW1-0009.jpg

To be honest, I find priority "normal" to be worrying. If potentially unrecoverable data loss is not a high priority then what is?

@Srittau: Feel free to elaborate why this task is more urgent than other tasks on the project workboards. "Data loss" in general sounds like a severity categorization. Priority rather depends on how often and recent things happen I'd say...

Aklapper added a subscriber: Jojr149.

@Jojr149: Please do not remove user projects from tasks.

AlexisJazz subscribed.

Copied from my duplicate task:

https://commons.wikimedia.org/wiki/File:Accuracy_International_Arctic_Warfare_-_Psg_90.jpg

First revision is from 23:38, 1 December 2006 but the file does not exist. Current revision from 23:33, 3 November 2008 is fine though. (fine is a big word, the quality is shitty, but Wikimedia can't help that)

I tried 00 to ff for https://upload.wikimedia.org/wikipedia/commons/2/2e/Burbuja_%281496994920%29.jpg (guessing it might be misplaced) but it's just not there. https://commons.wikimedia.org/w/api.php?action=query&list=allimages&aisha1=5c898098c147202d918a63e2b3731b3eb9779051 does work, so the upload was fine.
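
The 00-to-ff check described above can be sketched roughly as follows. This is a minimal sketch, assuming MediaWiki's default hashed upload layout, in which the directory pair comes from the first one and two hex digits of the MD5 of the file name (with underscores instead of spaces). The function names are hypothetical, and the code only builds URLs; it makes no requests.

```python
import hashlib
from urllib.parse import quote

# Base URL copied from the links in this task.
BASE = "https://upload.wikimedia.org/wikipedia/commons"


def expected_hash_dir(filename):
    """Return 'x/xy' taken from the MD5 of the file name (assumed default layout)."""
    digest = hashlib.md5(filename.replace(" ", "_").encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}"


def candidate_urls(filename):
    """Build one URL per possible hash directory, 00 through ff."""
    # Percent-encode everything except unreserved characters, so that
    # parentheses become %28 and %29 as in the URLs quoted in this task.
    name = quote(filename.replace(" ", "_"), safe="")
    urls = []
    for i in range(256):
        pair = f"{i:02x}"
        urls.append(f"{BASE}/{pair[0]}/{pair}/{name}")
    return urls
```

Comparing `expected_hash_dir(...)` with the directory in the failing URL shows whether the file was requested from the expected place; each candidate from `candidate_urls(...)` could then be checked with a HEAD request to see whether the file was misplaced.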

Here is a list of things; they may not all be related to this issue.

https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Mohit_Manke.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Kleinbardorf_Judenh%C3%BCgel_032.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:User_record_for_the_mangle_in_Zaschendorf.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:TehsilPirmahalz_8.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Sysadmin_emc_unisphere.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Voortrekker_Monument,_Pretoria.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:US_Navy_110607-N-KB666-173_Multination_divers_from_the_Caribbean_conduct_a_hull_search_of_a_suspicious_vessel_during_a_training_exercise_with_Mobile.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Terezinha_Guilhermina_-_2013_IPC_Athletics_World_Championships-2.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Lavoir_08313.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Louis_Payette_Pierre_tombale.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:M%C4%9Bsto_ve_h%C5%99e.png
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Kit_left_arm_palmeiras13c.png
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Trippa_e_patate_in_Calabria.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Psophocarpus_tetragonolobus_in_Hainan_-_02.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Saint_Joseph_and_Jesus_Christ_as_a_baby,_Bas%C3%ADlica_da_Estrela,_Lisbon.jpg

https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:L%27%C3%A9glise_de_Sadirac.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Anne_Paugam,_directrice_g%C3%A9n%C3%A9rale_de_l%27AFD.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/Image:Gauteng_eastrand_map.jpg

https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Burghausen05.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Z%C3%A1smuky,_Pol%C3%A1kova.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:HC-081_(Hex-CruZ)_2014-04-17_22-17.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Rio_limon_venezuela31.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Schladming_wm2013_1881_13-02-07.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:2010-08-04_(27)_Purpur-Fruchtwanze,_Carpocoris_purpureipennis.test.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Charlie_und_die_Schokoladenfabrik_(Film)_Logo.png
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:B02_Ecuador_012_small_village_at_Rio_Misahuall%C3%AD,_February_1985.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Adam-Stegerwald-Haus_in_K%C3%B6nigswinter.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Rio_limon_venezuela27.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Cmd_prompt_Befehle.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Sphagnum_girgensohnii_(a,_150137-481740)_7191.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Olof_Hellstr%C3%B6m_Kluven.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Capuchin_monastery_in_Lubart%C3%B3w,_Poland.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:0_Domus_Augustana_-_Mt_Palatin_-_Rome_(1).JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:A_picture_of_Rachel_Michelle_King-_2014-04-17_16-20.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Bollywood_Poster_Shop,_Chor_Bazaar.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Klagenfurt_Heuplatz_3_14072009_55.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:HSchBtl_108_(V1).jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:CDs-P-10256_Alfio_Giuffrida-AG_Sinnwerke.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Javanese_illumination_of_an_unknown_text.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Nehemiya,_au_stand_de_Wikipedia_au_forum_mondial_de_la_langue_francaise.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Estas_son_las_calles_de_barrio_Mar%C3%ADa_Auxiliadora_Concepcion_del_Uruguay_Entre_Rios.png
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Picking_strawberries_Mr_Pyiiuui%27s_orchard,_Kelow-na,_British_Columbia_(HS85-10-21212)_original.tif
https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2016/11 (https://commons.wikimedia.org/wiki/File:Denkmal2016-St.Petersburg-Drache_02.jpg, https://commons.wikimedia.org/wiki/File:Leko_H%C3%BCbner_2000_Dortmund.jpg, https://commons.wikimedia.org/wiki/File:Bamboo_bridge_over_Nam_Khan_LP.jpg)
https://commons.wikimedia.org/wiki/Commons:Forum/Archiv/2016/January (https://commons.wikimedia.org/wiki/File:Brotjacklriegel_FMT.jpg) (T125140 ?)

https://commons.wikimedia.org/wiki/Category:Files_with_404_errors
From this category: https://commons.wikimedia.org/wiki/File:Cimbriani_shield_pattern.svg (png thumbnail works, cached probably, svg missing)

Glitch?
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Long_Beach_Comic_%26_Horror_Con_2011_-_Aquamen_(6301708270).jpg

Possibly different case, "A handful of them got corrupted on upload." (but should that have resulted in 404?)
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Australian_Census_2011_demographic_map_-_New_South_Wales_by_POA_-_BCP_field_0147_Visitor_from_Different_SA2_in_Victoria_Age_0_14_years.svg

Unclear:
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Alfred_Bunge.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/Image:Grossesschauspielhaus.jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:IM0G040002.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:RusEmb_-_Phnom_Penh.JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:0_Combretum_fruticosum_1,_Bs_As_(E_Haene).jpg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Hulluch_-_Cit%C3%A9_n%C2%B0_13_de_Lens_(02).JPG
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Homesteadcommercial.ogg
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/File:Cartogram_of_countries_by_number_of_Nobel_Laureates.svg
https://commons.wikimedia.org/wiki/Commons:Upload_Wizard_feedback/Archive/2014/02 (https://commons.wikimedia.org/wiki/File:WM.Sollnerstr.41.jpg)
https://commons.wikimedia.org/wiki/Commons:Administrators%27_noticeboard/Archive_55 (https://commons.wikimedia.org/wiki/File:Temde.JPG)

Original of https://commons.wikimedia.org/wiki/File:ATX-Netzteil.jpg is missing too: File not found: /v1/AUTH_mw/wikipedia-commons-local-public.e9/e/e9/ATX-Netzteil.jpg

I've reuploaded it to https://commons.wikimedia.org/wiki/File:HEC_350W_ATX_power_supply.jpg because it was in use. Commons now reports it as duplicate. It remembers the hash, but the original file is gone.
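
The hash lookup mentioned in this thread (the `list=allimages` query with `aisha1`) can be reproduced for any local copy of a file. The following is a minimal sketch: the function names are hypothetical, `format=json` is added for convenience, and the code only builds the query URL without sending it.

```python
import hashlib
from urllib.parse import urlencode

# API endpoint as used in the query URL quoted earlier in this task.
API = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data):
    """Return the hex SHA-1 of the file contents (bytes)."""
    return hashlib.sha1(data).hexdigest()


def allimages_by_sha1_url(sha1):
    """Build the allimages query URL that looks a file up by its SHA-1."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"
```

If the query still returns an entry while the upload URL returns 404, the database row survived and only the stored object is missing, which matches the duplicate warning described above.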

The most suspicious development in this story is that ordinary Wikimedia users (some not even sysops on any wiki where a revision is missing) are trying to trace the cause of the bug, whereas everyone with shell access to Wikimedia servers is silent. Are the files really missing from the directories, or only unreadable for some reason (permissions, file-system damage, I/O error…)? For files that are missing, how recently did they still exist? Which procedures (such as moving files from one storage to another) could potentially cause the data loss?

@Incnis_Mrsi at T198177 I also found some missing revisions. But the things that take me hours could be done in seconds with shell access.

But the things that take me hours could be done in seconds with shell access.

How do you think that would happen? If you have specific steps in mind, maybe we could expose those to power users. I can't think of any investigation that would be much more complicated without shell access though.

For T198177 I made hundreds of requests for filenames like https://upload.wikimedia.org/wikipedia/commons/archive/4/42/20180626085337%21WikipediaTomeRaider3.png using an educated guess for the approximate timestamp, then bruteforcing it.

Just typing "locate Clinton_power_station_1.jpg" and "locate ATX-Netzteil.jpg" would be way faster. (My revision finder didn't work for those anyway, but the files may still exist elsewhere. I just don't know where.)
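
The timestamp bruteforce described above can be sketched like this. It is only an illustration of the approach, not the actual revision finder: the function name and the search window are hypothetical, and the code generates candidate archive URLs without requesting them.

```python
from datetime import datetime, timedelta
from urllib.parse import quote


def archive_candidates(base, hash_dir, filename, guess, window_seconds):
    """Yield archive URLs for every second within +/- window_seconds of guess."""
    name = quote(filename, safe="")
    for offset in range(-window_seconds, window_seconds + 1):
        stamp = (guess + timedelta(seconds=offset)).strftime("%Y%m%d%H%M%S")
        # Archive names have the form <timestamp>!<name>, with "!" encoded as %21.
        yield f"{base}/archive/{hash_dir}/{stamp}%21{name}"


if __name__ == "__main__":
    guess = datetime(2018, 6, 26, 8, 53, 30)  # educated guess for the timestamp
    for url in archive_candidates(
        "https://upload.wikimedia.org/wikipedia/commons",
        "4/42",
        "WikipediaTomeRaider3.png",
        guess,
        10,
    ):
        print(url)
```

A window of 10 seconds yields 21 candidates; a guess that is only accurate to the hour needs thousands of requests, which is why a server-side listing would be so much faster.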

Swift has filename prefix search but not substring search AFAIK.
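
For reference, a prefix search in Swift is a container listing request with a `prefix` query parameter. The sketch below only builds such a URL. The endpoint host is a placeholder, the function name is hypothetical, and a real request would need an auth token from someone with access; the account and container names are taken from the error messages in this task.

```python
from urllib.parse import quote, urlencode


def swift_prefix_listing_url(endpoint, account, container, prefix, limit=100):
    """Build a Swift container-listing URL restricted to names starting with prefix."""
    query = urlencode({"prefix": prefix, "limit": limit, "format": "json"})
    return f"{endpoint}/v1/{quote(account)}/{quote(container)}?{query}"


if __name__ == "__main__":
    # "https://swift.example.org" is a placeholder, not the real endpoint.
    print(
        swift_prefix_listing_url(
            "https://swift.example.org",
            "AUTH_mw",
            "wikipedia-commons-local-public.06",
            "archive/0/06/20070422000344",
        )
    )
```

Because only prefixes are supported, a listing like this can find an archived revision when the timestamp is roughly known, but not a file whose timestamp part is unknown.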

Image https://commons.wikimedia.org/wiki/File:Cactaceae_(1082183341).jpg is missing (there is only one revision). The file was uploaded at 01:25, 7 June 2016.

The 120px thumbnail is there, though.

The revision appears to be showing now?

The file was reuploaded by Denniss in September 2016, but the original file is still missing, TheSandDoctor.

https://en.wikipedia.org/wiki/File:Wilderness_Society_in_front_of_Kindness_House.jpg

Only one revision and it's gone. (I'll reupload it though)

"Upload error

The file "mwstore://local-multiwrite/local-public/b/b2/Wilderness_Society_in_front_of_Kindness_House.jpg" is in an inconsistent state within the internal storage backends"

I guess I'm not reuploading anything!

Wtf: the Internet Archive archived the file on April 2, 2021. Yesterday! And now it's gone? How?

https://en.wikipedia.org/wiki/File:Logo_of_the_International_Practical_Shooting_Confederation.png
One revision, uploaded 17:17, 23 August 2021 aaaand it's gone.

https://upload.wikimedia.org/wikipedia/en/d/de/Logo_of_the_International_Practical_Shooting_Confederation.png stated: File not found: /v1/AUTH_mw/wikipedia-en-local-public.de/d/de/Logo_of_the_International_Practical_Shooting_Confederation.png

Now this just links to my overwrite. There is no thumbnail for the initial revision, no link. It is reported as 288 × 346 (54 KB) but it's nowhere to be seen.

FlightTime moved the file at 06:35, 27 August 2021. Before that it was working.

@MarkTraceur @CBogen @Tgr can someone investigate https://upload.wikimedia.org/wikipedia/en/d/de/Logo_of_the_International_Practical_Shooting_Confederation.png ? This one's interesting because it's both recent and the only revision (before my overwrite) went missing.

@MarkTraceur @Tgr Could someone with shell access at least look in some of the relevant archive directories and see if anything is in there? Maybe the filenames have been changed or corrupted, or it's an encoding issue.

https://upload.wikimedia.org/wikipedia/en/archive/2/2f/20080118004233%21Roosevelt_High_School_%28St._Louis%29.jpg
/v1/AUTH_mw/wikipedia-en-local-public.2f/archive/2/2f/20080118004233%21Roosevelt_High_School_%28St._Louis%29.jpg

https://upload.wikimedia.org/wikipedia/commons/archive/0/06/20070422000344%21President_Lula_and_Marisa.jpg
/v1/AUTH_mw/wikipedia-commons-local-public.06/archive/0/06/20070422000344%21President_Lula_and_Marisa.jpg

These aren't directories, they are Swift containers. Our documentation on how to do common filesystem-like operations on Swift is not great. In theory it seems like calling FileBackend::getFileList() should work.

tgr@mwmaint1002:~$ mwscript shell.php commonswiki
Psy Shell v0.11.5 (PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf5 — cli) by Justin Hileman

>>> $srv = MediaWiki\MediaWikiServices::getInstance()
=> MediaWiki\MediaWikiServices {#89}

>>> $fb = $srv->getFileBackendGroup()->get( 'local-swift-eqiad' )
=> SwiftFileBackend {#3685}

>>> $fb->getFileList( [ 'dir' => 'mwstore://local-swift-eqiad/wikipedia-commons-local-public.06/archive/0/06' ] );
=> null

I guess that wasn't the right way to do it. (Wouldn't it be super nice if we used those mwstore:// urls consistently in error messages?)

mwstore://local-swift-eqiad/local-public/archive/0/06 at least returns an iterator, but it's empty. One could probably trace the logic in thumb.php to find the right set of commands to turn a HTTP URL into an mwstore:// URL, but I don't have the time right now.
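
The mapping from an HTTP URL to the storage paths can at least be approximated from the examples in this task. The following is a minimal sketch of that pattern only: it is inferred from the URL and "File not found" pairs quoted here, not traced from thumb.php or the FileBackend code, the function name is hypothetical, and it covers only the public zone (no thumbnails or deleted files).

```python
from urllib.parse import unquote, urlsplit


def upload_url_to_storage_paths(url, backend="local-multiwrite"):
    """Return (swift_path, mwstore_path) for an upload.wikimedia.org URL."""
    # Expected shape: /<project>/<wiki>/[archive/]<x>/<xy>/<name>
    parts = urlsplit(url).path.strip("/").split("/")
    project, wiki, rest = parts[0], parts[1], parts[2:]
    hash_pair = rest[-2]  # e.g. "06", used as the container suffix
    rel = "/".join(rest)
    # The 404 messages keep the percent-encoding of the requested URL.
    swift_path = f"/v1/AUTH_mw/{project}-{wiki}-local-public.{hash_pair}/{rel}"
    # The mwstore paths and listings quoted in this task show decoded names.
    mwstore_path = f"mwstore://{backend}/local-public/{unquote(rel)}"
    return swift_path, mwstore_path
```

For https://upload.wikimedia.org/wikipedia/commons/archive/0/06/20070422000344%21President_Lula_and_Marisa.jpg this reproduces the Swift path from the 404 message and gives an mwstore path under `local-public/archive/0/06`. Whether the backend resolves that path the same way is exactly what would need checking in the shell.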

@MarkTraceur @CBogen @Tgr can someone investigate https://upload.wikimedia.org/wikipedia/en/d/de/Logo_of_the_International_Practical_Shooting_Confederation.png ? This one's interesting because it's both recent and the only revision (before my overwrite) went missing.

Seemingly I didn't link the file page? Here: https://en.wikipedia.org/wiki/File:Logo_of_the_International_Practical_Shooting_Confederation.png

The initial revision by Pbrks from 17:17, 23 August 2021 still has a revert link, so it's not deleted, but there's no thumbnail. The revert link is https://en.wikipedia.org/w/index.php?title=File:Logo_of_the_International_Practical_Shooting_Confederation.png&action=revert&oldimage= which strikes me as odd.

Listings look empty:

>>> $be = MediaWiki\MediaWikiServices::getInstance()->getFileBackendGroup()->get( 'local-multiwrite' );

>>> $iter = $be->getFileList( [ 'dir' => 'mwstore://local-multiwrite/local-public/archive/2/2f' ] );
=> SwiftFileBackendFileList {#6191}

>>> foreach ( $iter as $f ) { if ( str_starts_with( $f, '20080118004233' ) ) echo "$f\n"; }

>>> $iter = $be->getFileList( [ 'dir' => 'mwstore://local-multiwrite/local-public/archive/0/06' ] );
=> SwiftFileBackendFileList {#6278}

>>> foreach ( $iter as $f ) { if ( str_starts_with( $f, '20070422000344' ) ) echo "$f\n"; }

If there were entries, they'd show like:

>>> foreach ( $iter as $f ) { if ( str_starts_with( $f, '200704' ) ) echo "$f\n"; }
20070425123203!Flag_map_of_Tunisia.svg