
404 error while accessing some images files (e.g. djvu, jpg, png, webm) on Commons and other sites
Closed, Resolved · Public

Description

Right now https://commons.wikimedia.org/wiki/File:Fawiki500k_celebration_by_Behdad_Abedi_(180).jpg doesn't show anything and when you click on "Original file" it gives 404 "File not found: /v1/AUTH_mw/wikipedia-commons-local-public.e6/e/e6/Fawiki500k_celebration_by_Behdad_Abedi_%28180%29.jpg"

Event Timeline

Restricted Application added a subscriber: Aklapper. · Mar 30 2017, 7:13 PM
saper renamed this task from Images with missing source to 404 error while accessing some images files e.g. djvu and jpg. · Mar 30 2017, 10:46 PM
saper triaged this task as Unbreak Now! priority.
Restricted Application added subscribers: Jay8g, TerraCodes. · Mar 30 2017, 10:46 PM
EBernhardson added a subscriber: EBernhardson. · Edited · Mar 30 2017, 11:55 PM

Random debugging:

hphpd> $f = wfFindFile(Title::newFromText('File:Fawiki500k_celebration_by_Behdad_Abedi_(180).jpg'))
hphpd> =$f->exists();
true
hphpd> =$f->repo->backend->fileExists( ['src' => $f->repo->resolveToStoragePath( $f->getVirtualUrl() ) ]);
false

Vs a known good file:

hphpd> $f = wfFindFile(Title::newFromText('File:Voltairine_de_Cleyre_(Age_35).jpg'))
hphpd> =$f->exists();
true
hphpd> =$f->repo->backend->fileExists( ['src' => $f->repo->resolveToStoragePath( $f->getVirtualUrl() ) ]);
true

So at a minimum, swift certainly thinks the file doesn't exist, while the mediawiki database thinks it does.

@Ladsgroup no, this is not related to thumbor, since thumbor isn't in production yet (though "thumbor in production" is a Q4 goal now).

The timeline lines up with the swift expansion currently going on in eqiad (T160640). We have experienced files disappearing in the past (e.g. in T111838), though that was related to file moves.

At the time of T111838 a script was published in https://gerrit.wikimedia.org/r/#/c/249494/ to find files present in filebackend but not in mediawiki. I believe we'd need a similar script to do an audit the other way: find all files in mediawiki that are missing in swift.
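
The audit described above can be sketched as a set comparison in both directions. This is only an illustration under assumptions, not the script from the Gerrit change: `audit`, `db_names` and `backend_names` are hypothetical names, and in practice the two listings would have to come from the mediawiki database and from a swift container listing.

```python
from typing import Iterable, List, NamedTuple


class AuditResult(NamedTuple):
    missing_in_backend: List[str]   # known to the database, absent from storage
    orphaned_in_backend: List[str]  # present in storage, unknown to the database


def audit(db_names: Iterable[str], backend_names: Iterable[str]) -> AuditResult:
    """Compare two listings of file names and report the differences both ways."""
    db = set(db_names)
    backend = set(backend_names)
    return AuditResult(
        missing_in_backend=sorted(db - backend),
        orphaned_in_backend=sorted(backend - db),
    )


if __name__ == "__main__":
    # Made-up file names, purely for demonstration.
    result = audit(
        db_names=["A.jpg", "B.djvu", "C.png"],
        backend_names=["A.jpg", "C.png", "D.webm"],
    )
    print("missing in backend:", result.missing_in_backend)
    print("orphaned in backend:", result.orphaned_in_backend)
```

The first list corresponds to the new audit proposed here (files in mediawiki missing in swift); the second corresponds to the direction the earlier script covered.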

T153565: MediaWiki file operations are fragile, causing occasional data loss, you might find a lot :(

Indeed it is one of the long-standing bugs in media-storage and likely the interaction between mediawiki and swift :(

I can successfully see https://commons.wikimedia.org/wiki/File:Fawiki500k_celebration_by_Behdad_Abedi_(180).jpg now, and likewise https://commons.wikimedia.org/wiki/File:PL_J%C3%B3zef_Ignacy_Kraszewski-Poezye_tom_2.djvu, but not yet https://commons.wikimedia.org/wiki/File:Wykolejony_%28Gruszecki%29_24.jpg, so it might be a sign of swift converging.

I'm assuming the first two were not re-uploaded (I'm not seeing any new uploads in the File: history)?
I'm still looking into why the files 404'd from swift's point of view, as a rebalance/expansion of course shouldn't cause files to disappear.

fgiunchedi lowered the priority of this task from Unbreak Now! to High. · Mar 31 2017, 2:08 PM

Since some files linked here seem to 200 now (instead of 404) I'm lowering to "high", I'll keep looking at what might be causing this during a rebalance

Wieralee added a comment. · Edited · Mar 31 2017, 2:39 PM

The book https://commons.wikimedia.org/wiki/File:PL_J%C3%B3zef_Ignacy_Kraszewski-Poezye_tom_2.djvu is transcribed at Polish Wikisource: we have many pages without scans now...
The same with https://pl.wikisource.org/wiki/Indeks:Andrzej_Kijowski_-_Listopadowy_wiecz%C3%B3r.djvu

Can you upload these files from archives?

Aklapper renamed this task from 404 error while accessing some images files e.g. djvu and jpg to 404 error while accessing some images files (e.g. djvu, jpg, png) on Commons and other sites. · Mar 31 2017, 3:21 PM

Hi, I'll add another one: https://commons.wikimedia.org/wiki/File:MyanmarChin.png

https://upload.wikimedia.org/wikipedia/commons/9/93/MyanmarChin.png

The 404 text is "File not found: /v1/AUTH_mw/wikipedia-commons-local-public.93/9/93/MyanmarChin.png" (maybe a slash is missing between "public" and "93", where there is a dot instead?)
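
On the dot: the paths quoted in this task are consistent with the dot being part of the storage container name rather than a missing slash. The sketch below reproduces them under two assumptions: that the two-level directory prefix is taken from the MD5 hash of the file name, and that the container is sharded by the first two hex digits of that hash. The function names are made up for illustration; the `AUTH_mw` and `wikipedia-commons-local-public` defaults are copied from the error messages above.

```python
import hashlib
from urllib.parse import quote


def hash_prefix(name: str) -> str:
    """Return the 'x/xy' directory prefix derived from the MD5 of the file name."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}"


def swift_path(name: str,
               account: str = "AUTH_mw",
               container: str = "wikipedia-commons-local-public") -> str:
    """Build a /v1/<account>/<container>.<shard>/<object> path for a file name."""
    prefix = hash_prefix(name)
    shard = prefix.split("/")[1]        # two hex digits, assumed to select the container shard
    obj = f"{prefix}/{quote(name)}"     # percent-encode characters such as "(" and ")"
    return f"/v1/{account}/{container}.{shard}/{obj}"


if __name__ == "__main__":
    print(swift_path("MyanmarChin.png"))
```

Read this way, `wikipedia-commons-local-public.93` is the container and `9/93/MyanmarChin.png` is the object inside it.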

Can we set the status back to unbreak? Too many duplicate tasks are being created.

Poyekhali raised the priority of this task from High to Unbreak Now!. · Apr 1 2017, 1:28 AM

Setting back to UBN!, due to the number of duplicate tasks and because more files may be affected than those reported here. We cannot afford to wait while more files are lost unexpectedly.

De728631 added a subscriber: De728631. · Edited · Apr 1 2017, 1:59 AM

This seems to be a gradual process. When I commented on https://commons.wikimedia.org/wiki/File:Yaroslava_Shvedova.JPG at COM:AN, I could still access the thumbnail in the file history. Now this is gone too.

Update: on the other hand, https://commons.wikimedia.org/wiki/File:Wykolejony_%28Gruszecki%29_24.jpg is back again, but its upload log is missing.

Ladsgroup updated the task description. · Apr 1 2017, 7:04 AM
Ankry added a comment. · Edited · Apr 1 2017, 7:14 AM

Update: on the other hand, https://commons.wikimedia.org/wiki/File:Wykolejony_%28Gruszecki%29_24.jpg is back again, but its upload log is missing.

All pages from this book have an empty upload log, so there is nothing strange here.
The file was renamed; the upload log is available under its previous name: https://commons.wikimedia.org/w/index.php?title=Special:Log&page=File%3AWykolejony013+a24.jpg.

Also, these have already come back:
https://commons.wikimedia.org/wiki/File:Andrzej_Kijowski_-_Listopadowy_wiecz%C3%B3r.djvu
https://de.wikipedia.org/wiki/Datei:Taiwan.JPG
https://de.wikipedia.org/wiki/Datei:Mayr_Andreas.jpg
https://commons.wikimedia.org/wiki/File:MyanmarChin.png

Reported previously and not accessible at the moment:
https://commons.wikimedia.org/wiki/File:50_%D0%B4%D0%BE%D0%BC._%D0%A3%D0%BB%D0%B8%D1%86%D0%B0_%D0%9D%D0%B5%D0%BA%D1%80%D0%B0%D1%81%D0%BE%D0%B2%D0%B0._%D0%93%D0%BE%D1%80%D0%BE%D0%B4_%D0%A1%D0%B5%D0%B2%D0%B5%D1%80%D0%BE%D0%B4%D0%B2%D0%B8%D0%BD%D1%81%D0%BA._%D0%A4%D0%BE%D1%82%D0%BE_%D0%90%D0%BB%D0%B5%D0%BA%D1%81%D0%B5%D1%8F_%D0%A9%D0%B5%D0%BA%D0%B8%D0%BD%D0%BE%D0%B2%D0%B0.jpg
https://commons.wikimedia.org/wiki/File:Vladimir_Frolochkin.JPG
https://commons.wikimedia.org/wiki/File:School_Gyrls_at_Paramount_Studios.jpg
https://commons.wikimedia.org/wiki/File:Yaroslava_Shvedova.JPG

Files reported earlier have already appeared again. The ones reported later are still inaccessible.
Note that if no special action has been taken for the files that are available again, it may mean that random files are still disappearing for some (quite long) period of time and then reappearing.
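
One way to track this pattern is to compare two snapshots of HTTP status codes taken at different times. A minimal sketch: the function name, the snapshot format (URL mapped to status code) and the category labels are all assumptions made for illustration, and the statuses would have to be collected separately.

```python
from typing import Dict, List


def compare_snapshots(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, List[str]]:
    """Group URLs present in both snapshots by how their availability changed."""
    groups: Dict[str, List[str]] = {
        "came_back": [],
        "still_missing": [],
        "newly_missing": [],
        "still_ok": [],
    }
    for url in sorted(before.keys() & after.keys()):
        was_ok = before[url] == 200
        is_ok = after[url] == 200
        if not was_ok and is_ok:
            groups["came_back"].append(url)
        elif not was_ok and not is_ok:
            groups["still_missing"].append(url)
        elif was_ok and not is_ok:
            groups["newly_missing"].append(url)
        else:
            groups["still_ok"].append(url)
    return groups


if __name__ == "__main__":
    # Placeholder URLs and statuses, not real measurements.
    earlier = {"https://example.org/a.jpg": 404, "https://example.org/b.jpg": 200}
    later = {"https://example.org/a.jpg": 200, "https://example.org/b.jpg": 404}
    print(compare_snapshots(earlier, later))
```

A non-empty `newly_missing` group between two runs would support the suspicion that files are still disappearing.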

Users are reporting that webm videos are breaking too when they try to re-transcode them.

Paladox renamed this task from 404 error while accessing some images files (e.g. djvu, jpg, png) on Commons and other sites to 404 error while accessing some images files (e.g. djvu, jpg, png, webm) on Commons and other sites. · Apr 1 2017, 11:37 AM
Paladox added a subscriber: brion.
Paladox added a subscriber: Revent. · Apr 1 2017, 11:40 AM

@Revent reported these urls

(the video works, but scrolling to the bottom and re-transcoding it fails)

The video 'works' because when you simply play it, you view some transcode based on your preferences. If you try to view the original video by clicking the link directly under the thumbnail, you get a 404.

greg added a subscriber: greg. · Apr 10 2017, 5:23 PM

Since some files linked here seem to 200 now (instead of 404) I'm lowering to "high", I'll keep looking at what might be causing this during a rebalance

Any update on that?

Poyekhali lowered the priority of this task from Unbreak Now! to High. · Apr 11 2017, 9:27 AM

It seems this is no longer affecting our files, so I lowered the priority. Is this issue still occurring?

I think all disappearing files should be back now, as the rebalance has finished. We are working on bringing all of swift to the same version in T162609: Swift version and distro upgrade, which should be completed in the next few days. After that I'll issue another rebalance, which shouldn't have the side effects reported here.

fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board.

@Revent thanks for your report! It looks like those files were moved by steinsplitterbot (ogg -> ogv), which I suspect is an instance of another bug related to moving files (e.g. T64057).

fgiunchedi closed this task as Resolved. · May 11 2017, 2:56 PM
fgiunchedi claimed this task.
fgiunchedi lowered the priority of this task from High to Normal.

We are rebalancing both swift clusters but haven't seen a recurrence of this (namely files disappearing and then reappearing). Tentatively closing, but please reopen if this happens again.