
The file "XXX" is in an inconsistent state within the internal storage backends
Open, Medium, Public

Description

When trying to process T289781: Server side upload for PantheraLeo1359531, I see the following error:

[urbanecm@mwmaint1002 ~/T289781]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='PantheraLeo1359531' .
Importing Files

Importing BurningShip_Wiki_x264_CRF4_20210820_4500p60_002.webm...failed. (An unknown error occurred in storage backend "local-swift-codfw".)

Found: 1
Failed: 1
[urbanecm@mwmaint1002 ~/T289781]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='PantheraLeo1359531' .
Importing Files

Importing BurningShip_Wiki_x264_CRF4_20210820_4500p60_002.webm...failed. (The file "mwstore://local-multiwrite/local-public/f/fd/BurningShip_Wiki_x264_CRF4_20210820_4500p60_002.webm" is in an inconsistent
state within the internal storage backends)

Found: 1
Failed: 1
[urbanecm@mwmaint1002 ~/T289781]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='PantheraLeo1359531' .
Importing Files

Importing BurningShip_Wiki_x264_CRF4_20210820_4500p60_002.webm...failed. (The file "mwstore://local-multiwrite/local-public/f/fd/BurningShip_Wiki_x264_CRF4_20210820_4500p60_002.webm" is in an inconsistent
state within the internal storage backends)

Found: 1
Failed: 1
[urbanecm@mwmaint1002 ~/T289781]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='PantheraLeo1359531' .
Importing Files

Importing BurningShip_Wiki_x264_CRF4_20210820_4500p60_002.webm...failed. (The file "mwstore://local-multiwrite/local-public/f/fd/BurningShip_Wiki_x264_CRF4_20210820_4500p60_002.webm" is in an inconsistent
state within the internal storage backends)

Found: 1
Failed: 1
[urbanecm@mwmaint1002 ~/T289781]$

As you can see, I tried to process the request several times, and it failed every time.

This also applies to T290722, see:

[urbanecm@mwmaint1002 ~/uploads2]$ mwscript importImages.php --wiki=commonswiki --sleep=30 --comment-ext=txt --user='Nikola Smolenski' .
Importing Files

Importing Policiski recnik 1.pdf...done.
Importing Policiski recnik 2.pdf...done.
Importing Policiski recnik 3.pdf...failed. (An unknown error occurred in storage backend "local-swift-codfw".)

Found: 3
Added: 2
Failed: 1
[urbanecm@mwmaint1002 ~/uploads2]$ mwscript importImages.php --wiki=commonswiki --sleep=30 --comment-ext=txt --user='Nikola Smolenski' .
Importing Files

Policiski recnik 1.pdf exists, skipping
Policiski recnik 2.pdf exists, skipping
Importing Policiski recnik 3.pdf...failed. (The file "mwstore://local-multiwrite/local-public/c/c4/Policiski_recnik_3.pdf" is in an inconsistent state within the internal storage backends)

Found: 3
Skipped: 2
Failed: 1
[urbanecm@mwmaint1002 ~/uploads2]$ mwscript importImages.php --wiki=commonswiki --sleep=30 --comment-ext=txt --user='Nikola Smolenski' .
Importing Files

Policiski recnik 1.pdf exists, skipping
Policiski recnik 2.pdf exists, skipping
Importing Policiski recnik 3.pdf...failed. (The file "mwstore://local-multiwrite/local-public/c/c4/Policiski_recnik_3.pdf" is in an inconsistent state within the internal storage backends)

Found: 3
Skipped: 2
Failed: 1
[urbanecm@mwmaint1002 ~/uploads2]$

I'm unable to figure out what's wrong -- turning over to you for investigation.
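To compare the failure reasons across runs, the importer output can be reduced to (file, reason) pairs. This is a minimal sketch that assumes the output format shown in the transcripts above; `failed_imports` is a hypothetical helper, not part of importImages.php.

```python
import re
from typing import List, Tuple

# Matches "Importing <name>...failed. (<reason>)" as printed in the
# transcripts above. The reason may wrap onto a second line, so the
# character class deliberately allows newlines.
FAILED = re.compile(
    r"^Importing (?P<name>.+?)\.\.\.failed\. \((?P<reason>[^()]*)\)",
    re.MULTILINE,
)


def failed_imports(output: str) -> List[Tuple[str, str]]:
    """Return (file name, reason) for every failed import in the output."""
    results = []
    for match in FAILED.finditer(output):
        # Collapse the line wrap inside the reason into single spaces.
        reason = " ".join(match.group("reason").split())
        results.append((match.group("name"), reason))
    return results
```

Feeding it the output of each run in turn shows the reason changing from the "unknown error" in "local-swift-codfw" on the first attempt to the "inconsistent state" message on every later attempt.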

Event Timeline

If it is of any help, when I try to open the file with Adobe's acroread, it reports a "stat buffer overflow" error, while evince opens it fine (if slowly).

Marostegui triaged this task as Medium priority.Sep 20 2021, 5:10 AM

Mentioned in SAL (#wikimedia-operations) [2021-10-23T15:45:16Z] <urbanecm> Start server-side upload for 1 video file (T289781), testing whether T291137 is still an issue

It still was an issue.

Can someone investigate, please? This has been pending for more than a month, and it's an issue that breaks a certain part of MW (with no workaround available). In other words, this looks like a major-severity issue to me :)).

The original video file is gone; is there still an issue with the PDF collection? Thumbor has had problems handling PDFs (cf T337649). Alternatively, it might be that the offending PDF is malformed in some way (thumbor is pickier about files fitting their spec than some desktop tools, not unreasonably).
I'm afraid we don't really have enough visibility through the upload stack to be able to investigate this sort of problem very well (since obviously uploads in general are working OK).
@Frostly for future reference, tasks already tagged sre-swift-storage needn't be referred to the clinic duty person (whose role is more about ensuring unassigned tasks get passed to the relevant team).

I got these messages

Some or all of the undeletion failed: The file "mwstore://local-multiwrite/local-public/b/ba/Update_40220_Overview_-_Age_of_Empires_II-_DE.webm" is in an inconsistent state within the internal storage backends
Some or all of the undeletion failed: The file "mwstore://local-multiwrite/local-public/5/5a/Age_Royale_-_Age_II-_DE_Battle_Royal_Debut_Tournament.webm" is in an inconsistent state within the internal storage backends

Now, while undeleting the same batch (https://commons.wikimedia.org/wiki/Commons:Deletion_requests/Files_found_with_insource:youtube.com/user/officialageofempires), I got:

Request from 89.248.174.2 via cp3067 cp3067, Varnish XID 954523496
Error: 503, Backend fetch failed at Wed, 13 Dec 2023 12:12:23 GMT

and yesterday I got T353272 while undeleting another batch of videos.

Again

Some or all of the undeletion failed: The file "mwstore://local-multiwrite/local-public/c/c6/Logo_AoE_III_DE_-_Mexico_Civilization_02.png" is in an inconsistent state within the internal storage backends
Some or all of the undeletion failed: The file "mwstore://local-multiwrite/local-public/9/9a/Age_of_Empires_III_Definitive_Edition_-_Overview.webm" is in an inconsistent state within the internal storage backends

Just for clarification, so that I understand: afaik, the main reason for deleting the Age of Empires-related content was the doubt regarding the Creative Commons license that was added to the respective videos on YouTube by the Age of Empires YouTube channel (https://www.youtube.com/@ageofempires). The rationale was COM:PCP. Has the opinion on this topic changed?

Also, the AoE channel seems to have removed the Creative Commons license note from their videos (compare https://www.youtube.com/watch?v=O67dQQ9ZAqs with https://web.archive.org/web/20221030072141/https://www.youtube.com/watch?v=O67dQQ9ZAqs; the same applies to other videos). This might indicate that the licensing was not proper, although this is not confirmed.

I just want to be careful and want to make sure that, in case of an undeletion, the licensing is valid. :)

(Of course, a CC license cannot be revoked, but that applies only if the CC-BY license was granted by the rights holder.)

Thank you and greetings :)

The free license is irrevocable, and all the files were license reviewed.

https://commons.wikimedia.org/w/index.php?title=File:Reserve_Bank_of_Zimbabwe_5_Dollars_2019_obseve.jpg can't be undeleted due to this bug:

Some or all of the undeletion failed: The file "mwstore://local-multiwrite/local-public/e/e3/Reserve_Bank_of_Zimbabwe_5_Dollars_2019_obseve.jpg" is in an inconsistent state within the internal storage backends

@Yann Please open new tickets if you have a new object you want looking at, otherwise these phab tickets just become a series of loosely-related issues - the underlying problem is with mw's rather ropey grasp of object metadata, which isn't something that the Swift service owners can help with (so should be raised elsewhere).

That said, your deleted object is in both Swift clusters' storage of deleted objects (per swift stat wikipedia-commons-local-deleted.eq e/q/l/eqlwt75pym1gwi9c3u6y6tbzk048ozw.jpg, last-modified Sun, 15 May 2022 09:23:25 GMT). The undeleted location is occupied in the eqiad cluster (per swift stat wikipedia-commons-local-public.e3 e/e3/Reserve_Bank_of_Zimbabwe_5_Dollars_2019_obseve.jpg, last-modified Mon, 11 Oct 2021 12:27:22 GMT), with what I suspect is the same object, but not in the codfw cluster.

Which I assume is why the undeletion didn't work; it's not clear to me whether that single copy has always been there (and if so, why no subsequent sync run has copied it to codfw or removed it from eqiad to make it consistent). I can find no non-HEAD requests of that object in the swift logs (but we only keep these for a few days).
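For anyone repeating this check on another object, here is a minimal sketch of the comparison described above. The container naming is inferred only from the one example in this comment and may not hold for other wikis or zones; `swift_location` and `classify` are hypothetical helpers. The per-cluster Last-Modified values are assumed to come from running swift stat against each cluster by hand.

```python
from typing import Dict, Optional, Tuple

# Assumption: prefix taken from the container name quoted in this thread
# ("wikipedia-commons-local-public.e3"); other wikis will differ.
CONTAINER_PREFIX = "wikipedia-commons"


def swift_location(mwstore_path: str) -> Tuple[str, str]:
    """Map an mwstore:// path to a (container, object) pair."""
    scheme = "mwstore://"
    if not mwstore_path.startswith(scheme):
        raise ValueError(f"not an mwstore path: {mwstore_path!r}")
    # e.g. "local-multiwrite/local-public/e/e3/Name.jpg"
    _backend, zone, rel = mwstore_path[len(scheme):].split("/", 2)
    shard = rel.split("/")[1]  # two-character hash directory, e.g. "e3"
    return f"{CONTAINER_PREFIX}-{zone}.{shard}", rel


def classify(last_modified: Dict[str, Optional[str]]) -> str:
    """Classify an object from its Last-Modified per cluster (None = absent).

    This only compares presence; it does not verify that the copies
    have identical content.
    """
    present = {name for name, value in last_modified.items() if value is not None}
    if not present:
        return "absent everywhere"
    missing = sorted(set(last_modified) - present)
    if missing:
        return "inconsistent: missing in " + ", ".join(missing)
    return "present everywhere"
```

For the object discussed here, `classify({"eqiad": "Mon, 11 Oct 2021 12:27:22 GMT", "codfw": None})` reports it as missing in codfw, which matches what swift stat showed.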

The older reports are too old and their logs have been purged, but for the Zimbabwe $5 picture I found two log entries that might be useful:
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2024.10.10?id=QIvhdZIBa-PL6vFeQ0-Z
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-mediawiki-1-7.0.0-1-2024.10.10?id=P4vhdZIBa-PL6vFeQ0-Z

The first one says it failed to resync the different swift backends, and the second one says that (from codfw) it didn't have rights to delete the file on eqiad:

Wikimedia\FileBackend\FileBackendMultiWrite::resyncFiles: not allowed to delete file 'mwstore://local-swift-eqiad/local-public/e/e3/Reserve_Bank_of_Zimbabwe_5_Dollars_2019_obseve.jpg'

Is the ACL for that container correct? There might be some missing rights for remote mw appservers in some containers (how does the remote connection even work?). It's also quite possible that the error is something else and mw is showing a permission-denied error instead. I wouldn't trust anything in the mw filebackend.

I will need to spend some time digging deep into this, but I'm not sure I will be able to do so right now. Maybe early next week.

FWIW, I ran a check on all containers of commons and their ACLs and none were a black swan.