Ongoing media storage errors: backend-fail-internal on deletions and ~2000 read errors/s
Closed, Resolved (Public)

Event Timeline

When deleting https://commons.wikimedia.org/wiki/File:Froggo.png I get Error deleting file: An unknown error occurred in storage backend "local-swift-eqiad".

Error message:

An error occurred while trying to do the requested action.
A detailed description of the error is shown below:
API request failed (backend-fail-internal): An unknown error occurred in storage backend "local-swift-eqiad". at Thu, 25 Aug 2022 09:24:19 GMT, served by mw1422

Now with https://commons.wikimedia.org/wiki/File:Shaheen_Afridi_2022.jpg I get Error deleting file: Could not create directory "mwstore://local-multiwrite/local-deleted/9/s/2".

jcrespo triaged this task as Unbreak Now! priority. Aug 25 2022, 10:17 AM
jcrespo renamed this task from server-glitch hampering deletions: backend-fail-internal to Ongoing media storage errors: backend-fail-internal on deletions and ~2000 read errors/s. Aug 25 2022, 10:22 AM
jcrespo updated the task description.

We have focused primarily on updating the status page (https://www.wikimediastatus.net), but we believe we have identified the main issue or issues. A fix has been applied, and we are now waiting to confirm that everything is working normally.

jcrespo lowered the priority of this task from Unbreak Now! to High. Edited Aug 25 2022, 12:55 PM

We believe this is now solved. The RFO seems to have been an issue with some proxies for the media storage service, affecting a subset of users' reads and writes at random. A full report will soon be created on the WikiTech wiki and linked here. The data (media files) itself was not affected, other than uploads that failed while the issue was ongoing: approximately 609 unsuccessful upload attempts since yesterday at 15:00 UTC.

Leaving this open until a full outage report has been generated.

I again get the same error while deleting files, e.g. https://commons.wikimedia.org/wiki/File:The_Thililua.jpg and other files from https://commons.wikimedia.org/wiki/User_talk:Pedrorexspeculanary
The error disappears after some time, but it is annoying.

@Yann The issue described in this ticket refers to a very specific incident involving the media storage proxy/storage servers, which was fixed and has not reappeared, as you can see in the following graph: https://grafana.wikimedia.org/goto/ZlhR8IM4k

That doesn't mean there couldn't be another, different issue with deletion (even if it outputs the same error), but I recommend reporting it in a separate ticket. Otherwise it will likely be ignored, as it will be buried in the comments for the original issue of 25 Aug (the only reason this ticket is still open is to make sure a report is written; otherwise it will not receive more attention). Use the tags Commons and Wikimedia-production-error and the timestamps of the errors to make sure the new problems are properly attended to and routed to the right people.

OK, thanks for the information. This new error seems to be difficult to reproduce, and may already be gone. I don't know if it is worth opening a new ticket.

Krinkle claimed this task.
Krinkle moved this task from Untriaged to Aug 2022 on the Wikimedia-production-error board.
Krinkle removed a project: Wikimedia-Incident.