
MediaWiki file operations are fragile, causing occasional data loss
Open, Needs Triage, Public

Event Timeline

Tgr created this task.Dec 18 2016, 12:47 AM
Restricted Application added projects: Multimedia, Commons.Dec 18 2016, 12:47 AM
Restricted Application added a subscriber: Aklapper.
Tgr added a subscriber: daniel.Dec 18 2016, 1:08 AM

The typical way for files to get lost (or maybe not the typical way, but the one we understand) is that a storage operation fails while the corresponding DB operation succeeds (or the other way around), so the filename in the DB ends up wrong; the file still sits on disk, but we no longer know where. There are a few things that could be done about this:

  • T149847: RFC: Use content hash based image / thumb URLs would avoid this in some cases, but not all (notably file deletion/undeletion)
  • we could create some abstract concept of transactions in MediaWiki that covers operations with both DB and non-DB elements. IIRC @daniel has been working on this.
  • or we could just store the original files in the database, which would then become the single source of truth. We are talking about a hundred terabytes, though.
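The "abstract transactions" idea in the list above could look something like a compensating-transaction wrapper: each step pairs an action with an undo, so a failure part-way through can roll back both the DB and the file-store side. This is only a sketch of the general technique; the class and method names are hypothetical, not MediaWiki APIs.

```python
class CompensatingTransaction:
    """Run a sequence of cross-store steps, remembering how to undo each one."""

    def __init__(self):
        self._undo_stack = []

    def run(self, action, undo):
        """Run `action`; if it succeeds, push `undo` for a later rollback."""
        result = action()
        self._undo_stack.append(undo)
        return result

    def rollback(self):
        """Undo completed steps in reverse order; collect undo failures
        instead of aborting, since an undo can itself fail."""
        failures = []
        while self._undo_stack:
            undo = self._undo_stack.pop()
            try:
                undo()
            except Exception as exc:
                failures.append(exc)
        return failures


# Illustration with a dict standing in for the DB row: the second step
# fails, and rollback restores the state the first step changed.
txn = CompensatingTransaction()
db = {"name": "old.png"}
txn.run(lambda: db.update(name="new.png"),
        lambda: db.update(name="old.png"))
try:
    txn.run(lambda: (_ for _ in ()).throw(IOError("storage down")),
            lambda: None)
except IOError:
    txn.rollback()
# db["name"] is back to "old.png"
```

The key property is that undo failures are surfaced (returned from `rollback`) rather than swallowed, so an operator can still find and fix the inconsistent state by hand.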
Tgr renamed this task from File update operations are fragile, causing occasional data loss to MediaWiki file operations are fragile, causing occasional data loss.Dec 18 2016, 1:08 AM

The filerepos do have support for using a lockmanager

Poyekhali moved this task from Incoming to Backlog on the Commons board.Dec 18 2016, 1:31 AM
Poyekhali triaged this task as High priority.
Poyekhali added a subscriber: Poyekhali.

We don't want our files to be permanently lost; if the file operations truly are fragile, then more files may be affected by this bug.

Poyekhali moved this task from Untriaged to Triaged on the Multimedia board.Dec 18 2016, 1:31 AM
Aklapper raised the priority of this task from High to Needs Triage.Dec 18 2016, 8:57 AM
Nemo_bis added a subscriber: Nemo_bis.
Tgr added a comment.EditedDec 21 2016, 8:30 PM

> The filerepos do have support for using a lockmanager

Locking just avoids race conditions. For rollback/atomicity you need to remember the old state.

try {
    copy file to new name
    update file name in database
    delete file under old name
} catch {
    delete file under new name
    restore old file name in database
}

or something like that. It can be tricky, especially considering that the rollback operations can themselves fail, particularly if the original failure was caused by some external problem. (And I suspect copying huge files instead of moving them can get tricky in its own right.)
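The copy/update/delete pattern in the pseudocode above can be sketched as runnable code, including the "rollback can itself fail" case. This is a hypothetical Python stand-in (a dict for the DB row, the local filesystem for the file store), not MediaWiki code.

```python
import os
import shutil


def rename_file(db_row, old_path, new_path):
    """Copy-then-delete rename kept consistent with a DB record.

    On failure, attempt to restore both the DB row and the file layout;
    rollback errors are reported rather than silently swallowed.
    """
    copied = False
    db_updated = False
    old_name = db_row["name"]
    try:
        shutil.copy2(old_path, new_path)              # copy file to new name
        copied = True
        db_row["name"] = os.path.basename(new_path)   # update name in database
        db_updated = True
        os.remove(old_path)                           # delete file under old name
    except Exception:
        rollback_errors = []
        if db_updated:
            try:
                db_row["name"] = old_name             # restore old name in database
            except Exception as exc:
                rollback_errors.append(exc)
        if copied:
            try:
                os.remove(new_path)                   # delete file under new name
            except OSError as exc:
                rollback_errors.append(exc)
        if rollback_errors:
            # The rollback itself failed: flag for manual cleanup
            # instead of pretending the state is consistent.
            raise RuntimeError(f"rollback incomplete: {rollback_errors}")
        raise
```

Note the ordering: the undo steps run in reverse, and only for the steps that actually completed; anything the rollback cannot undo is reported so the inconsistency is at least visible.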

zhuyifei1999 updated the task description. (Show Details)Mar 16 2017, 5:49 PM
MarkTraceur moved this task from Triaged to Tracking on the Multimedia board.Jun 5 2017, 3:09 PM
Tgr updated the task description. (Show Details)Jun 28 2018, 3:01 PM
jcrespo updated the task description. (Show Details)Jun 29 2018, 5:38 AM
Yann added a subscriber: Yann.Jun 29 2018, 10:13 AM