
MediaWiki file operations are fragile, causing occasional data loss
Open, Needs Triage, Public

Assigned To
None
Authored By
Tgr
Dec 18 2016, 12:47 AM
Referenced Files
None
Tokens
"Burninate" token, awarded by ToBeFree."The World Burns" token, awarded by Liuxinyu970226."The World Burns" token, awarded by Nemo_bis.

Event Timeline

Restricted Application added a subscriber: Aklapper.

The typical way (or maybe not the typical one, but the one we understand) for files to get lost is that a storage operation fails but the corresponding DB operation succeeds (or the other way around), and the filename in the DB ends up wrong; the file still sits on disk, but we no longer know where. A few things could be done about this:

  • T149847: RFC: Use content hash based image / thumb URLs will avoid this in some cases but not all (notably file deletion/undeletion)
  • we could create some abstract concept of a transaction in MediaWiki that covers operations with both DB and non-DB elements (see the sketch after this list). IIRC @daniel has been working on this.
  • or we could just store original files in the database, in which case it would become the single source of truth. We are talking about a hundred terabytes though.
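
A minimal sketch of what such a cross-store transaction could look like (hypothetical code; CrossStoreTransaction is not an existing MediaWiki class): each step registers an undo callback once it succeeds, and a failure replays the undo stack in reverse.

class CrossStoreTransaction {
    /** @var callable[] Undo callbacks for the steps completed so far */
    private $undoStack = [];

    /** Run one step (a file or DB operation) and remember how to undo it. */
    public function step( callable $do, callable $undo ) {
        $do();
        $this->undoStack[] = $undo;
    }

    /** Undo all completed steps, most recent first. */
    public function rollback() {
        foreach ( array_reverse( $this->undoStack ) as $undo ) {
            $undo();
        }
    }
}

The caller wraps each file and DB operation in step() and calls rollback() from a catch block, so changes to both stores are undone together.
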
Tgr renamed this task from "File update operations are fragile, causing occasional data loss" to "MediaWiki file operations are fragile, causing occasional data loss". Dec 18 2016, 1:08 AM

The filerepos do have support for using a lockmanager

Poyekhali moved this task from Incoming to Backlog on the Commons board.
Poyekhali subscribed.

We don't want our files to be permanently lost. If the file operations truly are fragile, then more files may be affected by this bug.

Aklapper raised the priority of this task from High to Needs Triage. Dec 18 2016, 8:57 AM

The filerepos do have support for using a lockmanager

Locking just avoids race conditions. For rollback/atomicity you need to remember the old state.

try {
    copy( $oldPath, $newPath );    // copy file to new name
    $db->update( 'image',
        [ 'img_name' => $newName ],
        [ 'img_name' => $oldName ]
    );                             // update file name in database
    unlink( $oldPath );            // delete file under old name
} catch ( Exception $e ) {
    unlink( $newPath );            // undo: delete file under new name
    $db->update( 'image',
        [ 'img_name' => $oldName ],
        [ 'img_name' => $newName ]
    );                             // undo: restore old file name
}

or something like that. It can be tricky, especially since the rollback operations can themselves fail, which is quite likely when the original failure was caused by some external problem such as an unreachable storage backend. (And I suspect copying huge files instead of moving them can get tricky in its own right.)
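
One defensive pattern here (a hypothetical sketch, not something MediaWiki does today) is to treat undo failures as expected: attempt each compensating operation, and record anything that could not be cleaned up so a later job or a human can finish it, instead of silently losing track of the file.

// Inside the catch block of the example above. Note that plain PHP's
// unlink() returns false on failure rather than throwing, so the
// result has to be checked explicitly.
if ( !@unlink( $newPath ) ) {
    // The compensating delete failed too (the storage backend may
    // still be unreachable); log the orphaned path for later cleanup.
    wfDebugLog( 'file-rollback', "Orphaned file left at $newPath" );
}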

There was an attempt to handle this via journaling (but I think it never actually got enabled). See rMW20fd877da4ef: Drop experimental FileJournal system without deprecation.
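
For context, the journaling idea is essentially a write-ahead log for file operations: record what you are about to do, perform the operations, then mark the record complete, so that a recovery process can find and repair half-finished batches. A rough sketch of the shape of it (hypothetical table and field names, not the removed FileJournal API):

// 1. Record the intent before touching anything.
$db->insert( 'file_op_journal', [
    'foj_op' => 'move',
    'foj_src' => $oldPath,
    'foj_dst' => $newPath,
    'foj_state' => 'pending',
] );

// 2. Perform the storage and DB operations as usual.

// 3. Mark the batch as done.
$db->update( 'file_op_journal',
    [ 'foj_state' => 'done' ],
    [ 'foj_src' => $oldPath, 'foj_dst' => $newPath ]
);

// A recovery script scans for rows stuck in 'pending' and decides, by
// checking what actually exists in storage, whether to finish or undo
// each operation.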