bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz40927.
McZusatz created this task.Via LegacyOct 10 2012, 5:29 PM
bzimport added a comment.Via ConduitOct 12 2012, 6:39 PM

sumanah wrote:

Marco, thanks for reporting this. Can you find any pattern in which files are affected?

bzimport added a comment.Via ConduitOct 12 2012, 6:43 PM

sumanah wrote:

For ease in Bugzilla searching, the error message:

API request failed (internal_api_error_DBQueryError): Database query error <i>at Sun, 07 Oct 2012 15:27:11 GMT</i>

Task: movePage
NextTask: removeTemplate
LastTask: reloadPage
Page: File:Hl._Lukas_-_Reliquienschrein_und_Prozessionskerzenhalter.jpg
Skin: vector
[{{fullurl:Special:Contributions|target=Aa1bb2cc3dd4ee5&offset=20121007152843}} Contribs before error]

So, unless this is happening a lot, I think this might have just been a one-time API service outage/hiccup.

bzimport added a comment.Via ConduitOct 12 2012, 10:16 PM

Bryan.TongMinh wrote:

(In reply to comment #2)

So, unless this is happening a lot, I think this might have just been a
one-time API service outage/hiccup.

Still, the file moving process should really be resilient against these situations.

Aklapper added a comment.Via ConduitOct 12 2012, 10:21 PM

Marco: Is this problem reproducible?

aaron added a comment.Via ConduitOct 12 2012, 10:24 PM

(In reply to comment #0)

Moving of some files results in erroneous behavior:
Only partial contents of the files get moved to the new name. The rest of the
information is either lost or kept on the old file name.

What do you mean by "partial contents of the files"?

MaxSem added a comment.Via ConduitOct 12 2012, 10:42 PM

Sun Oct 7 15:26:38 UTC 2012 srv214 commonswiki LocalFile::lock 10.0.6.41 1205 Lock wait timeout exceeded; try restarting transaction (10.0.6.41) SELECT 1 FROM image WHERE img_name = 'Hl._Lukas_-_Reliquienschrein_und_Prozessionskerzenhalter.jpg' LIMIT 1 FOR UPDATE

Sun Oct 7 15:28:02 UTC 2012 srv252 commonswiki LocalFile::lock 10.0.6.41 1205 Lock wait timeout exceeded; try restarting transaction (10.0.6.41) SELECT 1 FROM image WHERE img_name = 'Hl._Lukas_-_Reliquienschrein_und_Prozessionskerzenhalter.jpg' LIMIT 1 FOR UPDATE

McZusatz added a comment.Via ConduitOct 13 2012, 7:58 AM

(In reply to comment #1)

Marco, thanks for reporting this. Can you find any pattern in which files are
affected?

(In reply to comment #4)

Marco: Is this problem reproducible?

Sorry, I did not find any pattern and I could also not reproduce the behavior.

(In reply to comment #5)

What do you mean by "partial contents of the files"?

Description is one part and the image file is the other part. So only one part gets moved and the other stays as far as I could observe.

aaron added a comment.Via ConduitOct 17 2012, 12:16 AM

Title.php has to deal with some transactional things:
a) The file moves
b) The file DB metadata moves
c) The actual description page move

The first two are wrapped in a transaction as best as possible, it's possibly for partitions/errors to cause:
a) no move but duplicate files to be at the destination (failure to delete target files on error)
b) files at the destination and description/metadata still at the target (failure to commit()).

The first two are in a transaction by themselves for contention reasons. One could wrap some code in Title::moveTo() in $file->lock()/$file->unlock() calls to get all three in on DB transaction, but that would hold locks while thumbnails are purged. For this to work, the purges would need to be deferred till the description page is moved. Even still, it would be vulnerable to network partitions (failure to revert files on error or commit failure).

People have made various tweaks to this area to "improve" it (one of which caused a race condition that resulted in numerous lost files). To really fix this, we'd want to redo the file layout so we only create files (under hash names or uuid4s) but don't mutate them. A MW layor would do the "pretty name" and authentication (public vs deleted files) mapping to the actual files. File delete/restore/move would just be shuffling DB rows around (plus some purges outside of the transaction), which could be in one ACID transaction.

I have my doubts that anyone is willing to invest in redoing are stupid file layout anytime soon.

Aklapper added a comment.Via ConduitNov 3 2012, 8:26 PM

Not reproducible for the reporter, and no indicators that this happens a lot. => Lowering priority.

Denniss added a comment.Via ConduitFeb 8 2013, 1:10 PM

Two more failed moves have been added (linked from post above). All three failed moves have one commons issue, one move entry in the history for the source page but two move entries on the target page. The move order did not move the original description page but the new description page after file deletion.

Denniss added a comment.Via ConduitFeb 10 2013, 5:16 AM
  • Bug 44831 has been marked as a duplicate of this bug. ***
Rillke added a comment.Via ConduitFeb 14 2013, 10:20 AM

Lost file description pages can have legal implications.

aaron added a comment.Via ConduitNov 26 2013, 6:38 PM

Has this occurred recently?

Rillke added a comment.Via ConduitNov 26 2013, 9:17 PM

(In reply to comment #15)

Has this occurred recently?

Yes, at https://commons.wikimedia.org/wiki/File:Cambridge_Public_Libary_-_Cambridge,_MA_-_DSC00103.JPG , the file disappeared after moving.

Bawolff added a comment.Via ConduitMay 22 2014, 11:04 PM

Aaron's https://gerrit.wikimedia.org/r/#/c/127460/ will probably help this situation.

Rillke added a comment.Via ConduitMay 22 2014, 11:14 PM

(In reply to Bawolff (Brian Wolff) from comment #17)
+++ support, support, support

this has been something i was always in favour of; it allows using MW with IIS properly, file renaming operations will be much faster etc. etc. -- push this!

MarkTraceur added a comment.Via ConduitMay 22 2014, 11:15 PM

We're trying to get tests built into it first - I took a stab but was struggling to get it working right.

Aklapper added a comment.Via ConduitAug 4 2014, 1:25 PM

(In reply to Bawolff (Brian Wolff) from comment #17)

Aaron's https://gerrit.wikimedia.org/r/#/c/127460/ will probably help this
situation.

Patch still awaiting review / merge...

gerritbot added a comment.Via ConduitAug 14 2014, 5:26 PM

Change 153990 had a related patch set uploaded by Aaron Schulz:
Made LocalFile move/delete/restore handle network partitions better

https://gerrit.wikimedia.org/r/153990

gerritbot added a comment.Via ConduitAug 21 2014, 3:38 PM

Change 153990 merged by jenkins-bot:
Made LocalFile move/delete/restore handle network partitions better

https://gerrit.wikimedia.org/r/153990

gerritbot added a comment.Via ConduitAug 21 2014, 4:19 PM

Change 155575 had a related patch set uploaded by Aaron Schulz:
Made LocalFile move/delete/restore handle network partitions better

https://gerrit.wikimedia.org/r/155575

gerritbot added a comment.Via ConduitAug 21 2014, 11:51 PM

Change 155575 merged by jenkins-bot:
Made LocalFile move/delete/restore handle network partitions better

https://gerrit.wikimedia.org/r/155575

Aklapper added a comment.Via ConduitAug 31 2014, 9:17 PM

All patches merged - what's left to do here?

Aklapper added a comment.Via ConduitOct 8 2014, 11:47 AM

(In reply to Andre Klapper from comment #25)

All patches merged - what's left to do here?

No reply to comment 25. Assuming this bug is FIXED.
If that is not the case: Please reopen and elaborate what is left to do here to get this report fixed.

Gilles added a project: Multimedia.Via WebDec 4 2014, 9:24 AM
Gilles raised the priority of this task from "Normal" to "Unbreak Now!".Via WebDec 4 2014, 10:11 AM
Gilles moved this task to Closed on the Multimedia workboard.
Gilles lowered the priority of this task from "Unbreak Now!" to "Normal".Via ConduitDec 4 2014, 11:21 AM

Add Comment