Page MenuHomePhabricator

Due to PHP fatal, a new version upload overwrote a file (the original is gone)
Open, HighPublicPRODUCTION ERROR

Description

https://commons.wikimedia.org/wiki/File:WikipediaTomeRaider3.png

I cropped some nonfree software parts from this screenshot and uploaded the new version. When I hit the upload button, I was presented this, in red:

[WzH-EQpAADsAABKNuaoAAADH] 2018-06-26 08:53:40: Fatal exception of type "MWException"

I just hit back and tried again. But now, the original revision is gone! It's still there, but the actual file now shows my new version.

Update:
Within a few minutes, all of Commons went down. "Our servers are currently under maintenance or experiencing a technical problem. Please try again in a few minutes." The old revision is still gone.

This was an effect of T198350: Rising lock wait timeout SQL errors upon 1.32.0-wmf.10 group1 deployment, which was dues to erronous locking during page deletion, see T198350#4322869. While it's unavoidable that sometimes operations fail due to outages or bugs, the data loss described here should be avoided by improving the code that manages database updates during file uploads. See also @jcrespo's comment at T198350#4323879.

Event Timeline

File storage issue → SRE-swift-storage on Wikimedia wikis.

Within a few minutes, all of Commons went down.

That was an incident triggered by scap in https://phabricator.wikimedia.org/T190327, sorry for that (https://wikitech.wikimedia.org/wiki/Server_Admin_Log 2018-06-26 at 09:33)!

Could you please verify that the problem still exists? This might have been a temporary issue / cache issues on the client side, if I understand the SRE team correctly.

If it still exists, please provide some extra context about what you are seeing vs. what you expect seeing (clarify the "the original revision is gone!"). The error you saw at the time was a (real) but unrelated incident, not caused by your upload.

Could you please verify that the problem still exists? This might have been a temporary issue / cache issues on the client side, if I understand the SRE team correctly.

I don't understand. I mean, I wouldn't have reported this if I hadn't already ruled out local cache. (seemed so obvious I didn't even mention it) Maybe they want me to run wget --no-cache? Okay, if that makes the SRE team happy, why not: same result.

https://upload.wikimedia.org/wikipedia/commons/archive/4/42/20180626085559%21WikipediaTomeRaider3.png shows the same image as https://upload.wikimedia.org/wikipedia/commons/4/42/WikipediaTomeRaider3.png. I expected the uncropped version which I've now uploaded to http://tinypic.com/r/or2fwy/9.

jcrespo added a subscriber: brooke.

Sorry, I didn't understand your original report, @AlexisJazz - you were mentioning 2 errors that were not the main issue here, and I got distracted by that. For what you say on your last comment, uploading a new version of the image overwrote the original one- that is the real problem and definitely shouldn't happen, in any circumstance. I am adding Multimedia team to see if they can think of a reason/debug how could that happen- as either the new upload should have failed completely, or succeeded, but never overwrite one another.

Multimedia, Can you help debugging? I cannot even thing how this would be possible as objects, as far I know, they are not overwritten, just new versions get uploaded? Definitely a data loss, so I would consider this high priority.

@jcrespo it would be an incredible coincidence if the MWException error had nothing to do with this. Unfortunately, I can't remember if I even checked what the page looked like right after that error.

I think I looked at it in another tab (before hitting back and trying the upload again) and found it to look as if nothing had happened (no new revision), but at that point I may well have been looking at my browser cache for the file contents so I wouldn't know if the file had changed without adding a new revision. But this is just what I think I remember, so don't take my word for it. I'm not even 100% sure I looked at it before trying the upload again.

I need to try something..and.. lo and behold.. I FOUND IT!!!!

https://upload.wikimedia.org/wikipedia/commons/archive/4/42/20180626085337!WikipediaTomeRaider3.png

How confusing is that?

It certainly makes no sense to me, according to medatada, we have:

root@db1068.eqiad.wmnet[commonswiki]> select * FROM image where img_name = 'WikipediaTomeRaider3.png'\G
*************************** 1. row ***************************
       img_name: WikipediaTomeRaider3.png
       img_size: 2470
      img_width: 240
     img_height: 191
...
  img_user_text: Alexis Jazz
  img_timestamp: 20180626085600
       img_sha1: rqibi39elqnh1lycpcywhnepf31rtv0
...

That is ok and expected, but then we have:

root@db1068.eqiad.wmnet[commonswiki]> select * FROM oldimage where oi_name = 'WikipediaTomeRaider3.png'\G
*************************** 1. row ***************************
          oi_name: WikipediaTomeRaider3.png
  oi_archive_name: 20180626085559!WikipediaTomeRaider3.png
          oi_size: 6258
         oi_width: 240
        oi_height: 320
...
     oi_user_text: BotMultichill
         oi_actor: 0
     oi_timestamp: 20071219163024
...
    oi_media_type: BITMAP
    oi_major_mime: image
    oi_minor_mime: png
       oi_deleted: 0
          oi_sha1: t9f0gwoh010831fajmaj6cqkttx9ile

Which has the right (old) dimensions and I probably the right sha1 (different from the new one), but points to the wrong archive name. That shouldn't happen, even if there was any error in the upload or code (!?). I will check the binary log to see the database write that happened and see if it is a medata update failure or a file data failure.

This is a summary of the transaction, and the weird thing is the insert at the beginning- that insert shouldn't have been there- the image should be existing, and even if it was tried, it should have failed because of unique image name(Primary key img_name)? It is an insert ignore.

DELIMITER /*!*/;
# at 926100740
BEGIN
INSERT /* LocalFile::recordUpload2  */ IGNORE INTO `image` (img_name,img_size,img_width,img_height,img_bits,img_media_type,img_major_mime,img_minor_mime,img_timestamp,img_metadata,img_sha1,img_description,img_user,img_user_text) VALUES ('WikipediaTomeRaider3.png','2470','240','191','8','BITMAP','image','png','20180626085600'...,'crop nonfree software',...,'Alexis Jazz')
SET TIMESTAMP=1530003360/*!*/;
INSERT /* CommentStore::insertInternal  */  INTO `image_comment_temp` (imgcomment_name,imgcomment_description_id) VALUES ('WikipediaTomeRaider3.png','17919156')
SET TIMESTAMP=1530003360/*!*/;
INSERT /* LocalFile::recordUpload2  */  INTO `oldimage` (oi_name,oi_archive_name,oi_size,oi_width,oi_height,oi_bits,oi_timestamp,oi_metadata,oi_media_type,oi_major_mime,oi_minor_mime,oi_sha1,oi_description,oi_description_id,oi_user,oi_user_text) VALUES ('WikipediaTomeRaider3.png','20180626085559!WikipediaTomeRaider3.png','6258','240','320','8','20071219163024',...,'BotMultichill')
UPDATE /* LocalFile::recordUpload2  */  `image` SET img_size = '2470',img_width = '240',img_height = '191',img_bits = '8',img_media_type = 'BITMAP',img_major_mime = 'image',img_minor_mime = 'png',img_timestamp = '20180626085600',..., ,img_user_text = 'Alexis Jazz' WHERE img_name = 'WikipediaTomeRaider3.png'
SET TIMESTAMP=1530003360/*!*/;
DELETE /* LocalFile::recordUpload2  */ FROM `image_comment_temp` WHERE imgcomment_name = 'WikipediaTomeRaider3.png'
SET TIMESTAMP=1530003360/*!*/;
INSERT /* CommentStore::insertInternal  */  INTO `image_comment_temp` (imgcomment_name,imgcomment_description_id) VALUES ('WikipediaTomeRaider3.png','17919155')
SET INSERT_ID=272374007/*!*/;
SET TIMESTAMP=1530003360/*!*/;
INSERT /* ManualLogEntry::insert  */  INTO `logging` (log_type,log_action,log_timestamp,log_namespace,log_title,log_page,log_params,log_comment,log_comment_id,log_user,log_user_text) VALUES ('upload','overwrite','20180626085600','6','WikipediaTomeRaider3.png',...,'Alexis Jazz')
SET INSERT_ID=17919287/*!*/;
SET TIMESTAMP=1530003360/*!*/;
INSERT /* CommentStore::createComment  */  INTO `comment` (comment_hash,comment_text,comment_data) VALUES ('-204616721','Alexis Jazz uploaded a new version of [[File:WikipediaTomeRaider3.png]]',NULL)
SET INSERT_ID=308173505/*!*/;
SET TIMESTAMP=1530003360/*!*/;
INSERT /* MediaWiki\Storage\RevisionStore::insertRevisionOn  */  INTO `revision` (rev_page,rev_parent_id,rev_text_id,rev_minor_edit,rev_timestamp,rev_deleted,rev_len,rev_sha1,rev_comment,rev_user,rev_user_text,rev_content_model,rev_content_format) VALUES ('3250455','174472206','171267983', ...)
SET TIMESTAMP=1530003360/*!*/;
INSERT /* CommentStore::insertInternal  */  INTO `revision_comment_temp` (revcomment_rev,revcomment_comment_id) VALUES ('308173505','17919287')
/*!*/;
UPDATE /* WikiPage::updateRevisionOn  */  `page` SET page_latest = '308173505',page_touched = '20180626085600',page_is_new = '0',page_is_redirect = '0',page_len = '680',page_content_model = 'wikitext' WHERE page_id = '3250455'
SET TIMESTAMP=1530003360/*!*/;
DELETE /* WikiPage::updateRedirectOn  */ FROM `redirect` WHERE rd_from = '3250455'
SET INSERT_ID=110354855/*!*/;
SET TIMESTAMP=1530003360/*!*/;
INSERT /* WatchedItemStore::addWatchBatchForUser  */ IGNORE INTO `watchlist` (wl_user,wl_namespace,wl_title,wl_notificationtimestamp) VALUES ('208920','6','WikipediaTomeRaider3.png',NULL),('208920','7','WikipediaTomeRaider3.png',NULL)
/*!*/;
SET TIMESTAMP=1530003360/*!*/;
DELETE /* GlobalUsage::deleteLinksToFile  */ FROM `globalimagelinks` WHERE gil_wiki = 'commonswiki' AND gil_to = 'WikipediaTomeRaider3.png'
/*!*/;
COMMIT/*!*/;

My initial conclusion would be that somehow, the original image was missing or deleted- otherwise the insert would have not happened? It is inserted and then moved to old_image I don't know the code, maybe it is expected?

@jcrespo I don't know how the systems work, so I'm just guessing here. When I uploaded the first time, I think the old image was moved from https://upload.wikimedia.org/wikipedia/commons/4/42/WikipediaTomeRaider3.png to https://upload.wikimedia.org/wikipedia/commons/archive/4/42/20180626085337!WikipediaTomeRaider3.png and the new cropped image was placed at https://upload.wikimedia.org/wikipedia/commons/4/42/WikipediaTomeRaider3.png. For some reason, I think no database write happened at all. The new image took the place of the old.

After that I hit back and tried again. The already-cropped image from https://upload.wikimedia.org/wikipedia/commons/4/42/WikipediaTomeRaider3.png was moved to https://upload.wikimedia.org/wikipedia/commons/archive/4/42/20180626085559%21WikipediaTomeRaider3.png and the new (identical) upload was placed at https://upload.wikimedia.org/wikipedia/commons/4/42/WikipediaTomeRaider3.png. The system just assumed the first overwrite never happened because the database had no record.

I don't know how these things work so I could be talking complete nonsense here.

Oh, I just saw the INSERT IGNORE, so that should have just silently failed- it happens every time. I am going to check, however, if I can, what where the original values of that row from the backups.

This is the first relevant write to oldimage- it writes the wrong filename, 20180626085559!WikipediaTomeRaider3.png instead of 20180626085337!WikipediaTomeRaider3.png to the archive- all other fields, however, are correct (size, etc.), so maybe something went wrong on the file handling but the database edits continued somehow.

#180626  8:56:00 server id 171978775  end_log_pos 522416525     Write_rows: table id 438 flags: STMT_END_F
### INSERT INTO `commonswiki`.`oldimage`
### SET
###   @1='WikipediaTomeRaider3.png' /* VARSTRING(255) meta=255 nullable=0 is_null=0 */
###   @2='20180626085559!WikipediaTomeRaider3.png' /* VARSTRING(255) meta=255 nullable=0 is_null=0 */
###   @3=6258 /* INT meta=0 nullable=0 is_null=0 */
###   @4=240 /* INT meta=0 nullable=0 is_null=0 */
###   @5=320 /* INT meta=0 nullable=0 is_null=0 */

This is the limit of what I can see that happened, the why is something that I need Multimedia team to determine and see if it can happen again.

@AlexisJazz I could repoint manually to the right file(fixing the immediate issue), but I don't want to touch anything until better research is done by the feature owners.

The long term fix is T28741, where only inserts to a hypothetical image_revision table happen, simplifying the process and making it less error prone- but the migration is not a short term actionable.

@jcrespo Although something went wrong, the outcome is actually desirable. I may have requested a revdel otherwise. The original (https://upload.wikimedia.org/wikipedia/commons/archive/4/42/20180626085337!WikipediaTomeRaider3.png) shows some copyrighted elements which are not allowed on Commons, that's why I uploaded the crop in the first place.

It didn't write the wrong filename? That happened at 8:56. 20180626085559!WikipediaTomeRaider3.png was the right filename for that. The problem was, 20180626085337!WikipediaTomeRaider3.png already existed at that time.

@AlexisJazz This is important because it may be a sign of a way to delete existing content- which will be undesirable on most other cases.

This was the content of image one week ago:

("WikipediaTomeRaider3.png",6258,240,320,"a:3:{s:10:\"frameCount\";i:0;s:9:\"loopCount\";i:1;s:8:\"duration\";d:0;}",8,"BITMAP","image","png","\n\n{{BotMoveToCommons|nl.wikipedia}}\n{{Information\n|Description={{nl|schermdump Pocket PC zelf gemaakt}}\n|Source=Transfered from [http://nl.wikipedia.org nl.wikipedia]\n|Date=2003-06-29 (original upload date)\n|Author=Original uploader was [[:nl:User:Erik Za",0,211386,"BotMultichill",0,"20071219163024","t9f0gwoh010831fajmaj6cqkttx9ile",0),

Which means some issue happened with the name (or the storage) on archiving the old one (the rest of the metadata is ok).

No relation to T198349? Could just be a coincidence.

@jcrespo see https://commons.wikimedia.org/wiki/File:Commons_database_error_DBQueryError.png. I tried upload a new version. (cropped from 808 to 800 pixels) I got this:

A database query error has occurred. This may indicate a bug in the software.

[WzPlqwpAIDAAAENRhT8AAADP] 2018-06-27 19:30:05: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

The crop WAS uploaded. You see only one revision which is reported to be 808 pixels, but it's really 800 pixels.

The 808 pixel version is almost certainly also still there, I'll find it in a minute.

https://upload.wikimedia.org/wikipedia/commons/archive/0/03/20180627192947%21Commons_database_error_DBQueryError.png

jcrespo triaged this task as Unbreak Now! priority.Jun 28 2018, 6:59 AM

I am unilaterly going to put this as unbreak now, as as far as I can see, overwriting images can lead to data loss.

I tried to reproduced it with [[File:Test_T198177.png]], but it didn't happen, the next try would be for files older than 2015.

I also tested it on an older image: https://commons.wikimedia.org/wiki/File:Wikipedia-es-GFDL-version.png but couldn't reproduce it. @Yann could you share exactly the steps done- which method of upload you used, did you get any error like @AlexisJazz did?

Hi, It happened when using CropTool: T198353. So I guess this is also related to T198350.

The stack trace is

Function: WikiPage::lockAndGetLatest
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.77)
#0 /srv/mediawiki/php-1.32.0-wmf.8/includes/libs/rdbms/database/Database.php(1413): Wikimedia\Rdbms\Database->makeQueryException(string, integer, string, string)
#1 /srv/mediawiki/php-1.32.0-wmf.8/includes/libs/rdbms/database/Database.php(1186): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /srv/mediawiki/php-1.32.0-wmf.8/includes/libs/rdbms/database/Database.php(1640): Wikimedia\Rdbms\Database->query(string, string)
#3 /srv/mediawiki/php-1.32.0-wmf.8/includes/libs/rdbms/database/Database.php(1465): Wikimedia\Rdbms\Database->select(string, string, array, string, array, array)
#4 /srv/mediawiki/php-1.32.0-wmf.8/includes/page/WikiPage.php(3045): Wikimedia\Rdbms\Database->selectField(string, string, array, string, array)
#5 /srv/mediawiki/php-1.32.0-wmf.8/includes/page/WikiPage.php(2846): WikiPage->lockAndGetLatest()
#6 /srv/mediawiki/php-1.32.0-wmf.8/includes/api/ApiDelete.php(127): WikiPage->doDeleteArticleReal(string, boolean, integer, boolean, string, User, NULL)
#7 /srv/mediawiki/php-1.32.0-wmf.8/includes/api/ApiDelete.php(75): ApiDelete::delete(WikiPage, User, string, NULL)
#8 /srv/mediawiki/php-1.32.0-wmf.8/includes/api/ApiMain.php(1581): ApiDelete->execute()
#9 /srv/mediawiki/php-1.32.0-wmf.8/includes/api/ApiMain.php(535): ApiMain->executeAction()
#10 /srv/mediawiki/php-1.32.0-wmf.8/includes/api/ApiMain.php(506): ApiMain->executeActionWithErrorHandling()
#11 /srv/mediawiki/php-1.32.0-wmf.8/api.php(83): ApiMain->execute()
#12 /srv/mediawiki/w/api.php(3): include(string)
#13 {main}

So yeah, probably a combination of T198350: Rising lock wait timeout SQL errors upon 1.32.0-wmf.10 group1 deployment and T153565: MediaWiki file operations are fragile, causing occasional data loss.

Vvjjkkii renamed this task from Overwritten a file on Commons, literally, the original is gone to eaaaaaaaaa.Jul 1 2018, 1:01 AM
Vvjjkkii lowered the priority of this task from Unbreak Now! to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
JJMC89 renamed this task from eaaaaaaaaa to Overwritten a file on Commons, literally, the original is gone.Jul 1 2018, 2:44 AM
JJMC89 raised the priority of this task from High to Unbreak Now!.
JJMC89 updated the task description. (Show Details)
JJMC89 added a subscriber: Aklapper.
Jdforrester-WMF renamed this task from Overwritten a file on Commons, literally, the original is gone to Due to PHP fatal, a new version upload overwrote a file (the original is gone).Jul 18 2018, 10:27 AM
Jdforrester-WMF lowered the priority of this task from Unbreak Now! to High.
Jdforrester-WMF subscribed.

I'm de-prioritising this as it's unrecoverable and the general comments are valid but not new.

I'm de-prioritising this as it's unrecoverable and the general comments are valid but not new.

@Jdforrester-WMF this is not true. This old revision of WikipediaTomeRaider3.png can still be found at https://upload.wikimedia.org/wikipedia/commons/archive/4/42/20180626085337%21WikipediaTomeRaider3.png. The old revision of Commons_database_error_DBQueryError.png can be found at https://upload.wikimedia.org/wikipedia/commons/archive/0/03/20180627192947%21Commons_database_error_DBQueryError.png. It's just really time-consuming to find without shell access, but clearly recoverable.

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM

I outlined a way to avoid this data corruption here: T263301#6487019.