Page MenuHomePhabricator

Importing https://commons.wikimedia.org/wiki/File:Labs_Tools_topology.png using FileImporter fails "Failed to commit operations."
Closed, ResolvedPublic8 Estimated Story Points

Description

@Andrew-WMDE came across this some time ago and I don't think we filed a ticket back then.

Image: https://commons.wikimedia.org/wiki/File:Labs_Tools_topology.png

[20fd00bb52c494c4db382be6] /index.php?title=Special:ImportFile RuntimeException from line 119 of /var/www/mediawiki/extensions/FileImporter/src/Services/Importer.php: Failed to commit operations.

Backtrace:

#0 /var/www/mediawiki/extensions/FileImporter/src/SpecialImportFile.php(226): FileImporter\Services\Importer->import(User, FileImporter\Data\ImportPlan)
#1 /var/www/mediawiki/extensions/FileImporter/src/SpecialImportFile.php(135): FileImporter\SpecialImportFile->doImport(FileImporter\Data\ImportPlan)
#2 /var/www/mediawiki/includes/specialpage/SpecialPage.php(522): FileImporter\SpecialImportFile->execute(NULL)
#3 /var/www/mediawiki/includes/specialpage/SpecialPageFactory.php(578): SpecialPage->run(NULL)
#4 /var/www/mediawiki/includes/MediaWiki.php(287): SpecialPageFactory::executePath(Title, RequestContext)
#5 /var/www/mediawiki/includes/MediaWiki.php(851): MediaWiki->performRequest()
#6 /var/www/mediawiki/includes/MediaWiki.php(523): MediaWiki->main()
#7 /var/www/mediawiki/index.php(43): MediaWiki->run()
#8 {main}

More problematic photos with a similar error:
https://commons.wikimedia.org/wiki/File:A_Brouhot_car_in_Paris,_1910.jpg
https://commons.wikimedia.org/wiki/File:Joan_of_Arc_miniature_graded.jpg

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

From andrew when first spotted:

And the debug logs shows:

image.png (293×1 px, 233 KB)

This behavior would make sense if there were left over files from a previous failed import but I explicitly used ./down (docker system) and cleared the mediawiki/images folder.

Weird enough when I try to upload the file again it fails with a different file name conflict:

image.png (292×1 px, 231 KB)

So, it looks like the issue is coming from FileOp line 390 in core,

$status->fatal( 'backend-fail-alreadyexists', $this->params['dst'] );

This happens on a totally fresh install / totally clean file backend, so need to do some more investigation / stepping through this!

Looked a bit into it and it always seems to fail when trying to import
https://upload.wikimedia.org/wikipedia/commons/archive/8/8c/20151121185130%21Labs_Tools_topology.png
"Update for current state of tool labs"

It looks like this is still happening, tested on https://test.wikimedia.beta.wmflabs.org/wiki/Special:ImportFile

[WoMEcwpEE4AAADivxrAAAAAQ] /wiki/Special:ImportFile RuntimeException from line 109 of /srv/mediawiki/php-master/extensions/FileImporter/src/Services/Importer.php: Failed to commit operations.

Backtrace:

#0 /srv/mediawiki/php-master/extensions/FileImporter/src/SpecialImportFile.php(238): FileImporter\Services\Importer->import(User, FileImporter\Data\ImportPlan)
#1 /srv/mediawiki/php-master/extensions/FileImporter/src/SpecialImportFile.php(144): FileImporter\SpecialImportFile->doImport(FileImporter\Data\ImportPlan)
#2 /srv/mediawiki/php-master/includes/specialpage/SpecialPage.php(522): FileImporter\SpecialImportFile->execute(NULL)
#3 /srv/mediawiki/php-master/includes/specialpage/SpecialPageFactory.php(579): SpecialPage->run(NULL)
#4 /srv/mediawiki/php-master/includes/MediaWiki.php(288): SpecialPageFactory::executePath(Title, RequestContext)
#5 /srv/mediawiki/php-master/includes/MediaWiki.php(861): MediaWiki->performRequest()
#6 /srv/mediawiki/php-master/includes/MediaWiki.php(524): MediaWiki->main()
#7 /srv/mediawiki/php-master/index.php(42): MediaWiki->run()
#8 /srv/mediawiki/w/index.php(3): include(string)
#9 {main}
Lea_WMDE triaged this task as Medium priority.Feb 27 2018, 3:33 PM
Lea_WMDE set the point value for this task to 8.

One thing I ran into the last time looking into this is

$archiveName = wfTimestamp( TS_MW ) . '!' . $this->getName();

in LocalFile::publishTo() I am not sure is this could cause problems when importing the archived files very quickly.

https://gerrit.wikimedia.org/g/mediawiki/core/+/master/includes/filerepo/file/LocalFile.php#1861

I think this is it. When I look at an example file like https://commons.wikimedia.org/wiki/File:Test.svg, the versions are named like this:

That's down to a second, without a random seed to avoid conflicts when two actions happen the same second. The LocalFile class contains two places where a filename like this is build: One actively tries to increment the timestamp to avoid conflicts. One doesn't, but simply uses the current timestamp:

function publishTo(  ) {
	
	$archiveName = wfTimestamp( TS_MW ) . '!' . $this->getName();
	$archiveRel = 'archive/' . $this->getHashPath() . $archiveName;

There is also one line of code in ImportableUploadRevisionImporter::import that builds such an archive file name.

Note the timestamps in the filenames are not the timestamps when the file revision was originally uploaded! An example can be seen at https://commons.wikimedia.org/wiki/File:Joan_of_Arc_miniature_graded.jpg. I guess the archive names are sometimes updated when an archived file revision is restored, for example.

I was able to prove my idea locally: With the file https://commons.wikimedia.org/wiki/File:Joan_of_Arc_miniature_graded.jpg I can reproduce the exception. The moment I added a sleep( 1 ) in FileRevisionFromRemoteUrl::commit the import succeeded.

Next issue: The history of file versions is totally messed up in my tests. All file versions point to the same file name in the archive. Can anybody confirm this?

Change 426959 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/core@master] Fix bad archive file names in ImportableUploadRevisionImporter

https://gerrit.wikimedia.org/r/426959

Change 426960 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] [DNM] Experiment with sleep( 1 ) to avoid archive clash

https://gerrit.wikimedia.org/r/426960

Change 426959 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/core@master] Fix bad archive file names in ImportableUploadRevisionImporter

https://gerrit.wikimedia.org/r/426959

That patch works pretty damn well for me here :-)! The file can be imported with no errors and looks fine .... so maybe that's it?

Change 426959 merged by jenkins-bot:
[mediawiki/core@master] Fix bad archive file names in ImportableUploadRevisionImporter

https://gerrit.wikimedia.org/r/426959

Change 427438 had a related patch set uploaded (by Krinkle; owner: Thiemo Kreuz (WMDE)):
[mediawiki/core@REL1_31] Fix bad archive file names in ImportableUploadRevisionImporter

https://gerrit.wikimedia.org/r/427438

Change 427438 merged by jenkins-bot:
[mediawiki/core@REL1_31] Fix bad archive file names in ImportableUploadRevisionImporter

https://gerrit.wikimedia.org/r/427438

@demon It is done but still in an ongoing sprint. So we will close it on Wednesday :)

Change 426960 abandoned by Thiemo Kreuz (WMDE):
[DNM] Experiment with sleep( 1 ) to avoid archive clash

https://gerrit.wikimedia.org/r/426960