Page MenuHomePhabricator

Older version with broken svg content resulted in undecipherable error message during Import
Closed, ResolvedPublic5 Estimated Story Points

Description

I tried to move en:File:Arbeiderpartiet 2008 results.svg to Commons using the FileExporter/FileImporter beta feature. The preview page opened fine, but when I tried to import the file it generated an error:

Internal error

[XGwgcApAMFAAAEfEWo0AAACV] 2019-02-19 15:27:46: Fatal exception of type "RuntimeException"

Steps to reproduce:

  1. Enable the FileExporter beta feature on enwiki
  2. Navigate to https://en.wikipedia.org/wiki/File:Arbeiderpartiet_2008_results.svg
  3. Click the "Export to Wikimedia Commons" button (sometimes in the More dropdown in Vector)
  4. Click "Import" at the bottom of the page that opens (Special:ImportFile)
  5. Observe the error

I have not been able to reproduce this error outside of this file, but it occurred multiple times with this file.

Event Timeline

Restricted Application added a project: archived--TCB-Team. · View Herald TranscriptFeb 19 2019, 3:37 PM
message
Failed to validate operations.
trace
#0 /srv/mediawiki/php-1.33.0-wmf.17/extensions/FileImporter/src/Services/Importer.php(171): FileImporter\Services\Importer->validateImportOperations(FileImporter\Data\ImportOperations)
#1 /srv/mediawiki/php-1.33.0-wmf.17/extensions/FileImporter/src/Services/Importer.php(119): FileImporter\Services\Importer->importInternal(User, FileImporter\Data\ImportPlan)
#2 /srv/mediawiki/php-1.33.0-wmf.17/extensions/FileImporter/src/SpecialImportFile.php(291): FileImporter\Services\Importer->import(User, FileImporter\Data\ImportPlan)
#3 /srv/mediawiki/php-1.33.0-wmf.17/extensions/FileImporter/src/SpecialImportFile.php(181): FileImporter\SpecialImportFile->doImport(FileImporter\Data\ImportPlan)
#4 /srv/mediawiki/php-1.33.0-wmf.17/includes/specialpage/SpecialPage.php(569): FileImporter\SpecialImportFile->execute(NULL)
#5 /srv/mediawiki/php-1.33.0-wmf.17/includes/specialpage/SpecialPageFactory.php(558): SpecialPage->run(NULL)
#6 /srv/mediawiki/php-1.33.0-wmf.17/includes/MediaWiki.php(288): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#7 /srv/mediawiki/php-1.33.0-wmf.17/includes/MediaWiki.php(867): MediaWiki->performRequest()
#8 /srv/mediawiki/php-1.33.0-wmf.17/includes/MediaWiki.php(517): MediaWiki->main()
#9 /srv/mediawiki/php-1.33.0-wmf.17/index.php(42): MediaWiki->run()
#10 /srv/mediawiki/w/index.php(3): require(string)
#11 {main}
Pikne added a subscriber: Pikne.Mar 12 2019, 10:37 AM

This SVG file seems to be invalid. My browser displays its XML tree instead of image, and validator.w3.org is unable to check it.

Possibly it relates to T218083, in case there is some sort of discrepancy between file extension and the actual content.

Pikne added a comment.EditedMar 19 2019, 8:10 PM

I now ran into another SVG that gives this error:
https://et.wikipedia.org/wiki/Fail:Gruusia_t%C3%A4hestik.svg
W3C validator doesn't like the version of this SVG, but I've imported several other SVG-s of the same version successfully.

Edit: I now notice that older versions of this SVG are invalid. This and cases where older versions are with non-matching file extension (issue mentioned above) make me wonder whether it would be an option to run that kind of checks only on the latest version?

Lea_WMDE renamed this task from Importing File:Arbeiderpartiet 2008 results.svg to Commons results in an error to Older version with non-matching file ending resulted in undecipherable error message during Import.May 15 2019, 12:24 PM
Pikne added a comment.May 28 2019, 7:20 AM

File name endings of both SVG files mentioned in this task seem to be correct and intended, but there is apparently something wrong with file content, and both the latest version (first SVG has one version) and older versions are checked. Also, it doesen't seem to be related to abuse filters.

Lea_WMDE renamed this task from Older version with non-matching file ending resulted in undecipherable error message during Import to Older version with broken svg content resulted in undecipherable error message during Import.Jun 11 2019, 2:52 PM

Now that more specific error messages are exposed to users, it says "Cannot upload SVG files that contain a non-standard DTD declaration.". Currently there doesn't seem to be much else to do with such files than: 1) upload fixed version to source wiki, 2) delete the file, 3) restore only the valid file version, 4) import. Which feels counter-productive as file history will be lost.

I'm still wondering if this check and possibly some other checks (like the one for non-matching file extension mentioned above) can be skipped for either old file versions or maybe all versions. This defective content is there in Wikimedia environment already anyway. So I would say not much of new harm would be done. Individual files just need to be fixed and it'd be nice if file history can be preserved properly. If certain checks were run on only the latest version then this would hopefully ensure that problems were fixed before importing and file would need no further attention on Commons. Would it be technically possible or otherwise an option?

Lena_WMDE set the point value for this task to 5.

From today's discussion: We realized this message (upload-scripted-dtd) is one of many that are about protecting users from malicious files that, for example, contain scripts or ping external domains. This rules out one of the options we discussed: We should never skip this security check, not even for older revisions of a file.

For now, we decided to entirely block imports that fail any of these security checks, even if it's just an older revision of a file. But we want to improve the error message.

By the way: the exact same argument applies to AbuseFilter checks (see T223288). We don't want to import old revisions that may be insecure, or contain stuff that might be illegal. But it does not apply to files marked with an OTRS ticket (see T224802) as the purpose of an OTRS ticket is not to mark files that are potentially bad, but files that are known to be good.

Later, we might re-visit all of this and see if it makes sense to remove bad revisions from a file's history during import. But as this conflicts with the original purpose of this product, we are not going to make this decision before we have more data.

Andrew-WMDE moved this task from Sprint Backlog to Doing on the WMDE-QWERTY-Sprint-2020-05-27 board.

I believe this "DTD" error is not the only one that can occur, and we should fetch all of them. Suggestion for a generic error message: "This file or an older revision of this file contains elements that can't be accepted for security reasons, e.g. scripts or URLs loading resources from external domains."

I like @thiemowmde 's suggestion. I'm not sure it gives a less technical user enough information to fix the issue though. Would it also list a specific error number or something to help them find the source of the problem?

@ecohen how about something along the lines of:

This file or an older revision of this file contains elements that can't be accepted for security reasons: "Cannot upload SVG files that contain a non-standard DTD declaration."

So we would include the original error to make it easier for more technical users to remedy the issue while still providing a message which is suitable for less technical users.

Change 602307 had a related patch set uploaded (by Andrew-WMDE; owner: Andrew-WMDE):
[mediawiki/extensions/FileImporter@master] Improve error messages for files which fail security checks

https://gerrit.wikimedia.org/r/602307

Thanks @Andrew-WMDE for the idea. I think this works well. I reworded slightly below for clarity and to have the different errors formatted similarly.

This file (or an older revision of this file) contains elements that cannot be accepted for security reasons: {{specific error message from below as related to their specific file/issue}}.

  • the SVG file contains a non-standard DTD declaration.
  • unsafe CSS was found in the style element of the uploaded SVG file.
  • the ZIP file contains a Java .class file.
  • the file contains a virus.

Change 602307 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Improve error messages for files which fail security checks

https://gerrit.wikimedia.org/r/602307

Pikne added a comment.Jun 5 2020, 7:56 AM

So files with invalid versions are expected to be imported manually without preserving file history, even if original author already uploaded fixed version?

Change 602610 had a related patch set uploaded (by Andrew-WMDE; owner: Andrew-WMDE):
[mediawiki/extensions/FileImporter@master] Update message for files that fail security checks

https://gerrit.wikimedia.org/r/602610

Change 602610 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Update message for files that fail security checks

https://gerrit.wikimedia.org/r/602610

So files with invalid versions are expected to be imported manually without preserving file history, even if original author already uploaded fixed version?

@Pikne, for now, yes. We can't support importing files that are known to have security issues, even if it's just an old file revision. While it might be possible to skip affected file revisions, doing so would be in conflict with what the FileImporter extension is meant to do: to replicate the original history as close as possible. Simply skipping something would result in a different history.

For the same reason we currently don't support importing files that have a hidden file revision.

A possible workaround is to actually "revision delete" the problematic file revisions before triggering the import. I haven't tested this but believe it should work.

We are tracking all of this, count how often users run into any of these issues, and will consider further changes later.

Pikne added a comment.EditedJun 8 2020, 2:44 PM

A possible workaround is to actually "revision delete" the problematic file revisions before triggering the import. I haven't tested this but believe it should work.

Workaround which I've used for a few times when some old version for whatever reason blocks the import is to delete the entire file, then do partial restore without problematic file versions, and then import. I now imported example file "Gruusia tähestik.svg" (mentioned above) this way, too. Revision delete by means of Special:RevisionDelete doesn't work as that results in a hidden file version.

WMDE-Fisch closed this task as Resolved.Jun 9 2020, 11:54 AM
WMDE-Fisch moved this task from Demo to Done on the WMDE-QWERTY-Sprint-2020-05-27 board.