Page MenuHomePhabricator

XMPReader::parse exceptions
Closed, ResolvedPublic

Description

In the past 6 hours, these three are in the top 10 at https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki-errors:

messagehits
XMPReader::parse Exception caught while parsing: No root element found in XML.248
XMPReader::parse : Error reading XMP content: not well-formed (invalid token) (line: 1 column: 2941 byte offset: 2943)245
XMPReader::parse Exception caught while parsing: character data where not expected. (mode 14)214

For example:

content
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.4.0"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">; <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/"; xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"; xmp:CreateDate="2015-11-16T12:37:19" xmp:ModifyDate="2015-11-16T12:37:19" xmp:CreatorTool="Photos 1.2" photoshop:DateCreated="2015-11-16T12:37:19"/> </rdf:RDF> </x:xmpmeta>   
<?xpacket end="w"?>


exception.class		RuntimeException
exception.code		0
exception.file		/srv/mediawiki/php-1.27.0-wmf.6/includes/media/XMP.php:364
exception.message	No root element found in XML.
exception.trace
	/srv/mediawiki/php-1.27.0-wmf.6/includes/media/BitmapMetadataHandler.php:173,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/media/Jpeg.php:101,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/media/ExifBitmap.php:179,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/media/MediaHandler.php:438,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/filerepo/file/UnregisteredLocalFile.php:118,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/filerepo/file/UnregisteredLocalFile.php:129,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiQueryImageInfo.php:421,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/upload/UploadBase.php:1901,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiUpload.php:112,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiMain.php:1309,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiMain.php:466,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiMain.php:438,
	/srv/mediawiki/php-1.27.0-wmf.6/api.php:88
level	  	WARNING
message	  	XMPReader::parse Exception caught while parsing: No root element found in XML.

Event Timeline

Krinkle created this task.Nov 17 2015, 12:35 AM
Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description. (Show Details)
Krinkle added a subscriber: Krinkle.
Restricted Application added a project: Commons. · View Herald TranscriptNov 17 2015, 12:35 AM
Restricted Application added subscribers: StudiesWorld, Steinsplitter, Matanya, Aklapper. · View Herald Transcript

I think XMP errors would be more appropriate as the info level severity (Or maybe even lower). Also the filename should definitely be included in the log.

Some percentage of files are going to have invalid metadata. If there's a trend of a lot of files erroring, we should investigate to make sure its the file, and not us that are in the wrong, but ultimately, these types of errors should be considered low priority.

zhuyifei1999 moved this task from Incoming to Backlog on the Commons board.Nov 17 2015, 10:06 AM
bd808 added a subscriber: bd808.Dec 11 2015, 3:26 AM

I think XMP errors would be more appropriate as the info level severity (Or maybe even lower).

Easy enough to do. I jacked up the logging level from info to warning while working on T89532 and started logging warning and fatal errors on the xmp channel in production. Prior to that we never logged these at all on the WMF cluster.

Also the filename should definitely be included in the log.

XMP::parse() is called with XML data as a string and no filename. We would need to either change the signature of the method to take a filename for error reporting or switch the return value to be a Status object so the caller could get the error message to deal with itself.

BitmapDataHandler::Jpeg (which is the caller in this case) doesn't even check the status of the current boolean return value. :/

MarkTraceur triaged this task as Normal priority.Dec 21 2015, 9:37 PM
MarkTraceur added a subscriber: MarkTraceur.
MarkTraceur lowered the priority of this task from Normal to Low.Dec 5 2016, 10:12 PM
MarkTraceur moved this task from Untriaged to Triaged on the Multimedia board.

Lowering priority given lack of activity on this report for nearly a year.

Restricted Application added a subscriber: Poyekhali. · View Herald TranscriptDec 5 2016, 10:12 PM
demon raised the priority of this task from Low to High.Nov 30 2017, 8:19 PM
demon added a subscriber: demon.

Raising priority because it's one of the most spammy exceptions as of today.

Krinkle removed a subscriber: Krinkle.Dec 2 2017, 12:55 AM

Change 394826 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[mediawiki/core@master] xmp: Lower severity of XMP parse failure log events

https://gerrit.wikimedia.org/r/394826

Change 394827 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[mediawiki/core@master] XMPReader: Log filename if available

https://gerrit.wikimedia.org/r/394827

Change 394826 merged by jenkins-bot:
[mediawiki/core@master] xmp: Lower severity of XMP parse failure log events

https://gerrit.wikimedia.org/r/394826

Change 394827 merged by jenkins-bot:
[mediawiki/core@master] XMPReader: Log filename if available

https://gerrit.wikimedia.org/r/394827

Change 431288 had a related patch set uploaded (by Legoktm; owner: Bryan Davis):
[mediawiki/libs/XMPReader@master] xmp: Lower severity of XMP parse failure log events

https://gerrit.wikimedia.org/r/431288

Change 431289 had a related patch set uploaded (by Legoktm; owner: Bryan Davis):
[mediawiki/libs/XMPReader@master] XMPReader: Log filename if available

https://gerrit.wikimedia.org/r/431289

Change 431288 merged by jenkins-bot:
[mediawiki/libs/XMPReader@master] xmp: Lower severity of XMP parse failure log events

https://gerrit.wikimedia.org/r/431288

Change 431289 merged by jenkins-bot:
[mediawiki/libs/XMPReader@master] XMPReader: Log filename if available

https://gerrit.wikimedia.org/r/431289

The Error reading XMP content error no longer appears in Logstash (past 7 days).

However, the others still persist:

channel:XMP (2,779 warnings in the last 7 days).

  • Warning No root element found – 2,570 times.
    • commons.wikimedia.org 2,158
    • jobrunner.discovery.wmnet 386
    • en.wikipedia.org
  • Warning character data where not expected – 156 times.
  • Warning Expected <rdf:*> but got http://www.w3.org/1999/02/22-rdf-syntax-ns# Seq. – 52 times.
Krinkle lowered the priority of this task from High to Normal.Jul 11 2018, 1:51 AM
Krinkle moved this task from Backlog to Incoming on the Commons board.
Krinkle moved this task from Incoming to Backlog on the Commons board.
Krinkle moved this task from Triaged to Untriaged on the Multimedia board.
TheDJ added a subscriber: TheDJ.Aug 22 2019, 1:14 PM
TheDJ added a subscriber: Krinkle.

Since I'm touching this library for gpano, I might take a look. @Krinkle can you get me a list of some of the filenames throwing these log lines ?

Krinkle added a comment.EditedTue, Sep 10, 4:01 PM

From Logstash, type:mediawiki channel:XMP:

  • 33,298 warnings logged in the last 7 days. Mostly from commonswiki (no surprise there).

A few anonymised samples included below:

wiki: commonswiki
url: /w/api.php
referer: https://commons.wikimedia.org/wiki/Special:UploadWizard
phpversion: hhvm
file: /tmp/localcopy_e8263db90fd1.jpg
content:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.0.0"><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description rdf:about="" xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/" xmlns:xmp="http://ns.adobe.com/xap/1.0/"></rdf:Description></rdf:RDF></x:xmpmeta><?xpacket end="w"?>

RuntimeException: No root element found in XML.

in /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php:368
Stack trace:
#0 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/BitmapMetadataHandler.php(175): Wikimedia\XMPReader\Reader->parse()
#1 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/JpegHandler.php(105): BitmapMetadataHandler::Jpeg()
#2 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/ExifBitmapHandler.php(177): JpegHandler->getMetadata()
#3 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/MediaHandler.php(405): ExifBitmapHandler->getImageSize()
#4 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/UnregisteredLocalFile.php(123): MediaHandler->getPageDimensions()
#5 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/UnregisteredLocalFile.php(134): UnregisteredLocalFile->cachePageDimensions()
#6 /srv/mediawiki/php-1.34.0-wmf.21/includes/api/ApiQueryImageInfo.php(433): UnregisteredLocalFile->getWidth()
#7 /srv/mediawiki/php-1.34.0-wmf.21/includes/upload/UploadBase.php(2144): ApiQueryImageInfo::getInfo()
commonswiki, api, UploadWizard, hhvm
file: /tmp/localcopy_2b14e1e00d92.jpg
content:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.4.0"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/" photoshop:DateCreated="2019-06-24T21:54:33"/> </rdf:RDF> </x:xmpmeta>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 <?xpacket end="w"?>


RuntimeException: No root element found in XML.

in /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php:368
Stack trace:
#0 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/BitmapMetadataHandler.php(175): Wikimedia\XMPReader\Reader->parse('<?xpacket begin...')
#1 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/JpegHandler.php(105): BitmapMetadataHandler::Jpeg('/tmp/localcopy_...')
#2 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/ExifBitmapHandler.php(177): JpegHandler->getMetadata(Object(UploadStashFile), '/tmp/localcopy_...')
commonswiki, api, hhvm
referer: https://commons.wikimedia.org/wiki/File:Ford_Grand_Torino_(29725737128).jpg
content (truncated):
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-Exiv2"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:darktable="http://darktable.sf.net/" xmlns:lr="http://ns.adobe.com/lightroom/1.0/" xmp:Rating="2" xmpMM:DerivedFrom="DSCF7446.raf" darktable:xmp_version="2" darktable:raw_params="0" darktable:auto_presets_applied="1" darktable:history_end="7"> <dc:creator> <rdf:Seq> <rdf:li>Thomas Vogt</rdf:li> </rdf:Seq> </dc:creator> <dc:title> <rdf:Alt> <rdf:li xml:lang="x-default">Ford Grand Torino</rdf:li>…

RuntimeException: Expected <rdf:Bag> but got http://www.w3.org/1999/02/22-rdf-syntax-ns# Seq.

in /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php:914
Stack trace:
#0 /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php(1296): Wikimedia\XMPReader\Reader->startElementModeBag()
#1 (): Wikimedia\XMPReader\Reader->startElement()
#2 /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php(372): xml_parse()
#3 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/BitmapMetadataHandler.php(175): Wikimedia\XMPReader\Reader->parse()
#4 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/JpegHandler.php(105): BitmapMetadataHandler::Jpeg()
#5 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/ExifBitmapHandler.php(177): JpegHandler->getMetadata()
#6 /srv/mediawiki/php-1.34.0-wmf.21/includes/utils/MWFileProps.php(86): ExifBitmapHandler->getImageSize()
#7 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/FileRepo.php(1551): MWFileProps->getPropsFromPath()
#8 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/LocalFile.php(413): FileRepo->getFileProps()
#9 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/LocalFile.php(721): LocalFile->loadFromFile()
#10 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/LocalFile.php(700): LocalFile->upgradeRow()
#11 /srv/mediawiki/php-1.34.0-wmf.21/includes/deferred/MWCallableUpdate.php(38): Closure$LocalFile::maybeUpgradeRow()
#12 /srv/mediawiki/php-1.34.0-wmf.21/includes/deferred/DeferredUpdates.php(382): MWCallableUpdate->doUpdate()
TheDJ added a comment.EditedTue, Sep 10, 9:04 PM

https://commons.wikimedia.org/wiki/File:Ford_Grand_Torino_(29725737128).jpg
RuntimeException: Expected <rdf:Bag> but got http://www.w3.org/1999/02/22-rdf-syntax-ns# Seq.

This is expected as the file is not valid. The dc:subject property has a sequence, but the adobe spec declares it as an unordered list (aka bag)

dc:subject
Unordered array of Text
DCMI definition: The topic of the resource.
DCMI comment: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the dc:coverage element.
XMP addition: XMP usage is a list of descriptive phrases or keywords that specify the content of the resource.

RuntimeException: No root element found in XML.

I've seen this a number of times, and a pattern starts to become clear.. Sometimes this presents as "Invalid document end" (which you get depends on if you have a multiline or a singleline string as input)

This seems to have to do with bytes AFTER <?xpacket end="w"?>. If i trim() the string before feeding it to the XMP library, the errors do not present. Interestingly enough, this does not seem to be part of the JPEG original data, indicating a problem inside JpegMetadataExtractor::segmentSplitter( $filename ).

It seems there is a trailing 0xFF 0x00 byte, which trips the XML parser.

Change 535951 had a related patch set uploaded (by TheDJ; owner: TheDJ):
[mediawiki/core@master] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535951

TheDJ added a comment.EditedWed, Sep 11, 10:18 PM

so.... turns out we are missing the XMP data from a significant number of files most likely. I suspect that Photoshop actually triggers this bug, so ;)

Change 535951 merged by jenkins-bot:
[mediawiki/core@master] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535951

Change 535977 had a related patch set uploaded (by Reedy; owner: TheDJ):
[mediawiki/core@REL1_33] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535977

Change 535978 had a related patch set uploaded (by Reedy; owner: TheDJ):
[mediawiki/core@REL1_32] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535978

Change 535979 had a related patch set uploaded (by Reedy; owner: TheDJ):
[mediawiki/core@REL1_31] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535979

Change 535979 merged by jenkins-bot:
[mediawiki/core@REL1_31] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535979

Change 535978 merged by jenkins-bot:
[mediawiki/core@REL1_32] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535978

Change 535977 merged by jenkins-bot:
[mediawiki/core@REL1_33] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535977

Jdforrester-WMF closed this task as Resolved.Thu, Sep 12, 11:31 PM
Jdforrester-WMF assigned this task to TheDJ.