Page MenuHomePhabricator

XMPReader::parse exceptions
Closed, ResolvedPublic

Description

In the past 6 hours, these three are in the top 10 at https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki-errors:

messagehits
XMPReader::parse Exception caught while parsing: No root element found in XML.248
XMPReader::parse : Error reading XMP content: not well-formed (invalid token) (line: 1 column: 2941 byte offset: 2943)245
XMPReader::parse Exception caught while parsing: character data where not expected. (mode 14)214

For example:

content
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.4.0"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">; <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/"; xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"; xmp:CreateDate="2015-11-16T12:37:19" xmp:ModifyDate="2015-11-16T12:37:19" xmp:CreatorTool="Photos 1.2" photoshop:DateCreated="2015-11-16T12:37:19"/> </rdf:RDF> </x:xmpmeta>   
<?xpacket end="w"?>


exception.class		RuntimeException
exception.code		0
exception.file		/srv/mediawiki/php-1.27.0-wmf.6/includes/media/XMP.php:364
exception.message	No root element found in XML.
exception.trace
	/srv/mediawiki/php-1.27.0-wmf.6/includes/media/BitmapMetadataHandler.php:173,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/media/Jpeg.php:101,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/media/ExifBitmap.php:179,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/media/MediaHandler.php:438,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/filerepo/file/UnregisteredLocalFile.php:118,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/filerepo/file/UnregisteredLocalFile.php:129,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiQueryImageInfo.php:421,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/upload/UploadBase.php:1901,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiUpload.php:112,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiMain.php:1309,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiMain.php:466,
	/srv/mediawiki/php-1.27.0-wmf.6/includes/api/ApiMain.php:438,
	/srv/mediawiki/php-1.27.0-wmf.6/api.php:88
level	  	WARNING
message	  	XMPReader::parse Exception caught while parsing: No root element found in XML.

Event Timeline

Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description. (Show Details)
Krinkle subscribed.
Restricted Application added subscribers: StudiesWorld, Steinsplitter, Matanya, Aklapper. · View Herald Transcript

I think XMP errors would be more appropriate as the info level severity (Or maybe even lower). Also the filename should definitely be included in the log.

Some percentage of files are going to have invalid metadata. If there's a trend of a lot of files erroring, we should investigate to make sure its the file, and not us that are in the wrong, but ultimately, these types of errors should be considered low priority.

I think XMP errors would be more appropriate as the info level severity (Or maybe even lower).

Easy enough to do. I jacked up the logging level from info to warning while working on T89532 and started logging warning and fatal errors on the xmp channel in production. Prior to that we never logged these at all on the WMF cluster.

Also the filename should definitely be included in the log.

XMP::parse() is called with XML data as a string and no filename. We would need to either change the signature of the method to take a filename for error reporting or switch the return value to be a Status object so the caller could get the error message to deal with itself.

BitmapDataHandler::Jpeg (which is the caller in this case) doesn't even check the status of the current boolean return value. :/

MarkTraceur subscribed.
MarkTraceur lowered the priority of this task from Medium to Low.Dec 5 2016, 10:12 PM
MarkTraceur moved this task from Untriaged to Triaged on the Multimedia board.

Lowering priority given lack of activity on this report for nearly a year.

demon raised the priority of this task from Low to High.Nov 30 2017, 8:19 PM
demon subscribed.

Raising priority because it's one of the most spammy exceptions as of today.

Change 394826 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[mediawiki/core@master] xmp: Lower severity of XMP parse failure log events

https://gerrit.wikimedia.org/r/394826

Change 394827 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[mediawiki/core@master] XMPReader: Log filename if available

https://gerrit.wikimedia.org/r/394827

Change 394826 merged by jenkins-bot:
[mediawiki/core@master] xmp: Lower severity of XMP parse failure log events

https://gerrit.wikimedia.org/r/394826

Change 394827 merged by jenkins-bot:
[mediawiki/core@master] XMPReader: Log filename if available

https://gerrit.wikimedia.org/r/394827

Change 431288 had a related patch set uploaded (by Legoktm; owner: Bryan Davis):
[mediawiki/libs/XMPReader@master] xmp: Lower severity of XMP parse failure log events

https://gerrit.wikimedia.org/r/431288

Change 431289 had a related patch set uploaded (by Legoktm; owner: Bryan Davis):
[mediawiki/libs/XMPReader@master] XMPReader: Log filename if available

https://gerrit.wikimedia.org/r/431289

Change 431288 merged by jenkins-bot:
[mediawiki/libs/XMPReader@master] xmp: Lower severity of XMP parse failure log events

https://gerrit.wikimedia.org/r/431288

Change 431289 merged by jenkins-bot:
[mediawiki/libs/XMPReader@master] XMPReader: Log filename if available

https://gerrit.wikimedia.org/r/431289

The Error reading XMP content error no longer appears in Logstash (past 7 days).

However, the others still persist:

channel:XMP (2,779 warnings in the last 7 days).

  • Warning No root element found – 2,570 times.
    • commons.wikimedia.org 2,158
    • jobrunner.discovery.wmnet 386
    • en.wikipedia.org
  • Warning character data where not expected – 156 times.
  • Warning Expected <rdf:*> but got http://www.w3.org/1999/02/22-rdf-syntax-ns# Seq. – 52 times.
Krinkle lowered the priority of this task from High to Medium.Jul 11 2018, 1:51 AM
Krinkle moved this task from Backlog to Incoming on the Commons board.
Krinkle moved this task from Incoming to Backlog on the Commons board.
Krinkle moved this task from Triaged to Untriaged on the Multimedia board.
TheDJ added a subscriber: Krinkle.

Since I'm touching this library for gpano, I might take a look. @Krinkle can you get me a list of some of the filenames throwing these log lines ?

From Logstash, type:mediawiki channel:XMP:

  • 33,298 warnings logged in the last 7 days. Mostly from commonswiki (no surprise there).

A few anonymised samples included below:

wiki: commonswiki
url: /w/api.php
referer: https://commons.wikimedia.org/wiki/Special:UploadWizard
phpversion: hhvm
file: /tmp/localcopy_e8263db90fd1.jpg
content:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.0.0"><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description rdf:about="" xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/" xmlns:xmp="http://ns.adobe.com/xap/1.0/"></rdf:Description></rdf:RDF></x:xmpmeta><?xpacket end="w"?>

RuntimeException: No root element found in XML.

in /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php:368
Stack trace:
#0 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/BitmapMetadataHandler.php(175): Wikimedia\XMPReader\Reader->parse()
#1 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/JpegHandler.php(105): BitmapMetadataHandler::Jpeg()
#2 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/ExifBitmapHandler.php(177): JpegHandler->getMetadata()
#3 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/MediaHandler.php(405): ExifBitmapHandler->getImageSize()
#4 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/UnregisteredLocalFile.php(123): MediaHandler->getPageDimensions()
#5 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/UnregisteredLocalFile.php(134): UnregisteredLocalFile->cachePageDimensions()
#6 /srv/mediawiki/php-1.34.0-wmf.21/includes/api/ApiQueryImageInfo.php(433): UnregisteredLocalFile->getWidth()
#7 /srv/mediawiki/php-1.34.0-wmf.21/includes/upload/UploadBase.php(2144): ApiQueryImageInfo::getInfo()
commonswiki, api, UploadWizard, hhvm
file: /tmp/localcopy_2b14e1e00d92.jpg
content:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.4.0"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/" photoshop:DateCreated="2019-06-24T21:54:33"/> </rdf:RDF> </x:xmpmeta>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 <?xpacket end="w"?>


RuntimeException: No root element found in XML.

in /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php:368
Stack trace:
#0 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/BitmapMetadataHandler.php(175): Wikimedia\XMPReader\Reader->parse('<?xpacket begin...')
#1 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/JpegHandler.php(105): BitmapMetadataHandler::Jpeg('/tmp/localcopy_...')
#2 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/ExifBitmapHandler.php(177): JpegHandler->getMetadata(Object(UploadStashFile), '/tmp/localcopy_...')
commonswiki, api, hhvm
referer: https://commons.wikimedia.org/wiki/File:Ford_Grand_Torino_(29725737128).jpg
content (truncated):
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-Exiv2"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:darktable="http://darktable.sf.net/" xmlns:lr="http://ns.adobe.com/lightroom/1.0/" xmp:Rating="2" xmpMM:DerivedFrom="DSCF7446.raf" darktable:xmp_version="2" darktable:raw_params="0" darktable:auto_presets_applied="1" darktable:history_end="7"> <dc:creator> <rdf:Seq> <rdf:li>Thomas Vogt</rdf:li> </rdf:Seq> </dc:creator> <dc:title> <rdf:Alt> <rdf:li xml:lang="x-default">Ford Grand Torino</rdf:li>…

RuntimeException: Expected <rdf:Bag> but got http://www.w3.org/1999/02/22-rdf-syntax-ns# Seq.

in /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php:914
Stack trace:
#0 /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php(1296): Wikimedia\XMPReader\Reader->startElementModeBag()
#1 (): Wikimedia\XMPReader\Reader->startElement()
#2 /srv/mediawiki/php-1.34.0-wmf.21/vendor/wikimedia/xmp-reader/src/Reader.php(372): xml_parse()
#3 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/BitmapMetadataHandler.php(175): Wikimedia\XMPReader\Reader->parse()
#4 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/JpegHandler.php(105): BitmapMetadataHandler::Jpeg()
#5 /srv/mediawiki/php-1.34.0-wmf.21/includes/media/ExifBitmapHandler.php(177): JpegHandler->getMetadata()
#6 /srv/mediawiki/php-1.34.0-wmf.21/includes/utils/MWFileProps.php(86): ExifBitmapHandler->getImageSize()
#7 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/FileRepo.php(1551): MWFileProps->getPropsFromPath()
#8 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/LocalFile.php(413): FileRepo->getFileProps()
#9 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/LocalFile.php(721): LocalFile->loadFromFile()
#10 /srv/mediawiki/php-1.34.0-wmf.21/includes/filerepo/file/LocalFile.php(700): LocalFile->upgradeRow()
#11 /srv/mediawiki/php-1.34.0-wmf.21/includes/deferred/MWCallableUpdate.php(38): Closure$LocalFile::maybeUpgradeRow()
#12 /srv/mediawiki/php-1.34.0-wmf.21/includes/deferred/DeferredUpdates.php(382): MWCallableUpdate->doUpdate()

https://commons.wikimedia.org/wiki/File:Ford_Grand_Torino_(29725737128).jpg
RuntimeException: Expected <rdf:Bag> but got http://www.w3.org/1999/02/22-rdf-syntax-ns# Seq.

This is expected as the file is not valid. The dc:subject property has a sequence, but the adobe spec declares it as an unordered list (aka bag)

dc:subject
Unordered array of Text
DCMI definition: The topic of the resource.
DCMI comment: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the dc:coverage element.
XMP addition: XMP usage is a list of descriptive phrases or keywords that specify the content of the resource.

RuntimeException: No root element found in XML.

I've seen this a number of times, and a pattern starts to become clear.. Sometimes this presents as "Invalid document end" (which you get depends on if you have a multiline or a singleline string as input)

This seems to have to do with bytes AFTER <?xpacket end="w"?>. If i trim() the string before feeding it to the XMP library, the errors do not present. Interestingly enough, this does not seem to be part of the JPEG original data, indicating a problem inside JpegMetadataExtractor::segmentSplitter( $filename ).

It seems there is a trailing 0xFF 0x00 byte, which trips the XML parser.

Change 535951 had a related patch set uploaded (by TheDJ; owner: TheDJ):
[mediawiki/core@master] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535951

so.... turns out we are missing the XMP data from a significant number of files most likely. I suspect that Photoshop actually triggers this bug, so ;)

Change 535951 merged by jenkins-bot:
[mediawiki/core@master] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535951

Change 535977 had a related patch set uploaded (by Reedy; owner: TheDJ):
[mediawiki/core@REL1_33] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535977

Change 535978 had a related patch set uploaded (by Reedy; owner: TheDJ):
[mediawiki/core@REL1_32] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535978

Change 535979 had a related patch set uploaded (by Reedy; owner: TheDJ):
[mediawiki/core@REL1_31] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535979

Change 535979 merged by jenkins-bot:
[mediawiki/core@REL1_31] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535979

Change 535978 merged by jenkins-bot:
[mediawiki/core@REL1_32] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535978

Change 535977 merged by jenkins-bot:
[mediawiki/core@REL1_33] Fix XMP parser errors due to trailing nullchar

https://gerrit.wikimedia.org/r/535977

I'm sorry if this is a stupid question, but: for all of my photographs uploaded to commons on 21 September 2019 or later (for example https://commons.wikimedia.org/wiki/File:Argenvors%C3%A4%C3%9F_von_S%C3%BCden_2.jpg) a lot less EXIF data is recognized than for my photographs uploaded on 16 September or before (for example https://commons.wikimedia.org/wiki/File:Wiesbadener_H%C3%BCtte_von_Nordwesten.jpg). Most notably, the GPS coordinates are not recognized any more. Could it be that it's a bug introduced with this change?

@Reinhard_Mueller i tested your files locally, and there i get the same results with the code from early september as now..

I wonder... if this is an HHVM vs PHP exif capabilities issue.. I create T240509: PHP7 exif_read_data doesn't read all the EXIF key/values that HHVM version did

Quick debugging reveals that all exif elements after MakerNote (which has a binary data value) are missing.

I disabled the "suppresswarnings" around exif_read_data() php call and the console then gives me:

PHP Warning: exif_read_data(Argenvorsäß_von_Süden_2.jpg): corrupt EXIF header: maximum directory nesting level reached in /Users/hartman/Development/wikimedia-git/mediawiki-core/includes/media/Exif.php on line 296

All this strengthens my HHVM vs PHP suspicions. Possibly https://bugs.php.net/bug.php?id=75424 ?