Page MenuHomePhabricator

CommonsMetadata bad wfTimestamp call
Open, Unbreak Now!PublicBUG REPORT

Description

Showing up in beta logstash. The following code is generating problems:

/**
 * Normalizes the metadata to wfTimestamp()'s TS_DB format
 * @param array &$metadata
 */
protected function normalizeMetadataTimestamps( array &$metadata ) {
	$fieldsToNormalize = [ 'DateTime', 'DateTimeOriginal' ];
	foreach ( $fieldsToNormalize as $field ) {
		if ( isset( $metadata[$field] ) && isset( $metadata[$field]['value'] ) ) {
			$parsedTs = wfTimestamp( TS_DB, $metadata[$field]['value'] );
			if ( $parsedTs ) {
				$metadata[$field]['value'] = $parsedTs;
			}
		}
	}
}

$metadata[$field]['value'] is sometimes an array, but wfTimestamp's second parameter shouldn't be an array (should be string|int|float|bool|DateTime based on documentation of ConvertibleTimestamp::convert, but wfTimestamp is just documented as accepting mixed)

This results in at least three different possible errors in logstash

PHP Warning: preg_match() expects parameter 2 to be string, array given

#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-master/vendor/wikimedia/timestamp/src/ConvertibleTimestamp.php(188): preg_match(string, array, NULL)
#2 /srv/mediawiki/php-master/vendor/wikimedia/timestamp/src/ConvertibleTimestamp.php(165): Wikimedia\Timestamp\ConvertibleTimestamp->setTimestamp(array)
#3 /srv/mediawiki/php-master/vendor/wikimedia/timestamp/src/ConvertibleTimestamp.php(275): Wikimedia\Timestamp\ConvertibleTimestamp->__construct(array)
#4 /srv/mediawiki/php-master/includes/GlobalFunctions.php(1816): Wikimedia\Timestamp\ConvertibleTimestamp::convert(integer, array)
#5 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/DataCollector.php(459): wfTimestamp(integer, array)
#6 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/DataCollector.php(96): CommonsMetadata\DataCollector->normalizeMetadataTimestamps(array)
#7 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/HookHandler.php(75): CommonsMetadata\DataCollector->collect(array, ForeignAPIFile)
#8 /srv/mediawiki/php-master/includes/HookContainer/HookContainer.php(333): CommonsMetadata\HookHandler::onGetExtendedMetadata(array, ForeignAPIFile, RequestContext, boolean, integer)
#9 /srv/mediawiki/php-master/includes/HookContainer/HookContainer.php(140): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#10 /srv/mediawiki/php-master/includes/HookContainer/HookRunner.php(1867): MediaWiki\HookContainer\HookContainer->run(string, array)
#11 /srv/mediawiki/php-master/includes/media/FormatMetadata.php(1814): MediaWiki\HookContainer\HookRunner->onGetExtendedMetadata(array, ForeignAPIFile, RequestContext, boolean, integer)
#12 /srv/mediawiki/php-master/includes/media/FormatMetadata.php(1731): FormatMetadata->getExtendedMetadataFromHook(ForeignAPIFile, array, integer)
#13 /srv/mediawiki/php-master/includes/api/ApiQueryImageInfo.php(584): FormatMetadata->fetchExtendedMetadata(ForeignAPIFile)
#14 /srv/mediawiki/php-master/includes/api/ApiQueryImageInfo.php(184): ApiQueryImageInfo::getInfo(ForeignAPIFile, array, ApiResult, NULL, array)
#15 /srv/mediawiki/php-master/includes/api/ApiQuery.php(263): ApiQueryImageInfo->execute()
#16 /srv/mediawiki/php-master/includes/api/ApiMain.php(1601): ApiQuery->execute()
#17 /srv/mediawiki/php-master/includes/api/ApiMain.php(581): ApiMain->executeAction()
#18 /srv/mediawiki/php-master/includes/api/ApiMain.php(552): ApiMain->executeActionWithErrorHandling()
#19 /srv/mediawiki/php-master/api.php(90): ApiMain->execute()
#20 /srv/mediawiki/php-master/api.php(45): wfApiMain()
#21 /srv/mediawiki/w/api.php(3): require(string)
#22 {main}
PHP Notice: Array to string conversion

#0 /srv/mediawiki/php-master/vendor/wikimedia/timestamp/src/ConvertibleTimestamp.php(240): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-master/vendor/wikimedia/timestamp/src/ConvertibleTimestamp.php(165): Wikimedia\Timestamp\ConvertibleTimestamp->setTimestamp(array)
#2 /srv/mediawiki/php-master/vendor/wikimedia/timestamp/src/ConvertibleTimestamp.php(275): Wikimedia\Timestamp\ConvertibleTimestamp->__construct(array)
#3 /srv/mediawiki/php-master/includes/GlobalFunctions.php(1816): Wikimedia\Timestamp\ConvertibleTimestamp::convert(integer, array)
#4 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/DataCollector.php(459): wfTimestamp(integer, array)
#5 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/DataCollector.php(96): CommonsMetadata\DataCollector->normalizeMetadataTimestamps(array)
#6 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/HookHandler.php(75): CommonsMetadata\DataCollector->collect(array, ForeignAPIFile)
#7 /srv/mediawiki/php-master/includes/HookContainer/HookContainer.php(333): CommonsMetadata\HookHandler::onGetExtendedMetadata(array, ForeignAPIFile, RequestContext, boolean, integer)
#8 /srv/mediawiki/php-master/includes/HookContainer/HookContainer.php(140): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#9 /srv/mediawiki/php-master/includes/HookContainer/HookRunner.php(1867): MediaWiki\HookContainer\HookContainer->run(string, array)
#10 /srv/mediawiki/php-master/includes/media/FormatMetadata.php(1814): MediaWiki\HookContainer\HookRunner->onGetExtendedMetadata(array, ForeignAPIFile, RequestContext, boolean, integer)
#11 /srv/mediawiki/php-master/includes/media/FormatMetadata.php(1731): FormatMetadata->getExtendedMetadataFromHook(ForeignAPIFile, array, integer)
#12 /srv/mediawiki/php-master/includes/api/ApiQueryImageInfo.php(584): FormatMetadata->fetchExtendedMetadata(ForeignAPIFile)
#13 /srv/mediawiki/php-master/includes/api/ApiQueryImageInfo.php(184): ApiQueryImageInfo::getInfo(ForeignAPIFile, array, ApiResult, NULL, array)
#14 /srv/mediawiki/php-master/includes/api/ApiQuery.php(263): ApiQueryImageInfo->execute()
#15 /srv/mediawiki/php-master/includes/api/ApiMain.php(1601): ApiQuery->execute()
#16 /srv/mediawiki/php-master/includes/api/ApiMain.php(581): ApiMain->executeAction()
#17 /srv/mediawiki/php-master/includes/api/ApiMain.php(552): ApiMain->executeActionWithErrorHandling()
#18 /srv/mediawiki/php-master/api.php(90): ApiMain->execute()
#19 /srv/mediawiki/php-master/api.php(45): wfApiMain()
#20 /srv/mediawiki/w/api.php(3): require(string)
#21 {main}
PHP Notice: Array to string conversion

#0 /srv/mediawiki/php-master/includes/GlobalFunctions.php(1818): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/DataCollector.php(459): wfTimestamp(integer, array)
#2 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/DataCollector.php(96): CommonsMetadata\DataCollector->normalizeMetadataTimestamps(array)
#3 /srv/mediawiki/php-master/extensions/CommonsMetadata/src/HookHandler.php(75): CommonsMetadata\DataCollector->collect(array, ForeignAPIFile)
#4 /srv/mediawiki/php-master/includes/HookContainer/HookContainer.php(333): CommonsMetadata\HookHandler::onGetExtendedMetadata(array, ForeignAPIFile, RequestContext, boolean, integer)
#5 /srv/mediawiki/php-master/includes/HookContainer/HookContainer.php(140): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#6 /srv/mediawiki/php-master/includes/HookContainer/HookRunner.php(1867): MediaWiki\HookContainer\HookContainer->run(string, array)
#7 /srv/mediawiki/php-master/includes/media/FormatMetadata.php(1814): MediaWiki\HookContainer\HookRunner->onGetExtendedMetadata(array, ForeignAPIFile, RequestContext, boolean, integer)
#8 /srv/mediawiki/php-master/includes/media/FormatMetadata.php(1731): FormatMetadata->getExtendedMetadataFromHook(ForeignAPIFile, array, integer)
#9 /srv/mediawiki/php-master/includes/api/ApiQueryImageInfo.php(584): FormatMetadata->fetchExtendedMetadata(ForeignAPIFile)
#10 /srv/mediawiki/php-master/includes/api/ApiQueryImageInfo.php(184): ApiQueryImageInfo::getInfo(ForeignAPIFile, array, ApiResult, NULL, array)
#11 /srv/mediawiki/php-master/includes/api/ApiQuery.php(263): ApiQueryImageInfo->execute()
#12 /srv/mediawiki/php-master/includes/api/ApiMain.php(1601): ApiQuery->execute()
#13 /srv/mediawiki/php-master/includes/api/ApiMain.php(581): ApiMain->executeAction()
#14 /srv/mediawiki/php-master/includes/api/ApiMain.php(552): ApiMain->executeActionWithErrorHandling()
#15 /srv/mediawiki/php-master/api.php(90): ApiMain->execute()
#16 /srv/mediawiki/php-master/api.php(45): wfApiMain()
#17 /srv/mediawiki/w/api.php(3): require(string)
#18 {main}

Appears to have started earlier today, though I don't see any recent changes to the extension that might have caused it
Impact so far: 14 errors https://logstash-beta.wmflabs.org/goto/64cdab090a58a42a9331d8d04e68aae6

Event Timeline

DannyS712 created this task.Mon, Nov 2, 3:30 PM
Restricted Application added a project: User-DannyS712. · View Herald TranscriptMon, Nov 2, 3:30 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
DannyS712 moved this task from Unsorted to Reports on the User-DannyS712 board.Mon, Nov 2, 3:30 PM
DannyS712 changed the subtype of this task from "Task" to "Bug Report".
Restricted Application added a project: Structured-Data-Backlog. · View Herald TranscriptMon, Nov 2, 6:18 PM
Krinkle triaged this task as High priority.Mon, Nov 2, 6:18 PM
thcipriani raised the priority of this task from High to Unbreak Now!.Mon, Nov 2, 6:48 PM
TJH2018 added a subscriber: TJH2018.Mon, Nov 2, 7:07 PM

Something somewhere was touched because code doesn't just break without someone touching it somewhere. Has the $metadata[$field]['value'] been editied recently

brennen added a subscriber: brennen.Wed, Nov 4, 4:02 PM

@TJH2018: Please avoid posting commonplaces which don't help tracking down the problem. Thanks a lot!

cscott added a subscriber: cscott.EditedWed, Nov 4, 11:29 PM

I might be the guilty party: 916a1777ef8796605aa7eab20c97694aee99e553 added a callback mechanism for formatted data which uses an array marker. Perhaps that's somehow sneaking out of the FormatMetadata class? But I can't see anything obviously at fault here.

The only thing that sets $metadata['DateTime']['value'] that I see is FormatMetadata::getExtendedMetadataFromFile and that hasn't been touched since 2016.

Other possible culprits are: 098bcb38e83bfb4c67e37e5149787ebb8d7c77f2 and fe252d715a3b3d231c82bd8ac251a30857324ae9

All the logstash entries appear to be API requests for the same .jpg file, which probably says the metadata-related changes to PdfHandler are innocent.

cscott added a comment.Thu, Nov 5, 3:07 AM

I've looked into this. If you install CommonsMetadata locally, and set up some config:

$wgMainCacheType    = CACHE_NONE;
$wgUseInstantCommons = true;
$wgCommonsMetadataForceRecalculate = true;
wfLoadExtension('CommonsMetadata');

then you can trigger this bug via a url similar to:
http://localhost/~cananian/mediawiki/api.php?action=query&prop=imageinfo&format=json&iiprop=extmetadata&iilimit=10&titles=File%3AIndischer_Maler_des_6._Jahrhunderts_001.jpg

All 14 warnings are generated for the same single API request, and they only occur the first time this particular API is called to the file; subsequent hits to this URL will hit the cache. (That's why $wgMainCacheType is in the LocalSettings.php fragment above.)

If you dump the $metadata array in normalizeMetadataTimestamps just before the foreach, you get:

[error] array (
  'DateTime' => 
  array (
    'value' => '2012-03-14 19:42:42',
    'source' => 'mediawiki-metadata',
    'hidden' => '',
  ),
[...]
  'DateTimeOriginal' => 
  array (
    'value' => 
    array (
      'de' => '6. Jh.',
      'en' => '6th century',
      '_type' => 'lang',
    ),
    'source' => 'commons-desc-page',
  ),
[...]
)

So the issue is the _lang pseudotag (not the _formatted pseudotag I added in 916a1777ef8796605aa7eab20c97694aee99e553). I think this bug has been there for a long time, we just happened to hit an image triggering it for the first time while we were watching logs (the commons metadata for that image has not recently changed). I can probably patch it now that I know what's going on, but I bet this is not actually a train blocker.

Change 639397 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/CommonsMetadata@master] Don't pass a multilanguage array to wfTimestamp

https://gerrit.wikimedia.org/r/639397

cscott claimed this task.Thu, Nov 5, 2:34 PM

Change 639536 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/extensions/CommonsMetadata@master] Try harder to parse multilanguage DateTime fields

https://gerrit.wikimedia.org/r/639536

I think this bug has been there for a long time, we just happened to hit an image triggering it for the first time while we were watching logs (the commons metadata for that image has not recently changed). I can probably patch it now that I know what's going on, but I bet this is not actually a train blocker.

Thanks for the investigation here. I'm going to go ahead and remove this as a train blocker.

Change 639397 merged by jenkins-bot:
[mediawiki/extensions/CommonsMetadata@master] Don't pass a multilanguage array to wfTimestamp

https://gerrit.wikimedia.org/r/639397