Page MenuHomePhabricator

Wikidata constraint doesn't recognise Commons creator link
Open, Needs TriagePublic

Description

The creator page on Commons https://commons.wikimedia.org/wiki/Creator:Louis_Huard_(1813-1874) is connected to the Wikidata item https://www.wikidata.org/wiki/Q21555912
This item has the property P1472 Commons creator page https://www.wikidata.org/wiki/Property:P1472 linking to the correct page. However, the constraint Commons link constraint https://www.wikidata.org/wiki/Property:P1472#P1472$51803ECB-1D7F-46CD-B21C-E66E2C5F076F doesn't recognise the link and displays the message "Commons link should exist."

This happens on some Items only (see also https://www.wikidata.org/wiki/Q28837128 )

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 11 2019, 8:32 AM

I'm pretty sure there is another ticket for this but I couldn't find it, feel free to merge if you have a clue.

@Lucas_Werkmeister_WMDE could you check if something is wrong with the constraint?

Well, we’ve certainly had issues with this constraint before, such as:

As far as I can tell, it’s currently ill-defined: it’s unclear whether the namespace parameter means “the value must include this namespace, and a page with that title must exist” or “a page with this title (no namespace) and the namespace from the parameter must exist”. The “geographic shape” and “tabular data” data types require the former interpretation, I believe (since the “Data:” or namespace is part of the value), whereas “Commons Creator page” and also “Commons category” require the latter.

It’s notable that “Commons category” doesn’t suffer from this bug, though. I think this is because it’s not specially handled in this method of CommonsLinkChecker:

	private function getCommonsNamespace( $namespace ) {
		switch ( $namespace ) {
			case '':
				return [ NS_MAIN, '' ];
			// extra namespaces, see operations/mediawiki-config.git,
			// wmf-config/InitialiseSettings.php, 'wgExtraNamespaces' key, 'commonswiki' subkey
			case 'Creator':
				return [ 100, '' ];
			case 'TimedText':
				return [ 102, '' ];
			case 'Sequence':
				return [ 104, '' ];
			case 'Institution':
				return [ 106, '' ];
			// extension namespace, see mediawiki/extensions/JsonConfig.git,
			// extension.json, 'namespaces' key, third element
			case 'Data':
				return [ 486, '' ];
			default:
				return [ NS_MAIN, $namespace . ':' ];
		}
	}

This function used to be necessary because we used to parse the value using a TitleParser for Wikidata (i. e. one that was not configured to know about Commons’ custom namespaces) and then look up the resulting namespace number and database key in Commons’ database. However, nowadays (specifically, after the fixes to the above tasks, I think), we no longer do that, and in fact the only caller of this method immediately throws the namespace part of the return value away:

			$prefix = $this->getCommonsNamespace( $namespace )[1];
			$normalizedTitle = $this->pageNameNormalizer->normalizePageName(
				$prefix . $commonsLink,
				'https://commons.wikimedia.org/w/api.php'
			);

So the “Category” namespace works because it falls through to the default case, where the prefix becomes "Category:" instead of "" and so the Commons API (called through the PageNameNormalizer) sees the correct title.

I suppose we could fix this issue by hard-coding in CommonsLinkChecker that namespace parameters of “Creator”, “Category”, etc.(?) mean a corresponding prefix should be added to the title before sending it to the PageNameNormalizer, whereas a namespace parameter of “Data” (and any others?) should include no prefix (because it’s expected to be part of the value). That’s not nice at all, but at least it’s a relatively simple solution that doesn’t require any changes to the existing data in Wikidata.

Edit: According to T242518, a namespace parameter of “Institution” would also mean adding the prefix to the title. Probably check “TimedText” and “Sequence” from the switch in getCommonsNamespace() as well.