Page MenuHomePhabricator

Change namespace behavior of Commons link constraint
Closed, ResolvedPublic5 Estimated Story PointsBUG REPORT

Description

Problem:
Right now the Commons link constraint expects the value of the Statement to include the namespace of the page on Commons. This is however only true for Properties with datatype geoshape and tabular data. We should change the current checks to not assume the namespace to be included in the value except for Properties of datatype geoshape and tabular data.

Example:

BDD
GIVEN a Property with a Commons link constraint
AND a statement using that Property

(a)
AND the datatype is not tabular data or geoshape
WHEN the value does contain the namespace
THEN a constraint violation of the Commons link constraint is triggered

(b)
AND the datatype is tabular data or geoshape
WHEN the value does not contain the namespace
THEN a constraint violation of the Commons link constraint is triggered

Acceptance criteria:

  • Commons link constraints deal with namespaces only in the way defined in the BDD

Original report:
The creator page on Commons https://commons.wikimedia.org/wiki/Creator:Louis_Huard_(1813-1874) is connected to the Wikidata item https://www.wikidata.org/wiki/Q21555912
This item has the property P1472 Commons creator page https://www.wikidata.org/wiki/Property:P1472 linking to the correct page. However, the constraint Commons link constraint https://www.wikidata.org/wiki/Property:P1472#P1472$51803ECB-1D7F-46CD-B21C-E66E2C5F076F doesn't recognise the link and displays the message "Commons link should exist."

This happens on some Items only (see also https://www.wikidata.org/wiki/Q28837128 )

Event Timeline

I'm pretty sure there is another ticket for this but I couldn't find it, feel free to merge if you have a clue.

@Lucas_Werkmeister_WMDE could you check if something is wrong with the constraint?

Well, we’ve certainly had issues with this constraint before, such as:

As far as I can tell, it’s currently ill-defined: it’s unclear whether the namespace parameter means “the value must include this namespace, and a page with that title must exist” or “a page with this title (no namespace) and the namespace from the parameter must exist”. The “geographic shape” and “tabular data” data types require the former interpretation, I believe (since the “Data:” or namespace is part of the value), whereas “Commons Creator page” and also “Commons category” require the latter.

It’s notable that “Commons category” doesn’t suffer from this bug, though. I think this is because it’s not specially handled in this method of CommonsLinkChecker:

	private function getCommonsNamespace( $namespace ) {
		switch ( $namespace ) {
			case '':
				return [ NS_MAIN, '' ];
			// extra namespaces, see operations/mediawiki-config.git,
			// wmf-config/InitialiseSettings.php, 'wgExtraNamespaces' key, 'commonswiki' subkey
			case 'Creator':
				return [ 100, '' ];
			case 'TimedText':
				return [ 102, '' ];
			case 'Sequence':
				return [ 104, '' ];
			case 'Institution':
				return [ 106, '' ];
			// extension namespace, see mediawiki/extensions/JsonConfig.git,
			// extension.json, 'namespaces' key, third element
			case 'Data':
				return [ 486, '' ];
			default:
				return [ NS_MAIN, $namespace . ':' ];
		}
	}

This function used to be necessary because we used to parse the value using a TitleParser for Wikidata (i. e. one that was not configured to know about Commons’ custom namespaces) and then look up the resulting namespace number and database key in Commons’ database. However, nowadays (specifically, after the fixes to the above tasks, I think), we no longer do that, and in fact the only caller of this method immediately throws the namespace part of the return value away:

			$prefix = $this->getCommonsNamespace( $namespace )[1];
			$normalizedTitle = $this->pageNameNormalizer->normalizePageName(
				$prefix . $commonsLink,
				'https://commons.wikimedia.org/w/api.php'
			);

So the “Category” namespace works because it falls through to the default case, where the prefix becomes "Category:" instead of "" and so the Commons API (called through the PageNameNormalizer) sees the correct title.

I suppose we could fix this issue by hard-coding in CommonsLinkChecker that namespace parameters of “Creator”, “Category”, etc.(?) mean a corresponding prefix should be added to the title before sending it to the PageNameNormalizer, whereas a namespace parameter of “Data” (and any others?) should include no prefix (because it’s expected to be part of the value). That’s not nice at all, but at least it’s a relatively simple solution that doesn’t require any changes to the existing data in Wikidata.

Edit: According to T242518, a namespace parameter of “Institution” would also mean adding the prefix to the title. Probably check “TimedText” and “Sequence” from the switch in getCommonsNamespace() as well.

Esc3300 changed the subtype of this task from "Task" to "Bug Report".Jun 15 2021, 12:35 PM
Lydia_Pintscher renamed this task from Wikidata constraint doesn't recognise Commons creator link to change namespace behaviour of Commons link constraint.Jun 17 2021, 4:41 PM
Lydia_Pintscher updated the task description. (Show Details)
Manuel renamed this task from change namespace behaviour of Commons link constraint to Change namespace behavior of Commons link constraint.Jun 22 2021, 10:04 AM
Manuel updated the task description. (Show Details)
Addshore set the point value for this task to 5.Jun 23 2021, 10:38 AM

Change 701944 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/WikibaseQualityConstraints@master] Don't expect namespace if namespace parameter exists

https://gerrit.wikimedia.org/r/701944

Change 701944 merged by jenkins-bot:

[mediawiki/extensions/WikibaseQualityConstraints@master] Don't expect namespace if namespace parameter exists

https://gerrit.wikimedia.org/r/701944

This works like a charm, thank you! \o/

@Mohammed_Sadat_WMDE: Let's also communicate this to the community (e.g. WikiProject property constraints) as this should solve some long-awaited issues for them. I'll bring it to our next session.