Wikibase language links feature works only for site groups which follow WMF's naming patterns
Open, Needs Triage · Public

Description

This is more or less related to T137537: Ensure correct information about Wikimedia sites in the Sites facility on the Wikimedia cluster, and might be silently fixed once that task is resolved. However, I want to point out that the current implementation (described below) probably works for the specific cases and assumptions of Wikimedia wikis, but does not work for third-party wikis or other wikis that do not meet these assumptions.

Today I worked on getting the ContentTranslation extension up and running and started using the language link functionality of Wikibase (the Wikibase installation has existed for a long time, but wasn't used for language links, as until now there was only one wiki with one language). During evaluation and installation, I found that Wikibase's LangLinkHandler makes a very _specific_ assumption about what the interwiki link of a site might be (until T137537: Ensure correct information about Wikimedia sites in the Sites facility on the Wikimedia cluster gets fixed). It simply strips a trailing "wiki" (plus any word characters after it) or "wiktionary" from the site's global ID and uses the remainder as the interwiki link:

	public function getInterwikiCodeFromSite( Site $site ) {
		// FIXME: We should use $site->getInterwikiIds, but the interwiki ids in
		// the sites table are wrong currently, see T137537.
		$id = $site->getGlobalId();
		$id = preg_replace( '/(wiki\w*|wiktionary)$/', '', $id );
		$id = strtr( $id, [ '_' => '-' ] );
		if ( !$id ) {
			$id = $site->getLanguageCode();
		}
		return $id;
	}

https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/22c655666a651864674826be72c4f697d9281e6c/client/includes/LangLinkHandler.php#L399-L409

This may work very well for Wikimedia and Wikimedia-like wikis, but it does not work for most other wikis. A site_global_key that ends with "wiki", like endroidwiki, will be treated as having the interwiki link endroid, which is wrong: the interwiki link for this wiki is just "en", which is also the prefix stored in MediaWiki's interwiki table.
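To make the failure concrete, here is a minimal standalone sketch of the same string handling, detached from the Site class. The function name interwikiCodeFromGlobalId and its $languageCode parameter are made up for illustration; only the regular expression, the strtr call and the fallback come from the snippet above.

```php
<?php
// Hypothetical standalone copy of the string handling in
// getInterwikiCodeFromSite(); this helper is not part of Wikibase.
function interwikiCodeFromGlobalId( string $globalId, ?string $languageCode = null ): ?string {
	// Strip a trailing "wiki", "wiki<word characters>" or "wiktionary".
	$id = preg_replace( '/(wiki\w*|wiktionary)$/', '', $globalId );
	$id = strtr( $id, [ '_' => '-' ] );
	// Fall back to the language code if nothing is left.
	if ( !$id ) {
		$id = $languageCode;
	}
	return $id;
}

echo interwikiCodeFromGlobalId( 'enwiki' ), PHP_EOL;      // prints: en
echo interwikiCodeFromGlobalId( 'endroidwiki' ), PHP_EOL; // prints: endroid
echo interwikiCodeFromGlobalId( 'nluncyc' ), PHP_EOL;     // prints: nluncyc
```

Only the first result is a usable interwiki prefix. The second strips too little for a wiki whose prefix is "en", and the third is not touched at all because the ID does not end in "wiki".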

While I agree that T137537: Ensure correct information about Wikimedia sites in the Sites facility on the Wikimedia cluster needs to be fixed, I would ask for a workaround or temporary configuration option that allows wikis (such as third-party wikis) to use getInterwikiIds of the Site class for the interwiki links, instead of this fuzzy assumption based on the site global key.
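A rough sketch of what such an opt-in could look like follows. It is only an illustration under assumptions: the $useInterwikiIds switch is hypothetical (no such Wikibase setting is known to exist), and the SiteLike interface is a stand-in for the three Site methods involved so that the example runs outside MediaWiki.

```php
<?php
// Stand-in for the parts of MediaWiki's Site class used here (assumption:
// only these three methods are needed), so the sketch runs on its own.
interface SiteLike {
	public function getGlobalId();
	public function getLanguageCode();
	public function getInterwikiIds();
}

// $useInterwikiIds is a hypothetical opt-in switch, not an existing setting.
function getInterwikiCode( SiteLike $site, bool $useInterwikiIds ) {
	if ( $useInterwikiIds ) {
		$ids = $site->getInterwikiIds();
		if ( $ids ) {
			// Trust the first interwiki prefix recorded for the site.
			return reset( $ids );
		}
	}
	// Otherwise keep the current global-ID based guess.
	$id = preg_replace( '/(wiki\w*|wiktionary)$/', '', $site->getGlobalId() );
	$id = strtr( $id, [ '_' => '-' ] );
	return $id ?: $site->getLanguageCode();
}

// Example site: global ID "endroidwiki", interwiki prefix "en".
$endroid = new class implements SiteLike {
	public function getGlobalId() { return 'endroidwiki'; }
	public function getLanguageCode() { return 'en'; }
	public function getInterwikiIds() { return [ 'en' ]; }
};

echo getInterwikiCode( $endroid, false ), PHP_EOL; // prints: endroid
echo getInterwikiCode( $endroid, true ), PHP_EOL;  // prints: en
```

With the switch off, behaviour is unchanged, so Wikimedia wikis with incorrect interwiki IDs in the sites table would not be affected.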

Event Timeline

Restricted Application added subscribers: PokestarFan, Aklapper.

What happens if the group name doesn't start with "wiki*" or "wiktionary"? That's pretty much an inevitability on a third-party wiki.

For instance, if Oncyclopedia is a Dutch-language Uncyclopedia, its $site entry might look something like this:

object(MediaWikiSite)#903 (10) {
["globalId":protected]=>
string(7) "nluncyc"
["type":protected]=>
string(9) "mediawiki"
["group":protected]=>
string(5) "uncyc"
["source":protected]=>
string(5) "local"
["languageCode":protected]=>
NULL
["localIds":protected]=>
array(0) {
}
["extraData":protected]=>
array(1) {
  ["paths"]=>
  array(2) {
    ["file_path"]=>
    string(27) "https://oncyclopedia.org/$1"
    ["page_path"]=>
    string(32) "https://oncyclopedia.org/wiki/$1"
  }
}
["extraConfig":protected]=>
array(0) {
}
["forward":protected]=>
bool(false)
["internalId":protected]=>
int(180)
}

The group name is "uncyc" and the individual wiki's globalId is "nluncyc", not "nlwiki", as this isn't Wikipedia, it's Uncyclopedia. The same issue applies to any other family of third-party wikis with multiple languages.

I've had to kludge extensions/Wikibase/client/includes/LangLinkHandler.php to change getInterwikiCodeFromSite() so that it gets the $id by taking the globalId and stripping out the group name, i.e.:

public function getInterwikiCodeFromSite( Site $site ) {
        $id = $site->getGlobalId();
        $group = $site->getGroup();
        $id = str_replace($group,'',$id);       // added to suppress 'uncyc' and 'illogic' suffix 2019-04-11
        $id = preg_replace( '/(wiki\w*|wiktionary)$/', '', $id );
        $id = strtr( $id, [ '_' => '-' ] );
        if ( !$id ) {
                $id = $site->getLanguageCode();
        }
        return $id;
}

Without this kludgey fix, what ends up in the "in other languages" sidebar is not "Nederlands" but "nluncyc:Hoofdpagina" - and a whole column of that sort of malformed link (none of which work, as they end up pointing to redlink pages on the local wiki) is unusable and unsightly.

This is just an untested idea, but perhaps a more general solution would be to str_replace the group name and (in the lone case where $group=='wikipedia') also str_replace out the 'wiki' suffix. Something like:

public function getInterwikiCodeFromSite( Site $site ) {
      $id = $site->getGlobalId();
      $group = $site->getGroup();
      $id = str_replace($group,'',$id);
      if ( $group === 'wikipedia' ) {
         $id = str_replace( 'wiki', '', $id );
      }
      $id = strtr( $id, [ '_' => '-' ] );
      if ( !$id ) {
              $id = $site->getLanguageCode();
      }
      return $id;
}

That way, the original poster's issue where "wiki" is part of the name of a non-Wikipedia / non-Wikimedia project might be addressed, as it's only looking for the configured $group name on non-Wikipedia wikis.
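One caveat with str_replace is that it removes the group name wherever it occurs in the global ID, not only at the end. A suffix-only variant might look like the sketch below. It is just as untested against a real Wikibase installation; the helper name stripGroupSuffix and the "pedia" group are invented for illustration.

```php
<?php
// Hypothetical helper: remove $group only when it is the suffix of $globalId.
function stripGroupSuffix( string $globalId, string $group ): string {
	$len = strlen( $group );
	if ( $len > 0 && substr( $globalId, -$len ) === $group ) {
		return substr( $globalId, 0, -$len );
	}
	// No matching suffix: leave the ID untouched.
	return $globalId;
}

echo stripGroupSuffix( 'nluncyc', 'uncyc' ), PHP_EOL;         // prints: nl

// Made-up case where the group name also appears at the start of the ID.
echo str_replace( 'pedia', '', 'pediatricspedia' ), PHP_EOL;  // prints: trics
echo stripGroupSuffix( 'pediatricspedia', 'pedia' ), PHP_EOL; // prints: pediatrics
```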

Carlb renamed this task from "Wikibase language links feature not working for thirdparty with site_global_keys ending with wiki but do not represent interwiki links" to "Wikibase language links feature works only for site groups which follow WMF's naming patterns". Apr 22 2019, 4:03 PM

This is a problem mainly for external uses of Wikibase (thus adding Wikibase Suite Team and Wikibase (3rd party installations)), but also for local development setups, which currently more or less have to copy the Wikimedia production setup 1:1 (thus adding Developer Productivity).

Winston_Sung updated the task description.
Winston_Sung moved this task from Backlog to General on the MediaWiki-Site-system board.