Currently, Wikibase accepts only one form of site ID for sitelinks: the wiki's canonical database name (its ID in the sites table). This causes numerous problems:
- The ID doesn't always match the language code of the wiki, e.g. zh-classical.wikipedia.org has its canonical ID set to zh_classicalwiki (we expect people to know that the dash turns into an underscore)
- It gets even worse: we have one renamed wiki, be-tarask.wikipedia.org, whose canonical site ID is be_x_oldwiki. As a result, we don't recognize the wiki's actual language code, we force people to use the old name of the wiki, and we show the old name as well.
- As a result, this is blocking further wiki renames that have been in the queue for years
- We accept only one way to enter a sitelink, while users could reasonably expect to enter it in different ways (how that would work is not defined, and not in scope of this task)
- We are exposing internals of our system (the dbnames of wikis) to users. This is not an issue for transparency or security, but it makes for a poor user experience.
**Proposed solution:**
A service that takes a sitelink ID and returns the canonical dbname (for DB lookups, etc.). For example:
- `zh-classicalwiki` should return `zh_classicalwiki` (normalizing `-`)
- `be_taraskwiki` and `be-taraskwiki` should return `be_x_oldwiki` (the mapping should be kept in configuration)
- The rest should return itself (`dewiki` -> `dewiki`)
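The three rules above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the function name `toCanonicalDbName` and the constant `SITE_ID_ALIASES` are hypothetical, and the alias table stands in for whatever configuration ends up holding the mapping.

```php
<?php
// Hypothetical alias configuration (assumption): normalized alias => canonical dbname.
const SITE_ID_ALIASES = [
    'be_taraskwiki' => 'be_x_oldwiki',
];

/**
 * Sketch: map a user-supplied sitelink ID to the canonical dbname.
 */
function toCanonicalDbName( string $siteId, array $aliases = SITE_ID_ALIASES ): string {
    // Normalize dashes to underscores, e.g. zh-classicalwiki -> zh_classicalwiki.
    $normalized = str_replace( '-', '_', $siteId );

    // Configured aliases take precedence; every other ID maps to itself.
    return $aliases[$normalized] ?? $normalized;
}
```

Normalizing before the alias lookup means the configuration only needs the underscore form of each alias; `be-taraskwiki` and `be_taraskwiki` then share one entry.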
This service should also return a display ("to show") value for the frontend (`SitesModule`):
- `be_x_oldwiki` or `be_taraskwiki` should turn into `be-tarask` (the reverse of the mapping above)
(Note that both mappings are many-to-one, but they converge to different values: the canonical dbname for lookups and the display value for output.)
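A rough sketch of the display direction, under stated assumptions: `toDisplayName`, `INPUT_ALIASES` and `DISPLAY_NAMES` are hypothetical names, and both tables stand in for configuration. The task does not say what IDs without a configured display value should show, so returning the canonical ID unchanged in that case is an assumption made only to keep the example complete.

```php
<?php
// Hypothetical configuration (assumption), not taken from Wikibase itself.
const INPUT_ALIASES = [ 'be_taraskwiki' => 'be_x_oldwiki' ]; // alias => canonical dbname
const DISPLAY_NAMES = [ 'be_x_oldwiki' => 'be-tarask' ];     // canonical dbname => display value

/**
 * Sketch: map any accepted sitelink ID to the value shown in the frontend.
 */
function toDisplayName( string $siteId ): string {
    // First converge on the canonical dbname, as for DB lookups.
    $normalized = str_replace( '-', '_', $siteId );
    $canonical = INPUT_ALIASES[$normalized] ?? $normalized;

    // Then look up the display value. The fallback to the canonical ID
    // is an assumption; the task leaves the default unspecified.
    return DISPLAY_NAMES[$canonical] ?? $canonical;
}
```

Resolving to the canonical ID first keeps the display table keyed on a single value per wiki, which is one way to get the "many inputs, one output" behaviour described above.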
----
**Original bug:** (includes a description of a possible approach, which is not expected to be followed)
Wikibase (and MediaWiki) need a more flexible way to handle site ids. In particular:
For API input (for wbaddsitelink, etc) several aliases should be supported per wiki.
* In addition to the global ID, at least the domain should be usable as a wiki id
* it should be possible to define additional aliases for input, for use when wikis get renamed, as was recently the case for be-x-old -> be-tarask.
For manual input in the UI, at least the above aliases should be supported
* in addition, per-group IDs/Aliases should be supported (e.g. "en" means "enwiki" in context of the "wikipedia" group)
* these aliases should be provided to the UI by the SitesModule
For output, two "labels" should be available:
* a long, globally unique label, which would also work as input to the UI widget and API module. The full domain name of the wiki should do.
* a per-group shorthand, which would also work as input to the UI widget. This would usually be the language code, e.g. "en" for en.wikipedia.org
---------
To achieve the above, we need a service (or several services) that provide the following functions:
  lang=php
  getGlobalAliases( $globalSiteId ): string[] // all globally unique aliases for $globalId
  getLocalAliases( $groupId, $globalSiteId ): string[] // all aliases unique within the given group (including the global ones)
  getGlobalName( $globalSiteId ): string // the preferred name that is also a globally unique alias
  getLocalName( $groupId, $globalSiteId ): string // the preferred name that is also an alias unique in the given group
  getAllGlobalAliases(): string[][] // map siteId -> list of globally unique aliases
  getAllLocalAliases( $group ): string[][] // map siteId -> list of all locally unique aliases for members of the given group
  resolveAlias( $alias, $group = null ): string // return the global site ID for the given alias. Local aliases are supported if $group is given.
These functions would probably be implemented on top of a SiteList. SiteList and Site may have to be extended to provide access to additional information. The schema of the sites table should be flexible enough to accommodate all we need. The information in the SiteList can be mapped as follows:
* the global ID is used as the primary identifier, as well as a global alias (and thus also a local alias).
* all "local ids" (navigation ids, interwiki ids) would also count as global ids. //Note the different meaning of "local" in this context//
* a site's domain name would act as a global id, as well as the "global label"
* a site's subdomain would act as a local id, as well as the "local label" (alternatively, we could use the language code)
* additional aliases can be stored as "extra data"
* the site's global and local label can be overwritten by "extra data"
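To make the mapping above concrete, here is a minimal sketch over plain arrays. It is not built on the real `Site` or `SiteList` classes: the `$sites` records and their field names (`group`, `domain`, `extraAliases`) are hypothetical, the functions take an extra `$sites` argument, and `resolveAlias` returns `null` for unknown aliases, which the signatures above do not specify. Navigation and interwiki ids are left out for brevity.

```php
<?php
// Hypothetical site records (assumption); field names are illustrative only.
$sites = [
    'enwiki' => [ 'group' => 'wikipedia', 'domain' => 'en.wikipedia.org', 'extraAliases' => [] ],
    'be_x_oldwiki' => [
        'group' => 'wikipedia',
        'domain' => 'be-tarask.wikipedia.org',
        'extraAliases' => [ 'be_taraskwiki' ], // stored as "extra data"
    ],
];

// Global aliases: the global ID, the domain name, and any extra aliases.
function getGlobalAliases( array $sites, string $globalSiteId ): array {
    $site = $sites[$globalSiteId];
    $aliases = array_merge( [ $globalSiteId, $site['domain'] ], $site['extraAliases'] );
    return array_values( array_unique( $aliases ) );
}

// Local aliases: all global aliases, plus the subdomain for members of the group.
function getLocalAliases( array $sites, string $groupId, string $globalSiteId ): array {
    $aliases = getGlobalAliases( $sites, $globalSiteId );
    if ( $sites[$globalSiteId]['group'] === $groupId ) {
        $aliases[] = explode( '.', $sites[$globalSiteId]['domain'] )[0];
    }
    return array_values( array_unique( $aliases ) );
}

// Return the global site ID for an alias, or null if nothing matches (assumption).
function resolveAlias( array $sites, string $alias, ?string $group = null ): ?string {
    foreach ( array_keys( $sites ) as $id ) {
        $id = (string)$id;
        $aliases = $group === null
            ? getGlobalAliases( $sites, $id )
            : getLocalAliases( $sites, $group, $id );
        if ( in_array( $alias, $aliases, true ) ) {
            return $id;
        }
    }
    return null;
}
```

With these records, `resolveAlias( $sites, 'en', 'wikipedia' )` finds `enwiki` through the subdomain, while the same call without a group finds nothing, because the shorthand is only unique within a group.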
**Notes**
This would be created in Wikibase, where it could be proven in practice, and later moved to MediaWiki core.
As noted above, regardless of which approach is taken, some core objects such as `Site` and `SiteList` might need modifications.
Data about the mappings needs to come from somewhere. Core currently stores most of this in a DB table, but that is painful. Perhaps this approach should just take this from config or a file.
This task is only about having a tested merged implementation of the service.
Usage of the service would be specified in separate tasks.
**Acceptance criteria**
[] A SiteIdMapper PHP service has been implemented