Currently, Wikibase accepts only one way to identify the site of a sitelink: its canonical database name (its ID in the `sites` table). This causes numerous problems:
- They don't always match the language code of the wiki, e.g. zh-classical.wikipedia.org has the canonical ID `zh_classicalwiki` (we expect people to know that the hyphen turns into an underscore)
- It gets even worse: one renamed wiki, be-tarask.wikipedia.org, still has the canonical site ID `be_x_oldwiki`. As a result, we don't recognize the wiki's actual language code, we force people to use the wiki's deadname, and we display the deadname as well.
- As a result, this is blocking further wiki renames that have been in the queue for years
- We accept only one way to enter a sitelink's site, although users could plausibly enter it in different ways (how that would work is not defined, and is out of scope for this task)
- We expose internals of our system (the database names of wikis) to users. This is not a transparency or security issue, but it makes for a poor user experience.
Changes intended:
* Site IDs containing hyphens are accepted and mapped to Site IDs with underscores for storage (and mapped back to hyphens for display)
* There will be configurable site ID aliases, to be used in favour of "canonical" site IDs
**Examples:**
- `zh-classicalwiki` is used to refer to the wiki whose canonical site ID is `zh_classicalwiki` (note `-` vs `_`)
- `be_taraskwiki` and `be-taraskwiki` are used to refer to the wiki whose canonical site ID is `be_x_oldwiki`
- `be-tarask` is the expected way of communicating the wiki ID of the "Belarusian Taraškievica" Wikipedia
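A minimal sketch of how input resolution could cover the examples above. It assumes aliases are configured as a plain `alias => canonical site ID` array (a hypothetical shape, not an existing Wikibase setting) and that hyphens are normalized to underscores before matching both canonical IDs and aliases. `resolveSiteId()` is a hypothetical helper.

```php
<?php

/**
 * Hypothetical helper: resolve a user-supplied site ID to the canonical
 * site ID used for storage, or null if it is unknown.
 *
 * @param string[] $canonicalIds canonical site IDs (as in the sites table)
 * @param array<string,string> $aliases assumed config shape: alias => canonical site ID
 */
function resolveSiteId( string $input, array $canonicalIds, array $aliases ): ?string {
	// Generic rule: hyphens are read as underscores, so
	// "zh-classicalwiki" and "zh_classicalwiki" are equivalent.
	$normalized = str_replace( '-', '_', $input );

	// Canonical IDs such as "be_x_oldwiki" are still accepted.
	if ( in_array( $normalized, $canonicalIds, true ) ) {
		return $normalized;
	}

	// Configured aliases are normalized the same way, so the alias
	// "be_taraskwiki" also matches the input "be-taraskwiki".
	foreach ( $aliases as $alias => $canonical ) {
		if ( str_replace( '-', '_', (string)$alias ) === $normalized ) {
			return $canonical;
		}
	}

	return null;
}

$canonicalIds = [ 'enwiki', 'zh_classicalwiki', 'zh_min_nanwiki', 'be_x_oldwiki' ];

// Default Wikibase configuration: no aliases.
$defaultAliases = [];
// Hypothetical WMF production configuration.
$wmfAliases = [ 'be_taraskwiki' => 'be_x_oldwiki' ];
```

With the default (empty) alias list, `zh-classicalwiki` and `zh-min-nanwiki` already resolve through the generic rule, while `be-taraskwiki` only resolves to `be_x_oldwiki` once the WMF alias is configured. Applying the same normalization to alias keys means one configured alias covers both spellings.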
**Acceptance criteria**
[] The configuration allows more aliases to be added for other site IDs
[] The canonical site IDs are still used for storage in JSON and in other indexes.
[] The configuration is documented in `options.md`
[] The default Wikibase configuration does not have any aliases
[] As a generic rule, any canonical site ID containing `_` (like `zh_min_nanwiki`) has an alias with the underscores replaced by hyphens (-> `zh-min-nanwiki`)
[] `zh-classical`, `zh-min-nan` and `be-tarask` are the IDs presented to the user in the sitelink editing UI
[] `zh-classical`, `zh-min-nan` and `be-tarask` are the IDs presented in the JSON output provided by Wikibase APIs (including Special:EntityData)
[] WMF production config is adjusted so that:
[] `be_taraskwiki` and `be-taraskwiki` are accepted as site identifiers when adding a sitelink to the "Belarusian Taraškievica" Wikipedia (canonical site ID `be_x_oldwiki`)
[] identifiers containing underscores, as well as `be_x_oldwiki`, are still accepted as site identifiers when adding a sitelink to the respective wiki
[] WMF production should work like this without any config change:
[] identifiers containing hyphens, e.g. `zh-classicalwiki`, are accepted as site identifiers and add a sitelink to the Wikipedia whose canonical site ID has underscores instead of hyphens, e.g. the "Chinese classical" Wikipedia (canonical site ID `zh_classicalwiki`), both in the UI and in the API
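The output side (canonical IDs in storage, display IDs in API/JSON output) can be sketched in the same spirit. This assumes the same hypothetical `alias => canonical` config shape, treats the first alias configured for a site as its preferred name, and uses hypothetical helpers `displaySiteId()` and `sitelinksForOutput()`. It yields site IDs such as `be-taraskwiki`; whether the UI additionally drops the `wiki` suffix (as in `be-tarask`) is not decided by this sketch.

```php
<?php

/**
 * Hypothetical helper: the site ID shown to users for a canonical site ID.
 * $aliases maps alias => canonical site ID (assumed config shape); the first
 * alias configured for a site is treated as its preferred display name.
 */
function displaySiteId( string $canonicalId, array $aliases ): string {
	$preferred = $canonicalId;
	foreach ( $aliases as $alias => $canonical ) {
		if ( $canonical === $canonicalId ) {
			$preferred = (string)$alias;
			break;
		}
	}
	// Generic rule: underscores are displayed as hyphens.
	return str_replace( '_', '-', $preferred );
}

/**
 * Hypothetical helper: rewrite stored sitelinks (keyed by canonical site ID)
 * for JSON output. The stored array itself is not modified.
 */
function sitelinksForOutput( array $storedSitelinks, array $aliases ): array {
	$output = [];
	foreach ( $storedSitelinks as $canonicalId => $sitelink ) {
		$displayId = displaySiteId( (string)$canonicalId, $aliases );
		// Replace the "site" field too; "title" and "badges" are kept as stored.
		$output[$displayId] = array_merge( $sitelink, [ 'site' => $displayId ] );
	}
	return $output;
}

// Sitelinks as stored, keyed by canonical site ID.
$stored = [
	'zh_classicalwiki' => [ 'site' => 'zh_classicalwiki', 'title' => 'Example', 'badges' => [] ],
	'be_x_oldwiki' => [ 'site' => 'be_x_oldwiki', 'title' => 'Example', 'badges' => [] ],
];
// Hypothetical WMF-style alias configuration.
$aliases = [ 'be_taraskwiki' => 'be_x_oldwiki' ];

$json = json_encode( [ 'sitelinks' => sitelinksForOutput( $stored, $aliases ) ] );
```

In the encoded output, the `be_x_oldwiki` sitelink appears under the key `be-taraskwiki` with `"site"` set to `be-taraskwiki`, while `$stored` keeps the canonical keys. Rewriting only at the output boundary keeps JSON storage and other indexes on canonical IDs.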
----
**Original bug:** (includes a description of a possible approach, which is not expected to be followed)
> Wikibase (and MediaWiki) need a more flexible way to handle site ids. In particular:
>
> For API input (for wbaddsitelink, etc.), several aliases should be supported per wiki.
> * In addition to the global ID, at least the domain should be usable as a wiki id
> * it should be possible to define additional aliases for input, for use when wikis get renamed, as was recently the case for be-x-old -> be-tarask.
>
> For manual input in the UI, at least the above aliases should be supported
> * in addition, per-group IDs/Aliases should be supported (e.g. "en" means "enwiki" in context of the "wikipedia" group)
> * these aliases should be provided to the UI by the SitesModule
>
> For output, two "labels" should be available:
> ** a long, globally unique label, which would also work as input to the UI widget and API module. The full domain name of the wiki should do.
> ** a per-group shorthand, which would also work as input to the UI widget. This would usually be the language code, e.g. "en" for en.wikipedia.org
>
> ---------
> To achieve the above, we need a service (or several services) that provide the following functions:
>
> ```lang=php
> getGlobalAliases( $globalSiteId ): string[] // all globally unique aliases for $globalSiteId
> getLocalAliases( $groupId, $globalSiteId ): string[] // all aliases unique within the given group (including the global ones)
> getGlobalName( $globalSiteId ): string // the preferred name that is also a globally unique alias
> getLocalName( $groupId, $globalSiteId ): string // the preferred name that is also an alias unique in the given group
>
> getAllGlobalAliases(): string[][] // map siteId -> list of globally unique aliases
> getAllLocalAliases( $groupId ): string[][] // map siteId -> list of all locally unique aliases for members of the given group
>
> resolveAlias( $alias, $groupId = null ): string // return the global site ID for the given alias. Local aliases are supported if $groupId is given.
> ```
>
> These functions would probably be implemented on top of a SiteList. SiteList and Site may have to be extended to provide access to additional information. The schema of the sites table should be flexible enough to accommodate all we need. The information in the SiteList can be mapped as follows:
>
> * the global ID is used as the primary identifier, as well as a global alias (and thus also a local alias).
> * all "local ids" (navigation ids, interwiki ids) would also count as global ids. //Note the different meaning of "local" in this context//
> * a site's domain name would act as a global id, as well as the "global label"
> * a site's subdomain would act as a local id, as well as the "local label" (alternatively, we could use the language code)
> * additional aliases can be stored as "extra data"
> * the site's global and local label can be overwritten by "extra data"
>
> **Notes**
> This would be created in Wikibase, where it could be proven, and later pushed to MediaWiki core.
> As noted above regardless of which approach is taken some core objects such as `Site` and `SiteList` might need modifications.
> Data about the mappings needs to come from somewhere. Core currently stores most of this in a DB table, but that is painful. Perhaps this approach should just take this from config or a file.
>
> This task is only about having a tested merged implementation of the service.
> Usage of the service would be specified in separate tasks.
>
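
A minimal sketch of the service proposed in the original bug, following its mapping (global ID, local IDs, domain and extra aliases as global aliases; subdomain as a local alias and local name; domain as global name). It assumes a hypothetical in-memory `SiteInfo` record in place of MediaWiki's `Site`/`SiteList`, and `SiteAliasService` is likewise a hypothetical name. Two deviations are deliberate: `resolveAlias()` returns `null` for unknown aliases instead of always returning a string, and label overrides via "extra data" are omitted.

```php
<?php

// Hypothetical stand-in for the data a MediaWiki Site object would carry.
final class SiteInfo {
	public function __construct(
		public string $globalId,
		public string $group,
		public string $domain,
		public string $subdomain,
		public array $localIds = [],    // navigation and interwiki IDs
		public array $extraAliases = [] // additional aliases kept as "extra data"
	) {
	}
}

// Hypothetical implementation of the proposed functions over a list of sites.
final class SiteAliasService {
	/** @var SiteInfo[] keyed by global site ID */
	private array $sites = [];

	/** @param SiteInfo[] $sites */
	public function __construct( array $sites ) {
		foreach ( $sites as $site ) {
			$this->sites[$site->globalId] = $site;
		}
	}

	// Global ID, local IDs, domain and extra aliases all count as global aliases.
	public function getGlobalAliases( string $globalSiteId ): array {
		$site = $this->site( $globalSiteId );
		return array_values( array_unique( array_merge(
			[ $site->globalId ], $site->localIds, [ $site->domain ], $site->extraAliases
		) ) );
	}

	// Global aliases, plus the subdomain if the site belongs to $groupId.
	public function getLocalAliases( string $groupId, string $globalSiteId ): array {
		$aliases = $this->getGlobalAliases( $globalSiteId );
		if ( $this->site( $globalSiteId )->group === $groupId ) {
			$aliases[] = $this->site( $globalSiteId )->subdomain;
		}
		return array_values( array_unique( $aliases ) );
	}

	public function getGlobalName( string $globalSiteId ): string {
		return $this->site( $globalSiteId )->domain;
	}

	// The subdomain within the site's own group, otherwise the domain.
	public function getLocalName( string $groupId, string $globalSiteId ): string {
		$site = $this->site( $globalSiteId );
		return $site->group === $groupId ? $site->subdomain : $site->domain;
	}

	public function getAllGlobalAliases(): array {
		$ids = array_keys( $this->sites );
		return array_combine( $ids, array_map( [ $this, 'getGlobalAliases' ], $ids ) );
	}

	public function getAllLocalAliases( string $groupId ): array {
		$all = [];
		foreach ( $this->sites as $id => $site ) {
			if ( $site->group === $groupId ) {
				$all[$id] = $this->getLocalAliases( $groupId, $id );
			}
		}
		return $all;
	}

	// Returns null (rather than a string) when the alias is unknown.
	public function resolveAlias( string $alias, ?string $groupId = null ): ?string {
		foreach ( array_keys( $this->sites ) as $id ) {
			$aliases = $groupId === null
				? $this->getGlobalAliases( $id )
				: $this->getLocalAliases( $groupId, $id );
			if ( in_array( $alias, $aliases, true ) ) {
				return $id;
			}
		}
		return null;
	}

	private function site( string $globalSiteId ): SiteInfo {
		if ( !isset( $this->sites[$globalSiteId] ) ) {
			throw new InvalidArgumentException( "Unknown global site ID: $globalSiteId" );
		}
		return $this->sites[$globalSiteId];
	}
}

$service = new SiteAliasService( [
	new SiteInfo( 'enwiki', 'wikipedia', 'en.wikipedia.org', 'en' ),
	new SiteInfo( 'be_x_oldwiki', 'wikipedia', 'be-tarask.wikipedia.org', 'be-tarask', [], [ 'be_taraskwiki' ] ),
] );
```

With this data, `en` only resolves to `enwiki` when the `wikipedia` group is given, `en.wikipedia.org` and `be_taraskwiki` resolve globally, and the local name of `be_x_oldwiki` in the `wikipedia` group is `be-tarask`. Resolution here is a linear scan; a real implementation would likely precompute a reverse index.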