
Allow entering Wikidata sitelinks to sites using an alternative (alias) site ID (not matching the database name)
Closed, Declined · Public · 13 Estimated Story Points

Description

Currently, Wikibase accepts only one way of defining sitelinks. Historically, the site ID of a site (its ID in the sites table) matches its canonical database name.

It is not uncommon for WMF wikis to have their site ID changed.
Example: the "Belarussian Taraškievica" Wikipedia used to have the site ID be_x_old, which also corresponds to the wiki's canonical database name (be_x_oldwiki). At some point the wiki was renamed to be-tarask. be_x_old is still stored in the database as the "canonical site ID", but be-tarask is what is primarily used in the UI and expected in user input (UI and API).

Change requested: There will be configurable site ID aliases that are used in favour of the "canonical" site IDs.

Acceptance criteria:

  • A configuration option has been created that allows defining aliases for "canonical" site IDs
  • Configuration is documented in options.md
  • The canonical site ids are still used for storage in JSON and other indexes.
  • Alias site IDs may consist only of lower-case Latin letters (a-z) and hyphens
  • alias site IDs are allowed to be used in the input in the sitelink editing UI
  • alias site IDs are the IDs presented to the user in the sitelink editing UI
    • One of the aliases is configured as the "label" to be shown in the UI - especially for cases when there are multiple aliases per "canonical site ID"
  • alias site IDs are presented in the JSON output provided by Wikibase APIs (including Special:EntityData)
    • One of the aliases is configured as the "label" to be shown in the API output - especially for cases when there are multiple aliases per "canonical site ID"
    • The change in the JSON output should be configurable, so it can be enabled on demand, to allow WMDE to follow the Stable Interface Policy
  • alias site IDs are allowed to be used in the input to Wikibase APIs (eg. wbeditentity)
  • Special:GoToLinkedPage, Special:ItemByTitle accept all alias Site IDs, as well as the canonical site ID
  • Default Wikibase configuration does not have any aliases
  • WMF production config is adjusted so that
    • be-taraskwiki is accepted as a site identifier when adding a sitelink to "Belarussian Taraškievica" Wikipedia (canonical site ID be_x_oldwiki)
    • "old" site IDs such as be_x_oldwiki are still accepted as site identifiers when adding a sitelink to the respective wiki
    • be-taraskwiki is displayed in the sitelink editing UI when adding a sitelink to "Belarussian Taraškievica" Wikipedia
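
The acceptance criteria above could be sketched roughly as follows. This is a minimal illustration in Python, not Wikibase code: the `AliasEntry` and `SiteIdAliases` names, the method names, and the configuration shape (a canonical site ID mapped to its aliases plus one alias chosen as the "label") are all assumptions made for the example. The sketch does not check whether a site ID exists in the sites table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AliasEntry:
    """Hypothetical configuration entry for one canonical site ID."""
    aliases: tuple[str, ...]
    label: str  # the alias presented in the UI and (optionally) API output


class SiteIdAliases:
    """Maps user input to canonical site IDs and canonical IDs to a display ID."""

    def __init__(self, config: dict[str, AliasEntry] | None = None):
        # With no configuration there are no aliases (the proposed default).
        self._config = dict(config or {})
        self._to_canonical: dict[str, str] = {}
        for canonical, entry in self._config.items():
            if entry.label not in entry.aliases:
                raise ValueError(
                    f"label {entry.label!r} is not an alias of {canonical!r}"
                )
            for alias in entry.aliases:
                self._to_canonical[alias] = canonical

    def to_canonical(self, site_id: str) -> str:
        """Accept an alias or a canonical ID; storage always uses the canonical ID."""
        return self._to_canonical.get(site_id, site_id)

    def to_display(self, canonical: str, use_alias: bool = True) -> str:
        """Return the ID to present; use_alias=False leaves the output unchanged."""
        entry = self._config.get(canonical)
        if entry is None or not use_alias:
            return canonical
        return entry.label


# Usage with the example from this task (assumed configuration values).
aliases = SiteIdAliases(
    {"be_x_oldwiki": AliasEntry(aliases=("be-taraskwiki",), label="be-taraskwiki")}
)
stored_id = aliases.to_canonical("be-taraskwiki")   # alias accepted as input
legacy_id = aliases.to_canonical("be_x_oldwiki")    # "old" ID still accepted
shown_id = aliases.to_display("be_x_oldwiki")       # alias shown to the user
```

The `use_alias` flag stands in for the requirement that the change to the JSON output be switchable on demand; how such a switch would actually be exposed in Wikibase settings is not specified in this task.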

Originally part of T114772

Event Timeline

Restricted Application added a subscriber: Aklapper.

@Lydia_Pintscher @Addshore @Ladsgroup please find an open question about the undefined behaviour in the task description (marked as TODO): What should be the displayed site ID if there are multiple aliases defined for a single "canonical site ID"?

I'm confused by the need to create a new task (as far as I can see this is for the same thing / the title is the same).
Now all of the tasks that reference that last task point to a closed ticket instead of an open ticket, and there is one extra layer of indirection to find what people are looking for.
And the subscribers that are interested are now gone etc..
We could merge the other one into this one as a duplicate? and move over the parent tasks? But I think we may as well just continue using the old ticket then?


Regarding the TODOs

alias site IDs are the IDs presented to the user in the sitelink editing UI - TODO: which one if there are multiple aliases per "canonical site ID"?
alias site IDs are presented in the JSON output provided by Wikibase APIs (including Special:EntityData) - TODO: which one if there are multiple aliases per "canonical site ID"?

Looking at the task description of T114772 though the detail seems to be missing in this new ticket description.

zh-classical, zh-min-nan and be-tarask are the IDs presented to the user in the sitelink editing UI
zh-classical, zh-min-nan and be-tarask are the IDs presented in the JSON output provided by Wikibase APIs (including Special:EntityData)

This won't be hardcoded so I gather that a siteid for presentation is also needed in the configuration.
Though that should also take into account this AC.

Any canonical site id that has _ in them (like zh_min_nanwiki) has an alias of the underline replaced with dash (-> zh-min-nan) as a generic rule.

So even with 0 configuration, if a siteid has an _, the presentation site id should have -s?

Any canonical site id that has _ in them (like zh_min_nanwiki) has an alias of the underline replaced with dash (-> zh-min-nan) as a generic rule.

So even with 0 configuration, if a siteid has an _, the presentation site id should have -s?

That has been my understanding of what T114772 was describing. In particular, it highlighted that the underscore-to-hyphen conversion (and the other way round) should be a general behaviour; listing all WMF wikis where it applies as special cases was not considered.
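
A minimal sketch of that generic rule, in Python and under stated assumptions: the function names are hypothetical, and the sketch keeps the "wiki" suffix (the example above writes the result as zh-min-nan, and whether the suffix is dropped is not settled in this thread).

```python
from __future__ import annotations


def generic_alias(canonical_site_id: str) -> str | None:
    """Return the underscore-to-hyphen alias, or None if the ID has no underscore."""
    if "_" not in canonical_site_id:
        return None
    return canonical_site_id.replace("_", "-")


def resolve_generic(site_id: str, known_canonical_ids: set[str]) -> str | None:
    """Map input to a canonical ID using only the generic rule (no configuration)."""
    if site_id in known_canonical_ids:
        return site_id
    # Reverse direction: try hyphens replaced by underscores.
    candidate = site_id.replace("-", "_")
    if candidate in known_canonical_ids:
        return candidate
    return None


known = {"zh_min_nanwiki", "be_x_oldwiki", "enwiki"}
zh_alias = generic_alias("zh_min_nanwiki")
zh_resolved = resolve_generic("zh-min-nanwiki", known)
be_resolved = resolve_generic("be-taraskwiki", known)
```

Under this rule alone, be-taraskwiki does not resolve to be_x_oldwiki (the reverse mapping yields be_taraskwiki), so a case like that would still need an explicitly configured alias.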

Regarding the TODOs

alias site IDs are the IDs presented to the user in the sitelink editing UI - TODO: which one if there are multiple aliases per "canonical site ID"?
alias site IDs are presented in the JSON output provided by Wikibase APIs (including Special:EntityData) - TODO: which one if there are multiple aliases per "canonical site ID"?

Looking at the task description of T114772 though the detail seems to be missing in this new ticket description.

zh-classical, zh-min-nan and be-tarask are the IDs presented to the user in the sitelink editing UI
zh-classical, zh-min-nan and be-tarask are the IDs presented in the JSON output provided by Wikibase APIs (including Special:EntityData)

This won't be hardcoded so I gather that a siteid for presentation is also needed in the configuration.
Though that should also take into account this AC.

I've noticed

a long, globally unique label, which would also work as input to the UI widget and API module. The full domain name of the wiki should do.

from the original description. Will be added to the description of this.

Note that when I was asking you, @Ladsgroup and @Lydia_Pintscher for the requirements, it didn't seem necessary to have the additional

a per-group shorthand, which would also work as input to the UI widget. This would usually be the language code, e.g. "en" for en.wikipedia.org

defined, so it is not included here. It might be that it is still needed and was just not stated.

WMDE-leszek set the point value for this task to 13. · Nov 17 2020, 2:29 PM

@Addshore @Ladsgroup @Lydia_Pintscher please note another undefined requirement:

TODO: specify what characters are allowed in the alias site ID (it seems that underscores are not wanted to avoid funny behaviour in combination with T267791; would this be small case Latin letters and hyphens only?)

thanks

@Addshore @Ladsgroup @Lydia_Pintscher please note another undefined requirement:

TODO: specify what characters are allowed in the alias site ID (it seems that underscores are not wanted to avoid funny behaviour in combination with T267791; would this be small case Latin letters and hyphens only?)

thanks

I don't have a strong opinion on this and would like to defer to @Addshore and @Ladsgroup.

I think it should be only lower-case Latin letters and hyphens (and underscores). Anything else makes this ticket bigger and can be tackled later.
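
As an illustration of that character rule, here is a small Python sketch. The function name and the `allow_underscores` flag are assumptions made for the example; the flag exists only because the task description says letters and hyphens, while this comment would also permit underscores.

```python
import re

# Lower-case Latin letters and hyphens only (as in the acceptance criteria).
_LETTERS_HYPHENS = re.compile(r"[a-z-]+")
# Variant that additionally permits underscores (as suggested in this comment).
_LETTERS_HYPHENS_UNDERSCORES = re.compile(r"[a-z_-]+")


def is_valid_alias(alias: str, allow_underscores: bool = False) -> bool:
    """Check that every character of the alias is in the permitted set."""
    pattern = _LETTERS_HYPHENS_UNDERSCORES if allow_underscores else _LETTERS_HYPHENS
    # fullmatch requires the whole string to match, so "" is rejected too.
    return pattern.fullmatch(alias) is not None


strict_ok = is_valid_alias("be-taraskwiki")
strict_underscore = is_valid_alias("be_x_oldwiki")
relaxed_underscore = is_valid_alias("be_x_oldwiki", allow_underscores=True)
```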

WMDE-leszek updated the task description.
WMDE-leszek updated the task description.

This task, and in particular the scenario that is being described (also in the acceptance criteria), seems to have drifted from the bug reported in T112426.
Closing this to circle back to the actual problems.