
Blockers for Wikimedia wiki domain renaming
Open, Needs Triage, Public

Description

The domains of several Wikimedia wikis should be renamed: T21986.

This was done once, when be-x-old.wikipedia.org was renamed to be-tarask.wikipedia.org, but this renaming exposed several issues. Performing any more renaming is not advised until these issues are resolved. This task tracks these issues.


Posted by @C933103 at Community Wishlist Survey 2019/Miscellaneous/Clear roadblock for wiki site URL changes

Related Objects

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here.

Status | Subtype | Assigned
Stalled | | None
Invalid | | None
Stalled | | None
Stalled | | None
Stalled | | None
Stalled | | None
Stalled | | None
Stalled | Feature | None
Stalled | | None
Stalled | Feature | None
Stalled | Feature | None
Stalled | Feature | None
Stalled | | None
Stalled | | None
Open | | None
Resolved | | Winston_Sung
Open | | None
Open | | None
Resolved | | None
Open | | Winston_Sung
Open | | None

Event Timeline


@Liuxinyu970226 I'm only opposing the renaming of no.wikipedia.org to nb.wikipedia.org. I have no strong opinion on what other projects decide to do.

Slightly off-topic: @jeblad claims " the nowiki community would probably accept a move if old URLs will work for some time. ". This subject has come up many times and no, there is no consensus for moving the project to another language code. (See the last round here https://no.wikipedia.org/wiki/Wikipedia:Avstemninger/Prefiks where the vote was 85 to keep no versus 39 to move to nb).

Remember that no.wikipedia.org covers the Norwegian language, and since 2005* it has covered two norms that account for some 90 % of the usage of Norwegian: bokmål, as normed by Språkrådet (https://en.wikipedia.org/wiki/Language_Council_of_Norway), and riksmål, the de facto standard most Norwegians follow, as normed by Det norske akademi (https://en.wikipedia.org/wiki/Norwegian_Academy_for_Language_and_Literature). We cover the Norwegian language at no.wikipedia.org, and with regard to nn.wikipedia.org, which left us in 2004 or so, we do not write nynorsk.

The language situation in Norway is complex and has been a big battleground for 100 years. We have a stable situation on the project and it's a pity that this has come up as an issue again. We should be writing articles and making no.wikipedia.org better, not handling unnecessary noise like this.

If someone wants to start a nb.wikipedia.org, I suppose that is up to the Foundation to accept. But no.wikipedia.org stays at no.wikipedia.org (unless the Foundation eventually makes a board case of it and forcibly moves us to nb, since they are the owners of the domains).

Remember there are a lot of reasons why we should not move:
https://meta.wikimedia.org/wiki/Talk:Requests_for_comment/Rename_no.wikipedia_to_nb.wikipedia#No_relocation_from_nowiki_to_nbwiki

@Nsaa To answer your list of common arguments against, I have worked hard on analysing ICANN, IETF, IEEE, ITU, ... resources over the past 8 days, and now (I removed your ref tags since they are shown as U+FFFD characters in my browser):

'External links' – nowiki has built up great value in the links that point to it. At present there are up to 1.3 million such links. How will we get external stakeholders to update the more than 1,300,000 links to us (as counted by Yahoo) that contain no.wikipedia.org?

Maybe this would be a bug regarding SEO.

'Brand' no.wikipedia.org – The no domain coincides with the country code no, and no has been in use since this wiki was created. Thus, it is advantageous to retain the familiar prefix no, that all Norwegian speakers have a relationship with, versus the totally unknown nb, which barely even the most educated language people know.

Again, we use ISO 639, not ISO 3166; are we discussing the SAME ISO standard? "it is advantageous to retain the familiar prefix no" So, again, are nowikibooks, nowikinews, nowikisource, and a potential nowikivoyage bokmål only?

'Visibility' – All links from articles on no.wikipedia.org will no longer count as much for Google's PageRank algorithm (one can assume) if you do not 'permanently' add correct redirect codes on the .com domain. The proposal allows for 'removing the 301 redirect' after 5 years.
'Visibility and value' – 'Britannica boss' Jorge Cauz says the following on Wikipedia: If I were to be the CEO of Google or the founders of Google I would be very [displeased] that the best search engine in the world continues to provide as a first link, Wikipedia,

By those reasons, you would still be opposing ALL Wiki-Setup (renaming) requests. And what is ".com" doing here? How is that important?

'Index and traffic figures'. A switch will 'reduce the main page's importance'. At present, its traffic is 57 694 hits, which amounts to 5.04 % of the traffic on nowiki. Traffic on nnwiki's index is 1 496 hits per day, which amounts to 1.37 % of the traffic on nn. Sources: Source nn (archive nn 2009-01-26) source en (archive en 2009-01-26)

This looks rather like a bug (hence ask SEO) than a good argument

'Uncertainty' – Likewise, we have no control over what other search engines and others who follow links do with this kind of redirect, and to what extent this has a negative effect.
'Uncertainty' – There is no agreement on what to do with the .no domain afterwards, so there is an inherent and great danger that external links will no longer point directly to our articles and main compartment.

Still the macrolanguage problem; I believe Estonian, Latvian and Lithuanian could also have such problems...

'Brand' – The no domain is established in the consciousness of the whole Norwegian-speaking population as the site of Wikipedia in bokmål and riksmål.

Still, you have still, still not shown which nowiki articles are de facto written in riksmål; you only provided two enwiki articles about orthography, which are worthless under any criteria for reviewing sources.

'Technical' – Technically it is a not insignificant job to implement (many bots must clean up incredible amounts, and developers must set things up properly on Wikipedia's servers – should we use these scarce resources on policy changes like this?).

I can't believe you couldn't find what @Krenair has said many times before: the only "hard" thing is renaming the databases, which is no longer a topic of this task but of T83609.

'Bias of bokmål / riksmål' – indirectly the proposal is an attempt at impartiality in distributing the languages written on no (bokmål and riksmål), a language used by the vast majority (a survey with a sample of over 4,000 people came out with the following: "7.5 % responded that they write only nynorsk, 5.5 % that they write about the same amount in both language variants, and 86.3 % that they write riksmål / bokmål", Only 7.5 % nynorsk (archived 2009-01-09). Bottom line: over 90 % of the writing population uses bokmål / riksmål, to put that in perspective). It would be quite discriminatory to destroy all the value already created on the no domain by a relocation, simply because someone pushes the semantic argument that the no ISO code also comprises the entire riksmål.

Now you claim that most nowiki articles can also be considered as riksmål, and bokmål is just riksmål, with those "resources".

'For all eternity .no domain is unusable for other purposes' – A permanent relocation with an en:301 redirect possibly avoids some of the problems with external links, but it will keep the no.wikipedia.org domain occupied forever.

But without renaming the domain, nb.wiki* can permanently be one of 404/405/504; how is a 301 a bigger problem than those three? Just count the number of error codes here.
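To make the 301 discussion concrete, here is a minimal sketch of what a permanent redirect from an old wiki domain to a new one amounts to. The table and the function name `redirect_for` are hypothetical; the only rename taken from this task is be-x-old to be-tarask, and nothing here reflects how Wikimedia's servers are actually configured.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rename table: old host -> new host. The single entry is the
# rename mentioned in the task description; it is used only as an example.
RENAMED_HOSTS = {
    "be-x-old.wikipedia.org": "be-tarask.wikipedia.org",
}


def redirect_for(url):
    """Return (301, new_url) if the URL's host was renamed, else None."""
    parts = urlsplit(url)
    new_host = RENAMED_HOSTS.get((parts.hostname or "").lower())
    if new_host is None:
        return None
    # Keep scheme, path, query and fragment; only the host changes.
    # (An explicit port in the old URL is dropped in this sketch.)
    new_url = urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )
    return 301, new_url


print(redirect_for("https://be-x-old.wikipedia.org/wiki/Foo?action=history"))
print(redirect_for("https://no.wikipedia.org/wiki/Oslo"))
```

Because path and query are preserved, an old external link keeps pointing at the same article for as long as the redirect is served.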

'ISO code no is more correct than nb for this wiki' – Officially, bokmål is normed by no:Språkrådet, while riksmål is normed by no:Det Norske Akademi for Sprog og Litteratur. The ISO description of nb says bokmål, and riksmål is not defined in it. no covers the Norwegian language, so both riksmål and bokmål fall under the no ISO code, but riksmål does not fall directly under the nb ISO code, at least not by name. Thus riksmål and moderate bokmål (since this is not used by the official Language Council, which sticks to radical forms) will be immeasurably disadvantaged [ http://www.ordnett.no/ordbok.html?search=forfordele&search_type=&publications=23 (in meaning 1)] by moving from no to nb.

By this claim, I could say that Persian is just Persian, not Dari, with no dialect problems. By the way, an ask.com discussion (I lost its full URL) says that at some point "no" could also cover a series of Sami languages, so if you'd love to keep no.wiki*, please include Sami in those no.* wikis rather than creating them on Incubator, okay?!

'Wikimedia should not not make a change, which entails serious consequences just to satisfy a small minority' – The move is not about equating two equal languages. We're not Wikipedia for Norway, but Wikipedia in Norwegian (bokmål, nynorsk, riksmål and others). Here we have no official language bias. 90% of the population uses bokmål/riksmål, and mainly the moderate form. Thus the argument that no occupies a domain which should also cover nynorsk is correct, but weak. There are many varieties of German, but dewiki just follows the new orthography (not Low German, Swiss German, Austrian German etc.). I think I see dewiki being accused of discriminating against them.

"not not" = just do it. And please be aware that Swiss German has dialects too, and alswiki contains 4 of them. By the same claim, eswiki would face problems with the dialects of es in many countries (still, which dialect/orthography are the Spanish sites following?).

'Definition Power' – The impact of vertical search engines: the 'relative position' of nn will be improved (ergo people will choose nn articles instead of nb articles to a greater extent). Thus nn acquires more definition power (is it called no:Vedavågen or Veavågen, etc.)

still, the SEO bugs

(added after poll) 'External links on paper'. No one knows how many (permanent) links are currently in print and will, after the proposed five-year period, no longer point to that content.
(added after poll) 'External links in papers'. After a move, (permanent) links will no longer be permanent. After five years they will probably not point to the correct content. This is unfortunate from an academic perspective (it hampers reference checking).

copied from my T172035#3613648 above

Regarding fixing codes in paper materials, I would suggest getting help from some (mainly) Chinese-made products: "Correction Tapes (修正带)" or "Correction Fluid (涂改液)"

Or, as deryck said above, this doesn't concern the vast majority of requests, because you can just re-print them after the domain renaming; it only wastes a little ink in the ink cartridge or selenium in the toner cartridge.

(added after poll, 2011-08-14) 'Ownership' of no.wikipedia.org has been gained by everyone who has contributed to the articles that currently exist on no.wikipedia.org. Each user has mixed their work into these articles, which in practice means that every single user would need to be asked and to accept giving up this acquired right. Presumably, only a Board decision in the Wikimedia Foundation can move the project, since it is they who formally own the domain.

@Mdennis-WMF is this really so? Is there a hold-up from the WMF Board?

(added after poll, 2011-08-14) 'W3C' strongly recommend that you do 'NOT' modify URLs.

W3C also suggests letting browsers support audio track selection (likely video too), MPEG-4 ASP, H.265 (surveillance-controller?)... but must all of those be supported by browsers on today's market?
On the other hand, IETF RFC 1035 implicitly suggests renaming within a necessary period.

(added after poll, 2011-08-14) Adding riksmål under bokmål is very wrong when bokmål covers almost the entire Norwegian written language, even the many nynorsk forms. Riksmål has a unique hundred-year history with many of the leading cultural bearers of the Norwegian language. To illustrate how remote bokmål may be from riksmål we can take this example

You said many times that nowiki articles are riksmål, but now you contradict YOURSELF, psst.

Please continue discussions about the merits of renaming particular
languages' / language groups' sites on their respective tasks. This thread
is about the issues blocking the actual act of renaming.


Well, I'm just pointing out some common advantages of renaming domains; nothing above is specific to one wiki.

Is it possible to do the renaming task for those wikis as of current status first and then deal with whatever bugs appear after the renaming is done? As mentioned by others, it's been almost a decade since the issue was submitted, and the longer it drags on, the more legacy issues there will be to deal with (CX didn't even exist back in the day). Things like CX would be broken but those seem to be less important.

Edit: Alternatively, how about closing a project into incubator and then reopen it with the desired language code immediately?

@C933103:

Is it possible to do the renaming task for those wikis as of current status first

If that is possible, then we don't need this task; we can just rename those wikis, even if that results in a large number of database conflicts.

and then deal with whatever bugs that would appear after the renaming is to be done?

(the main topic of this task)

As mentioned by others, it've been almost a decade since the issue was raised

Those "raising" actions are illegal, please see Bug management/Phabricator etiquette, especially:
Report status and priority fields summarize and reflect reality and do not cause it. Read about the meaning of the Priority field values and, when in doubt, do not change them, but add a comment suggesting the change and convincing reasons for it.

and there will be more and more legacy issue need to deal with the longer it drags on (CX didn't even exists back in the day).

A full list of those issues needs to be investigated.

Things like CX would be broken but those seem to be less important.

So if we don't work on those tasks before e.g. 2050, then which "broken" is much more inappropriate to you?

Edit: Alternatively, how about closing a project into incubator and then reopen it with the desired language code immediately?

I was suggesting that as an alternative way of doing T25216, but @Verdy_p doesn't agree with it, and still suggests doing something with CNAME (which is also stuck per T133548).

T133548 is about certificates for HTTPS. If we create a CNAME from one project subdomain to another subdomain, it should have no impact if the certificate allows not just specific subdomains but everything under its parent domain (e.g. wikipedia.org): there are hundreds of subdomains and having one certificate for each one is costly.
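The wildcard argument can be sketched as a name check. This is a minimal illustration, assuming the usual rule that a leading "*." in a certificate name matches exactly one DNS label; the function `cert_covers` is hypothetical and says nothing about which names Wikimedia's certificates actually contain.

```python
def cert_covers(hostname, cert_names):
    """Return True if any certificate name matches the hostname.

    A leading "*" label matches exactly one label, so "*.wikipedia.org"
    covers "nb.wikipedia.org" but neither "wikipedia.org" itself nor
    "a.b.wikipedia.org".
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for name in cert_names:
        name_labels = name.lower().rstrip(".").split(".")
        if len(name_labels) != len(host_labels):
            continue
        if name_labels[0] == "*":
            # Compare everything after the wildcard label.
            if name_labels[1:] == host_labels[1:]:
                return True
        elif name_labels == host_labels:
            return True
    return False


# With a wildcard, both the old and the new subdomain are covered.
print(cert_covers("no.wikipedia.org", ["*.wikipedia.org"]))
print(cert_covers("nb.wikipedia.org", ["*.wikipedia.org"]))
# With per-subdomain names, the new name needs its own entry.
print(cert_covers("nb.wikipedia.org", ["no.wikipedia.org"]))
```

Under that assumption, an alias between two subdomains of the same parent needs no new certificate, which is the point made above.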

But maybe there are reasons I don't know of why you don't want to use "wildcards" for allowing all subdomains (e.g. you have subdomains actually delegated to and managed by third parties with their own servers and administrators, but in my opinion those should not be within *.wikipedia.org but only in *.wikimedia.org; or you want to create distinct subsets of authoritative DNS for several languages for managing the deployments in your server farms; or for legal reasons, if some wikipedias follow different copyright policies and need separate trust relationships with separate certificates; or for international issues with some countries that want to block some subdomains).

But maybe you could then just renew each certificate, including in it both the old CNAME'd subdomain and the new one. Finally, you may not want CNAMEs if they cause issues in your front caching proxies (duplicate cache storage for what is actually the same page from two distinct subdomains).

Another possible reason is that your web servers are configured not to accept different "Host:" headers in HTTP/HTTPS requests, as your servers are actually used to host multiple virtual web servers and need the header to know which site to render.
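A minimal sketch of that "Host:" concern, assuming a server that picks the wiki from the request's Host header. The tables, the wiki identifier and the function `resolve_wiki` are all illustrative assumptions, not Wikimedia's actual configuration.

```python
# Hypothetical virtual-host table: canonical host -> internal wiki id.
# The id below is an assumption used only for illustration.
WIKIS = {
    "be-tarask.wikipedia.org": "be_x_oldwiki",
}

# Hypothetical alias table: old host -> canonical host. A DNS CNAME alone
# is not enough; the server must also accept the old name in "Host:".
HOST_ALIASES = {
    "be-x-old.wikipedia.org": "be-tarask.wikipedia.org",
}


def resolve_wiki(host_header):
    """Map a Host header value to a wiki id, or None if unknown."""
    # Drop an optional ":port" suffix (IPv6 literals are not handled here).
    host = host_header.split(":", 1)[0].strip().lower()
    host = HOST_ALIASES.get(host, host)
    return WIKIS.get(host)


print(resolve_wiki("be-x-old.wikipedia.org"))
print(resolve_wiki("be-tarask.wikipedia.org:443"))
print(resolve_wiki("unknown.example"))
```

Without the alias entry, a request arriving through the CNAME would carry the old host name and fall through to "unknown", which is one way an alias can fail even when DNS resolves correctly.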

The advantage of CNAME is that it does not force clients to perform successive requests to follow a redirect (by HTTP result code at best, or by JavaScript at worst). How were subdomains renamed and aliased in the past?

I'm just curious why then T133548 is blocking also the CNAME aliasing.

As mentioned by others, it've been almost a decade since the issue was raised

Those "raising" actions are illegal, please see Bug management/Phabricator etiquette, especially:
Report status and priority fields summarize and reflect reality and do not cause it. Read about the meaning of the Priority field values and, when in doubt, do not change them, but add a comment suggesting the change and convincing reasons for it.

Sorry, a better term would be, "since the issue was submitted".

Things like CX would be broken but those seem to be less important.

So if we don't work on those tasks before e.g. 2050, then which "broken" is much more inappropriate to you?

(There is no workable MTL for yue/nan/etc. anyway, so CX serves little purpose there; it's probably only as important as any random widget, but I can't speak for others about which is more important.) The langlink query things seem to affect more, but their impact can only be determined by how much reliance on Wikidata keeps increasing. As for other potentially undiscovered problems, there is probably no way to know them without first exposing them?

@Verdy_p:

The advantage of CNAME is that it does not force clients to perform successive requests to follow a redirect (by HTTP result code at best, or by Javascript at worst): How were subdomains renamed and aliased in the past ?

But it also has a disadvantage: at least, that method will still block the creation of Narom contents under the formal Wp/nrm/*, so we would have to "set up a temporary code", which I, at least, don't want to see. (Please, believe me: until we have time to do a FULL export and import of db dumps from the nrmwiki dbname to the nrfwiki dbname ON THE ACTUAL machine of s3, we don't yet know how to unlock it, as we don't have a whitelist mechanism in WikimediaIncubator.class.php;c34aeead7cadeedf906247512ea76ec6eacba73a$386.)

Narom is another unrelated problem: "nrm" was an incorrect code for Norman and no aliasing between "nrm" and "nrf" should be kept if we want to have Narom contents.
So it's impossible to preserve the links to Norman contents except transitionally (but this must have an end).
Of course "nrm" must be completely freed to leave space to Narom (however we've still not seen for now any interested community in creating contents for Narom)
And subdomains are not really a blocker for creating contents in Incubator (where the language would only be a path).

All past contents on Incubator for Norman (previously using "nrm") can very easily be moved to "nrf". The issue is elsewhere: in templates still assuming "nrm" means Norman, in interwiki links, in Wikidata, and in Translatewiki.net: these must all be changed to free the incorrect "nrm" code they use or assume. And this work cannot be done only by Norman contributors (who don't care about Narom) or by Narom contributors (who don't care about Norman): it has to be done administratively (after approval by the language committee) and taken up by contributors in any Wikimedia projects working proactively on this cleanup.

As long as we don't do that, the only way to create Narom contents in Incubator (or later for a localized wiki) will be to use a private use code (such as "x-narom"), meaning that Narom will not be considered a plain language equal to others even if it has equal treatment in ISO 639-3. Such private use code may be acceptable in Incubator (Wp/x-narom, but there may be issues with Incubator templates to recognize it as a valid language code), but certainly not for a localized wiki project and also not for preparing the core UI translations needed in Translatewiki.net that will be needed first before a Wp project is created (and it would not avoid the later needed renaming to use the standard code).

Note: there's an alternative: ISO 639 approved the "nrm" code for Narom but as far as I know it has never been really used. Narom could be allocated another distinctive ISO 639-3 code by ISO, which could then deprecate "nrm" (this has already occurred for other languages). If "nrm" is deprecated, then it could remain associated with Norman in Wikimedia (note that "nrf" is really a language code assigned for Continental Norman, and not for the 3 major variants for France, Jersey and Guernsey; Jersey may also request its own distinctive code for its official language. In that case "nrm" would be a legacy macrolanguage, but deprecated, and another standard code could be assigned to the new macrolanguage, leaving "nrm" deprecated and used privately by Wikimedia).

However Wikimedia should not pressure the ISO working group like this: it has to do its own local job of cleaning up the situation. For now the use of "nrm" as a subdomain for Norman is conforming (subdomains are not restricted to be language codes), just like the Wikimedia interwiki prefixes. But we know that these Wikimedia interwiki prefixes and domain name labels are not always language codes (e.g. "simple" would then be "en-x-simple" if Wikimedia really conformed to BCP47 standard for these interwiki prefixes and domain names for its own localization purposes).

@Verdy_p

Narom is another unrelated problem

It is the same problem: you are still suggesting a "temporary deprecation", which won't help that Incubator link either.

All past contents on Incubator for Norman (previously using "nrm") can very easily moved to "nrf"... in Wikidata, and in Translatewiki.net: these must all be changed to free the incorrect "nrm" code they use or assume.

So leave the SIL page as just "Guernésiais"+"Jèrriais", without "Norman"? What is the benefit of playing a double-standard game?

meaning that Narom will not be considered a plain language equal to others

Then why don't you consider creating separated wikis for Guernésiais as nrf-gg.wikipedia.org and Jèrriais as nrf-je.wikipedia.org? Still, you're "double standard"ing

ISO 639 approved the "nrm" code for Narom but as far as I know it has never been really used.

Then which language is https://incubator.wikimedia.org/wiki/Wt/nrm/fa%CA%94 using?

Narom could be allocated by ISO another distinctive ISO 639-3 code and could then deprecate "nrm" (this has already occurred for other languages).

Changing the ISO codes themselves is simply impossible. If it were possible, then we could ask for Wawa (www) to be re-assigned to e.g. wwe, so that we would have no conflict between the different potential usages of "www" (but then you would have to distinguish between World Wrestling Entertainment and Wawa).

If "nrm" is deprecated, then it could remain associated to Norman in Wikimedia (note that "nrf" is really a language code assigned for Continental Norman, and not the 3 major variants for France, Jersey, Guernsey, and Jersey may also request its own distinctive code for its official language; in that case "nrm" would be a legacy macrolanguage, but deprecated, and another standard code could be assigned to the new macrolanguage, leaving "nrm" deprecated and used privately by Wikimedia).

Jèrriais - Language Status: 8a (Moribund) < Narom - Language Status: 6b (Threatened). Still a good comment?

However Wikimedia should not pressure the ISO wroking group like this

[self-published source?]

subdomains are not restricted to be language codes

[clarification needed] [examples needed] [according to whom?] [by whom?] [dubious] [neutrality is disputed] [vague] and [needs update]

But we know that these Wikimedia interwiki prefixes and domain name labels are not always language codes

[when defined as?] [weasel words] and [not in citation given]

"simple" would then be "en-x-simple" if Wikimedia really conformed to BCP47 standard for these interwiki prefixes and domain names for its own localization purposes

No need to add "-x", since "en-simple" already conforms to your standard.

I said "deprecated" about the incorrect use of the code by Wikimedia; it
was NOT saying anything about the status of the languages. You're clearly
misreading (or don't want to read correctly).

subdomains are not restricted to be language codes

" *[clarification needed] [examples needed] [according to whom?] [by
whom?] [dubious]
[neutrality is disputed] [vague] and [needs update]*"

Your comment is even more dubious, with completely unnecessary
questions. There's absolutely no dispute about my statement except from
you. You seem not to know at all what a subdomain is (initially they were
NEVER created to be used for language/locale codes; they are just a
hierarchy of labels assigned by separate authorities managing their own
local zone with their own local rules).

They're just labels conforming to the DNS system but can be quite arbitrary
(depending only on registry rules for specific zones). Within each
domain in a zone, each domain owner does what he wants, and so does the
Wikimedia Foundation with its own domains. There was not even any
requirement to use subdomains to create separate wikis; these could as
well have been paths (or parameters in a query string, but those are more
complex to manage for users). Even Wikimedia Incubator does not need a
subdomain: it uses paths instead (Incubator has enabled wiki "subpages" to
do that at the start of the path, and Meta and Commons also do that for
pages created with the translation tool). So where you put the code in a
URL is not relevant at all.
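To illustrate that the code can sit either in the subdomain or in the path, here is a minimal sketch that extracts it from both URL shapes. The function name is hypothetical, and the Incubator path layout ("/wiki/<project prefix>/<code>/<page>") is inferred only from the Incubator URL quoted earlier in this thread.

```python
from urllib.parse import unquote, urlsplit


def language_code_from_url(url):
    """Return the code carried by a wiki URL, or None if none is found.

    On Incubator the code is a path segment; elsewhere this sketch simply
    takes the first DNS label, which is an assumption and, as argued
    above, is not guaranteed to be a language code at all.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "incubator.wikimedia.org":
        # e.g. /wiki/Wt/nrm/... -> ["", "wiki", "Wt", "nrm", ...]
        segments = unquote(parts.path).split("/")
        if len(segments) >= 4 and segments[1] == "wiki":
            return segments[3]
        return None
    return host.split(".")[0] or None


print(language_code_from_url("https://incubator.wikimedia.org/wiki/Wt/nrm/fa%CA%94"))
print(language_code_from_url("https://no.wikipedia.org/wiki/Oslo"))
# A label such as "www" comes back too, although it is not a language code.
print(language_code_from_url("https://www.wikipedia.org/"))
```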

Even "www" is NOT restricted to mean "World Wide Web"; it's just a common
convention (and now many sites do not even use it or need it). Besides
"www.*" in their domain name, sites assign various subdomains for unrelated
things such as "secure.*", "ads.*", "pub.*", "admin.*", "store.*", "buy.*",
"tv.*", "video.*", "mail.*", "int.*" (for some "intranet"), "private.*",
"dns.*", "stats.*", "order.*", "dev.*", "auth.*", "login.*", "map.*",
"to.*" (for redirectors)... some ot them may look like language/locale
codes but are not.

The only relevant question is which language/locale code we must use and
how we can conform to the BCP 47 standard and interoperate with
other languages on all wikis to find documents written in a relevant
language. For now Wikimedia has not been able to define any standard, and
in fact the situation is more complex because we handle the case of
multiple languages (including translations hosted on specific
wikis having different default languages, pages that embed multilingual
contents, and pages with subpages edited separately for specific languages).

@Verdy_p then how do you regard the information on Ethnologue, which indicates that Jèrriais has a worse status than Narom?

@Liuxinyu970226 the current language "status" does not matter at all for this issue, for any language. Do you think that I would argue that supporting Narom is not needed in Wikimedia? Certainly not: I think that Narom should be supported. But which code to use for Jèrriais is still not clear (nrf?), even if it has an official status. But we really need to free the "nrm" code for Narom as it was standardized. And you're just confirming that need too. So really we don't disagree on this.

However I confirm that "en-simple" is not standard at all, please reread the BCP 47 standard! It would be standard if it had been officially registered as a language variant in the registry. It has been used in Wikimedia only as a private use by Wikimedia, because of Wikimedia users' demand for a specific Wikipedia edition for English learners. But the language is still standard English (a subset of it), as there's absolutely no delimitation line. Also this is not "my" standard like you said, but "the" BCP 47 standard (and not the Ethnologue catalogue, which is not a standard but a repository for some research data).

And you should avoid mixing in other concepts (notably your confusion between DNS and BCP 47): the decision to use language/locale codes to tag contents in Wikimedia wikis and to create specific linguistic editions of wiki projects using subdomains was a private decision of Wikimedia.

The same goes for the distinct decision of Wikimedia to link these editions using "interwiki" prefixes (there are other interwiki codes that are not used for that at all but link to other projects), and, as a consequence, for the private decision of Wikimedia to assign a specific role to the first occurrence of ":" in page names. That also causes problems with words of some languages like Swedish, and with some known article names (this created debates about "c:" and forced some articles to be renamed using alternate characters instead of ":"). Interwiki prefixes can be tricky, so their detection had to be limited in scope, but the detection turned out to be unstable as it was never really formalized. Interwikis also have a few other rules: they are fully case-insensitive, while the rest of an article name is case-insensitive only on its "first" letter (after the interwiki prefix), and only on some wikis.
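The prefix detection described here can be sketched as follows. The prefix set and the function `split_interwiki` are hypothetical; a real wiki loads its prefixes from its own interwiki configuration, and this is not MediaWiki's actual title parser.

```python
# Hypothetical set of known interwiki prefixes, for illustration only.
KNOWN_PREFIXES = {"en", "nb", "nn", "no", "simple", "c"}


def split_interwiki(title):
    """Split "prefix:rest" on the first colon if the prefix is known.

    Prefixes are matched case-insensitively. An unknown prefix leaves the
    title untouched, so ordinary titles containing ":" still work.
    """
    prefix, sep, rest = title.partition(":")
    key = prefix.strip().lower()
    if sep and key in KNOWN_PREFIXES:
        return key, rest
    return None, title


print(split_interwiki("NN:Oslo"))
print(split_interwiki("Foo: bar"))
# A short prefix such as "c" captures any title that starts with "C:".
print(split_interwiki("C:Drive"))
```

The last call shows the kind of collision mentioned above: once "c" is a known prefix, an article whose name begins with "C:" can no longer be addressed directly.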

All these inconsistencies are purely private to Wikimedia wikis. They're not part of any standard, and should not contaminate other sites or data, because too many people think that Wikimedia is a common standard or perfectly follows standards (which is obviously wrong).

However I confirm that "en-simple" is not standard at all, please reread the BCP 47 standard! It would be standard if it had been officially registered as a language variant in the registry.

https://www.iana.org/assignments/language-subtag-registry contains:

%%
Type: variant
Subtag: simple
Description: Simplified form
Added: 2015-12-29
%%

Therefore simple has been already officially registered as a language variant in the registry and so en-simple is a BCP 47 conform language code.
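That check can be repeated mechanically. Below is a minimal sketch that parses "%%"-separated records like the one quoted above and looks up a variant subtag. The function names are hypothetical, and the parser deliberately ignores features of the full registry format such as folded continuation lines and repeated fields.

```python
# The record quoted above, embedded as sample input.
SAMPLE = """%%
Type: variant
Subtag: simple
Description: Simplified form
Added: 2015-12-29
%%"""


def parse_registry(text):
    """Parse "%%"-separated blocks of "Field: value" lines into dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "%%":
            if current:
                records.append(current)
            current = {}
        elif ":" in line:
            field, value = line.split(":", 1)
            # If a field repeats, the last value wins in this sketch.
            current[field.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def is_registered_variant(subtag, records):
    """Return True if the subtag appears as a record of type "variant"."""
    return any(
        r.get("Type") == "variant" and r.get("Subtag") == subtag
        for r in records
    )


records = parse_registry(SAMPLE)
print(records)
print(is_registered_variant("simple", records))
```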

Sorry, I should have checked whether it had been recently registered (so only the bare "simple" is non-conforming: it breaks BCP 47 resolvers by forcing us to implement a custom fallback to "en" instead of the standard BCP 47 fallback to "en"); various templates are already converting "simple" to "en-simple", but there may remain a few that use "en-x-simple" to be compliant with HTML/CSS/XML lang="*" attributes and with standard i18n libraries.
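The fallback difference can be shown with a minimal sketch of truncation-based lookup, in the style of the lookup scheme that accompanies BCP 47 (RFC 4647): subtags are removed from the end, and a single-character subtag such as "x" is removed together with the subtag that followed it. The function name is hypothetical.

```python
def lookup_fallback_chain(tag):
    """Return the tag followed by its progressively truncated forms."""
    subtags = tag.lower().split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags = subtags[:-1]
        # Also drop a trailing single-character subtag (for example "x"),
        # since it cannot end a tag on its own.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags = subtags[:-1]
    return chain


print(lookup_fallback_chain("en-simple"))
print(lookup_fallback_chain("en-x-simple"))
# The bare code never reaches "en" by truncation alone.
print(lookup_fallback_chain("simple"))
```

Both "en-simple" and "en-x-simple" fall back to "en" by plain truncation, while the bare "simple" has nothing to truncate, which is why it needs the custom fallback rule mentioned above.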

Note also that this initial bug was started many years ago, well before that BCP 47 registration, which is only two years old (and even before this thread existed in Phabricator, which imported all bugs from the former bug trackers).

Note that renaming a wiki or changing its internal database or domain name is not mandatory. All we need is to support the correct interwikis, and stop polluting Wikidata with fake language codes for its translations (Wikidata should use "en-simple" even for linking to Wikipedia, and it's perfectly possible to alias the domain name). Only a minor modification of known interwiki codes is needed so that it points to the correct domain name even if it's not changed.
Renaming databases is not absolutely necessary. Some pywiki bots will need to be updated to also know the new interwiki alias.
After years, and with other pending renames or creations, we should know now exactly where and how to centralize such maintenance for lists of language codes supported, fallbacks, updates to Translatewiki.net and its import bot.
And so we should also deprecate legacy codes we still use on Translatewiki.net (we should not pollute other non-wikimedia projects hosted there, even if there are now warnings on this site for such legacy private codes, notably on its language portals and their associated categories), as well as all Wikidata entries in properties that are NOT links to Wikipedia. We're reaching the point where the complete cleanup can be finished. For Wikidata it's an important goal to have it established as an important standard, as useful and powerful as CLDR.

@Verdy_p

All we need is to support the correct interwikis

Isn't that already done? *T23915*

and stop polluting Wikidata with fake language codes for its translations

If there are still labels, descriptions, aliases and monolingual language values that are using fake codes, let us know, but as far as I have heard, those are generally regularly fixed by Pasleim's bot.

Wikidata should use "en-simple" even for linking to Wikipedia

the third blocker T114772

Some pywiki bots will need to be updated to know also the new interwiki alias.

Two words: TOO HARD, see T113461. I'd love to say that unless and until, by some miracle, the RENAME DATABASE command comes back, this can't be done safely.

T133548 is about certificates for HTTPS. If we create a CNAME of a subdomain project to another subdomain, it should have no impact if the certificate allows not just specific subdomains but its parent domain (e.g. wikipedia.org): there are hundreds of subdomains and having one certificate for each one is costly.

Isn't https://en-wp.org facing exactly that problem? T190244

And so we should also deprecate legacy codes we still use on Translatewiki.net

Already done. Already done. Already done. T51898

we should not pollute other non-wikimedia projects hosted there, even if there are now warnings on this site for such legacy private codes, notably on its language portals and their associated categories

Yep, this is just what Incubator is currently doing.
Finally,

some ot them may look like language/locale codes but are not.

What's "ot" here? fire? grass? eight?...

Finally,

some ot them may look like language/locale codes but are not.

What's "ot" here? fire? grass? eight?...

Just an evident typo while composing the message on a smartphone. "some of them"...

Zabe subscribed.

T48141 and T168389 aren't blockers here since we only talk about domain renames where the db name is not being touched. (DBA decided a long time ago that db renames won't happen)

This is so freaking stupid.

The issues would have been long gone if in 2006 the old sites' data were copypasted to the new sites and then the old sites were shut down. Now the wikis have grown exponentially with millions more edits and pages.

Since we only talk about domain renames where the db name is not being touched. (DBA decided a long time ago that db renames won't happen)

Does that mean we need to implement another kind of mapping to replace the DB-name-directly mapping?

https://github.com/wikimedia/Wikibase/blob/8dbd84e/client/includes/Hooks/LangLinkHandler.php#L322-L344

T137537: Ensure correct information about Wikimedia sites in the Sites facility on the Wikimedia cluster.

client/includes/Hooks/LangLinkHandler.php#L322-L344
	/**
	 * Extracts the local interwiki code, which in case of the
	 * wikimedia site groups, is always the global id's prefix.
	 *
	 * @fixme put somewhere more sane and use site identifiers data,
	 * so that this works in non-wikimedia cases where the assumption
	 * is not true.
	 *
	 * @param Site $site
	 *
	 * @return string
	 */
	public function getInterwikiCodeFromSite( Site $site ) {
		// FIXME: We should use $site->getInterwikiIds, but the interwiki ids in
		// the sites table are wrong currently, see T137537.
		$id = $site->getGlobalId();
		$id = preg_replace( '/(wiki\w*|wiktionary)$/', '', $id );
		$id = strtr( $id, [ '_' => '-' ] );
		if ( !$id ) {
			$id = $site->getLanguageCode();
		}
		return $id;
	}
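
To illustrate why this function matters when only the domain is renamed, here is a minimal Python transliteration of its logic, not the Wikibase code itself. It assumes the global ID of the renamed wiki is still `be_x_oldwiki` (since the database was not renamed). The function names and the `CODE_OVERRIDES` table are hypothetical, and the override table is just one possible shape for "another kind of mapping".

```python
import re

# Hypothetical override table: global site ID -> interwiki code to use
# when the database name no longer matches the domain.
CODE_OVERRIDES = {
    "be_x_oldwiki": "be-tarask",
}


def interwiki_code_from_global_id(global_id, language_code):
    """Mirror of the PHP logic: strip the project suffix, turn '_' into '-'."""
    code = re.sub(r"(wiki\w*|wiktionary)$", "", global_id)
    code = code.replace("_", "-")
    # Fall back to the site's language code if nothing is left.
    return code or language_code


def interwiki_code_with_overrides(global_id, language_code):
    """Same derivation, but consult the explicit override table first."""
    override = CODE_OVERRIDES.get(global_id)
    if override:
        return override
    return interwiki_code_from_global_id(global_id, language_code)
```

Under these assumptions, the plain derivation turns `be_x_oldwiki` into the deprecated `be-x-old`, while the variant with overrides yields `be-tarask`. Wikis whose database name still matches their domain are unaffected.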

Change 882172 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/extensions/ContentTranslation@master] Remove subdomain-migrated language code "be-tarask" / "be-x-old" from DomainCodeMapping

https://gerrit.wikimedia.org/r/882172

Change 876295 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/extensions/Wikibase@master] Fix Wikibase Client getInterwikiCodeFromSite for WMF wikis

https://gerrit.wikimedia.org/r/876295

Change 882173 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/services/cxserver@master] Remove subdomain-migrated language code "be-tarask" / "be-x-old" from language-domain-mapping

https://gerrit.wikimedia.org/r/882173

Change 879580 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/core@master] Return both non-deprecated language code and deprecated language code results for non-deprecated and deprecated language code queries for QueryLangLinks API

https://gerrit.wikimedia.org/r/879580

Change 882175 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/extensions/ExternalGuidance@master] Reverse key-value for subdomain-migrated language code "be-tarask" / "be-x-old" for DomainCodeMapping

https://gerrit.wikimedia.org/r/882175

Change 882173 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Reverse key-value for subdomain-migrated language code "be-tarask" / "be-x-old" from language-domain-mapping

https://gerrit.wikimedia.org/r/882173

Change 882172 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Reverse key-value for subdomain-migrated language code "be-tarask" / "be-x-old" for DomainCodeMapping

https://gerrit.wikimedia.org/r/882172

Change 882791 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-01-23-123356-production

https://gerrit.wikimedia.org/r/882791

Change 882175 merged by jenkins-bot:

[mediawiki/extensions/ExternalGuidance@master] Reverse key-value for subdomain-migrated language code "be-tarask" / "be-x-old" for ExternalGuidanceDomainCodeMapping

https://gerrit.wikimedia.org/r/882175

Change 884493 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/extensions/SiteMatrix@master] SiteMatrix: Show the actual (non-deprecated) language code for deprecated language codes

https://gerrit.wikimedia.org/r/884493

Change 884494 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[operations/mediawiki-config@master] SiteMatrix config: Use the actual (non-deprecated) language code for deprecated language codes

https://gerrit.wikimedia.org/r/884494

Change 884495 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/core@master] SiteConfiguration: Use the actual (non-deprecated) language code for deprecated language codes in siteFromDB

https://gerrit.wikimedia.org/r/884495

Change 882791 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-02-02-004918-production

https://gerrit.wikimedia.org/r/882791

Mentioned in SAL (#wikimedia-operations) [2023-02-02T06:17:08Z] <kart_> Updated cxserver to 2023-02-02-004918-production (T129470, T172035, T327842)

Change 887854 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/core@master] Support gradual migration of language links from deprecated codes - tracking categories

https://gerrit.wikimedia.org/r/887854

Change 884493 merged by jenkins-bot:

[mediawiki/extensions/SiteMatrix@master] SiteMatrix: Show the actual (non-deprecated) language code for deprecated language codes

https://gerrit.wikimedia.org/r/884493

Change 884494 merged by jenkins-bot:

[operations/mediawiki-config@master] SiteMatrix config: Add actual (non-deprecated) language code for deprecated language codes

https://gerrit.wikimedia.org/r/884494

Mentioned in SAL (#wikimedia-operations) [2023-08-21T13:11:02Z] <zabe@deploy1002> Started scap: Backport for [[gerrit:884494|SiteMatrix config: Add actual (non-deprecated) language code for deprecated language codes (T172035 T111876)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-21T13:12:31Z] <zabe@deploy1002> zabe and wsung: Backport for [[gerrit:884494|SiteMatrix config: Add actual (non-deprecated) language code for deprecated language codes (T172035 T111876)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-21T13:18:55Z] <zabe@deploy1002> Finished scap: Backport for [[gerrit:884494|SiteMatrix config: Add actual (non-deprecated) language code for deprecated language codes (T172035 T111876)]] (duration: 07m 53s)

Change 953650 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[operations/mediawiki-config@master] SiteMatrix config: Remove deprecated language codes from the list

https://gerrit.wikimedia.org/r/953650