Make creating a new Language project easier
Open, Needs Triage, Public

Description

Meta-comments:

  • This task is somewhat similar to T158730, but from the end-user perspective. The end-user here is a person, or a group of people, who want to create a project in a new language, most often a Wikipedia.
  • If I subscribed you to this task, I believe it will interest you. If it doesn't, please feel free to unsubscribe, and accept my apologies.

The current process for creating a wiki in a new language is fully documented at https://meta.wikimedia.org/wiki/Language_proposal_policy , but I'll write something brief and practical here:

  • Make sure you have a standard ISO 639 language code. Don't proceed if you don't.
  • Add a language to translatewiki.net (technically, to UniversalLanguageSelector's langdb and translatewiki.net's LanguageSettings.php)
    • Translate the most-used MediaWiki messages. (About 500.)
  • Add a language to Incubator, by creating a main page at Wp/languagecode (and replace "Wp" with another project code if it's not Wikipedia).
    • Get a lot of people to write a lot of articles. The current threshold for approval is not precisely defined, but a rule of thumb is ~5 people working for three months, and several hundreds of articles.
  • Get the Language committee to approve the project, if the above things were done.
    • The Language committee assesses the fulfillment of the above points, and asks for approval from a third-party expert who knows the language.
  • If all of the above is done, create the project. Task T158730 and the page https://wikitech.wikimedia.org/wiki/Add_a_wiki describe this long and mostly-manual technical process.

This process could be better.

  • Adding a new language to translatewiki.net usually works well, although there were several complaints of languages that took months to get added. Usually it should take just a couple of days if the ISO code is valid. Perhaps something could be improved in this process.
  • Getting to the import threshold is a bit harder, however:
    • The Most-used messages list is close to the import threshold of ~490 core messages, but doesn't correspond to it directly, because some of the most-used messages come from extensions. This may cause a situation in which a project has all the most-used messages translated and fulfills all the other Language Committee requirements, but doesn't actually have the messages imported from translatewiki.net to the core MediaWiki code repository, so the project in this language will not have proper localization unless somebody verifies that the language was added to Names.php and the import worked. I usually do it for projects that are about to be created, but there's no proper procedure here, and this could be more automatic.
  • Incubator is hard to use. Its advantage is that having all the tiny new projects in one wiki makes it easier for the sysops to combat vandalism.
    • Some disadvantages of the current Incubator:
      • Having to write prefixes on page names and on every internal link manually is very tedious.
      • Counting pages is tedious.
      • Categorization is tedious and redundant.
      • Interlanguage links are problematic: they can only be added the old way and not with Wikidata (T54971).
      • ContentTranslation doesn't work, even though it could be one of the most useful features for the Incubator. It's fixable (T89089), but requires resources.
    • A possible solution is to make creating an actual new experimental domain easier (T158730), but with special requirements:
      • Patrolling all such domains for vandalism must be at least as easy as it is for Incubator.
      • Content Translation must work, with all its features, at least for translating into the new language. This must include the setup of all the services (cxserver, Parsoid, etc.)
      • Visual Editor must work. (Already works in Incubator, but I'm making sure that it's noted explicitly.)
      • In addition to Content Translation and Visual Editor, most of the usual extensions need to be installed and configured, if they are installed on wikis in most other languages.
      • The MediaWiki translations should be exported from translatewiki.net. Preferably, this should be done even before reaching the import threshold.
      • Internal search must work. Noting it here because this was an issue several times with newly-created projects in the past.
      • The domain must be temporary, for example for a year. It must be easy to destroy the domain without a difficult closing process if it proves to be inactive, spammy, or too prone to vandalism. This would include removing all the services configuration, search indexes, Wikidata edits in this language, etc. (This shouldn't have to be done if the content that was written is actually good.)
      • The domain may have a special configuration of visibility to search engines. (This actually shouldn't be a hard requirement. Is it really harmful if Google indexes it? Probably not.)
      • The domain may have a different URL structure from a usual project. For example, languagecode.incubator.wikipedia.org. However, this must not disturb the services, and this must be easy to rename to a standard domain (languagecode.wikipedia.org). And again, it's OK if it's just languagecode.wikipedia.org right from the start as long as patrolling for vandalism works reasonably well.
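
To illustrate the prefixing burden listed among the Incubator disadvantages above, here is a minimal Python sketch. The function names `incubator_title` and `prefix_links` are hypothetical, and the `Wp/<code>/<title>` layout for sub-pages is an assumption extrapolated from the `Wp/languagecode` main-page convention mentioned in the process description; this is not how Incubator itself is implemented.

```python
import re


def incubator_title(project: str, lang: str, title: str = "") -> str:
    """Build a prefixed page name such as 'Wp/hz/Namibia' (assumed layout)."""
    prefix = f"{project}/{lang}"
    return f"{prefix}/{title}" if title else prefix


# Matches simple [[Target]] and [[Target|label]] links only.
_LINK = re.compile(r"\[\[([^\[\]|]+)(\|[^\[\]]*)?\]\]")


def prefix_links(wikitext: str, project: str, lang: str) -> str:
    """Add the test-wiki prefix to every unprefixed internal link."""
    prefix = incubator_title(project, lang)

    def repl(match: "re.Match[str]") -> str:
        target, label = match.group(1).strip(), match.group(2)
        # Leave already-prefixed and namespaced links (Category:, File:, ...) alone.
        if target.startswith(prefix + "/") or ":" in target:
            return match.group(0)
        # Keep the visible text free of the prefix.
        shown = label if label else "|" + target
        return f"[[{prefix}/{target}{shown}]]"

    return _LINK.sub(repl, wikitext)
```

Every editor currently performs this rewrite by hand on each link, which is the tedium the separate-domain proposal would remove.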

The biggest known technical hurdle to implementing this is, again, T158730, but it's certainly not the only one.

Thanks for reading so far. This is a big idea. It will take a long time. This is just the initial exploration. Everybody's thoughts are welcome.


> For example, including Translatewiki within the SUL of Wikimedia projects is a simple thing that would simplify a lot of things.

It is not. In fact, it is very difficult due to both technical and non-technical reasons.

> We don't ask French speakers to look at their Unicode when they arrive on Wikimedia projects for example

It's just not possible to create Wikipedia if we don't know which language it will use. The language code is required for everyone to know what language it is. Aside, even Wikipedia readers in French need to know that their language code is fr if they want to access fr.wikipedia.org directly.

> Nemo_bis I'm not saying those things are useless and not important. I'm saying it is problematic to ask new contributors to take care of it.

It is, but we need to realise that MediaWiki is at the frontier of language support worldwide. People asking for new languages hopefully understand that they are pioneers, with all it involves. We can support them, but they're still the heroes of the story.

In fact, many minority languages are now supported only in MediaWiki and nowhere else in the software industry, except formally in the standards that assign them a language code. Such codes are typically used first for archiving purposes by librarians, who use them in their databases to sort their collections; most of those collections are computerized only as facsimiles, or consist of precious books in rare libraries and museums that are costly to preserve in good condition. But no standard exists for the basic needs of full computer support. CLDR is still the first and only working project that creates the basic localisation data needed to initialize the repositories we need, but many languages do not have even the minimum support in CLDR at its basic coverage level.

So the first step is to participate in the CLDR project to complete this basic coverage. Then go to translatewiki.net: I think it already supports, almost automatically, any new language that has reached the basic coverage level in CLDR (which is updated about once a year). Note that the CLDR process has just reopened for a limited period. This period is generally no longer than a couple of months: usually one month for submissions, then one month for vetting, and a new version a few weeks later. The rest of the year is for fixing bugs and compatibility problems, discussing new rulesets to integrate into the project, or resynchronizing with updates to Unicode data and algorithms; very little is added or changed for the basic coverage.

As soon as a CLDR edition is released, go to translatewiki.net and start localizing the project. You'll need to add a few missing items, notably fixing the plural rules with the help of administrators, who will let you run tests with examples in a test project. When the new language code is ready, you can start translating MediaWiki into that language on translatewiki.net. When you reach the minimum coverage needed, you'll need to ask Wikimedia administrators (notably those on Commons) to integrate the language code; you'll then be able to try out the UI translation by testing existing i18n templates or translating some image captions. The Universal Language Selector will be updated to include the language, so you'll be able to try it out.

If all goes well, you can then ask for a beta project for Wikipedia or Wiktionary to be added in Incubator and see what to do. This is the complex task that should be improved. For now, Incubator uses a common database for all languages, so page names are restricted, and it's difficult to really internationalize a project or to create relevant links in articles: you need to use page name prefixes everywhere, but you cannot use the Translate extension, which requires suffixes instead. Ideally, this process would be better managed by using a separate database for each language, even if there's a single SUL database for user accounts and user pages in Incubator, and a unified list of privileges across all subprojects in Incubator. Incubator user pages just need to be shared across multiple separate wikis instead of being created in their local database. (Consider what has been done for the File/Image or Media namespaces: user pages and talk pages could just be special namespaces with a negative index instead of the pair of regular namespaces with positive IDs.) This means that user pages would actually be stored in a separate wiki for Incubator as a whole. That wiki would have no namespace other than its own root namespace (used only for language selection, the home page, and cross-project coordination), and its own project namespace would in fact be an alias to the root namespace.

Another namespace for internationalization templates could also be shared across all Incubator projects but actually stored in the "Incubator" wiki itself; it could be "i18n:". This would allow using those templates to build localized templates without having to reinvent the wheel. The templates in "i18n:" should just be able to determine for themselves which wiki they are imported on: "incubator" itself, using English as the default, or a localized subproject for a specific language, using that subproject's code to select another default. In all cases the user's UI language can be set independently. (There's already a standard magic keyword that returns the default language of the local wiki; it should be used in "i18n:" everywhere as the default instead of assuming English.) What would be in "i18n:"? Basically, various tools to create infoboxes, navbars, and manage the generic layout.

The "Module:" namespace could also be shared by all subprojects in Incubator, without needing any duplication. It should support internationalization by default (with common modules such as language switches, direction switches, language fallbacks, specific font size settings for some scripts...). Many languages with new scripts still not supported on Commons may start there with less risk. If an Incubator subproject later becomes a standard wiki (with a separate users database), no pages need to be edited. Only the shared "i18n:" and "Module:" pages will be imported ("i18n:" will be imported into "Template:i18n:", i.e. into the new "Template" namespace). No user accounts will be converted: we will continue to work with SUL. Users will need to import their user pages and talk pages if needed, but they can still access their "Incubator:User:" and "Incubator:User_talk:" pages. If they log in to the new wiki for the first time and there is still no local page on the new non-Incubator wiki, they could be offered a tool to import their existing user pages from "Incubator:User:" (which will be left unedited), or be offered a default talk page containing a soft redirect to their user page on "incubator", to their "home" wiki, or to another wiki of their choice.

Now, if a wiki goes from standard to Incubator, its namespaces may be imported into Incubator, except user pages (which will be archived and made read-only). This includes all "Template:" and "Category:" pages, as well as the local "File:" description pages. But its local modules will not be importable without renaming them with some "ProjectCode/language-code/" prefix.

@Verdy_p: Please structure longer comments (e.g. by using section headings) to decrease the likelihood that some people might skip such comments ("tl;dr"). Thanks!

It is structured, each paragraph has its topic.

[offtopic] @Verdy_p: In the future, please follow my recommendation to use section headings for longer comments if you want people to read your comments. Thanks.

Hello. I agree with @Amqui, translatewiki.net should take advantage of the Wikimedia SUL.

Wikimedia Canada plans to create Wikipedias in several Canadian Aboriginal languages in the future (~50 languages). We started the project with Atikamekw because young people still speak Atikamekw, and they have computers and internet access at their school.

For these reasons, we thought it would be easier to start with the Atikamekw Nation (~6000 people). But we realize that this is not easy at all: the smallest barrier or obstacle is a big problem for these new volunteers.

Wikimedia gives access to almost 300 languages; that is exceptional, it's true. But the languages whose wikis remain to be created often have few speakers, and/or speakers who are new to technology.

We have to simplify their tasks, and having SUL for translatewiki is part of this simplification process. For us techno-lovers, it looks like a small step... but believe me, every little obstacle is a huge thing for everyone.

Please consider SUL for translatewiki. Thank you.

@Benoit_Rochon Huh, using WMF SUL accounts to log in to TWN? Are you sure that this wouldn't violate the privacy policies, which may differ between the two sites?


Hello,

I was asked by @Amire80 to think about this. I generally agree with the idea to simplify getting a wiki (even more, I agree with an idea to simplify everything, but that's out of this task's scope).

Firstly, we must have a nice, smart and robust addWiki.php that will handle literally everything, including DNS, Apache and MW configurations, in addition to what it does right now. Then, we should have closeWiki.php as well, which will do the closing. Lastly, we need migrateWiki.php to migrate from one cluster to another (technically, that will include closing the old wiki, creating a new one and exporting/importing the content). Once completed, we can set up a frontend for executing those scripts, with access limited to LangCom plus sysadmins (who are the ones who take care of all wikis right now; language wikis will be handled entirely by LangCom, special wikis entirely by sysadmins).

I'd just create a third "cluster" (not sure if that's the right word). Right now, we have the production and beta clusters. We can create an incubator cluster that will not be connected with any other cluster and/or wiki. There will be no local sysops (either the group won't exist or won't be assigned), bureaucrats, or any other local group, and all the rights should be global. Also, the changes should be directed to one IRC channel, and another interface can simulate global recent changes (so the patrolling requirement will be solved).

Once the cluster is started, LangCom can, after receiving a request for a new wiki, quickly assess it to ensure there is at least a tiny chance of getting a working project, and then create a wiki. That wiki should be temporary (1 year? 2 years?) and then auto-close (a watchdog will do that). This should be renewable by LangCom. This is because I think it is a good idea to have a real difference between approved wikis and to-be-approved wikis.

When LangCom thinks the wiki can be pushed into a permanent SUL project, it will just migrate it.

I think that this solution can be useful for language wikis, but also for special wikis (we can just skip the incubator step: click, and it's done), because they are also important. I don't think this process would be abused, as it will depend entirely on LangCom, just as it does now (sysadmins usually won't decline to create a wiki upon a LangCom member's request).
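
The temporary-wiki lifecycle described above (fixed lifetime, renewable by LangCom, closed by a watchdog) could look roughly like the Python sketch below. Everything named here is hypothetical: `TemporaryWiki`, `wikis_to_close`, the example database names, and the 365-day default, which simply picks one side of the open "1 year or 2 years" question. It only selects candidates; the actual closing would be whatever closeWiki.php ends up doing.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class TemporaryWiki:
    dbname: str
    created: date
    lifetime_days: int = 365              # assumed default; 1 vs 2 years is undecided
    renewed_until: Optional[date] = None  # set when LangCom renews the wiki

    def expires_on(self) -> date:
        """Latest of the initial deadline and any LangCom renewal."""
        base = self.created + timedelta(days=self.lifetime_days)
        if self.renewed_until is None:
            return base
        return max(base, self.renewed_until)


def wikis_to_close(wikis: List[TemporaryWiki], today: date) -> List[str]:
    """Return the database names the watchdog should hand to the closing script."""
    return [w.dbname for w in wikis if w.expires_on() <= today]


if __name__ == "__main__":
    wikis = [
        TemporaryWiki("hzwiki", date(2018, 1, 1)),
        TemporaryWiki("atjwiki", date(2018, 1, 1), renewed_until=date(2020, 1, 1)),
    ]
    print(wikis_to_close(wikis, date(2019, 6, 1)))
```

Keeping the renewal as an explicit date rather than a counter makes it easy for LangCom to extend a slow but steady project for an arbitrary period.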

Is there something I'm missing? I'll be happy to read comments of others.

Looking forward to this,
Martin


In principle, I’m interested. I need to understand how it’s going to work.

I'm supportive of Amir's initiative, both as a LangCom member and as a promoter of under-resourced languages. I wouldn't call myself computer-illiterate, but the intricacies of starting a project on Incubator (wt:lag) and of getting a language into Wikidata (ULS, translatewiki requirements and all that) were a learning curve way too steep for my skill set. If a highly educated linguist like me finds it almost too difficult, it must be close to impossible for the language communities themselves. Change is inevitable, so please simplify the processes of the incubator and related activities for supporting under-resourced languages. Thanks!

I gave a talk at Wikimania last week where the Incubator difficulties are part of the narrative: https://commons.wikimedia.org/wiki/File:PG-Slides-Wikimania18.pdf. The project in question is Wp/hz. I'm not a native but have tried to reach out to this community. I'm hz-0.5, trying to get to 1.

TLDR: The vicious circle is: (1) Language in Incubator -> (2) severe restrictions in reading and editing -> (3) No edits -> continue with (1)

Why do we have to first create an editing community before making editing a bit easier? Can Wp/hz be reactivated (url is already there), not temporary but for as long as someone volunteers to remove spam?

I don't get many of the technical details above but I support anything that makes editing in small languages simpler, lets Google index existing articles, and gives communities, however small, the possibility to administer their project.

I also support this new approach to the incubator. Here in Latin America, I've seen firsthand how intimidating and not user-friendly the current incubator platform can be for some interested community members. I think these changes would have a positive effect towards encouraging new groups to participate in Wikipedia in their languages.


Going back to @Urbanecm 's comments above:

With a handful of modifications, that's probably a reasonable way to start. Current Incubator 'crats/sysops would be "stewards" or "global sysops" or something like that within the cluster, and would certainly have 'crat-like rights within the cluster. We would have local sysops; their sysop rights would be shrunk to resemble the current test-admin rights group, and the cluster "global-crats" would assign them for a year, as we do now.

With respect to semi-automatic closures after 1–2 years:

  • Yes, on the whole, we should do that.
  • That said, I've seen some tests on Incubator build up slowly and steadily over longer periods of time. I suppose the question in this case would be: "If such an incubation community continues to make steady progress over 2 years, but is not really 'ready for prime time' the way we currently view that, do we just let the incubation subdomain stay open, or do we let them become a permanent wiki anyway?"
  • The other question to ask is this: Suppose a community goes dormant pretty quickly. If there are only a handful of pages in it, it may not be worth bothering to keep around, and can be deleted outright. (I keep asking myself if I should seek a policy change at Incubator that any test with fewer than n mainspace pages [5? 10?] that is dormant for 2 years [excepting maintenance edits by someone like me] simply gets deleted so we don't have to keep maintaining it.) Still, if a project develops 50–100 pages and then goes dormant, there's probably enough content to make it worth keeping. So do we move it back into Incubator? Let an incubation subdomain continue to exist (but stay dormant)? Create an .xml file to archive, then delete? Move it to Incubator Plus?
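
The dormancy questions above can be written down as a small decision function, which makes the still-undecided cases explicit. This is only a sketch: `dormancy_action` is a hypothetical name, `min_pages=10` picks one of the two values floated for n, and `keep_pages=50` takes the lower end of the "50–100 pages" range. What to do with kept content (move back to Incubator, leave the subdomain, archive to XML) is left open, as it is in the comment.

```python
def dormancy_action(mainspace_pages: int, years_dormant: float,
                    min_pages: int = 10, keep_pages: int = 50) -> str:
    """Classify a test wiki; thresholds are assumptions, not agreed policy."""
    if years_dormant < 2:
        return "leave open"      # not dormant long enough to act on
    if mainspace_pages < min_pages:
        return "delete"          # handful of pages, not worth maintaining
    if mainspace_pages >= keep_pages:
        return "keep content"    # enough content; destination still undecided
    return "undecided"           # the comment gives no rule for this range


if __name__ == "__main__":
    for pages, years in [(3, 2.5), (20, 2), (80, 3), (80, 0.5)]:
        print(pages, years, dormancy_action(pages, years))
```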

As I said to @Amire80 on the LangCom discussion board, I strongly favor the portion of this proposal where the projects closest to approvability get moved. There is much upside and little downside to that. The portion that I greatly worry about is that this proposal not turn the new wiki scene into the Wild West that existed here in the 2007–9 era.

Note that there's absolutely NO need to create "temporary" domains for language codes and projects in Incubator. We can just use the existing interwiki prefixes, which already work as rewrite rules for URLs and can already be resolved to the incubator domain and its path structure.
All that is needed is to patch the rewrite rules for their language prefix and project prefix.
This way we could still use normal interwiki links across all projects, so that "lang:Articlename" in any Wikipedia edition or in any Wikipedia Incubator subproject would link to "incubator:Wp/lang:Articlename". This would also apply to Wikidata, which could already accept Wikipedia links using "lang:Articlename" instead of "incubator:Wp/lang:Articlename".

In fact, Incubator exists only because there's a need for single user accounts and privileges to manage a large group of wikis with the same rights. The need to use a single database for that is an old requirement: we could instead have a configuration in the wiki farm that allows several wikis to share the same user pages and user talk pages, or the same namespaces (such as templates or images, and a specific "Project:"="Incubator:" namespace with its talk page), all stored in a single shared database for "incubator" (which itself would have NO article space and no default talk space, except for user talk pages). The per-language databases would then just need to contain their own article space, the rest being shared (we can already benefit from Unified Login across wikis).

We should be able to create wikis (even outside Wikipedia) by specifying the namespaces to use locally or from another shared wiki, and allow "mounting" on a shared wiki several namespaces for articles stored in specific subwikis, mounting them on a common prefix like "Wp/lang/" (which would no longer be part of the "page name").

This would be useful as well to work with some "yearly" wikis, where we open a new specific wiki each year, mounted on a shared root, and whose "mounted" article namespace would be closed and archived the next year. It should not be necessary to have a different domain name for each wiki database, and in fact not even necessary to have multiple database instances (useful for those that want to create multiple wikis on the same domain and in the same database instance).

All that is needed is to extend the concept of "namespaces" (and allow them to be configured to be bound in the wiki using "/subpaths/" or "prefixes:")

@Verdy_p, your proposal sounds far more complicated. Adding URL rewrite rules and a lot of namespaces will make things very different from how usual wikis work. One of the central points of the proposal is making it easy to work with Wikidata and Content Translation, both of which assume that there is one language per wiki. Creating a new wiki shouldn't really be complicated. It's just a matter of automating the current wiki creation procedure, and @Urbanecm says that it's doable.

I have a suggestion related to Incubator: stop limiting the language codes to 3 characters. That way, IETF language code support would be vastly improved.


> stop limiting the language codes to 3 characters

The following languages with more than 3 characters already exist in production:

~/dns/templates/helpers$ cut -d\' -f2 langs.tmpl | grep -E '^[a-z-]{4,}'
bat-smg
be-tarask
be-x-old
cbk-zam
fiu-vro
map-bms
minnan
nds-nl
roa-rup
roa-tara
simple
zh-cfr
zh-classical
zh-min-nan
zh-yue
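
For readers without the DNS repository at hand, the same filter can be reproduced in Python. This is a sketch that applies the regular expression from the `grep -E` call above to a plain list of codes; the function name `long_codes` and the sample input are illustrative, and reading `langs.tmpl` itself is left out because its exact format is not shown here.

```python
import re

# Same pattern as the grep call: starts with four or more of a-z or "-".
LONG_CODE = re.compile(r"^[a-z-]{4,}")


def long_codes(codes):
    """Return the codes that begin with at least four letters or hyphens."""
    return [code for code in codes if LONG_CODE.match(code)]


if __name__ == "__main__":
    sample = ["en", "fr", "simple", "zh-yue", "atj", "be-tarask"]
    print(long_codes(sample))
```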


My suggestion is not just for Wikimedia wikis. This is a general need for the deployment of various wikis that would like to be more flexible in what is shared and what is not, without necessarily needing a specific domain for each wiki sharing common namespaces (notably "User:" and "User talk:", as well as user preferences for a single registration, possibly even other namespaces like "Template:", "Template talk:", "Module:", "Module talk:", "File:", "File talk:", "Category:", "Category talk:", "Help:", "Help talk:"; with only "Project:" and "Project talk:" being specific, and hosted under their own "interwiki" code).

Namespaces are the basic component to do that, and each namespace can have its own URL rewrite rules and resolution, depending from which namespace it is used.

So this is a desirable goal for MediaWiki itself. It would also address the question of test/incubator wikis in Wikimedia, yearly conference wikis, or specific maintenance.

Basically, each single wiki instance needs only 2 namespaces, all other ones (including special namespaces) being sharable on a main instance. And instances do not necessarily need their own database instance (sharing the database instance also allows sharing the SQL admins and privileges for "special" pages, given they are also unified using the same "user (talk):" namespace).

Note: we also need flexibility in how to map translations (also for each namespace from which they are looked up): in namespaces, in page name prefixes, or in "/suffixed" subpages. This would require improving the setup of the "Translate" tool.

Verdy_p added a comment. Edited Aug 2 2018, 11:11 AM

>> stop limiting the language codes to 3 characters
>
> The following languages with more than 3 characters already exist in production:
>
> ~/dns/templates/helpers$ cut -d\' -f2 langs.tmpl | grep -E '^[a-z-]{4,}'
> (...)

bat-smg -> aliased to "sgs"
be-tarask -> conforming to BCP 47
be-x-old -> aliased to "be-tarask"
cbk-zam -> should be aliased to ???
fiu-vro -> aliased to "vro"
map-bms -> aliased to "bms"
minnan -> aliased to "nan"
nds-nl -> conforming to BCP 47
roa-rup -> aliased to "rup"
roa-tara -> should be aliased to "it-x-tara"
simple -> should be aliased to "en-x-simple"
zh-cfr -> aliased to "nan"
zh-classical -> aliased to "lzh"
zh-min-nan -> aliased to "nan"
zh-yue -> aliased to "yue"
nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)
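
The aliasing described in this list amounts to a lookup table. The Python sketch below copies only the entries the list marks as already "aliased to", exactly as claimed in the comment and without independent verification; the "should be aliased" proposals are omitted, and so is the map-bms entry, which is questioned in the reply that follows. The names `ALIASES` and `canonical_code` are hypothetical.

```python
# Legacy code -> code it is said to be aliased to (taken from the list above).
ALIASES = {
    "bat-smg": "sgs",
    "be-x-old": "be-tarask",
    "fiu-vro": "vro",
    "minnan": "nan",
    "roa-rup": "rup",
    "zh-cfr": "nan",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
}


def canonical_code(code: str) -> str:
    """Follow aliases until a code with no further alias is reached."""
    seen = set()
    while code in ALIASES and code not in seen:  # guard against alias loops
        seen.add(code)
        code = ALIASES[code]
    return code


if __name__ == "__main__":
    for legacy in ("zh-yue", "be-x-old", "nds-nl"):
        print(legacy, "->", canonical_code(legacy))
```

Codes that are not in the table, such as the BCP 47 conforming `nds-nl`, pass through unchanged.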

Liuxinyu970226 added a comment. Edited Aug 4 2018, 3:25 AM

@Verdy_p:

> cbk-zam -> should be aliased to ???

should be renamed back to cbk, see T124657

> map-bms -> aliased to "bms"

Huh? Banyumasan = Bilma Kanuri?

> simple -> should be aliased to "en-x-simple"

Just en-simple, no need to use "-x-" here.

> nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)

I'm afraid that this is currently being contested on this RFL page.

@Dzahn in addition to your list, there are also these codes in our Names.php that match your criteria:

'ady-cyrl' => 'адыгабзэ', # Adyghe
'aeb-arab' => 'تونسي', # Tunisian Arabic (Arabic Script)
'aeb-latn' => 'Tûnsî', # Tunisian Arabic (Latin Script)
'bbc-latn' => 'Batak Toba', # Batak Toba
'crh-latn' => "qırımtatarca (Latin)\u{200E}", # Crimean Tatar (Latin)
'crh-cyrl' => "къырымтатарджа (Кирилл)\u{200E}", # Crimean Tatar (Cyrillic)
'de-at' => 'Österreichisches Deutsch', # Austrian German
'de-ch' => 'Schweizer Hochdeutsch', # Swiss Standard German
'de-formal' => "Deutsch (Sie-Form)\u{200E}", # German - formal address ("Sie")
'en-ca' => 'Canadian English', # Canadian English
'en-gb' => 'British English', # British English
'es-419' => 'español de América Latina', # Spanish for the Latin America and Caribbean region
'es-formal' => "español (formal)\u{200E}", # Spanish formal address
'gan-hans' => "赣语(简体)\u{200E}", # Gan (Simplified Han)
'gan-hant' => "贛語(繁體)\u{200E}", # Gan (Traditional Han)
'gom-deva' => 'गोंयची कोंकणी', # Goan Konkani (Devanagari script)
'gom-latn' => 'Gõychi Konknni', # Goan Konkani (Latin script)
'hif-latn' => 'Fiji Hindi', # Fiji Hindi (latin)
'hu-formal' => "magyar (formal)\u{200E}", # Hungarian formal address
'ike-cans' => 'ᐃᓄᒃᑎᑐᑦ', # Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics)
'ike-latn' => 'inuktitut', # Inuktitut, Eastern Canadian (Latin script)
'kbd-cyrl' => 'Адыгэбзэ', # Kabardian (Cyrillic)
'kk-arab' => "قازاقشا (تٴوتە)\u{200F}", # Kazakh Arabic
'kk-cyrl' => "қазақша (кирил)\u{200E}", # Kazakh Cyrillic
'kk-latn' => "qazaqşa (latın)\u{200E}", # Kazakh Latin
'kk-cn' => "قازاقشا (جۇنگو)\u{200F}", # Kazakh (China)
'kk-kz' => "қазақша (Қазақстан)\u{200E}", # Kazakh (Kazakhstan)
'kk-tr' => "qazaqşa (Türkïya)\u{200E}", # Kazakh (Turkey)
'ko-kp' => '조선말', # Korean (DPRK), T190324
'ks-arab' => 'کٲشُر', # Kashmiri (Perso-Arabic script)
'ks-deva' => 'कॉशुर', # Kashmiri (Devanagari script)
'ku-latn' => "kurdî (latînî)\u{200E}", # Northern Kurdish (Latin script)
'ku-arab' => "كوردي (عەرەبی)\u{200F}", # Northern Kurdish (Arabic script) (falls back to ckb)
'nl-informal' => "Nederlands (informeel)\u{200E}", # Dutch (informal address ("je"))
'pt-br' => 'português do Brasil', # Brazilian Portuguese
'ruq-cyrl' => 'Влахесте', # Megleno-Romanian (Cyrillic script)
# 'ruq-grek' => 'Βλαεστε', # Megleno-Romanian (Greek script)
'ruq-latn' => 'Vlăheşte', # Megleno-Romanian (Latin script)
'shi-tfng' => 'ⵜⴰⵛⵍⵃⵉⵜ', # Tachelhit (Tifinagh script)
'shi-latn' => 'Tašlḥiyt', # Tachelhit (Latin script)
'shy-latn' => 'tachawit', # Shawiya (Latin script) - T194047
'skr-arab' => 'سرائیکی', # Saraiki (Arabic script)
'sr-ec' => "српски (ћирилица)\u{200E}", # Serbian Cyrillic ekavian
'sr-el' => "srpski (latinica)\u{200E}", # Serbian Latin ekavian
'tg-cyrl' => 'тоҷикӣ', # Tajiki (Cyrllic script) (default)
'tg-latn' => 'tojikī', # Tajiki (Latin script)
'tt-cyrl' => 'татарча', # Tatar (Cyrillic script) (default)
'tt-latn' => 'tatarça', # Tatar (Latin script)
'ug-arab' => 'ئۇيغۇرچە', # Uyghur (Arabic script) (default)
'ug-latn' => 'Uyghurche', # Uyghur (Latin script)
'uz-cyrl' => 'ўзбекча', # Uzbek Cyrillic
'uz-latn' => 'oʻzbekcha', # Uzbek Latin (default)
'zh-cn' => "中文(中国大陆)\u{200E}", # Chinese (PRC)
'zh-hans' => "中文(简体)\u{200E}", # Mandarin Chinese (Simplified Chinese script) (cmn-hans)
'zh-hant' => "中文(繁體)\u{200E}", # Mandarin Chinese (Traditional Chinese script) (cmn-hant)
'zh-hk' => "中文(香港)\u{200E}", # Chinese (Hong Kong)
'zh-mo' => "中文(澳門)\u{200E}", # Chinese (Macau)
'zh-my' => "中文(马来西亚)\u{200E}", # Chinese (Malaysia)
'zh-sg' => "中文(新加坡)\u{200E}", # Chinese (Singapore)
'zh-tw' => "中文(台灣)\u{200E}", # Chinese (Taiwan)

Frankly, do we really need things like "-formal" and "-informal"? They can't be recognized by browsers, since no browser treats either of them as a country code.

Anyway, the

'eml' => 'emiliàn e rumagnòl', # Emiliano-Romagnolo / Sammarinese

Should also be contested because of T36217

@Verdy_p:

cbk-zam -> should be aliased to ???

should be renamed back to cbk, see T124657

map-bms -> aliased to "bms"

Huh? Banyumasan = Bilma Kanuri?

simple -> should be aliased to "en-x-simple"

Just en-simple, no need to use "-x-" here.

Has the "simple" variant been registered in the IANA database for BCP 47? If not, we need the "-x-" because it is a private extension in Wikimedia.

nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)

I'm afraid that this is currently being contested at this RFL page.

No. This is the same reason why we need to migrate the existing "nrm" data to "nrf", so that Narom can finally be assigned the code (that's why the aliasing redirect can only be temporary, for the duration of the migration). There's no contestation here: the request for a new language for Narom is valid and has been pending for too long (it can still be allocated in Incubator for Narom, given that there's no longer any current Norman data in Incubator, except some read-only archives that can be renamed to "nrf" too if anyone still needs them).

But most of the migration work will be in Wikipedia and Wiktionary. I think we can leave aside the migration of user talk pages (users will do this cleanup themselves, even if their past links are now broken and lead to a Narom page or nowhere instead of the past Norman page). In Wikidata, this migration can easily be automated by bots.

Frankly, do we really need things like "-formal" and "-informal"? They can't be recognized by browsers, since no browser treats either of them as a country code.

And they have absolutely no reason to recognize them as "country codes". Actually they are "region subtags" and are not restricted to just "country codes", as they also include territory/dependency codes from ISO 3166-1 and continental area codes from UN M.49, but exclude some codes from ISO 3166-1. Note that ISO 3166-2 codes are not used at all as region subtags in BCP 47, and that the use of "region codes" is a legacy, deprecated in favor of ISO 639-3 codes for more specific languages already encoded as members of a registered macrolanguage, itself registered in the IANA database.

Lots of legacy codes have been kept valid in BCP 47, but the extension mechanism has been simplified and formalized so that fallback resolvers work as intended. Fallback mechanisms are not part of ISO 639; they are only specified in BCP 47, which maintains the compatibility that the unstable ISO 639 never preserves. This means that ISO 639 is always unreliable and should never be used as a "normative reference" for our use, only as an "informative" one that shows where BCP 47 takes some of its sources. The IANA database is still the only approved normative source of these codes; everything else is private and should use private extension subtags.

The syntax that BCP 47 parsers recognize for "formal" and "informal" is the one for registered language variant subtags; they would be valid and accepted by browsers if they were registered in the IANA database.
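To make the distinction between region, variant, and private-use subtags concrete, here is a minimal Python sketch that labels each subtag by the syntactic slot it fills. It is a simplification based on the subtag lengths defined in the BCP 47 grammar: it ignores extlang subtags, extensions, and grandfathered tags, and it does not consult the IANA registry, so it only says whether a tag is well-formed, not whether its subtags are registered. The function name `classify_subtags` is hypothetical.

```python
import re


def classify_subtags(tag):
    """Label each subtag of a language tag by its syntactic slot.

    Simplified sketch: checks shape only (no IANA registry lookup) and
    ignores extlang, extensions, and grandfathered tags.
    """
    parts = tag.lower().split("-")
    if not re.fullmatch(r"[a-z]{2,8}", parts[0]):
        raise ValueError(f"bad primary language subtag: {parts[0]!r}")
    result = [("language", parts[0])]
    rest = parts[1:]
    i = 0
    # Script: exactly four letters, e.g. "hant".
    if i < len(rest) and re.fullmatch(r"[a-z]{4}", rest[i]):
        result.append(("script", rest[i]))
        i += 1
    # Region: two letters (ISO 3166-1) or three digits (UN M.49).
    if i < len(rest) and re.fullmatch(r"[a-z]{2}|[0-9]{3}", rest[i]):
        result.append(("region", rest[i]))
        i += 1
    # Variants: 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
    while i < len(rest) and re.fullmatch(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}", rest[i]):
        result.append(("variant", rest[i]))
        i += 1
    # Private use: the singleton "x" followed by 1-8 character subtags.
    if i < len(rest) and rest[i] == "x" and i + 1 < len(rest):
        private = rest[i + 1:]
        if not all(re.fullmatch(r"[a-z0-9]{1,8}", p) for p in private):
            raise ValueError(f"bad private-use subtag in {tag!r}")
        result.extend(("private-use", p) for p in private)
        i = len(rest)
    if i < len(rest):
        raise ValueError(f"cannot classify subtag {rest[i]!r} in {tag!r}")
    return result
```

Under these rules, "nl-informal" parses as a language plus a variant-shaped subtag, while "en-x-simple" parses as a language plus a private-use subtag. A tag such as "cbk-zam" is rejected by this sketch because three-letter extlang subtags are deliberately not handled.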

(This discussion of language codes is not really related to the topic of this task. There are better places to have it. Thanks for understanding.)

I have a suggestion related to Incubator: stop limiting the language codes to 3 characters. That way, IETF language code support would be vastly improved.

In general, we are not encouraging that going forward. And you know that, as far as it goes.

Dzahn removed a subscriber: Dzahn. Aug 7 2018, 11:52 PM

The initial discussion in the Incubator wiki mostly supports this idea:
https://incubator.wikimedia.org/w/index.php?title=Incubator:Community_Portal&oldid=4370387#A_proposal_for_a_big_reform_of_the_Incubator

Remaining details to discuss:

  • How vandalism monitoring will work. There is some support for @Urbanecm's initial proposal (a few comments above), but it may need some more details before actually going forward.
  • Detailed steps will have to be discussed: which wikis will be moved out of the current incubator into the new wikis, what to do with the less active Incubator projects, etc.
  • The decision whether to close the Incubator wiki or leave it open is not part of this proposal. For now it stays open.

(Have I forgotten any important points?)

@Amire80, I'd venture that for now, since closing Incubator is not part of the proposal, we also don't need to discuss (yet) what to do with the less active Incubator projects.
In my personal view, there is clear consensus to begin the work to make this happen. Still, while the Incubator community agrees in principle that moving all active test projects into a space like the one we're proposing is a good idea, there is still enough concern about not creating a "Wild West" of new subdomains that we should be focusing on (1) putting the infrastructure in place and (2) deciding on which projects should be moved to test this infrastructure. For the moment, we're still putting new projects in Incubator.
Also, to clarify a couple of other things:

  • At the moment, I don't think we're discussing closing Beta Wikiversity (even in the longer run) and having all new Wikiversity tests start in one of these spaces. Tests nearing approval could be eligible, though. (At the moment, the only one remotely active enough is hewikiversity.)
  • I think a well-developed Wikisource looking to move to a subdomain can also be eligible. But the default for new Wikisources is going to remain Multilingual (Old) Wikisource, for a variety of very good reasons.

Currently @Wolverène opposes this idea, for the following reasons:

  1. It doesn't solve the problem of active and reasonably proposed projects in extinct languages.
  2. It reminds me of the story of Wp/vot, when the "native speaker" wasn't actually one.
  3. How will it make the work easier? A separate URL may also be hard for some newcomers to understand. If they need to learn wiki markup in an actual Wikipedia, show them the unfortunately forgotten test.wikipedia.org. And, surprisingly, it's easier to delete a problematic project within the framework of the current Incubator, using a bot and without managing domains (if I understand it right).
  4. I've never seen anyone who was really embarrassed by prefixes or categorization. I'm not excluding the possibility that such people exist, but people like that probably have difficulties with wiki markup in general.
  5. I feel stupid because I don't understand what would actually change compared to the current situation. I'm wondering how this reform will help the Langcom make decisions about project openings any faster.

I have no idea how to answer their concerns.

Thanks for forwarding this. Where was this posted?

Liuxinyu posted it, not me. But Wolverène posted it at the end of the discussion in Incubator (before my closing section).

Amire80 added a comment. Edited Aug 28 2018, 11:59 AM

Liuxinyu posted it, not me. But Wolverène posted it at the end of the discussion in Incubator (before my closing section).

It's a bit weird, I cannot find it.

But never mind - I'll just reply here.

  1. It doesn't solve the problem of active and reasonably proposed projects in extinct languages.

The question of extinct languages is explicitly not a part of this reform discussion, as I had already written in one of my emails on the Langcom mailing list: https://lists.wikimedia.org/pipermail/langcom/2018-July/002162.html

It's not a problem that this proposal is trying to resolve. It's a matter of policy for the Language committee to discuss. This reform is more on the technical side of how Incubator wikis are managed.

  2. It reminds me of the story of Wp/vot, when the "native speaker" wasn't actually one.

This is also a matter of Langcom policy. If there are no native speakers, the incubator wiki is not supposed to be created. If it's created anyway, and no native speakers come along in reasonable time, this wiki should be closed in a fast-track process, as proposed from the beginning.

  3. How will it make the work easier? A separate URL may also be hard for some newcomers to understand.

There's nothing to understand there. A URL is transparent to the editor.

If they need to learn wiki markup in an actual Wikipedia, show them the unfortunately forgotten test.wikipedia.org.

Wiki markup is not an issue. Prefixes, however, are an issue. See below.

And, surprisingly, it's easier to delete a problematic project within the framework of the current Incubator, using a bot and without managing domains (if I understand it right).

Deleting a whole wiki should be easier once the decision is made to delete it. The proposal suggests from the start to make it easy to delete. @Urbanecm, how difficult is it to delete a wiki? This was done with mo.wikipedia.org recently.

  4. I've never seen anyone who was really embarrassed by prefixes or categorization. I'm not excluding the possibility that such people exist, but people like that probably have difficulties with wiki markup in general.

I worked with a lot of people writing in the Incubator: in Adyghe (ady), Dinka (din), Fon (fon), and in several other languages. Prefixes are one of the biggest hurdles for people, and the hurdle is completely artificial.

You can also see this discussed at this Wikimania presentation:

https://commons.wikimedia.org/wiki/File:Wikipedia_for_Indigenous_Communities.webm (especially after 27 minutes or so)

  5. I feel stupid because I don't understand what would actually change compared to the current situation.

Here are the changes that will benefit the people who read and write in the Incubator:

  1. Many wikis instead of one.
  2. No need to use prefixes.
  3. The possibility to use Wikidata.
  4. The possibility to use Content Translation.
  5. The possibility to search conveniently only in your language.

I'm wondering how this reform will help the Langcom make decisions about project openings any faster.

This is not the intention. The intention is to make it easier to read and write in Incubator projects.

  1. If a new site is to be created for each incubator site, how will WMF turn them into a full site once they become eligible? Last time I heard about it, such redesignation seemed to be very cumbersome, and that's also why wp/yue and wp/nan still haven't been moved to the desired domain names almost a decade after their initial proposal. Will it also take a decade for any new project to get a full site if the proposal is adopted?
  2. Is it going to lengthen the entire wiki creation process and require more bureaucratic processes, as well as more manpower to handle each and every application? Now it is incubator→Full site; in the proposal it will be incubator→Experimental site→Full site.
  3. Are those goals unachievable by overhauling Incubator itself? It seems that Wikia is now going to change the URLs of its non-English wikis in order to save on SSL certificate costs, changing URLs of the form zh.community.wikia.com to community.wikia.com/zh, and each of these language edition sites is still independent. Is that not achievable in Incubator?
  4. Likewise, is it possible to create such a new experimental site as easily as creating a new wiki site on Wikia?

Amire80 added a comment. Edited Aug 29 2018, 8:00 AM

  1. If a new site is to be created for each incubator site, how will WMF turn them into a full site once they become eligible? Last time I heard about it, such redesignation seemed to be very cumbersome, and that's also why wp/yue and wp/nan still haven't been moved to the desired domain names almost a decade after their initial proposal. Will it also take a decade for any new project to get a full site if the proposal is adopted?

It's a very different kind of redesignation.

Moving zh-yue and zh-min-nan to yue and nan is difficult for reasons that are explained here: T172035: Blockers for Wikimedia wiki domain renaming.

These two domains, as well as several others such as nrm, als, simple, etc., were created many years ago, at a time when there was no Language committee, or the committee was less strict about standard language codes. Now it's much more strict, so it's not supposed to happen with new language codes.

Renaming these non-standard domains is a separate issue and it's not related to the Incubator.

Moving content from an incubator domain to a proper domain is more of a technical issue that should be resolved by engineers when this task is actually executed. Input about this from Ops engineers is already welcome, but at this point I'm still trying to get functional community feedback and not to resolve the technical details.

This is supposed to become easier for everybody: the Incubator site readers, the Incubator site writers, the Language Committee, the engineers who manage domains and wiki installations, the vandalism patrollers, etc. If it makes anything harder for anybody, then it's a no-go, but we are now talking about ideas and not yet about the technical details of the implementation.

  2. Is it going to lengthen the entire wiki creation process and require more bureaucratic processes, as well as more manpower to handle each and every application? Now it is incubator→Full site; in the proposal it will be incubator→Experimental site→Full site.

No, this is not the intention at all. Eventually new languages are supposed to appear immediately in their own incubator wiki site without going through incubator.wikimedia.org. So the whole thing with prefixes on page titles and importing will be gone, and this means less bureaucracy, not more.
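To illustrate what the prefixes amount to in practice, here is a minimal Python sketch of the bookkeeping that every Incubator page title currently carries. It assumes a title layout of "Wp/<language code>/<page name>", extrapolated from the "Wp/languagecode" main page convention in the task description. The function name `split_incubator_title` is hypothetical and is not part of the WikimediaIncubator extension.

```python
def split_incubator_title(title):
    """Split an Incubator title into (project code, language code, page name).

    Assumed layout: "<project>/<language>/<page>", e.g. "Wp/ady/Some page".
    A bare "<project>/<language>" is treated as the test wiki's main page
    and yields an empty page name.
    """
    parts = title.split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a prefixed Incubator title: {title!r}")
    project, language = parts[0], parts[1]
    # Anything after the second slash is the page name, subpages included.
    page = parts[2] if len(parts) == 3 else ""
    return project, language, page
```

For example, `split_incubator_title("Wp/ady/Some page")` separates the "Wp" and "ady" parts from the page name. On a per-language wiki, as proposed here, the title would simply be the page name and none of this splitting would be needed.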

  3. Are those goals unachievable by overhauling Incubator itself? It seems that Wikia is now going to change the URLs of its non-English wikis in order to save on SSL certificate costs, changing URLs of the form zh.community.wikia.com to community.wikia.com/zh, and each of these language edition sites is still independent. Is that not achievable in Incubator?

Wikia uses completely different software and it has a lot of paid engineers working on this, so it's not really relevant.

  4. Likewise, is it possible to create such a new experimental site as easily as creating a new wiki site on Wikia?

Yes, that's kind of the idea: the ease of creating a new wiki site will be similar. No waiting for Ops people, no running scripts, no configuring databases by hand - all automatic (that's what @Urbanecm is talking about when he says "we must have nice, smart and robust addWiki.php"). However, there are several big differences:

Wikia | Next-generation Incubator
Any language. | Only eligible languages approved by Langcom.
Any topic. | Only Wikipedia, Wiktionary, Wikiquote, Wikibooks, Wikivoyage, Wikinews, and maybe Wikisource and Wikiversity.
Any number of wikis in every language. | One wiki per language.
Any web user can create a new wiki. | Only people with permission granted by the Langcom can create a new wiki.

Getting the WMF to provide easy creation of a wiki for any topic in any language is a curious and valid idea, but it's completely out of the scope of this task.

Micru added a subscriber: Micru. Nov 9 2018, 4:21 PM
Liuxinyu970226 added a comment. Edited Sat, Nov 17, 9:23 AM

I think we'd better not discuss anything about renaming domains here, because that isn't Incubator's concern (@C933103, do you still think that semi-renaming, i.e. closing the old wiki -> exporting to Incubator -> creating the new one -> importing from Incubator, is appropriate?)

@Liuxinyu970226 Not sure why you are asking me this here when you yourself have stated that this seems to be an inappropriate place. Anyway, I think it depends on the project, and the communities of each project might also have different opinions.

But what sort of impact would it have on e.g. user preference or wiki widget?

And also note that different Cantonese wiki projects now seem to have different URL prefixes.

Base added a subscriber: Base. Sun, Nov 18, 12:04 AM

@C933103:

Not sure why you are asking me this here when you yourself have stated that this seems to be an inappropriate place. Anyway, I think it depends on the project, and the communities of each project might also have different opinions.

This is the question I originally wanted to ask you; why are you asking it back to me?

But what sort of impact would it have on e.g. user preference or wiki widget?

Specific issues would be needed to investigate this question...

And also note that different Cantonese wiki projects now seem to have different URL prefixes.

Do you even visit Special:SiteMatrix every day? In theory, that special page would have two entry lines for Cantonese. If not, then what's your concern about this? I don't think that remote users are affected by this unless they're SiteMatrix fans.

@Amire80 Is there anything that the extension can do in this task? It seems to me that this task is more of a process issue than a software issue related to the WikimediaIncubator extension.

Nothing immediate, but it's close enough to the topic.

This project is near the end of the ideation phase and it's moving into the implementation planning phase, and some things may have to be done in it. For example, it may be useful to implement functions to analyze activity of projects in the extension, etc.

Hydriz added a comment. Edited Mon, Nov 26, 10:56 AM

I have looked through all the comments, but I still don't see any actions related to the WikimediaIncubator extension. This task is about Wikimedia Incubator, which fits into incubator.wikimedia.org, but it does not define what needs to be done about the extension. Can you perhaps put a list of items that need to be done by the extension in the task description?

Soon, when we start the technical architecture work. (I'm not totally sure who "we" are, I'm working on it.)

Per last comment, re-add the tag when the technical architecture has actually started.