Make creating a new Language project easier
Open, Needs Triage, Public

Description

Meta-comments:

  • This task is somewhat similar to T158730, but from the end-user perspective. The end-user here is a person, or a group of people, who want to create a project in a new language, most often a Wikipedia.
  • If I subscribed you to this task, I believe it will interest you. If it doesn't, please feel free to unsubscribe yourself, and accept my apologies.

The current process for creating a wiki in a new language is fully documented at https://meta.wikimedia.org/wiki/Language_proposal_policy , but I'll write something brief and practical here:

  • Make sure you have a standard ISO 639 language code. Don't proceed if you don't.
  • Add a language to translatewiki.net (technically, to UniversalLanguageSelector's langdb and translatewiki.net's LanguageSettings.php)
    • Translate the most-used MediaWiki messages. (About 500.)
  • Add a language to Incubator, by creating a main page at Wp/languagecode (and replace "Wp" with another project code if it's not Wikipedia).
    • Get a lot of people to write a lot of articles. The current threshold for approval is not precisely defined, but a rule of thumb is ~5 people working for three months and several hundred articles.
  • Get the Language committee to approve the project, if the above things were done.
    • The Language committee assesses the fulfillment of the above points, and asks for approval from a third-party expert who knows the language.
  • If all of the above is done, create the project. Task T158730 and the page https://wikitech.wikimedia.org/wiki/Add_a_wiki describe this long and mostly-manual technical process.
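The Incubator step in the list above can be sketched in a few lines. This is only an illustration: `incubator_main_page` is a hypothetical helper, and the shape check (two or three lowercase letters) is an assumption that catches obvious mistakes but does not prove that a code is really registered in ISO 639.

```python
import re


def incubator_main_page(language_code, project_code="Wp"):
    """Build the Incubator main page title for a new test project.

    Assumption: a plausible ISO 639 code is 2 or 3 lowercase ASCII letters.
    This checks the shape only, not whether the code is actually assigned.
    """
    if not re.fullmatch(r"[a-z]{2,3}", language_code):
        raise ValueError(f"{language_code!r} does not look like an ISO 639 code")
    # "Wp" is the Wikipedia prefix; other projects use another project code.
    return f"{project_code}/{language_code}"


print(incubator_main_page("hz"))
```

For the Wp/hz project mentioned later in this discussion, the call above yields the title `Wp/hz`.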

This process could be better.

  • Adding a new language to translatewiki.net usually works well, although there have been several complaints about languages that took months to get added. Usually it should take just a couple of days if the ISO code is valid. Perhaps something could be improved in this process.
  • Getting to the import threshold is a bit harder, however:
    • The Most-used messages list is close to the import threshold of ~490 core messages, but doesn't correspond to it directly, because some of the most-used messages come from extensions. This may cause a situation in which a project has all the most-used messages translated and fulfills all the other Language Committee requirements, but doesn't actually have its messages imported from translatewiki.net to the core MediaWiki code repository. The project in this language will then not have proper localization unless somebody verifies that the language was added to Names.php and the import worked. I usually do it for projects that are about to be created, but there's no proper procedure here, and this could be more automatic.
  • The Incubator is hard to use. It should be possible to start a new independent test wiki for each new language instead of putting them all in one site with cumbersome prefixes. For a much more detailed explanation of this, see: T228745: Allow creating an independent "incubator wiki" instead of hosting all new wikis in one Incubator wiki with prefixes
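To make the import-threshold point concrete, here is a minimal sketch of the kind of check that could be automated. Everything in it is an assumption for illustration: the function name, the idea that both inputs are plain sets of message keys, and the threshold parameter (defaulting to the ~490 figure mentioned above). It does not use any real translatewiki.net or MediaWiki API.

```python
def core_import_status(translated_keys, core_keys, threshold=490):
    """Count translated messages that are core messages and compare to the threshold.

    Messages that come from extensions are ignored, because they do not count
    toward the core import threshold.
    """
    translated_core = set(translated_keys) & set(core_keys)
    count = len(translated_core)
    return {
        "translated_core": count,
        "missing": max(0, threshold - count),
        "meets_threshold": count >= threshold,
    }


# Toy data: three translated messages, one of which comes from an extension.
status = core_import_status(
    translated_keys={"mainpage", "search", "ext-example-message"},
    core_keys={"mainpage", "search", "edit"},
    threshold=3,
)
print(status)
```

In the toy data, all "most-used" messages could be translated while only two of them are core messages, which is exactly the mismatch described in the bullet above.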

The biggest known technical hurdle to implementing this is, again, T158730, but it's certainly not the only one.

Thanks for reading so far. This is a big idea. It will take a long time. This is just the initial exploration. Everybody's thoughts are welcome.

Event Timeline

Amire80 updated the task description. Jul 4 2018, 2:50 PM
jhsoby added a subscriber: jhsoby. Jul 25 2018, 12:34 AM

In principle, I’m interested. I need to understand how it’s going to work.

I'm supportive of Amir's initiative, both as a LangCom member and as a promoter of under-resourced languages. I wouldn't call myself computer-illiterate, but the intricacies of starting a project on Incubator (wt:lag) and of getting a language into Wikidata (ULS, translatewiki requirements and all that) were a learning curve way too steep for my skill set. If a highly educated linguist like me finds it almost too difficult, it must be close to impossible for the language communities themselves. Change is inevitable, so please simplify the processes of the incubator and related activities for supporting under-resourced languages. Thanks!

I gave a talk at Wikimania last week where the Incubator difficulties are part of the narrative: https://commons.wikimedia.org/wiki/File:PG-Slides-Wikimania18.pdf. The project in question is Wp/hz. I'm not a native but have tried to reach out to this community. I'm hz-0.5, trying to get to 1.

TLDR: The vicious circle is: (1) Language in Incubator -> (2) severe restrictions in reading and editing -> (3) No edits -> continue with (1)

Why do we have to first create an editing community before making editing a bit easier? Can Wp/hz be reactivated (the URL is already there), not temporarily but for as long as someone volunteers to remove spam?

I don't get many of the technical details above but I support anything that makes editing in small languages simpler, lets Google index existing articles, and gives communities, however small, the possibility to administer their project.

I also support this new approach to the incubator. Here in Latin America, I've seen firsthand how intimidating and not user-friendly the current incubator platform can be for some interested community members. I think these changes would have a positive effect towards encouraging new groups to participate in Wikipedia in their languages.

alanajjar added a subscriber: alanajjar.

Going back to @Urbanecm's comments above:

With a handful of modifications, that's probably a reasonable way to start. Current Incubator 'crats/sysops would be "stewards" or "global sysops" or something like that within the cluster, and would certainly have 'crat-like rights within the cluster. We would have local sysops; their sysop rights would be shrunk to resemble the current test-admin rights group, and the cluster "global-crats" would assign them for a year, as we do now.

With respect to semi-automatic closures after 1–2 years:

  • Yes, on the whole, we should do that.
  • That said, I've seen some tests on Incubator build up slowly and steadily over longer periods of time. I suppose the question in this case would be: if such an incubation community continues to make steady progress over 2 years, but is not really "ready for prime time" the way we currently view that, do we just let the incubation subdomain stay open, or do we let them become a permanent wiki anyway?
  • The other question to ask is this: Suppose a community goes dormant pretty quickly. If there are only a handful of pages in it, it may not be worth bothering to keep around, and can be deleted outright. (I keep asking myself if I should seek a policy change at Incubator that any test with fewer than n mainspace pages [5? 10?] that is dormant for 2 years [excepting maintenance edits by someone like me] simply gets deleted so we don't have to keep maintaining it.) Still, if a project develops 50–100 pages and then goes dormant, there's probably enough content to make it worth keeping. So do we move it back into Incubator? Let an incubation subdomain continue to exist (but stay dormant)? Create an .xml file to archive, then delete? Move it to Incubator Plus?
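The dormancy rule floated above could be expressed as a small decision function. This is a sketch under stated assumptions, not an agreed policy: `min_pages` stands in for the undecided n (5? 10?), `dormancy_years` for the 2-year window, and the label for larger dormant tests deliberately says "needs-decision", because the questions about moving, archiving or keeping them are still open.

```python
def dormant_test_action(mainspace_pages, years_dormant, min_pages=10, dormancy_years=2):
    """Suggest what to do with a test wiki, following the rule of thumb above.

    Hypothetical parameters:
      min_pages      -- the undecided "n" threshold for outright deletion
      dormancy_years -- how long without non-maintenance edits counts as dormant
    """
    if years_dormant < dormancy_years:
        return "keep-open"
    if mainspace_pages < min_pages:
        return "delete"
    # Enough content to be worth keeping; what exactly to do is undecided.
    return "needs-decision"


print(dormant_test_action(mainspace_pages=4, years_dormant=3))
print(dormant_test_action(mainspace_pages=80, years_dormant=3))
print(dormant_test_action(mainspace_pages=80, years_dormant=1))
```

Keeping the thresholds as parameters makes it easy to try out different values of n before proposing a policy change.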

As I said to @Amire80 on the LangCom discussion board, I strongly favor the portion of this proposal where the projects closest to approvability get moved. There is much upside and little downside to that. The portion that I greatly worry about is that this proposal not turn the new wiki scene into the Wild West that existed here in the 2007–9 era.

Note that there's absolutely NO need to create "temporary" domains for language codes and projects in Incubator. We can just use the existing interwiki prefixes as they work now, as rewrite rules for URLs, and they can already be resolved in the incubator domain and its path structure.
All that is needed is to patch the rewrite rules for their language prefix and project prefix.
This way we could still use normal interwiki links across all projects, so that "lang:Articlename" in any Wikipedia edition or in any Wikipedia incubator subproject will link to "incubator:Wp/lang:Articlename". This would also apply to Wikidata, which could then accept Wikipedia links using "lang:Articlename" instead of "incubator:Wp/lang:Articlename".

In fact, Incubator exists only because there's a need for single user accounts and privileges to manage a large group of wikis with the same rights. The need to use a single database for that is an old requirement. We could just have a configuration in the wiki farm that allows several wikis to share the same user pages and user talk pages, or the same namespaces (such as templates or images, and a specific "Project:"="Incubator:" namespace with its talk page), all stored in a single shared database for "incubator" (which would itself have NO article space and no default talk space, except for user talk pages). The per-language databases would then only need to contain their own article space, the rest being shared (we already benefit from Unified Login across wikis).

We should be able to create wikis (even outside Wikipedia) by specifying the namespaces to use locally or from another shared wiki, and allow "mounting" on a shared wiki several namespaces for articles stored in specific subwikis, mounting them on a common prefix like "Wp/lang/" (which would no longer be part of the "page name").

This would be useful as well to work with some "yearly" wikis, where we open a new specific wiki each year, mounted on a shared root, and whose "mounted" article namespace would be closed and archived the next year. It should not be necessary to have a different domain name for each wiki database, and in fact not even necessary to have multiple database instances (useful for those that want to create multiple wikis on the same domain and in the same database instance).

All that is needed is to extend the concept of "namespaces" (and allow them to be configured to be bound in the wiki using "/subpaths/" or "prefixes:")

@Verdy_p, your proposal sounds far more complicated. Adding URL rewrite rules and a lot of namespaces will make things very different from how usual wikis work. One of the central points of the proposal is making it easy to work with Wikidata and Content Translation, both of which assume that there is one language per wiki. Creating a new wiki shouldn't really be complicated. It's just a matter of automating the current wiki creation procedure, and @Urbanecm says that it's doable.

I have a suggestion related to Incubator: stop limiting the language codes to 3 characters. That way, IETF language code support would be vastly improved.

Dzahn added a subscriber: Dzahn. Aug 1 2018, 7:04 PM

stop limiting the language codes to 3 characters

The following languages with more than 3 characters already exist in production:

~/dns/templates/helpers$ cut -d\' -f2 langs.tmpl | grep -E '^[a-z-]{4,}'
bat-smg
be-tarask
be-x-old
cbk-zam
fiu-vro
map-bms
minnan
nds-nl
roa-rup
roa-tara
simple
zh-cfr
zh-classical
zh-min-nan
zh-yue
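For anyone without access to that repository, the same filter can be sketched in Python. This is a minimal sketch that assumes the codes have already been extracted into a plain list (the layout of langs.tmpl is not shown here); `long_codes` is a hypothetical helper name.

```python
import re

# Same pattern as the grep above: the code starts with at least four
# characters drawn from lowercase letters and hyphens.
LONG_CODE = re.compile(r"[a-z-]{4,}")


def long_codes(codes):
    """Return the codes that the grep pattern above would match."""
    return [code for code in codes if LONG_CODE.match(code)]


sample = ["en", "nds", "simple", "zh-yue", "be-tarask"]
print(long_codes(sample))
```

Note that `re.match` anchors only at the start of the string, which mirrors the `^` in the grep expression.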

KuboF added a subscriber: KuboF. Aug 1 2018, 7:55 PM

My suggestion is not just for Wikimedia wikis. This is a general need for the deployment of various wikis that would like to be more flexible in what is shared and what is not, without necessarily needing a specific domain for each wiki sharing common namespaces (notably "User:" and "User talk:", as well as user preferences for a single registration, and possibly other namespaces like "Template:", "Template talk:", "Module:", "Module talk:", "File:", "File talk:", "Category:", "Category talk:", "Help:", "Help talk:"; with only "Project:" and "Project talk:" being specific, and hosted under their own "interwiki" code).

Namespaces are the basic component to do that, and each namespace can have its own URL rewrite rules and resolution, depending from which namespace it is used.

So this is a desirable goal for MediaWiki itself. It would also address the question of test/incubator wikis in Wikimedia, yearly conference wikis, or specific maintenance.

Basically, each wiki instance needs only 2 namespaces; all other ones (including special namespaces) can be shared on a main instance. And instances do not necessarily need their own database instance (sharing the database instance also allows sharing the SQL admins and privileges for "special" pages, given they are also unified using the same "user (talk):" namespace).

Note: we also need flexibility in how to map translations (also for each namespace from which they are looked up): in namespaces, in page name prefixes, or in "/suffixed" subpages. This would require improving the setup of the "Translate" tool.

Verdy_p added a comment. Edited Aug 2 2018, 11:11 AM

stop limiting the language codes to 3 characters

The following languages with more than 3 characters already exist in production:

~/dns/templates/helpers$ cut -d\' -f2 langs.tmpl | grep -E '^[a-z-]{4,}'
(...)

bat-smg -> aliased to "sgs"
be-tarask -> conforming to BCP 47
be-x-old -> aliased to "be-tarask"
cbk-zam -> should be aliased to ???
fiu-vro -> aliased to "vro"
map-bms -> aliased to "bms"
minnan -> aliased to "nan"
nds-nl -> conforming to BCP 47
roa-rup -> aliased to "rup"
roa-tara -> should be aliased to "it-x-tara"
simple -> should be aliased to "en-x-simple"
zh-cfr -> aliased to "nan"
zh-classical -> aliased to "lzh"
zh-min-nan -> aliased to "nan"
zh-yue -> aliased to "yue"
nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)
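As an illustration of how such aliasing could be applied, here is a minimal lookup sketch. The table copies only some of the entries marked "aliased to" in the list above; the "should be aliased" suggestions are left out because they are proposals, and `resolve_code` is a hypothetical helper, not an existing MediaWiki function.

```python
# Subset of the legacy-domain aliases listed above (illustrative, not complete).
LEGACY_ALIASES = {
    "bat-smg": "sgs",
    "be-x-old": "be-tarask",
    "fiu-vro": "vro",
    "roa-rup": "rup",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
}


def resolve_code(code):
    """Return the aliased code if one is listed, otherwise the code unchanged."""
    return LEGACY_ALIASES.get(code, code)


print(resolve_code("zh-yue"))
print(resolve_code("nds-nl"))
```

Codes that already conform, such as nds-nl, simply pass through unchanged.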

Liuxinyu970226 added a comment. Edited Aug 4 2018, 3:25 AM

@Verdy_p:

cbk-zam -> should be aliased to ???

should be renamed back to cbk, see T124657

map-bms -> aliased to "bms"

Huh? Banyumasan = Bilma Kanuri?

simple -> should be aliased to "en-x-simple"

Just en-simple, no need to use "-x-" here.

nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)

I'm afraid that this is currently being contested at this RFL page.

@Dzahn, in addition to your list, there are also these codes in our Names.php that match your criteria:

'ady-cyrl' => 'адыгабзэ', # Adyghe
'aeb-arab' => 'تونسي', # Tunisian Arabic (Arabic Script)
'aeb-latn' => 'Tûnsî', # Tunisian Arabic (Latin Script)
'bbc-latn' => 'Batak Toba', # Batak Toba
'crh-latn' => "qırımtatarca (Latin)\u{200E}", # Crimean Tatar (Latin)
'crh-cyrl' => "къырымтатарджа (Кирилл)\u{200E}", # Crimean Tatar (Cyrillic)
'de-at' => 'Österreichisches Deutsch', # Austrian German
'de-ch' => 'Schweizer Hochdeutsch', # Swiss Standard German
'de-formal' => "Deutsch (Sie-Form)\u{200E}", # German - formal address ("Sie")
'en-ca' => 'Canadian English', # Canadian English
'en-gb' => 'British English', # British English
'es-419' => 'español de América Latina', # Spanish for the Latin America and Caribbean region
'es-formal' => "español (formal)\u{200E}", # Spanish formal address
'gan-hans' => "赣语(简体)\u{200E}", # Gan (Simplified Han)
'gan-hant' => "贛語(繁體)\u{200E}", # Gan (Traditional Han)
'gom-deva' => 'गोंयची कोंकणी', # Goan Konkani (Devanagari script)
'gom-latn' => 'Gõychi Konknni', # Goan Konkani (Latin script)
'hif-latn' => 'Fiji Hindi', # Fiji Hindi (latin)
'hu-formal' => "magyar (formal)\u{200E}", # Hungarian formal address
'ike-cans' => 'ᐃᓄᒃᑎᑐᑦ', # Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics)
'ike-latn' => 'inuktitut', # Inuktitut, Eastern Canadian (Latin script)
'kbd-cyrl' => 'Адыгэбзэ', # Kabardian (Cyrillic)
'kk-arab' => "قازاقشا (تٴوتە)\u{200F}", # Kazakh Arabic
'kk-cyrl' => "қазақша (кирил)\u{200E}", # Kazakh Cyrillic
'kk-latn' => "qazaqşa (latın)\u{200E}", # Kazakh Latin
'kk-cn' => "قازاقشا (جۇنگو)\u{200F}", # Kazakh (China)
'kk-kz' => "қазақша (Қазақстан)\u{200E}", # Kazakh (Kazakhstan)
'kk-tr' => "qazaqşa (Türkïya)\u{200E}", # Kazakh (Turkey)
'ko-kp' => '조선말', # Korean (DPRK), T190324
'ks-arab' => 'کٲشُر', # Kashmiri (Perso-Arabic script)
'ks-deva' => 'कॉशुर', # Kashmiri (Devanagari script)
'ku-latn' => "kurdî (latînî)\u{200E}", # Northern Kurdish (Latin script)
'ku-arab' => "كوردي (عەرەبی)\u{200F}", # Northern Kurdish (Arabic script) (falls back to ckb)
'nl-informal' => "Nederlands (informeel)\u{200E}", # Dutch (informal address ("je"))
'pt-br' => 'português do Brasil', # Brazilian Portuguese
'ruq-cyrl' => 'Влахесте', # Megleno-Romanian (Cyrillic script)
# 'ruq-grek' => 'Βλαεστε', # Megleno-Romanian (Greek script)
'ruq-latn' => 'Vlăheşte', # Megleno-Romanian (Latin script)
'shi-tfng' => 'ⵜⴰⵛⵍⵃⵉⵜ', # Tachelhit (Tifinagh script)
'shi-latn' => 'Tašlḥiyt', # Tachelhit (Latin script)
'shy-latn' => 'tachawit', # Shawiya (Latin script) - T194047
'skr-arab' => 'سرائیکی', # Saraiki (Arabic script)
'sr-ec' => "српски (ћирилица)\u{200E}", # Serbian Cyrillic ekavian
'sr-el' => "srpski (latinica)\u{200E}", # Serbian Latin ekavian
'tg-cyrl' => 'тоҷикӣ', # Tajiki (Cyrllic script) (default)
'tg-latn' => 'tojikī', # Tajiki (Latin script)
'tt-cyrl' => 'татарча', # Tatar (Cyrillic script) (default)
'tt-latn' => 'tatarça', # Tatar (Latin script)
'ug-arab' => 'ئۇيغۇرچە', # Uyghur (Arabic script) (default)
'ug-latn' => 'Uyghurche', # Uyghur (Latin script)
'uz-cyrl' => 'ўзбекча', # Uzbek Cyrillic
'uz-latn' => 'oʻzbekcha', # Uzbek Latin (default)
'zh-cn' => "中文(中国大陆)\u{200E}", # Chinese (PRC)
'zh-hans' => "中文(简体)\u{200E}", # Mandarin Chinese (Simplified Chinese script) (cmn-hans)
'zh-hant' => "中文(繁體)\u{200E}", # Mandarin Chinese (Traditional Chinese script) (cmn-hant)
'zh-hk' => "中文(香港)\u{200E}", # Chinese (Hong Kong)
'zh-mo' => "中文(澳門)\u{200E}", # Chinese (Macau)
'zh-my' => "中文(马来西亚)\u{200E}", # Chinese (Malaysia)
'zh-sg' => "中文(新加坡)\u{200E}", # Chinese (Singapore)
'zh-tw' => "中文(台灣)\u{200E}", # Chinese (Taiwan)

Frankly, do we really need things like "-formal" and "-informal"? They can't be recognized by browsers, since no browser treats either of them as a country code.

Anyway, the

'eml' => 'emiliàn e rumagnòl', # Emiliano-Romagnolo / Sammarinese

should also be contested because of T36217.

@Verdy_p:

cbk-zam -> should be aliased to ???

should be renamed back to cbk, see T124657

map-bms -> aliased to "bms"

Huh? Banyumasan = Bilma Kanuri?

simple -> should be aliased to "en-x-simple"

Just en-simple, no need to use "-x-" here.

Has the "simple" variant been registered in the IANA database for BCP 47? If not, we need the "-x-" because it is a private extension in Wikimedia.

nrm -> should be first aliased to "nrf", then the "nrm" alias deleted after (mostly) complete migration (and cleanup of Wikidata)

I'm afraid that this is currently being contested at this RFL page.

No. This is the same reason why we need to migrate existing "nrm" data to "nrf", so that Narom can finally be assigned the code (that's why the aliasing redirect can only be temporary, for the duration of the migration). There's nothing to contest here: the request for a new language for Narom is valid and has been pending for too long (it can still be allocated in Incubator for Narom, given that there's no longer any current Norman data in Incubator, except some read-only archives that can be renamed to "nrf" too if anyone still needs them).

But most of the migration work will be in Wikipedia and Wiktionary. I think we can leave aside the migration of user talk pages (users will do this cleanup themselves even if their past links are now broken by going to a Narom page or nowhere instead of the past Norman page). In Wikidata, this migration can be easily automated by bots.

Frankly, do we really need things like "-formal" and "-informal"? They can't be recognized by browsers, since no browser treats either of them as a country code.

And they have absolutely no reason to recognize them as "country codes". (Actually they are "region subtags" and are not restricted to just "country codes": they also include territory/dependency codes from ISO 3166-1 and continental area codes from UN M.49, but exclude some codes from ISO 3166-1. Note that ISO 3166-2 codes are not used at all as region subtags in BCP 47, and that the use of "region codes" is a legacy, deprecated in favor of ISO 639-3 codes for more specific languages already encoded as members of a registered macrolanguage, itself being registered in the IANA database.)

Lots of legacy codes have been kept valid in BCP 47, but the extension mechanism has been simplified and formalized so that fallback resolvers will work as intended. (Fallback mechanisms are not part of ISO 639; they are only specified in BCP 47, which maintains the compatibility that the unstable ISO 639 never preserves. This means that ISO 639 is always unreliable and should never be used as a "normative reference" for our use, only as an "informative" one showing where BCP 47 takes some of its sources. The IANA database is still the only approved normative source of these codes; everything else is private and should use private extension subtags.)

The syntax recognized by BCP 47 parsers with "formal" and "informal", is the one for "registered language variants subtags", they would be valid and accepted by browsers if they were registered in the IANA database.
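To illustrate the difference between being well-formed and being registered, here is a small sketch based on the variant production in RFC 5646 (5 to 8 alphanumerics, or a digit followed by 3 alphanumerics). It checks only the shape of a single subtag; it does not parse whole tags and does not consult the IANA registry, and `is_variant_shaped` is a hypothetical helper name.

```python
import re

# RFC 5646: variant = 5*8alphanum / (DIGIT 3alphanum)
_VARIANT_SHAPE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_variant_shaped(subtag):
    """True if the subtag has the syntactic shape of a BCP 47 variant subtag.

    This says nothing about registration: a subtag can be well-formed
    without appearing in the IANA Language Subtag Registry.
    """
    return _VARIANT_SHAPE.fullmatch(subtag) is not None


for subtag in ["formal", "informal", "simple", "tara", "x"]:
    print(subtag, is_variant_shaped(subtag))
```

So "formal", "informal" and "simple" all have the right shape for a variant subtag, while "tara" (four letters) and "x" do not; whether browsers accept them is then a question of registration, as described above.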

(This discussion of language codes is not really related to the topic of this task. There are better places to have it. Thanks for understanding.)

I have a suggestion related to Incubator: stop limiting the language codes to 3 characters. That way, IETF language code support would be vastly improved.

In general, we are not encouraging that going forward. And you know that, as far as it goes.

Dzahn removed a subscriber: Dzahn. Aug 7 2018, 11:52 PM

The initial discussion in the Incubator wiki mostly supports this idea:
https://incubator.wikimedia.org/w/index.php?title=Incubator:Community_Portal&oldid=4370387#A_proposal_for_a_big_reform_of_the_Incubator

Remaining details to discuss:

  • How vandalism monitoring will work. There is some support for @Urbanecm's initial proposal (a few comments above), but it may need some more details before actually going forward.
  • Detailed steps will have to be discussed: which wikis will be moved out of the current incubator into the new wikis, what to do with the less active Incubator projects, etc.
  • The decision whether to close the Incubator wiki or leave it open is not part of this proposal. For now it stays open.

(Have I forgotten any important points?)

@Amire80, I'd venture that for now, since closing Incubator is not part of the proposal, we also don't need to discuss (yet) what to do with the less active Incubator projects.
In my personal view, there is clear consensus to begin the work to make this happen. Still, while the Incubator community agrees in principle that moving all active test projects into a space like the one we're proposing is a good idea, there is still enough concern about not creating a "Wild West" of new subdomains that we should be focusing on (1) putting the infrastructure in place and (2) deciding on which projects should be moved to test this infrastructure. For the moment, we're still putting new projects in Incubator.
Also, to clarify a couple of other things:

  • At the moment, I don't think we're discussing closing Beta Wikiversity (even in the longer run) and having all new Wikiversity tests start in one of these spaces. Tests nearing approval could be eligible, though. (At the moment, the only one remotely active enough is hewikiversity.)
  • I think a well-developed Wikisource looking to move to a subdomain can also be eligible. But the default for new Wikisources is going to remain Multilingual (Old) Wikisource, for a variety of very good reasons.

Currently @Wolverène opposes this idea, for the following reasons:

  1. It doesn't solve the problem of active and reasonably proposed projects in extinct languages.
  2. It reminds me of the story of Wp/vot, when the "native speaker" wasn't actually one.
  3. How will it make the work easier? A separate URL may also be hard to understand for some newcomers. If they need to learn the wiki markup in an actual Wikipedia, show them the unfortunately forgotten test.wikipedia.org . And, surprisingly, it's easier to delete a problematic project within the framework of the current Incubator, using a bot and without managing domains (if I understand it right).
  4. I've never seen anyone who is really embarrassed about prefixes or categorization. I'm not excluding the possibility that such people exist, but that kind of person probably has difficulties with the wiki markup in general.
  5. I'm feeling like a stupid one because I don't understand what should actually be changed compared to the current situation. I'm wondering how this reform will help the Langcom make decisions about project openings any faster.

I have no idea how to answer their concerns.

Thanks for forwarding this. Where was this posted?

Liuxinyu posted it, not me. But Wolverène posted it at the end of the discussion in Incubator (before my closing section).

Amire80 added a comment. Edited Aug 28 2018, 11:59 AM

Liuxinyu posted it, not me. But Wolverène posted it at the end of the discussion in Incubator (before my closing section).

It's a bit weird, I cannot find it.

But never mind. I'll just reply here.

  1. It doesn't solve the problem of active and reasonably proposed projects in extinct languages.

The question of extinct languages is explicitly not a part of this reform discussion, as I had already written in one of my emails on the Langcom mailing list: https://lists.wikimedia.org/pipermail/langcom/2018-July/002162.html

It's not a problem that this proposal is trying to resolve. It's a matter for policy discussion for the Language committee. This reform is more on the technical side of how Incubator wikis are managed.

  2. It reminds me of the story of Wp/vot, when the "native speaker" wasn't actually one.

This is also a matter of Langcom policy. If there are no native speakers, the incubator wiki is not supposed to be created. If it's created anyway, and no native speakers come along in reasonable time, this wiki should be closed in a fast-track process, as proposed from the beginning.

  3. How will it make the work easier? A separate URL may also be hard to understand for some newcomers.

There's nothing to understand there. A URL is transparent to the editor.

If they need to learn the wiki markup in an actual Wikipedia, show them the unfortunately forgotten test.wikipedia.org .

Wiki markup is not an issue. Prefixes, however, are an issue. See below.

And, surprisingly, it's easier to delete a problematic project within the framework of the current Incubator, using a bot and without managing domains (if I understand it right).

Deleting a whole wiki should be easier once the decision is made to delete it. The proposal suggests from the start to make it easy to delete. @Urbanecm, how difficult is it to delete a wiki? This was done with mo.wikipedia.org recently.

  4. I've never seen anyone who is really embarrassed about prefixes or categorization. I'm not excluding the possibility that such people exist, but that kind of person probably has difficulties with the wiki markup in general.

I worked with a lot of people writing in the Incubator: in Adyghe (ady), Dinka (din), Fon (fon), and in several other languages. It's one of the biggest hurdles for people, and it's completely artificial.

You can also see this discussed at this Wikimania presentation:

https://commons.wikimedia.org/wiki/File:Wikipedia_for_Indigenous_Communities.webm (especially after 27 minutes or so)

  5. I'm feeling like a stupid one because I don't understand what should actually be changed compared to the current situation.

Here are the changes that will benefit the people who read and write in the Incubator:

  1. Many wikis instead of one.
  2. No need to use prefixes.
  3. The possibility to use Wikidata.
  4. The possibility to use Content Translation.
  5. The possibility to search conveniently only in your language.

I'm wondering how this reform will help the Langcom make decisions about project openings any faster.

This is not the intention. The intention is to make it easier to read and write in Incubator projects.

  1. If a new site is to be created for each incubator site, how will WMF turn them into a full site once they become eligible? Last time I heard about it, such redesignation seemed to be very cumbersome, and that's also why wp/yue and wp/nan still haven't been moved to the desired domain names almost a decade after their initial proposal. Will it also take a decade for any new project to get a full site if the proposal is adopted?
  2. Is it going to lengthen the entire wiki creation process, require more bureaucratic processes, and require more manpower to handle each and every application? Now it is incubator→Full site; in the proposal it will be incubator→Experimental site→Full site.
  3. Are those goals unachievable by overhauling Incubator itself? It seems like Wikia is now going to change the URLs of their non-English wikis in order to save SSL certificate costs, moving from URLs in the format zh.community.wikia.com to community.wikia.com/zh, while each of these language editions remains an independent site. Is that not achievable in Incubator?
  4. Likewise, is it possible to make creating such a new experimental site as easy as creating a new wiki site on Wikia?
Amire80 added a comment. Edited Aug 29 2018, 8:00 AM
  1. If a new site is to be created for each incubator site, how will WMF turn them into full sites once they become eligible? Last time I heard about it, such redesignation seemed to be very cumbersome, and that's also why wp/yue and wp/nan still haven't been moved to the desired domain names almost a decade after their initial proposal. Will it also take a decade for any new project to get a full site if the proposal is adopted?

It's a very different kind of redesignation.

Moving zh-yue and zh-min-nan to yue and nan is difficult for reasons that are explained here: T172035: Blockers for Wikimedia wiki domain renaming.

These two domains, as well as several others such as nrm, als, simple, etc., were created many years ago, at a time when there was no Language committee, or the committee was less strict about standard language codes. Now it's much more strict, so it's not supposed to happen with new language codes.

Renaming these non-standard domains is a separate issue and it's not related to the Incubator.

Moving content from an incubator domain to a proper domain is more of a technical issue that should be resolved by engineers when this task is actually executed. Input about this from Ops engineers is already welcome, but at this point I'm still trying to get functional community feedback and not to resolve the technical details.

This is supposed to become easier for everybody: the Incubator site readers, the Incubator site writers, the Language Committee, the engineers who manage domains and wiki installations, the vandalism patrollers, etc. If it makes anything harder for anybody, then it's a no-go, but we are now talking about ideas and not yet about the technical details of the implementation.

  1. Is it going to lengthen the entire wiki creation process, require more bureaucratic processes, and require more manpower to handle each and every application? Now it is incubator→Full site; in the proposal it will be incubator→Experimental site→Full site.

No, this is not the intention at all. Eventually new languages are supposed to appear immediately in their own incubator wiki site without going through incubator.wikimedia.org. So the whole thing with prefixes on page titles and importing will be gone, and this means less bureaucracy, not more.
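
To illustrate what the prefix handling involves today, here is a minimal sketch of splitting an Incubator page title such as "Wp/ady/Some page" into its project, language code, and bare title. The function name `split_incubator_title` is hypothetical and is not part of the WikimediaIncubator extension; "Wp" is the Wikipedia prefix mentioned in the task description, and the other project codes in the table are assumptions added for illustration.

```python
# Hypothetical helper for illustration only; not code from any Wikimedia tool.
# "Wp" comes from the task description; the other codes are assumptions.
PROJECT_CODES = {
    "Wp": "wikipedia",
    "Wt": "wiktionary",
    "Wq": "wikiquote",
    "Wb": "wikibooks",
    "Wn": "wikinews",
    "Wy": "wikivoyage",
}


def split_incubator_title(title):
    """Split 'Wp/ady/Some page' into (project, language code, bare title).

    A title with no page part, such as 'Wp/ady', is treated as the
    main page of that test wiki and yields an empty bare title.
    """
    # Split at most twice so that slashes inside the page title survive.
    parts = title.split("/", 2)
    if len(parts) < 2 or parts[0] not in PROJECT_CODES or not parts[1]:
        raise ValueError(f"not a prefixed Incubator title: {title!r}")
    project = PROJECT_CODES[parts[0]]
    language = parts[1]
    page = parts[2] if len(parts) == 3 else ""
    return project, language, page


if __name__ == "__main__":
    print(split_incubator_title("Wp/ady/Example page"))
```

In a per-language incubator wiki, only the bare title would remain, so writers would never have to type or move pages because of the prefix.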

  1. Are those goals unachievable by overhauling Incubator itself? It seems that Wikia is now going to change the URLs of its non-English wikis in order to save on SSL certificate costs, moving from the format zh.community.wikia.com to community.wikia.com/zh, while each of these language edition sites remains independent. Is that not achievable in Incubator?

Wikia uses completely different software and it has a lot of paid engineers working on this, so it's not really relevant.

  1. Likewise, is it possible to make creating such a new experimental site as easy as creating a new wiki site on Wikia?

Yes, that's kind of the idea: creating a new wiki site will be similarly easy. No waiting for Ops people, no running scripts, no configuring databases by hand - all automatic (that's what @Urbanecm is talking about when he says "we must have nice, smart and robust addWiki.php"). However, there are several big differences:

Wikia | Next-generation Incubator
Any language. | Only eligible languages approved by Langcom.
Any topic. | Only Wikipedia, Wiktionary, Wikiquote, Wikibooks, Wikivoyage, Wikinews, and maybe Wikisource and Wikiversity.
Any number of wikis in every language. | One wiki per language.
Any web user can create a new wiki. | Only people with permission granted by the Langcom can create a new wiki.

Getting the WMF to provide easy creation of a wiki for any topic in any language is a curious and valid idea, but it's completely out of the scope of this task.

Liuxinyu970226 added a comment. Edited Nov 17 2018, 9:23 AM

I think that we had better not discuss anything about renaming domains here, because this isn't Incubator's concern. (@C933103, do you still think that semi-renaming, i.e. closing the old one->exporting to Incubator->creating a new one->importing from Incubator, is appropriate?)

@Liuxinyu970226 Not sure why you are asking me this here when you yourself have stated that this seems to be an inappropriate place. Anyway, I think it depends on the project, and the communities of each project might also have different opinions.

But what sort of impact would it have on e.g. user preference or wiki widget?

And also note that different Cantonese wikiprojects now seem to have different URL prefixes.

Base added a subscriber: Base. Nov 18 2018, 12:04 AM

@C933103:

Not sure why you are asking me this here when you yourself have stated that this seems to be an inappropriate place. Anyway, I think it depends on the project, and the communities of each project might also have different opinions.

That is the question I originally wanted to ask you. Why are you asking it back to me?

But what sort of impact would it have on e.g. user preference or wiki widget?

Answering this question would require investigating specific issues...

And also note that different Cantonese wikiprojects now seem to have different URL prefixes.

Do you even visit Special:SiteMatrix every day? In theory, that special page will have two entry lines for Cantonese. If not, then what's your concern about this? I don't think that remote users are affected by this unless they're SiteMatrix fans.

@Amire80 Is there anything that the extension can do in this task? It seems to me that this task is more of a process issue than a software issue related to the WikimediaIncubator extension.

Nothing immediate, but it's close enough to the topic.

This project is near the end of the ideation phase and it's moving into the implementation planning phase, and some things may have to be done in it. For example, it may be useful to implement functions to analyze activity of projects in the extension, etc.

Hydriz added a comment. Edited Nov 26 2018, 10:56 AM

I have looked through all the comments, but I still don't see any actions related to the WikimediaIncubator extension. This task is about Wikimedia Incubator, which fits into incubator.wikimedia.org, but it is not defining what needs to be done about the extension. Can you perhaps put a list of items that need to be done by the extension in the task description?

Soon, when we start the technical architecture work. (I'm not totally sure who "we" are, I'm working on it.)

Per last comment, re-add the tag when the technical architecture has actually started.

I have been working with some small languages, mainly from the Americas, that wanted to figure out how a new Wikipedia is created. The main problem here is that a lot of languages stay in Incubator per saecula saeculorum, because no one knows about Incubator. So you can have, with a lot of work (not everyone has a broadband connection or time to volunteer), a small community working, but if they don't get results, they quickly get tired. Small languages tend to have a constant number of people volunteering, but they are always the same people, so if someone doesn't even know they can volunteer in their language... A community can't be created if its members don't know each other BEFORE they start working on Wikipedia.

Simplifying the process, so that a new wiki is created first and people then start translating and writing content, should help. The press could cover the birth of the new wiki and attract users... but in Incubator this is impossible.

Sannita added a subscriber: Sannita. Feb 5 2019, 2:05 PM

Thank you for pointing to this proposal. Two points I would like to put across.

1) Language localization threshold is too high
I would like to propose that we halve this requirement, mainly because we do not even have a process to "proofread" the translations. We are setting the bar too high here for a small new language.

2) Working on incubator is very hard
Yes, this cannot be overstressed. I have been working with the Ndebele Wikipedia (nr) on incubator for some time now, and I keep creating articles that I need to move due to prefixes etc.

I would like to propose a pilot where users can request the creation of new projects and are then given a year to grow them. If they fail, then we can close them. I know I can get Ndebele working and growing if it was not on incubator.

Zache added a subscriber: Zache. Apr 18 2019, 8:10 AM

Soon, when we start the technical architecture work. (I'm not totally sure who "we" are, I'm working on it.)

Any update on this, i.e. where to contribute? :)

Any improvement that automates the procedure at https://wikitech.wikimedia.org/wiki/Add_a_wiki is a good step in the direction of implementing this. As the beginning of this task's description says, this task is a version of T158730 that addresses the user-facing aspects of creating new empty wikis, mostly in underresourced "small" languages (they are small only in their presence on the web, but some of them have millions of speakers).

The final step is allowing the creation of Incubators as (almost) independent wikis, and there are some things to figure out and agree upon before this happens, but automating the process is necessary and desirable in any case, so anyone with the relevant knowledge of Ops scripts for wiki creation, or desire to learn them, is welcome to contribute right now.

Susannaanas added a comment. Edited Apr 18 2019, 9:05 AM

I develop the Wikidocumentaries project, which navigates Wikimedia content using Wikidata as the linking structure. Each page represents a topic in Wikidata. The Wikipedia article on the page is displayed in the user's language if it exists, but it can be read in context in another language. That other-language article links to Content Translation, so that it can be translated into the Wikipedia in the user's language if the article is missing there. With the introduction of language codes for Inari and Skolt Saami we now have the opportunity to navigate and display the content in those languages - except for the articles that are in the Incubator.

For our use case, being able to display the article from Incubator (using any tricks, if necessary) and using Content Translation to translate to the user's language (possibly to Incubator) are key in serving small languages.

We will have another layer of difficulty later when we start recording topics in the local Wikibase. These are topics that could be rejected by Wikidata or the Wikipedias. This is why it is a micro history wiki, it welcomes nobodies, margins and minorities. For these topics we create articles locally in the underlying MediaWiki/Wikibase and would also like to use translation tools for them – and link between articles in the different locations. (I will support any project that works with intelligent Wikidata-based links/red links in MediaWiki articles).

https://wikidocumentaries-demo.wmflabs.org/Q1089774?language=en
https://wikidocumentaries-demo.wmflabs.org/Q1089774?language=fi
https://wikidocumentaries-demo.wmflabs.org/Q1089774?language=se Northern Saami, own Wikipedia
https://wikidocumentaries-demo.wmflabs.org/Q1089774?language=sms Skolt Saami, Incubator stub exists

(Excuse the currently unfinished language fallbacks.)

Yupik added a subscriber: Yupik. May 17 2019, 7:44 PM
Yupik added a comment. Edited May 18 2019, 8:32 AM

I've separated out the discussion about creating a new wikiproject for a language and being able to use that language without a wikiproject at T223664, because I believe that these are unnecessarily intertwined.

In my experience, a lot of small and under-resourced language communities would like to contribute to projects like Wikidata or Commons, for instance by tagging and captioning photos or by adding labels, descriptions, and native names in Wikidata, but do not have the resources (human or otherwise) to translate the UI or start up an incubator project. And they'd like to start as soon as possible once they make the decision.

Yet the process right now does not allow them to do this and it is highly non-transparent, even for experienced users. I feel that this process should be disconnected from having to create an incubator project or translating the UI, streamlined, and documented in a community-friendly way to allow these communities to do exactly what they want with the resources they have.

So please feel free to join us at T223664 for a discussion from the viewpoint of lesser-resourced language community members as end users!

To make the discussion more focused, I split the biggest chunk of this task to a separate task: T228745: Allow creating an independent "incubator wiki" instead of hosting all new wikis in one Incubator wiki with prefixes

I didn't subscribe all the people from this task to that one. If you want, you can subscribe to it.

Perhaps it could be unified with T223664.