define standard turn-around times for language code additions for Wikidata
Open, Needs Triage · Public

Assigned To

None

Authored By

	Esc3300
	Jun 4 2021, 7:23 AM

Description

To better manage people's expectations, maybe we should try to come up with good estimates about how long it should take to add new language codes:

Tasks to consider:

code for monolingual strings
code for lexemes
changes in labels for existing codes (language name in English or other languages, not autonym)

Assumptions:

request is complete with samples
simple code addition
basic criteria of Language committee fulfilled (whatever they may be)

Steps to consider and estimates:

help users create/complete their request: this can happen on Project Chat or Contact_the_development_team; Wikidata status: incoming, or needs discussion or investigation
Language committee review: TBD (or two weeks without a response)
~~Lydia review~~
coding done by volunteer: (estimate if Mbch331 available) within a week: using Wikidata status ready to go
coding done by WMDE: TBD: using Wikidata status consider for next sprint
code review by WMDE: (estimate) within a week: use tags Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) status peer review and Wikidata status in progress
code deployment: (estimate) within a week
test deployment
(re-)triage of tickets on Language codes, Wikidata and Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)): done by Language codes project members or developers, usually weekly. See T284856 for steps

Which steps can run in parallel?

Current turnaround ranges from 1 week to x months, even for trivial additions.

Related Objects

Mentioned In: T297350: [GOAL] Improve experience around adding new language codes for Wikidata
T168295: Some language codes for sitelinks are capitalised differently from labels
T245927: Wrong Russian translation for ISO 639 language code "mul" (add mul - "несколько языков" to LocalNamesRu)
T284151: Add monolingual language code ca-valencia
T284856: identify maintenance steps for "language code" project
T256649: incorrect English names for languages (they display the native names only)
T281702: Wrong name for Pashto in Swedish in the languages list (Wikidata)
Mentioned Here: T273705: disallow use by Incubator of language codes such as "en-gb", "es-mx", "qqq" etc
T168295: Some language codes for sitelinks are capitalised differently from labels
T284856: identify maintenance steps for "language code" project
T165648: Add monolingual language codes nrf-gg (for Guernésiais), nrf-je (for Jèrriais)
T252198: Add Tundra Yukaghir language (ykg) to monolingual codes
T284808: Add a configuration variable that allows disabling language codes for labels, descriptions, and aliases
T262922: Add monolingual language code gsw-fr // Elsässisch
T284151: Add monolingual language code ca-valencia

Event Timeline

Esc3300 created this task. Jun 4 2021, 7:23 AM

Restricted Application added a subscriber: Aklapper. Jun 4 2021, 7:23 AM

Coding done by volunteer -> 1 week after langcom approval (should be shorter, but depends on when the volunteer (usually me) has time). (Doesn't matter which of the three it is.)
Coding done by WMDE: Don't know if they want to spend "expensive" paid resources when it can easily be done by a volunteer
code review: usually done within 1 week after delivery of the code
After code review, the change is deployed with the next train (code deployment).
Lydia review nowadays not needed as long as there is langcom approval. @Amire80 / @jhsoby can comment on the langcom approval part.
Maybe @Lydia_Pintscher has something to say about the WMDE side (if my assumptions are correct and maybe I missed something)

Yeah as @Mbch331 said, no review needed from my side. If he submits the patch as usual ( ❤ ) it'll be reviewed in less than a week except for some very unusual circumstances. It then goes out within a week with the next deployment train unless it's halted for some exceptional reasons.

Esc3300 updated the task description. Jun 4 2021, 8:13 AM

I noticed that the quickest step is usually Mbch331's.

As creating these codes is a basic Wikidata maintenance step, I suppose we should also plan for when Mbch331 isn't available and it has to be done by WMDE.

For code review, I added a few open ones I found to "Wikidata (consider for next sprint)". Is this the correct place, or should they be in "Wikidata (in progress)"?

Esc3300 updated the task description. Jun 4 2021, 8:23 AM

In T284276#7133623, @Esc3300 wrote:

I noticed that the quickest step is usually Mbch331's.

As creating these codes is a basic Wikidata maintenance step, I suppose we should also plan for when Mbch331 isn't available and it has to be done by WMDE.

If I'm not around, any other volunteer coder can do this. These are quite simple changes.

For code review, I added a few open ones I found to "Wikidata (consider for next sprint)". Is this the correct place, or should they be in "Wikidata (in progress)"?

If there is actually a patch, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) is the correct tag and peer review is the correct status.

Mbch331 updated the task description. Jun 4 2021, 10:31 AM

Esc3300 updated the task description. Jun 7 2021, 10:33 AM

Esc3300 updated the task description.

In T284276#7133853, @Mbch331 wrote:

In T284276#7133623, @Esc3300 wrote:

As creating these codes is a basic Wikidata maintenance step, I suppose we should also plan for when Mbch331 isn't available and it has to be done by WMDE.

If I'm not around, any other volunteer coder can do this. These are quite simple changes.

I think primarily all maintenance, development, and code review is done by WMF/WMDE, at least per Jimmy Wales' plan, but maybe we should also consider the hypothesis where code review is done by a volunteer. @Lydia_Pintscher do you have an estimate for time needed for coding?

@Amire80 @jhsoby what's your view on this?

I agree this is a good idea. I have to admit that we (langcom) are often a bottleneck in this, more often than I would like. So what if we say something like: if nobody hears from Langcom within 2 weeks of a request being posted, the request can be assumed to be OK? What do you think, @Amire80? It would also be nice if we could bring more langcom people into the loop on these tasks, so that it doesn't all rely on Amir and me. We are both fathers with full-time jobs, so these tasks unfortunately often don't make it to the top of the pile on a given day.

Esc3300 mentioned this in T281702: Wrong name for Pashto in Swedish in the languages list (Wikidata). Jun 7 2021, 3:53 PM

Sounds reasonable. The process seems more "predictable" since the two of you take care of it. I suppose we agree that Wikidata's needs are very different from Incubator's.

In T284276#7138007, @Esc3300 wrote:

I think primarily all maintenance, development, and code review is done by WMF/WMDE, at least per Jimmy Wales' plan, but maybe we should also consider the hypothesis where code review is done by a volunteer. @Lydia_Pintscher do you have an estimate for time needed for coding?

(Not all coding is or should be done by WMDE, and the same applies to code review.)
In the case of new language codes no coding is involved. It's just adding it to a list, which @Mbch331 does, and then someone from the dev-team just reviews, like any other change.

Esc3300 updated the task description. Jun 10 2021, 8:57 AM

In T284276#7139740, @Lydia_Pintscher wrote:

In the case of new language codes no coding is involved. It's just adding it to a list, which @Mbch331 does, and then someone from the dev-team just reviews, like any other change.

Unless we change it, it still requires developer access to the code. I guess either step could be done by any user with that access. What's the plan if no volunteer is available to do the update? @Lydia_Pintscher

Which steps can run in parallel?

Actually none. It doesn't make sense to start coding before there's approval. If a ticket doesn't get approval by langcom, you've coded something that's never going to be used.
You can't review a code change without the coding first. Deployment is always done via the weekly code rollouts; these changes don't require immediate rollout. Code can't be rolled out without review, as the change hasn't been approved in Gerrit and won't be merged.
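Because each step waits on the one before it, the worst-case turnaround is simply the sum of the per-step estimates in the task description. A minimal Python sketch, assuming the proposed two-week Language committee window and one week each for volunteer coding, code review and deployment (the step labels and the `worst_case_turnaround` helper are illustrative only, not part of any Wikimedia tooling):

```python
from datetime import timedelta

# Upper-bound estimates taken from the task description. The two-week
# Language committee figure is the proposed "no response" window, not an agreed rule.
SEQUENTIAL_STEPS = [
    ("Language committee review", timedelta(weeks=2)),
    ("coding by volunteer", timedelta(weeks=1)),
    ("code review by WMDE", timedelta(weeks=1)),
    ("code deployment (next train)", timedelta(weeks=1)),
]


def worst_case_turnaround(steps):
    """No step can start before the previous one ends, so durations just add up."""
    return sum((duration for _, duration in steps), timedelta())


total = worst_case_turnaround(SEQUENTIAL_STEPS)
print(f"{total.days // 7} weeks")  # prints "5 weeks"
```

This is an upper bound: as Mbch331 points out later in this thread, review and deployment often happen in the same week as the coding, so real turnaround is usually shorter.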

Which steps can run in parallel?

I think there are two possibilities:

steps that can't technically be done in parallel
steps that can technically be done in parallel

For the latter, there may be reasons to do them before or after some step. Obviously, it doesn't mean they have to be done in parallel.

Current estimate for the entire process seems to be 5 weeks (if you are available). This seems like a lot for a somewhat basic task, even for the volunteer part of our organization.

In T284276#7148299, @Esc3300 wrote:

Which steps can run in parallel?

I think there are two possibilities:

steps that can't technically be done in parallel

steps that can technically be done in parallel

For the latter, there may be reasons to do them before or after some step. Obviously, it doesn't mean they have to be done in parallel.

Current estimate for the entire process seems to be 5 weeks (if you are available). This seems like a lot for a somewhat basic task, even for the volunteer part of our organization.

Usually the code review, test and deployment are all done within one week, not three. Often it's even in the same week the coding is done, so from coding to deployment it usually takes one week. And deployment doesn't wait for testing: once a patch is merged, it will be deployed with the next deployment run (train), whether it's tested or not.

Well, test is after deployment (i.e. ensure it actually works). Ideally this is done by the person who requested the new code.

In T284276#7148337, @Esc3300 wrote:

Well, test is after deployment (i.e. ensure it actually works). Ideally this is done by the person who requested the new code.

There is also a test done by WMDE after the merge, but before the deployment.

Esc3300 updated the task description. Jun 11 2021, 5:41 AM

Esc3300 updated the task description. Jun 11 2021, 7:30 AM

Esc3300 mentioned this in T256649: incorrect English names for languages (they display the native names only). Jun 12 2021, 8:16 AM

Esc3300 mentioned this in T284856: identify maintenance steps for "language code" project. Jun 12 2021, 12:53 PM

Esc3300 mentioned this in T284151: Add monolingual language code ca-valencia. Jun 17 2021, 4:33 AM

Amire80 updated the task description. Jun 17 2021, 4:59 AM

This was already mentioned in T284151 as if it's the new rule, but it isn't. Please don't do this. It doesn't create a good impression.

It comes back to the question of what the Language committee's basic criteria are. If these aren't defined, deadlines cannot be defined either.

The whole point of the Language committee is to prevent hoaxes, invalid languages, and duplicates. We could perhaps codify our process in a way that allows quicker automatic approval of simple cases, but first it has to be codified, and only later it can actually be used.

I suppose you refer to the timeframe for langco review of proposed new codes. I don't think it's entirely new. It has already been used for language codes on Wikidata for some time, given the lack of response and the sometimes entirely incomprehensible arguments we had during reviews. I think the situation has much improved lately (also thanks to you), but during my recent cleanup I still came across countless requests by contributors that were lost in Phabricator without a clear reason. Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

The whole point of the Language committee is to prevent hoaxes, invalid languages, and duplicates.

The question here is mainly about codes for Wikidata; these are generally trivial in nature, and I don't think I have seen any "hoaxes, invalid languages, and duplicates". Even a potential duplicate could eventually be merged, and the people involved should be able to assess whether the code is technically valid. This is somewhat different from the usual Incubator business langco was formed for.

The consensus on Wikidata is that langco review isn't needed, but I think the review (if done in a timely and comprehensible manner) can still help formulate clearer proposals.

In T284276#7160260, @Esc3300 wrote:

The consensus on Wikidata is that langco review isn't needed

If @Lydia_Pintscher and her team don't agree with this, then I can make patches, but they won't approve them without LangCom approval. And without their approval, new codes can't be added.

Let's see what Amire80 suggests for improving the process, as seen with "Alsatian".

In T284276#7160260, @Esc3300 wrote:

I suppose you refer to the timeframe for langco review of proposed new codes.

Can you please stop writing "langco"? It's really weird. It's "Language committee", and if you really want to abbreviate, then it's Langcom. I haven't seen anyone else writing "langco".

I don't think it's entirely new. It has already been used for language codes on Wikidata for some time, given the lack of response and the sometimes entirely incomprehensible arguments we had during reviews. I think the situation has much improved lately (also thanks to you), but during my recent cleanup I still came across countless requests by contributors that were lost in Phabricator without a clear reason.

They are not countless. Phabricator has a finite number of tasks.

It certainly happens that things get missed, and it's not good. I do Language committee work as a volunteer. I also have a day job and two children. I receive a lot of Phabricator emails, which are very similar to each other, and it happens that I miss some. If you believe my attention is needed somewhere, I'm easy to find on email and Telegram, and a lot of people use this successfully.

Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

See above. Phabricator pings by themselves are not always perfectly efficient.

The whole point of the Language committee is to prevent hoaxes, invalid languages, and duplicates.

The question here is mainly about codes for Wikidata; these are generally trivial in nature, and I don't think I have seen any "hoaxes, invalid languages, and duplicates".

No, they aren't trivial. People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Even a potential duplicate could eventually be merged, and the people involved should be able to assess whether the code is technically valid. This is somewhat different from the usual Incubator business langco was formed for.

Have such merges ever happened? It's not as easy as you make it seem.

It certainly happens that things get missed, and it's not good. [..], I'm easy to find on email and Telegram, and a lot of people use this successfully.

Personally, I'm happy with the current process as redesigned by the Language committee not too long ago. The Phabricator workboard should help you participate if you choose to do so. I don't think any WMF process can rely on Telegram or other third-party tools. We should probably also avoid suggesting that users go through them.

Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

See above. Phabricator pings by themselves are not always perfectly efficient.

The process was eventually followed and the code defined. No input was actually needed (as far as I can tell). The only problem I see was the delay.

People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Do you have any Wikidata related samples for this or is it just an abstract argument? I understand that there might be problems you are facing with incubator, but we should be able to build Wikidata without affecting it.

Even a potential duplicate could eventually be merged [..]

Have such merges ever happened? It's not as easy as you make it seem.

On Wikidata, this is frequently done (one code replaced by another one). The main problem we currently have is that T284808 still hasn't been done (I created the ticket last week, but the problem was spelled out years ago).

In T284276#7160307, @Esc3300 wrote:

I don't think any WMF process can rely on Telegram or other third party tools.

Email is generic enough if you don't want Telegram, which is totally understandable.

Emails from Phabricator are not very efficient.

Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

See above. Phabricator pings by themselves are not always perfectly efficient.

The process was eventually followed and the code defined. No input was actually needed (as far as I can tell). The only problem I see was the delay.

... And the language happens to be valid, but it's not great that it happens outside of process. It's my fault, too, but still. Let's all make our best efforts to respect it.

People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Do you have any Wikidata related samples for this or is it just an abstract argument? I understand that there might be problems you are facing with incubator, but we should be able to build Wikidata without affecting it.

Cannot recall anything from Wikidata, but it happened in lots of other places, and it will happen with Wikidata sooner or later. "Why can it be there if it can't be here" is a very frequent complaint.

In T284276#7160314, @Amire80 wrote:

Emails from Phabricator are not very efficient.

If you want to participate in a Phabricator-based process, at some point you need to adopt its use. Personally, I think the workboard set up by langcom at Language codes is sufficient. It takes some time to get used to it, but eventually I figured it out. If you don't need the emails from Phabricator, you could deactivate them.

Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

The process was eventually followed and the code defined. No input was actually needed (as far as I can tell). The only problem I see was the delay.

... And the language happens to be valid, but it's not great that it happens outside of process. It's my fault, too, but still. Let's all make our best efforts to respect it.

Maybe you could explain what part of the process wasn't followed and where this is spelled out. Also, I think we should try to show more respect for users who formulate these requests.

People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Do you have any Wikidata related samples for this or is it just an abstract argument? I understand that there might be problems you are facing with incubator, but we should be able to build Wikidata without affecting it.

Cannot recall anything from Wikidata, but it happened in lots of other places, and it will happen with Wikidata sooner or later. "Why can it be there if it can't be here" is a very frequent complaint.

If there were no cases in a decade, maybe we can look into it when it actually happens.

In T284276#7160323, @Esc3300 wrote:

In T284276#7160314, @Amire80 wrote:

Emails from Phabricator are not very efficient.

If you want to participate in a Phabricator-based process, at some point you need to adopt its use.

I use Phabricator for a lot of things, and not just this.

Personally, I think the workboard set up by langcom at Language codes is sufficient. It takes some time to get used to it, but eventually I figured it out.

I will try to look at it more frequently.

... And the language happens to be valid, but it's not great that it happens outside of process. It's my fault, too, but still. Let's all make our best efforts to respect it.

Maybe you could explain what part of the process wasn't followed and where this is spelled out. Also, I think we should try to show more respect for users who formulate these requests.

The code was added without Language committee approval. Or did I miss anything?

People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Do you have any Wikidata related samples for this or is it just an abstract argument? I understand that there might be problems you are facing with incubator, but we should be able to build Wikidata without affecting it.

Cannot recall anything from Wikidata, but it happened in lots of other places, and it will happen with Wikidata sooner or later. "Why can it be there if it can't be here" is a very frequent complaint.

If there were no cases in a decade, maybe we can look into it when it actually happens.

The whole point is that we don't want it to happen in the first place.

Look, I understand why it looks inefficient, and maybe even totally unnecessary in a place that calls itself a wiki, which is a site that anyone can edit and where you write first and get things checked later. But the definition of a language is a special layer that has extra sensitivity. Luckily, it's finite, and it's not needed so frequently that the inefficiency grinds everything to a halt.

If we knew what problem you are trying to solve and the time this may take, we could help you address it.

There seem to be aspects that are understood differently by Wikibase developers and the Wikidata community compared to langcom, so the langcom review can appear incomprehensible/irrelevant (e.g. T165648). Also, the process you follow seems unclear (e.g. T252198).

For langcom, it may appear as a mere inefficiency if these aren't reviewed in a timely manner, but to people requesting such codes for endangered languages it may appear particularly insensitive.

Let's just bear in mind that the codes follow a system designed for it and constructive feedback from langcom is always welcome.

Esc3300 mentioned this in T245927: Wrong Russian translation for ISO 639 language code "mul" (add mul - "несколько языков" to LocalNamesRu). Jun 21 2021, 8:01 AM

Esc3300 mentioned this in T168295: Some language codes for sitelinks are capitalised differently from labels. Jun 21 2021, 12:18 PM

Esc3300 updated the task description. Jun 22 2021, 12:58 PM

@Lydia_Pintscher on some of the open points:

what's your view on the triage of tickets? Do you want to do it? Are the various boards/statuses fine with you? There was some debate about it at T168295#7165601
how long should it take on Contact_the_development_team?
are there others we should enhance/complete?

It seems that the recent new review and triage of the tickets helped move ahead. I added an update about it to the weekly news.

We need to distinguish tickets that are additions of language codes and other language-related tickets. They are not the same and do not follow the same processes etc. This ticket was only about standardizing turn-around times for the addition of new language codes based on its title. For the process around this we have the language code phabricator board for some time now and that seems to have made the process much clearer, more transparent and tickets are in my opinion moving through it swiftly enough. The one thing that takes a bit longer still is getting language committee input. Since none of these tickets are life or death situations I prefer to get meaningful input from people tasked with understanding the complexity of languages in Wikimedia instead of making a change and then having to go through discussions about why it was wrong and undo it because that is more painful. So let's think about how we can help make getting that input easier (and I fear setting a deadline doesn't qualify as making it easier). @Amire80 if anything comes to your mind please let me know.

All other ticket triage related to language or not: generally setting deadlines on tickets, understanding if something is suitable for other contributors to pick up and deciding if a ticket is ready for the development team should be with the team (if in doubt the product manager or tech lead). I think we're spending quite a bit more time and energy in discussions on language processes than is warranted compared to the many other things people want to see done around Wikidata.

@Lydia_Pintscher Looks like we won't get further input.

Tasks to consider above are mostly unchanged (obviously there are others, less related to Wikidata). Maybe the common point is that these are all additions or updates to configuration variables or language lists primarily used on Wikidata. Nobody expects T284808 to be fixed within a month.
A problem I found when doing maintenance on these is that the requests were distributed across many different projects and states without much consistency. Requests about language code questions on Wikidata weren't touched in months despite being trivial to solve, simply because they weren't on anyone's radar. I think it's helpful if maintenance regularly sorts them into predetermined projects and statuses. T284856 tries to help with that. The states are now outlined above, and if someone from your team checks and sorts them periodically, that would simplify things.
As there seems to be a preference not to use "due date" for follow up, I stopped using this. The workboard at Language codes is now shorter and makes it easier to follow up anyways.
I don't think we can avoid problems like changing "no" to "nb", but I don't recall this being particularly painful or problematic for Wikidata contributors. After all, we are using IETF language tags, and content can easily be edited on Wikidata (unlike interface messages, where the language code is carved in stone). Also, T273705 helped me realize that the problems for Incubator with new language codes are practically nil, a view apparently shared by Incubator admins (see here). It can be painful for some people tasked with some of the steps here to realize that things don't advance as quickly as they would like, but I think that happens to all of us once in a while, and we should be able to determine how to avoid it going forward. You probably recall that we had three iterations by WMDE to improve the process. That requests entered by langcom members for their own needs get fast-tracked despite being incomplete doesn't really help build trust in their approach.

If WMDE actively seeks grant money to expand Wikidata/Wikibase to other languages, it would probably help to make sure the process discussed here is more clear.

Esc3300 updated the task description. Jul 11 2021, 2:43 PM

Manuel mentioned this in T297350: [GOAL] Improve experience around adding new language codes for Wikidata. Aug 3 2022, 1:02 PM

Winston_Sung moved this task from Backlog to Monitoring on the Language codes board. Apr 19 2023, 5:27 PM