
define standard turn-around times for language code additions for Wikidata
Open, Needs Triage, Public

Description

To better manage people's expectations, maybe we should try to come up with good estimates about how long it should take to add new language codes:

Tasks to consider:

  • code for monolingual strings
  • code for lexemes
  • changes in labels for existing codes (language name in English or other languages, not autonym)

Assumptions:

  • request is complete with samples
  • simple code addition
  • basic criteria of Language committee fulfilled (whatever they may be)

Steps to consider and estimates:

Which steps can run in parallel?

Current ranges vary from 1 week to x months even for trivial additions

Event Timeline

Coding done by volunteer -> 1 week after langcom approval (should be shorter, but depends on when the volunteer (usually me) has time). (Doesn't matter which of the three it is.)
Coding done by WMDE: Don't know if they want to spend "expensive" paid resources when it can easily be done by a volunteer
code review: usually done within 1 week after delivery of the code
After code review the deployment is with the next train (code deployment)
Lydia review nowadays not needed as long as there is langcom approval. @Amire80 / @jhsoby can comment on the langcom approval part.
Maybe @Lydia_Pintscher has something to say about the WMDE side (if my assumptions are correct and maybe I missed something)

Yeah as @Mbch331 said, no review needed from my side. If he submits the patch as usual ( ❤ ) it'll be reviewed in less than a week except for some very unusual circumstances. It then goes out within a week with the next deployment train unless it's halted for some exceptional reasons.

I noticed that the quickest step usually is Mbch331.

  • As creation of these codes is a basic Wikidata maintenance step, I suppose we should also plan for when Mbch331 isn't available and it has to be done by WMDE.
  • For code review, I added a few open ones I found into "Wikidata (consider for next sprint)". Is this the correct place or should they be in "Wikidata (in progress)"?

I noticed that the quickest step usually is Mbch331.

  • As creation of these codes is a basic Wikidata maintenance step, I suppose we should also plan for when Mbch331 isn't available and it has to be done by WMDE.

If I'm not around, any other volunteer coder can do this. These are quite simple changes.

  • For code review, I added a few open ones I found into "Wikidata (consider for next sprint)". Is this the correct place or should they be in "Wikidata (in progress)"?

If there is actually a patch, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)) is the correct tag and peer review is the correct status.

Esc3300 updated the task description.
  • As creation of these codes is a basic Wikidata maintenance step, I suppose we should also plan for when Mbch331 isn't available and it has to be done by WMDE.

If I'm not around, any other volunteer coder can do this. These are quite simple changes.

I think primarily all maintenance, development, and code review is done by WMF/WMDE, at least per Jimmy Wales' plan, but maybe we should also consider the hypothesis where code review is done by a volunteer. @Lydia_Pintscher do you have an estimate for time needed for coding?

I agree this is a good idea, I have to admit that we (langcom) are often a bottleneck in this – more often than I would like. So what if we say something like, if nobody hears from Langcom within 2 weeks of a request being posted, a request can be assumed to be ok? What do you think, @Amire80? It would also be nice if we could bring some more langcom people into the loop on these tasks, so that it doesn't all rely on Amir and me – we are both fathers with full-time jobs, so many times these tasks unfortunately don't make it to the top of the pile on a given day.

Sounds reasonable. The process seems more "predictable" since the two of you take care of it. I suppose we agree that Wikidata's needs are very different from Incubator's.

I think primarily all maintenance, development, and code review is done by WMF/WMDE, at least per Jimmy Wales' plan, but maybe we should also consider the hypothesis where code review is done by a volunteer. @Lydia_Pintscher do you have an estimate for time needed for coding?

(Not all coding is done and should be done by WMDE and the same applies for code review.)
In the case of new language codes no coding is involved. It's just adding it to a list, which @Mbch331 does, and then someone from the dev-team just reviews, like any other change.

In the case of new language codes no coding is involved. It's just adding it to a list, which @Mbch331 does, and then someone from the dev-team just reviews, like any other change.

Unless we change it, it still requires developer access to the code. I guess either step could be done by any user with that access. What's the plan if no volunteer is available to do the update? @Lydia_Pintscher

Which steps can run in parallel?

Actually none. It doesn't make sense to start coding before there's approval. If a ticket doesn't get approval by langcom, you've coded something that's never going to be used.
You can't review a code change, without the coding first. Deployment is always done via the weekly code rollouts, these changes don't require immediate rollout. Code can't be rolled out without review, as the change hasn't been approved in Gerrit and won't be merged.
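
Since the steps are strictly sequential, the overall turn-around is simply the sum of the individual steps. A minimal sketch of that arithmetic, assuming the upper-bound figures quoted in this thread (the 2-week langcom window is only a proposal made above, not an agreed rule; the `Step` class and all names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    max_weeks: int  # upper-bound estimate quoted in the thread


# Hypothetical step names; durations are assumptions taken from the comments above.
STEPS = [
    Step("langcom approval", 2),        # proposed window, not an agreed rule
    Step("coding by a volunteer", 1),   # "1 week after langcom approval"
    Step("code review", 1),             # "usually done within 1 week"
    Step("deployment train", 1),        # "goes out within a week"
]


def total_turnaround_weeks(steps):
    """No step can start before the previous one ends, so durations add up."""
    return sum(step.max_weeks for step in steps)


print(total_turnaround_weeks(STEPS))
```

These are worst-case figures for a simple, complete request; if any step finishes early, the total shrinks by the same amount.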

Which steps can run in parallel?

I think there are two possibilities:

  • steps that can't technically be done in parallel
  • steps that can technically be done in parallel

For the latter, there may be reasons to do them before or after some step. Obviously, it doesn't mean they have to be done in parallel.

Current estimate for the entire process seems to be 5 weeks (if you are available). This seems a lot for a somewhat basic task even for the volunteer part of our organization.

Which steps can run in parallel?

I think there are two possibilities:

  • steps that can't technically be done in parallel
  • steps that can technically be done in parallel

For the latter, there may be reasons to do them before or after some step. Obviously, it doesn't mean they have to be done in parallel.

Current estimate for the entire process seems to be 5 weeks (if you are available). This seems a lot for a somewhat basic task even for the volunteer part of our organization.

Usually the code review, test and deployment are all done within one week, not three. Often they even happen in the same week as the coding, so from coding to deployment usually takes one week. And deployment doesn't wait for testing. Once a patch is merged, it will be deployed with the next deployment run (train) whether it's tested or not.

Well, test is after deployment (i.e. ensure it actually works). Ideally this is done by the person who requested the new code.

Well, test is after deployment (i.e. ensure it actually works). Ideally this is done by the person who requested the new code.

There is also a test done by WMDE after the merge, but before the deployment.

This was already mentioned in T284151 as if it's the new rule, but it isn't. Please don't do this. It doesn't create a good impression.

It comes back to the question of what the "basic criteria of Language committee" actually are. If these aren't defined, deadlines cannot be defined either.

The whole point of the Language committee is to prevent hoaxes, invalid languages, and duplicates. We could perhaps codify our process in a way that allows quicker automatic approval of simple cases, but first it has to be codified, and only later it can actually be used.

I suppose you refer to the timeframe for langco review of proposed new codes. I don't think it's entirely new. It is already used for language codes on Wikidata for some time, given the lack of response and sometimes entirely incomprehensible arguments we had during reviews. I think the situation has much improved lately (also thanks to you), but during my recent cleanup I still came across countless requests by contributors that were lost in phabricator without a clear reason. Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

The whole point of the Language committee is to prevent hoaxes, invalid languages, and duplicates.

The question here is mainly about codes for Wikidata, these are generally trivial in nature and I don't think I have seen any of "hoaxes, invalid languages, and duplicates". Even a potential duplicate could eventually be merged and the people involved should be able to assess if the code is technically valid. This is somewhat different from the usual incubator business langco was formed for.

The consensus on Wikidata is that langco review isn't needed, but I think the review (if done in a timely and comprehensible manner) can still help formulate clearer proposals.

The consensus on Wikidata is that langco review isn't needed

If @Lydia_Pintscher and her team don't agree with this, then I can make patches, but they won't approve them without LangCom approval. And without their approval, new codes can't be added.

Let's see what Amire80 suggests how we could improve the process as seen for "Alsatian".

I suppose you refer to the timeframe for langco review of proposed new codes.

Can you please stop writing "langco"? It's really weird. It's "Language committee", and if you really want to abbreviate, then it's Langcom. I haven't seen anyone else writing "langco".

I don't think it's entirely new. It is already used for language codes on Wikidata for some time, given the lack of response and sometimes entirely incomprehensible arguments we had during reviews. I think the situation has much improved lately (also thanks to you), but during my recent cleanup I still came across countless requests by contributors that were lost in phabricator without a clear reason.

They are not countless. Phabricator has a finite number of tasks.

It certainly happens that things get missed, and it's not good. I do Language committee work as a volunteer. I also have a day job and two children. I receive a lot of Phabricator emails, which are very similar to each other, and it happens that I miss some. If you believe my attention is needed somewhere, I'm easy to find on email and Telegram, and a lot of people use this successfully.

Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

See above. Phabricator pings by themselves are not always perfectly efficient.

The whole point of the Language committee is to prevent hoaxes, invalid languages, and duplicates.

The question here is mainly about codes for Wikidata, these are generally trivial in nature and I don't think I have seen any of "hoaxes, invalid languages, and duplicates".

No, they aren't trivial. People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Even a potential duplicate could eventually be merged and the people involved should be able to assess if the code is technically valid. This is somewhat different from the usual incubator business langco was formed for.

Have such merges ever happened? It's not as easy as you make it seem.

It certainly happens that things get missed, and it's not good. [..], I'm easy to find on email and Telegram, and a lot of people use this successfully.

Personally, I'm happy with the current process as re-designed by the language committee not too long ago. The phabricator workboard should help you participate if you choose to do so. I don't think any WMF process can rely on Telegram or other third party tools. We should probably also avoid suggesting that users go through them.

Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

See above. Phabricator pings by themselves are not always perfectly efficient.

The process was eventually followed and the code defined. No input was actually needed (as far as I can tell). The only problem I see was the delay.

People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Do you have any Wikidata related samples for this or is it just an abstract argument? I understand that there might be problems you are facing with incubator, but we should be able to build Wikidata without affecting it.

Even a potential duplicate could eventually be merged [..]

Have such merges ever happened? It's not as easy as you make it seem.

On Wikidata, this is frequently done (one code replaced by another one). The main problem we currently have is that T284808 still hasn't been done (I created the ticket last week, but the problem was spelled out years ago).

I don't think any WMF process can rely on Telegram or other third party tools.

Email is generic enough if you don't want Telegram, which is totally understandable.

Emails from Phabricator are not very efficient.

Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

See above. Phabricator pings by themselves are not always perfectly efficient.

The process was eventually followed and the code defined. No input was actually needed (as far as I can tell). The only problem I see was the delay.

... And the language happens to be valid, but it's not great that it happens outside of process. It's my fault, too, but still. Let's all make our best efforts to respect it.

People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Do you have any Wikidata related samples for this or is it just an abstract argument? I understand that there might be problems you are facing with incubator, but we should be able to build Wikidata without affecting it.

Cannot recall anything from Wikidata, but it happened in lots of other places, and it will happen with Wikidata sooner or later. "Why can it be there if it can't be here" is a very frequent complaint.

Emails from Phabricator are not very efficient.

If you want to participate in a phabricator based process at some point you need to adopt its use. Personally, I think the workboard set up by langcom at Language codes is sufficient. It takes some time to get used to it, but eventually I figured it out. If you don't need the emails from phabricator, you could de-activate them.

Maybe you can explain why https://phabricator.wikimedia.org/T262922 took so long (you were pinged three times over a period of six months).

The process was eventually followed and the code defined. No input was actually needed (as far as I can tell). The only problem I see was the delay.

... And the language happens to be valid, but it's not great that it happens outside of process. It's my fault, too, but still. Let's all make our best efforts to respect it.

Maybe you could explain what part of the process wasn't followed and where this is spelled out. Also, I think we should try to show more respect for users who formulate these requests.

People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Do you have any Wikidata related samples for this or is it just an abstract argument? I understand that there might be problems you are facing with incubator, but we should be able to build Wikidata without affecting it.

Cannot recall anything from Wikidata, but it happened in lots of other places, and it will happen with Wikidata sooner or later. "Why can it be there if it can't be here" is a very frequent complaint.

If there were no cases in a decade, maybe we can look into it when it actually happens.

Emails from Phabricator are not very efficient.

If you want to participate in a phabricator based process at some point you need to adopt its use.

I use Phabricator for a lot of things, and not just this.

Personally, I think the workboard set up by langcom at Language codes is sufficient. It takes some time to get used to it, but eventually I figured it out.

I will try to look at it more frequently.

... And the language happens to be valid, but it's not great that it happens outside of process. It's my fault, too, but still. Let's all make our best efforts to respect it.

Maybe you could explain what part of the process wasn't followed and where this is spelled out. Also, I think we should try to show more respect for users who formulate these requests.

The code was added without Language committee approval. Or did I miss anything?

People with weird ideas may use the presence of a language in Wikidata as "a foot in the door" and demand a whole Wikipedia.

Do you have any Wikidata related samples for this or is it just an abstract argument? I understand that there might be problems you are facing with incubator, but we should be able to build Wikidata without affecting it.

Cannot recall anything from Wikidata, but it happened in lots of other places, and it will happen with Wikidata sooner or later. "Why can it be there if it can't be here" is a very frequent complaint.

If there were no cases in a decade, maybe we can look into it when it actually happens.

The whole point is that we don't want it to happen in the first place.

Look, I understand why it looks inefficient, and maybe even totally unnecessary in a place that calls itself a wiki, which is a site that anyone can edit and where you write first and get things checked later. But the definition of a language is a special layer that has extra sensitivity. Luckily, it's finite, and it's not needed so frequently that the inefficiency grinds everything to a halt.

If we knew what problem you are trying to solve and the time this may take, we could help you address it.

There seem to be aspects that are understood differently by Wikibase developers and the Wikidata community compared to langcom, so the langcom review can appear incomprehensible/irrelevant (e.g. T165648). Also, the process you follow seems unclear (e.g. T252198).

For langcom, it may appear as a mere inefficiency if these aren't reviewed in a timely manner, but to people requesting such codes for endangered languages it may appear particularly insensitive.

Let's just bear in mind that the codes follow a system designed for it and constructive feedback from langcom is always welcome.

@Lydia_Pintscher on some of the open points:

  • what's your view on the triage of tickets? Do you want to do it? Are the various boards/statuses fine with you? There was some debate about it at T168295#7165601
  • how long should it take on Contact_the_development_team?
  • are there others we should enhance/complete?

It seems that the recent new review and triage of the tickets helped move ahead. I added an update about it to the weekly news.

We need to distinguish tickets that are additions of language codes and other language-related tickets. They are not the same and do not follow the same processes etc. This ticket was only about standardizing turn-around times for the addition of new language codes based on its title. For the process around this we have the language code phabricator board for some time now and that seems to have made the process much clearer, more transparent and tickets are in my opinion moving through it swiftly enough. The one thing that takes a bit longer still is getting language committee input. Since none of these tickets are life or death situations I prefer to get meaningful input from people tasked with understanding the complexity of languages in Wikimedia instead of making a change and then having to go through discussions about why it was wrong and undo it because that is more painful. So let's think about how we can help make getting that input easier (and I fear setting a deadline doesn't qualify as making it easier). @Amire80 if anything comes to your mind please let me know.

All other ticket triage related to language or not: generally setting deadlines on tickets, understanding if something is suitable for other contributors to pick up and deciding if a ticket is ready for the development team should be with the team (if in doubt the product manager or tech lead). I think we're spending quite a bit more time and energy in discussions on language processes than is warranted compared to the many other things people want to see done around Wikidata.

@Lydia_Pintscher Looks like we won't get further input.

  • Tasks to consider above are mostly unchanged (obviously there are others, less related to Wikidata). Maybe the common point is that these are all additions or updates to configuration variables or language lists primarily used on Wikidata. Nobody expects T284808 to be fixed within a month.
  • A problem I found when doing maintenance on these is that the requests were distributed in many different projects and states without much consistency. Requests about language code questions at Wikidata weren't touched in months despite being trivial to solve, but just weren't on any radar. I think it's helpful if maintenance sorts them into predetermined projects and statuses regularly. T284856 tries to help with that. The states are now outlined above and if someone from your team checks and sorts them periodically that would simplify things.
  • As there seems to be a preference not to use "due date" for follow up, I stopped using this. The workboard at Language codes is now shorter and makes it easier to follow up anyways.
  • I don't think we can avoid problems like changing "no" to "nb", but I don't recall this being particularly painful or problematic for Wikidata contributors. After all, we are using IETF language tags and content can easily be edited at Wikidata (contrary to interface messages where the language code is carved in stone). Also, T273705 helped me realize that the problems for Incubator with new language codes are practically nil, a view apparently shared by Incubator admins (see here). It can be painful for some people tasked to proceed with some of the steps here to realize that things don't quite advance as much as they like, but I think that happens to all of us once in a while and we should be able to determine how to avoid it going forward. You probably recall that we had three iterations by WMDE to improve the process. That requests entered by langcom members for their own needs get fast-tracked despite being incomplete doesn't really help build trust in their approach.

If WMDE actively seeks grant money to expand Wikidata/Wikibase to other languages, it would probably help to make sure the process discussed here is more clear.