New settings for ContentTranslation on the cswiki
Open, Medium, Public

Description

The cswiki community decided per RfC (https://cs.wikipedia.org/wiki/Wikipedie:%C5%BD%C3%A1dost_o_koment%C3%A1%C5%99/Omezen%C3%AD_pro_ContentTranslation) to restrict usage of ContentTranslation (CX) on the site.

  • The tool should be available only to the autopatrolled user group. It should be disabled for autoconfirmed users.
  • Revert T324721: the original machine translation doesn't need to be modified (this is useful for lists, and we assume that autopatrolled users wouldn't publish low-quality translations).
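
The two requested changes can be modelled as a small publishing check. This is only an illustrative sketch: `PublishPolicy`, `REQUESTED` and `can_publish` are hypothetical names, not ContentTranslation's actual configuration keys or code, and the real change lives in operations/mediawiki-config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishPolicy:
    """Hypothetical model of the requested settings (not real CX config)."""
    allowed_groups: frozenset
    require_mt_modification: bool


# Requested state per the RfC: autopatrolled only, no forced MT modification.
REQUESTED = PublishPolicy(
    allowed_groups=frozenset({"autopatrolled"}),
    require_mt_modification=False,
)


def can_publish(user_groups, mt_was_modified, policy=REQUESTED):
    """Return True if a user in `user_groups` may publish under `policy`."""
    # The user must belong to at least one allowed group.
    if not policy.allowed_groups & set(user_groups):
        return False
    # Optionally require that the machine translation was edited.
    if policy.require_mt_modification and not mt_was_modified:
        return False
    return True
```

Under this sketch, a user who is only autoconfirmed cannot publish, while an autopatrolled user can publish even an unmodified machine translation.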

Related:
T322100

Event Timeline

Pginer-WMF subscribed.

In case it provides some helpful context, I'm sharing some data for translations created on Czech Wikipedia since 2019.

Most of the time, monthly translations are in the 100-300 range, with some spikes of activity at different points in time: January 2021 (44550 translations), June 2021 (536 translations) and November 2023 (315 translations):

monthly-translations-at-top-10-wikipedias-2023-12-18T13-11-34.794Z.jpg (376×875 px, 34 KB)

When looking at the translation activity for the same period grouped by the edit count of the users making those translations, we see that most of the translations are created by users with an edit count of 1K+ edits (green line in the graph below) or 10K+ edits (orange line in the graph below):

monthly-translations-by-user-edit-count-bucket-2023-12-18T13-12-32.036Z.jpg (376×893 px, 51 KB)

The deletion rate for the translations created is normally below 10%, with some significant spikes in 2019 during April (35% of translations deleted) and July (32% of translations deleted). More recently, a spike during October 2023 resulted in 18% of the translations being deleted. Unlike cases in some other wikis, the spikes in deletions do not clearly align with high spikes of activity (which often correspond to contests/campaigns), but it would still be interesting to know whether the community knows potential reasons for the changes in these periods.

monthly-rate-of-deleted-translations-2023-12-18T13-12-14.605Z.jpg (400×875 px, 46 KB)


We consider that the proposed adjustment is not ideal since it is based on the user edit count and not the quality of the contents they produce. We are open to adjusting the tool in the way that best helps the community. Given the community consensus, we'll proceed with the changes proposed and will report back with more data after a period of time to get a sense of the impact.
Meanwhile, feel free to share any thoughts, data and impressions about the quality of the translations that Czech Wikipedia editors create with Content Translation.

Pginer-WMF triaged this task as Medium priority. Dec 18 2023, 3:47 PM

Thanks for the stats.

"We consider that the proposed adjustment is not ideal since it is based on the user edit count and not the quality of the contents they produce."

The proposed adjustment isn't based on edit count. It's based on membership in the autopatrolled user group, which is managed by local 'crats, who assign these rights carefully. They usually assess proposals for promotion to that group made by experienced patrollers or administrators.

We cannot easily measure the quality of the content, and the translation match metric isn't ideal, as shown in the RfC. Conversely, the assignment of autopatrol rights is quite a good measure of the quality of a user's contributions.

The RfC clearly showed unanimous support for the CX restriction, based on various experiences from RC / NP patrolling.

Thanks for the clarification. For some reason, I was thinking of "Extended confirmed users", which is based on edit count instead. In this case, it may be useful to compile some stats about the percentage of current translators that are autopatrolled and the differences in deletion rates for the translations they create.

Looking at the data from the current year (Jan-Nov 2023):

  • Most translations (73%) were created by autopatrollers. Disabling the tool for other groups would have affected the remaining 27% of translations.
  • Translations by autopatrollers have a much lower deletion rate (0.1%) than those by other users (25%).
  • The deletion rate for non-autopatrollers when they create a new article without using Content Translation is similar (25%). This may raise the question of why we would limit only the creation of articles through Content Translation for this group when it does not seem to be more problematic than the alternatives. We may want to check the numbers after the change to try to identify whether we are just redirecting newcomers to start their articles from scratch with identical results (same deletion rates, and same work for reviewers).

Overall the proposal seems likely to have a positive impact in terms of the quality of the translations produced, by excluding the publication of the roughly one quarter of translations (27%) where most deletions happen. The main risks seem related to a potential shift of inexperienced editors to other contribution activities where they are no better equipped to succeed (creating a new article from scratch), and to the fairness of limiting one way of contributing for users who may prefer it, when the numbers don't suggest it has a more negative impact than other ways where such limits do not exist.
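
As a rough check on where the deletions come from, the figures quoted above can be combined arithmetically. This is only a sketch using the rounded percentages from this comment (not the underlying Turnilo data), and the function name `deletion_breakdown` is made up for illustration.

```python
# Rounded figures quoted in the comment above (Jan-Nov 2023).
SHARE_AUTOPATROLLED = 0.73          # share of translations by autopatrollers
SHARE_OTHERS = 0.27                 # share of translations by other users
DELETION_RATE_AUTOPATROLLED = 0.001  # 0.1%
DELETION_RATE_OTHERS = 0.25          # 25%


def deletion_breakdown(share_a, rate_a, share_b, rate_b):
    """Return (overall deletion rate, fraction of deletions from group b)."""
    deleted_a = share_a * rate_a  # deleted translations per translation, group a
    deleted_b = share_b * rate_b  # deleted translations per translation, group b
    overall = deleted_a + deleted_b
    return overall, deleted_b / overall


overall_rate, others_share = deletion_breakdown(
    SHARE_AUTOPATROLLED, DELETION_RATE_AUTOPATROLLED,
    SHARE_OTHERS, DELETION_RATE_OTHERS,
)
print(f"Overall deletion rate: {overall_rate:.1%}")
print(f"Deletions from non-autopatrollers: {others_share:.1%}")
```

With these rounded inputs, the overall deletion rate comes out at about 6.8%, and about 99% of deleted translations would come from non-autopatrollers.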


The queries on translation activity, and on deletion rates for translations by autopatrollers, translations by non-autopatrollers, and non-translations by non-autopatrollers, were compiled using Turnilo, which unfortunately has no publicly accessible instance yet. I'm keeping the links to the queries for those with access (or for my own future reference).

Please be respectful of the discussion that took place, the result of which was a clear consensus (autopatrolled). Delaying the request has a negative impact on experienced editors (because of the high machine translation limit).

The discussion was based on the real experiences of users and administrators. In my opinion, this is more meaningful than just analyzing statistics. Specifically, your statistics will only take into account the deletion of the article, not the low quality of the output, which requires extensive editing by experienced editors.

As one of the most active cswiki admins, I agree with Jklamo. Phabricator is only supposed to technically mediate local community decisions. It is not intended to serve as a place for submitting "expedient" alternatives.

I agree that stats don't tell the whole story, and it would be great to have reliable measurements of quality beyond deletions. I was trying to provide additional context by surfacing some stats that may be harder for some people in the community to collect. As I mentioned in my comment (T353049#9414927), "Overall the proposal seems likely to have a positive impact in terms of the quality of the translations produced", so it is consistent with the experience reported by the editors. We'll apply these changes, but it is important for us to use these opportunities to learn in more detail which issues communities face, so that we can make the tool better for Czech and the other 300+ Wikipedias where the tool is available.

Since this request has been around for many months, I would appreciate faster processing of this request.

Change #1025300 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/mediawiki-config@master] ContentTranslation: Update publishing setting for cswiki

https://gerrit.wikimedia.org/r/1025300