Re-run the MT service usage report after MinT is made available to a broad set of languages
Closed, Resolved, Public

Assigned To

Authored By

	Pginer-WMF
	Jun 9 2023, 12:50 PM

Description

After the two previous reports about machine translation usage (T303812, T310773), machine translation support has changed significantly. The introduction of MinT made several translation models, such as NLLB-200 and Opus, available for many languages: some languages got machine translation support for the first time, while others got an alternative that may or may not be preferred over the existing options.

Once the process of enabling languages has stabilized, we may want to re-run the report. This will help us understand which services are used (and whether to consider adjusting the default options) and get a sense of the impact in terms of edits to the initial translation and deletion rates.

The codes for the languages supported by MinT are available in this configuration file.

Result

Report of MT usage
View as HTML (if the link above doesn't work due to nbviewer.org downtime)

The results of this ticket will inform the change of defaults: T341458: Set MinT as the default for languages where it is optional but frequently used

Related Objects

Mentioned In: T370749: Re-run the MT service usage report (2024)
T132542: Collect usage data about Apertium use in Content Translation
T353140: Propose MinT as the default over proprietary services where community feedback on quality does not indicate otherwise
T353145: Potential languages for MinT to be proposed as default, based on Nov 2023 MT usage report
T349688: Analyze translation activity for languages supported by MinT as their only option
T345371: Check and document translation samples for language pairs supported by Softcatalà and NLLB-200
T284905: Softcatalà translator - requested for integration as an MT service for CX
T341458: Set MinT as the default for languages where it is optional but frequently used
T310773: Re-run and update the MT service usage report
Mentioned Here: T341458: Set MinT as the default for languages where it is optional but frequently used
T333969: Enable Opus models for languages lacking other Machine Translation options
T340953: Enable MinT for all the remaining languages supported by NLLB-200
T303812: Identify languages where each MT service is most popular
T310773: Re-run and update the MT service usage report

Event Timeline

Pginer-WMF created this task. Jun 9 2023, 12:50 PM

Restricted Application added a subscriber: Aklapper. Jun 9 2023, 12:50 PM

Pginer-WMF triaged this task as Medium priority. Jun 9 2023, 12:50 PM

Pginer-WMF added a project: Language-Team (Language-2023-April-June).

Pginer-WMF mentioned this in T310773: Re-run and update the MT service usage report.

Pginer-WMF edited projects, added Language-Team (Language-2023-July-September); removed Language-Team (Language-2023-April-June). Jun 28 2023, 8:08 PM

Once the enablements from T340953 and T333969 are completed, we can wait two months and re-run the report.

Pginer-WMF mentioned this in T341458: Set MinT as the default for languages where it is optional but frequently used. Jul 10 2023, 11:16 AM

Pginer-WMF updated the task description. Jul 10 2023, 1:19 PM

Nikerabbit moved this task from Needs Triage to MT on the ContentTranslation board. Aug 28 2023, 11:43 AM

UOzurumba subscribed. Aug 28 2023, 7:37 PM

Pginer-WMF mentioned this in T284905: Softcatalà translator - requested for integration as an MT service for CX. Aug 31 2023, 12:26 PM

Pginer-WMF mentioned this in T345371: Check and document translation samples for language pairs supported by Softcatalà and NLLB-200. Aug 31 2023, 3:28 PM

Pginer-WMF moved this task from Backlog to Priority Backlog on the Language-analytics board. Sep 1 2023, 2:58 PM

MNeisler assigned this task to KCVelaga_WMF. Sep 11 2023, 3:34 PM

MNeisler added a project: Product-Analytics.

KCVelaga_WMF edited projects, added Product-Analytics (Kanban); removed Product-Analytics. Sep 28 2023, 8:20 AM

Pginer-WMF edited projects, added Language-Team (Language-2023-October-December); removed Language-Team (Language-2023-July-September). Sep 28 2023, 9:56 AM

Pginer-WMF moved this task from Quarter Backlog to Priority: Translation on the Language-Team (Language-2023-October-December) board.

Pginer-WMF raised the priority of this task from Medium to High. Oct 2 2023, 9:31 AM

KCVelaga_WMF moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board. Oct 2 2023, 10:21 AM

@Pginer-WMF We spoke about analysing the data from the last two months, which works for most of the report. However, the previous report has a section where the usage stats were compared before and after some changes (new language additions and changes to the default service). Would you be interested in something similar for this report as well, especially since the focus is on MinT? If so, is there a date that can be used as a reference point for the pre/post comparison?

Replying to @KCVelaga_WMF (T338606#9241687):

The previous report has a "Changes in machine translation usage since the first report" section. A similar comparison can be useful to get a sense of how the usage of the different options evolves as new services and more languages are supported. For example, as MinT supports more languages, we expect the percentage of translations started from scratch to decrease, and it would be interesting to check by how much.

The recent expansion, during which MinT was enabled for hundreds of languages, took place between May and July 2023. So I think it makes sense to compare how usage was distributed before and after that period. For the "after" period, it makes sense to use the same range as the overall report (August - September 2023). For the "before" period, I can think of two options:

A period of similar length immediately before the recent enablements started (March - April 2023).
A Year-over-Year (YoY) comparison (August - September 2022), since that period takes place after the enablements of the 2nd report and before the recent set of enablements.

I'd lean towards the YoY comparison since it has the benefit of accounting for seasonality, but I'm happy to go with a more recent period (or something different) if you think it is better.
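The YoY before/after comparison proposed above could be computed roughly as follows. This is a minimal sketch, not the report's actual notebook code: the `Translation` record shape, the service labels, the helper names and the sample data are all hypothetical; only the two date windows come from this discussion.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Translation:
    # Hypothetical record shape, not the schema used in the actual report.
    started: date
    target_lang: str
    service: str  # e.g. "MinT", "Google", or "scratch" (no MT used)


# Comparison windows from the discussion (both ends inclusive).
BEFORE_YOY = (date(2022, 8, 1), date(2022, 9, 30))
AFTER = (date(2023, 8, 1), date(2023, 9, 30))


def service_shares(translations, window):
    """Return each service's share of the translations started within the window."""
    start, end = window
    counts = Counter(t.service for t in translations if start <= t.started <= end)
    total = sum(counts.values())
    return {svc: n / total for svc, n in counts.items()} if total else {}


def share_change(translations, before, after, service):
    """Percentage-point change in one service's share between two windows."""
    b = service_shares(translations, before).get(service, 0.0)
    a = service_shares(translations, after).get(service, 0.0)
    return (a - b) * 100


# Illustrative sample data, not real report figures.
SAMPLE = [
    Translation(date(2022, 8, 10), "ig", "scratch"),
    Translation(date(2022, 9, 1), "ig", "scratch"),
    Translation(date(2023, 8, 5), "ig", "MinT"),
    Translation(date(2023, 9, 20), "ig", "scratch"),
]

if __name__ == "__main__":
    print(share_change(SAMPLE, BEFORE_YOY, AFTER, "scratch"))  # -50.0
```

Comparing shares in percentage points rather than raw counts keeps the two windows comparable even if overall translation volume changed between 2022 and 2023.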

Pginer-WMF mentioned this in T349688: Analyze translation activity for languages supported by MinT as their only option. Oct 25 2023, 10:01 AM

Pginer-WMF updated the task description. Nov 3 2023, 12:56 PM

@Pginer-WMF The report is complete; this is the new link.

KCVelaga_WMF moved this task from Priority: Translation to Done on the Language-Team (Language-2023-October-December) board. Nov 9 2023, 9:57 AM

KCVelaga_WMF updated the task description. Nov 9 2023, 12:02 PM

Pginer-WMF closed this task as Resolved. Nov 11 2023, 2:26 AM

One consideration for the "scratch" concept is that we may want to distinguish two scenarios:

Starting from scratch when other MT services are available. This signals that the user considers the available MT services to be of low quality and prefers not to use them.
Starting from scratch when there is no other option. This just reflects the lack of available MT options. It can still be useful to identify which languages would benefit the most from getting MT, but it does not capture a user preference, since there was no choice. For example, machine translation is not enabled in Content Translation when translating into English, German or Japanese.

A breakdown of the current "scratch" numbers could be useful for the next re-run.
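The proposed split of "scratch" translations could be sketched like this. It is a minimal illustration under assumptions: the availability map, the label names and the helper functions are hypothetical; only the fact that MT is not enabled for English, German and Japanese targets comes from the comment above, and "xx" is a placeholder language code.

```python
from collections import Counter

# Hypothetical availability map: target language -> MT services offered in
# Content Translation. "en", "de" and "ja" are empty because MT is not enabled
# when translating into English, German or Japanese; "xx" is a placeholder.
MT_AVAILABLE = {
    "en": set(),
    "de": set(),
    "ja": set(),
    "xx": {"MinT", "Google"},
}


def classify_scratch(target_lang, service, available=MT_AVAILABLE):
    """Label one translation, splitting "scratch" by whether MT was an option."""
    if service != "scratch":
        return "mt:" + service
    # Languages missing from the map are treated as having no MT (an assumption).
    if available.get(target_lang):
        return "scratch_by_choice"  # MT was offered, but the user did not use it
    return "scratch_no_option"  # no MT service was available for this language


def scratch_breakdown(records, available=MT_AVAILABLE):
    """Count labels over (target_lang, service) pairs."""
    return Counter(classify_scratch(lang, svc, available) for lang, svc in records)


# Illustrative sample data, not real report figures.
SAMPLE = [("en", "scratch"), ("xx", "scratch"), ("xx", "MinT"), ("de", "scratch")]

if __name__ == "__main__":
    print(scratch_breakdown(SAMPLE))
```

With this sample, the two scratch translations into English and German fall into the no-option bucket, while only the one into "xx" counts as a deliberate choice against the available MT.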

Another consideration for the next re-run, based on the feedback we received: it may be relevant to incorporate the perspective of user expertise. Are newcomers or experienced editors using one service more? Are their translations deleted more often?
For example, on wikis where MinT is offered as an optional service, experienced editors may be the ones more likely to find (and be willing to try) it.

I'm not sure what the best way is to capture this so that it yields useful information without fragmenting the data representations too much. We can talk more about this for the next iteration.

KCVelaga_WMF updated the task description. Dec 10 2023, 4:56 AM

KCVelaga_WMF mentioned this in T353145: Potential languages for MinT to be proposed as default, based on Nov 2023 MT usage report. Dec 11 2023, 1:10 PM

KCVelaga_WMF updated the task description. Dec 20 2023, 12:30 PM

Pginer-WMF mentioned this in T353140: Propose MinT as the default over proprietary services where community feedback on quality does not indicate otherwise. Mar 27 2024, 1:43 PM

KCVelaga_WMF mentioned this in T132542: Collect usage data about Apertium use in Content Translation. Jul 9 2024, 7:08 AM

Pginer-WMF mentioned this in T370749: Re-run the MT service usage report (2024). Jul 23 2024, 8:32 AM