
Re-run and update the MT service usage report
Closed, Resolved (Public)

Description

An initial report was created to capture how the different translation services were used (T303812). That report informed some changes to the services provided by default for certain language pairs (T309266). Once those changes have had time to be reflected in the data, along with the support that different translation services added for new languages (T307970, T308248), we may want to regenerate the report with more up-to-date data.

In addition to re-running the report, we may consider some adjustments:

  • Surface optional services that are close to the defaults. In the first run, the focus was on optional services that were used more than the defaults. Because a default service may attract higher usage even when it is not the best option, we may want to surface cases where the optional service is used less than the default but remains close to it (e.g., within a 10%-20% difference). This is supported by our experience with Icelandic (36% Flores vs 60% Google as the default) and Basque (43% Elia vs 56% Google as the default), where the non-default options were considered better based on community feedback.
  • Include deletion rates in the MT modifications graph. The graphs at the end of the report indicate the percentage of translations with different levels of modifications. This is useful information but may be hard to interpret: is a high number of lightly edited translations a positive signal of good MT quality, or a negative signal that they have not been edited enough? Showing the deletion rates in those graphs could help to better triangulate the different quality aspects.
  • Include more languages in the MT modifications graph. More languages can be included in the last graph since more will be supported by Flores, which enables comparisons across more languages that are supported by multiple services.
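
The "close to the default" criterion from the first adjustment can be sketched as a small check. This is only an illustrative sketch, not the method used in the report: the function name close_optional_services and the max_gap parameter are hypothetical, and it assumes the "10%-20% difference" is measured in percentage points of usage share. The input shares are the Icelandic and Basque figures quoted above.

```python
def close_optional_services(usage, default, max_gap=20.0):
    """Return optional services that are used more than the default,
    or less than it by at most max_gap percentage points.

    usage:   dict mapping service name -> share of translations (percent)
    default: name of the default service for the language pair
    max_gap: hypothetical threshold, in percentage points (an assumption)
    """
    default_share = usage[default]
    flagged = []
    for service, share in usage.items():
        if service == default:
            continue
        gap = default_share - share  # negative when the optional service leads
        if gap <= max_gap:
            flagged.append((service, share, gap))
    # Smallest gap first, so the closest contenders are listed at the top.
    return sorted(flagged, key=lambda row: row[2])


# Usage shares quoted in the task description.
basque = {"Google": 56.0, "Elia": 43.0}
icelandic = {"Google": 60.0, "Flores": 36.0}

print(close_optional_services(basque, "Google"))
print(close_optional_services(icelandic, "Google"))
print(close_optional_services(icelandic, "Google", max_gap=25.0))
```

Under this reading, Basque (a 13-point gap) is surfaced with a 20-point threshold, while Icelandic (a 24-point gap) is only surfaced once the threshold is widened, so the exact cutoff and the definition of "difference" would need to be agreed before applying such a filter.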

Result: Published report

Event Timeline

Pginer-WMF triaged this task as Medium priority. Jun 16 2022, 9:21 AM
Pginer-WMF moved this task from Backlog to Priority Backlog on the Language-analytics board.

We received feedback indicating that NLLB-200 was perceived as better than the current default for Bashkir (Yandex). So we may want to consider this input as we analyze the results of the new report and propose additional adjustments for the default services.

MNeisler subscribed.

@Pginer-WMF Here is the updated machine translation service report for your review. Please let me know if you have any questions or changes you'd like me to incorporate.

Reviewed timeframe: The report now reflects translation data logged from 1 February 2022 through 20 September 2022. Where relevant, the report provides pre- and post-deployment comparisons for the various machine translation service changes.

Some key findings:

  • Overall changes in machine translation service usage:
    • Since the first report was run, the biggest changes in usage were observed for the NLLB-200 (5 percentage point increase [0.95% to 6.05%]) and Apertium (4.3 percentage point decrease [6.64% to 2.31%]) machine translation services.
    • There was also a decrease in the percent of published translations started from scratch (no machine translation content was used) (2.7 percentage point decrease [6.85% to 4.17%]).
  • I identified 18 language pairs where an optional MT service is used close to (within a 20% difference) or more than the current default service. This is based on machine translation service usage after the initial set of defaults was adjusted in T309266.
  • The changes in machine translation service availability and default settings are reflected in the data (we see the expected increases or decreases where the changes occur).
  • We now have sufficient data to review deletion rates for NLLB-200 in comparison to other MT services. I added a section to the report, "Percent of articles that are created with each MT service and deleted", that provides an overall comparison of deletion ratios for each machine translation service as well as per-wiki deletion ratios for all NLLB-supported languages. Please let me know if there are any other sets of languages that would be worth comparing further.
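
As a quick check of the arithmetic behind the usage changes listed above, the percentage point differences can be recomputed from the before and after shares. This is a minimal sketch using only the figures quoted in this comment; the helper name percentage_point_change is hypothetical and is not taken from the report's code.

```python
def percentage_point_change(before, after):
    """Difference between two percentage shares, in percentage points."""
    return round(after - before, 2)


# (share in the first report, share in the updated report), in percent.
usage_shares = {
    "NLLB-200": (0.95, 6.05),
    "Apertium": (6.64, 2.31),
    "Started from scratch": (6.85, 4.17),
}

for name, (before, after) in usage_shares.items():
    change = percentage_point_change(before, after)
    print(f"{name}: {change:+.2f} percentage points")
```

The results (+5.10, -4.33 and -2.68) agree with the rounded values of 5, 4.3 and 2.7 percentage points given in the findings.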

We received feedback indicating that NLLB-200 was perceived as better than the current default for Bashkir (Yandex). So we may want to consider this input as we analyze the results of the new report and propose additional adjustments for the default services.

Yandex still accounts for the majority of translations at Bashkir (93%), but usage of NLLB-200 is increasing. About 6% of published translations into that language have been created using NLLB-200 since it was deployed in June 2022 as a non-default service for the language. To date, NLLB-200 has been used for the Russian (ru) to Bashkir (ba) language pair (6.5%, 45 translations) and the egl to Bashkir (ba) language pair (50%, 1 translation).

Per Language Pair Analysis Data
Please refer to the Google spreadsheet for details on machine translation service usage and deletion rates by language pair. Key observations and overall findings for each MT service are summarized in the report, but the spreadsheet can be used if you'd like to review data for any specific language pair.