
Re-run and update the MT service usage report
Open, Medium, Public


An initial report was created to capture how the different translation services were used (T303812). That report informed some changes to the services provided by default for certain language pairs (T309266). Once those changes, and the support that different translation services added for new languages (T307970, T308248), have had time to show up in the data, we may want to regenerate the report with more up-to-date data.

In addition to re-running the report, we may consider some adjustments:

  • Surface optional services that are close to the defaults. The first run focused on optional services that were used more than the defaults. Because a service may get more use simply by being the default, even when it is not the best option, we may want to surface cases where the optional service is used less than the default but is still close to it (e.g., with a 10%-20% difference). This is supported by our experience with Icelandic (36% Flores vs 60% Google as the default) and Basque (43% Elia vs 56% Google as the default), where the non-default options were considered better based on community feedback.
  • Include deletion rates in the MT modifications graph. The graphs at the end of the report show the percentage of translations at each level of modification. This is useful information but can be hard to interpret: is a high share of lightly edited translations a positive sign of good MT quality, or a negative sign that they have not been edited enough? Showing deletion rates in those graphs could help triangulate the different quality aspects.
  • Include more languages in the MT modifications graph. The last graph can also cover more languages, since Flores will support more of them, which enables comparison across more languages that are supported by multiple services.
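The "close to the defaults" adjustment could be sketched roughly as below. This is only an illustration, not the report's actual code: the `PairUsage` record, the `close_to_default` function, and the reading of "difference" as a gap in percentage points are all assumptions. The two rows reuse the Icelandic and Basque figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairUsage:
    """Hypothetical record: usage shares (in percent) for one language."""
    language: str
    default_service: str
    default_share: float
    optional_service: str
    optional_share: float


def close_to_default(rows, max_gap):
    """Return (language, optional service, gap) for optional services whose
    usage share trails the default by at most max_gap percentage points.

    A negative gap means the optional service is used more than the default,
    which is the case the first run already surfaced, so it is included too.
    """
    flagged = []
    for row in rows:
        gap = row.default_share - row.optional_share
        if gap <= max_gap:
            flagged.append((row.language, row.optional_service, gap))
    return flagged


# Figures quoted in the task description; the structure is illustrative.
ROWS = [
    PairUsage("Icelandic", "Google", 60.0, "Flores", 36.0),
    PairUsage("Basque", "Google", 56.0, "Elia", 43.0),
]

if __name__ == "__main__":
    for max_gap in (20, 25):
        print(max_gap, close_to_default(ROWS, max_gap))
```

Under this reading, Basque has a 13-point gap and Icelandic a 24-point gap, so a 20-point cutoff would flag Basque only. The threshold may therefore need tuning if cases like Icelandic should be surfaced as well.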

Event Timeline

Pginer-WMF triaged this task as Medium priority. Jun 16 2022, 9:21 AM
Pginer-WMF moved this task from Backlog to Priority Backlog on the Language-analytics board.