
Translation time estimations are very underestimated
Open, Medium, Public, 4 Estimated Story Points, BUG REPORT

Description

Yesterday I translated the https://en.wikipedia.org/wiki/Calcium_lactate article into French, which made me realize that the estimated translation time is probably way too low. The interface estimated a bit less than half an hour (22 minutes if I remember correctly), and it took me several (> 5) hours. Granted, I'm an inexperienced editor, but it made me curious about how the estimate is computed.

My understanding of the code is that the translation time is estimated in https://gerrit.wikimedia.org/g/mediawiki/extensions/ContentTranslation/+/master/app/src/composables/useTranslationSize.js with the assumptions of 5 bytes in a word and 200 words per minute.

This would yield 12,000 words per hour, which is very unrealistic. For reference, https://en.wikipedia.org/wiki/Postediting (which feels like a reasonable approximation of ML-assisted content translation in the extension) gives estimates of 1,000 words per hour for light post-editing. That is (a) an order of magnitude lower and (b) probably still too optimistic, since ContentTranslation usage is meant to go beyond "light" post-editing (and the tool is probably used by non-professional translators).
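To make the arithmetic concrete, here is a minimal sketch of the estimate as described above, assuming the time is simply size in bytes divided by 5 bytes per word and then by 200 words per minute. The names `BYTES_PER_WORD`, `WORDS_PER_MINUTE`, `estimateMinutes` and `wordsPerHour` are hypothetical and are not copied from useTranslationSize.js.

```javascript
// Hypothetical constants mirroring the assumptions described above;
// they are not taken from the actual ContentTranslation source.
const BYTES_PER_WORD = 5;
const WORDS_PER_MINUTE = 200;

// Estimate translation time in minutes from the article size in bytes.
function estimateMinutes(bytes, wordsPerMinute = WORDS_PER_MINUTE) {
  const words = bytes / BYTES_PER_WORD;
  return words / wordsPerMinute;
}

// Throughput implied by a words-per-minute assumption.
function wordsPerHour(wordsPerMinute = WORDS_PER_MINUTE) {
  return wordsPerMinute * 60;
}

console.log(wordsPerHour()); // 12000
console.log(estimateMinutes(60000)); // 60000 bytes -> 12000 words -> 60 minutes
```

Under these assumptions a 60,000-byte article is estimated at one hour, whereas at 1,000 words per hour the same 12,000 words would take about twelve hours.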

Such a discrepancy feels like it would be discouraging to new editors/translators and reduce the number of returning users.


Update from team discussion on 2025-10-29: We want to revise the estimation algorithm based on historical data of time spent translating. We would like a data analyst to take this on, or at least to advise us on how to do it ourselves.

Event Timeline

Nikerabbit moved this task from Needs Triage to Dashboard on the ContentTranslation board.
Nikerabbit added a subscriber: eamedina.

This article (of unknown reliability) on MT post-editing cites studies that reported between 700 and 1,100 words per hour (with outliers going much higher), which translates to about 12 to 18 words per minute.

If I take the Calcium lactate example from the task description, the source article was 21,618 bytes on the day of the translation. Using our existing estimate of 5 bytes per word and, let's say, 15 words per minute gives an estimate of 288 minutes, or 4h48m, which is getting close to the > 5 hours reported.

While this is somewhat anecdotal, I think moving the words-per-minute estimate to something in that range (12 - 18) would already be a lot more realistic.
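As a rough check of these numbers, the sketch below recomputes the estimate for the 21,618-byte source at the low, middle and high ends of the suggested range, alongside the previous 200 words per minute. The helper names `estimateMinutes` and `formatDuration` are hypothetical, and the formula is assumed to be bytes divided by bytes per word, divided by words per minute.

```javascript
const BYTES_PER_WORD = 5; // assumption taken from the discussion above

// Minutes needed for an article of `bytes` bytes at `wordsPerMinute`.
function estimateMinutes(bytes, wordsPerMinute) {
  return bytes / BYTES_PER_WORD / wordsPerMinute;
}

// Render a duration, rounded to whole minutes, as e.g. "4h48m".
function formatDuration(minutes) {
  const total = Math.round(minutes);
  const h = Math.floor(total / 60);
  const m = String(total % 60).padStart(2, "0");
  return `${h}h${m}m`;
}

const sourceBytes = 21618; // Calcium lactate source size quoted above
for (const wpm of [12, 15, 18, 200]) {
  console.log(`${wpm} wpm: ${formatDuration(estimateMinutes(sourceBytes, wpm))}`);
}
// 12 wpm: 6h00m
// 15 wpm: 4h48m
// 18 wpm: 4h00m
// 200 wpm: 0h22m
```

The 200 words per minute result matches the roughly 22 minutes reported in the task description, and the 12 to 18 range brackets the reported "> 5 hours" from both sides.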

Change #1185214 had a related patch set uploaded (by Sbisson; author: Sbisson):

[mediawiki/extensions/ContentTranslation@master] Reduce estimated translation speed to 15 words per minute

https://gerrit.wikimedia.org/r/1185214

Change #1185214 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Reduce estimated translation speed to 15 words per minute

https://gerrit.wikimedia.org/r/1185214

Change #1186648 had a related patch set uploaded (by Sbisson; author: Sbisson):

[mediawiki/extensions/ContentTranslation@master] CX3 Build 1.0.0+20250909

https://gerrit.wikimedia.org/r/1186648

Change #1186650 had a related patch set uploaded (by Sbisson; author: Sbisson):

[mediawiki/extensions/ContentTranslation@wmf/1.45.0-wmf.18] CX3 Build 1.0.0+20250909

https://gerrit.wikimedia.org/r/1186650

Change #1186648 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] CX3 Build 1.0.0+20250909

https://gerrit.wikimedia.org/r/1186648

Change #1186650 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.45.0-wmf.18] CX3 Build 1.0.0+20250909

https://gerrit.wikimedia.org/r/1186650

Mentioned in SAL (#wikimedia-operations) [2025-09-10T13:54:25Z] <kartik@deploy1003> sbisson, kartik: Backport for [[gerrit:1186650|CX3 Build 1.0.0+20250909 (T374886 T394998 T399122 T399125 T399133 T403730 T404045 T404093)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-09-10T14:12:05Z] <kartik@deploy1003> Finished scap sync-world: Backport for [[gerrit:1186650|CX3 Build 1.0.0+20250909 (T374886 T394998 T399122 T399125 T399133 T403730 T404045 T404093)]] (duration: 24m 08s)

Nikerabbit set the point value for this task to 2. Sep 11 2025, 12:19 PM

Following further testing within the team, the general feedback is that the time estimations are still a bit off and discouraging. Moving back for more fine-tuning.

SBisson moved this task from Prioritized to Backlog on the LPL Hypothesis board.
SBisson moved this task from Incoming to Prioritized on the LPL Hypothesis board.
SBisson lowered the priority of this task from High to Medium. Tue, Jan 20, 4:01 PM