Catch cases in which an internally saved translation is significantly shorter than a previously saved translation
Closed, Invalid, Public

Description

People complain about lost translations again, for example https://www.mediawiki.org/wiki/Topic:Soykmh0gsdacf99j

It would be useful if the software could guard itself against this by checking whether a translation that is being saved internally is significantly shorter than a previously saved version of the same translation. A few deleted words are probably not important, but a whole deleted paragraph is worrying, and several deleted paragraphs or text that has disappeared completely is very likely data loss.

This could be used for:

  • prevention of data loss (just don't save this new problematic version and don't overwrite a previous good one)
  • at least, measuring how often it happens (sometimes people report it and sometimes they don't), and setting a goal of reducing or eliminating the phenomenon

It's challenging, but possible and important.
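
A minimal sketch of the check described above, written in Python purely for illustration. The function name `classify_save` and the thresholds `WARN_RATIO`, `BLOCK_RATIO` and `MIN_LENGTH` are assumptions made for this example, not part of the existing software, and the values would need tuning against real saved translations.

```python
from typing import Optional

# Hypothetical thresholds (assumptions for this sketch, not measured values).
WARN_RATIO = 0.8    # below this share of the old length, the save looks suspicious
BLOCK_RATIO = 0.5   # below this share, the save looks like data loss
MIN_LENGTH = 200    # for shorter texts a few deleted words would skew the ratio


def classify_save(previous: Optional[str], new: str) -> str:
    """Compare an incoming translation with the previously saved one.

    Returns "ok", "suspicious" or "likely_loss".
    """
    if not previous or not previous.strip():
        # Nothing was saved before, so nothing can be lost.
        return "ok"

    prev_len = len(previous.strip())
    new_len = len(new.strip())

    if new_len == 0:
        # The text disappeared completely.
        return "likely_loss"
    if prev_len < MIN_LENGTH:
        # Too short for a length ratio to be meaningful.
        return "ok"

    ratio = new_len / prev_len
    if ratio < BLOCK_RATIO:
        return "likely_loss"
    if ratio < WARN_RATIO:
        return "suspicious"
    return "ok"
```

Under these assumptions, a save handler could keep the previous version when the result is "likely_loss" and increment a counter for every result other than "ok", which would cover both uses listed above.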

Event Timeline

Amire80 raised the priority of this task from to High.
Amire80 updated the task description.
Amire80 added subscribers: Arrbee, santhosh, Amire80 and 2 others.
santhosh claimed this task.

In the analysis, we did not find a case where some sections failed to save.
More importantly, we are moving towards per-section saving as part of the ongoing parallel corpora work, so the approach above no longer applies.