CX2: Infrastructure for section-level progress calculation
Closed, ResolvedPublic

Assigned To

Authored By

	santhosh
	Apr 4 2017, 6:11 AM

Description

With the new approach of translation units, each section translation unit can report its progress, and the translation controller needs to sum these values up and save the result. There is no progress bar in the translation view to present the progress, but the value is used for the dashboard.
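
As a rough illustration of the summing step, here is a minimal sketch in Python (not the extension's actual JavaScript). The `SectionProgress` shape, the field names, and the choice to weight sections by source length are all assumptions made for the example; the task does not specify how the controller combines section values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SectionProgress:
    """Progress reported by one section translation unit (hypothetical shape)."""
    source_length: int      # size of the source section, e.g. in characters
    translated_length: int  # how much of that section has been translated


def total_progress(sections: Iterable[SectionProgress]) -> float:
    """Combine per-section reports into one 0.0-1.0 value for the dashboard.

    Assumption: sections are weighted by their source length.
    """
    sections = list(sections)
    total = sum(s.source_length for s in sections)
    if total == 0:
        return 0.0
    # Cap each section at its own length so one section cannot exceed 100%.
    done = sum(min(s.translated_length, s.source_length) for s in sections)
    return done / total
```

With one fully translated 100-character section and one untouched 300-character section, this returns 0.25.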

Details

	Subject	Repo	Branch	Lines +/-
	Translation progress calculation	mediawiki/extensions/ContentTranslation	master	+563 -5


Related Objects

Status	Assigned	Task
		· · ·
Resolved	santhosh	T152586 Reorganize the CX classes using OOjs/OOUI (tracker)
Resolved	santhosh	T162113 CX2: Infrastructure for section-level progress calculation
		· · ·

Event Timeline

santhosh created this task.Apr 4 2017, 6:11 AM

Amire80 moved this task from Needs Triage to CX2 on the ContentTranslation board.Jun 26 2017, 6:07 AM

Arrbee added a project: ContentTranslation-FY2017-18.Jun 27 2017, 3:21 PM

Arrbee moved this task from To Triage to CX-OOjs on the ContentTranslation-FY2017-18 board.

Pginer-WMF added a project: Language-2018-Jan-Mar.Feb 20 2018, 8:44 AM

Pginer-WMF mentioned this in T190279: CX2: Too much unmodified content warning.Mar 21 2018, 1:19 PM

Pginer-WMF mentioned this in T190283: CX2: Prevent publishing translations with too much unmodified content.Mar 21 2018, 1:59 PM

Pginer-WMF added a project: Language-2018-Apr-June.Mar 28 2018, 10:01 AM

Pginer-WMF removed a project: Language-2018-Jan-Mar.Mar 28 2018, 10:11 AM

Pginer-WMF moved this task from Backlog to Priority backlog on the Language-2018-Apr-June board.Apr 10 2018, 7:21 AM

It was also decided to report MT abuse (MT beyond a threshold) at the section level.

santhosh removed santhosh as the assignee of this task.Apr 26 2018, 6:43 AM

Pginer-WMF raised the priority of this task from Medium to High.Jun 6 2018, 10:41 AM

Arrbee removed a project: ContentTranslation-FY2017-18.Jun 25 2018, 11:22 AM

I have been reading the ve code trying to understand where we could hook. I can imagine two possible approaches.

Diffing

We store the original text [1][2] in an attribute of the section node (and expose it in the data model). When progress is queried, we apply a similarity algorithm to the current text and the stored text. I don't think it will be useful to track this at the sub-section level (e.g. sentence annotations).

[1] Only when an MT provider or the source text is used as the basis; for translations from scratch we don't need to, as the content is 100% user-generated
[2] We can store only the plain text to save space OR start loading the stored MT value from the corpora

Pros

Can reliably calculate the MT progress at any time
Likely simpler to implement.
Can start by storing a hash and only providing boolean value whether the text is modified at all
Stateless, no need to deal with any events

Cons

Similarity algorithms such as Levenshtein distance can be slow – caching can be used
Increased use of network (compression helps a bit) and database storage (unless we start loading the MT section when restoring)
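
The diffing approach could be sketched roughly as follows, in Python rather than the extension's JavaScript. The function names are hypothetical. Note that the similarity measure here is `difflib.SequenceMatcher.ratio()` from the standard library, which is not Levenshtein distance; it stands in for whichever similarity algorithm is eventually chosen. The `lru_cache` decorator illustrates the caching mentioned under Cons.

```python
import hashlib
from difflib import SequenceMatcher
from functools import lru_cache


def text_hash(text: str) -> str:
    """Hash of the stored MT/source text, for the cheap boolean check."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_modified(stored_hash: str, current_text: str) -> bool:
    """First step: only tell whether the text was modified at all."""
    return text_hash(current_text) != stored_hash


@lru_cache(maxsize=256)
def unmodified_ratio(original_text: str, current_text: str) -> float:
    """Similarity between stored and current text, from 0.0 to 1.0.

    1.0 means the text is identical to the stored MT output. An empty
    original means the section was started from scratch, so nothing in it
    counts as unmodified MT.
    """
    if not original_text:
        return 0.0
    matcher = SequenceMatcher(None, original_text, current_text, autojunk=False)
    return matcher.ratio()
```

Because both functions depend only on the stored and current text, they can be called at any time without tracking editor events, which is the "stateless" property listed under Pros.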

Change counting

We hook into the ve.dm.Surface events history or transact, which are related to the undo/redo functionality and document changes. For each event, we identify the affected section and increase its change counter. The progress value is then calculated by subtracting the number of changes, scaled to the section length, from 100%. For example, in a one-word section, one change should be enough to reach at least 50% user-generated content.

Pros

Less additional storage is needed
Faster to calculate

Cons

More complex to implement:
- The undo stack works on the document level. For every change we would need to find the affected section.
- When storing the change counter, if we store it in the node itself, we should avoid generating an endless loop of changes. If it is stored elsewhere, it will complicate the saving/restoring code
- Undo should decrease changes, not increase
Not as reliable. Different kinds of changes are treated as equal (adding a link vs. deleting a significant amount of text).
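
For comparison, the change-counting approach could look roughly like this (again a Python sketch, not VisualEditor code). The class and method names are hypothetical, and measuring section length in words is an assumption taken from the one-word example above. The counter is kept outside the section node, which sidesteps the endless-loop concern but would need separate save/restore handling.

```python
class SectionChangeCounter:
    """Tracks edits to one section and derives the share of unmodified MT."""

    def __init__(self, word_count: int) -> None:
        # Guard against empty sections to avoid division by zero.
        self.word_count = max(word_count, 1)
        self.changes = 0

    def on_change(self) -> None:
        """Call for each document change that falls inside this section."""
        self.changes += 1

    def on_undo(self) -> None:
        """Undo should decrease the counter, never below zero."""
        self.changes = max(self.changes - 1, 0)

    def unmodified_fraction(self) -> float:
        """100% minus the number of changes scaled to the section length."""
        return max(0.0, 1.0 - self.changes / self.word_count)
```

In a one-word section a single change brings the unmodified fraction to 0.0, which satisfies the "at least 50% user-generated" expectation. The sketch also shows the reliability problem: every change counts the same regardless of its size.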

santhosh claimed this task.Jul 6 2018, 9:36 AM

Pginer-WMF added a project: Language-2018-July-September.Jul 6 2018, 9:55 AM

Pginer-WMF moved this task from Backlog to Priority backlog on the Language-2018-July-September board.

Pginer-WMF moved this task from Priority backlog to In Progress on the Language-2018-Apr-June board.Jul 6 2018, 11:13 AM

Change 444208 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] WIP: Progress calculation

https://gerrit.wikimedia.org/r/444208

gerritbot added a project: Patch-For-Review.Jul 9 2018, 12:12 PM

Pginer-WMF moved this task from Priority backlog to In Progress on the Language-2018-July-September board.Jul 10 2018, 6:55 AM

Pginer-WMF removed a project: Language-2018-Apr-June.

• Petar.petkovic moved this task from In Progress to In Review on the Language-2018-July-September board.Jul 11 2018, 10:45 AM

• Petar.petkovic mentioned this in T199823: Some sections are missed while processing saving queue.Jul 17 2018, 6:19 PM

santhosh mentioned this in T200416: CX2: Identify section types to exclude from MT abuse test.Jul 26 2018, 10:51 AM

Change 444208 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Translation progress calculation

https://gerrit.wikimedia.org/r/444208

• Petar.petkovic removed a project: Patch-For-Review.Jul 26 2018, 11:23 AM

• Petar.petkovic moved this task from In Review to QA on the Language-2018-July-September board.

@Santosh - when testing in cx2, I noticed that the calculation of the progress is relative to the amount of translation that is done. e.g.

Translate an article for a big portion of text
Check the progress; the progress bar will have a correct display according to the amount of translated text.
Return to the article and add something little - the progress bar will reset and will display the "new" progress, that little amount that was changed.

In reply to T162113#4455126 by @Etonkovidova (the comment above):

That is a known problem. Set to be solved with T200503.

@santhosh, here is another scenario where progress calculation breaks:

Add two paragraphs
Switch the second paragraph to "Don't use MT"
Return to dashboard after saving

Result - X% translated (200% from MT):

Another case for incorrect calculation:

Screen Shot 2018-07-30 at 4.49.26 PM.png (219×833 px, 60 KB)

Etonkovidova moved this task from QA to In Progress on the Language-2018-July-September board.Jul 30 2018, 11:51 PM

The overall progress calculation patch is not yet merged: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/447583. It contains corrections for the overall translation progress. What has been merged so far is section-level abuse detection.

Arrbee moved this task from In Progress to QA on the Language-2018-July-September board.Aug 3 2018, 7:06 AM

Pginer-WMF renamed this task from CX2: Progress calculation to CX2: Infrastructure for section-level progress calculation.Aug 6 2018, 8:20 AM

Pginer-WMF closed this task as Resolved.Aug 30 2018, 8:22 AM

Pginer-WMF moved this task from QA to Done on the Language-2018-July-September board.

Liuxinyu970226 unsubscribed.Aug 31 2018, 12:04 AM

	F24348173: Screen Shot 2018-07-30 at 4.49.26 PM.png
	Jul 30 2018, 11:50 PM

	F24347924: cx2-200-percent.png
	Jul 30 2018, 9:40 PM
