
Design and develop API to save the source, initial MT and final translation at section level to parallel corpora table
Closed, Resolved, Public

Event Timeline

santhosh created this task. Dec 2 2015, 6:16 AM
santhosh raised the priority of this task from to Normal.
santhosh updated the task description.
santhosh added subscribers: Pginer-WMF, santhosh, Amire80.
Restricted Application added a subscriber: Aklapper. Dec 2 2015, 6:16 AM
santhosh set Security to None. Dec 3 2015, 10:48 AM
santhosh edited a custom field.

Approach:

  • On edit events (mw.cx.translation.change), mark the section as dirty.
  • In the save event handler (mw.cx.translation.save), collect all dirty sections and POST them to the API as a JSON collection of section ID and section data pairs.
  • After the save, remove the dirty mark from the translation sections.
  • When saving a translation section, collect its source section as well. Once the source section is saved, mark it as saved (an annotation on the section markup will do), so that each source section is saved only once.
  • When a translation is restored, the first save of each translation section will include its source section again, since the source article may have changed.
  • The API should accept POST requests carrying more than one section ID and section data pair, so that a collection of them can be sent in one request.
  • In this iteration, this approach creates technical debt: the draft translation is sent twice, once for the parallel corpora and once for draft translations. This can be addressed in a follow-up that stops using the cx_drafts table.
  • The JSON POST data that the API accepts can carry a type (source/translation) for each item, so that a mix of both can be sent in a single request.
  • In addition to mw.cx.translation.save, we also want to send the initial translations to the corpora. For that we can use mw.cx.translation.postMT events.
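The dirty-tracking and batching steps in the list above could be sketched roughly as follows. This is only an illustration under assumptions, not the code from the merged patch: the class name, method names, field names (sectionId, type, content) and the Section shape are all hypothetical, and the wiring to the mw.cx.translation.* events and to the actual API call is left out.

```typescript
// Hypothetical sketch of the approach described in the task.
// None of these names come from the ContentTranslation code base.

type UnitType = "source" | "translation";

// One item of the JSON collection posted to the API (assumed shape).
interface CorporaUnit {
  sectionId: string;
  type: UnitType;
  content: string;
}

// Minimal view of a translated section (assumed shape).
interface Section {
  id: string;
  sourceHtml: string;
  translationHtml: string;
}

class DirtySectionTracker {
  private dirty: { [id: string]: true } = {};
  private sourceSaved: { [id: string]: true } = {};

  // Call from the edit-event handler: the section needs saving.
  markDirty(sectionId: string): void {
    this.dirty[sectionId] = true;
  }

  // Call after a draft is restored: the source article may have changed,
  // so the next save of each section sends its source again.
  resetSourceSaved(): void {
    this.sourceSaved = {};
  }

  // Call from the save-event handler: build one mixed batch of
  // source and translation units for all dirty sections.
  collect(sections: { [id: string]: Section }): CorporaUnit[] {
    const units: CorporaUnit[] = [];
    const ids = Object.keys(this.dirty);
    for (let i = 0; i < ids.length; i++) {
      const section = sections[ids[i]];
      if (!section) {
        continue; // section no longer exists; nothing to send
      }
      if (!this.sourceSaved[section.id]) {
        units.push({ sectionId: section.id, type: "source", content: section.sourceHtml });
      }
      units.push({ sectionId: section.id, type: "translation", content: section.translationHtml });
    }
    return units;
  }

  // Call once the save has gone through: clear the dirty marks and
  // remember which source sections have been stored.
  markSaved(units: CorporaUnit[]): void {
    for (let i = 0; i < units.length; i++) {
      const unit = units[i];
      if (unit.type === "source") {
        this.sourceSaved[unit.sectionId] = true;
      } else {
        delete this.dirty[unit.sectionId];
      }
    }
  }
}
```

In this sketch the saved marker for source sections is kept in memory rather than as an annotation on the section markup, which is a simplification of what the list suggests. The array returned by collect() would be serialized with JSON.stringify and sent in a single POST request.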
Amire80 moved this task from Needs Triage to CX7 on the ContentTranslation board.
santhosh moved this task from Backlog to In Review on the LE-CX7-Sprint 4 board.

Change 257283 had a related patch set uploaded (by Santhosh):
Parallel corpora: Implement storage

https://gerrit.wikimedia.org/r/257283

Change 257283 merged by jenkins-bot:
Parallel corpora: Implement storage

https://gerrit.wikimedia.org/r/257283

Nikerabbit moved this task from In Review to QA on the LE-CX7-Sprint 4 board. Dec 17 2015, 10:34 AM
santhosh moved this task from Backlog to In Progress on the LE-CX8-Sprint 1 board. Jan 18 2016, 6:58 AM
Amire80 moved this task from CX7 to CX8 on the ContentTranslation board. Jan 20 2016, 11:14 PM
Amire80 moved this task from CX8 to CX7 on the ContentTranslation board.

In this iteration, the above approach creates technical debt: the draft translation is sent twice, once for the parallel corpora and once for draft translations. This can be addressed in a follow-up that stops using the cx_drafts table.

See T124399: Migrate the draft translation restore mechanism to use data from cx_corpora table

santhosh closed this task as Resolved. Jan 22 2016, 10:25 AM

Parallel text at section level is now being captured. See https://www.mediawiki.org/wiki/Content_translation/Published_translations

santhosh moved this task from In Progress to QA on the LE-CX8-Sprint 1 board. Jan 22 2016, 10:25 AM
Arrbee moved this task from QA to Done on the LE-CX8-Sprint 1 board. Jan 27 2016, 12:08 PM