
Design and develop API to save the source, initial MT and final translation at section level to parallel corpora table
Closed, Resolved (Public)

Event Timeline

santhosh raised the priority of this task from to Medium.
santhosh updated the task description.

Approach:

  • On edit events (mw.cx.translation.change), mark the section dirty.
  • In the save event handler (mw.cx.translation.save), collect all dirty sections and POST them to the API as a JSON collection of section ID and section data pairs.
  • After a save, remove the dirty mark from the saved translation sections.
  • While saving a translation section, collect its source section as well. Once a source section is saved, mark it as saved (an annotation on the section markup will do), so that each source section is saved only once.
  • After a restore, the first save of each translation section will include its source section again, since the source article may have changed.
  • The API should accept POST requests containing more than one section ID and section data pair, so that a collection of them can be sent in one request.
  • In this iteration, this approach creates tech debt: the draft translation is sent twice, once for the parallel corpora and once for draft translations. This can be addressed in a follow-up that stops using the cx_drafts table.
  • The JSON POST data that the API accepts can carry a type (source/translation) for each section, so that a mix of both can be sent in a single request.
  • In addition to mw.cx.translation.save, we also want to send the initial translations to the corpora. For that we can use mw.cx.translation.postMT events.
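The dirty-tracking steps above could be sketched roughly as follows. This is only an illustration, not the code from the linked patch: the names DirtySectionTracker, Section, SectionRecord and every field name are hypothetical, and the task does not specify the actual API module or payload format. The sketch keeps the state in plain objects rather than as annotations on the section markup.

```typescript
// Hypothetical payload shape: one record per section, tagged with its type
// so that source and translation sections can be mixed in a single request.
type SectionType = "source" | "translation";

interface SectionRecord {
  sectionId: string;
  type: SectionType;
  content: string;
}

// Hypothetical in-memory view of a translation unit.
interface Section {
  id: string;
  sourceHtml: string;
  translationHtml: string;
}

class DirtySectionTracker {
  private dirty: { [id: string]: boolean } = {};
  private savedSources: { [id: string]: boolean } = {};

  // Call from the change handler (e.g. on mw.cx.translation.change).
  markDirty(sectionId: string): void {
    this.dirty[sectionId] = true;
  }

  // Call from the save handler: build one collection for all dirty sections.
  // A source section is included only if it has not been saved yet.
  collect(sections: { [id: string]: Section }): SectionRecord[] {
    const records: SectionRecord[] = [];
    Object.keys(this.dirty).forEach((id) => {
      const section = sections[id];
      if (!section) {
        return; // Section no longer exists; nothing to send.
      }
      if (!this.savedSources[id]) {
        records.push({ sectionId: id, type: "source", content: section.sourceHtml });
      }
      records.push({ sectionId: id, type: "translation", content: section.translationHtml });
    });
    return records;
  }

  // Call once the API has confirmed the save: clear the dirty marks and
  // remember which source sections were stored.
  markSaved(records: SectionRecord[]): void {
    records.forEach((record) => {
      if (record.type === "source") {
        this.savedSources[record.sectionId] = true;
      } else {
        delete this.dirty[record.sectionId];
      }
    });
  }

  // Call after restoring a draft, so the next save resends source sections.
  resetSources(): void {
    this.savedSources = {};
  }
}
```

The array returned by collect() could be serialized with JSON.stringify and sent in a single POST request. A real implementation would also have to handle a section that is edited again while a save request is still in flight, which this sketch ignores.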

Change 257283 had a related patch set uploaded (by Santhosh):
Parallel corpora: Implement storage

https://gerrit.wikimedia.org/r/257283

Change 257283 merged by jenkins-bot:
Parallel corpora: Implement storage

https://gerrit.wikimedia.org/r/257283

Amire80 moved this task from CX8 to CX7 on the ContentTranslation board.

In this iteration, the above approach creates tech debt: the draft translation is sent twice, once for the parallel corpora and once for draft translations. This can be addressed in a follow-up that stops using the cx_drafts table.

See T124399: Migrate the draft translation restore mechanism to use data from cx_corpora table