Page MenuHomePhabricator

Restore using updated content only when no user data is lost, using the original version as a fallback
Closed, ResolvedPublic

Description

When restoring an in-progress translation for which there is a new version of the source article we want to do the following:

  • If the new version of the source can be loaded without causing any data loss to the user translation, load the new version.
  • If the new version cannot be loaded without causing any data loss to the user translation, load the original version and show the warning (T168287).
    • We need to make sure that, as the warning promises, the user can restart the translation to use the newer version (knowing that this implies discarding all the work done so far).

If the user adds a paragraph to the translation, the new version may show updated contents for the original paragraph, but never remove the corresponding paragraph in the translation. If the original paragraph can no longer be mapped (e.g., it was deleted or changed significantly), then the old version (the one the user started the translation with) will be used instead to prevent the user content to disappear. In this way, the source article can be updated but only when it is safe to do so.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 12 2019, 7:41 AM
Pginer-WMF triaged this task as Normal priority.Apr 12 2019, 7:41 AM
Pginer-WMF raised the priority of this task from Normal to High.Apr 12 2019, 7:53 AM
Pginer-WMF updated the task description. (Show Details)
Pginer-WMF moved this task from Needs Triage to CX2 on the ContentTranslation board.

Matching content for restoring is done frontend, on client side. What is proposed in this ticket would look like this:

  1. User selects draft to continue translation
  2. Latest revision of that article is used for source
  3. We go through the list of saved paragraphs which we translated and try matching by comparing source paragraph with saved source for that paragraph
  4. When we find for the first time that any saved paragraph cannot be matched against any source section, we reload the page and append revision=12345 to the URL, where (12345) is the number of revision used to translate paragraph in the past
  5. Since page got reloaded, we go through the list of saved paragraphs again (repeat point 3)

@Pginer-WMF, this reload comes with downsides. I haven't checked, but I think all other UI elements are loaded already and loading incidator is shown when we do the step #3, and when reload comes after that, it becomes very clear. Plus, it takes some seconds of user's time.

@Nikerabbit, @santhosh Any ideas how to make this process work without reloading and without rewriting matching logic in PHP?

@Pginer-WMF, this reload comes with downsides. I haven't checked, but I think all other UI elements are loaded already and loading incidator is shown when we do the step #3, and when reload comes after that, it becomes very clear. Plus, it takes some seconds of user's time.

Ideally avoiding the reloading or limiting it to the document part (not the whole UI) would be preferred. In any case, the warning that is shown (T168287) would help to clarify why the reloading happened to the user, so it seems an acceptable initial solution if it is not easy to prevent it.

Another aspect to consider, is to prevent to create an infinite loop. We need to make sure that if we are not able to load the latest version, we load the original version, and that loading has to complete (i.e., other issues should not trigger another loading attempt).

Like Pau said, it would be nice to limit the "reloading" part only to the content area, not the whole page. That will likely require some rewriting in the init logic.

I'd also be okay with loading the original revision, and then doing a background check whether it could be "rebased" on a later version cleanly, and if so prompt the user in a non-disturbing way. I'm sure that was an option we considered, but I'm not sure whether that was discussed and/or discarded.

I'd also be okay with loading the original revision, and then doing a background check whether it could be "rebased" on a later version cleanly, and if so prompt the user in a non-disturbing way. I'm sure that was an option we considered, but I'm not sure whether that was discussed and/or discarded.

It is a good question to consider which should be the default behaviour and who to tax with an additional decisions.

The proposed approach is based on the expectation that the source article should be updated transparently when the changes are not problematic. For example, if the user notices a typo in the source article, then fixes it, and goes back to translate, getting the source updated transparently seems to meet the expectations. This approach is aimed to maximize the chances for the user to use the most recent content as long as it does not conflict with the current work done.

Reversing that seems more problematic. Users have to confirm for the updates, which discourages from using more up to date content. In addition, if that process is not blocking, users modifying the translation can generate new conflicts that may make it harder to resolve.

Change 520659 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Use original revision when latest one causes data loss

https://gerrit.wikimedia.org/r/520659

@Pginer-WMF two remarks:

  1. In my patch for this ticket, latest revision is used as source initially. This was behavior so far, and we used to stop there. Now, if there is any non-restored section, we load original revision, which was used to start translation. For example, say initial revision of source article has 10 sections and we translate all of them, but don't publish. Then, one section is deleted from source article, and when we try reloading the draft, that section, which we translated previously, is lost, so we drop latest revision and proceed with original one.
  2. When user clicks on draft inside dashboard, there's no revision param in URL. But if fallback to original revision happens, that param is appended and warning, which allows to restart translation, is registered. But, if you press reload button, then URL already has revision param, so we immediately start loading that specific revision and don't check if there is newer one, so warning isn't shown. This happens on reload and can also happen if user manually plays with URL, which is very unlikely.

Change 520659 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Use original revision when latest one causes data loss

https://gerrit.wikimedia.org/r/520659

@Pginer-WMF two remarks:

Thanks for the clear details, @Petar.petkovic. The behaviour described makes sense, and (2) seems perfectly acceptable.

Jpita added a subscriber: Jpita.Jul 5 2019, 3:04 PM

For "normal" cases this seems to be working fine: editing and deleting text on the original article after starting a translation.
With special cases I saw two issues:

@Petar.petkovic I'll run a few more tests to try to add some more information here

Jpita added a comment.Jul 7 2019, 12:43 PM

When I bring up an old translation from a deleted article I get his error.
Not sure it is related to this task or not.

Change 521384 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Properly trim strings when matching for included content

https://gerrit.wikimedia.org/r/521384

This was caused by minor bug in restoring content. Provided a fix for that.

When users translate articles, their progress is saved alongside their translated paragraphs. If that translation is restored later, progress is calculated again and the numbers are compared. In case of mismatch, the message is logged in console as error. We've been seeing that error for a long time and no action is taken to find out and solve the cause.

When I bring up an old translation from a deleted article I get his error.
Not sure it is related to this task or not.

This is now tracked in T227493. Also, the article you point to isn't deleted, second revision just adds proposal that admins should delete the article.

Change 521384 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Properly trim strings when matching for included content

https://gerrit.wikimedia.org/r/521384

Jpita closed this task as Resolved.Jul 15 2019, 9:43 PM