After re-marking an updated page for translation, FuzzyBot does not react, or only ports over the previous update
Closed, Resolved (Public)

Description

NOTE: For a summary on related issues, patches and upgrade avenues, see T48716: Translation page does not contain the latest translations/last translation.

Example:

https://meta.wikimedia.org/w/index.php?title=Special%3ALog&type=pagetranslation&user=&page=Wikimedia+Highlights%2C+June+2013
https://meta.wikimedia.org/w/index.php?title=Wikimedia_Highlights,_June_2013/ml&action=history
https://meta.wikimedia.org/w/index.php?title=Wikimedia_Highlights,_June_2013&action=history

Consider the following three times the original page was re-marked for translation, each time incorporating a different change in the non-translatable part (which FuzzyBot should have ported over to the ml translation):

  • 18:15, 19 July 2013,
  • 18:29, 19 July 2013,
  • 00:48, 20 July 2013.

What happened was:

  • After the 18:15 update, FuzzyBot did not make any edits
  • After the 18:29 update, FuzzyBot edited the translated page to incorporate the 18:15 change
  • After the 00:48 update, FuzzyBot edited the translated page to incorporate the 18:29 change
  • The 00:48 update is still missing from the translated page

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:53 AM
bzimport set Reference to bz51731.
bzimport added a subscriber: Unknown Object (MLST).
Bug 54915 has been marked as a duplicate of this bug.

And I still don't have an environment for testing replication lag issues :(

Can this be fixed please? Very very annoying bug. Causes many problems...

I'm hoping that once we have time to set up the replication lag environment to a usable state again, we will find clues on how to fix this problem.

I spent some time over the past 3 days trying to debug this, as it's one of the most annoying bugs I encounter while working with translatable pages on Meta.

From my investigation, I think it's safe to conclude that this is not related to slave lag: I don't have a master/slave setup locally, but I am able to reproduce it. The job runner uses the default configuration and the default rate (no dedicated job runner like redis; jobs are run within the Special:RunJobs request itself).

In SpecialPageTranslation::markForTranslation(), the TranslateRenderJobs that update the translation pages are pushed to the queue first, and the MessageUpdateJobs that update the translation units are pushed afterwards. These jobs can be run in any order, but usually they run sequentially (I think). However, TranslateRenderJob depends on the translation units having already been updated by the MessageUpdateJobs, because it builds the content for the translation page edit from the units those jobs update.

To confirm this and to make it easier to debug, I locally hacked markForTranslation() so that the jobs are run() immediately instead of being pushed to the queue. If the order is changed so that the jobs from getTranslationUnitJobs() run before those from getRenderJobs(), this bug cannot be reproduced at all; with the current order it is present. Note that the /en page would still be updated, but only because the edits to the unit pages trigger the hook that makes TranslateRenderJob run repeatedly on that page. We should probably stop doing that.
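The ordering problem described above can be illustrated with a small toy model. This is only a sketch in Python, not the Translate extension's code: the `store` dictionary, `mark_for_translation()`, `simulate()` and the two nested job functions are hypothetical stand-ins for the unit storage, the marking step, MessageUpdateJob and TranslateRenderJob.

```python
# Toy model only: hypothetical stand-ins, not the Translate extension's code.

def mark_for_translation(store, new_source, render_first):
    """Run an 'update' job and a 'render' job immediately, in the given order."""

    def update_job():
        # Stand-in for MessageUpdateJob: stores the newly marked content.
        store["units"] = new_source

    def render_job():
        # Stand-in for TranslateRenderJob: builds the page from stored units.
        store["page"] = store["units"]

    jobs = [render_job, update_job] if render_first else [update_job, render_job]
    for job in jobs:
        job()
    return store["page"]


def simulate(render_first):
    """Mark the page three times and record the translated page after each mark."""
    store = {"units": "original", "page": "original"}
    marks = ["18:15", "18:29", "00:48"]
    return [mark_for_translation(store, m, render_first) for m in marks]


print(simulate(render_first=True))   # render runs before the units are updated
print(simulate(render_first=False))  # units are updated before rendering
```

With the render step first, the translated page always lags one marking behind, which matches the pattern in the task description; with the update step first, each marking is reflected immediately.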

Thanks a lot for looking into this, it is very helpful, especially the part about reproducing which I have not been able to for some reason (not trying hard enough?). I hope we are going to start fixing this very soon now :)

Change 279635 had a related patch set uploaded (by Glaisher):
Make sure MessageUpdateJobs are run before TranslateRenderJobs when marking

https://gerrit.wikimedia.org/r/279635

Steps to reproduce:

  1. Create a translatable page and mark for translation
  2. Add a translation somewhere.
  3. Update the source page and mark for translation with invalidation enabled.
  4. Changes are not propagated to the translated page.

See https://meta.wikimedia.org/wiki/User:Glaisher/T53731 and https://meta.wikimedia.org/wiki/User:Glaisher/T53731/es (with history and logs). Also see https://meta.wikimedia.org/wiki/User:Glaisher/T53731/en.
It cannot be reproduced if no translation was changed between the two markings. I guess that's because TPParse::getText() filters for translated messages only?

Change 279635 merged by jenkins-bot:
Make sure MessageUpdateJobs are run before TranslateRenderJobs when marking

https://gerrit.wikimedia.org/r/279635

Once the patch is deployed, refresh-translatable-pages needs to be run on all wikis using Translate.

Nikerabbit assigned this task to Glaisher.

I am marking this resolved. Using the tracker for follow-up work instead.

Note: I was just looking at https://grafana.wikimedia.org/dashboard/db/job-queue-rate and noticed that the new job (TranslationsUpdateJob) might take some time to complete (4 min on average; I also see one that took an hour) before the updates become visible on the page. However, we can now be sure that the updates will reflect the latest changes.

That doesn't seem to tell us how much is queue time and how much is execution time. Also, the average might be misleading; the median and the 90th percentile could be more interesting.
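To illustrate why the average can mislead here, the following Python sketch compares the mean, median and 90th percentile of a sample with one very slow job. The durations are made up for illustration and are not taken from the dashboard.

```python
import statistics

# Hypothetical job durations in minutes (illustrative, NOT dashboard data):
# mostly fast jobs plus a single one-hour outlier.
durations = [1] * 8 + [2] * 6 + [3] * 3 + [5] * 2 + [60]

mean = statistics.mean(durations)
median = statistics.median(durations)
# quantiles(n=10) returns the nine decile cut points; the last one is the
# 90th percentile.
p90 = statistics.quantiles(durations, n=10)[-1]

print(f"mean={mean} median={median} p90={p90}")
```

In this sample the single outlier pulls the mean well above the median, so the mean says little about how long a typical job takes.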

Also MessageIndexRebuildJobs are prioritized, but these new jobs are not. That should probably be changed to prioritize this new Job type instead.

Change 283841 had a related patch set uploaded (by Glaisher):
Add TranslationsUpdateJob to translate job runner group

https://gerrit.wikimedia.org/r/283841

Change 283841 merged by Giuseppe Lavagetto:
Add TranslationsUpdateJob to translate job runner group

https://gerrit.wikimedia.org/r/283841