Implement a stronger synchronization in RepoNG and Translate
Closed, Resolved (Public)

Description

Initial requirement

This is a follow-up to T48833 where I implemented repository-state synchronization.

For fully automated exports, even stronger synchronization is required: we should make sure that source changes are processed in translatewiki before we use a particular revision.

See https://translatewiki.net/wiki/Repository_management#Repository_state_synchronization for a detailed description of the issues we have observed.

Current implementation plan

We're creating a group synchronization cache that tracks:

  • incoming changes to messages,
  • failed and passed message updates

Based on this, we will display warnings to administrators, who can then manually fix failed message updates.

Eventually, based on this tracking of failures and their resolution, we will stop imports and exports of messages from translatewiki.net for the affected groups.
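To make the plan concrete, here is a minimal sketch of such a cache, written in Python rather than the extension's PHP. All names (`GroupSyncCache`, `start_sync`, `mark_done`, `mark_failed`) are hypothetical and chosen for illustration; this is not the Translate extension's API, and the real cache is persistent rather than in-memory.

```python
from dataclasses import dataclass, field
from enum import Enum


class MessageState(Enum):
    PENDING = "pending"  # update queued, not yet processed
    FAILED = "failed"    # update ran but did not apply, or timed out


@dataclass
class GroupSyncCache:
    """Hypothetical in-memory model of a group synchronization cache."""

    # group id -> {message key -> state}
    groups: dict = field(default_factory=dict)

    def start_sync(self, group_id, message_keys):
        # Record every incoming change (addition, modification, deletion) as pending.
        self.groups[group_id] = {key: MessageState.PENDING for key in message_keys}

    def mark_done(self, group_id, key):
        # A passed update simply leaves the cache.
        self.groups[group_id].pop(key, None)

    def mark_failed(self, group_id, key):
        self.groups[group_id][key] = MessageState.FAILED

    def failed_messages(self, group_id):
        # Messages an administrator would need to look at.
        messages = self.groups.get(group_id, {})
        return sorted(k for k, s in messages.items() if s is MessageState.FAILED)

    def is_synced(self, group_id):
        # Synced means nothing is pending and nothing has failed.
        return not self.groups.get(group_id)
```

In this model a group counts as synchronized only once every tracked message has either passed or been explicitly cleared, which is the property the warnings are built on.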

Implementation / Documentation

Documentation related to this functionality is available here: https://www.mediawiki.org/wiki/Help:Extension:Translate/Group_management#Strong_synchronization

Things to do

  • Track incoming changes to messages from various groups: addition, modification and deletion
  • Identify incoming message changes that failed to properly update on the wiki
  • Track failed/timed-out message updates and display them to the administrator
  • Allow administrators to mark failed/timed-out message updates as "fixed".
  • Run MessageIndexRebuild job once there are no MessageUpdate jobs in the synchronization cache.
  • Warn translation administrators when they try to export message groups that have:
    • synchronization errors
    • changes in review on Special:ManageMessageGroups
    • synchronization in progress
  • Halt imports for message groups that have:
    • synchronization errors
    • synchronization in progress
  • Do not allow administrators to process changes from Special:ManageMessageGroups in case of unresolved failures.
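The export and import rules in the list above can be written as a small policy table. The sketch below is illustrative Python, not code from the extension; the state names and the `can_export` / `can_import` helpers are assumptions, and it simplifies by giving each group exactly one state.

```python
from enum import Enum, auto


class GroupSyncState(Enum):
    SYNCED = auto()
    IN_PROGRESS = auto()  # message updates are still being processed
    IN_REVIEW = auto()    # changes waiting on Special:ManageMessageGroups
    ERROR = auto()        # at least one update failed or timed out


# States that block each operation, as listed in the to-do items.
EXPORT_BLOCKERS = {
    GroupSyncState.ERROR,
    GroupSyncState.IN_REVIEW,
    GroupSyncState.IN_PROGRESS,
}
IMPORT_BLOCKERS = {
    GroupSyncState.ERROR,
    GroupSyncState.IN_PROGRESS,
}


def can_export(state):
    """Exports need a fully settled group."""
    return state not in EXPORT_BLOCKERS


def can_import(state):
    """Imports may proceed while changes are in review, but not otherwise."""
    return state not in IMPORT_BLOCKERS
```

The asymmetry is deliberate: changes in review block exports only, because those changes have not yet been applied on the wiki.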

Other minor to do

  • Timeout should be based on the number of messages to be processed for a group
  • Update translatewiki configuration to remove the --skip-group-sync-check flag
  • Log when a group / message is marked as resolved by the administrator

Pending decisions

  • 1. What should happen when a group export does not happen due to sync issues? Does the administrator have to retry after some time? They would have to make sure they check the export logs.
    • Decision: Nothing special to do. It will be automatically re-attempted next time.
  • 2. What should happen if a group import does not happen due to sync issues? Imports run automatically. Should we increase how often the import runs?
    • Decision: We've decided to block imports; the current frequency of every two hours is enough. People do not necessarily look into it even that often.
  • 3. Exports and imports should not run simultaneously. This is outside the scope of the group synchronization cache, but it is something we may want in order to achieve "strong synchronization".
    • Decision: Yes, this needs to be done.
  • 4. Should exports be stopped if messages are waiting to be processed on Special:ManageMessageGroups? I would say they should. We will have to check specifically for MediaWiki / non-MediaWiki exports.
    • Decision: Yes, this needs to be done.

Update log

  1. 20-01-2021: Changes for this patch caused a production error: T272428: Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist
  2. 02-02-2021: https://phabricator.wikimedia.org/T182433#6797412
  3. 26-04-2021: https://phabricator.wikimedia.org/T182433#7033993
  4. 13-05-2021: https://phabricator.wikimedia.org/T182433#7084380
  5. 21-06-2021: https://phabricator.wikimedia.org/T182433#7165680
  6. 12-07-2021: https://phabricator.wikimedia.org/T182433#7204410

Patches

The list of Gerrit patches submitted for this task (including subtasks) can be found here: https://gerrit.wikimedia.org/r/q/topic:%22strong-sync%22+(status:open%20OR%20status:merged)

Outcome

We added safety measures that detect and prevent corruption of translation data during translation imports and exports at translatewiki.net. This allows us to fully automate translation exports (which was done in a separate task).

Details

Repo | Branch | Lines +/-
translatewiki | master | +3 -3
translatewiki | master | +44 -10
mediawiki/extensions/Translate | master | +20 -1
mediawiki/extensions/Translate | master | +130 -8
mediawiki/extensions/Translate | master | +15 -0
mediawiki/extensions/Translate | master | +1 -1
translatewiki | master | +1 -0
mediawiki/extensions/Translate | master | +10 -1
mediawiki/extensions/Translate | master | +41 -3
mediawiki/extensions/Translate | master | +18 -4
mediawiki/extensions/Translate | master | +26 -0
translatewiki | master | +3 -3
mediawiki/extensions/Translate | master | +72 -15
mediawiki/extensions/Translate | master | +3 -3
mediawiki/extensions/Translate | master | +12 -0
translatewiki | master | +2 -0
mediawiki/extensions/Translate | master | +6 -0
mediawiki/extensions/Translate | master | +48 -3
mediawiki/extensions/Translate | wmf/1.36.0-wmf.27 | +48 -3
mediawiki/extensions/Translate | master | +112 -13
mediawiki/extensions/Translate | master | +2 -1
mediawiki/extensions/Translate | master | +3 -1
mediawiki/extensions/Translate | master | +9 -6
translatewiki | master | +29 -0
mediawiki/extensions/Translate | master | +89 -0
mediawiki/extensions/Translate | master | +71 -0
mediawiki/extensions/Translate | master | +103 -0
mediawiki/extensions/Translate | master | +1 -3
mediawiki/extensions/Translate | master | +25 -2
mediawiki/extensions/Translate | master | +414 -418
mediawiki/extensions/Translate | master | +665 -0
mediawiki/extensions/Translate | master | +366 -1

Event Timeline

Change 606424 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Use sync cache in Special:ManageMessageGroups and MessageUpdateJobs

https://gerrit.wikimedia.org/r/606424

Change 635280 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Remove running of MessageIndex rebuild once groups are synced

https://gerrit.wikimedia.org/r/635280

Change 638137 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Add script to query the group synchronization cache

https://gerrit.wikimedia.org/r/638137

Change 646677 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Display groups in sync on ManageMessageGroups

https://gerrit.wikimedia.org/r/646677

Change 647007 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Add script to clear the group synchronization cache

https://gerrit.wikimedia.org/r/647007

Change 647195 merged by jenkins-bot:
[translatewiki@master] puppet: Add periodic run of completeExternalTranslation

https://gerrit.wikimedia.org/r/647195

Change 656873 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Strong sync: Fix issue with new messages not being removed

https://gerrit.wikimedia.org/r/656873

Change 656878 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Strong sync: Add remaining message count in completeExternalTranslation

https://gerrit.wikimedia.org/r/656878

Change 656892 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Strong sync: Fix incorrect queueing of MessageIndexRebuildJob

https://gerrit.wikimedia.org/r/656892

Change 656873 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Strong sync: Fix issue with new messages not being removed

https://gerrit.wikimedia.org/r/656873

Change 656878 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Strong sync: Add remaining message count in completeExternalTranslation

https://gerrit.wikimedia.org/r/656878

Change 656892 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Strong sync: Fix incorrect queueing of MessageIndexRebuildJob

https://gerrit.wikimedia.org/r/656892

Change 657229 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Add flag to toggle the usage of the group synchronization cache

https://gerrit.wikimedia.org/r/657229

Change 657229 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Add flag to toggle the usage of the group synchronization cache

https://gerrit.wikimedia.org/r/657229

Change 657306 had a related patch set uploaded (by Nikerabbit; owner: Abijeet Patro):
[mediawiki/extensions/Translate@wmf/1.36.0-wmf.27] Add flag to toggle the usage of the group synchronization cache

https://gerrit.wikimedia.org/r/657306

Change 657290 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[translatewiki@master] Enable group synchronization flag

https://gerrit.wikimedia.org/r/657290

Change 657294 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] MessageUpdateJob: Check GroupSyncCache only if its FileBasedMessageGroup

https://gerrit.wikimedia.org/r/657294

Change 657306 merged by jenkins-bot:
[mediawiki/extensions/Translate@wmf/1.36.0-wmf.27] Add flag to toggle the usage of the group synchronization cache

https://gerrit.wikimedia.org/r/657306

Mentioned in SAL (#wikimedia-operations) [2021-01-20T13:20:45Z] <urbanecm@deploy1001> Synchronized php-1.36.0-wmf.27/extensions/Translate/: 20decbd5cc3de0af655b9419cf69fc442ab056a4: Add flag to toggle the usage of the group synchronization cache (T272428; T182433) (duration: 01m 10s)

Change 657294 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] MessageUpdateJob: Check GroupSyncCache only for FileBasedMessageGroup

https://gerrit.wikimedia.org/r/657294

Change 657290 merged by jenkins-bot:
[translatewiki@master] Enable group synchronization flag

https://gerrit.wikimedia.org/r/657290

Change 658791 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Add messages to interim cache when running safe imports

https://gerrit.wikimedia.org/r/658791

Change 658791 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Add messages to interim cache when running safe imports

https://gerrit.wikimedia.org/r/658791

Here's an update on what's done as of now:

  • We've added a group synchronization cache that tracks message updates - addition, modifications, renames and deletions.
  • We've added scripts that allow administrators to see what messages and message groups are being processed currently.
  • On Special:ManageMessageGroups, we display the message groups that are currently being processed.
  • A script has been put in place that runs periodically and identifies MessageUpdate jobs that are stuck or have timed out.

No decisions (such as blocking exports or imports) are made based on this tracking.
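The periodic check for stuck or timed-out updates can be pictured as a sweep over the start times recorded in the cache. The following Python sketch only illustrates that idea; the function name, its parameters, and the data layout are assumptions, not the actual maintenance script.

```python
import time


def find_timed_out(started_at, timeout_seconds, now=None):
    """Return message keys whose update has been pending longer than the timeout.

    started_at maps a message key to the Unix time its update was queued.
    """
    now = time.time() if now is None else now
    return sorted(
        key for key, queued in started_at.items()
        if now - queued > timeout_seconds
    )
```

Passing `now` explicitly keeps the function deterministic, so a scheduled run and a test can share the same code path.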

This has been deployed on Translatewiki for about 2 weeks. During this time we've identified issues and deployed fixes for them. The system appears to be reliable now, and we're ready to implement the next set of steps.

Change 676968 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] ProcessMessageChanges: Add flag to skip import on group sync error

https://gerrit.wikimedia.org/r/676968

Change 677274 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] ExportTranslationsMaintenanceScript: Add flag to skip group export

https://gerrit.wikimedia.org/r/677274

Another update on what's completed:

  1. On Special:ManageMessageGroups, we now display the groups and messages that are currently being processed.
  2. On Special:ManageMessageGroups, translation administrators can now mark groups / messages as resolved once they have determined why synchronization failed.

Change 684954 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] SpecialManageGroups: Skip processing groups that have errors

https://gerrit.wikimedia.org/r/684954

Change 684956 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[translatewiki@master] Import/Export: Add skip-group-sync-check flag

https://gerrit.wikimedia.org/r/684956

Change 685398 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] Trigger MessageIndexRebuild after group sync is complete

https://gerrit.wikimedia.org/r/685398

Change 685553 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] GroupSynchronizationCache: Add in review state for groups

https://gerrit.wikimedia.org/r/685553

Change 676968 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] ProcessMessageChanges: Add flag to skip group sync check during import

https://gerrit.wikimedia.org/r/676968

Change 677274 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] ExportTranslationsMaintenanceScript: Add flag to skip group sync check

https://gerrit.wikimedia.org/r/677274

Change 684954 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] SpecialManageGroups: Skip processing groups that have errors

https://gerrit.wikimedia.org/r/684954

Change 685398 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] Trigger MessageIndexRebuild after group sync is complete

https://gerrit.wikimedia.org/r/685398

Change 685553 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] GroupSynchronizationCache: Add In Review state for groups

https://gerrit.wikimedia.org/r/685553

Change 684956 merged by jenkins-bot:

[translatewiki@master] Import/Export: Add skip-group-sync-check flag

https://gerrit.wikimedia.org/r/684956

Another update:

  1. export.php will no longer export groups that have synchronization errors, have synchronization in progress, or have changes to process on Special:ManageMessageGroups.
  2. processMessageChanges.php will no longer process groups that have synchronization errors or have synchronization in progress.
  3. The MessageIndexRebuild job will now be triggered after group synchronization is complete.
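Triggering the rebuild once synchronization completes amounts to firing a callback when the last pending update finishes. A minimal Python sketch of that pattern, assuming a hypothetical `GroupSyncTracker` class and an arbitrary `on_complete` callback standing in for queueing the rebuild job:

```python
class GroupSyncTracker:
    """Hypothetical tracker that fires a callback once all updates finish."""

    def __init__(self, pending_keys, on_complete):
        self._pending = set(pending_keys)
        self._on_complete = on_complete
        self._fired = False

    def job_finished(self, key):
        # Forget the finished message; unknown keys are ignored.
        self._pending.discard(key)
        # Fire exactly once, when nothing is left to process.
        if not self._pending and not self._fired:
            self._fired = True
            self._on_complete()
```

The `_fired` guard matters because a retried or duplicated job would otherwise queue the rebuild more than once.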

Change 694406 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] GroupSync: Display languages names instead of the content

https://gerrit.wikimedia.org/r/694406

Change 694483 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] SpecialManageGroups: Use DeferredUpdates when adding jobs to queue

https://gerrit.wikimedia.org/r/694483

Change 694483 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] SpecialManageGroups: Use DeferredUpdates when adding jobs to queue

https://gerrit.wikimedia.org/r/694483

Change 695209 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] GroupSync: Add log when a group is marked as in sync

https://gerrit.wikimedia.org/r/695209

Change 695237 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[translatewiki@master] Add group_sync to list of log_files

https://gerrit.wikimedia.org/r/695237

Change 695237 merged by jenkins-bot:

[translatewiki@master] Add group_sync to list of log_files

https://gerrit.wikimedia.org/r/695237

Change 696489 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] GroupSyncCache: Increase group expiry when MessageUpdateJob completes

https://gerrit.wikimedia.org/r/696489

Change 694406 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] GroupSync: Display languages names instead of the content

https://gerrit.wikimedia.org/r/694406

Change 695209 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] GroupSync: Add log when a group is marked as in sync

https://gerrit.wikimedia.org/r/695209

Change 696489 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] GroupSync: Increase group expiry when MessageUpdateJob completes

https://gerrit.wikimedia.org/r/696489

Change 699841 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[translatewiki@master] Remove flag --skip-group-sync-check on imports / exports

https://gerrit.wikimedia.org/r/699841

Change 699929 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[translatewiki@master] Implement GlobalSyncLock

https://gerrit.wikimedia.org/r/699929

Regarding the timeout, we've added code that increases the group expiry time when a MessageUpdateJob completes, if the group is about to expire. The initial timeout is still set to 40 minutes. Related patch.
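As a rough illustration of this expiry extension, the Python sketch below pushes the expiry forward only when a job completes close to the deadline. The 40-minute initial timeout comes from the comment above; the `threshold` and `extension` values, and the function itself, are assumptions for illustration only.

```python
INITIAL_TIMEOUT = 40 * 60  # seconds; the initial group timeout mentioned above


def extend_expiry_if_needed(expires_at, now, threshold=300, extension=600):
    """Return the (possibly extended) expiry time for a group.

    threshold and extension are illustrative values, not the real settings.
    """
    if expires_at - now <= threshold:
        # The group is about to expire but jobs are still completing,
        # so give the remaining jobs more time.
        return now + extension
    return expires_at
```

Extending only on job completion means a group whose jobs have genuinely stalled still times out and gets reported.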

Change 700612 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] GroupSync: Add logs when a group or message is resolved

https://gerrit.wikimedia.org/r/700612

Change 700612 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] GroupSync: Add logs when a group or message is resolved

https://gerrit.wikimedia.org/r/700612

Change 699929 merged by jenkins-bot:

[translatewiki@master] Implement GlobalSyncLock

https://gerrit.wikimedia.org/r/699929

Change 699841 merged by jenkins-bot:

[translatewiki@master] Remove flag --skip-group-sync-check on imports / exports

https://gerrit.wikimedia.org/r/699841

We added a global synchronization lock to ensure that export / import scripts cannot be run at the same time. All of this is now deployed on Translatewiki.net.
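This comment does not describe how GlobalSyncLock is implemented. As a generic illustration of mutual exclusion between two scripts, the Python sketch below uses an exclusively created lock file; `global_sync_lock`, `LockHeldError`, and the lock path are hypothetical names, not part of the translatewiki repository.

```python
import os
from contextlib import contextmanager


class LockHeldError(RuntimeError):
    """Raised when another import/export run already holds the lock."""


@contextmanager
def global_sync_lock(path):
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeldError(f"another import/export holds {path}") from None
    try:
        yield
    finally:
        # Release the lock even if the wrapped script fails.
        os.close(fd)
        os.remove(path)
```

An import or export script would wrap its whole run in `with global_sync_lock(path):` and exit early on `LockHeldError`. A lock file of this kind is left behind if the process is killed, so a real deployment would also need a way to detect and clear stale locks.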

Leaving this open for a few more days to monitor and track any issues.

Haven't noticed any issues during the past week. Marking this as done.