Automatic renaming by Fuzzybot broken
Closed, Resolved | Public | Bug Report

Description

The automatic page move of translations by FuzzyBot due to a renamed message key is broken.

From IRC channel today (times are UTC+2):

[13:58]	rakkaus	(25 lines skipped) ","exception_id":"a320d953de64724b1221415f","exception_url":"/wiki/Special:ReplaceText","caught_by":"mwe_handler"} []
[14:00]	rakkaus	(5 lines skipped) ","exception_id":"242d7dce69bd40da5fd8b59a","exception_url":"/srv/mediawiki/targets/production/maintenance/runJobs.php","caught_by":"mwe_handler"} []
[14:02]	rakkaus	(1 lines skipped) [14-May-2020 12:01:47 UTC] PHP Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 262144 bytes) in Unknown on line 0

Outcome

Translatewiki.net translation administrators can again use the message key rename support of Special:ManageMessageGroups instead of having to spend more time to perform those renames manually.

Event Timeline

Change 596475 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[translatewiki@master] Increase max memory for the job queue to 2.5G

https://gerrit.wikimedia.org/r/596475

Change 596475 merged by jenkins-bot:
[translatewiki@master] Increase max memory for the job queue to 2.5G

https://gerrit.wikimedia.org/r/596475

abi_ triaged this task as High priority.

While we identify what's causing the spike in memory usage, we've increased the memory limit to 2.5G. Will keep monitoring.

Looks like this is fixed. The last mass rename was performed well by FuzzyBot; see e.g. this export.

Another one failed today.

Change 601448 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] [WIP] Store interim cache when processing incoming messages

https://gerrit.wikimedia.org/r/601448

Change 601597 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] [WIP] Run MessageIndexRebuild job after all MessageUpdateJob are done

https://gerrit.wikimedia.org/r/601597

Failed again. By FuzzyBot and ...

(1 lines skipped) [04-Jun-2020 13:10:03 UTC] PHP Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 262144 bytes) in Unknown on line 0

... as follow up action by me via Special:ReplaceText:

[15:13]	rakkaus	(1 lines skipped) [04-Jun-2020 13:13:19 UTC] PHP Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 32768 bytes) in /srv/mediawiki/tags/2020-06-03_14:26:35/vendor/monolog/monolog/src/Monolog/Utils.php on line 1

Change 604554 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Synchronization: Add class to track messages / groups in sync

https://gerrit.wikimedia.org/r/604554

Change 606424 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] [WIP] Use the group synchronization cache

https://gerrit.wikimedia.org/r/606424

Change 601448 abandoned by Abijeet Patro:
Store interim cache when processing incoming messages

Reason:
In favor of Ic48428bc0c320195701bbfe8a7acbd5317210d36

https://gerrit.wikimedia.org/r/601448

Change 601597 abandoned by Abijeet Patro:
Run MessageIndexRebuild job after all MessageUpdateJob are done

Reason:
This approach was using the delay feature of job queues that is not supported by all JobQueue backends.

https://gerrit.wikimedia.org/r/601597

We've added a group synchronization cache, which has the following cache keys:

  1. Messages => Individual titles of running MessageUpdateJob, removed by the MessageUpdateJob when it finishes.
  2. GroupId => Array of message titles that are in progress.
  3. GroupsInSync => Single key, with all the groups currently in sync.
  4. Sync time start for each group.

A script has been added that checks whether any MessageUpdateJob is still running; if none are, it starts the MessageIndexRebuild job. The script also marks a MessageUpdateJob as timed out if it has been running for longer than a certain period.
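To make the cache layout and the check script easier to picture, here is a minimal Python sketch. It is not the Translate extension's code (which is PHP): the class `GroupSyncCache`, the function `check_groups`, and all parameter names are hypothetical. It also assumes that the timeout is measured from the group's sync start time and that timed-out messages are simply dropped from the running set; the task does not spell out either detail.

```python
from dataclasses import dataclass, field


@dataclass
class GroupSyncCache:
    """In-memory stand-in for the group synchronization cache (hypothetical)."""
    # 1. Messages: titles of MessageUpdateJobs that are still running.
    messages: set = field(default_factory=set)
    # 2. GroupId: group id -> message titles in progress for that group.
    group_messages: dict = field(default_factory=dict)
    # 3. GroupsInSync: all groups currently being synchronized.
    groups_in_sync: set = field(default_factory=set)
    # 4. Sync start time for each group.
    group_start: dict = field(default_factory=dict)

    def start_group(self, group_id, titles, now):
        """Register a group and its pending message titles."""
        self.groups_in_sync.add(group_id)
        self.group_start[group_id] = now
        self.group_messages[group_id] = set(titles)
        self.messages.update(titles)

    def finish_message(self, title):
        """Called when a message update job finishes: remove its title."""
        self.messages.discard(title)


def check_groups(cache, now, timeout, start_rebuild):
    """Start a rebuild for groups with no running jobs; time out stale jobs.

    Returns the sorted list of message titles marked as timed out.
    """
    timed_out = []
    # Iterate over a sorted copy so the set can be modified inside the loop.
    for group_id in sorted(cache.groups_in_sync):
        pending = cache.group_messages[group_id] & cache.messages
        if not pending:
            # Nothing is running for this group any more: clean up and rebuild.
            cache.groups_in_sync.discard(group_id)
            del cache.group_messages[group_id]
            del cache.group_start[group_id]
            start_rebuild(group_id)
        elif now - cache.group_start[group_id] > timeout:
            # Assumption: jobs are timed out relative to the group start time.
            for title in sorted(pending):
                cache.messages.discard(title)
                timed_out.append(title)
    return timed_out
```

In this sketch a group whose jobs timed out is picked up on the next call to `check_groups`, which then finds nothing pending and triggers the rebuild callback.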

Future improvements

  • Track what messages failed and inform administrators.
  • Update export scripts to not export messages if MessageUpdateJob is still running.

Possible issues

  • We've noticed that update.php seems to clear the database cache; if that is the case, we will have to rethink this approach.

Change 604554 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Synchronization: Add class to track messages / groups in sync

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/604554

Moving this back to Quarterly backlog as we are currently not working on this. We will take a look at this once we've had a chance to take a look at T182433: Implement a stronger synchronization in RepoNG and Translate.

Change 635280 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[mediawiki/extensions/Translate@master] Remove running of MessageIndex rebuild once groups are synced

https://gerrit.wikimedia.org/r/635280

I did a lot of debugging on a translatewiki.net production canary and I found there is an infinite loop. For clarity as we discuss strong synchronization here as well, I filed it as T268840: Infinite loop in DeferredUpdates::tryOpportunisticExecute.

Assigning to Niklas since he is primarily working on this.

Nikerabbit changed the subtype of this task from "Task" to "Bug Report". Dec 16 2020, 3:22 PM
Nikerabbit edited projects, added affects-translatewiki.net; removed translatewiki.net.

There have been a couple of small message key renames since the fix, but no large one. Leaving this open a while longer to see if such a rename comes up and to finish the clean-ups.

We had multiple big renames today and they did not fail.