RefreshTranslatablePage script created empty translation pages on MediaWiki.org
Closed, Resolved | Public | 4 Estimated Story Points | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Run the refresh-translatable-pages.php script on MediaWiki.org

What happens?:

A bunch of empty category pages got created. See: https://www.mediawiki.org/wiki/Topic:Xuxrbsvzqnm6vff8

What should have happened instead?:

Empty translation pages for categories should not have been created.

Software version (skip for WMF-hosted wikis like Wikipedia):
MediaWiki 1.42
Translate: master branch

Other information (browser name/version, screenshots, etc.):
Looking at the message group stats for the page, I see that even though some languages have no translations, they appear to be 50% translated. See:
https://www.mediawiki.org/w/index.php?title=Special:MessageGroupStats&group=page-Category%3AAPIAfterExecute+extensions#sortable:3=desc

Event Timeline

This happened on Commons as well, for example https://commons.wikimedia.org/wiki/Template:Cdw/he has been created by FuzzyBot as 50% translated (which corresponds to the fact that half of its translation units – two and three – were deleted by FuzzyBot, the other half having been deleted by an admin a few minutes earlier). On which wikis has this script been run?

Pretty much all wikis where the Translate extension is enabled, other than Meta-Wiki. Full list here: https://phabricator.wikimedia.org/T299308#9333468

Change 982094 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] RenderTranslationPageJob: Delete translation page without translation

https://gerrit.wikimedia.org/r/982094

Nikerabbit set the point value for this task to 4.

Change 982094 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] RenderTranslationPageJob: Delete translation page without translation

https://gerrit.wikimedia.org/r/982094

Change 983195 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] RefreshTranslationPage: Delete empty translation page edited by Fuzzy bot

https://gerrit.wikimedia.org/r/983195

Change 983195 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] RefreshTranslationPage: Delete empty translation page edited only by FuzzyBot

https://gerrit.wikimedia.org/r/983195

Change 984199 had a related patch set uploaded (by Abijeet Patro; author: Abijeet Patro):

[mediawiki/extensions/Translate@master] RefreshTranslatablePage: Render translation page even if no translations exists

https://gerrit.wikimedia.org/r/984199

Nikerabbit changed the task status from Open to In Progress. (Jan 8 2024, 7:55 AM)

Change 984199 merged by jenkins-bot:

[mediawiki/extensions/Translate@master] RefreshTranslatablePage: Render translation page even if no translations exists

https://gerrit.wikimedia.org/r/984199

I'm currently running this script on Commons to test the latest fixes:

mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki commonswiki --jobqueue
Queued 30736 refresh job(s) for 2168 translatable pages.

Some quick observations:

  • NonPrioritizedRenderTranslationPageJob jobs are being run. Their backlog time is around 10 minutes.
  • The backlog time for RenderTranslationPageJob jobs has also increased to about 10 minutes.

Currently the latest deleted page is https://commons.wikimedia.org/wiki/Commons:Wiki_Loves_Earth_in_Albania_%26_Kosovo_2023/%C3%87mimet/sq – the Albanian-language subpage of an Albanian-language translation page… It looks like we didn’t consider them; please stop the script ASAP before a bunch of /en subpages of templates are deleted, breaking {{Autotranslate}}.

The script has finished running, and all the jobs it created have also run. The following pages were removed:

  1. https://commons.wikimedia.org/w/index.php?title=Commons:Wiki_Loves_Earth_in_Albania_%26_Kosovo_2023/%C3%87mimet/sq&action=edit&redlink=1
  2. https://commons.wikimedia.org/w/index.php?title=Commons:Wiki_Loves_Earth_in_Albania_%26_Kosovo_2023/Pyetje_t%C3%AB_shpeshta/sq&action=edit&redlink=1
  3. https://commons.wikimedia.org/w/index.php?title=Commons:Wiki_Loves_Earth_in_Albania_%26_Kosovo_2023/Organizator%C3%ABt/sq&action=edit&redlink=1
  4. https://commons.wikimedia.org/w/index.php?title=Commons:Wiki_Loves_Earth_2022_in_Belarus/be-tarask&action=edit&redlink=1
  5. https://commons.wikimedia.org/w/index.php?title=Template:Belgian_franc_banknote/i18n/bbc-latn&action=edit&redlink=1
  6. https://commons.wikimedia.org/w/index.php?title=Template:Featured_picture_candidates_overview/i18n/egl&action=edit&redlink=1
  7. https://commons.wikimedia.org/w/index.php?title=Help:Gadget-GlobalUsageUI/haw&action=edit&redlink=1
  8. https://commons.wikimedia.org/w/index.php?title=Help:MyUploads/haw&action=edit&redlink=1
  9. https://commons.wikimedia.org/w/index.php?title=Commons:Licensing/Justifications/haw&action=edit&redlink=1
  10. https://commons.wikimedia.org/w/index.php?title=Commons:PD_files/aa&action=edit&redlink=1
  11. https://commons.wikimedia.org/w/index.php?title=Commons:What_Commons_is_not/en-gb&action=edit&redlink=1
  12. https://commons.wikimedia.org/w/index.php?title=User:Wikimedia_Commons_Welcome/mcp&action=edit&redlink=1
  13. https://commons.wikimedia.org/w/index.php?title=Template:Cdw/he&action=edit&redlink=1
  14. https://commons.wikimedia.org/w/index.php?title=Help:Mpeg2dv.sh/haw&action=edit&redlink=1
  15. https://commons.wikimedia.org/w/index.php?title=Help:Contents/haw&action=edit&redlink=1

Reason for the deletion of the Albanian (/sq) subpage: the page was first marked for translation, and then its source language was changed. It should have been marked for translation again. Since that was not done, no translation units were created for the source language, so the stats report nothing as translated and the deletion logic kicks in.

See https://commons.wikimedia.org/w/index.php?title=Special:Log&page=Commons%3AWiki+Loves+Earth+in+Albania+%26+Kosovo+2023%2F%C3%87mimet

The closest task we have for avoiding this is T70030: On changing page language, translated pages should be moved automatically. Maybe we should also prevent changing the language without unlinking the page first.
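To make the failure mode concrete, here is a minimal Python sketch of the kind of rule described in this thread (delete a translation page that has no translations and was edited only by FuzzyBot). It is not the Translate extension's code: `TranslationPage`, `should_delete`, and `should_delete_guarded` are hypothetical names, and the source-language guard is only an assumption about one possible safeguard, not something that was merged.

```python
from dataclasses import dataclass, field

FUZZY_BOT = "FuzzyBot"  # bot account name as it appears in this thread


@dataclass
class TranslationPage:
    """Hypothetical stand-in for a translation subpage such as Example/sq."""
    title: str
    language: str          # language code of this subpage
    source_language: str   # current language of the translatable source page
    translated_units: int  # number of units the stats count as translated
    editors: set = field(default_factory=set)


def should_delete(page: TranslationPage) -> bool:
    """Naive rule: no translations and no human editor means delete."""
    return page.translated_units == 0 and page.editors <= {FUZZY_BOT}


def should_delete_guarded(page: TranslationPage) -> bool:
    """Same rule, but skip the subpage written in the source language (assumed guard)."""
    if page.language == page.source_language:
        return False
    return should_delete(page)


# Illustrative values only: a page whose source language was changed to "sq"
# without being re-marked, so its /sq subpage shows zero translated units.
page = TranslationPage("Example/sq", "sq", "sq", 0, {FUZZY_BOT})
print(should_delete(page))          # True: the naive rule deletes it
print(should_delete_guarded(page))  # False: the guard keeps it
```

With the naive rule, the /sq subpage looks exactly like an empty translation page, which matches the behaviour reported above.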

I see, thanks for debugging! It’s not that concerning then, although maybe the template namespace should be checked for such errors before running the script (especially on Meta, where non-English source languages happen every now and then), as deleting the source “translation” of a template can cause issues.

On Commons, the first four pages listed by @abi_ should simply be re-marked for translation, right? I can do it, but I won't do it right now, in case they're useful as test cases. (They are about contests that are over, so probably nobody wants to translate them anymore.)

On Commons, the first four pages listed by @abi_ should simply be re-marked for translation, right?

I assume that would fix them but I haven't verified it.

I ran the updated refresh-translatable-pages.php script on the following wikis:

25th Jan, 2024

  1. commonswiki
  2. advisorswiki
  3. amwikimedia
  4. azwikimedia
  5. bewikimedia
  6. betawikiversity
  7. brwikimedia
  8. bdwikimedia
  9. cawikimedia
  10. collabwiki
  11. testcommonswiki
  12. foundationwiki
  13. frwiktionary
  14. gewikimedia
  15. grwikimedia
  16. hiwikimedia
  17. idwikimedia
  18. legalteamwiki
  19. maiwikimedia
  20. nowikimedia
  21. otrs_wikiwiki
  22. plwikimedia
  23. ptwikisource
  24. punjabiwikimedia
  25. ruwikimedia
  26. sourceswiki
  27. specieswiki
  28. sewikimedia
  29. testwiki
  30. testwikidatawiki
  31. uawikimedia
  32. vewikimedia
  33. wbwikimedia
  34. wikimania2012wiki
  35. wikimania2013wiki
  36. wikimania2014wiki
  37. wikimania2015wiki
  38. wikimania2016wiki
  39. wikimania2017wiki
  40. wikimania2018wiki
  41. wikimaniawiki
  42. wikifunctionswiki
  43. mediawikiwiki
  44. outreachwiki
  45. metawiki
  46. wikidatawiki

I'll update this comment as I run the script on more wikis.

Started running the refresh-translatable-pages.php script on mediawiki.org.

Queued 83276 refresh job(s) for 6484 translatable pages.

See logs here: https://www.mediawiki.org/wiki/Special:Log?type=delete&user=FuzzyBot&page=&wpdate=2024-01-24&tagfilter=&subtype=delete&wpFormIdentifier=logeventslist. All entries with the summary "Page no longer has any translations" are pages deleted by this script run.
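For anyone who wants to pull the same list programmatically, the sketch below builds a query for the MediaWiki Action API (`list=logevents`) and filters the result by the summary quoted above. The helper names and the sample response are made up for illustration, and the code does not perform the HTTP request itself; matching on a substring of the comment is an assumption in case the summary carries extra text.

```python
SCRIPT_SUMMARY = "Page no longer has any translations"  # summary quoted in this thread


def build_log_query(user="FuzzyBot", limit=500):
    """Parameters for api.php that list deletion log entries made by `user`."""
    return {
        "action": "query",
        "list": "logevents",
        "letype": "delete",
        "leuser": user,
        "leprop": "title|timestamp|comment",
        "lelimit": str(limit),
        "format": "json",
    }


def deleted_by_script(response):
    """Titles of log entries whose comment contains the script's summary."""
    events = response.get("query", {}).get("logevents", [])
    return [e["title"] for e in events if SCRIPT_SUMMARY in e.get("comment", "")]


# Hypothetical response shaped like the API's JSON output, not real log data.
sample = {
    "query": {
        "logevents": [
            {"title": "Example page/xx", "comment": "Page no longer has any translations"},
            {"title": "Other page/yy", "comment": "Some unrelated deletion reason"},
        ]
    }
}
print(deleted_by_script(sample))  # ['Example page/xx']
```

The dictionary from `build_log_query()` can be sent as query-string parameters to the wiki's `api.php` endpoint with any HTTP client.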

Dashboard tracking the status of the jobs: https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=codfw%20prometheus%2Fk8s&var-job=RenderTranslationPageJob&from=now-3h&to=now&refresh=1m (Shows information about RenderTranslationPageJob and NonPrioritizedRenderTranslationPageJob)

Also tracking the effect of our changes made for T353229: Avoid blocking the prioritized job queue when running refresh-translatable-page.php script

Started running the refresh-translatable-pages.php script on Meta-Wiki (metawiki):

Queued 163234 refresh job(s) for 10595 translatable pages.
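As a quick sanity check on the three "Queued ... refresh job(s)" lines reported in this task, the small Python sketch below computes the average number of refresh jobs per translatable page for each run. The figures are copied from the comments above; the dictionary keys are just labels.

```python
# (queued refresh jobs, translatable pages), as reported in this task
runs = {
    "Commons": (30736, 2168),
    "mediawiki.org": (83276, 6484),
    "Meta-Wiki": (163234, 10595),
}


def jobs_per_page(jobs, pages):
    """Average number of queued refresh jobs per translatable page."""
    return jobs / pages


for wiki, (jobs, pages) in runs.items():
    print(f"{wiki}: {jobs_per_page(jobs, pages):.1f} refresh jobs per translatable page")
# Commons: 14.2 refresh jobs per translatable page
# mediawiki.org: 12.8 refresh jobs per translatable page
# Meta-Wiki: 15.4 refresh jobs per translatable page
```

All three runs land in the same range, roughly 13 to 15 jobs per translatable page.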

Completed running the refresh-translatable-pages.php script across all wikis. Leaving this open for now so we can catch any issues that come up.

We can re-open if we notice something or react to new bugs as they are reported.