Add project WPCleaner to translatewiki.net
Closed, Resolved, Public

Description

Project setup checklist

Project information

Name: WPCleaner
Website: https://en.wikipedia.org/wiki/Wikipedia:WPCleaner
Logo:

  • Without text: File:Nuvola web broom.svg
  • With text:
  • Optional; SVG format recommended. Add the filename on Commons or translatewiki.net.

Project description: WPCleaner is a tool designed to help with various maintenance tasks, especially repairing links to disambiguation pages, checking Wikipedia, fixing spelling and typography, and helping with translation of articles coming from other wikis.

Project page:

Project configuration (for translation admins)

Namespace: NS_MEDIAWIKI
Prefix: wpcleaner-
Validators:

  1. Validator for variables of the form {0}, {1}
  2. GettextPlural
  3. GettextNewline
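For illustration, the kind of check the first validator performs can be sketched in Java, WPCleaner's language. This is a hypothetical sketch, not translatewiki.net's actual validator: it assumes placeholders are plain MessageFormat-style `{n}` arguments and only verifies that a translation uses the same set of them as the source message.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: compare {0}, {1}, ... placeholders between a message and its translation. */
final class PlaceholderCheck {

    // Matches numbered arguments such as {0} or {12}; formatted ones like {0,number} are not matched.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)\\}");

    /** Returns the sorted set of placeholders (e.g. "{0}") found in the text. */
    static Set<String> placeholders(String text) {
        Set<String> found = new TreeSet<>();
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    /** True when the translation uses exactly the same placeholders as the source. */
    static boolean samePlaceholders(String msgid, String msgstr) {
        return placeholders(msgid).equals(placeholders(msgstr));
    }

    public static void main(String[] args) {
        System.out.println(samePlaceholders("Error: {0}", "Chyba: {0}")); // true
        System.out.println(samePlaceholders("Error: {0}", "Chyba: {1}")); // false
    }
}
```

Formatted arguments such as `{0,number}` and MessageFormat's single-quote escaping are deliberately left out of this sketch.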

Event Timeline

Thanks @abi_! Let me know if you need me to do anything.

Change 605175 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[translatewiki@master] Add support for WPCleaner

https://gerrit.wikimedia.org/r/605175

@NicoV - A few questions:

  1. Please provide commit access to the https://github.com/translatewiki user.
  2. Would you like translatewiki to push the changes directly to master, or to push to a different branch and create a PR against the master branch?
  3. What is the release policy for the project? How often are releases made? Essentially, we don't want translations to go to waste. Translated content is pushed out from translatewiki twice a week.
  4. After the project is imported to twn, we suggest that you add message documentation. It can be added via translatewiki.net, and it will be exported to the project as a qqq.pot file.
  5. When committing the translation files, we will be adding the following header:
# This file is distributed under the same license as the wikicleaner package.
#

@abi_ Thanks

  1. I have invited the translatewiki user to the WPCleaner repository: is that OK?
  2. Currently, my build script (Ant) generates the WikiCleaner.pot file from the Java source code, updates the .po files from the .pot file, and creates the Java catalog files from the .po files. Do I have to change anything in this process to work with translatewiki, or will translatewiki integrate seamlessly into it? If there's no risk in using master directly, I'm OK with it.
  3. I release updates to WPCleaner depending on my available time; currently, I release several updates each week. If translation updates are pushed, I can release a new version.
  4. OK, I will look at message documentation once the project is imported.
  5. OK for the header.

  1. That is usually enough. We will be pushing translations today, so we will have confirmation then.
  2. I think that should be fine. From my understanding, no additional build step is needed when new translations arrive?
  3. We recommend releasing periodically just for translation updates, even when there are no features or bug fixes. Since new translations may be available every week, releasing that often may be too much, so once every 2 weeks or even once a month should be good.

One thing I realized: you may have to update your build process to ignore the qqq.po file that translatewiki will create.

I've configured the bot to push changes to a branch named translatewiki and then create a PR to the master branch. You can review and merge it.

abi_ updated the task description.

@NicoV

While adding some language code mappings, I noticed the languages defined in the build file. Things to note:

  1. translatewiki might push even more languages to the project (e.g. Hindi (hi), Malayalam (ml)), depending on what users translate. In such cases, these languages will have to be added to the build.xml file. We recommend adding a CI check to ensure that all the files present in the translations folder are listed in the build.xml file.
  2. Do you want us to set an export threshold? An export threshold ensures that languages below a certain percentage of translated messages are not exported.
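A CI check like the one suggested in point 1 could look roughly like this hypothetical Java sketch. It assumes the .po files live in `WikipediaCleaner/src/org/wikipediacleaner/translation` (the path seen in the diffs in this task) and that the languages declared in build.xml are supplied as a set, since parsing build.xml is not shown. `qqq` is skipped because it holds message documentation rather than a language.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical CI sketch: every <lang>.po file must be declared in the build. */
final class LanguageListCheck {

    /** Returns the language codes of .po files in dir that are missing from declared. */
    static Set<String> undeclaredLanguages(Path dir, Set<String> declared) throws IOException {
        Set<String> missing = new TreeSet<>();
        // The glob must match the whole file name, so WikiCleaner.pot is not included.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.po")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                String lang = name.substring(0, name.length() - ".po".length());
                // qqq holds message documentation, not a real language.
                if (!lang.equals("qqq") && !declared.contains(lang)) {
                    missing.add(lang);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "WikipediaCleaner/src/org/wikipediacleaner/translation");
        // Placeholder list: replace with the languages actually declared in build.xml.
        Set<String> declared = Set.of("cs", "fr", "it");
        Set<String> missing = undeclaredLanguages(dir, declared);
        if (!missing.isEmpty()) {
            System.err.println("Languages missing from build.xml: " + missing);
            System.exit(1);
        }
        System.out.println("All translation files are declared.");
    }
}
```

Exiting with a non-zero status is what lets a CI job mark the build as failed when a new language file appears.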

@abi_

  1. The list of languages in build.xml is needed by the build process to know which .po files to handle (create/update the .po file, generate the Java Messages files). So yes, a CI check of the list would be useful.
  2. The list of languages shown to users is also defined in code (in EnumLanguage.java); it's used to avoid showing languages with too few translations. So I don't think the export threshold is necessary, unless it can be used in a CI check to ensure that languages with enough translations are listed in EnumLanguage.
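To check that languages with enough translations are listed in EnumLanguage, a CI job would first need a completeness figure per .po file. The following hypothetical Java sketch uses a simplified parser (entries separated by blank lines, comments and obsolete `#~` entries ignored, fuzzy entries counted as translated, msgctxt ignored); any threshold compared against its result would be a project choice, not something defined in this task.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: percentage of translated entries in a .po file. */
final class PoCompleteness {

    /** Text between the first and last double quote of a PO line, without unescaping. */
    private static String quoted(String line) {
        int first = line.indexOf('"');
        int last = line.lastIndexOf('"');
        return (first >= 0 && last > first) ? line.substring(first + 1, last) : "";
    }

    /** Percentage of non-header entries with a non-empty msgstr. */
    static double percentTranslated(List<String> lines) {
        int total = 0;
        int translated = 0;
        List<String> entry = new ArrayList<>();
        List<String> all = new ArrayList<>(lines);
        all.add(""); // trailing blank line so the last entry is processed
        for (String line : all) {
            if (!line.isBlank()) {
                entry.add(line.trim());
                continue;
            }
            if (entry.isEmpty()) {
                continue;
            }
            StringBuilder id = null;                 // stays null if the block has no msgid
            StringBuilder str = new StringBuilder();
            StringBuilder current = null;            // field that continuation lines append to
            for (String l : entry) {
                if (l.startsWith("#")) {
                    continue;                        // comments, flags, obsolete (#~) lines
                } else if (l.startsWith("msgid")) {  // msgid and msgid_plural
                    if (id == null) {
                        id = new StringBuilder();
                    }
                    current = id;
                } else if (l.startsWith("msgstr")) { // msgstr and msgstr[n]
                    current = str;
                } else if (!l.startsWith("\"")) {
                    current = null;                  // e.g. msgctxt: ignored in this sketch
                    continue;
                }
                if (current != null) {
                    current.append(quoted(l));
                }
            }
            entry.clear();
            if (id == null || id.length() == 0) {
                continue;                            // no entry, or the header (msgid "")
            }
            total++;
            if (str.length() > 0) {
                translated++;
            }
        }
        return total == 0 ? 0.0 : 100.0 * translated / total;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.printf("%s: %.1f%% translated%n", args[0], percentTranslated(lines));
    }
}
```

GNU gettext's `msgfmt --statistics` reports fuzzy entries separately; this sketch does not, which is one reason its figure is only an approximation.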

@abi_

Do you know how to request a Phabricator project for WPCleaner?
It's probably by creating a task here, but are there any special tags or anything else needed?

I would recommend creating a task like https://phabricator.wikimedia.org/T208625 or https://phabricator.wikimedia.org/T165785.

Change 605175 merged by jenkins-bot:
[translatewiki@master] Add support for WPCleaner

https://gerrit.wikimedia.org/r/605175

Change 607035 had a related patch set uploaded (by Abijeet Patro; owner: Abijeet Patro):
[translatewiki@master] WPCleaner: Fix translations folder path

https://gerrit.wikimedia.org/r/607035

Change 607035 merged by jenkins-bot:
[translatewiki@master] WPCleaner: Fix translations folder path

https://gerrit.wikimedia.org/r/607035

Hi @abi_

I merged the pull request that was generated by translatewiki, but apparently a lot of translations disappeared: French (fr.po) was 100% complete, and now has 335 missing translations and 1 error.

@abi_

I checked a few of the missing translations: apparently, translatewiki removed them when the msgid in the .po file was split over several lines, starting with an empty first line. I'm reverting the merge for the moment. I will also modify my build process to avoid splitting texts over several lines.

Original text

#: org/wikipediacleaner/Version.java:50
#, java-format
msgid ""
"I try to keep {0} up to date with Check Wiki, but if you find any discrepancy"
"(ies) in the detections, please let me know."
msgstr ""
"J’essaye de mettre à jour {0} avec Check Wiki, mais n’hésitez pas à "
"m’informer si vous découvrez des incohérences."

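In the PO format, a msgid or msgstr may be written as several quoted segments on consecutive lines, which gettext concatenates; the leading `msgid ""` is simply an empty first segment. This minimal Java sketch (a hypothetical helper with simplified unescaping, covering only `\n`, `\t`, `\"` and `\\`) shows that the wrapped entry above denotes the same string as its single-line form:

```java
import java.util.List;

/** Sketch: joins the quoted segments of a PO msgid/msgstr into one string. */
final class PoStrings {

    /** Concatenates segments such as "\"abc\"", unescaping \n, \t, \" and \\. */
    static String join(List<String> segments) {
        StringBuilder out = new StringBuilder();
        for (String segment : segments) {
            String s = segment.trim();
            if (s.length() < 2 || s.charAt(0) != '"' || s.charAt(s.length() - 1) != '"') {
                throw new IllegalArgumentException("Not a quoted segment: " + segment);
            }
            // Walk the characters between the surrounding quotes.
            for (int i = 1; i < s.length() - 1; i++) {
                char c = s.charAt(i);
                if (c == '\\' && i + 1 < s.length() - 1) {
                    char next = s.charAt(++i);
                    switch (next) {
                        case 'n': out.append('\n'); break;
                        case 't': out.append('\t'); break;
                        default: out.append(next); // \" and \\ (other escapes simplified)
                    }
                } else {
                    out.append(c);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String wrapped = join(List.of(
                "\"\"",
                "\"I try to keep {0} up to date with Check Wiki, but if you find any discrepancy\"",
                "\"(ies) in the detections, please let me know.\""));
        String single = join(List.of(
                "\"I try to keep {0} up to date with Check Wiki, but if you find any discrepancy(ies) in the detections, please let me know.\""));
        System.out.println(wrapped.equals(single)); // true
    }
}
```

Because both layouts decode to the same string, a tool that drops the translation of one layout but not the other is mishandling the format rather than the content.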
@abi_

I pushed a change to the WPCleaner build process to avoid wrapping texts across multiple lines; maybe it will avoid the problem with translatewiki.
Do you want to try another pull request?

@NicoV, thanks for checking this. I noticed the large number of changes and was planning to check them tomorrow. We'll import the new changes from WPCleaner and push out the changes again tomorrow.

I tried to look at the diff to see what is going on, but did not find such examples (yet).

I did notice that obsolete translations like this one were removed, which is expected:

#~ msgid "&Contains"	
#~ msgstr "Obsahují&cí"

I checked one example you mentioned, and it's still there, just reformatted.

diff --git a/WikipediaCleaner/src/org/wikipediacleaner/translation/cs.po b/WikipediaCleaner/src/org/wikipediacleaner/translation/cs.po
index 7f429463..2dde5412 100644
--- a/WikipediaCleaner/src/org/wikipediacleaner/translation/cs.po
+++ b/WikipediaCleaner/src/org/wikipediacleaner/translation/cs.po
@@ -5058,16 +4777,8 @@ msgstr "Získávám rozcestníkové stránky"

 #: org/wikipediacleaner/gui/swing/worker/UpdateDabWarningWorker.java:149
 #, java-format
-msgid ""
-"An error occurred when updating disambiguation warnings. Do you want to "
-"continue ?\n"
-"\n"
-"Error: {0}"
-msgstr ""
-"Při aktualizaci varování ohledně rozcestníků došlo k chybě. Chcete "
-"pokračovat?\n"
-"\n"
-"Chyba: {0}"
+msgid "An error occurred when updating disambiguation warnings. Do you want to continue ?\n\nError: {0}"
+msgstr "Při aktualizaci varování ohledně rozcestníků došlo k chybě. Chcete pokračovat?\n\nChyba: {0}"

Note that you have to use the "ignore whitespace" option to see a clean diff for this commit. Possibly the line endings changed.

@Nikerabbit

When I look at the pull request from yesterday, I clearly see the problem.

See for example this URL and search for "I try": you will see an empty translation in the pull request, where there was a translation originally.

I still don't see any issue. Is this the message you are looking at?

[Screenshot: image.png]

@Nikerabbit
Sorry, I sent the wrong link: check fr.po, not it.po.

Here's the correct link, I've added a comment on GitHub

[Screenshot: PbTranslateWiki.png]

@NicoV

The issue was caused by human error. I ran the exports from translatewiki.net before all the existing translations from WPCleaner had been imported into translatewiki.net, which caused some translations to become empty. Since this was the first import from WPCleaner, we should have waited longer before starting the exports. I don't expect this to happen again.

I've submitted another PR to WPCleaner that has the latest changes from translatewiki.net exported out.

@abi_ and @Nikerabbit

The PR looks a lot better, but there's still 1 error: an extra "}" was added to a message, making it incorrect:

[Screenshot: PbTranslateWiki.png]

I've fixed the .po files to remove the extra "}" and committed the result. Can you check whether it happens again after importing the latest version of the code from GitHub?
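A stray "}" like this one can be caught mechanically. This hypothetical Java sketch only checks that curly braces are balanced in a message; it ignores MessageFormat's single-quote escaping (e.g. `'{'`), so a real check would need to account for that. The example strings are illustrative, not the actual message from the screenshot.

```java
/** Hypothetical sketch: detect unbalanced curly braces, e.g. a stray "}" in a translation. */
final class BraceBalance {

    /** True if every '{' is closed by a later '}' and no '}' appears without an opening '{'. */
    static boolean isBalanced(String text) {
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return false; // closing brace without a matching opening brace
                }
            }
        }
        return depth == 0; // false if some '{' was never closed
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("{0} pages found"));   // true
        System.out.println(isBalanced("{0}} pages found"));  // false
    }
}
```

Note that a stray "}" directly after `{0}` leaves the set of placeholders unchanged, so a balance check catches a case that a placeholder comparison alone would miss.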

@NicoV - This appears to be an issue with plural expansion / closing on our end. We will have to make a fix for this. Filed T256227: Incorrect plural expansion / closing for Gettext plurals to track the issue.

I'll also keep this issue open.

@NicoV - The issue with the extra "}" appears to be fixed now. Can you please confirm from your side?

Thanks @abi_

Yes, it seems fixed.

There are a few formatting differences between what I'm generating from my build script, and what translatewiki is generating.
Do you know how to get the same format?
I'm using the GNU gettext tools in my build script, and I haven't found configuration options that produce the same format as translatewiki.

Differences I have seen :

  • translatewiki adds an extra line containing only "" just before the Project-Id-Version line: is it possible to avoid creating this line?
  • strings containing \n are split over several lines by my build script, while translatewiki puts everything on one line: is it possible to apply the same formatting?
  • translatewiki adds an extra empty line at the end of the file: is it possible to avoid creating this line?

Nico

@NicoV - I would recommend not generating the .po files during the build process. translatewiki.net will output the .po files based on the latest .pot files.

@NicoV - Thanks. Will leave this open for a while to track any further issues we notice.

Marking this as done for now. @NicoV, please feel free to create an issue if you encounter any problems.