
User-provided categories added to the translation are ignored when published
Closed, Resolved (Public)

Description

Content translation allows users to adjust the categories for the translated article. Users can add their own categories, but a regression is causing user-provided categories to be ignored when publishing. This was reported in this comment.

In order to prevent future regressions, it would be great to apply some automatic testing.

Note: This task is about an issue that exists only in development (and maybe testing) environments; it doesn't affect production websites. The issue is that when translating an article, user-defined categories are not saved properly: they are ignored entirely.

Event Timeline

In order to prevent future regressions, it would be great to apply some automatic testing.

It would be great indeed, but I wouldn't expect any progress in that field any time soon (UI automated tests, anyway; maybe it's possible to implement unit tests).

I don't remember testing any feature having anything to do with categories, so it would be nice to find out whether:

  1. this hasn't been working properly since I joined the foundation
  2. this was somehow broken by a parallel, unrelated patch

That way we can try to prevent it from happening again in the future.

In order to prevent future regressions, it would be great to apply some automatic testing.

I don't remember testing any feature having anything to do with categories, so it would be nice to find out whether:

  1. this hasn't been working properly since I joined the foundation

In case it helps to figure this out:

I tried to reproduce this issue, but without any luck. That is, I was able to add, save and publish new categories successfully for a translated article. I'm posting two screenshots that I took during and after the translation. As you can see, while initially there were only six categories added automatically, I was able to manually add another two (newly created) categories, which can be seen at the bottom of the second screenshot, in red text.

Screenshot_2020-05-13 Μετάφραση σελίδας - Βικιπαίδεια.png (1×1 px, 376 KB)
Screenshot_2020-05-13 Στράτος Βουλγαρόπουλος - Βικιπαίδεια.png (915×1 px, 193 KB)

In our production setup for Wikipedia, I see that categories are added without any problem, but there is an issue with single-wiki setups like the ones we use for development and testing (ContentTranslationTranslateInTarget=false). The category validation we do in ApiContentTranslationPublish::getCategories is strict about the namespace in the target language. That will fail if we have ContentTranslationTranslateInTarget=false, and all categories will be skipped while publishing.

For example, if the target language is 'ml' and the local or test wiki language is 'en', the namespace text 'വർഗ്ഗം' in ml will not pass the validation.
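
The failure mode can be illustrated with a minimal Python sketch. The namespace table and the function `is_category_title` are hypothetical stand-ins, not the actual code in ApiContentTranslationPublish::getCategories; the sketch only assumes that validation compares the title prefix with the category namespace name of a single language.

```python
# Hypothetical table of localized category namespace names (illustration only).
CATEGORY_NAMESPACE = {
    "en": "Category",
    "ml": "വർഗ്ഗം",
}


def is_category_title(title: str, lang: str) -> bool:
    """Return True if the title's prefix is the category namespace name for lang."""
    prefix, sep, name = title.partition(":")
    return bool(sep) and bool(name.strip()) and prefix.strip() == CATEGORY_NAMESPACE[lang]


wiki_lang = "en"    # language of the single development wiki
target_lang = "ml"  # target language of the translation
submitted = ["വർഗ്ഗം:Fruit", "വർഗ്ഗം:Animal"]

# Validating against the wiki language rejects every submitted category.
kept_by_wiki_lang = [t for t in submitted if is_category_title(t, wiki_lang)]

# Validating against the target language keeps all of them.
kept_by_target_lang = [t for t in submitted if is_category_title(t, target_lang)]
```

On a production wiki the wiki language and the target language coincide, so both checks give the same result, which would explain why the problem only shows up in single-wiki setups.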

This may hide many category-related bugs in our usual testing and development workflow.

Ideally, the category prefix in the category passed to the publish API should be in the target language. Currently we check whether it matches the namespace alias for the current wiki language. Language.php has utilities to get the namespace alias for a given language.

As an alternative, the English names are always available, so the UI could just use those canonical names. But this only works if categories are passed separately, not in the wikitext (where we do want to use the localised names).
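
A rough Python sketch of this alternative, assuming a validator that accepts the canonical English name in addition to the wiki's localized name. The table and the helper `has_category_prefix` are hypothetical and only illustrate the idea; they are not part of the extension.

```python
CANONICAL_CATEGORY = "Category"  # canonical English namespace name

# Hypothetical localized names, keyed by wiki language (illustration only).
LOCALIZED_CATEGORY = {"en": "Category", "ml": "വർഗ്ഗം"}


def has_category_prefix(title: str, wiki_lang: str) -> bool:
    """Accept the canonical name or the wiki language's localized name as prefix."""
    accepted = {CANONICAL_CATEGORY, LOCALIZED_CATEGORY.get(wiki_lang, CANONICAL_CATEGORY)}
    prefix, sep, name = title.partition(":")
    return bool(sep) and bool(name.strip()) and prefix.strip() in accepted


# A canonical prefix validates regardless of the wiki language ...
canonical_on_en = has_category_prefix("Category:Fruit", "en")
canonical_on_ml = has_category_prefix("Category:Fruit", "ml")
# ... while a target-language prefix still fails on a wiki in another language.
localized_on_en = has_category_prefix("വർഗ്ഗം:Fruit", "en")
```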

Note: This task is about an issue that exists only in development (and maybe testing) environments; it doesn't affect production websites. The issue is that when translating an article, user-defined categories are not saved properly: they are ignored entirely.

Hm... My issue, which is marked as a duplicate, affects production websites (T248275 Adding categories doesn't work for content translation). T248275 occurs when translating from en.wiki to pl.wiki.

We fixed those issues in April. If you can still reproduce the issue with new translations, let us know.

Ideally, the category prefix in the category passed to the publish API should be in the target language. Currently we check whether it matches the namespace alias for the current wiki language. Language.php has utilities to get the namespace alias for a given language.

To elaborate on this:

  1. The browser should send categories without a namespace to the publish API. Example: [Fruit, Animal] instead of [Category:Fruit, Category:Animal].
  2. The publish API should remove the namespaces from the list of categories regardless of step 1. Example: Category:Fruit -> Fruit.
  3. The publish API should find the category namespace for the target language and construct the category titles. For example, if the target language is es, this will be Categoria:Fruit. It should then pass the titles along with the wikitext to publish.
  4. This should be tested on a wiki where the wiki language is not the same as the target language of the translation.
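
Steps 2 and 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the namespace table and both function names are hypothetical, the "Categoria" value simply follows the example used in this thread, and the real publish API would obtain namespace names from MediaWiki's language utilities instead of a hard-coded table.

```python
# Hypothetical table of category namespace names (illustration only).
CATEGORY_NAMESPACE = {"en": "Category", "es": "Categoria", "ml": "വർഗ്ഗം"}
KNOWN_PREFIXES = set(CATEGORY_NAMESPACE.values())


def strip_namespace(category: str) -> str:
    """Step 2: drop a known category prefix, so 'Category:Fruit' becomes 'Fruit'."""
    prefix, sep, name = category.partition(":")
    if sep and prefix.strip() in KNOWN_PREFIXES:
        return name.strip()
    # No known prefix: leave names such as 'Star Trek: Voyager' untouched.
    return category.strip()


def build_category_titles(categories: list[str], target_lang: str) -> list[str]:
    """Step 3: prepend the category namespace of the target language."""
    namespace = CATEGORY_NAMESPACE[target_lang]
    return [f"{namespace}:{strip_namespace(c)}" for c in categories]


# Works whether or not the browser already removed the prefix (step 1).
titles = build_category_titles(["Fruit", "Category:Animal"], "es")
```

Because the titles are built from the target language alone, the result does not depend on the language of the wiki that runs the publish API, which is the situation step 4 asks to test.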

Currently, the publish API receives categories like this: [Categoria: Fruit, Categoria: Animal]. Even if we implement steps 1 and 2, that won't fix the current issue, because the namespace index for the prefix (in this case Categoria) is checked against the current wiki language, while the prefix is given in the target language. So I think we should fix this before implementing steps 1 and 2.

Change 604753 had a related patch set uploaded (by Nik Gkountas; owner: Nik Gkountas):
[mediawiki/extensions/ContentTranslation@master] CX categories: Publish and store without prefix

https://gerrit.wikimedia.org/r/604753

Change 604753 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] CX categories: Send to backend without prefix but store with prefix

https://gerrit.wikimedia.org/r/604753