
CX2: Hidden categories are adapted and displayed
Open, Normal, Public

Description

The Content Translation server returns an array of objects representing adapted categories. Each object has a source title, a true/false adapted value, and a target title, if there is one.

Hidden categories are returned, and some categories are returned multiple times. The response carries no information on whether a given category is hidden; that can only be determined with an additional API call. Here you can see "Category:Pages containing links to subscription-only content" displayed three times. That category is hidden, yet it is returned to the client and appears multiple times.
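A minimal sketch of how a client could drop hidden and repeated categories from such a response. The field names (`sourceTitle`, `adapted`, `targetTitle`) and the `hidden_titles` set are assumptions made for illustration; the actual cxserver response format may differ, and the set of hidden titles would have to come from a separate lookup, as noted above.

```python
def dedupe_visible(adapted_categories, hidden_titles):
    """Keep the first occurrence of each category that is not hidden.

    adapted_categories: list of dicts with hypothetical keys
        'sourceTitle', 'adapted', 'targetTitle'.
    hidden_titles: set of source titles known to be hidden
        (obtained separately; the response itself does not say).
    """
    seen = set()
    result = []
    for item in adapted_categories:
        title = item["sourceTitle"]
        # Skip hidden categories and any title already emitted.
        if title in hidden_titles or title in seen:
            continue
        seen.add(title)
        result.append(item)
    return result


sample = [
    {"sourceTitle": "Category:Turtles of North America", "adapted": True,
     "targetTitle": "Kategorija:Example"},  # target title is made up
    {"sourceTitle": "Category:Pages containing links to subscription-only content",
     "adapted": False, "targetTitle": None},
    {"sourceTitle": "Category:Pages containing links to subscription-only content",
     "adapted": False, "targetTitle": None},
    {"sourceTitle": "Category:Turtles of North America", "adapted": True,
     "targetTitle": "Kategorija:Example"},
]
hidden = {"Category:Pages containing links to subscription-only content"}
filtered = dedupe_visible(sample, hidden)
```

With the sample above, only the single "Category:Turtles of North America" entry would remain.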

CX1 adapts categories on the client side, using ?action=query&prop=categories&clshow=!hidden to fetch categories. For the "Painted turtle" page, translated from en to hr, CX2 displays four more categories than CX1:

[Screenshots: CX1, CX2]
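For reference, the CX1-style query can be sketched as follows. This is an illustration only, not the CX1 client code: the helper names are hypothetical, and the `cllimit`, `titles`, and `format` parameters are added here as assumptions to make the request complete. Only `action=query`, `prop=categories`, and `clshow=!hidden` come from the description.

```python
from urllib.parse import urlencode


def visible_categories_url(title, lang="en"):
    """Build a MediaWiki API URL that lists only non-hidden categories."""
    params = {
        "action": "query",
        "prop": "categories",
        "clshow": "!hidden",   # exclude hidden categories
        "cllimit": "max",      # assumption: ask for as many as allowed
        "titles": title,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def category_titles(response):
    """Extract category titles from a decoded JSON response.

    Assumes the default JSON layout, where query.pages is a dict
    keyed by page id. Long lists may need continuation requests,
    which this sketch does not handle.
    """
    pages = response.get("query", {}).get("pages", {})
    return [
        cat["title"]
        for page in pages.values()
        for cat in page.get("categories", [])
    ]


url = visible_categories_url("Painted turtle")

# Hand-written response fragment in the expected shape (not real API output).
example_response = {
    "query": {
        "pages": {
            "1": {
                "title": "Painted turtle",
                "categories": [{"ns": 14, "title": "Category:Example"}],
            }
        }
    }
}
titles = category_titles(example_response)
```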

Event Timeline

Restricted Application added a project: ContentTranslation. · Mar 20 2018, 6:07 PM
Pginer-WMF triaged this task as Normal priority. · Mar 21 2018, 7:25 AM

I think the key question is whether tracking categories, such as "CS1 German-language sources" in the example above, are useful to adapt when they exist on the target wiki.

In favour of showing and adapting them:

  • These categories are visible when editing the article with the regular editors.
  • If they exist in different languages, the classification work done by the source community can be reused by the target one.

Against showing and adapting them:

  • Regular categories are about the topic and don't depend on how much content the user adds to the translation, whereas administrative ones present in the source (e.g., lacks references) may not match the produced result (e.g., the user added additional references).
  • Communities may work in different ways, and even if a template exists the criteria for applying it may differ.
  • Some hidden categories seem to be added automatically based on templates in the article, thus we can assume the right ones for the target wiki will be added once the article is published.

Given the above, and unless anyone has additional information, I'm more inclined for hidden categories not to be shown or adapted, in order to keep things simple.

Once we stop parsing some sections as outlined in T190254: Remove irrelevant sections from source article for translation, many of these categories will stop appearing. We already avoid categories that are defined as part of templates (see https://github.com/wikimedia/mediawiki-services-cxserver/blob/master/lib/lineardoc/Builder.js#L52).

A lot of categories are introduced by citations too. I think if we stop parsing the reflist in pages, we can avoid a large chunk of them as well. So it is better to wait for T190254: Remove irrelevant sections from source article for translation before planning any development activities here.

Change 427644 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not gather categories under transclusions

https://gerrit.wikimedia.org/r/427644

Change 427644 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not gather categories under transclusions

https://gerrit.wikimedia.org/r/427644

@Pginer: additional things to consider on showing Hidden categories

Look, for example, at https://en.wikipedia.org/wiki/Campbell,_California. The Hidden categories provide valuable points on article's structural elements and potential problems.

@santhosh Currently in cx2-testing categories are not shown at all. I checked your example from the task (and other articles), "Painted turtle" translated from en to hr:

vs cx-testing

vs enwiki (production):

@Pginer: additional things to consider on showing Hidden categories

There are pros and cons to encouraging the propagation of hidden categories. Some concerns are that there is no guarantee that the translated article will keep the aspects that earned the source its hidden category, and that processes and criteria differ among wikis.

Look, for example, at https://en.wikipedia.org/wiki/Campbell,_California. The Hidden categories provide valuable points on article's structural elements and potential problems.

One of these hidden categories is "Featured article". If such a category exists on the target wiki, we would be propagating it to the translation by default. This would be problematic if the translation is created as a shorter version of the article, or if the target wiki has stricter criteria for what counts as a "featured article" or requires an explicit approval process. Asking users to reassess all this, when they may not be familiar with the processes of both wikis, seems error-prone.

In conclusion, I think that visible categories are about the topic and it is safe to transfer them across languages. Hidden categories, however, are more related to the processes of each wiki, where it is ok to reevaluate them in each case.

Etonkovidova closed this task as Resolved. · May 10 2018, 12:31 AM

Thanks, @Pginer-WMF, for the clarification.

Re-checked in cx2-testing: only visible categories are present.

This comment seems to suggest a regression. When translating w:en:Die Streuner -> w:ru:Die Streuner, the category "Articles with hCards" was added to the translation.
We may want to verify this and consider creating a follow-up ticket to fix the issue.

Pginer-WMF reopened this task as Open. · Jun 14 2018, 10:21 AM
Pginer-WMF moved this task from Done to Backlog on the Language-2018-Apr-June board.

As described above (T190203#4278479), this seems not to be solved for some cases. We need to investigate which cases those are and iterate on the solution.

Interestingly, for "Painted turtle" not all hidden categories are displayed in cx2. The original article has 21 visible categories and 6 hidden ones. cx2 shows 25 categories, so two hidden categories are excluded. cx-testing correctly displays 21 categories.
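The count above can be checked with a short calculation. It assumes that cx2 shows all 21 visible categories, so every extra one it displays is a hidden category; the variable names are illustrative only.

```python
visible_in_source = 21   # visible categories in the original article
hidden_in_source = 6     # hidden categories in the original article
shown_in_cx2 = 25        # categories displayed by cx2

# Assumption: all visible categories are shown, so the surplus is hidden ones.
hidden_shown = shown_in_cx2 - visible_in_source
hidden_excluded = hidden_in_source - hidden_shown

print(hidden_shown, hidden_excluded)
```

Under that assumption, four hidden categories still reach the user and two are filtered out.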