Remove irrelevant sections from source article for translation
Closed, Resolved, Public

Description

Project-specific content on a page need not be presented as a section to translate. For example, a section like the one below at the top of an article is not relevant in the context of translating that article:

image.png (122×1 px, 24 KB)

Another example:

image.png (448×315 px, 34 KB)

CX1 had a configuration file listing the selectors for this kind of removable section. See https://github.com/wikimedia/mediawiki-extensions-ContentTranslation/blob/master/modules/source/conf/common.json#L2

In CX2, I propose to remove these irrelevant sections in cxserver while parsing the page content. This makes the client code less crowded and also removes the need to serve the configuration file to every client.
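
To illustrate the idea, here is a minimal sketch of server-side removal driven by a list of class names. It is not the cxserver implementation (cxserver is a Node.js service); the configuration shape, the class names in REMOVABLE_CLASSES, and the function names are all assumptions made for this example, and it only handles well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: class names that mark meta content.
# The real selector list lives in the configuration file, not here.
REMOVABLE_CLASSES = {"ambox", "metadata"}


def is_removable(element, removable_classes=REMOVABLE_CLASSES):
    """Return True if the element carries any configured class name."""
    classes = set(element.get("class", "").split())
    return bool(classes & removable_classes)


def strip_removable(root, removable_classes=REMOVABLE_CLASSES):
    """Remove matching descendants of root in place and return root."""
    # Iterate over a snapshot of the children so removal is safe.
    for child in list(root):
        if is_removable(child, removable_classes):
            # Note: any tail text after the removed element is dropped too.
            root.remove(child)
        else:
            strip_removable(child, removable_classes)
    return root


def clean_html(html):
    """Parse a well-formed fragment, strip meta content, serialize it."""
    root = ET.fromstring(html)
    strip_removable(root)
    return ET.tostring(root, encoding="unicode")
```

With this approach the client receives only translatable content and never needs the selector list, which matches the motivation given above.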

Event Timeline

santhosh triaged this task as Medium priority. Mar 21 2018, 9:34 AM
santhosh moved this task from Needs Triage to CX2 on the ContentTranslation board.

Change 421868 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Remove irrelavant sections from parsed page

https://gerrit.wikimedia.org/r/421868

One easy method to find articles with timelines is to search insource:/\<timeline\>/. I tested on en:World Heritage Site.

Change 424574 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] CX2: Do not fetch configuration json

https://gerrit.wikimedia.org/r/424574

Change 424574 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] CX2: Do not fetch configuration json

https://gerrit.wikimedia.org/r/424574

I compared cx-testing and cx2-testing.
In cx-testing the project specific content is not present at all:

Screen Shot 2018-04-11 at 9.46.01 PM.png (477×915 px, 187 KB)

In cx2-testing - the abbreviated templates are displayed and can be "translated":

Screen Shot 2018-04-11 at 9.46.42 PM.png (513×926 px, 134 KB)

@santhosh - is the current behavior in cx2-testing correct? Or are such templates subject to "removal of irrelevant sections at cxserver while parsing the page content"?

As a somewhat extreme example, the article 'Routes in Train Simulator 2012' is currently nominated for deletion. Shouldn't a translator be warned about it on the translation page?
cx2-testing displays the template:

Screen Shot 2018-04-11 at 9.56.08 PM.png (606×947 px, 77 KB)

cx-testing does not provide any information that the article is nominated for deletion:

Screen Shot 2018-04-11 at 9.55.52 PM.png (331×908 px, 53 KB)

In general, template handling is a large piece of work that we have not yet addressed in CX2.

In this ticket, all content that is not part of the article but is about the article (meta content) is removed from the page, since it does not need to be translated.

In cx2-testing - the abbreviated templates are displayed and can be "translated":

At a quick look, it seems these should be removed, but I need to check them. Since removing a particular piece of meta content now only requires an entry in a JSON configuration, I think it would be best to address them in new tickets and add the entries to the JSON.

As somewhat extreme example - the article 'Routes in Train Simulator 2012' is currently nominated for deletion. Shouldn't a translator be warned on the translation page about it?

This is an important thing we can consider. But it would be a new feature in CX2. cc @Pginer-WMF

Makes sense. This can help users to avoid wasting time and propagating problematic content. I created a separate ticket for this (T192530).

Change 427630 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Remove Indicator elements from the parsed source article

https://gerrit.wikimedia.org/r/427630

Change 427642 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Add support for removing sections by transclusion name

https://gerrit.wikimedia.org/r/427642

Change 427630 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Remove Indicator elements from the parsed source article

https://gerrit.wikimedia.org/r/427630

Change 427642 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Add support for removing sections by transclusion name

https://gerrit.wikimedia.org/r/427642
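
As a rough illustration of removal by transclusion name, the sketch below assumes Parsoid-style markup in which a template's output carries typeof="mw:Transclusion" and a data-mw JSON attribute whose parts name the template under template.target.wt. The REMOVABLE_TEMPLATES set and the function names are hypothetical, and this is not the code from the patch above.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical configuration: lower-cased template names to strip.
REMOVABLE_TEMPLATES = {"refimprove"}


def transclusion_names(element):
    """Return the lower-cased template names an element transcludes."""
    if "mw:Transclusion" not in element.get("typeof", "").split():
        return set()
    try:
        data = json.loads(element.get("data-mw", "{}"))
    except json.JSONDecodeError:
        return set()
    names = set()
    for part in data.get("parts", []):
        # Parts may also be plain strings of wikitext; skip those.
        if isinstance(part, dict) and "template" in part:
            target = part["template"].get("target", {})
            names.add(target.get("wt", "").strip().lower())
    return names


def remove_by_transclusion(root, removable=REMOVABLE_TEMPLATES):
    """Remove descendants of root produced by a configured template."""
    # Iterate over a snapshot of the children so removal is safe.
    for child in list(root):
        if transclusion_names(child) & removable:
            root.remove(child)
        else:
            remove_by_transclusion(child, removable)
    return root
```

Matching on the template name rather than on a CSS class helps when a wiki's maintenance templates do not share a common class, at the cost of keeping a per-wiki list of names.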

Checked in cx2-testing: article sections such as "Additional citation" (e.g. {{Refimprove|date=November 2016}} in "Astrology and the classical elements") and others are not present in the source article on the CX page. The specific case of an article nominated for deletion will be addressed in T192530: CX2: Warn the user when translating an article that is nominated for deletion.