
Remove irrelevant sections from source article for translation
Closed, Resolved, Public

Description

Project-specific content on a page need not be presented as a section to translate. For example, a section like the one below at the top of an article is not relevant in the context of translating that article:

Another example:

CX1 had a configuration file listing the selectors for these kinds of removable sections. See https://github.com/wikimedia/mediawiki-extensions-ContentTranslation/blob/master/modules/source/conf/common.json#L2

In CX2, I propose doing this removal of irrelevant sections in cxserver while parsing the page content. This keeps the client code less crowded and also removes the need to serve the configuration file to every client.
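To illustrate what server-side, selector-based removal during parsing could look like, here is a minimal sketch in Python. It is not cxserver's code. It assumes the parsed page is available as a well-formed XML-compatible tree (Python's ElementTree stands in for a real HTML5 parser), and REMOVABLE_CLASSES is a hypothetical list: the class names are common English Wikipedia message-box and hatnote classes used only as examples, not the contents of CX1's common.json.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of classes marking meta content (examples only; this is
# not the contents of CX1's common.json).
REMOVABLE_CLASSES = {"ambox", "hatnote", "navbox"}


def is_removable(elem: ET.Element) -> bool:
    """True if the element carries any class listed in REMOVABLE_CLASSES."""
    return bool(set(elem.get("class", "").split()) & REMOVABLE_CLASSES)


def remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Detach child but keep the text that follows it.

    ElementTree stores text after an element on child.tail, so a plain
    parent.remove(child) would silently drop that text.
    """
    if child.tail:
        index = list(parent).index(child)
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def strip_removable(root: ET.Element) -> ET.Element:
    """Remove every removable element below root, in place, and return root."""
    # Snapshot the tree first so removals don't disturb the iteration;
    # descendants of removed nodes are still visited, which is harmless.
    for parent in list(root.iter()):
        for child in list(parent):
            if is_removable(child):
                remove_keeping_tail(parent, child)
    return root


html = (
    "<body>"
    '<div class="ambox">This article needs additional citations.</div>'
    "<p>Lead paragraph.</p>"
    "</body>"
)
print(ET.tostring(strip_removable(ET.fromstring(html)), encoding="unicode"))
# <body><p>Lead paragraph.</p></body>
```

Doing this once on the server means every client receives already-cleaned content, which is the point of moving the logic out of the client.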

Event Timeline

santhosh created this task. · Mar 21 2018, 9:33 AM
Restricted Application added a subscriber: Aklapper. · Mar 21 2018, 9:33 AM
santhosh triaged this task as Normal priority. · Mar 21 2018, 9:34 AM
santhosh moved this task from Needs Triage to CX2 on the ContentTranslation board.
Arrbee assigned this task to santhosh. · Mar 27 2018, 7:08 AM
Arrbee moved this task from Backlog to In Progress on the Language-2018-Jan-Mar board.

Change 421868 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Remove irrelavant sections from parsed page

https://gerrit.wikimedia.org/r/421868

Nikerabbit removed a project: Patch-For-Review.

One easy way to find articles with timelines is to search for insource:/\<timeline\>/. I tested this on en:World Heritage Site.
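For checking wikitext you already have locally, rather than through on-wiki search, a rough Python counterpart of that insource: query is a regular expression over the page source. This is an illustrative sketch only; allowing attributes on the tag and matching case-insensitively are assumptions about how loosely to match, not a description of how CirrusSearch evaluates the query.

```python
import re

# Matches an opening <timeline> tag, optionally with attributes.
# \b stops it from matching longer tag names such as <timelines>.
TIMELINE_TAG = re.compile(r"<timeline\b[^>]*>", re.IGNORECASE)


def has_timeline(wikitext: str) -> bool:
    """True if the wikitext contains an opening <timeline> tag."""
    return TIMELINE_TAG.search(wikitext) is not None


print(has_timeline("Intro\n<timeline>\nImageSize = width:800\n</timeline>"))  # True
print(has_timeline("No charts here."))  # False
```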

Change 424574 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] CX2: Do not fetch configuration json

https://gerrit.wikimedia.org/r/424574

santhosh moved this task from QA to In Review on the Language-2018-Apr-June board. · Apr 6 2018, 11:59 AM

Change 424574 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] CX2: Do not fetch configuration json

https://gerrit.wikimedia.org/r/424574

Arrbee moved this task from In Review to QA on the Language-2018-Apr-June board. · Apr 10 2018, 7:04 AM

I compared cx-testing and cx2-testing.
In cx-testing, the project-specific content is not present at all:

In cx2-testing, the abbreviated templates are displayed and can be "translated":

@santhosh, is the current behavior in cx2-testing correct? Or are such templates subject to "removal of irrelevant sections at cxserver while parsing the page content"?

As a somewhat extreme example, the article 'Routes in Train Simulator 2012' is currently nominated for deletion. Shouldn't a translator be warned about it on the translation page?
cx2-testing (displays the template)

cx-testing does not provide any information that the article is nominated for deletion:

In general, template handling is a big piece of work that we have not yet addressed in CX2.

In this ticket, all content that is not part of the article but is about the article (meta content) is removed from the page, since it does not need to be translated.

In cx2-testing, the abbreviated templates are displayed and can be "translated":

At a quick look, it seems these should be removed, but I need to check them. Since removing a particular piece of meta content is now just a matter of JSON configuration, I think it would be good to address them in new tickets and add the entries to the JSON file.
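As a rough picture of "just adding entries to the JSON", here is a minimal Python sketch of loading a list of removable templates from JSON and matching template names against it. The "templates" key and its entries are hypothetical; the actual cxserver configuration may be shaped differently. The name normalization only approximates MediaWiki title rules (underscores as spaces, uppercase first letter, which in MediaWiki depends on the wiki's $wgCapitalLinks setting).

```python
import json

# Hypothetical configuration; key name and entries are illustrative only.
CONFIG_JSON = """
{
  "templates": ["Refimprove", "Multiple issues", "Orphan"]
}
"""


def normalize_title(name: str) -> str:
    """Approximate MediaWiki normalization of a template name."""
    name = " ".join(name.replace("_", " ").split())  # underscores, extra spaces
    if name.lower().startswith("template:"):
        name = name[len("template:"):].lstrip()
    return name[:1].upper() + name[1:]  # assumes first-letter capitalization


def load_removable_templates(text: str) -> set[str]:
    """Parse the JSON config into a set of normalized template names."""
    config = json.loads(text)
    return {normalize_title(t) for t in config.get("templates", [])}


REMOVABLE = load_removable_templates(CONFIG_JSON)


def is_removable_template(target: str) -> bool:
    """True if a transcluded template name is listed as removable."""
    return normalize_title(target) in REMOVABLE


print(is_removable_template("refimprove"))      # True
print(is_removable_template("Infobox person"))  # False
```

With this design, handling a newly reported template is a one-line change to the JSON list rather than a code change.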

As a somewhat extreme example, the article 'Routes in Train Simulator 2012' is currently nominated for deletion. Shouldn't a translator be warned about it on the translation page?

This is an important thing to consider, but it would be a new feature in CX2. cc @Pginer-WMF

Makes sense. This can help users avoid wasting time and propagating problematic content. I created a separate ticket for this (T192530).

Change 427630 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Remove Indicator elements from the parsed source article

https://gerrit.wikimedia.org/r/427630

Change 427642 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Add support for removing sections by transclusion name

https://gerrit.wikimedia.org/r/427642
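Removing sections by transclusion name can be pictured with the following minimal Python sketch. It is not the code from change 427642. It assumes Parsoid-style HTML, where the first node of a template's output has typeof="mw:Transclusion", the template name is recorded in the JSON data-mw attribute under parts[].template.target.wt, and any further sibling nodes of the same transclusion share the same about id. REMOVABLE_TEMPLATES is a hypothetical list, and ElementTree stands in for a real HTML5 parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical list of templates whose output is meta content.
REMOVABLE_TEMPLATES = {"Refimprove", "Multiple issues"}


def normalize(name: str) -> str:
    """Rough name normalization: underscores to spaces, first letter upper."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:]


def template_names(elem: ET.Element) -> list[str]:
    """Template names recorded in a Parsoid-style transclusion's data-mw."""
    if "mw:Transclusion" not in elem.get("typeof", "").split():
        return []
    data = json.loads(elem.get("data-mw", "{}"))
    names = []
    for part in data.get("parts", []):
        # A part is either a template object or a plain wikitext string.
        if isinstance(part, dict) and "template" in part:
            target = part["template"].get("target", {})
            names.append(normalize(target.get("wt", "")))
    return names


def remove_keeping_tail(parent: ET.Element, child: ET.Element) -> None:
    """Detach child without losing the text stored on child.tail."""
    if child.tail:
        index = list(parent).index(child)
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def strip_transclusions(root: ET.Element) -> ET.Element:
    """Remove listed transclusions, including their about-siblings, in place."""
    for parent in list(root.iter()):
        doomed = set()  # about ids of transclusions removed under this parent
        for child in list(parent):
            about = child.get("about")
            # The whole transclusion goes if any of its parts is listed.
            if any(n in REMOVABLE_TEMPLATES for n in template_names(child)):
                if about:
                    doomed.add(about)
                remove_keeping_tail(parent, child)
            elif about is not None and about in doomed:
                remove_keeping_tail(parent, child)
    return root
```

Matching on the transclusion name rather than on CSS classes means the configuration does not depend on how each wiki styles its message boxes.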

Change 427630 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Remove Indicator elements from the parsed source article

https://gerrit.wikimedia.org/r/427630

Change 427642 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Add support for removing sections by transclusion name

https://gerrit.wikimedia.org/r/427642

Arrbee moved this task from In Review to QA on the Language-2018-Apr-June board. · May 2 2018, 7:03 AM
Etonkovidova closed this task as Resolved. · May 9 2018, 11:57 PM

Checked in cx2-testing: sections such as "Additional citation" (e.g. {{Refimprove|date=November 2016}} in "Astrology and the classical elements") and others are not present in the source article on the CX page. The specific case of an article nominated for deletion will be addressed in T192530: CX2: Warn the user when translating an article that is nominated for deletion.