
CX2: Article considered as a whole translation unit, ignoring paragraphs and failing translation
Closed, Resolved · Public

Description

When translating Trifles (play) from English to Russian, Content Translation is unable to divide the article into paragraph-based translation units, as reported in this comment. Hovering over the first paragraph to add it to the translation highlights the whole document:

Clicking the paragraph to add it shows the "Automatic translation failed!" message, and as a fallback the whole source article is copied into the translation.

In the source article, the only apparent peculiarity is the presence of the About and Italic title templates:

{{About|the play|the dessert|Trifle}}
{{Italic title}}
'''''Trifles''''' is a [[one-act play]]...

Further investigation may be needed to confirm the cause.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 13 2019, 2:59 PM
Pginer-WMF triaged this task as Normal priority. · Feb 25 2019, 8:11 AM

The same issue happened when translating ...explosante-fixe... from English to French (while trying to evaluate T174271)

Pginer-WMF raised the priority of this task from Normal to High. · Mar 1 2019, 2:03 PM
santhosh claimed this task. · Mar 6 2019, 11:59 AM

Change 494701 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Remove 'Italic title' template from article content

https://gerrit.wikimedia.org/r/494701

@santhosh identified that the Italic title template from English Wikipedia is causing section wrapping to break. The proposed solution is to add "Italic title" to the list of templates removed from article content. However, that list matches templates by name in the wikitext, so it does not cover equivalent templates on other wikis, which go by different names.

The config file lists which templates to remove from the content: for example, templates matching "Wikipedia articles (for|with|needing|containing)", the "Featured article" template and the "Short description" template. The list is clearly English Wikipedia-centric and does not work for other wikis.
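To make that limitation concrete, here is a minimal Python sketch of name-based template removal. The pattern list, the regular expression and the function names (`REMOVABLE_TEMPLATE_PATTERNS`, `strip_templates`, `is_removable`) are hypothetical; the actual cxserver config format and code may differ, and the sketch ignores nested templates and MediaWiki's first-letter case-insensitivity.

```python
import re

# Hypothetical pattern list modelled on the config described above: each entry
# is a regular expression that must match the whole template name.
# The real cxserver config format may differ.
REMOVABLE_TEMPLATE_PATTERNS = [
    r"Wikipedia articles (for|with|needing|containing).*",
    r"Featured article",
    r"Short description",
    r"Italic title",
]

# Matches a simple, non-nested {{Name|...}} transclusion plus a trailing newline.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(\|[^{}]*)?\}\}\n?")


def is_removable(name: str) -> bool:
    """Return True if the template name fully matches one of the patterns."""
    return any(re.fullmatch(p, name) for p in REMOVABLE_TEMPLATE_PATTERNS)


def strip_templates(wikitext: str) -> str:
    """Drop transclusions whose name is on the list; keep everything else."""
    return TEMPLATE_RE.sub(
        lambda m: "" if is_removable(m.group(1)) else m.group(0), wikitext
    )


if __name__ == "__main__":
    source = (
        "{{About|the play|the dessert|Trifle}}\n"
        "{{Italic title}}\n"
        "'''''Trifles''''' is a [[one-act play]]..."
    )
    # {{Italic title}} is removed; {{About}} is kept.
    print(strip_templates(source))
    # A hypothetical localized template name is not on the list, so it survives.
    print(strip_templates("{{Hypothetical localized italic title}}\nText"))
```

The second call shows the problem discussed here: an equivalent template with a different (here hypothetical) name passes through untouched, so each wiki's variant would need its own entry in the list.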

The list does include a line Esborrany # Stub category in es.wikipedia, which seems intended for Spanish Wikipedia, but that template does not seem to exist.

On gerrit, I wrote:

We should strive to find a more maintainable and scalable solution, or decide to solve the problems that templates like this one create.

and @santhosh replied:

Yeah, T197859 and global templates...
I don't have any quick solutions for this. Also I don't know if there is an automatic way to determine a template is relevant for translation or not.
Agree that the current way is far from optimal. But I think for this ticket, this is the best I can do as quick solution.

Regarding the Esborrany line: "esborrany" is the Catalan word for "draft", so the category is presumably from Catalan Wikipedia. Perhaps such a category leaked into Spanish Wikipedia, which led to the confusion in the annotation.

Even if a better solution is not feasible in the short term, it would be good to document what exactly made the Italic title template special. Maybe it is possible to detect the effects of such templates and correct them. That could compensate for the lack of a mechanism to skip equivalent templates across languages.
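One way to explore detecting a template's effects rather than its name is sketched below in Python. It assumes Content Translation works on Parsoid-style HTML in which the top-level nodes produced by one transclusion share an `about` id, and it flags transclusions whose output consists only of metadata elements (such as a DISPLAYTITLE page property). The class and function names, the metadata tag set and the sample markup are all hypothetical; this is not how cxserver resolved the task, and it has not been confirmed that metadata-only output is what broke section wrapping.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Assumption: these elements carry page metadata only and render nothing visible.
METADATA_TAGS = {"meta", "link"}


class AboutGroupCollector(HTMLParser):
    """Record the tag names of elements carrying an `about` id.

    In Parsoid-style HTML, the top-level nodes produced by one transclusion
    share the same `about` value (for example "#mwt1").
    """

    def __init__(self) -> None:
        super().__init__()
        self.groups: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> also arrive here, through
        # HTMLParser's default handle_startendtag implementation.
        about = dict(attrs).get("about")
        if about:
            self.groups.setdefault(about, []).append(tag)


def metadata_only_transclusions(html: str) -> set[str]:
    """Return the `about` ids whose elements are all metadata tags."""
    collector = AboutGroupCollector()
    collector.feed(html)
    collector.close()
    return {
        about
        for about, tags in collector.groups.items()
        if all(tag in METADATA_TAGS for tag in tags)
    }


# Hand-written, simplified markup; not actual Parsoid output for the article.
sample = (
    '<div role="note" about="#mwt1" typeof="mw:Transclusion">'
    "This article is about the play.</div>"
    '<meta property="mw:PageProp/displaytitle" content="Trifles" '
    'about="#mwt2" typeof="mw:Transclusion"/>'
    "<p>Trifles is a one-act play...</p>"
)

if __name__ == "__main__":
    # Only the metadata-only transclusion (#mwt2) is reported.
    print(metadata_only_transclusions(sample))
```

Ids found this way could be dropped before segmentation whatever the template is called on a given wiki, at the cost of also discarding metadata a translator might want to keep.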

Change 494701 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Remove 'Italic title' template from article content

https://gerrit.wikimedia.org/r/494701

Mentioned in SAL (#wikimedia-operations) [2019-03-11T03:26:50Z] <kartik@deploy1001> Started deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878)

Restricted Application added a subscriber: Liuxinyu970226. · Mar 11 2019, 3:26 AM

Mentioned in SAL (#wikimedia-operations) [2019-03-11T03:30:51Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878) (duration: 04m 01s)

Etonkovidova closed this task as Resolved. · Mar 16 2019, 12:53 AM
Etonkovidova added a subscriber: Etonkovidova.

Checked "Trifles (play)", "...explosante-fixe..." and some other articles with {{Italic title}}. I also checked articles using DISPLAYTITLE; all of these articles appear to be divided into sections without any problems.