MT error while translating the big reflist section
Closed, Resolved · Public

Description

When translating Hugo Kołłątaj from English to Japanese, machine translation of the References section fails for Yandex and for the HTML-translating MT engines.
The error reported is that the content is too large to translate.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 31 2018, 11:17 AM

The root cause is that the reflist contains template styles as well. Machine translating a section is unnecessary when it consists only of a block-level template.
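As a rough illustration of that idea, the sketch below decides whether a section consists only of block-level templates. It assumes the section arrives as Parsoid-style HTML in which a template wrapper carries `typeof="mw:Transclusion"`. The function name `is_block_template_only` and the exact rule are hypothetical and are not taken from the cxserver patch.

```python
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they must not change depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TopLevelTypeofCollector(HTMLParser):
    """Records the typeof attribute of every top-level element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.top_level_typeof = []
        self.has_top_level_text = False

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            self.top_level_typeof.append(dict(attrs).get("typeof") or "")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Non-whitespace text outside any element means real prose is present.
        if self.depth == 0 and data.strip():
            self.has_top_level_text = True


def is_block_template_only(section_html: str) -> bool:
    """True when every top-level element is marked as a transclusion.

    Assumption: templates are marked with typeof="mw:Transclusion".
    """
    collector = TopLevelTypeofCollector()
    collector.feed(section_html)
    collector.close()
    if collector.has_top_level_text or not collector.top_level_typeof:
        return False
    return all("mw:Transclusion" in value.split()
               for value in collector.top_level_typeof)
```

A caller could skip the MT request and return the section unchanged whenever this check is true, which avoids the size error for sections that contain nothing to translate.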

santhosh triaged this task as Normal priority. · Oct 31 2018, 11:18 AM

Change 470807 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not pass sections with block template to MT engine

https://gerrit.wikimedia.org/r/470807

This other ticket may be relevant since it affects the same area: T208471: CX2: Translating Reference list displays oo-ui-icon-puzzle

Change 470807 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not pass sections with block template to MT engine

https://gerrit.wikimedia.org/r/470807

Nikerabbit added a subscriber: Nikerabbit.

Assuming no further patches are planned, moving to QA.

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-operations) [2018-11-06T04:42:16Z] <kartik@deploy1001> Started deploy [cxserver/deploy@ddb0031]: Update cxserver to 17f9a10 (T144467, T198699, T208386)

Mentioned in SAL (#wikimedia-operations) [2018-11-06T04:47:42Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@ddb0031]: Update cxserver to 17f9a10 (T144467, T198699, T208386) (duration: 05m 26s)

Petar.petkovic closed this task as Resolved. · Nov 15 2018, 11:43 AM
Petar.petkovic edited projects, added CX-cxserver; removed ContentTranslation.
Petar.petkovic moved this task from Backlog to Parsing and annotation on the CX-cxserver board.
Petar.petkovic reopened this task as Open. · Nov 15 2018, 12:10 PM
Petar.petkovic added a subscriber: Petar.petkovic.

I have resolved the ticket because the case of Hugo Kołłątaj from English to Japanese is no longer affected by this bug.

However, when translating Reince Priebus from English to Dutch, cxserver responds with status code 500 and the error "Source text too long: en-nl". One observation is that the condition added in 470807 to skip block templates is not satisfied in this case.

Taking another look at this, it seems the only reason that translating the references section of Hugo Kołłątaj from English to Serbian with Yandex does not get a 500 response from cxserver for overly long source text is that the HTML string is compacted from 56,223 characters to 8,752, which is below the 10,000 limit.

When translating the references section of Reince Priebus from English to Serbian, we do get the "Source text too long" error: compacting from 208,173 characters to 42,327 is not enough.
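The arithmetic behind the two cases can be sketched as a simple length check. This is a minimal sketch only: the names `MAX_SOURCE_LENGTH`, `fits_limit` and `ensure_translatable` are hypothetical, and treating the limit as inclusive is an assumption. The limit, the character counts, and the error message format are taken from the comments above.

```python
MAX_SOURCE_LENGTH = 10_000  # limit quoted above; inclusive boundary is an assumption


class SourceTooLongError(Exception):
    """Raised when the compacted source exceeds the MT length limit."""


def fits_limit(compacted_length: int, limit: int = MAX_SOURCE_LENGTH) -> bool:
    """Return True when the compacted HTML is short enough to send to MT."""
    return compacted_length <= limit


def ensure_translatable(compacted_length: int, source: str, target: str) -> None:
    """Raise an error in the reported format when the content is too long."""
    if not fits_limit(compacted_length):
        raise SourceTooLongError(f"Source text too long: {source}-{target}")


# Hugo Kołłątaj references: 56,223 characters compacted to 8,752, which passes.
print(fits_limit(8_752))
# Reince Priebus references: 208,173 characters compacted to 42,327, which fails.
print(fits_limit(42_327))
```

Compaction shrinks both sections to roughly a fifth of their original size, so the first case passes only because its original HTML was small enough to begin with.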

@santhosh, in neither case is the block template condition added in patch 470807 satisfied, so the section is still passed to the MT engine.

As mentioned in T216583#4987715, one possibility for this kind of compound content is to split it into its smaller units behind the scenes and send multiple smaller requests.
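A minimal sketch of that splitting approach follows, assuming the content has already been divided into units (for example, individual reference items) and that a batch's size is the sum of its unit lengths. The names `pack_units` and `translate_in_batches`, and the `translate` callable that stands in for an MT request, are hypothetical.

```python
from typing import Callable, List, Sequence


def pack_units(units: Sequence[str], limit: int = 10_000) -> List[List[str]]:
    """Greedily group consecutive units into batches of at most `limit` characters."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_length = 0
    for unit in units:
        if len(unit) > limit:
            # A single unit over the limit cannot be fixed by batching alone.
            raise ValueError("unit exceeds the length limit on its own")
        if current and current_length + len(unit) > limit:
            batches.append(current)
            current, current_length = [], 0
        current.append(unit)
        current_length += len(unit)
    if current:
        batches.append(current)
    return batches


def translate_in_batches(
    units: Sequence[str],
    translate: Callable[[List[str]], List[str]],
    limit: int = 10_000,
) -> List[str]:
    """Send one request per batch and return translations in the original order."""
    translated: List[str] = []
    for batch in pack_units(units, limit):
        translated.extend(translate(batch))
    return translated
```

Keeping the units in order means the translated pieces can be reassembled into the section without extra bookkeeping, at the cost of one request per batch.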

There is also the more general question of whether references should be translated at all. T197688 explores options to let the user decide, but we still need to better understand what the general expectation is.

Change 495653 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Avoid sending reflists to MT engines

https://gerrit.wikimedia.org/r/495653

@santhosh, in neither case is the block template condition added in patch 470807 satisfied, so the section is still passed to the MT engine.

I extended that logic to cover block-level reference lists as well.

Change 495653 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Avoid sending reflists to MT engines

https://gerrit.wikimedia.org/r/495653

Mentioned in SAL (#wikimedia-operations) [2019-03-14T07:18:50Z] <kartik@deploy1001> Started deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386)

Mentioned in SAL (#wikimedia-operations) [2019-03-14T07:22:40Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386) (duration: 03m 50s)