When translating the article Hugo Kołłątaj from English to Japanese, machine translation of the References section fails for Yandex and for the MT engines that translate HTML.
The reported error is that the content is too large to translate.
Details
Project | Branch | Lines +/- | Subject
---|---|---|---
mediawiki/services/cxserver | master | +1 K -16 | Avoid sending reflists to MT engines
mediawiki/services/cxserver | master | +748 -1 | Do not pass sections with block template to MT engine
Related Objects
- Mentioned In
  - T212577: Simplify default MT config
  - T203160: CX2: Highlight (and skip) references with a template that could not be adapted
  - T198699: Monitoring of MT services
  - T144467: Security review for Google MT for Content Translation
  - T206756: CX2: References by name disappear and produce missing reference errors when published
- Mentioned Here
  - rGCXSb16f4a101646: Avoid sending reflists to MT engines
  - T212577: Simplify default MT config
  - T197688: CX2: Control whether references are translated or not
  - T216583: [wmf.18] Large table cannot be translated - 'Automatic translation failed' is displayed.
  - rGCXS17f9a107f597: Do not pass sections with block template to MT engine
  - T144467: Security review for Google MT for Content Translation
  - T198699: Monitoring of MT services
  - T208471: CX2: Translating Reference list displays oo-ui-icon-puzzle
Event Timeline
The root cause is that the reflist also contains template styles. Machine translating a section is unnecessary when it consists only of a block-level template.
Change 470807 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Do not pass sections with block template to MT engine
This other ticket may be relevant since it affects the same area: T208471: CX2: Translating Reference list displays oo-ui-icon-puzzle
Change 470807 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Do not pass sections with block template to MT engine
Mentioned in SAL (#wikimedia-operations) [2018-11-06T04:42:16Z] <kartik@deploy1001> Started deploy [cxserver/deploy@ddb0031]: Update cxserver to 17f9a10 (T144467, T198699, T208386)
Mentioned in SAL (#wikimedia-operations) [2018-11-06T04:47:42Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@ddb0031]: Update cxserver to 17f9a10 (T144467, T198699, T208386) (duration: 05m 26s)
I have resolved the ticket because the case of Hugo Kołłątaj from English to Japanese is no longer affected by this bug.
However, when translating Reince Priebus from English to Dutch, cxserver responds with status code 500 and error "Source text too long: en-nl". One observation is that the condition to avoid translating block templates added in 470807 is not satisfied.
Taking another look at this, it seems that the only reason translating the references section of Hugo Kołłątaj from English to Serbian with Yandex does not get a 500 response from cxserver for overly long source text is that the HTML string is compacted from 56,223 characters to 8,752, which is below the 10,000 limit.
When translating the references section of Reince Priebus from English to Serbian, we do get the "Source text too long" error. Compacting from 208,173 characters to 42,327 is not enough.
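To make the arithmetic in the two cases above concrete, here is a minimal sketch in Python. It is not cxserver code: the function name `fits_mt_limit` is hypothetical, and treating the 10,000 figure as an inclusive character limit is an assumption based only on the numbers quoted in this comment.

```python
# Assumed character limit, taken from the figure quoted in this thread.
MT_CHAR_LIMIT = 10_000


def fits_mt_limit(compacted_length: int, limit: int = MT_CHAR_LIMIT) -> bool:
    """Return True when the compacted HTML is short enough to send to the MT engine."""
    return compacted_length <= limit


# Figures reported in this thread: (raw HTML length, compacted length).
cases = {
    "Hugo Kołłątaj": (56_223, 8_752),
    "Reince Priebus": (208_173, 42_327),
}

for title, (raw, compacted) in cases.items():
    ratio = compacted / raw  # share of the original size left after compaction
    print(f"{title}: {raw} -> {compacted} ({ratio:.0%}), fits: {fits_mt_limit(compacted)}")
```

Compaction shrinks both sections to roughly a fifth of their raw size or less, but only the first one ends up under the limit, so compaction alone cannot be relied on for long reference lists.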
@santhosh, in neither case is the condition that checks for a block template, added in patch 470807, satisfied, so the section is still passed to the MT engine.
As mentioned in T216583#4987715, one possibility for this kind of compound content is to split it into its smaller units behind the scenes and send multiple smaller requests.
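The splitting idea could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the approach cxserver takes: `batch_units` is a hypothetical helper, the units are assumed to be already extracted (for example, one string per reference list item), and the 10,000 limit is the figure mentioned earlier in this thread.

```python
from typing import Iterable, List


def batch_units(units: Iterable[str], limit: int = 10_000) -> List[List[str]]:
    """Greedily group units into batches whose total length stays within `limit`."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for unit in units:
        if len(unit) > limit:
            # A single unit that is too long cannot be fixed by batching.
            raise ValueError(f"unit of {len(unit)} characters exceeds limit {limit}")
        if current and current_len + len(unit) > limit:
            # Adding this unit would overflow the batch, so start a new one.
            batches.append(current)
            current, current_len = [], 0
        current.append(unit)
        current_len += len(unit)
    if current:
        batches.append(current)
    return batches
```

Each batch would then be sent as a separate MT request and the translated units reassembled in their original order. A unit that exceeds the limit on its own still needs different handling, which is why the sketch raises instead of silently truncating.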
There is also the more general question of whether references should be translated at all. T197688 explores options to let the user decide, but we still need to better understand what the general expectation is.
Change 495653 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Avoid sending reflists to MT engines
> @santhosh, in neither case is the condition that checks for a block template, added in patch 470807, satisfied, so the section is still passed to the MT engine.

I enhanced that logic to include block-level reference lists as well.
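For readers unfamiliar with the check being discussed, the following Python sketch illustrates the general idea of skipping MT for a section made up only of block-level templates or reference lists. It is not the code from patches 470807 or 495653. It assumes the section's inner HTML is Parsoid output in which such elements carry `typeof="mw:Transclusion"` or `typeof="mw:Extension/references"`, and the names `TopLevelScanner` and `should_skip_mt` are hypothetical.

```python
from html.parser import HTMLParser

# typeof values assumed to mark content that should not be machine translated.
SKIP_TYPES = {"mw:Transclusion", "mw:Extension/references"}
# Void elements have no end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TopLevelScanner(HTMLParser):
    """Collect the typeof values of top-level elements and note top-level text."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.top_level_typeofs = []  # one set of typeof tokens per top-level element
        self.has_top_level_text = False

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            typeof = dict(attrs).get("typeof") or ""
            self.top_level_typeofs.append(set(typeof.split()))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.has_top_level_text = True


def should_skip_mt(section_html: str) -> bool:
    """True when every top-level element is a block template or a reference list."""
    scanner = TopLevelScanner()
    scanner.feed(section_html)
    scanner.close()
    if scanner.has_top_level_text or not scanner.top_level_typeofs:
        return False
    return all(types & SKIP_TYPES for types in scanner.top_level_typeofs)
```

A section that passes this check would be returned untranslated instead of being sent to the MT engine, which also sidesteps the length limit for long reference lists.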
Change 495653 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Avoid sending reflists to MT engines
Mentioned in SAL (#wikimedia-operations) [2019-03-14T07:18:50Z] <kartik@deploy1001> Started deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386)
Mentioned in SAL (#wikimedia-operations) [2019-03-14T07:22:40Z] <kartik@deploy1001> Finished deploy [cxserver/deploy@3ba57a5]: Update cxserver to b16f4a1 (T212577, T208386) (duration: 03m 50s)